EMCA: EFFICIENT MULTISCALE CHANNEL ATTENTION MODULE FOR DEEP NEURAL NETWORKS
Eslam Mohamed Ali
Abstract
Over the years, CNN architectures have developed many ideas to better handle spatial image features. For instance, deeper architectures emerged that stack multiple convolution layers, known as a backbone or encoder. As we go deeper in the network, the feature maps get smaller while their content represents a broader region of the image, bringing us closer to the semantics of the image content. Attention mechanisms have been explored with CNNs across the spatial and channel dimensions to improve the representations learned by an arbitrary CNN backbone. However, the existing methods devote their attention modules to capturing local interactions at a single scale. This work tackles the following question: can one consolidate multi-scale aggregation while learning channel attention more efficiently? To this end, we apply channel-wise attention over multiple feature scales, which empirically proves capable of replacing the limited local, uni-scale attention modules. EMCA is lightweight and efficiently models the global context; furthermore, it is easily integrated into any feed-forward CNN architecture and trained end-to-end.

We validate our novel architecture through comprehensive experiments on image classification, object detection, and instance segmentation with different backbones. Our experiments show consistent performance gains over the baseline counterparts, and the proposed EMCA module outperforms other channel attention techniques in the accuracy-latency trade-off. For image classification on the ImageNet dataset, we integrate the module into three ResNet architectures, i.e., ResNet-18, ResNet-34, and ResNet-50, and explore three variants of the proposed module for each backbone, i.e., EMCA-SE, EMCA-ECA, and EMCA-SRM. For ResNet-18, the EMCA-ECA variant achieves the best Top-1 accuracy, 71.04%, while the original ECA achieves 70.75%. For ResNet-34, the EMCA-ECA variant achieves the best Top-1 accuracy, 74.46%, whereas the original ECA achieves 74.13%. Finally, for ResNet-50, the EMCA-SE variant achieves the best Top-1 accuracy, 77.33%, while the original SE achieves 76.80%.

Applying the EMCA module to downstream tasks such as detection and instance segmentation also boosts accuracy. For instance, with the Faster R-CNN detector, the average precision (AP) increases from 37.7% to 38.1%. Using Mask R-CNN as the base instance segmentation architecture, integrating the EMCA module raises the accuracy from 35.4% to 35.7% while using fewer parameters and achieving better runtime performance. We also conduct experiments that probe the robustness of the learned representations; these experiments cover a wide range of datasets recorded with various sensors and tackle other tasks. Our code is publicly available at: https://github.com/eslambakr/EMCA
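The sketch below illustrates, in PyTorch, how SE-style channel attention can be computed from feature descriptors pooled at multiple spatial scales, in the spirit of the EMCA-SE variant described above. The class name EMCASEBlock, the `scales` and `reduction` parameters, and the averaging of the multi-scale descriptors are illustrative assumptions rather than the authors' exact design; the official implementation is available at the repository linked above.

```python
# A minimal sketch (not the authors' exact implementation) of the idea described
# above: squeeze-and-excitation style channel attention computed from feature
# descriptors pooled at multiple spatial scales. Names such as EMCASEBlock,
# `scales`, and `reduction` are illustrative assumptions; see
# https://github.com/eslambakr/EMCA for the official code.
import torch
import torch.nn as nn


class EMCASEBlock(nn.Module):
    """Channel attention from multi-scale pooled descriptors (SE-style gating)."""

    def __init__(self, channels: int, reduction: int = 16, scales=(1, 2, 4)):
        super().__init__()
        # One pooled summary per scale; each scale captures spatial context
        # at a different granularity before channel aggregation.
        self.pools = nn.ModuleList([nn.AdaptiveAvgPool2d(s) for s in scales])
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Average the multi-scale descriptors into one per-channel vector.
        descriptor = sum(p(x).flatten(2).mean(dim=2) for p in self.pools) / len(self.pools)
        weights = self.fc(descriptor).view(b, c, 1, 1)  # per-channel gates in [0, 1]
        return x * weights                              # reweight the input channels
```

For example, `EMCASEBlock(channels=256)` could be appended after a ResNet stage that produces 256-channel feature maps, reweighting its output before it is passed to the next stage.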
Other data
| Field | Value |
|---|---|
| Title | EMCA: EFFICIENT MULTISCALE CHANNEL ATTENTION MODULE FOR DEEP NEURAL NETWORKS |
| Other Titles | وحدات انتباه دقيقة متعددة الخصائص مدمجه فى معالجات الصور الرقمية عن طريق التعلم العميق (English: "Efficient multi-feature attention modules integrated into digital image processing via deep learning") |
| Authors | Eslam Mohamed Ali |
| Issue Date | 2022 |
Attached Files
| File | Size | Format |
|---|---|---|
| BB13749.pdf | 1.66 MB | Adobe PDF |