Abstract
Convolutional Neural Networks (CNNs) model long-range dependencies by deeply stacking convolution operations with small window sizes, which makes optimization difficult. This paper presents region-based non-local (RNL) operations as a family of self-attention mechanisms that can directly capture long-range dependencies without requiring a deep stack of local operations. Given an intermediate feature map, our method recalibrates the feature at a position by aggregating information from the neighboring regions of all positions. By combining a channel attention module with the proposed RNL, we design an attention chain that can be integrated into off-the-shelf CNNs for end-to-end training. We evaluate our method on two video classification benchmarks. Our method outperforms other attention mechanisms, and we achieve state-of-the-art performance on the Something-Something V1 dataset.
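To make the idea concrete, the following is a minimal PyTorch sketch of a region-based non-local style block. It is not the authors' implementation: the class name `RegionNonLocalSketch`, the 1x1 projection layers, and the choice of approximating regional aggregation with average pooling over a local neighborhood (`region_size`) are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionNonLocalSketch(nn.Module):
    """Illustrative region-based non-local block (hypothetical sketch, not the paper's code).

    Each position attends to region-pooled features of all positions,
    approximating "aggregating information from neighboring regions".
    """
    def __init__(self, channels, reduction=2, region_size=3):
        super().__init__()
        inter = channels // reduction
        self.theta = nn.Conv2d(channels, inter, kernel_size=1)  # query projection
        self.phi = nn.Conv2d(channels, inter, kernel_size=1)    # key projection
        self.g = nn.Conv2d(channels, inter, kernel_size=1)      # value projection
        self.out = nn.Conv2d(inter, channels, kernel_size=1)    # restore channel dim
        self.region_size = region_size

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)                       # B x HW x C'
        # Assumption: regional aggregation via average pooling of keys/values
        # over a local spatial neighborhood before computing attention.
        k = F.avg_pool2d(self.phi(x), self.region_size, stride=1,
                         padding=self.region_size // 2).flatten(2)         # B x C' x HW
        v = F.avg_pool2d(self.g(x), self.region_size, stride=1,
                         padding=self.region_size // 2).flatten(2).transpose(1, 2)
        attn = torch.softmax(q @ k, dim=-1)                                # B x HW x HW
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)                # B x C' x H x W
        return x + self.out(y)                                             # residual connection
```

A block like this can be dropped into an existing backbone, e.g. `y = RegionNonLocalSketch(256)(torch.randn(2, 256, 14, 14))`; the residual connection keeps the module initialization-friendly when inserted into a pre-trained CNN.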
Original language | English |
---|---|
Title of host publication | Proceedings of the International Conference on Pattern Recognition (ICPR) |
Place of Publication | Milan, Italy |
Publisher | IEEE |
Pages | 10010-10017 |
Number of pages | 8 |
ISBN (Electronic) | 9781728188089 |
ISBN (Print) | 9781728188096 |
DOIs | |
Publication status | Published - 5 May 2021 |