Region-based Non-local Operation for Video Classification

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Convolutional Neural Networks (CNNs) model long-range dependencies by deeply stacking convolution operations with small window sizes, which makes optimization difficult. This paper presents region-based non-local (RNL) operations as a family of self-attention mechanisms that directly capture long-range dependencies without a deep stack of local operations. Given an intermediate feature map, our method recalibrates the feature at each position by aggregating information from the neighboring regions of all positions. By combining a channel attention module with the proposed RNL, we design an attention chain that can be integrated into off-the-shelf CNNs for end-to-end training. We evaluate our method on two video classification benchmarks. Our method outperforms other attention mechanisms and achieves state-of-the-art performance on the Something-Something V1 dataset.
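
The recalibration step described in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: it assumes, purely for illustration, that each region is summarized by average pooling over a cubic spatio-temporal neighborhood, that similarity is a scaled dot product followed by a softmax, and that the result is added back through a residual connection. The function names `region_mean` and `region_nonlocal`, the `radius` parameter, and the absence of learned projections are all hypothetical simplifications.

```python
import numpy as np

def region_mean(x, radius=1):
    """Summarize each position's neighborhood by its mean (assumed pooling).

    x has shape (T, H, W, C); borders are handled by edge padding.
    """
    T, H, W, C = x.shape
    k = 2 * radius + 1
    padded = np.pad(x, ((radius, radius),) * 3 + ((0, 0),), mode="edge")
    out = np.zeros(x.shape, dtype=np.float64)
    # Sum the k*k*k shifted copies of the feature map, then normalize.
    for dt in range(k):
        for dh in range(k):
            for dw in range(k):
                out += padded[dt:dt + T, dh:dh + H, dw:dw + W, :]
    return out / k ** 3

def region_nonlocal(x, radius=1):
    """Recalibrate every position using region summaries of all positions."""
    T, H, W, C = x.shape
    queries = x.reshape(-1, C).astype(np.float64)      # (N, C) point features
    regions = region_mean(x, radius).reshape(-1, C)    # (N, C) region summaries
    # Assumed affinity: scaled dot product between a position and every region.
    logits = queries @ regions.T / np.sqrt(C)          # (N, N)
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)      # softmax over regions
    aggregated = weights @ regions                     # (N, C)
    # Residual connection keeps the original feature alongside the update.
    return x + aggregated.reshape(T, H, W, C)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((4, 7, 7, 16))  # toy (T, H, W, C) feature map
    print(region_nonlocal(features).shape)
```

Because the attention matrix is N × N with N = T·H·W, a sketch like this is only practical on small intermediate feature maps.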
Original language: English
Title of host publication: Proceedings of the International Conference on Pattern Recognition (ICPR)
Place of Publication: Milan, Italy
Publisher: IEEE
Pages: 10010-10017
Number of pages: 8
ISBN (Electronic): 9781728188089
ISBN (Print): 9781728188096
Publication status: Published - 5 May 2021

Bibliographical note

© IEEE 2020. This is an author-produced version of the published paper. Uploaded in accordance with the publisher’s self-archiving policy. Further copying may not be permitted; contact the publisher for details.
