Abstract
A rich video data representation can be realized by means of spatio-temporal frequency analysis. In this study we show that, by learning video characteristics according to their spatio-temporal properties, a video can be disentangled into two complementary information components, dubbed Busy and Quiet. The Busy information characterizes the boundaries of moving regions, moving objects, and regions of changing movement, while the Quiet information encodes global, smooth spatio-temporal structures with substantial redundancy. We design a trainable Motion Band-Pass Module (MBPM) that separates the Busy and Quiet information in raw video data, and we build a Busy-Quiet Net (BQN) by embedding the MBPM into a two-pathway CNN architecture. BQN achieves its efficiency by avoiding redundancy between the feature spaces defined by the two pathways: one pathway processes the Busy features, while the other processes the Quiet features at lower spatio-temporal resolution, reducing both memory and computational costs. Experiments show that the proposed MBPM can be used as a plug-in module in various CNN backbone architectures, significantly boosting their performance. The proposed BQN outperforms many recent video models on the Something-Something V1, Kinetics400, UCF101 and HMDB51 datasets. The code for the implementation is available.
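To make the separation step concrete, below is a minimal, illustrative PyTorch sketch of a band-pass split of a video tensor into Busy and Quiet streams. The class name `MotionBandPass`, the depthwise Laplacian-style filter initialization, and the spatial pooling factor are assumptions for illustration only, not the authors' MBPM implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MotionBandPass(nn.Module):
    """Trainable depthwise 3-D convolution used as a spatio-temporal band-pass.

    Illustrative sketch only: the kernel initialization and downsampling
    factor are assumptions, not the paper's MBPM implementation.
    """

    def __init__(self, channels: int = 3, kernel_size: int = 3):
        super().__init__()
        # One depthwise 3-D filter per input channel over (T, H, W).
        self.filter = nn.Conv3d(
            channels, channels, kernel_size,
            padding=kernel_size // 2, groups=channels, bias=False,
        )
        with torch.no_grad():
            # Laplacian-like high-frequency kernel: positive centre, negative surround.
            k = -torch.ones(kernel_size, kernel_size, kernel_size)
            c = kernel_size // 2
            k[c, c, c] = kernel_size ** 3 - 1
            self.filter.weight.copy_(k.expand(channels, 1, -1, -1, -1) / k.abs().sum())

    def forward(self, video: torch.Tensor):
        # video: (N, C, T, H, W)
        busy = self.filter(video)        # motion boundaries, changes in movement
        quiet = video - busy             # smooth, redundant structure
        # Quiet stream is pooled to a lower spatial resolution to save compute.
        quiet = F.avg_pool3d(quiet, kernel_size=(1, 2, 2))
        return busy, quiet


if __name__ == "__main__":
    clip = torch.randn(2, 3, 8, 112, 112)     # toy batch of RGB video clips
    busy, quiet = MotionBandPass()(clip)
    print(busy.shape, quiet.shape)            # (2,3,8,112,112) (2,3,8,56,56)
```

In a two-pathway design such as the BQN described above, each stream would then feed its own CNN pathway, with the pathway receiving the pooled Quiet tensor operating at reduced resolution to cut memory and computational cost.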
Original language | English
---|---
Pages (from-to) | 4966-4979
Number of pages | 14
Journal | IEEE Transactions on Image Processing
Volume | 31
DOIs | 
Publication status | Published - 19 Jul 2022
Bibliographical note

© 2022 IEEE. This is an author-produced version of the published paper. Uploaded in accordance with the publisher's self-archiving policy. Further copying may not be permitted; contact the publisher for details.

Projects
- Cooperative Underwater Surveillance Networks (COUSIN)
Mitchell, P. D., Bors, A. G. & Zakharov, Y.
1/03/21 → 29/02/24
Project: Research project (funded) › Research