Where to Focus: Investigating Hierarchical Attention Relationship for Fine-Grained Visual Classification

Yang Liu, Lei Zhou, Pengcheng Zhang, Bai Xiao, Lin Gu, Xiaohan Yu, Jun Zhou, Edwin R Hancock

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Object categories are often grouped into a multi-granularity taxonomic hierarchy. Classifying objects at coarser-grained hierarchy requires global and common characteristics, while finer-grained hierarchy classification relies on local and discriminative features. Therefore, humans should also subconsciously focus on different object regions when classifying different hierarchies. This granularity-wise attention is confirmed by our collected human real-time gaze data on different hierarchy classifications. To leverage this mechanism, we propose a Cross-Hierarchical Region Feature (CHRF) learning framework. Specifically, we first design a region feature mining module that imitates humans to learn different granularity-wise attention regions with multi-grained classification tasks. To explore how human attention shifts from one hierarchy to another, we further present a cross-hierarchical orthogonal fusion module to enhance the region feature representation by blending the original feature and an orthogonal component extracted from adjacent hierarchies. Experiments on five hierarchical fine-grained datasets demonstrate the effectiveness of CHRF compared with the state-of-the-art methods. Ablation study and visualization results also consistently verify the advantages of our human attention-oriented modules. The code and dataset are available at https://github.com/visiondom/CHRF.
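
The orthogonal fusion idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation (see the linked repository for that). It assumes that the "orthogonal component" is the part of an adjacent-hierarchy feature left after removing its projection onto the original feature, and that "blending" is a weighted sum. The function names, the `weight` parameter, and the toy vectors are hypothetical.

```python
# Illustrative sketch only; names and the blending rule are assumptions,
# not taken from the CHRF code base.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthogonal_component(adjacent, original, eps=1e-12):
    """Return the part of `adjacent` that is orthogonal to `original`.

    The projection of `adjacent` onto `original` is subtracted, so only
    information not already present in `original` remains.
    """
    scale = dot(adjacent, original) / (dot(original, original) + eps)
    return [a - scale * o for a, o in zip(adjacent, original)]

def orthogonal_fusion(original, adjacent, weight=1.0):
    """Blend `original` with the orthogonal component of `adjacent`."""
    orth = orthogonal_component(adjacent, original)
    return [o + weight * r for o, r in zip(original, orth)]

# Toy region features for two adjacent hierarchy levels (made-up values).
fine = [1.0, 0.0, 0.0]
coarse = [2.0, 3.0, 0.0]

orth = orthogonal_component(coarse, fine)
fused = orthogonal_fusion(fine, coarse)
```

In this toy case the coarse feature's component along the fine feature is discarded, and only its remaining direction is added to the fine feature, so the fused vector keeps the original feature intact while gaining complementary information.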
Original language: English
Title of host publication: Proceedings ECCV 2022
Editors: Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
Place of Publication: Cham
Publisher: Springer
Pages: 57-73
Number of pages: 17
ISBN (Print): 9783031200533
Publication status: Published - 6 Nov 2022

Publication series

Name: Lecture Notes in Computer Science (LNCS)
Publisher: Springer
Volume: 13684
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Bibliographical note

This is an author-produced version of the published paper, uploaded in accordance with the publisher’s self-archiving policy. Further copying may not be permitted; contact the publisher for details.
