Deep-learning-based camera localization from a single image has attracted growing attention because such methods are computationally efficient. However, existing methods provide only generic global representations, from which an accurate pose cannot be reliably derived. We argue that feature representations suited to accurate pose estimation should be both "informative" (focusing on geometrically meaningful regions) and "discriminative" (able to tell apart similar-looking images captured from different poses). To this end, we propose a novel multi-layer factorized bilinear pooling module for feature aggregation: informative features are selected via bilinear pooling, and discriminative features are highlighted via multi-layer fusion. Using this pooling module, we build a new network for camera localization. We demonstrate the effectiveness of our approach through experiments on the outdoor Cambridge Landmarks dataset and the indoor 7 Scenes dataset. The results show that focusing on discriminative features significantly improves camera localization performance in most cases. Code will be made available soon.
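The abstract names two ingredients, factorized bilinear pooling and multi-layer fusion, but gives no formulas. The sketch below illustrates the general low-rank (factorized) bilinear pooling idea applied to features taken from two network layers. It is not the authors' module. It rests on several assumptions:

- Each layer yields `n` feature vectors already aligned to the same spatial positions.
- The bilinear weight for output `i` is factorized as `W_i = U_i V_i^T` with rank `k`.
- Per-position outputs are averaged, then signed-square-root and L2 normalized.

The function names (`factorized_bilinear`, `pool_two_layers`), dimensions and random weights are all hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # each row is one projection vector


def matvec(w: Matrix, v: Vector) -> Vector:
    """Multiply matrix w (rows x len(v)) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def factorized_bilinear(x: Vector, y: Vector, U: Matrix, V: Matrix, k: int) -> Vector:
    """Rank-k bilinear interaction z_i = x^T (U_i V_i^T) y for each output i.

    U and V each hold o*k rows; rows i*k .. i*k+k-1 form the factors U_i and V_i.
    """
    px, py = matvec(U, x), matvec(V, y)      # project both inputs to o*k dims
    prod = [a * b for a, b in zip(px, py)]    # element-wise (Hadamard) product
    # Sum-pool non-overlapping windows of size k -> one value per output i.
    return [sum(prod[i:i + k]) for i in range(0, len(prod), k)]


def pool_two_layers(feats_a: List[Vector], feats_b: List[Vector],
                    U: Matrix, V: Matrix, k: int) -> Vector:
    """Hypothetical multi-layer fusion: pair features from two layers per position."""
    outs = [factorized_bilinear(a, b, U, V, k) for a, b in zip(feats_a, feats_b)]
    z = [sum(col) / len(outs) for col in zip(*outs)]       # average over positions
    z = [math.copysign(math.sqrt(abs(v)), v) for v in z]   # signed square root
    norm = math.sqrt(sum(v * v for v in z)) or 1.0         # guard against all-zero
    return [v / norm for v in z]                           # L2 normalization


# Toy usage with random (hypothetical) weights and features.
random.seed(0)
d_a, d_b, o, k, n = 8, 6, 4, 3, 5  # layer dims, output dim, rank, positions
U = [[random.gauss(0, 0.1) for _ in range(d_a)] for _ in range(o * k)]
V = [[random.gauss(0, 0.1) for _ in range(d_b)] for _ in range(o * k)]
feats_a = [[random.random() for _ in range(d_a)] for _ in range(n)]
feats_b = [[random.random() for _ in range(d_b)] for _ in range(n)]
desc = pool_two_layers(feats_a, feats_b, U, V, k)
print(len(desc), round(sum(v * v for v in desc), 6))
```

The factorization replaces each full `d_a × d_b` bilinear weight matrix with two thin projections. Parameters therefore grow with `(d_a + d_b) * k` per output instead of `d_a * d_b`. This saving is the usual motivation for factorized bilinear pooling on high-dimensional CNN features.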
| Field | Value |
|---|---|
| Number of pages | 12 |
| Publication status | Accepted/In press, 23 Jul 2019 |
| Event | British Machine Vision Conference, Cardiff, United Kingdom |
| Duration | 9 Sep 2019 → 12 Sep 2019 |
| Conference | British Machine Vision Conference |
| Abbreviated title | BMVC 2019 |