We present FoveaBox, an accurate, flexible and completely anchor-free framework for object detection. While almost all state-of-the-art object detectors utilize predefined anchors to enumerate possible locations, scales and aspect ratios for the search of the objects, their performance and generalization ability are also limited by the design of the anchors. Instead, FoveaBox directly learns the object existing possibility and the bounding box coordinates without anchor reference. This is achieved by: (a) predicting category-sensitive semantic maps for the object existing possibility, and (b) producing a category-agnostic bounding box for each position that potentially contains an object. The scales of target boxes are naturally associated with feature pyramid representations for each input image. Without bells and whistles, FoveaBox achieves state-of-the-art single model performance of 42.1 AP on the standard COCO detection benchmark. Especially for objects with arbitrary aspect ratios, FoveaBox brings significant improvement compared to anchor-based detectors. More surprisingly, when challenged by stretched testing images, FoveaBox shows great robustness and generalization ability to the changed distribution of bounding box shapes. The code will be made publicly available.
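To make steps (a) and (b) concrete, here is a minimal sketch of how per-position predictions could be turned into boxes without any anchor. The abstract does not say how the box coordinates are encoded, so the encoding here (raw pixel distances from the cell center to the four box sides) is an assumption for illustration only. The function name `decode_anchor_free`, the `threshold` parameter and the toy numbers are all hypothetical, not taken from the paper.

```python
from typing import List, Tuple

# (x1, y1, x2, y2, class_id, score)
Box = Tuple[float, float, float, float, int, float]


def decode_anchor_free(
    scores: List[List[List[float]]],   # [num_classes][H][W] category-sensitive score maps
    offsets: List[List[List[float]]],  # [4][H][W] assumed (left, top, right, bottom) distances in pixels
    stride: int,                       # stride of this pyramid level (cell size in image pixels)
    threshold: float = 0.5,            # hypothetical score cutoff
) -> List[Box]:
    """Decode one class-agnostic box per confident position, with no anchor reference."""
    boxes: List[Box] = []
    height, width = len(scores[0]), len(scores[0][0])
    for y in range(height):
        for x in range(width):
            # Pick the highest-scoring class at this position.
            class_id = max(range(len(scores)), key=lambda c: scores[c][y][x])
            score = scores[class_id][y][x]
            if score < threshold:
                continue
            # Map the cell center back to image coordinates.
            cx = (x + 0.5) * stride
            cy = (y + 0.5) * stride
            left, top, right, bottom = (offsets[k][y][x] for k in range(4))
            boxes.append((cx - left, cy - top, cx + right, cy + bottom, class_id, score))
    return boxes


# Toy 2x2 feature map with two classes; only position (y=0, x=1) is confident.
scores = [
    [[0.1, 0.9], [0.2, 0.1]],  # class 0
    [[0.3, 0.2], [0.1, 0.1]],  # class 1
]
offsets = [
    [[4.0, 4.0], [4.0, 4.0]],  # left
    [[2.0, 2.0], [2.0, 2.0]],  # top
    [[4.0, 4.0], [4.0, 4.0]],  # right
    [[6.0, 6.0], [6.0, 6.0]],  # bottom
]
print(decode_anchor_free(scores, offsets, stride=8))
```

In a full detector this would run once per pyramid level with that level's stride, and the results would be merged and filtered (for example with non-maximum suppression); that part is left out here.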
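The anchor-free design is quite interesting and prompts some new thinking. As the authors themselves ask: is the anchor boxes scheme the optimal way to guide the search of the objects? In the end it still comes down to which prior assumption, plus hand-tuned parameters, better matches the true distribution of the data. Suppose anchors are used but poorly configured: for an object like a toothbrush, whose aspect ratio is quite unusual, the anchor amounts to an inaccurate initialization, and it would be better to let the network learn the offsets from the corner points to the center on its own. On the other hand, for objects whose center is hard to pin down, such as a partially occluded table, asking the network to pick a center and then learn the offsets may leave it quite confused, and a well-chosen anchor would work better. (Only kids make choices; adults take both. Hook up both branches and fuse them? Haha~)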
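The "inaccurate initialization" point can be illustrated with a small sketch. It assumes the common R-CNN-style delta parameterization (center shifts scaled by anchor size, log-scale width and height); that choice is an assumption here, not something the note or the abstract specifies. The box numbers are made up to mimic an elongated, toothbrush-like object next to a square anchor.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2


def decode_from_anchor(anchor: Box, deltas: Tuple[float, float, float, float]) -> Box:
    """Apply (dx, dy, dw, dh) deltas to an anchor, R-CNN style (assumed parameterization)."""
    ax1, ay1, ax2, ay2 = anchor
    aw, ah = ax2 - ax1, ay2 - ay1
    acx, acy = ax1 + 0.5 * aw, ay1 + 0.5 * ah
    dx, dy, dw, dh = deltas
    cx, cy = acx + dx * aw, acy + dy * ah
    w, h = aw * math.exp(dw), ah * math.exp(dh)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical numbers: a 100x10 elongated target and a 50x50 anchor sharing its center.
target = (0.0, 45.0, 100.0, 55.0)
square_anchor = (25.0, 25.0, 75.0, 75.0)

# With all-zero deltas the prediction is just the anchor itself.
start = decode_from_anchor(square_anchor, (0.0, 0.0, 0.0, 0.0))
print(iou(start, target))  # overlap of the starting point with the target

# Deltas the network would have to regress to reach the target from this anchor.
needed = (0.0, 0.0, math.log(100.0 / 50.0), math.log(10.0 / 50.0))
print(decode_from_anchor(square_anchor, needed))
```

With these numbers the square anchor starts at an IoU of about 0.17 with the target, so under a typical IoU-based assignment rule it might not even be matched as a positive sample. That is one reading of why a badly chosen anchor can be worse than no anchor at all.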
[1] Zhou, Xinyu, et al. "EAST: an efficient and accurate scene text detector." Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. 2017.
[2] Wang, Jiaqi, et al. "Region proposal by guided anchoring." arXiv preprint arXiv:1901.03278 (2019).