Sharing a paper I read recently: Bag of Tricks and A Strong Baseline for Deep Person Re-identification. It is an excellent summary of the training tricks used in person re-ID, and it proposes a strong baseline that reaches rank-1 = 94.5% on the Market1501 dataset.

Paper: arxiv.org/pdf/1903.0707

Code: github.com/michuanhaoha

Authors: Hao Luo¹, Youzhi Gu¹, Xingyu Liao², Shenqi Lai³, Wei Jiang¹ (¹Zhejiang University, ²Chinese Academy of Sciences, ³Xi'an Jiaotong University)

Motivation

[1] We surveyed many works published at top conferences and found most of them were built on weak baselines.

They surveyed the current best-performing person re-ID methods and found that most of them start from fairly weak baselines.

[2] For the academia, we hope to provide a strong baseline for researchers to achieve higher accuracies in person ReID.

For academia, they hope to provide a strong baseline.

[3] For the community, we hope to give reviewers a reference for which tricks affect the performance of a ReID model. We suggest that reviewers take these tricks into account when comparing the performance of different methods.

For the reviewing community, they want to raise awareness of how much these tricks matter.

[4] For the industry, we hope to provide some effective tricks to acquire better models without too much extra consumption.

For industry, they hope to offer a simple yet effective model.

Contribution

[1] We collect some effective training tricks for person ReID. Among them, we design a new neck structure named BNNeck. In addition, we evaluate the improvements from each trick on two widely used datasets.

The paper collects effective training tricks for the person re-ID task and proposes a new structure, BNNeck.

[2] We provide a strong ReID baseline, which achieves 94.5% rank-1 accuracy and 85.9% mAP on Market1501. It is worth mentioning that the results are obtained with global features provided by a ResNet50 backbone. To the best of our knowledge, it is the best performance acquired by global features in person ReID.

They provide a strong baseline that reaches rank-1 = 94.5% and mAP = 85.9% on Market1501.

[3] As a supplement, we evaluate the influences of the image size and the batch size on the performance of ReID models.

Experiments examine how image size and batch size affect performance.

Standard Baseline

[1] We initialize the ResNet50 with pre-trained parameters on ImageNet and change the dimension of the fully connected layer to N. N denotes the number of identities in the training dataset.

ResNet50 pre-trained on ImageNet is used as the backbone.

[2] We randomly sample P identities and K images per person to constitute a training batch. Finally the batch size equals B = P×K. In this paper, we set P = 16 and K = 4.

So that the triplet loss can be computed, each batch contains 16 identities with 4 images per identity, as sketched below.
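A minimal sketch of this P×K (identity-balanced) sampling. Here `labels` is assumed to be the list of person IDs for the whole training set; identities with fewer than K images are simply skipped, whereas real implementations typically sample them with replacement instead:

```python
import random
from collections import defaultdict

def sample_pk_batch(labels, P=16, K=4):
    """Pick P identities, then K image indices per identity (B = P*K = 64)."""
    by_id = defaultdict(list)
    for idx, pid in enumerate(labels):
        by_id[pid].append(idx)
    eligible = [pid for pid, idxs in by_id.items() if len(idxs) >= K]
    batch = []
    for pid in random.sample(eligible, P):          # P distinct identities
        batch.extend(random.sample(by_id[pid], K))  # K images each
    return batch  # dataset indices forming one training batch
```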

[3] We resize each image into 256 × 128 pixels and pad the resized image 10 pixels with zero values. Then randomly crop it into a 256 × 128 rectangular image.

Pre-processing uses resizing and random cropping.

[4] Each image is flipped horizontally with 0.5 probability.

Pre-processing also applies random horizontal flipping.

[5] Each image is decoded into 32-bit floating point raw pixel values in [0, 1]. Then we normalize RGB channels by subtracting 0.485, 0.456, 0.406 and dividing by 0.229, 0.224, 0.225, respectively.

Finally, pre-processing normalizes each channel so that pixel values are roughly zero-mean with unit variance.
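Steps [3]-[5] map directly onto standard torchvision transforms; a minimal sketch of the pipeline (the random erasing trick discussed later would be appended after Normalize):

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((256, 128)),                   # [3] resize
    transforms.Pad(10),                              # [3] zero-pad 10 pixels on each side
    transforms.RandomCrop((256, 128)),               # [3] random crop back to 256x128
    transforms.RandomHorizontalFlip(p=0.5),          # [4] flip with probability 0.5
    transforms.ToTensor(),                           # [5] float pixels in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406], # [5] ImageNet channel statistics
                         std=[0.229, 0.224, 0.225]),
])
```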

[6] The model outputs ReID features f and ID prediction logits p.

The model outputs the feature f and the ID prediction logits p.

[7] ReID features f are used to calculate the triplet loss. ID prediction logits p are used to calculate the cross-entropy loss. The margin m of the triplet loss is set to 0.3.

The feature f feeds the triplet loss and the logits p feed the cross-entropy loss.
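A sketch of the combined objective. The paper actually mines hard triplets within each P×K batch; here PyTorch's nn.TripletMarginLoss over pre-formed (anchor, positive, negative) tuples stands in as a simplification:

```python
import torch.nn as nn

id_loss_fn = nn.CrossEntropyLoss()              # cross entropy on the ID logits p
tri_loss_fn = nn.TripletMarginLoss(margin=0.3)  # triplet loss on the feature f, m = 0.3

def baseline_loss(logits, anchor, positive, negative, targets):
    # total loss = ID loss + triplet loss, weighted equally
    return id_loss_fn(logits, targets) + tri_loss_fn(anchor, positive, negative)
```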

[8] The Adam method is adopted to optimize the model. The initial learning rate is set to 0.00035 and is decreased by a factor of 0.1 at the 40th and 70th epochs. There are 120 training epochs in total.

The optimizer is Adam; another article summarizing video-based re-ID [2] also uses Adam.
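This setup translates to the following sketch, where `model` and `train_one_epoch` are placeholders for the baseline network and training loop:

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=3.5e-4)
# multiply the learning rate by 0.1 at epoch 40 and again at epoch 70
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[40, 70], gamma=0.1)

for epoch in range(120):                 # 120 training epochs in total
    train_one_epoch(model, optimizer)    # hypothetical training-loop helper
    scheduler.step()
```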

Training Tricks

Warmup Learning Rate

As the figure in the paper shows, the learning rate ramps up gradually over the first few epochs. I have also seen this technique mentioned in other work [1].
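A sketch of the resulting schedule as a function of epoch; the 10-epoch linear warmup length is the paper's setting and an assumption here, since the figure is not reproduced:

```python
def lr_at_epoch(epoch, base_lr=3.5e-4, warmup_epochs=10):
    """Linear warmup over the first epochs, then the step decays from above."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs  # ramp up gradually
    if epoch < 40:
        return base_lr
    if epoch < 70:
        return base_lr * 0.1
    return base_lr * 0.01
```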

Random Erasing Augmentation

A data augmentation method proposed by Zhun Zhong et al. in [3], which erases a random rectangular region of each training image. This paper sets the erasing probability p = 0.5, the erased-area ratio 0.02 < Se < 0.4, and the aspect-ratio bounds r1 = 0.3, r2 = 3.33.
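These hyper-parameters map directly onto torchvision's built-in RandomErasing, which implements Zhong et al.'s method; it operates on tensors, so it would slot in after Normalize in the earlier pipeline sketch:

```python
from torchvision import transforms

# p: erasing probability, scale: erased-area ratio Se, ratio: aspect-ratio bounds (r1, r2)
random_erasing = transforms.RandomErasing(p=0.5, scale=(0.02, 0.4), ratio=(0.3, 3.33))
```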

Label Smoothing

Proposed in [4], label smoothing softens the one-hot ID labels so the model does not become over-confident on the training identities. This paper sets ε = 0.1.
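A minimal sketch of the smoothed cross entropy; newer PyTorch versions also expose this directly as nn.CrossEntropyLoss(label_smoothing=0.1):

```python
import torch.nn.functional as F

def label_smoothing_ce(logits, targets, eps=0.1):
    """Cross entropy against smoothed targets: the true class gets 1 - eps + eps/N
    and every other class gets eps/N, where N is the number of identities."""
    log_probs = F.log_softmax(logits, dim=1)
    nll = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # true-class term
    uniform = -log_probs.mean(dim=1)                             # uniform-target term
    return ((1 - eps) * nll + eps * uniform).mean()
```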

Last Stride

The stride of the last down-sampling convolution in ResNet50 is changed from 2 to 1, enlarging the output feature map (from 8×4 to 16×8 for a 256×128 input). This adds only a little computation and no extra parameters, yet clearly helps performance.
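With torchvision's ResNet50 this amounts to a two-line change to the first bottleneck block of the last stage; a sketch:

```python
import torchvision

model = torchvision.models.resnet50(pretrained=True)
# set the last down-sampling stride to 1 in both the 3x3 conv and the
# shortcut 1x1 conv, so layer4 no longer halves the spatial resolution
model.layer4[0].conv2.stride = (1, 1)
model.layer4[0].downsample[0].stride = (1, 1)
```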

BNNeck

Most methods that combine the ID loss and the triplet loss use the structure shown in figure (a), where both losses constrain the same feature f.

The authors point out, following earlier studies, that the ID loss essentially learns several hyperplanes that split the feature space into per-class subspaces, so it works better when the features are normalized onto a hypersphere; the triplet loss, by contrast, is better suited to constraining features in free Euclidean space.

The authors therefore propose BNNeck, shown in figure (b). The triplet loss still optimizes the original feature, f_t in the figure, while the ID loss optimizes f_i, the feature obtained by passing f_t through a batch-normalization layer; this normalization makes f_i approximately distributed on the surface of a hypersphere.
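A sketch of the neck as a PyTorch module. The frozen BN bias and the bias-free classifier follow the authors' released code; num_classes = 751 is just an example value (the number of training identities in Market1501):

```python
import torch.nn as nn

class BNNeck(nn.Module):
    """f_t (before BN) feeds the triplet loss; f_i (after BN) feeds the ID loss."""
    def __init__(self, feat_dim=2048, num_classes=751):
        super().__init__()
        self.bn = nn.BatchNorm1d(feat_dim)
        self.bn.bias.requires_grad_(False)   # BN with no learnable shift
        self.classifier = nn.Linear(feat_dim, num_classes, bias=False)

    def forward(self, f_t):
        f_i = self.bn(f_t)                   # normalized feature for the ID loss
        return f_t, f_i, self.classifier(f_i)
```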

Center Loss

Triplet loss has a drawback: it only constrains the relative distances within triplets, not absolute distances. The authors therefore add the center loss, L_C = (1/2) Σ_j ||f_tj − c_yj||², which pulls each feature f_t toward the center c_y of its class.
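A minimal sketch of the center loss with jointly learned centers; the paper weights this term by a small factor (β = 0.0005) when adding it to the total loss, and the batch reduction here is a mean rather than the paper's sum:

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Penalizes the squared distance between each feature f_t and its class center."""
    def __init__(self, num_classes=751, feat_dim=2048):
        super().__init__()
        # one learnable center per identity, trained jointly with the network
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, f_t, targets):
        return 0.5 * (f_t - self.centers[targets]).pow(2).sum(dim=1).mean()
```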

Experimental Results

Influences of Each Trick (Same domain)

Performance gains from each trick. Random erasing and BNNeck bring the largest improvements, roughly 2-3% each. In my own experiments, random erasing also gave a clear boost on image re-ID but did not help on video re-ID; BNNeck is worth trying.

Analysis of BNNeck

After adding BNNeck, the authors evaluate both features f_t and f_i under both Euclidean and cosine distance. The four feature/metric combinations perform comparably, each about 2 points higher than the baseline without BNNeck.

Influences of Each Trick (Cross domain)

Same-domain results can be inflated by over-fitting, which limits how convincing the gains are, so the authors also ran cross-domain experiments.

The results show that warmup, label smoothing, and BNNeck clearly help cross-domain performance, whereas random erasing hurts it: removing it raises performance, presumably because it makes the model fit the source domain too closely.

Comparison of State-of-the-Arts

Compared with the state of the art, the final model reaches rank-1 = 94.5% using only global features.

Influences of the Batch Size

The paper also tests different batch-size settings. Overall, larger batches perform better, but the gains seem to flatten out beyond 64.

Influences of Image Size

The paper also studies the effect of image size and finds that it has essentially no impact on final performance. My own experience is that enlarging the resize target helps while it is still below the original image resolution, but going beyond the original size brings no further gain.

Summary

Overall, this paper is a very good summary of network design and training tricks for the re-ID task, and the proposed BNNeck is worth trying.

References

[1] Bag of Tricks for Image Classification with Convolutional Neural Networks

[2] Revisiting Temporal Modeling for Video-based Person ReID

[3] Random Erasing Data Augmentation

[4] Rethinking the inception architecture for computer vision


The paper's author is also on Zhihu; here is his own write-up of this work:

羅浩.ZJU: 一個更加強力的ReID Baseline? ("A More Powerful ReID Baseline?")
