Sharing a paper I read recently: Bag of Tricks and A Strong Baseline for Deep Person Re-identification. It is an excellent summary of the training tricks used in person re-ID, and it proposes a strong baseline that reaches rank-1 = 94.5% on the Market1501 dataset.

Paper: arxiv.org/pdf/1903.0707

Code: github.com/michuanhaoha

Authors: Hao Luo¹, Youzhi Gu¹, Xingyu Liao², Shenqi Lai³, Wei Jiang¹ (¹Zhejiang University, ²Chinese Academy of Sciences, ³Xi'an Jiaotong University)

Motivation

[1] We surveyed many works published at top conferences and found most of them were built on weak baselines.

They surveyed the current best-performing person re-ID methods and found that most of them start from fairly weak baselines.

[2] For the academia, we hope to provide a strong baseline for researchers to achieve higher accuracies in person ReID.

For academia, they hope to provide a strong baseline.

[3] For the community, we hope to give reviewers a reference for which tricks affect the performance of a ReID model. We suggest that reviewers take these tricks into account when comparing the performance of different methods.

For the reviewing community, they want to raise awareness of how much these tricks matter.

[4] For the industry, we hope to provide some effective tricks to acquire better models without too much extra consumption.

For industry, they hope to offer a simple yet effective model.

Contribution

[1] We collect some effective training tricks for person ReID. Among them, we design a new neck structure named BNNeck. In addition, we evaluate the improvements from each trick on two widely used datasets.

The paper collects effective training tricks for the person re-ID task and proposes a new structure, BNNeck.

[2] We provide a strong ReID baseline, which achieves 94.5% rank-1 accuracy and 85.9% mAP on Market1501. It is worth mentioning that the results are obtained with global features provided by a ResNet50 backbone. To the best of our knowledge, it is the best performance acquired by global features in person ReID.

They provide a strong baseline that reaches rank-1 = 94.5% and mAP = 85.9% on Market1501.

[3] As a supplement, we evaluate the influences of the image size and the batch size on the performance of ReID models.

Experiments examine how image size and batch size affect performance.

Standard Baseline

[1] We initialize the ResNet50 with pre-trained parameters on ImageNet and change the dimension of the fully connected layer to N. N denotes the number of identities in the training dataset.

ResNet50 pre-trained on ImageNet is used as the backbone.

[2] We randomly sample P identities and K images per person to constitute a training batch. Finally the batch size equals B = P×K. In this paper, we set P = 16 and K = 4.

So that the triplet loss can be computed, each batch contains 16 identities with 4 images per identity, as sketched below.
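A minimal sketch of this P×K (identity-balanced) sampling. Here `labels` is assumed to be the list of person IDs for the whole training set; identities with fewer than K images are simply skipped, whereas real implementations typically sample them with replacement instead:

```python
import random
from collections import defaultdict

def sample_pk_batch(labels, P=16, K=4):
    """Pick P identities, then K image indices per identity (B = P*K = 64)."""
    by_id = defaultdict(list)
    for idx, pid in enumerate(labels):
        by_id[pid].append(idx)
    eligible = [pid for pid, idxs in by_id.items() if len(idxs) >= K]
    batch = []
    for pid in random.sample(eligible, P):          # P distinct identities
        batch.extend(random.sample(by_id[pid], K))  # K images each
    return batch  # dataset indices forming one training batch
```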

[3] We resize each image into 256 × 128 pixels and pad the resized image 10 pixels with zero values. Then randomly crop it into a 256 × 128 rectangular image.

Pre-processing uses resizing and random cropping.

[4] Each image is flipped horizontally with 0.5 probability.

Pre-processing also applies random horizontal flipping.

[5] Each image is decoded into 32-bit floating point raw pixel values in [0, 1]. Then we normalize RGB channels by subtracting 0.485, 0.456, 0.406 and dividing by 0.229, 0.224, 0.225, respectively.

Finally, pre-processing normalizes each channel so that pixel values are roughly zero-mean with unit variance.
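Steps [3]-[5] map directly onto standard torchvision transforms; a minimal sketch of the pipeline (the random erasing trick discussed later would be appended after Normalize):

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((256, 128)),                   # [3] resize
    transforms.Pad(10),                              # [3] zero-pad 10 pixels on each side
    transforms.RandomCrop((256, 128)),               # [3] random crop back to 256x128
    transforms.RandomHorizontalFlip(p=0.5),          # [4] flip with probability 0.5
    transforms.ToTensor(),                           # [5] float pixels in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406], # [5] ImageNet channel statistics
                         std=[0.229, 0.224, 0.225]),
])
```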

[6] The model outputs ReID features f and ID prediction logits p.

The model outputs the feature f and the ID prediction logits p.

[7] ReID features f are used to calculate the triplet loss. ID prediction logits p are used to calculate the cross-entropy loss. The margin m of the triplet loss is set to 0.3.

The feature f feeds the triplet loss and the logits p feed the cross-entropy loss.
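A sketch of the combined objective. The paper actually mines hard triplets within each P×K batch; here PyTorch's nn.TripletMarginLoss over pre-formed (anchor, positive, negative) tuples stands in as a simplification:

```python
import torch.nn as nn

id_loss_fn = nn.CrossEntropyLoss()              # cross entropy on the ID logits p
tri_loss_fn = nn.TripletMarginLoss(margin=0.3)  # triplet loss on the feature f, m = 0.3

def baseline_loss(logits, anchor, positive, negative, targets):
    # total loss = ID loss + triplet loss, weighted equally
    return id_loss_fn(logits, targets) + tri_loss_fn(anchor, positive, negative)
```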

[8] The Adam method is adopted to optimize the model. The initial learning rate is set to 0.00035 and is decreased by a factor of 0.1 at the 40th and 70th epochs. There are 120 training epochs in total.

The optimizer is Adam; another article summarizing video-based re-ID [2] also uses Adam.
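This setup translates to the following sketch, where `model` and `train_one_epoch` are placeholders for the baseline network and training loop:

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=3.5e-4)
# multiply the learning rate by 0.1 at epoch 40 and again at epoch 70
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[40, 70], gamma=0.1)

for epoch in range(120):                 # 120 training epochs in total
    train_one_epoch(model, optimizer)    # hypothetical training-loop helper
    scheduler.step()
```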

Training Tricks

Warmup Learning Rate

As the figure in the paper shows, the learning rate ramps up gradually over the first few epochs. I have also seen this technique mentioned in other work [1].
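A sketch of the resulting schedule as a function of epoch; the 10-epoch linear warmup length is the paper's setting and an assumption here, since the figure is not reproduced:

```python
def lr_at_epoch(epoch, base_lr=3.5e-4, warmup_epochs=10):
    """Linear warmup over the first epochs, then the step decays from above."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs  # ramp up gradually
    if epoch < 40:
        return base_lr
    if epoch < 70:
        return base_lr * 0.1
    return base_lr * 0.01
```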

Random Erasing Augmentation

A data augmentation method proposed by Zhun Zhong et al. in [3], which erases a random rectangular region of each training image. This paper sets the erasing probability p = 0.5, the erased-area ratio 0.02 < Se < 0.4, and the aspect-ratio bounds r1 = 0.3, r2 = 3.33.
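These hyper-parameters map directly onto torchvision's built-in RandomErasing, which implements Zhong et al.'s method; it operates on tensors, so it would slot in after Normalize in the earlier pipeline sketch:

```python
from torchvision import transforms

# p: erasing probability, scale: erased-area ratio Se, ratio: aspect-ratio bounds (r1, r2)
random_erasing = transforms.RandomErasing(p=0.5, scale=(0.02, 0.4), ratio=(0.3, 3.33))
```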

Label Smoothing

Proposed in [4], label smoothing softens the one-hot ID labels so the model does not become over-confident on the training identities. This paper sets ε = 0.1.
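A minimal sketch of the smoothed cross entropy; newer PyTorch versions also expose this directly as nn.CrossEntropyLoss(label_smoothing=0.1):

```python
import torch.nn.functional as F

def label_smoothing_ce(logits, targets, eps=0.1):
    """Cross entropy against smoothed targets: the true class gets 1 - eps + eps/N
    and every other class gets eps/N, where N is the number of identities."""
    log_probs = F.log_softmax(logits, dim=1)
    nll = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # true-class term
    uniform = -log_probs.mean(dim=1)                             # uniform-target term
    return ((1 - eps) * nll + eps * uniform).mean()
```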

Last Stride

The stride of the last down-sampling convolution in ResNet50 is changed from 2 to 1, enlarging the output feature map (from 8×4 to 16×8 for a 256×128 input). This adds only a little computation and no extra parameters, yet clearly helps performance.
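With torchvision's ResNet50 this amounts to a two-line change to the first bottleneck block of the last stage; a sketch:

```python
import torchvision

model = torchvision.models.resnet50(pretrained=True)
# set the last down-sampling stride to 1 in both the 3x3 conv and the
# shortcut 1x1 conv, so layer4 no longer halves the spatial resolution
model.layer4[0].conv2.stride = (1, 1)
model.layer4[0].downsample[0].stride = (1, 1)
```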

BNNeck

Most methods that combine the ID loss and the triplet loss use the structure shown in figure (a), where both losses constrain the same feature f.

The authors point out, following earlier studies, that the ID loss essentially learns several hyperplanes that split the feature space into per-class subspaces, so it works better when the features are normalized onto a hypersphere; the triplet loss, by contrast, is better suited to constraining features in free Euclidean space.

The authors therefore propose BNNeck, shown in figure (b). The triplet loss still optimizes the original feature, f_t in the figure, while the ID loss optimizes f_i, the feature obtained by passing f_t through a batch-normalization layer; this normalization makes f_i approximately distributed on the surface of a hypersphere.
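A sketch of the neck as a PyTorch module. The frozen BN bias and the bias-free classifier follow the authors' released code; num_classes = 751 is just an example value (the number of training identities in Market1501):

```python
import torch.nn as nn

class BNNeck(nn.Module):
    """f_t (before BN) feeds the triplet loss; f_i (after BN) feeds the ID loss."""
    def __init__(self, feat_dim=2048, num_classes=751):
        super().__init__()
        self.bn = nn.BatchNorm1d(feat_dim)
        self.bn.bias.requires_grad_(False)   # BN with no learnable shift
        self.classifier = nn.Linear(feat_dim, num_classes, bias=False)

    def forward(self, f_t):
        f_i = self.bn(f_t)                   # normalized feature for the ID loss
        return f_t, f_i, self.classifier(f_i)
```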

Center Loss

Triplet loss has a drawback: it only constrains the relative distances within triplets, not absolute distances. The authors therefore add the center loss, L_C = (1/2) Σ_j ||f_tj − c_yj||², which pulls each feature f_t toward the center c_y of its class.
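A minimal sketch of the center loss with jointly learned centers; the paper weights this term by a small factor (β = 0.0005) when adding it to the total loss, and the batch reduction here is a mean rather than the paper's sum:

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Penalizes the squared distance between each feature f_t and its class center."""
    def __init__(self, num_classes=751, feat_dim=2048):
        super().__init__()
        # one learnable center per identity, trained jointly with the network
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, f_t, targets):
        return 0.5 * (f_t - self.centers[targets]).pow(2).sum(dim=1).mean()
```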

Experimental Results

Influences of Each Trick (Same domain)

Performance gains from each trick. Random erasing and BNNeck bring the largest improvements, roughly 2-3% each. In my own experiments, random erasing also gave a clear boost on image re-ID but did not help on video re-ID; BNNeck is worth trying.

Analysis of BNNeck

After adding BNNeck, the authors evaluate both features f_t and f_i under both Euclidean and cosine distance. The four feature/metric combinations perform comparably, each about 2 points higher than the baseline without BNNeck.

Influences of Each Trick (Cross domain)

Same-domain results can be inflated by over-fitting, which limits how convincing the gains are, so the authors also ran cross-domain experiments.

The results show that warmup, label smoothing, and BNNeck clearly help cross-domain performance, whereas random erasing hurts it: removing it raises performance, presumably because it makes the model fit the source domain too closely.

Comparison of State-of-the-Arts

Compared with the state of the art, the final model reaches rank-1 = 94.5% using only global features.

Influences of the Batch Size

The paper also tests different batch-size settings. Overall, larger batches perform better, but the gains seem to flatten out beyond 64.

Influences of Image Size

The paper also studies the effect of image size and finds that it has essentially no impact on final performance. My own experience is that enlarging the resize target helps while it is still below the original image resolution, but going beyond the original size brings no further gain.

Summary

Overall, this paper is a very good summary of network design and training tricks for the re-ID task, and the proposed BNNeck is worth trying.

References

[1] Bag of Tricks for Image Classification with Convolutional Neural Networks

[2] Revisiting Temporal Modeling for Video-based Person ReID

[3] Random Erasing Data Augmentation

[4] Rethinking the inception architecture for computer vision


The paper's author is also on Zhihu; here is his own write-up of this work:

羅浩.ZJU: 一個更加強力的ReID Baseline? ("A More Powerful ReID Baseline?")
