PoseNet, ICCV 2015

From the column: Academic Paper Notes

PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization.

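Abstract: The authors train a neural network that regresses the camera's position and orientation end-to-end from a single image. Unlike SLAM, it needs no additional engineering or graph optimization. Accuracy is about 2 m and 3° outdoors and about 0.5 m and 5° indoors. The network has 23 layers and uses transfer learning from recognition to relocalization: it starts from a model pre-trained for object classification. It is more robust than methods that rely on SIFT keypoint extraction and matching.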
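The accuracy figures above combine a position error in metres and an orientation error in degrees. A minimal Python sketch of how these two errors are commonly computed for one test image follows. It assumes orientations are quaternions stored as (w, x, y, z); the function names are hypothetical and not taken from the paper's code.

```python
import math


def position_error(p_pred, p_true):
    """Euclidean distance between predicted and true camera positions (same units, e.g. metres)."""
    return math.dist(p_pred, p_true)


def orientation_error_deg(q_pred, q_true):
    """Angle in degrees between two orientations given as quaternions (w, x, y, z)."""
    # Normalise both quaternions so that their dot product is a cosine.
    n_pred = math.sqrt(sum(c * c for c in q_pred))
    n_true = math.sqrt(sum(c * c for c in q_true))
    dot = abs(sum(a * b for a, b in zip(q_pred, q_true))) / (n_pred * n_true)
    # q and -q encode the same rotation (hence abs above); clamp against rounding error.
    dot = min(1.0, dot)
    # The rotation angle between the two orientations is 2 * acos(|<q1, q2>|).
    return math.degrees(2.0 * math.acos(dot))
```

Because q and -q describe the same rotation, the absolute value of the dot product is used, so flipping the sign of a predicted quaternion does not change the reported angle.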

Contributions:

1) Uses transfer learning to carry a model over from object recognition to relocalization.

2) Uses structure from motion (SfM) to generate the training labels (camera poses) automatically from image sequences or video, reducing manual annotation work.
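A sketch of how one SfM camera could be turned into a 7-D training label (3-D position plus a 4-D quaternion) follows. It assumes the reconstruction reports a world-to-camera rotation R and translation t with x_cam = R·x_world + t, so the camera centre is c = -Rᵀt. Whether a given pipeline stores R or its transpose, and the (w, x, y, z) quaternion ordering, are assumptions here; the helper names are hypothetical.

```python
import math


def rotmat_to_quat(R):
    """Convert a 3x3 rotation matrix (nested lists) to a unit quaternion (w, x, y, z)."""
    tr = R[0][0] + R[1][1] + R[2][2]
    if tr > 0:
        s = 2.0 * math.sqrt(tr + 1.0)  # s = 4w
        w, x = 0.25 * s, (R[2][1] - R[1][2]) / s
        y, z = (R[0][2] - R[2][0]) / s, (R[1][0] - R[0][1]) / s
    elif R[0][0] > R[1][1] and R[0][0] > R[2][2]:
        s = 2.0 * math.sqrt(1.0 + R[0][0] - R[1][1] - R[2][2])  # s = 4x
        w, x = (R[2][1] - R[1][2]) / s, 0.25 * s
        y, z = (R[0][1] + R[1][0]) / s, (R[0][2] + R[2][0]) / s
    elif R[1][1] > R[2][2]:
        s = 2.0 * math.sqrt(1.0 + R[1][1] - R[0][0] - R[2][2])  # s = 4y
        w, x = (R[0][2] - R[2][0]) / s, (R[0][1] + R[1][0]) / s
        y, z = 0.25 * s, (R[1][2] + R[2][1]) / s
    else:
        s = 2.0 * math.sqrt(1.0 + R[2][2] - R[0][0] - R[1][1])  # s = 4z
        w, x = (R[1][0] - R[0][1]) / s, (R[0][2] + R[2][0]) / s
        y, z = (R[1][2] + R[2][1]) / s, 0.25 * s
    if w < 0:  # q and -q are the same rotation; keep w >= 0 so labels are consistent
        w, x, y, z = -w, -x, -y, -z
    return (w, x, y, z)


def camera_center(R, t):
    """Camera centre in world coordinates, c = -R^T t (assumes x_cam = R @ x_world + t)."""
    return tuple(-sum(R[j][i] * t[j] for j in range(3)) for i in range(3))


def pose_label(R, t):
    """Hypothetical 7-D label [x, y, z, w, qx, qy, qz] for one SfM camera."""
    return list(camera_center(R, t)) + list(rotmat_to_quat(R))
```

For example, the identity rotation with t = (1, 2, 3) gives position (-1, -2, -3) and quaternion (1, 0, 0, 0).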

3) Avoids the components of a traditional SLAM pipeline, such as storing densely spaced keyframes, appearance-based localization, landmark-based pose estimation, and frame-to-frame feature correspondence.

Loss function:

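The authors found experimentally that training position and orientation in two separate networks performed poorly. They attribute this to the coupling between position and orientation, so the two are trained jointly in a single network.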
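In the PoseNet paper, position and orientation are trained jointly with a single loss of the form loss(I) = ||x̂ - x||_2 + β ||q̂ - q/||q|| ||_2, where x is the camera position, q the orientation quaternion, and β a scale factor tuned per scene to balance metres against quaternion units. A minimal per-image sketch in plain Python follows; the function name and the β value used in the example are hypothetical.

```python
import math


def posenet_loss(x_pred, q_pred, x_true, q_true, beta):
    """Joint pose loss for one image: position error plus beta times orientation error.

    x_*: 3-D positions; q_*: quaternions (4 values); beta: scene-dependent weight.
    """
    pos_term = math.dist(x_pred, x_true)
    # Only the label quaternion is normalised to unit length; q_pred is the raw output.
    q_norm = math.sqrt(sum(c * c for c in q_true))
    q_true_unit = [c / q_norm for c in q_true]
    ori_term = math.dist(q_pred, q_true_unit)
    return pos_term + beta * ori_term


# Example with a hypothetical beta: a 1 m position error and a perfect orientation.
loss = posenet_loss((1.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (2.0, 0.0, 0.0, 0.0), beta=500.0)
```

Because both terms feed one scalar, a single network receives gradients from position and orientation at the same time, which is the joint training described above; β controls how strongly orientation errors dominate.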

Network architecture:

GoogLeNet, pre-trained for classification, is modified to solve a regression problem.

1) Replace the three softmax classifiers with affine regressors, each outputting a 7-D pose vector: 3 values for position and 4 for the orientation quaternion.

2) Insert an additional fully connected layer (feature size 2048) before the final regressor; its output serves as a localization feature vector.
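The two modifications can be sketched as a toy regression head in plain Python. Pooled backbone features pass through an extra fully connected layer (the feature vector), then an affine regressor outputs 7 numbers that are split into a 3-D position and a 4-D quaternion, and the quaternion is normalised to unit length before use. The weights are random placeholders, the class and helper names are hypothetical, layer sizes are shrunk in the example, and nonlinearities and dropout are omitted; this is not the authors' implementation. GoogLeNet has three classifiers (two of them auxiliary), so the same head pattern would be repeated at each.

```python
import math
import random


def affine(W, b, v):
    """Affine map y = W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def random_layer(n_out, n_in, rng):
    """Hypothetical random weights; a real model would load trained parameters."""
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


class PoseHead:
    """Toy head: pooled features -> extra FC feature layer -> affine 7-D pose regressor."""

    def __init__(self, in_dim, feat_dim, seed=0):
        rng = random.Random(seed)
        self.fc = random_layer(feat_dim, in_dim, rng)  # inserted FC layer (2048 wide in the paper)
        self.reg = random_layer(7, feat_dim, rng)      # affine regressor replacing the softmax

    def __call__(self, pooled):
        feat = affine(*self.fc, pooled)                # localization feature vector
        out = affine(*self.reg, feat)
        position, quat = out[:3], out[3:]
        n = math.sqrt(sum(c * c for c in quat))
        quat = [c / n for c in quat]                   # unit-length orientation quaternion
        return position, quat, feat


head = PoseHead(in_dim=32, feat_dim=16, seed=1)
position, quat, feat = head([0.5] * 32)
```

With a real backbone, in_dim would be the width of the pooled feature and feat_dim the width of the inserted layer; the sketch only shows how the data flows from features to a 7-D pose.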