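How would you assess the CVPR 2020 paper acceptance results? Which papers stand out?

A bit excited here: we submitted 11 papers (work the whole group painstakingly built up over more than half a year).

As soon as the list came out I checked them one by one, and 7 were accepted. Thanks to everyone for the hard work, and to our colleagues for their support!

Here are the titles first: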

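GhostNet: More Features from Cheap Operations (an architecture that surpasses MobileNet v3)
AdderNet: Do We Really Need Multiplications in Deep Learning? (adder neural networks)
Frequency Domain Compact 3D Convolutional Neural Networks (3D CNN compression)
A Semi-Supervised Assessor of Neural Architectures (an accuracy predictor for neural networks, NAS)
Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection (NAS for detection; one remark on this one: the backbone, neck, and head are searched together, a trinity, haha)
CARS: Continuous Evolution for Efficient Neural Architecture Search (continuously evolving NAS; efficient, combines the advantages of differentiable and evolutionary search, and can output a Pareto front)
On Positive-Unlabeled Classification in GAN (PU + GAN; this title is pretty self-explanatory, haha)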

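Now the scores, for reference (I won't map them to individual papers). The 11 submissions scored:

123, 124, 224, 234, 234, 234, 234, 234, 2334, 144, 334

As you can see, most of our submissions were borderline. Now for their final results (updated Feb 27):

122, 224, 234, 222, 224, 234, 224, 444, 2323, 244, 224

In the original post the accepted ones are shown in bold. Amazing, isn't it? Surprised? (Comparing the two rows shows how the scores moved: some went to all 2s, some to all 4s, and the ACs gave detailed comments in every case.) A 334 got in while a 224 did not. I'll refresh the final scores once the CMT system updates, and I'm hoping for an oral~

Let us recommend a few of our own papers that are worth a look: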

1. GhostNet: More Features from Cheap Operations. Using a rather clever structure, we built a lightweight neural network that surpasses MobileNet v3. Paper:

https://arxiv.org/pdf/1911.11907

We have also released the model, so give it a run. Its performance on ARM CPUs is striking:

https://github.com/iamhankai/ghostnet

We beat other SOTA lightweight CNNs such as MobileNetV3 and FBNet.

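2. AdderNet: Do We Really Need Multiplications in Deep Learning? This work drew quite a bit of attention earlier, and it really is a fun one. It started as an idea I had a few years ago; after several twists and turns we found a reliable implementation and training recipe, and it performs very well on large-scale networks and datasets. The paper is available for an early look: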

https://arxiv.org/pdf/1912.13200
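For anyone wondering what "no multiplications" can look like in practice, here is a minimal sketch. It assumes, following the linked paper, that a filter's response is the negative l1 distance between the filter and the input patch instead of their dot product. The function names and the 1-D toy setting are made up for illustration; this is not the released implementation.

```python
def adder_response(patch, kernel):
    """Negative l1 distance between an input patch and a filter.

    Only subtraction, absolute value, and addition are used; a standard
    convolution would compute sum(p * k) here instead.
    """
    return -sum(abs(p - k) for p, k in zip(patch, kernel))


def adder_conv1d(signal, kernel):
    """Slide the filter over a 1-D signal (stride 1, no padding)."""
    width = len(kernel)
    return [
        adder_response(signal[i:i + width], kernel)
        for i in range(len(signal) - width + 1)
    ]


# Toy input: the response is largest (zero) where the patch equals the filter.
responses = adder_conv1d([1.0, 2.0, 3.0, 2.0], [1.0, 2.0])
```

The response is never positive and peaks at zero when patch and filter match exactly, which is one reason such layers need their own training recipe rather than reusing the usual one unchanged.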

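The link below is the earlier Reddit thread where it was widely discussed; the thread also gave us a lot of inspiration for future research directions.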

https://www.reddit.com/r/MachineLearning/comments/ekw2s1/r_addernet_do_we_really_need_multiplications_in/

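I know what people probably care about most is the open-source release. That requires approval; we applied more than a month ago and it should come through within a week. You can follow the other thread below, where I'll post an answer once the code is out (some folks have been cueing me for a while now...):

How would you assess AdderNet, the paper widely discussed on Reddit?

Open-source code (updated Mar 16: AdderNet got an Oral):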

https://github.com/huawei-noah/AdderNet

3. CARS: Continuous Evolution for Efficient Neural Architecture Search:

https://arxiv.org/pdf/1909.04977

Open-source code (to be uploaded soon):

https://github.com/huawei-noah/CARS

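The other papers are not on arXiv or open-sourced yet; we'll get that done as soon as we can!

Finally, please keep an eye on Noah's Ark Lab! Thanks for your support!

I'd like to recommend a paper from Tencent Youtu Lab that has been accepted to CVPR 2020. My high-school classmate and I are co-first authors (a friendship spanning many years, haha).

Paper: Filter Grafting for Deep Neural Networks

Code: https://github.com/fxmeng/filter-grafting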

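We know that a trained neural network contains many invalid filters (those with a very small l1 norm), and filter pruning removes these invalid (unimportant) filters to speed up the network's forward inference. We wondered: what if, instead of removing these filters, we reactivated them? Would that increase the network's representational capacity and so improve model performance? We therefore studied several ways of reactivating these filters. Of everything we tried, introducing external information to reactivate the filters worked best. We call this approach filter grafting. Put simply, it grafts the information (weights) of valid filters from other networks onto the invalid filters of the self-network. Multiple networks graft onto one another so that they improve together, as illustrated by the figure in the original post.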

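Note that grafting does not change the network structure: we simply add the weights of the other network's valid filters to the self-network in a certain proportion, and that proportion is determined by an adaptive function we designed. At test time only a single grafted network is used. The original post follows this with a table of results for two networks of identical structure grafting onto each other.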
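To make the weighted-sum idea concrete, here is a rough sketch of grafting one layer of another network onto the same layer of the self-network. Everything in it is an assumption for illustration: the layer is a plain list of flattened filters, and `adaptive_ratio` is a hypothetical stand-in based on l1 mass, not the adaptive function from the paper (see the paper and the repo for that).

```python
import math


def l1_norm(filt):
    """l1 norm of one flattened filter."""
    return sum(abs(w) for w in filt)


def layer_l1(layer):
    """Total l1 mass of a layer given as a list of flattened filters."""
    return sum(l1_norm(f) for f in layer)


def adaptive_ratio(self_layer, other_layer, scale=1.0):
    """Hypothetical mixing ratio in (0, 1).

    Stand-in only: the layer carrying more l1 mass gets the larger share.
    """
    diff = layer_l1(self_layer) - layer_l1(other_layer)
    return 0.5 + math.atan(scale * diff) / math.pi


def graft_layer(self_layer, other_layer, scale=1.0):
    """Weighted sum of two layers with identical shapes.

    The network structure is unchanged; only the weight values move.
    """
    alpha = adaptive_ratio(self_layer, other_layer, scale)
    return [
        [alpha * ws + (1.0 - alpha) * wo for ws, wo in zip(fs, fo)]
        for fs, fo in zip(self_layer, other_layer)
    ]


# Toy layers: the self-network's second filter is dead (all zeros).
self_layer = [[1.0, -1.0], [0.0, 0.0]]
other_layer = [[0.0, 0.0], [2.0, -2.0]]
grafted = graft_layer(self_layer, other_layer)
```

After grafting, the previously dead second filter carries non-zero weights taken from the other network, while the layer keeps the same number and shape of filters.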

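We also found that increasing the number of networks taking part in grafting further improves model performance (figure in the original post).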

Experiments on the grafted networks show that grafting does reduce the number of invalid filters (figure in the original post).
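If you want to run this kind of check on your own checkpoints, counting invalid filters comes down to thresholding the l1 norm. The sketch below assumes a layer is a list of flattened filters; the threshold value and the toy weights are invented for illustration and are not numbers from the paper.

```python
def count_invalid_filters(layer, threshold=1e-3):
    """Count filters whose l1 norm falls below the threshold."""
    return sum(
        1 for filt in layer
        if sum(abs(w) for w in filt) < threshold
    )


# Toy weights only: the same three filters before and after grafting.
before = [[0.5, -0.2], [0.0, 1e-4], [1e-5, 0.0]]
after = [[0.4, -0.1], [0.2, 0.1], [1e-5, 0.0]]

invalid_before = count_invalid_filters(before)
invalid_after = count_invalid_filters(after)
```

Running the same count per layer before and after grafting gives a quick view of how many filters were reactivated.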

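Implementation details can be found in the paper and the code. We hope grafting draws more researchers' attention to designing filter-specific training for neural networks. In fact, after submitting this paper we noticed a very interesting connection between grafting and a subfield of learning. We have written up the related experiments and analysis as a paper and submitted it to another conference; wish us luck. Special thanks to co-authors 珂珂, 大師, 紀老師 and others for their revisions and help. I hope the pandemic ends soon; I'm going stir-crazy at home...

Finally, my CV is attached. Come intern with us at Youtu in Shanghai~

SGAS (CVPR 2020)

Update:

Please also take a look at SGAS (SGAS: Sequential Greedy Architecture Search), our joint work with Intel ISL. Its greedy search strategy alleviates the problem in NAS that model rankings during search are inconsistent with those at final evaluation. It is a better and faster architecture search algorithm, and it supports searching both CNNs and GCNs. The code is open source, so anyone who wants to run architecture search on images, point clouds, or biological graph data can give it a try. Questions are welcome; contact me or @登高居士.

Project: https://www.deepgcns.org/auto/sgas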

Paper: https://arxiv.org/abs/1912.00195

Github: https://github.com/lightaime/sgas

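Update, Apr 15:

This was my first successful rebuttal. I don't have much in the way of general lessons, so I'll just post the rebuttal itself. Much of it may not apply to your situation; it is for reference only.

(To keep it general, I have redacted paper-specific terms; R1, R2, and R3 stand for Reviewers 1, 2, and 3.)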

We thank the reviewers for their valuable feedback, especially the comments on our novelty, extending XXX to XXX, and reporting results across multiple runs. We believe that reporting results in such a way is crucial for evaluating proposed search methods.

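(Although the initial review scores were not high, the reviewers did acknowledge some points, so the first sentence restates what they liked.)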

Common Response: Hyper-parameters. New ablation studies are not allowed (PAMI-TC policy). We add a discussion to explain choices of parameters here. More ablation studies will be added to the final version. Three parameters are introduced: (1) XXX (2) XXX (3) XXX

(1) Since XXX, choosing XXX leads to stable results. We simply set the XXX to X.

(2) For XXX experiments, the XXX is chosen to be X. Since XXX. For a fair comparison to our baseline XXX, we want the XXX to last up to X, which is the length of X. For XXX architectures, in order to XXX, we XXX. Thus, XXX. Similarly, to keep XXX, we set the XXX to be X.

(3) The XXX is always set as X, which is simply chosen to be slightly smaller than the X.

(If a paper introduces hyper-parameters, reviewers will always ask how they were chosen; here the Common Response gives the reasons behind the choices.)

R1 Code availability We follow the best practices for scientific research on XXX suggested in XXX, i.e. we will release all the code (search and train), the pre-trained models, and all details needed to reproduce our results once the paper is published. We hope to benefit both XXX and XXX research communities.

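(R1 gave a Borderline score and mainly asked about parameter choices and whether the code would be released; here we promise to release the code once the paper is accepted.)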

R2) Incomplete related work We will add this paper.

(To save space, we reply briefly to papers the reviewer suggested citing.)

R2) Hand-wavy claim We strongly disagree with the reviewer on this point for the following reasons.

(R2 gave a Strong Reject, mainly questioning whether the premise of our paper holds. We answer in four points: first, restate the claim made in the paper; second, cite experiments from the paper to show that the claim holds; third, cite recent related work as further support; fourth, argue that the proposed method is necessary.)

(i) We did not make this claim for XXX specifically. To clarify, we claim that XXX is a universal problem in XXX methods that need to XXX or use XXX (see LXXX, LXXX).

(ii) In the paper, we show that XXX may end up XXX (see Figure X). This empirically validates the XXX problem.

(iii) Our claim is further validated by other work, e.g. the two recent XXX papers cite{XXX, XXX}, which studied XXX. In fact, experiments in cite{XXX} show that XXX methods with a XXX mechanism result in XXX, and perform better when XXX is not used (we also mentioned this in LXXX). In cite{XXX}, the XXX is found to degrade the XXX leading to worse performance.

(iv) The reviewer suggests that XXX to resolve these issues. This option is most often not computationally viable and the reason why XXX are used instead. Even if we use XXX directly, the XXX in XXX remains an issue that causes XXX. Therefore, our claim is a widely accepted open problem. In this paper, we propose an effective solution to this problem.

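R2) XXX metric The reviewer suggests that using XXX as the metric to XXX is incorrect and recommends using the XXX metric.

(Here the reviewer questions whether an evaluation metric is used correctly and suggests a different one. We first point out that our metric is commonly used, then show that the paper's conclusions are unchanged even under the suggested metric.)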

(i) Using the XXX metric on XXX to evaluate XXX methods has been used before (e.g. in cite{XXX,XXX} mentioned above).

(ii) We find that there is minimal difference between using XXX and XXX in the XXX metric in our experiments. In fact, if we use XXX rather than XXX to measure the XXX for XXX, the XXX changes only slightly, from X to X. More importantly, our method still outperforms X by the same amount. Therefore, we prefer to use X, since it is commonly used and better aligns with the final goal of learning.

R2) Not an impressive result Unlike previous works, the average result we report is obtained through multiple runs. Although this tends to reduce the overall performance gain, we believe this is a better scientific way to report results (it was appreciated by the other reviewers). It is worth mentioning that the best performance increase we have on XXX is X against our baseline XXX. As for the introduced hyper-parameters, please refer to the common response. We do not tweak them for each dataset. Besides, other SOTA methods also introduce extra hyper-parameters. XXX introduces at least X extra hyper-parameters (X, X, X, X).

(The reviewer says here that we introduced extra hyper-parameters yet the results are not very impressive. We answer that our numbers are averages over multiple runs, and that other accepted related work also introduces many hyper-parameters.)

Based on the clarification above, we do not see any major point to justify this strong reject decision.

(Finally, a summary of the response to R2.)

R3 More ablation studies Thanks for your advice. We will add the ablation. XXX and XXX are combined into XXX, since the algorithm will be agnostic to XXX, if we only consider XXX. In this case, XXX be selected with a sub-optimal operation at early epochs. On the other hand, we need to XXX for a fair comparison with XXX. Only considering XXX may fail to XXX, since XXX may XXX.

(R3 gave a Weak Accept and mainly wanted to see more ablations; here we promise to provide the experiments and give a suitable explanation.)

————————————————————————————————————

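Update: the meta-reviews are out, and the scores went from 235 to 223. Many thanks to the reviewers for their conscientious work in the final discussion phase! This is the first time I have buried the hatchet with reviewers.

Image credit: @魏秀參. Thanks for sharing your rebuttal experience.

————

Against all odds, a 235 got accepted. I have come to appreciate deeply how important the rebuttal is. I'll share my rebuttal experience when I have time.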

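Congratulations to the team on having four papers accepted to CVPR 2020. What is especially gratifying is that AET, which I have been promoting lately, was successfully applied to unsupervised self-training of graph models and GANs, with breakthrough progress. With this, our AET (Auto-Encoding Transformations) has become a complete line of systematic work covering image classification, object detection, graph models, and GANs, backed by a full set of explanations and theory ranging from information theory to Lie algebras. We will later develop a complete toolkit to make it easy for everyone to use and study.

Below I mainly introduce the breakthroughs of the AET model in unsupervised training of graph models and GANs.

Graph models: GraphTER: Unsupervised learning of Graph Transformation Equivariant Representations via Auto-Encoding Node-wise Transformations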

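The method applies global or local transformations and perturbations to the nodes of a graph, and trains the GNN in a self-supervised way by predicting these node-wise transformations. The learned features can be node-wise features or features of the whole graph. The idea is that good graph features should encode the graph's connectivity and topology well, so that the various transformations acting on that topology can be recovered from them. Although this paper studies the graphs built from 3D point clouds, the self-supervised graph network training method is general and can be applied to many other graph tasks.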
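As a rough illustration of the data side of this idea, the sketch below samples a subset of points in a toy point cloud, applies a random translation to each sampled point, and keeps the per-node shifts as the prediction targets. The function names, the sampling rate, and the use of translations only are assumptions made for this example; the paper's own transformation families and network are not reproduced here.

```python
import random


def nodewise_translate(points, sample_rate=0.5, max_shift=0.2, seed=0):
    """Apply a random translation to a sampled subset of nodes.

    Returns the transformed points and the per-node shifts; unsampled
    nodes get a zero shift. The shifts are the self-supervised targets.
    """
    rng = random.Random(seed)
    n = len(points)
    k = max(1, int(n * sample_rate))
    chosen = set(rng.sample(range(n), k))
    transformed, targets = [], []
    for i, (x, y, z) in enumerate(points):
        if i in chosen:
            shift = tuple(rng.uniform(-max_shift, max_shift) for _ in range(3))
        else:
            shift = (0.0, 0.0, 0.0)
        transformed.append((x + shift[0], y + shift[1], z + shift[2]))
        targets.append(shift)
    return transformed, targets


def nodewise_mse(predicted, targets):
    """Mean squared error between predicted and true per-node shifts."""
    total = sum(
        (p - t) ** 2
        for pred, true in zip(predicted, targets)
        for p, t in zip(pred, true)
    )
    return total / (3 * len(targets))


cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
moved, shifts = nodewise_translate(cloud)
```

In training, a network would see features of both `cloud` and `moved` and be asked to output `shifts`; `nodewise_mse` is the kind of loss that prediction could be scored with.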

齊國君: [GraphTER] Unsupervised Learning of Transformation-Equivariant Features for Graph Neural Networks

GAN models: Transformation GAN for Unsupervised Image Synthesis and Representation Learning

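In this paper we use the AET idea to train GANs. The AET loss serves as a regularization term for better training of the GAN discriminator. It is well known that the discriminator in a GAN overfits very easily; once various new transformations are added, the discriminator can better perceive the differences between real and fake samples under different transformations, which in turn trains a better generator. Traditional data augmentation has to assume that transformed samples remain highly realistic. But strong transformations often introduce distortions, so that a real image becomes warped and no longer looks real. With the AET loss, we no longer feed transformed images directly to the discriminator as positive examples; we only regularize the discriminator's training by predicting the transformation itself. This allows a wider range of transformations and therefore better performance.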
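Schematically, the regularization described here adds a transformation-prediction term to the discriminator's usual objective. The sketch below assumes the transformation is described by a small parameter vector and that squared error is used to score the prediction; the function names and the `weight` parameter are illustrative, not taken from the paper.

```python
def transformation_mse(pred_params, true_params):
    """Squared error between predicted and true transformation parameters."""
    return sum(
        (p - t) ** 2 for p, t in zip(pred_params, true_params)
    ) / len(true_params)


def regularized_d_loss(adversarial_loss, pred_params, true_params, weight=1.0):
    """Discriminator loss plus an AET-style regularizer.

    The adversarial term is computed on untransformed real/fake images
    elsewhere; transformed images enter only through the prediction term,
    so they are never treated as positive examples.
    """
    return adversarial_loss + weight * transformation_mse(pred_params, true_params)


# A perfect transformation prediction leaves the adversarial loss unchanged.
loss_perfect = regularized_d_loss(0.7, [0.1, 0.2], [0.1, 0.2])
# A wrong prediction adds a penalty scaled by the weight.
loss_wrong = regularized_d_loss(0.7, [1.0, 0.0], [0.0, 0.0], weight=0.5)
```

Because a distorted image only has to reveal which transformation was applied, not pass as a realistic sample, stronger transformations can be used than plain data augmentation would tolerate.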

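Here is the original AET paper: AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transformations rather than Data

And the journal version: Learning Generalized Transformation Equivariant Representations via Autoencoding Transformations (this version contains more results)

If you would like to know more about AET, see another answer of mine:

How would you assess Kaiming He's Momentum Contrast for Unsupervised?

I also plan a series of eight articles that systematically introduces the important role played, at every level from unsupervised to semi-supervised to fully supervised learning, by models that study transformation symmetry, with AET as a representative. If you are interested, bookmark and follow the series; the first installment has just been posted.

齊國君: Symmetry in Artificial Intelligence: A History from Transformations to Symmetry (1) Preface: What Is Symmetry?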

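Simply put, AET is a general method and architecture for self-supervised learning that works by auto-encoding the transformations themselves. In several recent unsupervised or self-supervised methods, we have noticed the central role played by transformations, including SimCLR, the new method released by Hinton himself. Methods based on a contrastive loss still use transformations indirectly, to obtain multiple copies of a single sample, whereas our AET is a more direct method that achieves unsupervised learning by predicting the transformation itself.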
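Written as a formula, the idea can be sketched as follows. The notation is my own shorthand for this post: $E$ is the encoder, $D$ the decoder that outputs a predicted transformation, $t$ a transformation sampled from a family $\mathcal{T}$, and $\ell$ some loss measuring the gap between the true and the predicted transformation.

```latex
\min_{E,\,D}\;
\mathbb{E}_{x \sim \mathcal{X},\; t \sim \mathcal{T}}\;
\ell\Bigl(t,\; D\bigl(E(x),\, E(t(x))\bigr)\Bigr)
```

The decoder sees only the two encoded views, so the loss can be small only if $E$ keeps enough information about how $x$ changed under $t$. A contrastive method would instead use $t$ just to produce the second view and never ask the model to recover $t$.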

On object detection we can already beat models trained with full supervision. Next, we will gradually release more results and code on my team's GitHub page; stay tuned:

https://github.com/maple-research-lab

Also, feel free to follow my Zhihu account:

齊國君