
Dual Attention Network for Scene Segmentation: Paper Notes

雪花台湾 2019-05-07 02:54

@Jimmy 2019-03-12 15:01:40

I. Basic Information

Title: "Dual Attention Network for Scene Segmentation"

Year: 2019

Venue: CVPR 2019

Field: Semantic segmentation (scene segmentation)

Main links:

homepage: None
arXiv(Paper): https://arxiv.org/abs/1809.02983
github(Official): https://github.com/junfu1115/DANet

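II. Research Background

Problem: To perform scene segmentation effectively, we need to distinguish confusable categories and account for objects with different appearances. For example, "field" and "grass" regions are sometimes hard to tell apart, and cars on the road vary in scale, viewpoint, occlusion, and illumination. Pixel-level recognition therefore needs feature representations with stronger discriminative ability.
Existing solutions: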

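Multi-scale context fusion: PSPNet, etc.
Enlarging the kernel size with a decomposed structure, or introducing an effective encoding layer on top of the network, to capture richer global context information.
Encoder-decoder structures.
Q: What is the drawback of these approaches? A: The methods above can capture objects at different scales, but they do not exploit the relationships between objects, which are also important for scene representation.

Using recurrent neural networks to capture long-range dependencies, e.g., 2D LSTM.
Q: What is the drawback of this approach? A: Its effectiveness depends heavily on the learning outcome of the long-term memorization.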

III. Innovations

3.1 Overview

Key point: The paper captures contextual dependencies with a self-attention mechanism and proposes the Dual Attention Network (DANet) to adaptively integrate local features with their global dependencies. The method adaptively aggregates long-range contextual information, which improves the feature representation for scene segmentation.
Composition: Two types of attention modules are added on top of a conventional dilated FCN. The position attention module selectively aggregates the feature at each position through a weighted sum of the features at all positions, so similar features are related to each other regardless of their distance. The channel attention module selectively emphasizes interdependent channel maps by integrating associated features among all channel maps. The outputs of the two attention modules are summed to obtain the final feature representation, which contributes to more precise segmentation results.
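Contributions:
Proposes the Dual Attention Network (DANet) to capture global feature dependencies in the spatial and channel dimensions.
Proposes a position attention module to learn the spatial interdependencies of features, and a channel attention module to model channel interdependencies.
Achieves state-of-the-art results on three datasets: Cityscapes, PASCAL Context, and COCO Stuff.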
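
The two modules and their summed output can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the official implementation: the projection matrices `w_q`, `w_k`, `w_v` stand in for the 1x1 convolutions, `alpha` and `beta` are plain floats rather than learned scale parameters, and all function names are hypothetical. Batch dimension, normalization layers, and the segmentation head are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def position_attention(feat, w_q, w_k, w_v, alpha):
    """Aggregate each position as a weighted sum over all positions.

    feat: (C, H, W) feature map.
    w_q, w_k: (C_r, C) projections (assumed stand-ins for 1x1 convs).
    w_v: (C, C) projection. alpha: residual scale (assumed plain float).
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)            # (C, N), N = H * W positions
    q = w_q @ x                           # (C_r, N)
    k = w_k @ x                           # (C_r, N)
    v = w_v @ x                           # (C, N)
    attn = softmax(q.T @ k, axis=-1)      # (N, N), row j weights all positions i
    out = v @ attn.T                      # out[:, j] = sum_i attn[j, i] * v[:, i]
    return (alpha * out + x).reshape(c, h, w)


def channel_attention(feat, beta):
    """Emphasize interdependent channel maps via channel-to-channel affinity."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)            # (C, N)
    attn = softmax(x @ x.T, axis=-1)      # (C, C), row j weights all channels i
    out = attn @ x                        # (C, N)
    return (beta * out + x).reshape(c, h, w)


def dual_attention(feat, w_q, w_k, w_v, alpha, beta):
    """Sum the outputs of the two attention branches."""
    return position_attention(feat, w_q, w_k, w_v, alpha) + channel_attention(feat, beta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 4, 5))      # C=8, H=4, W=5
    w_q = rng.standard_normal((2, 8))          # reduced channel count C_r=2 (assumption)
    w_k = rng.standard_normal((2, 8))
    w_v = rng.standard_normal((8, 8))
    out = dual_attention(feat, w_q, w_k, w_v, alpha=0.5, beta=0.5)
    print(out.shape)  # (8, 4, 5)
```

Note that the position attention map has size N x N (N = H x W), so its memory cost grows quadratically with the spatial resolution, while the channel attention map is only C x C.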

3.2 Details

The network architecture is shown in the figure below:

[Figure: network architecture diagram]
