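This article summarizes the DeepLab series, focusing on the model design and how each version improves on the previous one, with PyTorch implementation code. Training and performance details are omitted; they can all be found in the original papers.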

Original papers:

DeepLabv1: arxiv.org/pdf/1412.7062

DeepLabv2: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

DeepLabv3: Rethinking Atrous Convolution for Semantic Image Segmentation

DeepLabv3+: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

DeepLabv1

The DeepLabv1 architecture is easy to understand:

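  1. It starts from VGG-16.
  2. To make semantic segmentation more precise, the last two of the five max-pooling layers are skipped, so the overall stride of the last conv layer's output drops from 32x to 8x. (In the code on GitHub they do not appear to be removed; instead, the last two max-pooling layers are kept with stride changed from 2 to 1 and kernel size 3.)
  3. See Uno Whoiam's article "Dilated Convolution: usefulness from what is there, and from what is not". Because the last two max-pooling layers no longer downsample, the conv layers after them would see a field of view that is 2x and 4x smaller, respectively. To keep their original field of view, those layers are turned into atrous (dilated) convolutions with dilation 2 and 4, respectively. This is the same idea as DRN (Dilated Residual Networks).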

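4. It is also a fully convolutional network (see Uno Whoiam: "FCN: From Image Classification to Pixel Classification"). The fully connected layers are replaced with 1×1 convolutional layers, so the network outputs a spatial map of class scores instead of a single vector. This allows every pixel to be classified once the map is brought back to the input size (step 5).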

5. Bilinear interpolation upsamples the output by 8x to obtain a pixel classification map the same size as the input image.

6. A CRF (conditional random field) is applied to make the boundaries of the final classification result sharper.

What is a CRF? Only the formulas from the paper are given here, without going deeper, since v3 and later versions no longer use it:

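$$
E(\mathbf{x}) = \sum_i \theta_i(x_i) + \sum_{ij} \theta_{ij}(x_i, x_j)
$$

$$
\theta_i(x_i) = -\log P(x_i)
$$

$$
\theta_{ij}(x_i, x_j) = \mu(x_i, x_j)\left[ w_1 \exp\left(-\frac{\|p_i - p_j\|^2}{2\sigma_\alpha^2} - \frac{\|I_i - I_j\|^2}{2\sigma_\beta^2}\right) + w_2 \exp\left(-\frac{\|p_i - p_j\|^2}{2\sigma_\gamma^2}\right) \right]
$$

Here $P(x_i)$ is the confidence output by the DCNN, and

$$
\mu(x_i, x_j) = \begin{cases} 1, & \text{if } x_i \neq x_j \\ 0, & \text{if } x_i = x_j \end{cases}
$$

$p$ denotes a pixel's position and $I$ its RGB values. How should this be understood? When classifying a pixel, the CRF considers not only the DCNN output but also the "opinions" of the other pixels, especially nearby pixels with similar colour values. This gives segmentation results with better boundaries.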

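7. Multi-scale prediction, intended to capture better boundary information, is similar to FCN's skip layers. Concretely, a two-layer convolutional branch (128 filters of 3×3, then 128 filters of 1×1) is attached to the input image and to the output of each of the first four max-pooling layers. These five predictions are concatenated with the final model output, which adds 128 × 5 = 640 channels. The gain is smaller than that of the dense CRF, but still noticeable. The final model combines the dense CRF with multi-scale prediction.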
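The CRF energy from item 6 can be sketched directly in NumPy. This is a brute-force evaluation of $E(\mathbf{x})$ for a tiny image, not the mean-field inference used with fully connected CRFs in practice. The weights and σ defaults below are placeholder values chosen for illustration, not the paper's settings, and each unordered pixel pair is counted once.

```python
import numpy as np


def unary_potential(prob):
    # theta_i(x_i) = -log P(x_i), with P from the DCNN output
    return -np.log(prob)


def pairwise_potential(x_i, x_j, p_i, p_j, I_i, I_j,
                       w1=1.0, w2=1.0,
                       sigma_alpha=80.0, sigma_beta=13.0, sigma_gamma=3.0):
    # Placeholder hyper-parameters (assumption, not the paper's values)
    # mu(x_i, x_j): Potts model, a penalty only when the labels differ
    mu = 1.0 if x_i != x_j else 0.0
    pos_d2 = np.sum((np.asarray(p_i, float) - np.asarray(p_j, float)) ** 2)
    rgb_d2 = np.sum((np.asarray(I_i, float) - np.asarray(I_j, float)) ** 2)
    # Appearance kernel: close in position AND similar in colour
    appearance = w1 * np.exp(-pos_d2 / (2 * sigma_alpha ** 2)
                             - rgb_d2 / (2 * sigma_beta ** 2))
    # Smoothness kernel: close in position only
    smoothness = w2 * np.exp(-pos_d2 / (2 * sigma_gamma ** 2))
    return mu * (appearance + smoothness)


def crf_energy(labels, probs, image):
    """E(x) over all pixel pairs of a tiny H x W image (O(N^2), illustration only).

    labels: (H, W) int labels, probs: (H, W, C) DCNN probabilities,
    image: (H, W, 3) RGB values.
    """
    H, W = labels.shape
    coords = [(r, c) for r in range(H) for c in range(W)]
    energy = sum(unary_potential(probs[r, c, labels[r, c]]) for r, c in coords)
    for a in range(len(coords)):
        for b in range(a + 1, len(coords)):  # each unordered pair once
            (r1, c1), (r2, c2) = coords[a], coords[b]
            energy += pairwise_potential(labels[r1, c1], labels[r2, c2],
                                         (r1, c1), (r2, c2),
                                         image[r1, c1], image[r2, c2])
    return energy


labels = np.array([[0, 0], [0, 1]])
print(crf_energy(labels, np.full((2, 2, 2), 0.5), np.zeros((2, 2, 3))))
```

Two neighbouring pixels with different labels are penalized more when their colours are similar. This is how the appearance kernel pushes label changes toward colour edges.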

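A simple PyTorch implementation follows. Instead of VGG-16 it uses ResNet. The first layer is an ordinary 7×7 convolution with stride 2, followed by a stride-2 max-pooling. Then come a stage of ordinary bottlenecks, a stage of bottlenecks whose first block has stride 2, and finally stages of bottlenecks with dilation 2 and dilation 4.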
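The stride and dilation bookkeeping described above can be checked with a few lines of arithmetic. The sketch below assumes a 513×513 input, the size used in the code's `__main__` block. It applies the standard `nn.Conv2d` output-size formula (ignoring a `ceil_mode` edge case that does not arise here) to confirm an output stride of about 8. It then shows that a 3×3 convolution with `padding = dilation` keeps the spatial size while its field of view grows. The helper names `conv_out` and `effective_kernel` are hypothetical.

```python
import math

import torch
import torch.nn as nn


def conv_out(size, k, s, p, d=1, ceil=False):
    # Output-size formula for nn.Conv2d / nn.MaxPool2d (ceil for ceil_mode=True)
    n = (size + 2 * p - d * (k - 1) - 1) / s + 1
    return math.ceil(n) if ceil else math.floor(n)


def effective_kernel(k, d):
    # A k x k kernel with dilation d spans k + (k - 1)(d - 1) input pixels per side
    return k + (k - 1) * (d - 1)


size = conv_out(513, k=7, s=2, p=3)              # 7x7 conv, stride 2 -> 257
size = conv_out(size, k=3, s=2, p=1, ceil=True)  # max-pool, stride 2 -> 129
size = conv_out(size, k=1, s=2, p=0)             # stride-2 bottleneck -> 65
print(size, 513 / size)                          # output stride of roughly 8

# Dilated 3x3 convs with padding = dilation keep the 65x65 resolution
x = torch.randn(1, 8, size, size)
for d in (1, 2, 4):
    conv = nn.Conv2d(8, 8, kernel_size=3, padding=d, dilation=d)
    print(d, effective_kernel(3, d), tuple(conv(x).shape[-2:]))
```

Because the dilated layers add no further downsampling, every layer after the stride-2 stage works on the same 65×65 grid. Only the kernel's reach grows, from 3 to 5 to 9 pixels.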

Reference: kazuto1011/deeplab-pytorch

```python
from __future__ import absolute_import, print_function

import torch
import torch.nn as nn
import torch.nn.functional as F


class DeepLabV1(nn.Sequential):
    """
    DeepLab v1: Dilated ResNet + 1x1 Conv
    Note that this is just a container for loading the pretrained COCO model and not mentioned as "v1" in papers.
    """

    def __init__(self, n_classes, n_blocks):
        super(DeepLabV1, self).__init__()
        ch = [64 * 2 ** p for p in range(6)]
        self.add_module("layer1", _Stem(ch[0]))  # 7x7 conv + max-pool: stride 4
        self.add_module("layer2", _ResLayer(n_blocks[0], ch[0], ch[2], 1, 1))
        self.add_module("layer3", _ResLayer(n_blocks[1], ch[2], ch[3], 2, 1))  # stride 8
        # No further downsampling: dilation 2 and 4 preserve the field of view
        self.add_module("layer4", _ResLayer(n_blocks[2], ch[3], ch[4], 1, 2))
        self.add_module("layer5", _ResLayer(n_blocks[3], ch[4], ch[5], 1, 4))
        self.add_module("fc", nn.Conv2d(2048, n_classes, 1))  # 1x1 conv classifier


try:
    # Synchronized BN from the PyTorch-Encoding package, if it is installed
    from encoding.nn import SyncBatchNorm

    _BATCH_NORM = SyncBatchNorm
except ImportError:
    _BATCH_NORM = nn.BatchNorm2d

_BOTTLENECK_EXPANSION = 4


class _ConvBnReLU(nn.Sequential):
    """
    Cascade of 2D convolution, batch norm, and ReLU.
    """

    BATCH_NORM = _BATCH_NORM

    def __init__(
        self, in_ch, out_ch, kernel_size, stride, padding, dilation, relu=True
    ):
        super(_ConvBnReLU, self).__init__()
        self.add_module(
            "conv",
            nn.Conv2d(
                in_ch, out_ch, kernel_size, stride, padding, dilation, bias=False
            ),
        )
        # PyTorch's momentum weights the *new* batch statistic, so a Caffe-style
        # moving-average fraction of 0.999 corresponds to momentum = 1 - 0.999
        self.add_module("bn", _BATCH_NORM(out_ch, eps=1e-5, momentum=1 - 0.999))

        if relu:
            self.add_module("relu", nn.ReLU())


class _Bottleneck(nn.Module):
    """
    Bottleneck block of MSRA ResNet.
    """

    def __init__(self, in_ch, out_ch, stride, dilation, downsample):
        super(_Bottleneck, self).__init__()
        mid_ch = out_ch // _BOTTLENECK_EXPANSION
        self.reduce = _ConvBnReLU(in_ch, mid_ch, 1, stride, 0, 1, True)
        # padding = dilation keeps the spatial size of the dilated 3x3 conv
        self.conv3x3 = _ConvBnReLU(mid_ch, mid_ch, 3, 1, dilation, dilation, True)
        self.increase = _ConvBnReLU(mid_ch, out_ch, 1, 1, 0, 1, False)
        self.shortcut = (
            _ConvBnReLU(in_ch, out_ch, 1, stride, 0, 1, False)
            if downsample
            else nn.Identity()
        )

    def forward(self, x):
        h = self.reduce(x)
        h = self.conv3x3(h)
        h = self.increase(h)
        h += self.shortcut(x)
        return F.relu(h)


class _ResLayer(nn.Sequential):
    """
    Residual layer with multi grids
    """

    def __init__(self, n_layers, in_ch, out_ch, stride, dilation, multi_grids=None):
        super(_ResLayer, self).__init__()

        if multi_grids is None:
            multi_grids = [1 for _ in range(n_layers)]
        else:
            assert n_layers == len(multi_grids)

        # Downsampling is only in the first block
        for i in range(n_layers):
            self.add_module(
                "block{}".format(i + 1),
                _Bottleneck(
                    in_ch=(in_ch if i == 0 else out_ch),
                    out_ch=out_ch,
                    stride=(stride if i == 0 else 1),
                    dilation=dilation * multi_grids[i],
                    downsample=(True if i == 0 else False),
                ),
            )


class _Stem(nn.Sequential):
    """
    The 1st conv layer.
    Note that the max pooling is different from both MSRA and FAIR ResNet.
    """

    def __init__(self, out_ch):
        super(_Stem, self).__init__()
        self.add_module("conv1", _ConvBnReLU(3, out_ch, 7, 2, 3, 1))
        self.add_module("pool", nn.MaxPool2d(3, 2, 1, ceil_mode=True))


class _Flatten(nn.Module):
    def forward(self, x):
        return x.view(x.size(0), -1)


if __name__ == "__main__":
    model = DeepLabV1(n_classes=21, n_blocks=[3, 4, 23, 3])
    model.eval()
    image = torch.randn(1, 3, 513, 513)

    print(model)
    print("input:", image.shape)
    print("output:", model(image).shape)  # torch.Size([1, 21, 65, 65])
```
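The model above stops at stride-8 logits. Step 5 (bilinear upsampling to the input size, then a per-pixel argmax) can be sketched as follows. The helper name `logits_to_label_map` is hypothetical, and the random tensor only stands in for a real model output. Upsampling uses `size=` rather than `scale_factor=8` because 65 × 8 = 520 does not match 513.

```python
import torch
import torch.nn.functional as F


def logits_to_label_map(logits, image_size):
    """Upsample coarse logits (N, C, h, w) to image_size and take the argmax."""
    up = F.interpolate(logits, size=image_size, mode="bilinear", align_corners=False)
    return up.argmax(dim=1)  # (N, H, W): one class index per pixel


# Stand-in for the (1, 21, 65, 65) output of DeepLabV1 on a 513x513 image
logits = torch.randn(1, 21, 65, 65)
labels = logits_to_label_map(logits, (513, 513))
print(labels.shape)  # torch.Size([1, 513, 513])
```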

DeepLabv2

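The biggest change in DeepLabv2 over v1 is ASPP (Atrous Spatial Pyramid Pooling), inspired by SPP (Spatial Pyramid Pooling). Before the final pixel classification, an Inception-like structure is added, made of parallel atrous convolutions with different rates (dilation spacings). This improves the model's ability to recognize the same object at different sizes.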
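To make "different rates" concrete, the sketch below computes the window each 3×3 branch covers for the rates used in the v2 example (6, 12, 18, 24). It then fuses the branches by summation, as the v2 implementation that follows does. The channel counts and the 65×65 input are illustrative assumptions standing in for a stride-8 backbone feature map.

```python
import torch
import torch.nn as nn

rates = [6, 12, 18, 24]
# A 3x3 kernel with atrous rate r covers a (2r + 1) x (2r + 1) window
fov = [2 * r + 1 for r in rates]
print(fov)  # [13, 25, 37, 49]

x = torch.randn(1, 64, 65, 65)  # stand-in for a stride-8 feature map
branches = nn.ModuleList(
    [nn.Conv2d(64, 21, kernel_size=3, padding=r, dilation=r) for r in rates]
)
# padding = rate keeps every branch at 65x65, so the outputs can be summed
out = sum(branch(x) for branch in branches)
print(out.shape)  # torch.Size([1, 21, 65, 65])
```

Every branch has the same parameter count (one 3×3 kernel per channel pair). Only the spacing of the taps differs, which is what lets one module look at several object scales at once.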

DeepLabv2 PyTorch implementation:

```python
from __future__ import absolute_import, print_function

import torch
import torch.nn as nn
import torch.nn.functional as F


class _ASPP(nn.Module):
    """
    Atrous spatial pyramid pooling (ASPP)
    """

    def __init__(self, in_ch, out_ch, rates):
        super(_ASPP, self).__init__()
        for i, rate in enumerate(rates):
            self.add_module(
                "c{}".format(i),
                nn.Conv2d(in_ch, out_ch, 3, 1, padding=rate, dilation=rate, bias=True),
            )

        for m in self.children():
            nn.init.normal_(m.weight, mean=0, std=0.01)
            nn.init.constant_(m.bias, 0)

    def forward(self, x):
        # v2 fuses the parallel branches by summation
        return sum([stage(x) for stage in self.children()])


class DeepLabV2(nn.Sequential):
    """
    DeepLab v2: Dilated ResNet + ASPP
    Output stride is fixed at 8
    """

    def __init__(self, n_classes, n_blocks, atrous_rates):
        super(DeepLabV2, self).__init__()
        ch = [64 * 2 ** p for p in range(6)]
        self.add_module("layer1", _Stem(ch[0]))
        self.add_module("layer2", _ResLayer(n_blocks[0], ch[0], ch[2], 1, 1))
        self.add_module("layer3", _ResLayer(n_blocks[1], ch[2], ch[3], 2, 1))
        self.add_module("layer4", _ResLayer(n_blocks[2], ch[3], ch[4], 1, 2))
        self.add_module("layer5", _ResLayer(n_blocks[3], ch[4], ch[5], 1, 4))
        self.add_module("aspp", _ASPP(ch[5], n_classes, atrous_rates))

    def freeze_bn(self):
        # Keep BN running statistics fixed while fine-tuning
        for m in self.modules():
            if isinstance(m, _ConvBnReLU.BATCH_NORM):
                m.eval()


try:
    # Synchronized BN from the PyTorch-Encoding package, if it is installed
    from encoding.nn import SyncBatchNorm

    _BATCH_NORM = SyncBatchNorm
except ImportError:
    _BATCH_NORM = nn.BatchNorm2d

_BOTTLENECK_EXPANSION = 4


class _ConvBnReLU(nn.Sequential):
    """
    Cascade of 2D convolution, batch norm, and ReLU.
    """

    BATCH_NORM = _BATCH_NORM

    def __init__(
        self, in_ch, out_ch, kernel_size, stride, padding, dilation, relu=True
    ):
        super(_ConvBnReLU, self).__init__()
        self.add_module(
            "conv",
            nn.Conv2d(
                in_ch, out_ch, kernel_size, stride, padding, dilation, bias=False
            ),
        )
        # PyTorch's momentum weights the *new* batch statistic, so a Caffe-style
        # moving-average fraction of 0.999 corresponds to momentum = 1 - 0.999
        self.add_module("bn", _BATCH_NORM(out_ch, eps=1e-5, momentum=1 - 0.999))

        if relu:
            self.add_module("relu", nn.ReLU())


class _Bottleneck(nn.Module):
    """
    Bottleneck block of MSRA ResNet.
    """

    def __init__(self, in_ch, out_ch, stride, dilation, downsample):
        super(_Bottleneck, self).__init__()
        mid_ch = out_ch // _BOTTLENECK_EXPANSION
        self.reduce = _ConvBnReLU(in_ch, mid_ch, 1, stride, 0, 1, True)
        # padding = dilation keeps the spatial size of the dilated 3x3 conv
        self.conv3x3 = _ConvBnReLU(mid_ch, mid_ch, 3, 1, dilation, dilation, True)
        self.increase = _ConvBnReLU(mid_ch, out_ch, 1, 1, 0, 1, False)
        self.shortcut = (
            _ConvBnReLU(in_ch, out_ch, 1, stride, 0, 1, False)
            if downsample
            else nn.Identity()
        )

    def forward(self, x):
        h = self.reduce(x)
        h = self.conv3x3(h)
        h = self.increase(h)
        h += self.shortcut(x)
        return F.relu(h)


class _ResLayer(nn.Sequential):
    """
    Residual layer with multi grids
    """

    def __init__(self, n_layers, in_ch, out_ch, stride, dilation, multi_grids=None):
        super(_ResLayer, self).__init__()

        if multi_grids is None:
            multi_grids = [1 for _ in range(n_layers)]
        else:
            assert n_layers == len(multi_grids)

        # Downsampling is only in the first block
        for i in range(n_layers):
            self.add_module(
                "block{}".format(i + 1),
                _Bottleneck(
                    in_ch=(in_ch if i == 0 else out_ch),
                    out_ch=out_ch,
                    stride=(stride if i == 0 else 1),
                    dilation=dilation * multi_grids[i],
                    downsample=(True if i == 0 else False),
                ),
            )


class _Stem(nn.Sequential):
    """
    The 1st conv layer.
    Note that the max pooling is different from both MSRA and FAIR ResNet.
    """

    def __init__(self, out_ch):
        super(_Stem, self).__init__()
        self.add_module("conv1", _ConvBnReLU(3, out_ch, 7, 2, 3, 1))
        self.add_module("pool", nn.MaxPool2d(3, 2, 1, ceil_mode=True))


if __name__ == "__main__":
    model = DeepLabV2(
        n_classes=21, n_blocks=[3, 4, 23, 3], atrous_rates=[6, 12, 18, 24]
    )
    model.eval()
    image = torch.randn(1, 3, 513, 513)

    print(model)
    print("input:", image.shape)
    print("output:", model(image).shape)  # torch.Size([1, 21, 65, 65])
```

In addition, DeepLabv2 is trained with the "poly" learning-rate policy:

$$
lr_{iter} = lr_0 \cdot \left(1 - \frac{iter}{max\_iter}\right)^{power}
$$

With power = 0.9, the model scores 1.17% higher than with the ordinary step learning-rate policy. A PyTorch implementation:

github.com/kazuto1011/d

```python
from torch.optim.lr_scheduler import _LRScheduler


class PolynomialLR(_LRScheduler):
    def __init__(self, optimizer, step_size, iter_max, power, last_epoch=-1):
        self.step_size = step_size  # update the LR every `step_size` iterations
        self.iter_max = iter_max
        self.power = power
        super(PolynomialLR, self).__init__(optimizer, last_epoch)

    def polynomial_decay(self, lr):
        # lr_0 * (1 - iter / max_iter) ** power
        return lr * (1 - float(self.last_epoch) / self.iter_max) ** self.power

    def get_lr(self):
        # Keep the current LR at iteration 0, between update steps, and after iter_max
        if (
            (self.last_epoch == 0)
            or (self.last_epoch % self.step_size != 0)
            or (self.last_epoch > self.iter_max)
        ):
            return [group["lr"] for group in self.optimizer.param_groups]
        return [self.polynomial_decay(lr) for lr in self.base_lrs]
```
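As a cross-check of the poly formula, the same schedule can be sketched with PyTorch's built-in `LambdaLR`, which multiplies each base learning rate by a user-supplied factor. This sketch assumes the LR is updated every iteration (no `step_size`) and clamps the factor at 0 past `max_iter`. The SGD settings and the helper name `poly_factor` are arbitrary, illustrative choices. Recent PyTorch releases (1.13 and later) also ship `torch.optim.lr_scheduler.PolynomialLR`, with `total_iters` and `power` arguments.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR


def poly_factor(max_iter, power=0.9):
    # Multiplier (1 - iter / max_iter) ** power, clamped to 0 after max_iter
    return lambda it: max(0.0, 1.0 - it / max_iter) ** power


param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.01, momentum=0.9)
scheduler = LambdaLR(optimizer, lr_lambda=poly_factor(max_iter=100))

lrs = []
for it in range(100):
    optimizer.step()  # forward/backward pass omitted in this sketch
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()

print(lrs[0], lrs[50], lrs[99])
```

At iteration 50 the learning rate is $0.01 \cdot 0.5^{0.9}$, slightly above half the base rate. The poly curve decays slowly at first and quickly near the end.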

DeepLabv3

The main changes in DeepLabv3 are:

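  1. A Multi-Grid strategy: the bottlenecks in the last stage of the model use atrous convolutions with different rates, e.g. unit rates (1, 2, 4) multiplied by the stage's base rate (`multi_grids` in the code below).

2. Batch normalization is added to the ASPP module.

3. ASPP with different atrous rates captures multi-scale information effectively. However, the paper found that as the sampling rate grows, the number of valid filter weights shrinks. (Valid weights are those applied to the actual feature region rather than to the zero padding.) In the extreme case, when the atrous rate is as large as the feature map, a 3×3 convolution degenerates into a 1×1 convolution, because only its centre weight still touches real features.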

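To keep the large-field-of-view atrous convolutions while addressing this problem, DeepLabv3 adds an image-level branch to ASPP: global average pooling, a 1×1 convolution, and bilinear upsampling back to the feature-map size.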
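The degeneration described in item 3 can be checked numerically. In this sketch a 5×5 feature map and an atrous rate of 5 are arbitrary illustrative choices; any rate at least as large as the feature map behaves the same. The output of the dilated 3×3 convolution equals a 1×1 convolution that uses only the centre tap of the same kernel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 4, 5, 5)  # a tiny 5x5 feature map
rate = 5                     # atrous rate >= feature-map size

conv3x3 = nn.Conv2d(4, 4, kernel_size=3, padding=rate, dilation=rate, bias=False)
out_atrous = conv3x3(x)

# Every off-centre tap lands in the zero padding, so only the centre weight matters
center_weight = conv3x3.weight[:, :, 1:2, 1:2]  # shape (4, 4, 1, 1)
out_1x1 = F.conv2d(x, center_weight)

print(out_atrous.shape, torch.allclose(out_atrous, out_1x1, atol=1e-6))
```

The image-pooling branch restores global context cheaply, without relying on such oversized rates.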

DeepLabv3 PyTorch implementation:

```python
from __future__ import absolute_import, print_function

import torch
import torch.nn as nn
import torch.nn.functional as F


class _ImagePool(nn.Module):
    # Image-level branch: global average pooling -> 1x1 conv -> upsample back.
    # Note: BatchNorm on the 1x1 pooled map needs batch size > 1 in training mode.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = _ConvBnReLU(in_ch, out_ch, 1, 1, 0, 1)

    def forward(self, x):
        _, _, H, W = x.shape
        h = self.pool(x)
        h = self.conv(h)
        h = F.interpolate(h, size=(H, W), mode="bilinear", align_corners=False)
        return h


class _ASPP(nn.Module):
    """
    Atrous spatial pyramid pooling with image-level feature
    """

    def __init__(self, in_ch, out_ch, rates):
        super(_ASPP, self).__init__()
        self.stages = nn.Module()
        self.stages.add_module("c0", _ConvBnReLU(in_ch, out_ch, 1, 1, 0, 1))
        for i, rate in enumerate(rates):
            self.stages.add_module(
                "c{}".format(i + 1),
                _ConvBnReLU(in_ch, out_ch, 3, 1, padding=rate, dilation=rate),
            )
        self.stages.add_module("imagepool", _ImagePool(in_ch, out_ch))

    def forward(self, x):
        # v3 concatenates the branches (1x1, atrous 3x3s, image pool) along channels
        return torch.cat([stage(x) for stage in self.stages.children()], dim=1)


class DeepLabV3(nn.Sequential):
    """
    DeepLab v3: Dilated ResNet with multi-grid + improved ASPP
    """

    def __init__(self, n_classes, n_blocks, atrous_rates, multi_grids, output_stride):
        super(DeepLabV3, self).__init__()

        # Stride and dilation
        if output_stride == 8:
            s = [1, 2, 1, 1]
            d = [1, 1, 2, 4]
        elif output_stride == 16:
            s = [1, 2, 2, 1]
            d = [1, 1, 1, 2]
        else:
            raise ValueError("output_stride must be 8 or 16")

        ch = [64 * 2 ** p for p in range(6)]
        self.add_module("layer1", _Stem(ch[0]))
        self.add_module("layer2", _ResLayer(n_blocks[0], ch[0], ch[2], s[0], d[0]))
        self.add_module("layer3", _ResLayer(n_blocks[1], ch[2], ch[3], s[1], d[1]))
        self.add_module("layer4", _ResLayer(n_blocks[2], ch[3], ch[4], s[2], d[2]))
        self.add_module(
            "layer5", _ResLayer(n_blocks[3], ch[4], ch[5], s[3], d[3], multi_grids)
        )
        self.add_module("aspp", _ASPP(ch[5], 256, atrous_rates))
        # 1x1 branch + one branch per rate + image pool
        concat_ch = 256 * (len(atrous_rates) + 2)
        self.add_module("fc1", _ConvBnReLU(concat_ch, 256, 1, 1, 0, 1))
        self.add_module("fc2", nn.Conv2d(256, n_classes, kernel_size=1))


try:
    # Synchronized BN from the PyTorch-Encoding package, if it is installed
    from encoding.nn import SyncBatchNorm

    _BATCH_NORM = SyncBatchNorm
except ImportError:
    _BATCH_NORM = nn.BatchNorm2d

_BOTTLENECK_EXPANSION = 4


class _ConvBnReLU(nn.Sequential):
    """
    Cascade of 2D convolution, batch norm, and ReLU.
    """

    BATCH_NORM = _BATCH_NORM

    def __init__(
        self, in_ch, out_ch, kernel_size, stride, padding, dilation, relu=True
    ):
        super(_ConvBnReLU, self).__init__()
        self.add_module(
            "conv",
            nn.Conv2d(
                in_ch, out_ch, kernel_size, stride, padding, dilation, bias=False
            ),
        )
        # PyTorch's momentum weights the *new* batch statistic, so a Caffe-style
        # moving-average fraction of 0.999 corresponds to momentum = 1 - 0.999
        self.add_module("bn", _BATCH_NORM(out_ch, eps=1e-5, momentum=1 - 0.999))

        if relu:
            self.add_module("relu", nn.ReLU())


class _Bottleneck(nn.Module):
    """
    Bottleneck block of MSRA ResNet.
    """

    def __init__(self, in_ch, out_ch, stride, dilation, downsample):
        super(_Bottleneck, self).__init__()
        mid_ch = out_ch // _BOTTLENECK_EXPANSION
        self.reduce = _ConvBnReLU(in_ch, mid_ch, 1, stride, 0, 1, True)
        # padding = dilation keeps the spatial size of the dilated 3x3 conv
        self.conv3x3 = _ConvBnReLU(mid_ch, mid_ch, 3, 1, dilation, dilation, True)
        self.increase = _ConvBnReLU(mid_ch, out_ch, 1, 1, 0, 1, False)
        self.shortcut = (
            _ConvBnReLU(in_ch, out_ch, 1, stride, 0, 1, False)
            if downsample
            else nn.Identity()
        )

    def forward(self, x):
        h = self.reduce(x)
        h = self.conv3x3(h)
        h = self.increase(h)
        h += self.shortcut(x)
        return F.relu(h)


class _ResLayer(nn.Sequential):
    """
    Residual layer with multi grids
    """

    def __init__(self, n_layers, in_ch, out_ch, stride, dilation, multi_grids=None):
        super(_ResLayer, self).__init__()

        if multi_grids is None:
            multi_grids = [1 for _ in range(n_layers)]
        else:
            assert n_layers == len(multi_grids)

        # Downsampling is only in the first block
        for i in range(n_layers):
            self.add_module(
                "block{}".format(i + 1),
                _Bottleneck(
                    in_ch=(in_ch if i == 0 else out_ch),
                    out_ch=out_ch,
                    stride=(stride if i == 0 else 1),
                    dilation=dilation * multi_grids[i],
                    downsample=(True if i == 0 else False),
                ),
            )


class _Stem(nn.Sequential):
    """
    The 1st conv layer.
    Note that the max pooling is different from both MSRA and FAIR ResNet.
    """

    def __init__(self, out_ch):
        super(_Stem, self).__init__()
        self.add_module("conv1", _ConvBnReLU(3, out_ch, 7, 2, 3, 1))
        self.add_module("pool", nn.MaxPool2d(3, 2, 1, ceil_mode=True))


if __name__ == "__main__":
    model = DeepLabV3(
        n_classes=21,
        n_blocks=[3, 4, 23, 3],
        atrous_rates=[6, 12, 18],
        multi_grids=[1, 2, 4],
        output_stride=8,
    )
    model.eval()
    image = torch.randn(1, 3, 513, 513)

    print(model)
    print("input:", image.shape)
    print("output:", model(image).shape)  # torch.Size([1, 21, 65, 65])
```

DeepLabv3+

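The biggest change in v3+ is to treat DeepLab's DCNN as an encoder, and the part that upsamples the DCNN output back to the input size as a decoder, forming an encoder-decoder architecture. Bilinear upsampling is itself a trivial decoder; strengthening the decoder lets the model produce better results along object boundaries.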

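Concretely, DeepLabv3+ starts from the output of a DeepLabv3 model with output stride 16 and upsamples it by 4x. It then concatenates this with the 0.25x-resolution (stride-4) DCNN features, whose channels are first reduced with a 1×1 convolution. The result is refined with 3×3 convolutions and bilinearly upsampled by another 4x, giving finer results than DeepLabv3.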
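The decoder data flow just described can be traced with random tensors. `TinyDecoder` is a hypothetical, simplified module: it keeps the 1×1 reduction to 48 channels, the concatenation, and the two bilinear upsamplings. It omits BN/ReLU and uses one 3×3 convolution instead of two. The 33×33 and 129×129 inputs assume a 513×513 image at strides 16 and 4.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyDecoder(nn.Module):
    """Hypothetical, simplified v3+-style decoder (BN/ReLU and the 2nd 3x3 conv omitted)."""

    def __init__(self, low_ch=256, enc_ch=256, n_classes=21):
        super().__init__()
        self.reduce = nn.Conv2d(low_ch, 48, kernel_size=1)  # shrink low-level channels
        self.refine = nn.Conv2d(enc_ch + 48, 256, kernel_size=3, padding=1)
        self.classify = nn.Conv2d(256, n_classes, kernel_size=1)

    def forward(self, enc_out, low_level, out_size):
        low = self.reduce(low_level)
        # ~4x upsampling: stride-16 encoder output -> stride-4 feature size
        h = F.interpolate(enc_out, size=low.shape[-2:], mode="bilinear", align_corners=False)
        h = self.refine(torch.cat([h, low], dim=1))
        h = self.classify(h)
        # another ~4x bilinear upsampling back to the input resolution
        return F.interpolate(h, size=out_size, mode="bilinear", align_corners=False)


decoder = TinyDecoder()
enc_out = torch.randn(1, 256, 33, 33)      # stride 16 for a 513x513 input
low_level = torch.randn(1, 256, 129, 129)  # stride-4 backbone features
out = decoder(enc_out, low_level, out_size=(513, 513))
print(out.shape)  # torch.Size([1, 21, 513, 513])
```

Reducing the low-level features to 48 channels keeps them from outweighing the 256 semantic channels from the encoder in the concatenation.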

Other improvements in DeepLabv3+:

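  1. Borrowing from MobileNet, it uses depthwise atrous convolution followed by a 1×1 (pointwise) convolution, i.e. atrous separable convolution.

2. It uses a modified Xception backbone (the code below still uses a ResNet backbone).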
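An atrous separable convolution as described in item 1 can be sketched with `groups=in_ch` for the depthwise step. The class name `AtrousSeparableConv` is hypothetical, and the BN/ReLU layers the paper places around these convolutions are omitted. Comparing parameter counts against a standard dilated 3×3 convolution with the same input/output channels shows why the factorization is attractive.

```python
import torch
import torch.nn as nn


class AtrousSeparableConv(nn.Module):
    """Depthwise 3x3 atrous conv followed by a pointwise 1x1 conv (sketch)."""

    def __init__(self, in_ch, out_ch, rate):
        super().__init__()
        # groups=in_ch: one 3x3 filter per input channel (depthwise)
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=rate, dilation=rate,
                                   groups=in_ch, bias=False)
        # 1x1 conv mixes information across channels (pointwise)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


def n_params(module):
    return sum(p.numel() for p in module.parameters())


sep = AtrousSeparableConv(256, 256, rate=6)
full = nn.Conv2d(256, 256, 3, padding=6, dilation=6, bias=False)
print(n_params(sep), n_params(full))  # 67840 589824

x = torch.randn(1, 256, 33, 33)
print(sep(x).shape == full(x).shape)  # True
```

The separable version uses 256·9 + 256·256 = 67,840 weights against 256·256·9 = 589,824 for the standard convolution, roughly 8.7x fewer, while producing an output of the same shape.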

DeepLabv3+ PyTorch implementation:

```python
from __future__ import absolute_import, print_function

from collections import OrderedDict

import torch
import torch.nn as nn
import torch.nn.functional as F


class _ImagePool(nn.Module):
    # Image-level branch used by _ASPP (same as in the DeepLabv3 code).
    # Note: BatchNorm on the 1x1 pooled map needs batch size > 1 in training mode.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = _ConvBnReLU(in_ch, out_ch, 1, 1, 0, 1)

    def forward(self, x):
        _, _, H, W = x.shape
        h = self.pool(x)
        h = self.conv(h)
        h = F.interpolate(h, size=(H, W), mode="bilinear", align_corners=False)
        return h


class _ASPP(nn.Module):
    """
    Atrous spatial pyramid pooling with image-level feature
    """

    def __init__(self, in_ch, out_ch, rates):
        super(_ASPP, self).__init__()
        self.stages = nn.Module()
        self.stages.add_module("c0", _ConvBnReLU(in_ch, out_ch, 1, 1, 0, 1))
        for i, rate in enumerate(rates):
            self.stages.add_module(
                "c{}".format(i + 1),
                _ConvBnReLU(in_ch, out_ch, 3, 1, padding=rate, dilation=rate),
            )
        self.stages.add_module("imagepool", _ImagePool(in_ch, out_ch))

    def forward(self, x):
        return torch.cat([stage(x) for stage in self.stages.children()], dim=1)


class DeepLabV3Plus(nn.Module):
    """
    DeepLab v3+: Dilated ResNet with multi-grid + improved ASPP + decoder
    """

    def __init__(self, n_classes, n_blocks, atrous_rates, multi_grids, output_stride):
        super(DeepLabV3Plus, self).__init__()

        # Stride and dilation
        if output_stride == 8:
            s = [1, 2, 1, 1]
            d = [1, 1, 2, 4]
        elif output_stride == 16:
            s = [1, 2, 2, 1]
            d = [1, 1, 1, 2]
        else:
            raise ValueError("output_stride must be 8 or 16")

        # Encoder
        ch = [64 * 2 ** p for p in range(6)]
        self.layer1 = _Stem(ch[0])
        self.layer2 = _ResLayer(n_blocks[0], ch[0], ch[2], s[0], d[0])
        self.layer3 = _ResLayer(n_blocks[1], ch[2], ch[3], s[1], d[1])
        self.layer4 = _ResLayer(n_blocks[2], ch[3], ch[4], s[2], d[2])
        self.layer5 = _ResLayer(n_blocks[3], ch[4], ch[5], s[3], d[3], multi_grids)
        self.aspp = _ASPP(ch[5], 256, atrous_rates)
        concat_ch = 256 * (len(atrous_rates) + 2)
        self.add_module("fc1", _ConvBnReLU(concat_ch, 256, 1, 1, 0, 1))

        # Decoder
        self.reduce = _ConvBnReLU(256, 48, 1, 1, 0, 1)  # low-level: 256 -> 48 channels
        self.fc2 = nn.Sequential(
            OrderedDict(
                [
                    ("conv1", _ConvBnReLU(304, 256, 3, 1, 1, 1)),  # 256 + 48 = 304
                    ("conv2", _ConvBnReLU(256, 256, 3, 1, 1, 1)),
                    ("conv3", nn.Conv2d(256, n_classes, kernel_size=1)),
                ]
            )
        )

    def forward(self, x):
        h = self.layer1(x)
        h = self.layer2(h)
        h_ = self.reduce(h)  # stride-4 low-level features for the decoder
        h = self.layer3(h)
        h = self.layer4(h)
        h = self.layer5(h)
        h = self.aspp(h)
        h = self.fc1(h)
        h = F.interpolate(h, size=h_.shape[2:], mode="bilinear", align_corners=False)
        h = torch.cat((h, h_), dim=1)
        h = self.fc2(h)
        h = F.interpolate(h, size=x.shape[2:], mode="bilinear", align_corners=False)
        return h


try:
    # Synchronized BN from the PyTorch-Encoding package, if it is installed
    from encoding.nn import SyncBatchNorm

    _BATCH_NORM = SyncBatchNorm
except ImportError:
    _BATCH_NORM = nn.BatchNorm2d

_BOTTLENECK_EXPANSION = 4


class _ConvBnReLU(nn.Sequential):
    """
    Cascade of 2D convolution, batch norm, and ReLU.
    """

    BATCH_NORM = _BATCH_NORM

    def __init__(
        self, in_ch, out_ch, kernel_size, stride, padding, dilation, relu=True
    ):
        super(_ConvBnReLU, self).__init__()
        self.add_module(
            "conv",
            nn.Conv2d(
                in_ch, out_ch, kernel_size, stride, padding, dilation, bias=False
            ),
        )
        # PyTorch's momentum weights the *new* batch statistic, so a Caffe-style
        # moving-average fraction of 0.999 corresponds to momentum = 1 - 0.999
        self.add_module("bn", _BATCH_NORM(out_ch, eps=1e-5, momentum=1 - 0.999))

        if relu:
            self.add_module("relu", nn.ReLU())


class _Bottleneck(nn.Module):
    """
    Bottleneck block of MSRA ResNet.
    """

    def __init__(self, in_ch, out_ch, stride, dilation, downsample):
        super(_Bottleneck, self).__init__()
        mid_ch = out_ch // _BOTTLENECK_EXPANSION
        self.reduce = _ConvBnReLU(in_ch, mid_ch, 1, stride, 0, 1, True)
        # padding = dilation keeps the spatial size of the dilated 3x3 conv
        self.conv3x3 = _ConvBnReLU(mid_ch, mid_ch, 3, 1, dilation, dilation, True)
        self.increase = _ConvBnReLU(mid_ch, out_ch, 1, 1, 0, 1, False)
        self.shortcut = (
            _ConvBnReLU(in_ch, out_ch, 1, stride, 0, 1, False)
            if downsample
            else nn.Identity()
        )

    def forward(self, x):
        h = self.reduce(x)
        h = self.conv3x3(h)
        h = self.increase(h)
        h += self.shortcut(x)
        return F.relu(h)


class _ResLayer(nn.Sequential):
    """
    Residual layer with multi grids
    """

    def __init__(self, n_layers, in_ch, out_ch, stride, dilation, multi_grids=None):
        super(_ResLayer, self).__init__()

        if multi_grids is None:
            multi_grids = [1 for _ in range(n_layers)]
        else:
            assert n_layers == len(multi_grids)

        # Downsampling is only in the first block
        for i in range(n_layers):
            self.add_module(
                "block{}".format(i + 1),
                _Bottleneck(
                    in_ch=(in_ch if i == 0 else out_ch),
                    out_ch=out_ch,
                    stride=(stride if i == 0 else 1),
                    dilation=dilation * multi_grids[i],
                    downsample=(True if i == 0 else False),
                ),
            )


class _Stem(nn.Sequential):
    """
    The 1st conv layer.
    Note that the max pooling is different from both MSRA and FAIR ResNet.
    """

    def __init__(self, out_ch):
        super(_Stem, self).__init__()
        self.add_module("conv1", _ConvBnReLU(3, out_ch, 7, 2, 3, 1))
        self.add_module("pool", nn.MaxPool2d(3, 2, 1, ceil_mode=True))


if __name__ == "__main__":
    model = DeepLabV3Plus(
        n_classes=21,
        n_blocks=[3, 4, 23, 3],
        atrous_rates=[6, 12, 18],
        multi_grids=[1, 2, 4],
        output_stride=16,
    )
    model.eval()
    image = torch.randn(1, 3, 513, 513)

    print(model)
    print("input:", image.shape)
    print("output:", model(image).shape)  # torch.Size([1, 21, 513, 513])
```

References:

blog.csdn.net/junparado

清歡守護者: Close Reading of Deep Learning Papers (20): DeepLab V1

blog.csdn.net/u01197463

Semantic segmentation papers: the DeepLab series

The use of CRF in semantic segmentation

blog.csdn.net/u01197463
