Object detection is a key technology for solving many business problems. In ADAS, for instance, FCW (forward collision warning) relies on object detection to detect and recognize the vehicles and pedestrians ahead; likewise, a face-recognition gate needs a face detector to find the people passing through and hand the face ROIs to the recognition module for verification. Since this column focuses on AI algorithms for the embedded side, this post discusses how to DIY a Mobilenet-SSD that runs in near real time on embedded devices.

Here we use Google's open-source Object Detection API; installation is covered in the official documentation, so I won't repeat it. To train your own object detector with the Object Detection API, you generally go through the following steps: preparing the training data (tfrecord), setting up the preprocessing function, selecting the model, and configuring the training parameters. Once these are done, you can train following the officially documented procedure. Let's walk through how to prepare each part of the model DIY.

  • Data preparation

This depends on your own annotated data, which can be in any format; typically each image is associated with bounding boxes of different classes. When generating tfrecords, you only need to map your data to the tfrecord layout the Object Detection API expects. Below is a simple example; you can write a custom parser for your own annotation file format.

import tensorflow as tf
from object_detection.utils import dataset_util

def create_tf_example(example):
    # TODO(user): Populate the following variables from your example.
    height = example['height']  # Image height
    width = example['width']  # Image width
    filename = example['filename']  # Filename of the image. Empty if image is not from file
    encoded_image_data = example['image']  # Encoded image bytes
    image_format = example['format']  # b'jpeg' or b'png'

    xmins = example['xmin']  # List of normalized left x coordinates in bounding box (1 per box)
    xmaxs = example['xmax']  # List of normalized right x coordinates in bounding box (1 per box)
    ymins = example['ymin']  # List of normalized top y coordinates in bounding box (1 per box)
    ymaxs = example['ymax']  # List of normalized bottom y coordinates in bounding box (1 per box)
    classes_text = example['text']  # List of string class names of bounding boxes (1 per box)
    classes = example['label']  # List of integer class ids of bounding boxes (1 per box)

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_image_data),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example

So you need to parse your own annotation format and fill in, one by one, the entries of the Python dict (height, width, filename, and so on) that create_tf_example then maps to the image/height, image/width, image/filename features. Note that xmin, xmax, ymin and ymax are lists whose element values lie in [0, 1]: they have all been normalized by the image width and height. The following example code assembles the tfrecord files:

import cv2
import glob
import os
import xml.etree.ElementTree as ET
import tensorflow as tf
from object_detection.utils import dataset_util
from PIL import Image
import io
import random

train_writer = tf.python_io.TFRecordWriter("./dms_train.tfrecords")
test_writer = tf.python_io.TFRecordWriter("./dms_test.tfrecords")
examples = []

annotation_files = [
    "./annotations/2019-06-13.xml",
    "./annotations/2019-06-25.xml"
]

for annotation_file in annotation_files:
    # annotation_file = "./annotations/1468_imglab.xml"
    root = ET.parse(annotation_file).getroot()
    image_folder = annotation_file.replace("./annotations/", "")
    image_folder = image_folder.replace(".xml", "")

    for image in root.findall("images/image"):
        filename = image.get("file")
        filename = os.path.join("./image-data/", image_folder, filename)
        boxes = image.findall("box")

        # Read the encoded jpeg bytes and query the image size.
        with tf.gfile.GFile(filename, 'rb') as fid:
            encoded_jpg = fid.read()
        encoded_jpg_io = io.BytesIO(encoded_jpg)
        img = Image.open(encoded_jpg_io)
        width, height = img.size

        xmins = []
        ymins = []
        xmaxs = []
        ymaxs = []
        texts = []
        labels = []
        example = {}

        for box in boxes:
            # Normalize the box coordinates by the image width/height.
            x = int(box.get("left")) / float(width)
            y = int(box.get("top")) / float(height)
            w = int(box.get("width")) / float(width)
            h = int(box.get("height")) / float(height)
            label_text = box.findall("label")[0].text

            # Clamp the normalized box to [0, 1].
            xmin = max(x, 0.0)
            ymin = max(y, 0.0)
            xmax = min(x + w, 1.0)
            ymax = min(y + h, 1.0)

            if label_text == "face":
                label = 1
            elif label_text == "phone":
                label = 2
            elif label_text == "smoke":
                label = 3
            else:
                continue  # Ignore boxes with unknown labels.
            labels.append(label)
            texts.append(label_text.encode('utf8'))
            xmins.append(xmin)
            ymins.append(ymin)
            xmaxs.append(xmax)
            ymaxs.append(ymax)

        if len(xmins) > 0:
            example['filename'] = filename.encode('utf8')
            example['image'] = encoded_jpg
            example['format'] = b'jpg'
            example['height'] = height
            example['width'] = width
            example['xmin'] = xmins
            example['xmax'] = xmaxs
            example['ymin'] = ymins
            example['ymax'] = ymaxs
            example['label'] = labels
            example['text'] = texts
            examples.append(example)

# Shuffle, then send the first 100 examples to the test set and the rest to training.
idx = 0
random.shuffle(examples)
for example in examples:
    tf_example = create_tf_example(example)
    if idx < 100:
        test_writer.write(tf_example.SerializeToString())
    else:
        train_writer.write(tf_example.SerializeToString())
    idx += 1

train_writer.close()
test_writer.close()
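Before kicking off training, it can be worth sanity-checking the generated files. Below is a minimal sketch using the same TF 1.x API as the script above (the file name matches the writer above; everything else is just illustrative):

import tensorflow as tf

# Iterate over the serialized records, count them, and peek at the first one.
count = 0
for record in tf.python_io.tf_record_iterator("./dms_train.tfrecords"):
    if count == 0:
        ex = tf.train.Example()
        ex.ParseFromString(record)
        print(sorted(ex.features.feature.keys()))  # e.g. image/encoded, image/height, ...
        print(ex.features.feature['image/height'].int64_list.value)
    count += 1
print("total examples:", count)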

  • Preprocessing function setup

The Object Detection API hard-codes its preprocessing function; by default it applies value * (2/255) - 1, mapping pixel values into [-1, 1]. If you need different preprocessing, e.g. whitening with ImageNet mean/std values, you can modify preprocess in models/ssd_mobilenet_v1_feature_extractor.py as follows:

def preprocess(self, resized_inputs):
    """SSD preprocessing.

    Whitens pixel values with fixed mean/std instead of mapping to [-1, 1].

    Args:
      resized_inputs: a [batch, height, width, channels] float tensor
        representing a batch of images.

    Returns:
      preprocessed_inputs: a [batch, height, width, channels] float tensor
        representing a batch of images.
    """
    # Approximate ImageNet per-channel mean and std.
    means = tf.constant((123.00, 123.00, 123.00), dtype=tf.float32)
    stds = tf.constant((58.000, 58.000, 58.000), dtype=tf.float32)
    output = tf.subtract(resized_inputs, means)
    output = tf.divide(output, stds)
    return output
    # Default behavior: return (2.0 / 255.0) * resized_inputs - 1.0
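Whatever you change here must be mirrored at inference time on the device, otherwise the deployed detector sees differently scaled inputs than it was trained on. Here is a minimal NumPy/OpenCV sketch of the equivalent whitening (my own illustration, assuming the 224x224 input used later in this post; not code from the API):

import cv2
import numpy as np

def preprocess_for_inference(bgr_image):
    # The training pipeline feeds RGB images, so convert from OpenCV's BGR first.
    rgb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(rgb, (224, 224)).astype(np.float32)
    # Same mean/std as in the modified preprocess() above.
    return (resized - 123.0) / 58.0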

  • Model selection

Model selection really means choosing the Mobilenet-SSD parameters that suit your business scenario, and these are set in the model's config file. The parameters that control model size are the input width and height, depth_multiplier (which scales the number of channels produced by each depthwise layer), and the internal parameters of the anchor_generator. For close-range face detection, for example, the input can be quite small: a 224x224 input and a depth_multiplier of 0.5 already meet the business requirement, and such a model runs in near real time on today's low- to mid-range embedded devices such as the RK3288 and RK3399. The anchor_generator exposes quite a few knobs: you can shorten the aspect_ratios list (fewer output anchors), adjust min_scale and max_scale (which affect sensitivity to large and small objects), or change num_layers (the number of feature layers fed into the detection layers; Mobilenet-SSD uses 6 by default), as illustrated in the sketch below.
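To build intuition for min_scale and max_scale, the following sketch reproduces the linear per-layer scale interpolation used by the ssd_anchor_generator (this mirrors the formula from the SSD paper; it is my own illustration, not code from the API):

def ssd_layer_scales(min_scale=0.2, max_scale=0.95, num_layers=6):
    # One anchor scale per feature layer, linearly spaced between the two bounds.
    return [min_scale + (max_scale - min_scale) * i / (num_layers - 1.0)
            for i in range(num_layers)]

print(ssd_layer_scales())  # [0.2, 0.35, 0.5, 0.65, 0.8, 0.95]

Lowering max_scale biases the anchors toward smaller objects; shortening the aspect_ratios list or reducing num_layers shrinks the number of anchors and thus the post-processing cost.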

  • Training parameter configuration

The training parameter configuration mainly affects detection quality, not detector speed. There are many knobs; typically you tune the optimization algorithm, batch size, learning rate and data augmentation. Data augmentation in particular deserves experimentation: random horizontal flips, random image value changes, random crops, and so on. Used well, it maximizes the value extracted from your training data. The sample configuration file below covers the details of both model selection and training parameter configuration:

# SSD with Mobilenet v1, configured for a custom dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader
# and eval_input_reader.
# TPU-compatible

model {
  ssd {
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 224
        width: 224
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 0.5
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: BOTH
        max_negatives_per_positive: 3
        min_negatives_per_image: 3
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 50
        max_total_detections: 50
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 32
  num_batch_queue_threads: 1
  batch_queue_capacity: 2000
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.001
          decay_steps: 18750
          decay_factor: 0.5
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  # fine_tune_checkpoint: "/home/shuai/models/ssd_mobilenet_v1_0.75_depth_300x300_coco14_sync_2018_07_03/model.ckpt"
  fine_tune_checkpoint: "/home/data/zhangxd/train_face_0.5_224x224/model.ckpt-200058"
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: false
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient. Remove the below line to train
  # indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_adjust_brightness {
    }
  }
  data_augmentation_options {
    random_image_scale {
    }
  }
  data_augmentation_options {
    random_jitter_boxes {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  max_number_of_boxes: 50
  unpad_groundtruth_tensors: false
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "/home/data/zhangxd/face_det_data/face_det_train.tfrecords"
  }
  label_map_path: "/home/data/zhangxd/face_det_data/face_label_map.pbtxt"
  num_readers: 1
  prefetch_size: 256
  read_block_length: 32
}

eval_config: {
  num_examples: 1000
  visualization_export_dir: "/home/data/zhangxd/visualization"
  visualize_groundtruth_boxes: true
  min_score_threshold: 0.5
  num_visualizations: 100
  include_metrics_per_category: true
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/home/data/zhangxd/face_det_data/face_det_test.tfrecords"
  }
  label_map_path: "/home/data/zhangxd/face_det_data/face_label_map.pbtxt"
  shuffle: false
  num_readers: 1
  prefetch_size: 32
  read_block_length: 16
}
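With the tfrecords and the pipeline config in place, training is launched with the official scripts. For reference, a typical invocation looks like the following sketch (the paths are placeholders, and the flags are those of object_detection/model_main.py in the TF 1.x models repo, so check your checkout):

# Run from tensorflow/models/research; paths below are placeholders.
python object_detection/model_main.py \
    --pipeline_config_path=/path/to/your_pipeline.config \
    --model_dir=/path/to/train_dir \
    --alsologtostderr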


  • Wrapping up

With all the above preparation done, you can train using the official training scripts. At this point, you have designed and trained a detector that runs in near real time on most embedded devices. If you are interested in embedded CNN deployment, you can also check my earlier posts in this column and try deploying your freshly trained lightweight Mobilenet-SSD; the links are in the references below. Comments and follows are welcome. Thanks!

  • References

糖心他爸, "實戰MNN之Mobilenet SSD部署(含源碼)" (Hands-on Mobilenet SSD deployment with MNN, source included), zhuanlan.zhihu.com

糖心他爸, "詳解MNN的tf-MobilenetSSD-cpp部署流程" (A walkthrough of the MNN tf-MobilenetSSD C++ deployment flow), zhuanlan.zhihu.com

糖心他爸, "使用NNAPI加速android-tflite的Mobilenet分類器" (Accelerating an android-tflite Mobilenet classifier with NNAPI), zhuanlan.zhihu.com

糖心他爸, "使用TVM在android中進行Mobilenet SSD部署" (Deploying Mobilenet SSD on Android with TVM), zhuanlan.zhihu.com
