Programming practice: image segmentation

Paper reading 2015, 2016, DeepLab v1/v2

Source:
Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille, Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, ICLR, 2015.

Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille, DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, arXiv preprint, 2016

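These are the paper that first introduced "DeepLab" and the follow-up published shortly afterwards. The latest version, DeepLabv3, is the state of the art at the time of writing.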

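DeepLab has three distinguishing features.
1. Instead of downsampling with pooling, it uses atrous convolution (fig. 1). Each feature value then reflects a wide region of the input, as it would after downsampling, but the feature map does not lose resolution. Atrous convolution is also called dilated convolution. In PyTorch it can be set with the dilation argument of torch.nn.Conv2d.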
figure 1. from vdumoulin/conv_arithmetic
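2. A cat is a cat whether it fills the whole frame or sits in one corner of it, so image segmentation has to take scale invariance into account. The authors handle this by applying atrous convolutions with several different dilation rates (fig. 2). They call this technique "atrous spatial pyramid pooling" (ASPP).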
figure 2, from Chen et al. 2016
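3. The feature map is enlarged by bilinear upsampling, and a fully connected Conditional Random Field (CRF) is applied immediately afterwards. CRFs were widely used for image segmentation before deep learning. A basic (locally connected) CRF considers pairs of adjacent pixels and computes whether the two belong to the same object (fig. 3a). A fully connected CRF, as the name says, considers the pair formed by a pixel and every other pixel in the image (fig. 3b). Taken naively, a fully connected CRF is of course extremely expensive to compute, but Krähenbühl and Koltun (2011) introduced a mean-field approximation to speed it up and obtained results close to those of the exhaustive computation (fig. 4).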
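The first two points can be tried out directly in PyTorch. The following is a minimal sketch, not the authors' implementation: `TinyASPP` is a hypothetical name, the channel counts and input size are arbitrary, and the rates (6, 12, 18, 24) and the summation of the branches are my reading of the v2 paper, so treat them as assumptions.

```python
import torch
import torch.nn as nn


class TinyASPP(nn.Module):
    """Illustrative ASPP-like block (hypothetical name and sizes):
    parallel 3x3 atrous convolutions whose outputs are summed."""

    def __init__(self, in_channels, out_channels, rates=(6, 12, 18, 24)):
        super().__init__()
        # For a 3x3 kernel, padding == dilation keeps the spatial size unchanged.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, out_channels, kernel_size=3,
                      padding=r, dilation=r)
            for r in rates
        ])

    def forward(self, x):
        # Each branch sees the same input at a different effective scale.
        return sum(branch(x) for branch in self.branches)


x = torch.randn(1, 8, 33, 33)

# A 3x3 kernel with dilation d spans 3 + 2 * (d - 1) input pixels per side,
# so without padding the output shrinks by 2 * d instead of 2.
plain = nn.Conv2d(8, 4, kernel_size=3)
atrous = nn.Conv2d(8, 4, kernel_size=3, dilation=2)
print(plain(x).shape)   # torch.Size([1, 4, 31, 31])
print(atrous(x).shape)  # torch.Size([1, 4, 29, 29])

aspp = TinyASPP(8, 4)
print(aspp(x).shape)    # torch.Size([1, 4, 33, 33])
```

The number of weights per branch is the same as for an ordinary 3x3 convolution; only the spacing between the sampled input pixels changes.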

figure 3. from https://www.cs.auckland.ac.nz/courses/compsci708s1c/lectures/glect-html/topic4c708fsc.htm

figure 4, from Krähenbühl and Koltun, 2011
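To make the mean-field idea concrete, here is a naive sketch in NumPy. It is not the algorithm of Krähenbühl and Koltun: as I understand it, their model combines an appearance kernel with a smoothness kernel and gets its speed from fast high-dimensional filtering, whereas this sketch uses a single appearance kernel and builds the full N x N kernel matrix, so it is only usable on tiny images. The function names and the values of `w`, `theta_pos` and `theta_rgb` are assumptions chosen for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def dense_crf_mean_field(unary, image, n_iters=5, w=3.0,
                         theta_pos=3.0, theta_rgb=0.2):
    """Naive O(N^2) mean-field inference for a fully connected CRF.

    unary: (H, W, L) negative log-probabilities per pixel and label.
    image: (H, W, C) float image used by the appearance kernel.
    Returns Q of shape (H, W, L), the approximate label marginals.
    """
    H, W, L = unary.shape
    N = H * W
    ys, xs = np.mgrid[0:H, 0:W]
    pos = np.stack([ys, xs], axis=-1).reshape(N, 2).astype(float)
    col = image.reshape(N, -1).astype(float)

    # Kernel between every pair of pixels (the expensive part):
    # large when two pixels are close together and similar in colour.
    d_pos = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    d_col = ((col[:, None, :] - col[None, :, :]) ** 2).sum(-1)
    K = w * np.exp(-d_pos / (2 * theta_pos ** 2) - d_col / (2 * theta_rgb ** 2))
    np.fill_diagonal(K, 0.0)  # a pixel sends no message to itself

    U = unary.reshape(N, L)
    Q = softmax(-U)  # initialise from the unary term alone
    for _ in range(n_iters):
        msg = K @ Q  # (N, L): kernel-weighted support for each label
        # Potts compatibility: a label is penalised by the support
        # that the other pixels give to all *other* labels.
        pairwise = msg.sum(axis=1, keepdims=True) - msg
        Q = softmax(-U - pairwise)
    return Q.reshape(H, W, L)


# Toy example: dark left half (label 0), bright right half (label 1),
# with one left-half pixel whose unary term prefers the wrong label.
image = np.zeros((6, 6, 1))
image[:, 3:] = 1.0
prob = np.zeros((6, 6, 2))
prob[:, :3] = [0.7, 0.3]
prob[:, 3:] = [0.3, 0.7]
prob[2, 1] = [0.4, 0.6]
Q = dense_crf_mean_field(-np.log(prob), image)
print(prob[2, 1].argmax(), Q[2, 1].argmax())  # 1 0
```

In the toy example the mislabelled pixel is pulled back to label 0 because every other pixel of the same colour votes for it, including pixels that are not its direct neighbours.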

Model overview, from Chen et al. 2015

A similar architecture that has also attracted attention is Zheng et al., Conditional Random Fields as Recurrent Neural Networks.

(A detailed discussion is left for the DeepLabv3 paper reading.)

Paper reading 2015, SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

Source: Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla,
"SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation." PAMI, 2017.
(It was on arXiv in 2015 but appeared in PAMI in 2017. What is going on there...)

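An image segmentation technique developed for self-driving AI.
The structure closely resembles U-Net (fig. 1), but the feature maps are enlarged by max-unpooling (available in PyTorch), which simplifies the model and makes the algorithm suited to real-time processing.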
figure 1, from the SegNet website
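Max-unpooling is easy to see on a small tensor. The sketch below uses PyTorch's `nn.MaxPool2d` with `return_indices=True` and `nn.MaxUnpool2d`; the 4x4 input is made up for illustration and this is not SegNet's actual code.

```python
import torch
import torch.nn as nn

# The encoder side remembers *where* each maximum came from.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
# The decoder side puts each value back at that remembered position.
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.tensor([[[[1., 2., 5., 6.],
                    [3., 4., 7., 8.],
                    [9., 1., 2., 3.],
                    [1., 1., 4., 0.]]]])

pooled, indices = pool(x)
restored = unpool(pooled, indices)

print(pooled[0, 0])
# tensor([[4., 8.],
#         [9., 4.]])
print(restored[0, 0])
# tensor([[0., 0., 0., 0.],
#         [0., 4., 0., 8.],
#         [9., 0., 0., 0.],
#         [0., 0., 4., 0.]])
```

The unpooling step itself has no trainable weights, since only the indices are passed from encoder to decoder. The restored map is sparse, and in SegNet the decoder follows it with convolutions that fill it in.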

Paper reading 2015, U-Net: Convolutional Networks for Biomedical Image Segmentation

Source: Ronneberger, Olaf; Fischer, Philipp; Brox, Thomas,
U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer, LNCS, Vol.9351: 234–241, 2015

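An image segmentation technique specialised for biological images such as cells and organs.
Whereas FCN shrinks and enlarges the height and width of the feature maps abruptly, U-Net shrinks them by a factor of 1/2 at a time and enlarges them by a factor of 2 at a time. Each time the height and width of the feature maps are multiplied by a, the number of feature maps is multiplied by 1/a. As fig. 1 shows, a diagram of U-Net is almost left-right symmetric. The left side, which extracts features while compressing the image, is called the encoder; the part that enlarges the image while preserving the segmentation corresponding to the features is called the decoder. The features obtained in the encoder are cropped so that their height and width match, concatenated with the corresponding feature volume in the decoder, and then upsampled further. The enlargement uses "up-convolution" (probably the same thing as in FCN).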
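One decoder step as described above (enlarge by 2, crop the encoder features, concatenate) can be sketched in PyTorch as follows. `center_crop` and `UpBlock` are hypothetical names, `nn.ConvTranspose2d` stands in for the "up-convolution", and the channel counts are shrunk to keep the example light, so this is an illustration under those assumptions rather than the authors' code.

```python
import torch
import torch.nn as nn


def center_crop(feature, target_hw):
    """Crop the centre of `feature` (N, C, H, W) to the spatial size target_hw."""
    _, _, h, w = feature.shape
    th, tw = target_hw
    top, left = (h - th) // 2, (w - tw) // 2
    return feature[:, :, top:top + th, left:left + tw]


class UpBlock(nn.Module):
    """One decoder step: 2x up-convolution, skip concatenation, two 3x3 convs."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        # Height and width are doubled while the number of feature maps is halved.
        self.up = nn.ConvTranspose2d(in_channels, out_channels,
                                     kernel_size=2, stride=2)
        # Unpadded convolutions, so each one trims 2 pixels per side length.
        self.conv = nn.Sequential(
            nn.Conv2d(2 * out_channels, out_channels, kernel_size=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)
        # The encoder features are larger, so crop them to match.
        skip = center_crop(skip, x.shape[-2:])
        return self.conv(torch.cat([skip, x], dim=1))


block = UpBlock(in_channels=16, out_channels=8)
bottom = torch.randn(1, 16, 28, 28)  # deepest feature volume
skip = torch.randn(1, 8, 64, 64)     # encoder features one level up
out = block(bottom, skip)
print(out.shape)  # torch.Size([1, 8, 52, 52])
```

The crop is needed because the unpadded convolutions make the encoder and decoder feature maps at the same level differ in size (64 versus 56 here).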

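A characteristic of biological images is that even when they are warped quite heavily, the underlying entity (cell or organ) keeps the same class, so the authors devise data augmentation that exploits this.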
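A warp of this kind can be sketched with `torch.nn.functional.grid_sample`. As far as I recall, the paper draws random displacement vectors on a coarse 3 by 3 grid and interpolates them to per-pixel displacements; the function name `elastic_deform`, the border padding, and the interpolation modes used for resampling are my own assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def elastic_deform(image, mask, grid_size=3, sigma_px=10.0, generator=None):
    """Apply one smooth random warp to an image and its label mask.

    image: (N, C, H, W) float tensor; mask: (N, 1, H, W) float label tensor.
    """
    n, _, h, w = image.shape
    # Coarse random displacements in pixels, one (dx, dy) vector per grid node.
    coarse = torch.randn(n, 2, grid_size, grid_size, generator=generator) * sigma_px
    # Smoothly interpolate to one displacement vector per pixel.
    flow = F.interpolate(coarse, size=(h, w), mode="bicubic", align_corners=True)
    # grid_sample works in normalised [-1, 1] coordinates.
    flow_x = flow[:, 0] * 2.0 / max(w - 1, 1)
    flow_y = flow[:, 1] * 2.0 / max(h - 1, 1)
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack([xs + flow_x, ys + flow_y], dim=-1)  # (N, H, W, 2), x first
    warped_image = F.grid_sample(image, grid, mode="bilinear",
                                 padding_mode="border", align_corners=True)
    # Nearest-neighbour sampling keeps the labels as valid class values.
    warped_mask = F.grid_sample(mask, grid, mode="nearest",
                                padding_mode="border", align_corners=True)
    return warped_image, warped_mask


image = torch.rand(2, 1, 32, 32)
mask = (torch.rand(2, 1, 32, 32) > 0.5).float()
warped_image, warped_mask = elastic_deform(image, mask)
print(warped_image.shape, warped_mask.shape)
```

The image and the mask must go through the same displacement field, otherwise the labels no longer line up with the pixels they describe.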

An easy-to-follow implementation: ZijunDeng

Monday, November 27, 2017
