Programming Practice: Paper Reading 2015, Rethinking the Inception Architecture for Computer Vision

Source: Szegedy, Christian & Vanhoucke, Vincent & Ioffe, Sergey & Shlens, Jon & Wojna, ZB. (2015). Rethinking the Inception Architecture for Computer Vision. 10.1109/CVPR.2016.308.

1. Introduction

The paper improves on Inception[^1] and develops Inception-v2 and Inception-v3.

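Inception-v3 can be described as Inception-v2 with the paper's cumulative training and architecture changes (RMSProp, label smoothing, a factorized 7x7 convolution), the last of which is batch normalization added to the auxiliary classifier.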

2. General Design Principles

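The authors list several rules of thumb they arrived at while trying to improve Inception[^1].
1. Avoid abruptly narrowing a layer (a representational bottleneck). Even if the network eventually brings the representation down to the same dimension, the dimension must not drop sharply from one layer to the next.
2. Higher-dimensional representations are easier to process locally. Increasing the number of activations per tile in a convolution makes the extracted features more disentangled.
3. Spatial aggregation can be done on a lower-dimensional embedding with little or no loss. For example, before a 3x3 convolution, a 1x1 convolution can embed the layer's input into a lower dimension; the 3x3 convolution then runs more efficiently and only a small amount of information is lost.
4. Balance the width and depth of the network. Increasing either one alone can improve performance, but increasing both together improves performance while keeping the growth in computation under control.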
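To make principle 3 concrete, here is a minimal sketch that counts multiply-accumulates (MACs) for a direct 3x3 convolution versus a 1x1 reduction followed by a 3x3 convolution. The grid size and channel counts (`H`, `W`, `C_IN`, `C_OUT`, `C_REDUCED`) are hypothetical values picked for illustration, not figures from the paper, and bias terms are ignored.

```python
def conv_macs(height, width, c_in, c_out, k):
    """MACs for a stride-1, same-padded k x k convolution (bias ignored)."""
    return height * width * c_in * c_out * k * k


# Hypothetical layer sizes, chosen only for illustration.
H = W = 35
C_IN, C_OUT, C_REDUCED = 256, 256, 64

# One 3x3 convolution applied directly to all 256 input channels.
direct = conv_macs(H, W, C_IN, C_OUT, 3)

# 1x1 convolution down to 64 channels, then the 3x3 convolution.
bottleneck = conv_macs(H, W, C_IN, C_REDUCED, 1) + conv_macs(H, W, C_REDUCED, C_OUT, 3)

print(f"direct 3x3:   {direct:,} MACs")
print(f"1x1 then 3x3: {bottleneck:,} MACs")
print(f"cost ratio:   {bottleneck / direct:.3f}")
```

With these assumed sizes the reduced version needs well under a third of the MACs. How much information the 1x1 embedding actually loses is an empirical question that this count does not address.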

3. Factorizing Convolutions with Large Filter Size

Inception[^1] reduced the dimension with a 1x1 convolution before applying 5x5 or 3x3 convolutions; this paper tries other ways of factorizing convolutions.

3.1 Factorization into smaller convolutions

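5x5 and 7x7 convolutions are computationally very expensive, so a similar convolution is built by stacking several smaller convolutions; for example, two stacked 3x3 convolutions cover the same 5x5 receptive field. (fig.1)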
figure 1
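The saving from stacking small kernels can be checked with a few lines of arithmetic. The sketch below assumes stride-1 convolutions and the same number of channels in every layer, so cost is compared per input/output channel pair; the helper names are made up for this example.

```python
def weights_per_channel_pair(kernel_sizes):
    """Total k*k weights of a stack of square kernels, per channel pair."""
    return sum(k * k for k in kernel_sizes)


def receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions."""
    return 1 + sum(k - 1 for k in kernel_sizes)


for large, stack in ((5, [3, 3]), (7, [3, 3, 3])):
    # The stack must see the same input window as the large kernel.
    assert receptive_field(stack) == large
    big_cost = weights_per_channel_pair([large])
    stack_cost = weights_per_channel_pair(stack)
    saving = 1 - stack_cost / big_cost
    print(f"{large}x{large}: {big_cost} weights, "
          f"{len(stack)} stacked 3x3: {stack_cost} weights, saving {saving:.0%}")
```

Under these assumptions two 3x3 layers use 18 weights where a 5x5 kernel uses 25, and three 3x3 layers use 27 where a 7x7 kernel uses 49.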

3.2 Spatial Factorization into Asymmetric Convolutions

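Processing an nxn convolution in two stages, a 1xn convolution followed by an nx1 convolution (fig.2), greatly improves computational efficiency: for n = 3 with equal numbers of input and output filters the two-stage version is about 33% cheaper, and the saving grows with n. In the experiments this did not work well in the lower layers (those close to the input), but it was very effective on medium-sized feature maps (m x m grids with m between 12 and 20).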
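The following sketch illustrates the idea in the purely linear case: applying a 1x3 kernel and then a 3x1 kernel gives the same result as applying their 3x3 outer product once, while using 2n instead of n^2 weights. The kernel values and the test image are hypothetical. In the actual network the two stages are separate learned layers, so this only demonstrates the receptive field and the cost, not what the trained filters look like.

```python
def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def asymmetric_saving(n):
    """Fraction of weights saved by 1xn + nx1 versus one nxn kernel."""
    return 1 - (2 * n) / (n * n)


row = [[1, 2, 1]]        # 1x3 kernel (hypothetical values)
col = [[1], [0], [-1]]   # 3x1 kernel (hypothetical values)
full = [[c[0] * r for r in row[0]] for c in col]  # equivalent 3x3 kernel

image = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(6)]

two_stage = conv2d_valid(conv2d_valid(image, row), col)
one_stage = conv2d_valid(image, full)

print("same output:", two_stage == one_stage)
for n in (3, 5, 7):
    print(f"n={n}: saving {asymmetric_saving(n):.0%}")
```

All arithmetic here is on integers, so the equality check is exact.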

4. Utility of Auxiliary Classifiers

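Inception[^1] introduced multiple classifiers in an attempt to prevent vanishing gradients. In contrast to Lee et al[^2], however, the authors could not confirm that the extra classifiers speed up convergence, although they did observe slightly better classification performance than with a single classifier. The authors suspect that the extra classifiers act as a regularizer.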

5. Efficient Grid Size Reduction

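Apparently the method in fig.2 is used to shrink the grid while efficiently increasing the number of feature maps: a stride-2 convolution branch and a stride-2 pooling branch run in parallel and their outputs are concatenated.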
figure 2
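As a rough sketch of the shape bookkeeping behind this kind of reduction, the code below assumes a 3x3 window with stride 2 and no padding in both branches, and computes the output shape when a stride-2 convolution branch is concatenated with a stride-2 pooling branch. It also compares the cost of the two naive orderings (expand channels with a 1x1 convolution before or after pooling). The function names and the example sizes are assumptions for illustration; the real module has more branches than this.

```python
def reduced_size(size, kernel=3, stride=2):
    """Output size of an unpadded ('valid') convolution or pooling."""
    return (size - kernel) // stride + 1


def parallel_reduction_shape(d, k, conv_filters):
    """Shape after concatenating a stride-2 conv branch with a stride-2 pool branch."""
    out = reduced_size(d)
    # Pooling keeps all k input channels; the conv branch adds conv_filters more.
    return (out, out, conv_filters + k)


def naive_costs(d, k):
    """MACs of a 1x1 convolution k -> 2k, applied before vs. after 2x pooling."""
    conv_then_pool = d * d * k * 2 * k                # expensive: full-size grid
    pool_then_conv = (d // 2) * (d // 2) * k * 2 * k  # cheap, but a bottleneck
    return conv_then_pool, pool_then_conv


# Hypothetical input: a 35x35 grid with 320 channels.
print(parallel_reduction_shape(35, 320, 320))
print(naive_costs(34, 320))
```

Pooling first is four times cheaper than convolving first but narrows the representation abruptly, which is the bottleneck the design principles warn against; the parallel layout avoids both problems.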

7. Model Regularization via Label Smoothing (LSR)

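Take a distribution $u(k)$ over the labels $k \in \{1,...,K\}$ that is independent of the training example $x$ (I assumed this would be the frequency of $k$ in the training set, but apparently it does not have to be), and define a smoothing parameter $\epsilon$. For an example $x$ with ground-truth label $y$, the ground-truth distribution $q(k|x)=\delta_{k,y}$ (the paper calls $\delta_{k,y}$ a Dirac delta; it is 1 when $k=y$ and 0 otherwise) is replaced by
$q'(k|x)=(1-\epsilon) \delta_{k,y} + \epsilon u(k)$.
For example, with the uniform distribution $u(k)=1/K$ this becomes
$q'(k|x)=(1-\epsilon)\delta_{k,y} + \frac{\epsilon}{K}$.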
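The formula above translates almost directly into code. This is a minimal sketch in plain Python: class indices are 0-based here (an assumption, since the text numbers labels from 1), `prior` plays the role of $u(k)$ and defaults to the uniform distribution, and the function names and example probabilities are made up for illustration.

```python
import math


def smooth_labels(y, num_classes, epsilon, prior=None):
    """Return q'(k|x) = (1 - epsilon) * delta_{k,y} + epsilon * u(k) as a list."""
    if prior is None:
        prior = [1.0 / num_classes] * num_classes  # uniform u(k) = 1/K
    return [
        (1.0 - epsilon) * (1.0 if k == y else 0.0) + epsilon * prior[k]
        for k in range(num_classes)
    ]


def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0.0)


# Example: K = 4 classes, true class index 2, epsilon = 0.1.
target = smooth_labels(2, 4, 0.1)
predicted = [0.05, 0.05, 0.85, 0.05]  # hypothetical softmax output
print(target)
print(cross_entropy(target, predicted))
```

With a uniform prior every wrong class receives $\epsilon / K$ and the true class receives $1 - \epsilon + \epsilon / K$, so the target still sums to 1 but no longer asks the model for a probability of exactly 1.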

[^1]: Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. Going Deeper with Convolutions. In CVPR, 2015.


Wednesday, November 22, 2017
