What You Need to Try Deep Learning with Chainer
Retrieva, Inc.
Jiro Nishitoba

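About me
• Jiro Nishitoba
• ID: jnishi
• Background
  • Completed the master's program in Computer Science, Graduate School of Information Science and Technology, The University of Tokyo
  • 2006: joined Preferred Infrastructure as a founding member
    • Prototype development
    • Professional services and support services
    • Research and development
  • 2016: founded Retrieva
    • Engaged in research and development as a director and researcher
    • Mainly responsible for speech recognition and natural language processing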

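Work on Deep Learning (DL)
• Around March 2015: learned that DL looked usable for speech recognition
• From June 2015: began developing a speech recognition engine with Chainer
  • Optimizer: NesterovAG
  • Activation function: ClippedReLU
  • Loss function: Connectionist Temporal Classification
Torch7: released by Baidu in January 2016
TensorFlow: included in February 2016
Chainer: included in October 2015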

Let's try out Deep Learning methods!

Let's try out Deep Learning methods!
(Cropped excerpt from a paper shown on the slide; line beginnings are cut off)
…tterance at a time with better results than evaluating with a large batch.
…ples of varying length pose some algorithmic challenges. One possible solution is
…opagation through time [68], so that all examples have the same sequence length
…2]. However, this can inhibit the ability to learn longer term dependencies. Other
…that presenting examples in order of difficulty can accelerate online learning [6,
…theme in many sequence learning problems including machine translation and
…n is that longer examples tend to be more challenging [11].
…ction that we use implicitly depends on the length of the utterance,
$$L(x, y; \theta) = -\log \sum_{\ell \in \mathrm{Align}(x, y)} \prod_{t}^{T} p_{\mathrm{ctc}}(\ell_t \mid x; \theta). \tag{9}$$
…is the set of all possible alignments of the characters of the transcription y to
…under the CTC operator. In equation 9, the inner term is a product over time-steps
…which shrinks with the length of the sequence since $p_{\mathrm{ctc}}(\ell_t \mid x; \theta) < 1$. This moti-
OK, time to implement!
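To make equation 9 in the excerpt a little more concrete, here is a minimal brute-force sketch of that sum over alignments. It enumerates every alignment, so it is only usable for toy inputs; practical CTC implementations use dynamic programming instead. The blank symbol `_` and the function names are assumptions made for this illustration and do not come from the slides.

```python
import itertools
import math

BLANK = "_"  # assumed symbol for the CTC blank label


def collapse(path):
    """Apply the CTC operator: merge repeated labels, then drop blanks."""
    out = []
    prev = None
    for label in path:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)


def ctc_loss_bruteforce(probs, labels, target):
    """Return -log of the total probability of all alignments of `target`.

    probs[t][k] is the probability of labels[k] at time step t.
    """
    total = 0.0
    n_steps = len(probs)
    for idx in itertools.product(range(len(labels)), repeat=n_steps):
        path = [labels[k] for k in idx]
        if collapse(path) == target:
            p = 1.0
            for t, k in enumerate(idx):
                p *= probs[t][k]  # product over time steps
            total += p
    return -math.log(total)


# Two time steps, labels "_" and "a", uniform probabilities.
# The alignments "_a", "a_" and "aa" all collapse to "a".
toy_loss = ctc_loss_bruteforce([[0.5, 0.5], [0.5, 0.5]], [BLANK, "a"], "a")
print(toy_loss)
```

Each alignment here has probability 0.25, so the loss is the negative log of 0.75.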

Where to look
• Baidu's Deep Speech 2

• Google's speech recognition

• Microsoft's image recognition

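When implementing a Deep Learning system
• To understand the processing properly, it is important to understand the equations
• When actually writing the processing, you mostly look at a graph that diagrams the structure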

The basic unit of a neural network
Inputs: x1, x2, …, xn
Weights: w1, w2, …, wn
n inputs, 1 output
u = w1x1 + w2x2 + … + wnxn
Unit
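The weighted sum computed by a single unit can be sketched in a few lines of plain Python. The function name `unit_output` and the example values are chosen only for illustration.

```python
def unit_output(weights, inputs):
    """Weighted sum u = w1*x1 + w2*x2 + ... + wn*xn for a single unit."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * x for w, x in zip(weights, inputs))


# Example with n = 3 (values chosen arbitrarily)
u = unit_output([0.5, -1.0, 2.0], [1.0, 2.0, 3.0])
print(u)  # 0.5*1 - 1*2 + 2*3 = 4.5
```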

Basics of neural networks
Inputs: x1, x2, …, xn
n inputs, m outputs
Line up many units that all take the same input

Neural network (fully connected)
Inputs: x1, x2, …, xn
n inputs, m outputs
Input → Linear
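A fully connected layer is just m such units applied to the same input, which amounts to a matrix-vector product. The sketch below assumes no bias term, matching the weighted-sum formula on the earlier slide; frameworks usually add a bias by default. The name `linear` and the values are illustrative.

```python
def linear(weights, x):
    """Fully connected layer: `weights` holds m rows of n values, `x` has n values."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# n = 3 inputs, m = 2 outputs
W = [[1.0, 0.0, -1.0],
     [0.5, 0.5, 0.5]]
x = [2.0, 4.0, 6.0]
y = linear(W, x)
print(y)  # [-4.0, 6.0]
```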

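Activation functions
Inputs: x1, x2, …, xn → u
The output is sometimes passed through a step that scales or restricts it
Examples of activation functions
• ReLU: sets negative values to 0
• sigmoid: maps values into the range 0 to 1 while preserving their order
• tanh: maps values into the range -1 to 1 while preserving their order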
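The three activation functions listed above can be written directly with the standard library. This is a minimal scalar sketch; frameworks apply the same functions element-wise to whole arrays.

```python
import math


def relu(u):
    """Negative values become 0; non-negative values pass through."""
    return max(0.0, u)


def sigmoid(u):
    """Monotonic squashing into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))


def tanh(u):
    """Monotonic squashing into the open interval (-1, 1)."""
    return math.tanh(u)


for u in (-2.0, 0.0, 2.0):
    print(u, relu(u), sigmoid(u), tanh(u))
```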

Activation functions can be drawn the same way
Input → Linear → ReLU

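Showing it as a network
• A neural network can be expressed as a network built from components such as the following
  • Input
  • Linear
  • Activation function
  • Loss function
  • Convolution layer
  • Regularization function
  • etc.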
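One way to picture "a network of components" is as an ordered list of functions that the input flows through, followed by a loss. The sketch below is only an illustration in plain Python: the helper names (`make_linear`, `run_network`, `squared_error`) and the choice of squared error as the loss are assumptions, not something the slides specify.

```python
def make_linear(weights):
    """Return a component that applies a fully connected layer (no bias)."""
    def apply(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return apply


def relu(x):
    """Activation component: element-wise max(0, v)."""
    return [max(0.0, v) for v in x]


def run_network(components, x):
    """Feed the input through each component in order."""
    for component in components:
        x = component(x)
    return x


def squared_error(y, target):
    """Loss component (assumed for this sketch)."""
    return sum((a - b) ** 2 for a, b in zip(y, target))


# Input -> Linear -> ReLU -> Linear -> loss
network = [make_linear([[1.0, -1.0], [-1.0, 1.0]]), relu, make_linear([[1.0, 1.0]])]
y = run_network(network, [3.0, 1.0])
loss = squared_error(y, [1.0])
print(y, loss)  # [2.0] 1.0
```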

How to read a network

Input

Three Convolution layers connected in sequence

Seven RNN layers

BatchNormalization is used as regularization

One Linear layer is used

A loss function called CTC is used
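Reading a diagram this way yields a short structural description that can be written down as data before any real layers are built. The following is a hypothetical sketch of such a description; the `Stage` type and `describe` function are invented for illustration and say nothing about layer sizes or about where exactly BatchNormalization sits in the stack.

```python
from collections import namedtuple

# A stage is a component type repeated a number of times.
Stage = namedtuple("Stage", ["component", "layers"])

stages = [Stage("Convolution", 3), Stage("RNN", 7), Stage("Linear", 1)]
regularization = "BatchNormalization"
loss = "CTC"


def describe(stages, regularization, loss):
    """Expand the stage list into one line per layer, input first, loss last."""
    lines = ["Input"]
    for stage in stages:
        for i in range(1, stage.layers + 1):
            lines.append("{} {}/{}".format(stage.component, i, stage.layers))
    lines.append("Loss: {} (regularization: {})".format(loss, regularization))
    return lines


for line in describe(stages, regularization, loss):
    print(line)
```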

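What is needed to do Deep Learning
Handled by the framework
• Forward processing
• Back propagation
• Matrix computation
• Differentiation
What you think about when implementing with a framework
• Functions used in the processing
• Relationship between inputs and outputs
• Input size
• Output size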
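To get a feel for what the framework takes off your hands, here is a hand-written forward pass and gradient for a single unit, checked against a numerical derivative. The squared-error loss and all names are assumptions made for this sketch; a framework such as Chainer derives the backward step automatically from the forward definition.

```python
def forward(w, x):
    """Forward processing: u = sum of w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))


def loss(w, x, target):
    """Assumed loss for this sketch: 0.5 * (u - target)^2."""
    return 0.5 * (forward(w, x) - target) ** 2


def backward(w, x, target):
    """Hand-derived gradient: dL/dw_i = (u - target) * x_i."""
    err = forward(w, x) - target
    return [err * xi for xi in x]


def numerical_gradient(w, x, target, eps=1e-6):
    """Central-difference check of the hand-derived gradient."""
    grads = []
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += eps
        w_minus = list(w)
        w_minus[i] -= eps
        grads.append((loss(w_plus, x, target) - loss(w_minus, x, target)) / (2 * eps))
    return grads


w = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 3.0]
analytic = backward(w, x, target=4.0)
numeric = numerical_gradient(w, x, target=4.0)
print(analytic)  # [0.5, 1.0, 1.5]
```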

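Chainer example code
from chainer import Chain
import chainer.functions as F
import chainer.links as L

class MLP(Chain):
    def __init__(self, n_units=100, n_out=10):
        super(MLP, self).__init__(
            # None lets Chainer infer the input size from the first batch
            l1=L.Linear(None, n_units),  # 784 -> n_units
            l2=L.Linear(None, n_units),  # n_units -> n_units
            l3=L.Linear(None, n_out),    # n_units -> n_out
        )

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        y = self.l3(h2)
        return y

MNIST image (784 = 28x28) → Linear → 100 → Linear → 100 → Linear → 10 (classifying digits 0 to 9)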

layer1
MNIST image (784 = 28x28) → Linear (l1) → 100 → Linear → 100 → Linear → 10 (classifying digits 0 to 9)

layer2
MNIST image (784 = 28x28) → Linear (l1) → 100 → Linear (l2) → 100 → Linear → 10 (classifying digits 0 to 9)

layer3
MNIST image (784 = 28x28) → Linear (l1) → 100 → Linear (l2) → 100 → Linear (l3) → 10 (classifying digits 0 to 9)

Forward processing
MNIST image x → Linear (l1) → h1 → Linear (l2) → h2 → Linear (l3) → y (classifying digits 0 to 9)
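The flow x → h1 → h2 → y can be traced without any framework to see how the sizes line up. The sketch below is plain Python with randomly initialized weights and a random vector standing in for a flattened MNIST image; it mirrors the structure of the MLP class but is not Chainer code, and the helper names are illustrative.

```python
import random


def make_linear(n_in, n_out, rng):
    """Random weights (n_out rows of n_in values) and zero biases."""
    weights = [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(layer, x):
    """Apply one fully connected layer to the vector x."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def mlp_forward(layers, x):
    """Same flow as the slide: x -> l1 -> h1 -> l2 -> h2 -> l3 -> y."""
    l1, l2, l3 = layers
    h1 = relu(linear(l1, x))
    h2 = relu(linear(l2, h1))
    y = linear(l3, h2)
    return h1, h2, y


rng = random.Random(0)
layers = [make_linear(784, 100, rng), make_linear(100, 100, rng), make_linear(100, 10, rng)]
x = [rng.random() for _ in range(784)]  # stand-in for a flattened 28x28 image
h1, h2, y = mlp_forward(layers, x)
print(len(x), len(h1), len(h2), len(y))  # 784 100 100 10
```

The index of the largest of the 10 values in `y` would be read as the predicted digit once the weights have been trained.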

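Summary
• When doing Deep Learning, the network structure is what matters
• Once the structure is decided, the framework does the rest of the processing
• For Chainer, the MNIST train_example.py example is a simple one to start from
• This is not limited to Chainer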
