
Attention-Based Recurrent Neural Network Models for
Joint Intent Detection and Slot Filling
Graduate School of Engineering, The University of Tokyo
Department of Technology Management for Innovation
Matsuo Laboratory
?野峻典

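Motivation: building a chatbot
• Customer: “I'd like to order a pizza.”
– → Bot: … { Intent: order pizza }
• ~~ Pizza order started ~~ { required entities: “type”, “location”, “time” }
• Bot: “Please tell me the type of pizza, the delivery location, and the time.”
• Customer: “A Margherita pizza, delivered to Tokyo OO-XX-OO, please.”
– → Bot: … { Entities: { type: “Margherita pizza”, location: “Tokyo OO-XX-OO”, time: “” } }
• Bot: “A Margherita pizza to Tokyo OO-XX-OO, understood. Please tell me the time.”
• Customer: “The time is 19:30.”
– → Bot: … { Entities: { type: “Margherita pizza”, location: “Tokyo OO-XX-OO”, time: “19:30” } }
• Bot: “Your order for a Margherita pizza to Tokyo OO-XX-OO at 19:30 has been accepted.”
• What is needed: understanding the intent of the text, plus the entity label that corresponds to each word.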

Bibliographic information
• Paper title: “Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling”
– https://arxiv.org/pdf/1609.01454.pdf
• Authors: Bing Liu, Ian Lane
– Carnegie Mellon University
• Published: 6 Sep 2016
• Accepted at Interspeech 2016
• Note: unless otherwise stated, material is quoted from the above paper, slides, and video.

Abstract
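• This work proposes attention-based neural network models that perform intent detection and slot filling.
• Unlike machine translation or speech recognition, slot filling has an explicit alignment (word order) between input words and output labels.
– The authors explore several ways of incorporating this alignment information into the encoder-decoder framework.
• The attention information is used for both intent classification and slot label prediction.
• Results: state of the art on the ATIS task in both intent classification error rate and slot filling F1 score.
– Joint training further reduces intent classification error by 0.56% (absolute) and improves slot filling F1 by 0.23% (absolute) over the independently trained models.
• Keywords: Spoken Language Understanding, Slot Filling, Intent Detection, Recurrent Neural Network, Attention Model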

Introduction
• Two types of sequence model are reviewed.
– ① Intent detection & slot filling (from spoken language understanding)
– ② Attention-based encoder-decoder (from machine translation/speech recognition)

Introduction
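• ① Two key tasks in Spoken Language Understanding (SLU)
– Intent detection/classification: classifying (identifying) the speaker's intent
  • A semantic classification problem over utterances.
  • Has been tackled with SVMs and DNNs.
– Slot filling: extracting the semantically important constituents
  • A sequence labeling task.
  • Has been tackled with maximum entropy Markov models, conditional random fields, RNNs, and others.
– Recently, joint models that perform both intent detection and slot filling in a single model have been proposed.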

Introduction
• ② Encoder-decoder models with an attention mechanism, from machine translation and speech recognition
– The input sequence is encoded into a vector representation, which is then decoded to generate the output sequence (sequence learning)
– “Neural machine translation by jointly learning to align and translate” (D. Bahdanau, K. Cho, and Y. Bengio) [12]
  • Proposes an attention mechanism that lets an encoder-decoder model learn to align and to decode at the same time.

Introduction
• To summarize the strengths of the two sequence models:
– Attention-based encoder-decoder models can map between sequences of different lengths without any alignment information. (②)
– In slot filling the alignment information is given explicitly, so alignment-based RNN models work well. (①)
• This paper considers combining ① and ②:
– How can the alignment information in slot filling be used in an encoder-decoder model?
– How can the attention mechanism of encoder-decoder models be used in slot filling?
– Given both, how should a joint model for slot filling and intent detection be designed?

Background > RNN for Slot Filling
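• Slot filling
– Learn a function f that maps an input sequence X to a label sequence Y.
– x and y have the same length, and the alignment is explicit.
• An RNN for slot filling reads one word at each time step and returns the one corresponding slot label.
– The slot label is predicted using all the information from the input words and from the label sequence output so far.
– As a formula, training learns the parameters θ that maximize the likelihood of the label sequence, factored over time steps, with each label y_t conditioned on x and on the preceding labels y_1^{t-1}.
  • x: input word sequence; y_1^{t-1}: the output label sequence from position 1 to position t-1
– At inference time, the model finds the label sequence ŷ with the highest such likelihood for the input x.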
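The formula images on this slide did not survive extraction. A reconstruction that is consistent with the symbols defined above (x, y_1^{t-1}, θ, ŷ), with T assumed to denote the sequence length, would read roughly as follows; treat it as a sketch of the training and inference objectives rather than a verbatim copy of the slide.

```latex
\begin{align}
\hat{\theta} &= \arg\max_{\theta} \prod_{t=1}^{T} P\left(y_t \mid y_1^{t-1}, \mathbf{x}; \theta\right) \\
\hat{\mathbf{y}} &= \arg\max_{\mathbf{y}} \prod_{t=1}^{T} P\left(y_t \mid y_1^{t-1}, \mathbf{x}\right)
\end{align}
```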

Background > RNN Encoder-Decoder
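• Encoder:
– input sequence (x_1, …, x_T) → vector c
– The vector c encodes the meaning of the whole input sequence.
• Decoder:
– Generates the target sequence from the vector c.
– The decoder defines the probability of the output sequence as a product of per-step probabilities, each conditioned on c and on the outputs generated so far.
• Unlike the RNN for sequence labeling on the previous slide, an encoder-decoder model can map between sequences of different lengths, and it has no explicit alignment between input and output.
  • → “Neural machine translation by jointly learning to align and translate” (D. Bahdanau, K. Cho, and Y. Bengio) [12] proposes an attention mechanism that lets the encoder-decoder model learn a soft alignment and decode at the same time.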
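The decoder's probability formula was an image on the slide and is missing here. Written with the same symbols as the slot filling objective (y_t, y_1^{t-1}) and with c in place of the raw input x, it would presumably take this form; T is again assumed to be the output length.

```latex
P(\mathbf{y}) = \prod_{t=1}^{T} P\left(y_t \mid y_1^{t-1}, c\right)
```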

Proposed Methods
• Two approaches are presented:
– ① Integrating alignment information into the encoder-decoder architecture to perform the slot filling and intent detection tasks
– ② Applying the attention mechanism of the encoder-decoder architecture to an alignment-based RNN model

Proposed Methods > Encoder-Decoder Model with Aligned Inputs ①
• ① Integrating alignment information into the encoder-decoder architecture to perform the slot filling and intent detection tasks

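Proposed Methods > Encoder-Decoder Model with Aligned Inputs ①
• Slot filling: input words x = (x_1, …, x_T) → labels y = (y_1, …, y_T)
• The encoder is a bidirectional RNN.
– It reads the input sequence in both the forward and the backward direction.
– Forward: generates a hidden state fh_i at each time step.
– Backward: reads from the end and generates hidden states (bh_T, …, bh_1).
– The final hidden state h_i at each position is the concatenation of fh_i and bh_i (i.e. h_i = [fh_i, bh_i]).
• LSTM is used as the RNN unit.
• The last state of the backward encoder RNN is used as the initial hidden state of the decoder [12].
– The last states of the forward and backward encoder RNNs carry information about the whole sentence.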
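The encoder described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: a simple tanh RNN cell stands in for the LSTM used in the paper, the weights are random, and all function and variable names (`rnn_step`, `run_rnn`, `bidirectional_encode`, `random_matrix`) are hypothetical.

```python
import math
import random

def rnn_step(x, h_prev, w_x, w_h):
    """One step of a plain tanh RNN cell (a stand-in for the LSTM on the slide)."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(w_x[k], x))
                  + sum(w * hi for w, hi in zip(w_h[k], h_prev)))
        for k in range(len(w_h))
    ]

def run_rnn(inputs, w_x, w_h):
    """Run the cell over `inputs` in the given order and return every hidden state."""
    h = [0.0] * len(w_h)
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h)
        states.append(h)
    return states

def bidirectional_encode(inputs, fwd_params, bwd_params):
    """Return h_i = [fh_i, bh_i] for every position i."""
    fh = run_rnn(inputs, *fwd_params)
    # Read the sequence from the end, then flip so bh[i] lines up with word i.
    bh = run_rnn(inputs[::-1], *bwd_params)[::-1]
    return [f + b for f, b in zip(fh, bh)]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
embed_dim, hidden_dim, seq_len = 4, 3, 5  # toy sizes, not the paper's
inputs = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(seq_len)]
fwd_params = (random_matrix(hidden_dim, embed_dim, rng),
              random_matrix(hidden_dim, hidden_dim, rng))
bwd_params = (random_matrix(hidden_dim, embed_dim, rng),
              random_matrix(hidden_dim, hidden_dim, rng))

h = bidirectional_encode(inputs, fwd_params, bwd_params)
# The backward pass ends at the first word, so its last state is the
# backward half of h[0]; the slide uses it as the decoder's initial state.
s0 = h[0][hidden_dim:]
```

Each `h[i]` has twice the hidden size, and `s0` corresponds to the initial decoder state mentioned in the last bullet.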

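Proposed Methods > Encoder-Decoder Model with Aligned Inputs ①
• The decoder is a unidirectional RNN.
– At each time step, the decoder state s_i is computed from the previous state s_{i-1}, the previous label y_{i-1}, the aligned encoder hidden state h_i, and the context vector c_i. (h_i serves as the explicit aligned input at each decoding step.)
– The context vector c_i is computed as a weighted sum of the encoder states h = (h_1, …, h_T).
  • It indicates which parts of the input sentence the decoder should attend to.
  • The weights α are computed from scores produced by g, a feed-forward neural network.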
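The context vector computation can be sketched as follows. The slide states only that g is a feed-forward network and that c_i is a weighted sum of the encoder states; the softmax normalization and the specific one-hidden-layer form of `g` below follow the usual attention formulation of [12] and are assumptions, as are all weight values and names.

```python
import math

def softmax(scores):
    """Normalize scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def g(s_prev, h_k, w_s, w_h, v):
    """Hypothetical scoring net: v . tanh(W_s s_prev + W_h h_k)."""
    hidden = [math.tanh(dot(w_s[r], s_prev) + dot(w_h[r], h_k))
              for r in range(len(v))]
    return dot(v, hidden)

def context_vector(s_prev, encoder_states, w_s, w_h, v):
    """Return (c_i, alpha_i) for one decoding step."""
    scores = [g(s_prev, h_k, w_s, w_h, v) for h_k in encoder_states]
    alpha = softmax(scores)
    dim = len(encoder_states[0])
    c = [sum(a * h_k[d] for a, h_k in zip(alpha, encoder_states))
         for d in range(dim)]
    return c, alpha

# Toy values: three encoder states of size 2 and a decoder state of size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
s_prev = [0.5, -0.5]
w_s = [[0.1, 0.2], [0.3, -0.1]]  # scoring-layer weights applied to s_prev
w_h = [[0.4, -0.2], [0.0, 0.5]]  # scoring-layer weights applied to h_k
v = [1.0, -1.0]

c_i, alpha_i = context_vector(s_prev, encoder_states, w_s, w_h, v)
```

Because the weights sum to 1, `c_i` is a convex combination of the encoder states, which is what lets it act as a soft pointer into the input sentence.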

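Proposed Methods > Encoder-Decoder Model with Aligned Inputs ①
• To obtain a joint model that performs both intent detection and slot filling, a decoder for intent detection is added. (The upper-right cell of the architecture in Fig. 2.)
– The encoder is shared with slot filling.
– It produces only a single output, so no alignment information is needed.
– It is a function of the initial slot filling decoder state s_0 (which encodes the whole sentence) and the context vector c_intent (which indicates which parts of the input sentence the intent decoder should attend to).
• During training, the errors from both the intent detection decoder and the slot filling decoder are backpropagated.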
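One simple way to let both decoders contribute to training is to add their losses before backpropagation. The slide says only that errors from both decoders are propagated; the unweighted sum of cross-entropy terms below is an assumption, and the probabilities are made-up toy values.

```python
import math

def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target_index])

def joint_loss(slot_probs, slot_targets, intent_probs, intent_target):
    """Sum of the per-word slot losses and the single intent loss (assumed unweighted)."""
    slot_loss = sum(cross_entropy(p, t) for p, t in zip(slot_probs, slot_targets))
    intent_loss = cross_entropy(intent_probs, intent_target)
    return slot_loss + intent_loss

# Two words with two possible slot labels each, and two possible intents.
slot_probs = [[0.5, 0.5], [0.25, 0.75]]
slot_targets = [0, 1]
intent_probs = [0.1, 0.9]
intent_target = 1

loss = joint_loss(slot_probs, slot_targets, intent_probs, intent_target)
```

Since both terms depend on the shared encoder, the gradient of this single scalar reaches the encoder from both tasks.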

Proposed Methods > Attention-Based RNN Model ②
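• ② Applying the attention mechanism of the encoder-decoder architecture to an alignment-based RNN model.
– Sequence labeling with a bidirectional RNN (BiRNN).
– At each step the model uses not only the aligned hidden state h_i but also tries using the context vector c_i.
  • The hidden state carries the meaning of the whole sentence, but information from distant words is gradually lost, so the question is whether c_i can supply that information.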

Proposed Methods > Attention-Based RNN Model ②
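• The BiRNN reads the input sentence in both the forward and the backward direction. LSTM cells are again used as the RNN unit.
• Slot label dependencies are modeled in the forward RNN.
• As in the encoder of the encoder-decoder architecture, the hidden state h_i is the concatenation of fh_i and bh_i.
– Each h_i contains information about the whole input sequence, with a focus on the words around position i.
• h_i is combined with the context vector c_i to classify the label. (As in the encoder-decoder architecture, c_i is a weighted sum of h = (h_1, …, h_T).)
• Intent detection reuses the hidden states h computed above.
– Without attention: mean-pooling is applied over h, followed by logistic regression for classification.
– With attention: a weighted average of the hidden states h is used instead.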
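The two ways of summarizing h for intent detection differ only in the weights. A minimal sketch, assuming the attention weights are already available and sum to 1, and using a softmax classifier as the multiclass form of logistic regression (the hidden states, weights, and classifier parameters are toy values):

```python
import math

def mean_pool(states):
    """Average the hidden states position by position."""
    n = len(states)
    return [sum(h[d] for h in states) / n for d in range(len(states[0]))]

def weighted_average(states, weights):
    """Weights are assumed to be attention weights that sum to 1."""
    return [sum(w * h[d] for w, h in zip(weights, states))
            for d in range(len(states[0]))]

def classify(features, weight_rows, biases):
    """Multiclass logistic regression: softmax over linear scores."""
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weight_rows, biases)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

h = [[0.2, 0.8], [0.6, 0.4], [1.0, 0.0]]   # three hidden states of size 2
attention = [0.1, 0.2, 0.7]                # hypothetical attention weights
weight_rows = [[1.0, -1.0], [-1.0, 1.0]]   # hypothetical classifier weights
biases = [0.0, 0.0]

pooled = mean_pool(h)
attended = weighted_average(h, attention)
probs_pooled = classify(pooled, weight_rows, biases)
probs_attended = classify(attended, weight_rows, biases)
```

With uniform weights the weighted average reduces to mean-pooling, so the attention variant can be read as a learned generalization of the baseline.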

Proposed Methods
• Compared with the attention-based encoder-decoder model with aligned inputs (①), the attention-based RNN model (②) is more computationally efficient.
– During training, the encoder-decoder slot filling model (①) reads the input sequence twice, whereas the attention-based RNN model (②) reads it only once.

Experiments > Data
• The ATIS (Airline Travel Information Systems) dataset is used, following the setup of [6,7,9,19].
– Training set: 4978 utterances from the ATIS-2 and ATIS-3 corpora
– Test set: 893 utterances from ATIS-3 NOV93 and DEC94
– Number of slot label types: 127; number of intent types: 18
– Evaluation
  • Slot filling: F1 score.
  • Intent detection: classification error rate.
• An additional ATIS set used in [9,20] was also obtained.
– 5138 utterances
– Number of slot label types: 110; number of intent types: 21
– As in [9,20], 10-fold cross validation was performed.
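The two evaluation metrics can be sketched as below. The slide names only "F1 score" and "classification error rate"; scoring slots as whole IOB chunks (a slot counts as correct only if its type and span both match) is an assumption here, and the tag sequences and intent names are illustrative, not taken from the dataset.

```python
def intent_error_rate(predicted, gold):
    """Fraction of utterances whose predicted intent is wrong."""
    wrong = sum(1 for p, g in zip(predicted, gold) if p != g)
    return wrong / len(gold)

def extract_chunks(tags):
    """Return the set of (slot_type, start, end) spans in an IOB tag sequence."""
    chunks = set()
    start, current = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last chunk
        begins = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current)
        if current is not None and (tag == "O" or begins):
            chunks.add((current, start, i))
            start, current = None, None
        if begins:
            start, current = i, tag[2:]
    return chunks

def slot_f1(predicted_seqs, gold_seqs):
    """Chunk-level F1 accumulated over all utterances."""
    n_pred = n_gold = n_correct = 0
    for pred, gold in zip(predicted_seqs, gold_seqs):
        p, g = extract_chunks(pred), extract_chunks(gold)
        n_pred += len(p)
        n_gold += len(g)
        n_correct += len(p & g)
    if n_correct == 0:
        return 0.0
    precision = n_correct / n_pred
    recall = n_correct / n_gold
    return 2 * precision * recall / (precision + recall)

# Illustrative 7-word utterance with three gold slots; the prediction misses the last one.
gold_tags = ["O", "O", "B-fromloc.city_name", "O", "B-toloc.city_name",
             "O", "B-depart_time.period_of_day"]
pred_tags = ["O", "O", "B-fromloc.city_name", "O", "B-toloc.city_name", "O", "O"]

f1 = slot_f1([pred_tags], [gold_tags])
error = intent_error_rate(["flight", "airfare", "flight", "flight"],
                          ["flight", "flight", "flight", "flight"])
```

In this toy case precision is 2/2 and recall is 2/3, and one of four intents is wrong.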

Experiments > Training Procedure
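• The LSTM implementation follows [21].
• The number of units in the LSTM cell is set to 128.
• The forget gate bias is set to 1 [22].
• Only a single LSTM layer is used. (Stacking LSTM layers to build deeper models is left as future work.)
• Word embeddings of size 128 are randomly initialized and fine-tuned during mini-batch training with a batch size of 16.
• A dropout rate of 0.5 is applied to the non-recurrent connections during training.
• The maximum norm for gradient clipping is set to 5.
• Adam is used for optimization.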
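The settings above can be collected into one configuration, and the gradient clipping step is small enough to sketch directly. The dictionary keys are hypothetical names chosen for this example, and clipping by the global norm over all gradients is an assumption; the slide gives only the maximum norm of 5.

```python
import math

# Hyperparameters as listed on the slide; key names are illustrative only.
HPARAMS = {
    "lstm_units": 128,
    "lstm_layers": 1,
    "forget_gate_bias": 1.0,
    "embedding_size": 128,
    "batch_size": 16,
    "dropout_rate": 0.5,   # non-recurrent connections, training only
    "max_grad_norm": 5.0,
    "optimizer": "adam",
}

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together if their joint L2 norm exceeds max_norm."""
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total <= max_norm:
        return [list(vec) for vec in grads], total
    scale = max_norm / total
    return [[g * scale for g in vec] for vec in grads], total

# Two toy gradient vectors whose joint norm is 13.
grads = [[3.0, 4.0], [12.0]]
clipped, norm_before = clip_by_global_norm(grads, HPARAMS["max_grad_norm"])
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
```

Rescaling every gradient by the same factor keeps the update direction unchanged and only limits its length.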

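Experiments > Independent Training Model Results: Slot Filling
• Results when the proposed models are trained independently on slot filling only.
• Comparing the top two rows, (a) and (b), alignment information does appear to be necessary for this task.
• Comparing (b) and (c), attention contributes slightly to accuracy.
– Attention was for the most part spread evenly over the whole sentence, but in some cases it did seem to improve accuracy.
– Example: the attention when predicting the slot for “noon” (darker means stronger attention). By attending to “flight”, “cleveland”, and “dallas”, the model arrives at the slot label “B-depart_time.period_of_day”.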

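Experiments > Independent Training Model Results: Slot Filling
• Results when the proposed models are trained independently on slot filling only.
• The bottom two rows are the models from Section 3.2.
• Adding attention improves accuracy only slightly.
– → For utterances as short as those in the ATIS dataset, the hidden state h_i seems able to encode the whole-sentence information needed for slot labeling even without attention.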

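Experiments > Independent Training Model Results: Slot Filling
• The slot filling models were compared with previous approaches.
• Both proposed models outperform the previous ones in accuracy.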

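Experiments > Independent Training Model Results: Intent Detection
• Comparison with previous models on intent classification error
– The proposed models beat the existing state of the art by a large margin.
• The attention-based encoder-decoder intent model beat the bidirectional RNN model (the bottom two rows of the table).
– This may be due to the sequence-level information passed from the encoder and to the additional non-linear layer in the decoder RNN (where c_intent is computed).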

Experiments > Joint Model Results
• Accuracy comparison for the joint models that perform both tasks
– With joint training, the encoder-decoder architecture improved over independent training by 0.09% on slot filling and 0.45% on intent detection.
– With joint training, the attention-based bidirectional RNN improved over independent training by 0.23% on slot filling and 0.56% on intent detection.
– → The attention-based bidirectional RNN benefits more from joint training.
• Both proposed methods also achieved good accuracy under 10-fold cross validation on the additional data.

Conclusions
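• The paper explored ways of using alignment information in attention-based encoder-decoder neural network models to perform slot filling and intent detection at the same time, and also proposed an attention-based bidirectional RNN model.
• When building a dialog system, a single joint model is enough; there is no need to build two separate models.
• The proposed methods achieved state-of-the-art accuracy on ATIS.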
