An incremental algorithm for transition-based CCG parsing
Akira Miyazawa
The document presents an incremental algorithm for transition-based CCG parsing developed by Ambati et al., emphasizing the importance of incrementality in applications like machine translation and speech recognition. It compares a baseline non-incremental algorithm with a new 'revinc' algorithm, showcasing significant improvements in connectedness, waiting time, and parsing speed. Overall, the revinc parser achieves better recall and faster processing by utilizing a revealing technique in its parsing strategy.
9. 9/50
Committee
For regression, a committee can be formed by bagging: from bootstrap data sets $Z^{(1)}, \dots, Z^{(M)}$ we train the predictors $y_1(x), \dots, y_M(x)$ and combine them as
$$y_{\mathrm{COM}}(x) = \frac{1}{M} \sum_{m=1}^{M} y_m(x). \qquad (14.7)$$
We now check that this actually reduces the error compared with the individual predictors.
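As a concrete illustration of (14.7), here is a minimal Python sketch of a bootstrap committee. The base learner (np.polyfit on a toy 1-D problem) and the data are assumptions made for illustration only; the slide does not fix a particular model.

# Committee prediction by averaging bootstrap-trained regressors (eq. 14.7).
# The base learner (np.polyfit) and the toy data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, M, degree = 50, 20, 3

x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=N)   # noisy targets

models = []
for m in range(M):
    idx = rng.integers(0, N, size=N)          # bootstrap sample Z^(m)
    models.append(np.polyfit(x[idx], t[idx], degree))

def y_com(x_new):
    """Committee prediction: average of the M individual predictions."""
    preds = np.array([np.polyval(coef, x_new) for coef in models])
    return preds.mean(axis=0)

print(y_com(np.array([0.25, 0.5, 0.75])))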
10. 10/50
Error evaluation of the committee
Let $h(x)$ denote the true regression function, and write each prediction $y_m(x)$ as $h(x)$ plus an additive error $\epsilon_m(x)$:
$$y_m(x) = h(x) + \epsilon_m(x). \qquad (14.8)$$
The expected squared error of a single predictor is then
$$\mathbb{E}\left[(y_m(x) - h(x))^2\right] = \mathbb{E}\left[\epsilon_m(x)^2\right], \qquad (14.9)$$
so the average expected error of the individual predictors is
$$E_{\mathrm{AV}} := \frac{1}{M} \sum_{m=1}^{M} \mathbb{E}\left[(y_m(x) - h(x))^2\right] = \frac{1}{M} \sum_{m=1}^{M} \mathbb{E}\left[\epsilon_m(x)^2\right]. \qquad (14.10)$$
11. 11/50
Error evaluation of the committee
For the committee prediction itself, the expected squared error is
$$E_{\mathrm{COM}} := \mathbb{E}\left[\left(\frac{1}{M}\sum_{m=1}^{M} y_m(x) - h(x)\right)^2\right] = \mathbb{E}\left[\left(\frac{1}{M}\sum_{m=1}^{M} \epsilon_m(x)\right)^2\right]. \qquad (14.11)$$
Applying the Cauchy-Schwarz inequality to the vectors $(1, \dots, 1) \in \mathbb{R}^M$ and $(\epsilon_1(x), \dots, \epsilon_M(x)) \in \mathbb{R}^M$ gives
$$\left(\sum_{m=1}^{M} \epsilon_m(x)\right)^2 \le M \sum_{m=1}^{M} \epsilon_m(x)^2,$$
hence $E_{\mathrm{COM}} \le E_{\mathrm{AV}}$. In other words, the expected error of the committee never exceeds the average expected error of the individual predictors.
12. 12/50
Error evaluation of the committee
Assume further that the errors have zero mean and are uncorrelated, i.e.
$$\mathbb{E}\left[\epsilon_m(x)\right] = 0, \qquad (14.12)$$
$$\mathrm{cov}\left(\epsilon_m(x), \epsilon_\ell(x)\right) = \mathbb{E}\left[\epsilon_m(x)\,\epsilon_\ell(x)\right] = 0, \quad m \ne \ell. \qquad (14.13)$$
Then the cross terms vanish and
$$E_{\mathrm{COM}} = \mathbb{E}\left[\left(\frac{1}{M}\sum_{m=1}^{M} \epsilon_m(x)\right)^2\right] = \frac{1}{M^2}\,\mathbb{E}\left[\sum_{m=1}^{M} \epsilon_m(x)^2\right] = \frac{1}{M} E_{\mathrm{AV}}, \qquad (14.14)$$
so the expected error is reduced by a factor of $1/M$.
[1] Since similar models are trained on similar training data, the errors are in practice strongly correlated, and a reduction this large cannot be expected.
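The following Python sketch numerically illustrates (14.10), (14.11) and (14.14): with zero-mean uncorrelated errors the committee error comes out close to $E_{\mathrm{AV}}/M$, while with correlated errors only the bound $E_{\mathrm{COM}} \le E_{\mathrm{AV}}$ survives. The Gaussian error model is an assumption made purely for the demonstration.

# Numerical check of (14.10), (14.11), (14.14): with zero-mean, uncorrelated
# errors eps_m(x), the committee error E_COM is roughly E_AV / M; with
# correlated errors it only satisfies E_COM <= E_AV.
import numpy as np

rng = np.random.default_rng(0)
M, S = 10, 100_000                      # committee size, number of x samples

eps_uncorr = rng.normal(size=(M, S))    # independent errors across members
shared = rng.normal(size=S)
eps_corr = 0.8 * shared + 0.6 * rng.normal(size=(M, S))   # correlated errors

for name, eps in [("uncorrelated", eps_uncorr), ("correlated", eps_corr)]:
    e_av = np.mean(eps ** 2)                    # (1/M) sum_m E[eps_m^2]
    e_com = np.mean(eps.mean(axis=0) ** 2)      # E[((1/M) sum_m eps_m)^2]
    print(f"{name:13s}  E_AV={e_av:.3f}  E_COM={e_com:.3f}  E_AV/M={e_av/M:.3f}")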
14. 14/50
AdaBoost
{w
(1)
n } {w
(2)
n } {w
(M)
n }
y1(x) y2(x) yM (x)
YM (x) = sign
M
m
mym(x)
for m = 1, . . . , N
Initialize the data weighting coe?cients
w
(1)
n :=
1
N
15. 15/50
AdaBoost
For $m = 1, \dots, M$:
Fit a classifier $y_m(x)$ to the training data by minimizing the weighted error
$$J_m := \sum_{n=1}^{N} w_n^{(m)} \, \mathbb{1}\{y_m(x_n) \ne t_n\}. \qquad (14.15)$$
Evaluate
$$\alpha_m := \log \frac{1 - \epsilon_m}{\epsilon_m}, \qquad (14.17)$$
where
$$\epsilon_m := \frac{\sum_{n=1}^{N} w_n^{(m)} \, \mathbb{1}\{y_m(x_n) \ne t_n\}}{\sum_{n=1}^{N} w_n^{(m)}}. \qquad (14.16)$$
Update the data weighting coefficients:
$$w_n^{(m+1)} := w_n^{(m)} \exp\left(\alpha_m \, \mathbb{1}\{y_m(x_n) \ne t_n\}\right). \qquad (14.18)$$
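A minimal Python sketch of the algorithm above, using weighted decision stumps as the weak learners. The stump learner and the toy data set are assumptions, since the slides do not fix a base classifier.

# AdaBoost with decision stumps as the weak learner, following (14.15)-(14.18).
import numpy as np

rng = np.random.default_rng(0)
N, M = 200, 20
X = rng.normal(size=(N, 2))
t = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)           # targets in {-1, +1}

def fit_stump(X, t, w):
    """Weighted decision stump: threshold one feature, minimizing J_m (14.15)."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] > thr, sign, -sign)
                err = np.sum(w * (pred != t))
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best[1:]

def stump_predict(X, stump):
    j, thr, sign = stump
    return np.where(X[:, j] > thr, sign, -sign)

w = np.full(N, 1.0 / N)                               # w_n^(1) = 1/N
stumps, alphas = [], []
for m in range(M):
    stump = fit_stump(X, t, w)
    pred = stump_predict(X, stump)
    miss = (pred != t)
    eps = np.sum(w * miss) / np.sum(w)                # epsilon_m, eq. (14.16)
    alpha = np.log((1 - eps) / eps)                   # alpha_m,   eq. (14.17)
    w = w * np.exp(alpha * miss)                      # weight update, (14.18)
    stumps.append(stump)
    alphas.append(alpha)

def Y(X):
    """Committee classifier Y_M(x) = sign(sum_m alpha_m y_m(x))."""
    scores = sum(a * stump_predict(X, s) for a, s in zip(alphas, stumps))
    return np.sign(scores)

print("training accuracy:", np.mean(Y(X) == t))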
17. 17/50
Minimizing the exponential error
For a target $t \in \{-1, 1\}$ and a prediction $y \in \mathbb{R}$, consider the product $z = ty$ and the exponential error $E(z) := \exp(-z)$. When $t$ and $y$ have the same sign ($z > 0$) the error approaches $0$; when they disagree ($z < 0$) it grows, so it acts as a penalty for misclassification.
[Figure: the exponential error plotted against $z$, compared with the cross-entropy error, the hinge error, and the misclassification error.]
18. 18/50
Minimizing the exponential error
Consider minimizing the total exponential error
$$E := \sum_{n=1}^{N} \exp\left(-t_n \sum_{\ell=1}^{m} \frac{1}{2} \alpha_\ell \, y_\ell(x_n)\right),$$
where the previously learned coefficients $\alpha_1, \dots, \alpha_{m-1}$ and classifiers $y_1(x), \dots, y_{m-1}(x)$ are held fixed, and we minimize only with respect to $\alpha_m$ and $y_m(x)$. $E$ can be rewritten as
$$E = \sum_{n=1}^{N} \exp\left(-t_n \sum_{\ell=1}^{m-1} \frac{1}{2} \alpha_\ell \, y_\ell(x_n) - \frac{1}{2} t_n \alpha_m y_m(x_n)\right) = \sum_{n=1}^{N} w_n^{(m)} \exp\left(-\frac{1}{2} t_n \alpha_m y_m(x_n)\right), \qquad (14.22)$$
where the weights are defined by
$$w_n^{(m)} := \exp\left(-t_n \sum_{\ell=1}^{m-1} \frac{1}{2} \alpha_\ell \, y_\ell(x_n)\right). \qquad (14.22')$$
19. 19/50
Minimizing the exponential error
Let $\mathcal{T}_m$ denote the set of data points correctly classified by $y_m : x \mapsto \{-1, 1\}$ and $\mathcal{M}_m$ the set of misclassified points. Then
$$E = e^{-\alpha_m/2} \sum_{n \in \mathcal{T}_m} w_n^{(m)} + e^{\alpha_m/2} \sum_{n \in \mathcal{M}_m} w_n^{(m)} = \left(e^{\alpha_m/2} - e^{-\alpha_m/2}\right) \sum_{n=1}^{N} w_n^{(m)} \, \mathbb{1}\{y_m(x_n) \ne t_n\} + e^{-\alpha_m/2} \sum_{n=1}^{N} w_n^{(m)}. \qquad (14.23)$$
Therefore minimizing $E$ with respect to $y_m(x)$ is equivalent to minimizing the weighted error (14.15).
20. 20/50
Minimizing the exponential error
Next minimize with respect to $\alpha_m$ by setting the derivative to zero:
$$\left(e^{\alpha_m/2} + e^{-\alpha_m/2}\right) \sum_{n=1}^{N} w_n^{(m)} \, \mathbb{1}\{y_m(x_n) \ne t_n\} - e^{-\alpha_m/2} \sum_{n=1}^{N} w_n^{(m)} = 0,$$
which rearranges to
$$e^{\alpha_m} + 1 = \frac{\sum_{n=1}^{N} w_n^{(m)}}{\sum_{n=1}^{N} w_n^{(m)} \, \mathbb{1}\{y_m(x_n) \ne t_n\}}.$$
Solving, with $\epsilon_m$ defined as in (14.16),
$$\alpha_m = \log \frac{1 - \epsilon_m}{\epsilon_m},$$
which is exactly the coefficient (14.17) used in the algorithm.
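A quick numerical sanity check of this derivation: for fixed weights and a fixed weak classifier, the closed form (14.17) agrees with a grid minimization of (14.22). The toy weights and error pattern below are assumptions.

# For fixed weights and a fixed classifier y_m, the error
# E(alpha) = sum_n w_n exp(-(1/2) t_n alpha y_m(x_n)) from (14.22)
# is minimized at alpha = log((1 - eps)/eps) from (14.17).
import numpy as np

rng = np.random.default_rng(0)
N = 100
w = rng.random(N)
miss = rng.random(N) < 0.3                 # which points y_m misclassifies
ty = np.where(miss, -1.0, 1.0)             # t_n * y_m(x_n)

eps = np.sum(w * miss) / np.sum(w)         # weighted error rate eps_m
alpha_star = np.log((1 - eps) / eps)       # closed-form minimizer (14.17)

alphas = np.linspace(-2, 6, 10001)
E = np.array([np.sum(w * np.exp(-0.5 * a * ty)) for a in alphas])
print("grid minimizer :", alphas[np.argmin(E)])
print("closed form    :", alpha_star)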
21. 21/50
Minimizing the exponential error
The weights (14.22') are updated as follows:
$$w_n^{(m+1)} = w_n^{(m)} \exp\left(-\frac{1}{2} t_n \alpha_m y_m(x_n)\right). \qquad (14.24)$$
Using the identity
$$t_n y_m(x_n) = 1 - 2\,\mathbb{1}\{y_m(x_n) \ne t_n\}, \qquad (14.25)$$
this becomes
$$w_n^{(m+1)} = w_n^{(m)} \exp\left(-\frac{\alpha_m}{2}\left(1 - 2\,\mathbb{1}\{y_m(x_n) \ne t_n\}\right)\right) = w_n^{(m)} \exp\left(-\frac{\alpha_m}{2}\right) \exp\left(\alpha_m\,\mathbb{1}\{y_m(x_n) \ne t_n\}\right). \qquad (14.26)$$
Since the factor $\exp(-\alpha_m/2)$ is common to all data points, it can be ignored, which yields the update rule (14.18).
22. 22/50
The error function minimized by boosting
Consider the expectation of the exponential error:
$$\mathbb{E}\left[\exp(-t\,y(x))\right] = \sum_{t \in \{-1,1\}} \int \exp(-t\,y(x))\, p(t \mid x)\, p(x)\, \mathrm{d}x. \qquad (14.27)$$
Minimizing this variationally with respect to $y(x)$: write $\varphi_t(y) := \exp(-t\,y(x))\, p(t \mid x)\, p(x)$ and set the derivative to zero,
$$\sum_{t \in \{-1,1\}} \partial_y \varphi_t(y) = 0 \;\Longleftrightarrow\; \exp(y(x))\, p(t = -1 \mid x) - \exp(-y(x))\, p(t = 1 \mid x) = 0,$$
$$y(x) = \frac{1}{2} \log \frac{p(t = 1 \mid x)}{p(t = -1 \mid x)}. \qquad (14.28)$$
In other words, the sequential minimization performed by AdaBoost approximates one half of the log odds of $p(t = 1 \mid x)$ against $p(t = -1 \mid x)$, which is the justification for using the sign function in the final classifier (14.19).
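A small numerical check of (14.28): for a fixed x with an assumed value of p(t = 1 | x), the grid minimizer of the expected exponential error matches half the log odds.

# The value y minimizing p*exp(-y) + (1-p)*exp(y) is (1/2) log(p/(1-p)).
# The value of p is an illustrative assumption.
import numpy as np

p = 0.7
ys = np.linspace(-3, 3, 60001)
expected_error = p * np.exp(-ys) + (1 - p) * np.exp(ys)
print("grid minimizer :", ys[np.argmin(expected_error)])
print("half log-odds  :", 0.5 * np.log(p / (1 - p)))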
37. 35/50
Pruning branches
For the subtree $T_t$ rooted at node $t$ and the tree $(\{t\}, \emptyset)$ obtained by cutting away everything below $t$, the error-complexity measures are
$$R_\alpha(T_t) = R(T_t) + \alpha\,|\widetilde{T}_t|, \qquad R_\alpha((\{t\}, \emptyset)) = R((\{t\}, \emptyset)) + \alpha,$$
where $|\widetilde{T}_t|$ is the number of leaves of $T_t$. We have $R_\alpha(T_t) \ge R_\alpha((\{t\}, \emptyset))$ exactly when
$$\alpha \ge \frac{R((\{t\}, \emptyset)) - R(T_t)}{|\widetilde{T}_t| - 1},$$
and in that case the branch is better cut off. This motivates defining, as an index of how readily the branch at $t$ should be pruned,
$$g(t\,; T) := \frac{R((\{t\}, \emptyset)) - R(T_t)}{|\widetilde{T}_t| - 1}.$$
38. 36/50
Pruning branches
The following algorithm, called weakest link pruning, produces a nested sequence of trees $T^0 \supset T^1 \supset \dots \supset T^J = (\{t_1\}, \emptyset)$ together with thresholds $0 = \alpha_0 < \alpha_1 < \dots < \alpha_J$.
1: $i \leftarrow 0$
2: while $|\widetilde{T}^i| > 1$
3:   $\alpha_i \leftarrow \min_{t \in V(T^i) \setminus \widetilde{T}^i} g(t\,; T^i)$
4:   $T'_i \leftarrow \arg\min_{t \in V(T^i) \setminus \widetilde{T}^i} g(t\,; T^i)$
5:   for $t \in T'_i$
6:     $T^{i+1} \leftarrow T^i - T^i_t$
7:   $i \leftarrow i + 1$
Here $V(T^i)$ is the set of nodes of $T^i$ and $\widetilde{T}^i$ its set of leaves, so the minimization runs over the internal nodes.
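A sketch of weakest link pruning in Python under simplifying assumptions: each node carries its own resubstitution error R({t}), while R(T_t) and the leaf count are recomputed from the current tree; the toy tree at the bottom uses made-up error values.

# Weakest link pruning sketch. The tree structure and error values are
# illustrative assumptions, not data from the slides.
from dataclasses import dataclass, field

@dataclass
class Node:
    error: float                      # R({t}): error if this node were a leaf
    children: list = field(default_factory=list)

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

    def internal_nodes(self):
        if not self.children:
            return []
        return [self] + [n for c in self.children for n in c.internal_nodes()]

def subtree_error(t):
    """R(T_t): sum of the errors of the leaves of the subtree rooted at t."""
    return sum(leaf.error for leaf in t.leaves())

def g(t):
    """g(t; T): smaller values mean the branch at t is pruned first."""
    return (t.error - subtree_error(t)) / (len(t.leaves()) - 1)

def weakest_link_pruning(root):
    """Return the sequence (alpha_i, number of leaves of T^i)."""
    sequence = [(0.0, len(root.leaves()))]
    while root.children:
        alpha = min(g(t) for t in root.internal_nodes())
        for t in root.internal_nodes():
            if g(t) <= alpha:
                t.children = []       # collapse T_t into the single node {t}
        sequence.append((alpha, len(root.leaves())))
    return sequence

# Toy tree: numbers are made up, but satisfy R(T_t) <= R({t}) at every node.
root = Node(0.40, [Node(0.18, [Node(0.05), Node(0.10)]),
                   Node(0.25, [Node(0.04), Node(0.06)])])
for alpha, n_leaves in weakest_link_pruning(root):
    print(f"alpha = {alpha:.3f}, leaves = {n_leaves}")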
39. 37/50
Cross validation
What remains is to choose a suitable tree from the sequence $\{T^i\}_{i=0}^{J}$ obtained above. For this we use cross validation.
[Figure: K-fold cross validation. The data are split into K folds; in turn, each fold is held out as the test set while the remaining K-1 folds are used for training.]
40. 38/50
Cross validation
The procedure is as follows.
1: Using all the data $D$, grow the sequence of trees $\{T^i\}_{i=0}^{J}$ and the parameters $\{\alpha_i\}_{i=0}^{J}$.
2: Split the data into $D = \bigcup_{k=1}^{K} D_k$, choosing the sizes $|D_k|$ to be as equal as possible.
3: For each fold $k$, use $D^{(k)} := D \setminus D_k$ to grow the trees $\{T^{(k)i}\}_{i=0}^{i_k}$ and the parameters $\{\alpha^{(k)}_i\}_{i=0}^{i_k}$.
41. 39/50
Cross validation
4: For $\alpha'_i := \sqrt{\alpha_i\,\alpha_{i+1}}$, take in each fold the tree $T^{(k)}(\alpha'_i)$ that minimizes $R_{\alpha'_i}(T)$, giving $T^{(1)}(\alpha'_i), \dots, T^{(K)}(\alpha'_i)$ and the corresponding predictors $y^{(1)}_i, \dots, y^{(K)}_i$. Note that if $\alpha^{(k)}_i \le \alpha' < \alpha^{(k)}_{i+1}$, then $T^{(k)}(\alpha') = T^{(k)i}$.
5: From these results the cross-validated error can be computed:
$$R^{\mathrm{CV}}(T^i) = \frac{1}{N} \sum_{k=1}^{K} \sum_{n \,:\, (x_n, t_n) \in D_k} \left(t_n - y^{(k)}_i(x_n)\right)^2.$$
6: Let $T^{**} := \arg\min_{T^i} R^{\mathrm{CV}}(T^i)$ be the tree with the smallest cross-validated error.
42. 40/50
Cross validation
7: Compute the standard error (SE):
$$\mathrm{SE}\left(R^{\mathrm{CV}}(T^i)\right) := s(T^i)/\sqrt{N}, \qquad s(T^i) := \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left[\left(t_n - y^{(\kappa(n))}_i(x_n)\right)^2 - R^{\mathrm{CV}}(T^i)\right]^2},$$
where $\kappa(n) = \sum_{k=1}^{K} k\,\mathbb{1}_{D_k}((x_n, t_n))$ is the index of the fold containing the $n$-th data point.
8: As the final result, take the smallest tree $T^*$ among those satisfying
$$R^{\mathrm{CV}}(T) \le R^{\mathrm{CV}}(T^{**}) + 1 \cdot \mathrm{SE}(T^{**}).$$
This heuristic decision rule is called the 1 SE rule.
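A small Python sketch of steps 6-8 (the 1 SE rule), using the cross-validated errors from the Breiman et al. (1984) table reproduced on the next slide; it recovers the trees marked ** and * there.

# 1 SE rule: find T** with minimum cross-validated error, then pick the
# smallest tree whose error is within one standard error of R_CV(T**).
# The numbers are taken from Breiman et al. (1984), TABLE 3.3 (next slide).
import numpy as np

n_leaves = np.array([31, 23, 17, 15, 14, 10, 9, 7, 6, 5, 2, 1])
r_cv     = np.array([.30, .27, .30, .30, .31, .30, .41, .51, .53, .61, .75, .86])
se       = np.array([.03, .03, .03, .03, .03, .03, .04, .04, .04, .04, .03, .03])

best = np.argmin(r_cv)                            # index of T**
threshold = r_cv[best] + 1.0 * se[best]           # R_CV(T**) + 1 * SE(T**)
eligible = np.flatnonzero(r_cv <= threshold)      # trees within one SE
chosen = eligible[np.argmin(n_leaves[eligible])]  # smallest such tree: T*

print(f"T** : {n_leaves[best]} leaves, R_CV = {r_cv[best]:.2f}")
print(f"T*  : {n_leaves[chosen]} leaves, R_CV = {r_cv[chosen]:.2f}")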
43. 41/50
Cross validation
Breiman et al. (1984), TABLE 3.3:
  i     |T~i|   R(Ti)   RCV(Ti)   SE
  1      31     .17      .30      .03
  2**    23     .19      .27      .03
  3      17     .22      .30      .03
  4      15     .23      .30      .03
  5      14     .24      .31      .03
  6*     10     .29      .30      .03
  7       9     .32      .41      .04
  8       7     .41      .51      .04
  9       6     .46      .53      .04
  10      5     .53      .61      .04
  11      2     .75      .75      .03
  12      1     .86      .86      .03
(** marks the tree with the smallest cross-validated error, T**; * marks the tree T* selected by the 1 SE rule.)
44. 42/50
Classification with CART
The procedure for growing and pruning the tree is almost the same as for regression, so only the parts not yet specified are described here. Suppose we classify into $K$ classes $C_1, \dots, C_K$. Let $N(t)$ be the number of data points that pass through node $t$, and let $N_k(t)$ be the number of those belonging to class $C_k$. Then
$$p(t \mid C_k) = \frac{N_k(t)}{N_k}, \qquad p(C_k, t) = p(C_k)\,p(t \mid C_k) = \frac{N_k}{N}\cdot\frac{N_k(t)}{N_k} = \frac{N_k(t)}{N}, \qquad p(t) = \sum_{k=1}^{K} p(C_k, t) = \frac{N(t)}{N}.$$
Using these, the probability $p(C_k \mid t)$ is obtained as
$$p(C_k \mid t) = \frac{p(C_k, t)}{p(t)} = \frac{N_k(t)}{N(t)}.$$
46. 44/50
Using a function $\varphi$, the impurity $I(t)$ of node $t$ is defined as
$$I(t) := \varphi\left(p(C_1 \mid t), \dots, p(C_K \mid t)\right).$$
For the children $t_L$ and $t_R$ of node $t$, let the fractions of the data at $t$ that flow to each child be
$$p_L = \frac{p(t_L)}{p(t)}, \qquad p_R = \frac{p(t_R)}{p(t)}.$$
Then the decrease in impurity obtained by splitting node $t$ with split $s$ is
$$\Delta I(s, t) = I(t) - p_R I(t_R) - p_L I(t_L).$$
Growing the tree amounts to repeatedly choosing the split that maximizes this decrease.
47. 45/50
Impurity functions
Misclassification error:
$$I(t) = 1 - \max_k p(C_k \mid t)$$
Entropy:
$$I(t) = -\sum_{k=1}^{K} p(C_k \mid t) \log p(C_k \mid t)$$
Gini index:
$$I(t) = \sum_{k=1}^{K} p(C_k \mid t)\left(1 - p(C_k \mid t)\right)$$
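The impurity functions above, together with the impurity decrease Delta I(s, t) from the previous slide, as a short Python sketch; the class proportions p(C_k | t) = N_k(t)/N(t) are computed from node counts, and the counts in the example split are assumptions.

# Impurity functions and the impurity decrease Delta I(s, t), computed from
# class counts at a node. The example counts are illustrative assumptions.
import numpy as np

def class_probs(counts):
    """p(C_k | t) = N_k(t) / N(t) from the class counts at node t."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

def misclassification(counts):
    return 1.0 - class_probs(counts).max()

def entropy(counts):
    p = class_probs(counts)
    p = p[p > 0]                               # avoid log(0)
    return -np.sum(p * np.log(p))

def gini(counts):
    p = class_probs(counts)
    return np.sum(p * (1.0 - p))

def impurity_decrease(impurity, counts_t, counts_left, counts_right):
    """Delta I(s, t) = I(t) - p_L I(t_L) - p_R I(t_R)."""
    n_t, n_l, n_r = map(sum, (counts_t, counts_left, counts_right))
    return (impurity(counts_t)
            - (n_l / n_t) * impurity(counts_left)
            - (n_r / n_t) * impurity(counts_right))

counts_t, counts_l, counts_r = [40, 40], [30, 10], [10, 30]  # example split
for impurity in (misclassification, entropy, gini):
    print(impurity.__name__, impurity_decrease(impurity, counts_t, counts_l, counts_r))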
52. 50/50
References
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
Breiman, L., Friedman, J., Stone, C. J., and Olshen, R. A. (1984). Classification and Regression Trees. CRC Press.
Freund, Y. and Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning (ICML).
Friedman, J., Hastie, T., Tibshirani, R., et al. (2000). Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The Annals of Statistics.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, second edition.
Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press.
平井有三 (2012). はじめてのパターン認識. 森北出版.