Monotonic Multihead Attention: Ma, Xutai, et al. "Monotonic Multihead Attention." International Conference on Learning Representations (ICLR), 2020. Review by June-Woo Kim.
Editor's Notes
#5: Simultaneous translation is useful in many applications (e.g., live interpretation at conferences), since it must begin producing output before the full source sentence is available.
#34: Effect of decoder layers and attention heads (parenthetical numbers are total heads = layers × heads per layer):
- Offline model: best performance with 3 layers and 2 heads per layer (6 total heads).
- MMA-H: improves with more heads even at 1 layer.
- MMA-IL: behaves similarly to the offline model; best with 6 layers (24 total heads, i.e., 4 per layer).
- Latency: best performance from MMA-IL with 6 layers and 16 heads per layer (96 total heads).
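The parenthetical counts above follow from total heads = decoder layers × heads per layer. A minimal sketch of that bookkeeping for the settings mentioned in the note (the config labels are illustrative, not library identifiers; the 4 heads per layer for MMA-IL is inferred from 24 / 6):

```python
# Total attention heads = decoder layers x heads per layer.
# Configurations are the ones mentioned in the slide notes.
configs = {
    "offline (best)":        (3, 2),   # 3 layers x 2 heads  = 6 total
    "MMA-IL (best)":         (6, 4),   # 6 layers x 4 heads  = 24 total (4 inferred)
    "MMA-IL (best latency)": (6, 16),  # 6 layers x 16 heads = 96 total
}

for name, (layers, heads_per_layer) in configs.items():
    total = layers * heads_per_layer
    print(f"{name}: {layers} layers x {heads_per_layer} heads = {total} total heads")
```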