Monotonic Multihead Attention review

Editor's Notes

  • #5: Simultaneous translation is useful in many applications.
  • #34: Offline model: best performance with 3 layers and 2 heads (6 total). MMA-H: improves with 1 layer and more heads. MMA-IL: behaves similarly to the offline model; best with 6 layers and 4 heads (24 total). For latency, the best performance comes from MMA-IL with 6 layers and 16 heads (96 total).
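
  The parenthesized numbers in note #34 read as total monotonic heads, i.e. decoder layers times attention heads per layer. A minimal sketch of that arithmetic follows; note the 4-heads-per-layer figure for MMA-IL is inferred from 24 / 6 layers and is not stated explicitly in the original note:

      # Total monotonic heads = decoder layers x attention heads per layer.
      # The MMA-IL heads-per-layer value (4) is inferred from 24 / 6,
      # not stated in the slide notes.
      configs = {
          "offline (best)":        (3, 2),   # 3 x 2  -> 6
          "MMA-IL (best)":         (6, 4),   # 6 x 4  -> 24 (heads inferred)
          "MMA-IL (best latency)": (6, 16),  # 6 x 16 -> 96
      }

      for name, (layers, heads) in configs.items():
          print(f"{name}: {layers} layers x {heads} heads "
                f"= {layers * heads} total heads")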