Copyright (C) 2019 DeNA Co.,Ltd. All Rights Reserved.
Nov. 24, 2019
Kentaro Tachibana
AI System Dept.
DeNA Co., Ltd.
Parrotron: An End-to-End Speech-to-Speech Conversion Model and its
Applications to Hearing-Impaired Speech and Speech Separation
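Paper covered
• Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation
• The paper's contributions in a nutshell
  1. Direct speech-to-speech, waveform-to-waveform conversion
  2. High-quality many-to-one voice conversion
  3. Demonstrates the framework's usefulness by applying it to uses beyond conventional voice conversion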
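Tasks addressed by Parrotron
1. Many-to-one voice conversion (voice normalization)
   • Converts speech from any speaker, recorded in any environment, to the target speaker's speaking rate, accent, and voice quality
2. Voice conversion for hearing-impaired speakers
   • Converting to the target speaker's voice improves intelligibility and naturalness
3. Noise removal and source separation
   • Removes background noise and extracts only the target speaker's speech
Parrotron is applicable to a wide range of uses!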
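Summary
• Proposes Parrotron, an end-to-end voice conversion model
  • Enables direct waveform-to-waveform conversion!
• Achieves high-quality many-to-one voice conversion
  • ASR multitask learning is effective
• Demonstrates the effectiveness of the Parrotron framework beyond voice conversion
  • Intelligibility improvement for impaired speech, and noise removal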
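As a rough illustration of what ASR multitask learning can look like, the sketch below adds an auxiliary recognition loss to a spectrogram reconstruction loss. It is a minimal sketch under stated assumptions, not the loss used in the paper: the mean squared error, the per-step cross-entropy, and the names `multitask_loss` and `asr_weight` are all hypothetical choices made for this example.

```python
import math


def mse(pred, target):
    """Mean squared error between two spectrograms given as lists of frames."""
    total, count = 0.0, 0
    for pred_frame, target_frame in zip(pred, target):
        for p, t in zip(pred_frame, target_frame):
            total += (p - t) ** 2
            count += 1
    return total / count


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the reference labels.

    probs[i] is a probability distribution over output symbols at step i,
    labels[i] is the index of the reference symbol at step i.
    """
    nll = sum(-math.log(step[label]) for step, label in zip(probs, labels))
    return nll / len(labels)


def multitask_loss(pred_spec, target_spec, asr_probs, asr_labels, asr_weight=1.0):
    """Conversion loss plus a weighted auxiliary ASR loss.

    asr_weight is a hypothetical hyperparameter for this sketch.
    """
    conversion_loss = mse(pred_spec, target_spec)
    asr_loss = cross_entropy(asr_probs, asr_labels)
    return conversion_loss + asr_weight * asr_loss


# Toy example: 2 frames x 2 bins, and 2 recognition steps over 2 symbols.
loss = multitask_loss(
    pred_spec=[[0.0, 1.0], [1.0, 0.0]],
    target_spec=[[0.0, 1.0], [1.0, 1.0]],
    asr_probs=[[0.5, 0.5], [0.25, 0.75]],
    asr_labels=[0, 1],
    asr_weight=0.5,
)
print(loss)
```

The auxiliary term only shapes training. At conversion time a model trained this way would use the spectrogram output alone, so the recognition branch adds no inference cost in this sketch.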