A very brief introduction to what we have been working on at Taiwan AI Labs in "music AI" (specifically, automatic music composition/generation) and "speech AI" (specifically, Mandarin ASR).
20190625 Research at Taiwan AI Labs: Music and Speech AI
3. Music AI Research (in the Old Days)
• Algorithmic composition
  ◦ MIDI in, MIDI out
• Limitations
  ◦ Lack of diversity and expressivity
  ◦ Some music genres have no written (score) form
NLU
NLG
(Music encoding used by OpenAI's MuseNet model)
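
To make the "music as language" view concrete, here is a minimal sketch of flattening notes into a token sequence that a language model could consume. The note format, the token names (NOTE_ON, NOTE_OFF, TIME_SHIFT) and the `encode` helper are illustrative assumptions, not MuseNet's actual encoding.

```python
from typing import List, Tuple

# A note is (onset_in_steps, duration_in_steps, midi_pitch).
# This toy format is an assumption made for illustration.
Note = Tuple[int, int, int]


def encode(notes: List[Note]) -> List[str]:
    """Flatten notes into an event-token sequence (hypothetical scheme)."""
    events = []
    for onset, duration, pitch in notes:
        # The middle field orders note-offs (0) before note-ons (1)
        # when both happen at the same time step.
        events.append((onset, 1, f"NOTE_ON_{pitch}"))
        events.append((onset + duration, 0, f"NOTE_OFF_{pitch}"))
    events.sort()

    tokens: List[str] = []
    now = 0
    for time, _, name in events:
        if time > now:
            # Advance the clock before emitting the next event.
            tokens.append(f"TIME_SHIFT_{time - now}")
            now = time
        tokens.append(name)
    return tokens


# Three notes of a C major arpeggio: C4, E4, G4.
melody = [(0, 2, 60), (2, 2, 64), (4, 4, 67)]
tokens = encode(melody)
print(tokens)
```

Once music is a flat token list like this, "MIDI in, MIDI out" composition can reuse sequence models built for text.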
4. Music AI Research (at the Taiwan AI Labs)
• audio in, audio out
• audio → audio: source separation (SS) [denoising]
• audio → score: music transcription (MT) [ASR]
• score → score: composition [NLG]
• score → audio: synthesis [TTS]
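
The four tasks above can be read as edges between two representations, audio and score. The sketch below records that mapping and chains tasks along a path; the `Task` type, the `TASKS` table and the `plan` helper are hypothetical names introduced only for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class Task(NamedTuple):
    name: str
    speech_analogue: str  # the bracketed analogue from the slide


# (input representation, output representation) -> task
TASKS: Dict[Tuple[str, str], Task] = {
    ("audio", "audio"): Task("source separation (SS)", "denoising"),
    ("audio", "score"): Task("music transcription (MT)", "ASR"),
    ("score", "score"): Task("composition", "NLG"),
    ("score", "audio"): Task("synthesis", "TTS"),
}


def plan(path: List[str]) -> List[str]:
    """Return the task needed for each consecutive pair in the path."""
    return [TASKS[(src, dst)].name for src, dst in zip(path, path[1:])]


# One possible "audio in, audio out" route: transcribe, compose, synthesize.
steps = plan(["audio", "score", "score", "audio"])
print(steps)
```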
5. Note: A Song is Composed of Multiple Tracks
I Have Nothing ~Whitney Houston
(Slide made by Hao-Min Liu)
6. Step 1: Source Separation
• Demix the music signal
  ◦ Input: audio mixture
  ◦ Output: individual tracks
(image from the Internet)
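
As a toy illustration of the input/output contract of source separation (mixture in, individual tracks out), the sketch below splits a two-tone mixture with a fixed frequency mask. This is an assumption-laden simplification: real instruments overlap in frequency, and practical systems typically estimate masks with learned models. The function names and the cutoff are hypothetical.

```python
import cmath
import math
from typing import List, Tuple

N = 64  # number of samples in the toy signal


def dft(x: List[float]) -> List[complex]:
    """Naive discrete Fourier transform (fine for tiny N)."""
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X: List[complex]) -> List[float]:
    """Naive inverse DFT, keeping the real part."""
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * n / N)
                 for k in range(N)) / N).real
            for n in range(N)]


def separate(mixture: List[float], cutoff_bin: int) -> Tuple[List[float], List[float]]:
    """Split a mixture into a low band and a high band with a binary mask."""
    spectrum = dft(mixture)
    # For a real signal, bins k and N - k carry the same frequency.
    low = [X if min(k, N - k) < cutoff_bin else 0j
           for k, X in enumerate(spectrum)]
    high = [X - L for X, L in zip(spectrum, low)]
    return idft(low), idft(high)


# Two "tracks": a low tone (bin 4) and a quieter high tone (bin 12).
bass = [math.sin(2 * math.pi * 4 * n / N) for n in range(N)]
lead = [0.5 * math.sin(2 * math.pi * 12 * n / N) for n in range(N)]
mixture = [b + l for b, l in zip(bass, lead)]

est_bass, est_lead = separate(mixture, cutoff_bin=8)
max_err = max(max(abs(a - b) for a, b in zip(est_bass, bass)),
              max(abs(a - b) for a, b in zip(est_lead, lead)))
print(f"max reconstruction error: {max_err:.2e}")
```

The separation is exact here only because the two tones occupy disjoint frequency bins by construction.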
20. Sequence to Sequence ASR
• Advantages
1. Optimizes word accuracy directly
2. Allows a smaller model
3. Does not depend on a lexicon, which helps for some languages (e.g., Taiwanese)
• Disadvantages
1. Needs more data than a traditional pipeline (e.g., one built with Kaldi) to get comparable results
2.^yᘌ~Ą (e.g., yijЩض~ęC)
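
Word accuracy, mentioned in the advantages above, is commonly reported as 1 minus the word error rate (WER), where WER is the word-level edit distance between reference and hypothesis divided by the reference length. The sketch below computes it; scoring Mandarin per character rather than per word is a common convention but an assumption here, not something stated on the slide.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over tokens (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr: List[int] = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Word-level scoring for an English example.
wer = error_rate("the cat sat on the mat".split(),
                 "the cat sit on mat".split())

# Character-level scoring, as an assumed convention for Mandarin.
cer = error_rate(list("今天天氣很好"), list("今天天汽很好"))

print(f"WER = {wer:.3f}, CER = {cer:.3f}")
```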