SlideShare feed for slideshows by user jyamagis (Yamagishi Laboratory, National Institute of Informatics, Japan). Last updated: Sat, 05 Nov 2022 14:24:02 GMT.

DDS: A new device-degraded speech dataset for speech enhancement
Posted: Sat, 05 Nov 2022 14:24:02 GMT | /slideshow/dds-a-new-devicedegraded-speech-dataset-for-speech-enhancement/254014360
A large and growing amount of speech content in real-life scenarios is being recorded on consumer-grade devices in uncontrolled environments, resulting in degraded speech quality. Transforming such low-quality, device-degraded speech into high-quality speech is a goal of speech enhancement (SE). This paper introduces a new speech dataset, DDS, to facilitate research on SE. DDS provides aligned parallel recordings of high-quality speech (recorded in professional studios) and a number of low-quality versions, amounting to approximately 2,000 hours of speech data. The dataset covers 27 realistic recording conditions by combining diverse acoustic environments and microphone devices, and each condition consists of multiple recordings from six microphone positions to simulate different noise and reverberation levels. We also test several SE baseline systems on DDS and show the impact of recording diversity on performance. Paper: https://arxiv.org/abs/2109.07931
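To make the parallel-data setup concrete, the sketch below pairs degraded recordings with their studio-quality references by utterance ID, the kind of pairing an SE baseline would train on. The directory layout and file names are hypothetical, not the official DDS release structure.

```python
from pathlib import Path

def pair_parallel_utterances(clean_dir: str, degraded_dir: str):
    """Pair degraded recordings with studio-quality references by utterance ID.

    Assumes both directories hold WAV files whose stems encode the same
    utterance ID (a hypothetical layout, not the official DDS structure).
    """
    clean = {p.stem: p for p in Path(clean_dir).glob("*.wav")}
    pairs = []
    for degraded in sorted(Path(degraded_dir).glob("*.wav")):
        reference = clean.get(degraded.stem)
        if reference is not None:
            pairs.append((degraded, reference))  # (noisy input, clean target)
    return pairs

# Example: build training pairs for one of the 27 recording conditions
pairs = pair_parallel_utterances("studio/", "tablet_meetingroom_pos1/")
print(f"{len(pairs)} parallel utterances found")
```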

The VoiceMOS Challenge 2022
Posted: Wed, 31 Aug 2022 03:05:29 GMT | /slideshow/the-voicemos-challenge-2022-252769758/252769758
Presentation for Interspeech 2022: "The VoiceMOS Challenge 2022". Presenter: Dr. Erica Cooper, National Institute of Informatics. Session: Thu-SS-OS-9-5. Preprint: https://arxiv.org/abs/2203.11389 Video: https://youtu.be/99ZQ-SLUvKE Challenge website: https://voicemos-challenge-2022.github.io
We present the first edition of the VoiceMOS Challenge, a scientific event that aims to promote the study of automatic prediction of the mean opinion score (MOS) of synthetic speech. The challenge drew 22 participating teams from academia and industry, who tried a variety of approaches to tackle the problem of predicting human ratings of synthesized speech. The listening-test data for the main track consisted of samples from 187 different text-to-speech and voice-conversion systems spanning over a decade of research, and the out-of-domain track consisted of data from more recent systems rated in a separate listening test. The results show the effectiveness of fine-tuning self-supervised speech models for the MOS prediction task, as well as the difficulty of predicting MOS ratings for unseen speakers and listeners, and for unseen systems in the out-of-domain setting.
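Fine-tuning a self-supervised speech model for MOS prediction essentially means attaching a small regression head to a pre-trained encoder and training both end to end. Below is a minimal sketch of that idea, not any specific team's system; the choice of wav2vec 2.0, mean pooling, and L1 loss are illustrative assumptions.

```python
import torch
from transformers import Wav2Vec2Model

class MOSPredictor(torch.nn.Module):
    """SSL encoder + mean pooling + linear head, trained to regress MOS (1-5)."""
    def __init__(self, ssl_name: str = "facebook/wav2vec2-base"):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(ssl_name)
        self.head = torch.nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) at 16 kHz
        frames = self.encoder(waveform).last_hidden_state   # (batch, frames, dim)
        return self.head(frames.mean(dim=1)).squeeze(-1)    # (batch,) predicted MOS

model = MOSPredictor()
loss_fn = torch.nn.L1Loss()                 # a common choice for MOS regression
pred = model(torch.randn(2, 16000))         # two 1-second dummy waveforms
loss = loss_fn(pred, torch.tensor([3.5, 4.0]))
loss.backward()                             # fine-tunes encoder and head jointly
```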

Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions
Posted: Wed, 31 Aug 2022 02:45:15 GMT | /slideshow/analyzing-languageindependent-speaker-anonymization-framework-under-unseen-conditions/252769371
Presentation for Interspeech 2022: "Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions". Presenter: Dr. Xiaoxiao Miao, National Institute of Informatics. Session: Thu-O-OS-9-1. Video: https://youtu.be/wVIxyLiQa1Y Preprint: https://arxiv.org/abs/2203.14834
In our previous work, we proposed a language-independent speaker anonymization system based on self-supervised learning models. Although the system can anonymize speech data in any language, the anonymization is imperfect and the speech content of the anonymized speech is distorted; this limitation is more severe when the input speech comes from a domain unseen in the training data. This study analyzed the bottlenecks of the anonymization system under unseen conditions. It was found that the domain (e.g., language and channel) mismatch between the training and test data affected both the neural waveform vocoder and the anonymized speaker vectors, which limited the performance of the whole system. Increasing the diversity of the vocoder's training data helped reduce its implicit language and channel dependency. Furthermore, a simple correlation-alignment-based domain adaptation strategy was significantly effective in alleviating the mismatch in the anonymized speaker vectors. Audio samples and source code are available online.
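Correlation alignment (CORAL) in its generic form is a whiten-then-recolor transform on feature matrices. The sketch below applies the standard recipe to speaker-vector matrices; it assumes the textbook CORAL formulation rather than the paper's exact variant.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def coral(source: np.ndarray, target: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Align second-order statistics of `source` vectors to those of `target`.

    source, target: (n_vectors, dim) arrays, e.g. anonymized speaker vectors.
    Returns source vectors re-colored with the target covariance (CORAL).
    """
    cs = np.cov(source, rowvar=False) + eps * np.eye(source.shape[1])
    ct = np.cov(target, rowvar=False) + eps * np.eye(target.shape[1])
    whiten = fractional_matrix_power(cs, -0.5)   # decorrelate the source vectors
    recolor = fractional_matrix_power(ct, 0.5)   # impose the target correlations
    return np.real(source @ whiten @ recolor)

# Example: adapt 200 out-of-domain 192-dim speaker vectors to an in-domain pool
rng = np.random.default_rng(0)
adapted = coral(rng.normal(size=(200, 192)), rng.normal(size=(500, 192)))
```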

Spoofing-aware Attention Back-end with Multiple Enrollment and Novel Trials Sampling Strategy for SASVC 2022
Posted: Wed, 31 Aug 2022 02:30:54 GMT | /slideshow/spoofingaware-attention-backend-with-multiple-enrollment-and-novel-trials-sampling-strategy-for-sasvc-2022/252769155
Presentation for Interspeech 2022: "Spoofing-aware Attention Back-end with Multiple Enrollment and Novel Trials Sampling Strategy for SASVC 2022". Presenter: Chang Zeng (National Institute of Informatics and SOKENDAI). Session: Wed-SS-OS-6-5. Presentation video: https://youtu.be/gXxP1nn5X6E
The Spoofing-Aware Speaker Verification Challenge (SASVC) 2022 was organized to explore the relation between automatic speaker verification (ASV) and spoofing countermeasures (CM). In this paper, we introduce our spoofing-aware attention back-end developed for SASVC 2022. First, we design a novel sampling strategy for simulating realistic verification scenarios. Then, to fully leverage the information derived from multiple enrollment utterances, we propose a spoofing-aware attention back-end. Finally, a joint decision strategy introduces mutual interaction between the ASV and CM modules. Compared with the trial sampling method used in the baseline systems, our proposed sampling method yields a clear improvement even without any attention modules. The experimental results show that the proposed spoofing-aware attention back-end improves performance on the evaluation set from 6.37% for the best baseline system to 1.19% in terms of the SASV-EER (equal error rate) metric.
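SASV-EER is a standard equal error rate computed over a pooled set of target trials versus non-target and spoofed trials. The snippet below sketches the generic EER computation on dummy scores; it is not the official challenge scoring script.

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(scores: np.ndarray, labels: np.ndarray) -> float:
    """EER: the operating point where false-accept and false-reject rates meet.

    labels: 1 for target (bona fide, same-speaker) trials, 0 for everything
    else; for SASV-EER the zeros pool non-target and spoofed trials.
    """
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))          # point closest to FAR == FRR
    return float((fpr[idx] + fnr[idx]) / 2.0)

rng = np.random.default_rng(0)
target_scores = rng.normal(2.0, 1.0, 1000)         # higher scores for target trials
other_scores = rng.normal(0.0, 1.0, 1000)          # non-target and spoofed trials
scores = np.concatenate([target_scores, other_scores])
labels = np.concatenate([np.ones(1000), np.zeros(1000)])
print(f"EER = {100 * equal_error_rate(scores, labels):.2f}%")
```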

Odyssey 2022: Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models
Posted: Mon, 20 Jun 2022 10:00:08 GMT | /jyamagis/odyssey-2022-languageindependent-speaker-anonymization-approach-using-selfsupervised-pretrained-models
Presenter: Dr. Xiaoxiao Miao, NII. Paper: https://arxiv.org/abs/2202.13097
Speaker anonymization aims to protect the privacy of speakers while preserving the spoken linguistic information in speech. Current mainstream neural-network speaker anonymization systems are complicated, containing an F0 extractor, a speaker encoder, an automatic speech recognition acoustic model (ASR AM), a speech synthesis acoustic model, and a speech waveform generation model. Moreover, because the ASR AM is language-dependent and trained on English data, it is hard to adapt it to other languages. In this paper, we propose a simpler self-supervised learning (SSL)-based method for language-independent speaker anonymization without any explicit language-dependent model, so it can easily be used for other languages. Extensive experiments on the English VoicePrivacy Challenge 2020 datasets and the Mandarin AISHELL-3 dataset demonstrate the effectiveness of the proposed SSL-based language-independent speaker anonymization method.
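The core simplification is to replace the language-dependent ASR acoustic model with frame-level features from a pre-trained self-supervised model. A minimal sketch of extracting such content features with torchaudio's HuBERT bundle follows; the specific SSL model and layer used in the paper are assumptions here.

```python
import torch
import torchaudio

# Pre-trained HuBERT as a language-independent content encoder
bundle = torchaudio.pipelines.HUBERT_BASE
model = bundle.get_model().eval()

waveform = torch.randn(1, 32000)        # dummy 2-second utterance at 16 kHz (bundle.sample_rate)
with torch.inference_mode():
    layer_outputs, _ = model.extract_features(waveform)

content = layer_outputs[-1]             # (1, frames, 768) frame-level "content" features
print(content.shape)                    # these stand in for ASR-AM bottleneck features
```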

Odyssey 2022: Investigating self-supervised front ends for speech spoofing countermeasures
Posted: Mon, 20 Jun 2022 09:54:26 GMT | /slideshow/odyssey-2022-investigating-selfsupervised-front-ends-for-speech-spoofing-countermeasures/252021565
Presenter: Dr. Xin Wang, NII. Paper: https://arxiv.org/abs/2111.07725
Self-supervised speech modeling is a rapidly progressing research topic, and many pre-trained models have been released and used in various downstream tasks. For speech anti-spoofing, most countermeasures (CMs) use signal-processing algorithms to extract acoustic features for classification. In this study, we use pre-trained self-supervised speech models as the front end of spoofing CMs. We investigated different back-end architectures to be combined with the self-supervised front end, the effectiveness of fine-tuning the front end, and the performance of different pre-trained self-supervised models. Our findings showed that, when a good pre-trained front end was fine-tuned with either a shallow or a deep neural-network back end on the ASVspoof 2019 logical access (LA) training set, the resulting CM not only achieved a low EER on the 2019 LA test set but also significantly outperformed the baseline on the ASVspoof 2015, 2021 LA, and 2021 deepfake test sets. A sub-band analysis further demonstrated that the CM mainly used information in a specific frequency band to discriminate bona fide from spoofed trials across the test sets.
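A sub-band analysis of this kind can be approximated by removing one frequency band at a time from the test audio and re-scoring the countermeasure. Below is a generic band-stop filtering helper; the filter order and band edges are illustrative choices, not the paper's exact setup.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def remove_band(waveform: np.ndarray, low_hz: float, high_hz: float,
                fs: int = 16000, order: int = 4) -> np.ndarray:
    """Zero-phase band-stop filter: suppress [low_hz, high_hz] before re-scoring a CM."""
    sos = butter(order, [low_hz, high_hz], btype="bandstop", fs=fs, output="sos")
    return sosfiltfilt(sos, waveform)

x = np.random.randn(16000 * 3)           # dummy 3-second trial at 16 kHz
for low, high in [(50, 800), (800, 2400), (2400, 4800), (4800, 7950)]:
    probe = remove_band(x, low, high)
    # Score `probe` with the trained CM; a large EER increase when one band is
    # removed suggests the CM relies mainly on information in that band.
    print(f"band {low}-{high} Hz removed: residual RMS = {np.sqrt(np.mean(probe**2)):.3f}")
```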

Generalization Ability of MOS Prediction Networks
Posted: Fri, 22 Apr 2022 06:08:28 GMT | /jyamagis/generalization-ability-of-mos-prediction-networks
"Generalization Ability of MOS Prediction Networks," Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi. Accepted to ICASSP 2022. Preprint: https://arxiv.org/abs/2110.02635

Estimating the confidence of speech spoofing countermeasure
Posted: Fri, 22 Apr 2022 05:40:21 GMT | /slideshow/estimating-the-confidence-of-speech-spoofing-countermeasure/251639177
"Estimating the confidence of speech spoofing countermeasure," Xin Wang, Junichi Yamagishi. Accepted to ICASSP 2022. Preprint: https://arxiv.org/abs/2110.04775

Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances
Posted: Fri, 22 Apr 2022 04:41:17 GMT | /slideshow/attention-backend-for-automatic-speaker-verification-with-multiple-enrollment-utterances/251638794
"Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances," Chang Zeng, Xin Wang, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi. Accepted to ICASSP 2022. Preprint: https://arxiv.org/abs/2104.01541

How do Voices from Past Speech Synthesis Challenges Compare Today?
Posted: Mon, 30 Aug 2021 03:38:27 GMT | /slideshow/how-do-voices-from-past-speech-synthesis-challenges-compare-today/250075621
SSW11 presentation: "How do Voices from Past Speech Synthesis Challenges Compare Today?" Presenter: Erica Cooper. Preprint: https://arxiv.org/abs/2105.02373

Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
Posted: Mon, 30 Aug 2021 03:35:51 GMT | /slideshow/texttospeech-synthesis-techniques-for-miditoaudio-synthesis/250075603
SSW11 presentation: "Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis". Presenter: Xin Wang. Preprint: https://arxiv.org/abs/2104.12292

Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance
Posted: Mon, 30 Aug 2021 03:16:11 GMT | /slideshow/preliminary-study-on-using-vector-quantization-latent-spaces-for-ttsvc-systems-with-consistent-performance/250075436
Presentation for SSW11: "Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance". Presenter: Hieu-Thi Luong. Preprint: https://arxiv.org/abs/2106.13479

Advancements in Neural Vocoders
Posted: Tue, 03 Aug 2021 05:48:20 GMT | /slideshow/advancements-in-neural-vocoders/249908577
Tutorial on neural vocoders given at the 2021 Speech Processing Courses in Crete, "Inclusive Neural Speech Synthesis." Presenters: Xin Wang and Junichi Yamagishi, National Institute of Informatics, Japan.

Neural Waveform Modeling
Posted: Mon, 28 Oct 2019 03:37:46 GMT | /slideshow/neural-waveform-modeling/187580123
"Neural Waveform Modeling," from our experiences in text-to-speech applications. September 2019 talk at Fraunhofer IIS, Germany, by Dr. Xin Wang.

Neural source-filter waveform model
Posted: Tue, 21 May 2019 23:39:46 GMT | /jyamagis/neural-sourcefilter-waveform-model
These are the slides for the presentation titled "Neural source-filter waveform model," given at ICASSP 2019 in Brighton, UK. Presenter: Xin Wang, National Institute of Informatics, Japan.
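In a neural source-filter model, the source module typically generates a sine-based excitation from the F0 contour, which neural filter blocks then transform into a speech waveform. Here is a simplified sketch of such a sine source; the frame shift, amplitudes, and noise levels are rough assumptions rather than the paper's exact configuration.

```python
import numpy as np

def sine_source(f0_frames: np.ndarray, frame_shift_s: float = 0.005,
                fs: int = 16000, noise_std: float = 0.003) -> np.ndarray:
    """Generate a sine-based excitation from a frame-level F0 contour (Hz, 0 = unvoiced)."""
    f0 = np.repeat(f0_frames, int(frame_shift_s * fs))   # upsample F0 to the sample rate
    phase = 2 * np.pi * np.cumsum(f0 / fs)               # instantaneous phase
    voiced = (f0 > 0).astype(np.float64)
    # Sine in voiced regions plus Gaussian noise; larger noise where unvoiced
    excitation = 0.1 * voiced * np.sin(phase)
    excitation += noise_std * np.random.randn(len(f0)) * (1.0 + 9.0 * (1.0 - voiced))
    return excitation

f0 = np.concatenate([np.zeros(20), np.linspace(120, 180, 60), np.zeros(20)])  # dummy contour
excitation = sine_source(f0)   # this excitation would be fed to the neural filter blocks
```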

Tutorial on end-to-end text-to-speech synthesis: Part 2 – Tacotron and related end-to-end systems
Posted: Fri, 01 Feb 2019 07:31:57 GMT | /slideshow/tutorial-on-endtoend-texttospeech-synthesis-part-2-tactron-and-related-endtoend-systems/130106654
These slides were used for an invited tutorial on end-to-end text-to-speech synthesis, given at the IEICE SP workshop held on 27 January 2019. Part 2: Tacotron and related end-to-end systems. Presenters: Xin Wang and Yusuke Yasuda (National Institute of Informatics, Japan).

Tutorial on end-to-end text-to-speech synthesis: Part 1 – Neural waveform modeling
Posted: Fri, 01 Feb 2019 07:22:40 GMT | /slideshow/tutorial-on-endtoend-texttospeech-synthesis-part-1-neural-waveform-modeling/130105846
These slides were used for an invited tutorial on end-to-end text-to-speech synthesis, given at the IEICE SP workshop held on 27 January 2019. Part 1: Neural waveform modeling. Presenters: Xin Wang and Yusuke Yasuda (National Institute of Informatics, Japan).

NII software toward end-to-end speech synthesis: A tutorial on Tacotron and WaveNet (Part 2)
Posted: Mon, 28 Jan 2019 11:51:35 GMT | /jyamagis/nii-tacotronwavenet-part-2
Slides for the invited tutorial talk "NII software toward end-to-end speech synthesis: A tutorial on Tacotron and WaveNet," given at the Speech (SP) technical committee meeting held in Kanazawa on Sunday, 27 January 2019. Presenters: Xin Wang and Yusuke Yasuda.

NII software toward end-to-end speech synthesis: A tutorial on Tacotron and WaveNet (Part 1)
Posted: Mon, 28 Jan 2019 11:44:31 GMT | /slideshow/nii-tacotronwavenet/129556511
Slides for the invited tutorial talk "NII software toward end-to-end speech synthesis: A tutorial on Tacotron and WaveNet," given at the Speech (SP) technical committee meeting held in Kanazawa on Sunday, 27 January 2019. Presenters: Xin Wang and Yusuke Yasuda.

The National Institute of Informatics (NII) is an academic research institution for informatics leading frontier research on data science, big data, and artificial intelligence, supported by the Ministry of Education, Culture, Sports, Science and Technology, Japan. The Yamagishi Lab at NII is a sound-media group consisting of one faculty member, four postdocs, and three PhD students. Website: nii-yamagishilab.github.io