Acoustical Society of Japan, 2021 Spring Meeting, 1-1-2
北村大地, 矢田部浩平, "スペクトログラム無矛盾性を用いた独立低ランク行列分析の実験的評価," 日本音響学会 2021年春季研究発表会講演論文集, 1-1-2, pp. 121–124, Tokyo, March 2021.
Daichi Kitamura and Kohei Yatabe, "Experimental evaluation of consistent independent low-rank matrix analysis," Proceedings of 2021 Spring Meeting of Acoustical Society of Japan, 1-1-2, pp. 121–124, Tokyo, March 2021 (in Japanese).
独立深層学習行列分析に基づく多チャネル音源分離 (Multichannel audio source separation based on independent deeply learned matrix analysis) by Daichi Kitamura
角野隼斗, 北村大地, 高宗典玄, 高道慎之介, 猿渡洋, 小野順貴, "独立深層学習行列分析に基づく多チャネル音源分離," 日本音響学会 2018年春季研究発表会講演論文集, 1-4-16, pp. 449–452, Saitama, March 2018.
Hayato Sumino, Daichi Kitamura, Norihiro Takamune, Shinnosuke Takamichi, Hiroshi Saruwatari, Nobutaka Ono, "Multichannel audio source separation based on independent deeply learned matrix analysis," Proceedings of 2018 Spring Meeting of Acoustical Society of Japan, 1-4-16, pp. 449–452, Saitama, March 2018 (in Japanese).
独立低ランク行列分析に基づく音源分離とその発展 (Audio source separation based on independent low-rank matrix analysis and its extensions) by Daichi Kitamura
北村大地, "独立低ランク行列分析に基づく音源分離とその発展," IEICE信号処理研究会, 2021年8月24日.
Daichi Kitamura, "Audio source separation based on independent low-rank matrix analysis and its extensions," IEICE Technical Group on Signal Processing, Aug. 24th, 2021.
http://d-kitamura.net
This document summarizes a research talk on statistical-model-based speech enhancement techniques that aim to reduce noise without generating musical noise artifacts. The talk outlines conventional enhancement methods like spectral subtraction and Wiener filtering that often cause musical noise. It then proposes a biased minimum mean-square error estimator that can achieve a musical-noise-free state by introducing a bias parameter. Analysis and experiments show this method can reduce noise while keeping the kurtosis ratio fixed at 1.0 to prevent musical noise, outperforming other techniques in terms of speech quality. A strong speech prior model is found to limit achieving musical-noise-free states, so the prior must be carefully selected.
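The kurtosis ratio used above as the musical-noise criterion is straightforward to compute: it compares the "spikiness" of the residual noise before and after processing, and a ratio of 1.0 means no musical noise was introduced. A minimal sketch under that definition (the `kurtosis_ratio` helper and the hard-threshold stand-in are illustrative, not the estimator from the talk):

```python
import numpy as np

def kurtosis(x):
    """Fourth standardized moment (non-excess kurtosis)."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s2 = ((x - m) ** 2).mean()
    return ((x - m) ** 4).mean() / s2 ** 2

def kurtosis_ratio(noise_before, noise_after):
    """Kurtosis ratio of the residual noise; 1.0 is the
    musical-noise-free condition."""
    return kurtosis(noise_after) / kurtosis(noise_before)

rng = np.random.default_rng(0)
noise = rng.normal(size=100_000)

# A plain linear gain leaves the distribution shape, hence the
# kurtosis ratio, unchanged: no musical noise is introduced.
print(round(kurtosis_ratio(noise, 0.3 * noise), 3))  # 1.0

# Hard-thresholding small values (a crude spectral-subtraction
# stand-in) spikes the residual, pushing the ratio above 1:
# isolated surviving peaks are what we hear as musical noise.
thresholded = np.where(np.abs(noise) > 1.0, noise, 0.0)
print(kurtosis_ratio(noise, thresholded) > 1.0)  # True
```

The second example shows why subtraction-style methods tend to produce musical noise while gain-based methods with a suitable bias can avoid it.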
Presentation slides from the ICASSP 2019 Speech & Audio Paper Reading Session (https://connpass.com/event/128527/).
An introduction to the AASP (Audio and Acoustic Signal Processing) field and an overview of trends at ICASSP 2019. #icassp2019jp
非負値行列分解の確率的生成モデルと多チャネル音源分離への応用 (Generative model in nonnegative matrix factorization and its application to multichannel sound source separation) by Daichi Kitamura
北村大地, "非負値行列分解の確率的生成モデルと多チャネル音源分離への応用," 慶應義塾大学理工学部電子工学科湯川研究室 招待講演, Kanagawa, November, 2015.
Daichi Kitamura, "Generative model in nonnegative matrix factorization and its application to multichannel sound source separation," Keio University, Faculty of Science and Technology, Department of Electronics and Electrical Engineering, Yukawa Laboratory, Invited Talk, Kanagawa, November 2015.
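As background for the generative-model view of NMF, the deterministic starting point is the basic Euclidean-distance NMF with Lee–Seung multiplicative updates, which the probabilistic formulation reinterprets. A generic sketch (the `nmf` helper is illustrative, not the talk's formulation; IS-divergence NMF, which corresponds to a complex Gaussian generative model, uses different update rules):

```python
import numpy as np

def nmf(V, K, n_iter=500, seed=0):
    """Euclidean-distance NMF via Lee-Seung multiplicative updates:
    V (F x T, nonnegative) is approximated by W (F x K) @ H (K x T).
    Multiplicative updates keep W and H nonnegative by construction."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + 0.1
    H = rng.random((K, T)) + 0.1
    eps = 1e-12  # guard against division by zero
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# An exactly rank-2 nonnegative matrix is recovered closely.
rng = np.random.default_rng(1)
V = rng.random((8, 2)) @ rng.random((2, 10))
W, H = nmf(V, K=2)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```

In audio applications V is a magnitude or power spectrogram, W holds spectral bases, and H their activations over time.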
A Brief Introduction of Anomalous Sound Detection: Recent Studies and Future... by Yuma Koizumi
Yuma Koizumi presents an overview of anomalous sound detection (ASD), discussing recent challenges and future prospects, particularly in unsupervised settings. The presentation highlights the difficulties of detecting anomalies due to the unpredictability of data patterns and emphasizes the need for innovative approaches to training models with limited labeled data. Additionally, it addresses the impact of domain shifts on detection systems and suggests adaptations using few-shot learning techniques.
[DL Paper Reading Group] Incorporating group update for speech enhancement based on convolutio... by Deep Learning JP
1. The document discusses a research paper on speech enhancement using a convolutional gated recurrent network (CGRN) and ordered neuron long short-term memory (ON-LSTM).
2. The proposed method aims to improve speech quality by incorporating both time and frequency dependencies using CGRN, and handling noise with varying change rates using ON-LSTM.
3. CGRN replaces fully-connected layers with convolutions, allowing it to capture local spatial structures in the frequency domain. ON-LSTM groups neurons based on the change rate of internal information to model hierarchical representations.
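The neuron ordering in ON-LSTM is enforced by a "cumulative softmax" (cumax) activation for its master gates, which produces a soft, monotone 0-to-1 switch along the neuron dimension. A minimal numpy sketch of just that operation (function names are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cumax(x):
    """Cumulative softmax used by ON-LSTM's master gates: a
    monotonically increasing vector in [0, 1] that softly marks
    where a gate switches from 0 to 1 along the neuron ordering.
    Low-index neurons (before the switch point) are updated slowly,
    modeling higher-level, slower-changing information."""
    return np.cumsum(softmax(x))

g = cumax(np.array([0.1, 5.0, 0.2, 0.3]))
print(np.all(np.diff(g) >= 0))  # True: the gate is monotone
print(round(float(g[-1]), 6))   # 1.0: it saturates at one
```

Grouping neurons this way is how ON-LSTM ties update rates to a hierarchy, matching the paper's use of it to handle noise components that change at different rates.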
The VoiceMOS Challenge 2022 aimed to encourage research in automatic prediction of mean opinion scores (MOS) for speech quality. It featured two tracks evaluating systems' ability to predict MOS ratings from a large existing dataset or from a separate listening test. 21 teams participated in the main track and 15 in the out-of-domain track. Several teams outperformed the strongest baseline, which fine-tuned a self-supervised model; the top-performing approaches generally involved ensembling or multi-task learning. While unseen systems were predictable, unseen listeners and speakers remained difficult, especially when generalizing to a new listening test. The challenge highlighted progress in MOS prediction, but also the need for metrics that reflect both ranking and absolute accuracy.
Investigation of Text-to-Speech based Synthetic Parallel Data for Sequence-to... by NU_I_TODALAB
This document investigates the use of synthetic parallel data (SPD) to improve non-parallel voice conversion (VC) through sequence-to-sequence modeling. The study evaluates the feasibility and influence of SPD on VC performance, analyzing various training pairs and the effectiveness of semi-parallel datasets. Findings indicate that SPD is viable for VC, but its success depends on the quality and size of the training data.
Interactive voice conversion for augmented speech production by NU_I_TODALAB
This document discusses recent progress in interactive voice conversion techniques for augmenting speech production. It begins by explaining the physical limitations of normal speech production and how voice conversion can augment speech by controlling more information. It then discusses how interactive voice conversion allows for quick response times, better controllability through real-time feedback, and understanding user intent from multimodal behavior signals. Recent advances discussed include low-latency voice conversion networks, controllable waveform generation respecting the source-filter model of speech, and expression control using signals like arm movements. The goal is to develop cooperatively augmented speech that can help users with lost speech abilities.
Recent progress on voice conversion: What is next? by NU_I_TODALAB
The document discusses recent advancements in voice conversion (VC) techniques, emphasizing the importance of preserving linguistic content while modifying non-linguistic features. It outlines the Voice Conversion Challenges (VCC) from 2016 to 2020, highlighting different training methods and the role of neural vocoders. The paper also suggests future directions for VC research, focusing on improving performance, developing interactive applications, and exploring higher-level feature conversions.
Weakly-Supervised Sound Event Detection with Self-Attention by NU_I_TODALAB
This document presents a weakly-supervised sound event detection method using self-attention, aiming to improve detection performance by exploiting weak-label data. The proposed approach introduces a special tag token for handling weak labels and employs a transformer encoder for improved sequence modeling. Experimental results show a notable increase in sound event detection accuracy, with the new method outperforming a baseline CRNN model across several evaluation metrics.
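The tag-token idea resembles BERT's [CLS] token: prepend a learnable vector to the frame sequence, run the transformer encoder, and read the clip-level (weak) prediction from the token's output position while frame positions yield the strong predictions. A toy single-head self-attention sketch under that assumption (random matrices stand in for learned weights; this is not the paper's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 50, 16  # number of audio frames, feature dimension

def self_attention(X):
    """Single-head scaled dot-product self-attention.
    Wq, Wk, Wv are random stand-ins for learned projections."""
    Wq, Wk, Wv = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    S = Q @ K.T / np.sqrt(d)                       # attention scores
    A = np.exp(S - S.max(axis=1, keepdims=True))   # row-wise softmax
    A /= A.sum(axis=1, keepdims=True)
    return A @ V

frames = rng.normal(size=(T, d))   # frame-level audio features
tag = rng.normal(size=(1, d))      # special tag token for the weak label
X = np.concatenate([tag, frames])  # prepend the token to the sequence

Y = self_attention(X)
clip_embedding = Y[0]   # weak (clip-level) prediction is read here
frame_outputs = Y[1:]   # strong (frame-level) predictions come from these
print(clip_embedding.shape, frame_outputs.shape)  # (16,) (50, 16)
```

Because the tag token attends over all frames, its output can aggregate clip-level evidence without requiring frame-level labels at training time.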
Statistical voice conversion with direct waveform modeling by NU_I_TODALAB
This document provides an outline for a tutorial on voice conversion techniques. It begins with an introduction to the goal of the tutorial, which is to help participants grasp the basics and recent progress of VC, develop a baseline VC system, and develop a more sophisticated system using a neural vocoder. The tutorial will include an overview of VC techniques, introduction of freely available software for building a VC system, and breaks between sessions. The first session will cover the basics of VC, improvements to VC techniques, and an overview of recent progress in direct waveform modeling. The second session will demonstrate how to develop a VC system using the WaveNet vocoder with freely available tools.
The document outlines a hands-on workshop for developing voice conversion (VC) systems using open-source software called Sprocket, created by Nagoya University. It details the process of building a traditional GMM-based VC and includes instructions for installing the software, preparing datasets, and configuring the system for speaker conversion. The overall goal is to provide participants with the knowledge and tools needed to initiate their own VC research and development.