音源分離における音響モデリング (Acoustic modeling in audio source separation), Daichi Kitamura
北村大地, "音源分離における音響モデリング," 日本音響学会 サマーセミナー 招待講演, September 11th, 2017.
Daichi Kitamura, "Acoustic modeling in audio source separation," The Acoustical Society of Japan, Summer Seminar Invited Talk, September 11th, 2017.
GAN-based statistical speech synthesis (in Japanese), Yuki Saito
Guest presentation at "Applied Gaussian Process and Machine Learning," Graduate School of Information Science and Technology, The University of Tokyo, Japan, 2021.
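June 24, 2017, ICASSP 2017 paper reading group (Kanto edition) at The University of Tokyo
AASP-L3: Deep Learning for Source Separation and Enhancement I
Slides for the part presented by Daichi Kitamura, Project Research Associate (特任助教), The University of Tokyo
These slides introduce papers that I did not author, so please refrain from redistributing them. For details not covered in the slides, please refer to the papers themselves.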
ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi..., Daichi Kitamura
Presented at 2015 Autumn Meeting of Acoustical Society of Japan (domestic conference)
北村大地, 猿渡洋, 小野順貴, 澤田宏, 亀岡弘和, "ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察," 日本音響学会 2015年秋季研究発表会, 3-6-10, pp.583-586, Fukushima, September 2015.
Daichi Kitamura, Hiroshi Saruwatari, Nobutaka Ono, Hiroshi Sawada, Hirokazu Kameoka, "Study on source and spatial models for BSS with rank-1 spatial approximation," Proceedings of 2015 Autumn Meeting of Acoustical Society of Japan, 3-6-10, pp.583-586, Fukushima, September 2015 (in Japanese).
This document summarizes a research talk on statistical-model-based speech enhancement techniques that aim to reduce noise without generating musical noise artifacts. The talk outlines conventional enhancement methods, such as spectral subtraction and Wiener filtering, that often cause musical noise. It then proposes a biased minimum mean-square error estimator that can reach a musical-noise-free state by introducing a bias parameter. Analysis and experiments show that this method can reduce noise while keeping the kurtosis ratio fixed at 1.0 to prevent musical noise, and that it outperforms other techniques in terms of speech quality. A strong speech prior model is found to make the musical-noise-free state harder to reach, so the prior must be chosen carefully.
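The kurtosis ratio mentioned in this summary can be illustrated with a small numerical sketch. It is not the estimator from the talk. It assumes that kurtosis is measured on power-spectral components of a noise-only segment as the fourth raw moment divided by the squared second raw moment, and it applies plain power spectral subtraction to simulated noise. The function names, the exponential noise model, and the parameter values are all illustrative assumptions.

```python
import numpy as np


def raw_kurtosis(power):
    """Fourth raw moment over squared second raw moment (assumed definition)."""
    power = np.asarray(power, dtype=float)
    return np.mean(power ** 4) / np.mean(power ** 2) ** 2


def kurtosis_ratio(noise_power, processed_power):
    """Kurtosis after processing divided by kurtosis before processing."""
    return raw_kurtosis(processed_power) / raw_kurtosis(noise_power)


def spectral_subtraction(power, noise_mean, beta=1.0, floor=0.0):
    """Power-domain spectral subtraction with a flooring term."""
    return np.maximum(power - beta * noise_mean, floor * power)


# Simulated noise-only power spectrum: the power of complex Gaussian noise
# follows an exponential distribution (hypothetical test signal).
rng = np.random.default_rng(0)
noise_power = rng.exponential(scale=1.0, size=200_000)

processed = spectral_subtraction(noise_power, noise_power.mean(), beta=1.0)
ratio = kurtosis_ratio(noise_power, processed)
print(f"kurtosis ratio after spectral subtraction: {ratio:.2f}")
```

In this simulation the ratio comes out well above 1, which matches the summary's point that spectral subtraction tends to create isolated spectral peaks. A ratio of 1.0 means the processing left the shape of the noise distribution unchanged under this measure.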
北村大地, 小野順貴, "独立性基準を用いた非負値行列因子分解の効果的な初期値決定法," 日本音響学会 2016年春季研究発表会, 3-3-5, pp. 619-622, Kanagawa, March 2016.
Daichi Kitamura, Nobutaka Ono, "Statistical-independence-based effective initialization for nonnegative matrix factorization," Proceedings of 2016 Spring Meeting of Acoustical Society of Japan, 3-3-5, pp. 619-622, Kanagawa, March 2016 (in Japanese).
Slides presented at the ICASSP 2019 speech and audio paper reading group (https://connpass.com/event/128527/).
They introduce the AASP (Audio and Acoustic Signal Processing) area and the trends seen in it at ICASSP 2019. #icassp2019jp
The document proposes an improved method for audio signal separation using supervised nonnegative matrix factorization (NMF) with time-variant basis deformation. The key contributions are:
1. Classifying supervised bases into time-variant attack and sustain parts and applying different all-pole model-based deformations to each.
2. Introducing discriminative training to avoid overfitting the interference signal and better separate the target.
3. Presenting an iterative approximate algorithm that searches for deformation matrices representing the target signal while constraining them to also fit the mixture signal.
4. Showing, in experiments on instrument mixtures, that the proposed method achieves a better signal-to-distortion ratio than previous supervised NMF techniques.
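For orientation, the baseline that this method builds on can be sketched as plain supervised NMF. The sketch assumes KL-divergence multiplicative updates, target bases trained in advance and held fixed, and a Wiener-like mask for reconstruction. It does not include the attack/sustain classification, the all-pole basis deformation, or the discriminative training described above. The name supervised_nmf and all of its parameters are hypothetical.

```python
import numpy as np


def supervised_nmf(V, W_target, n_other, n_iter=200, seed=0, eps=1e-12):
    """Estimate the target spectrogram from a nonnegative mixture V.

    W_target holds pre-trained target bases (frequency x bases) and is
    kept fixed. Bases for the remaining sources and all activations are
    fitted to the mixture with KL-divergence multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    n_target = W_target.shape[1]
    W_other = rng.random((V.shape[0], n_other)) + eps
    H = rng.random((n_target + n_other, V.shape[1])) + eps

    for _ in range(n_iter):
        # Update all activations with the current bases.
        W = np.hstack([W_target, W_other])
        ratio = V / (W @ H + eps)
        H *= (W.T @ ratio) / (W.sum(axis=0)[:, None] + eps)

        # Update only the non-target bases; the supervised ones stay fixed.
        ratio = V / (W @ H + eps)
        H_other = H[n_target:]
        W_other *= (ratio @ H_other.T) / (H_other.sum(axis=1) + eps)

    # Wiener-like mask: share of the model explained by the target bases.
    W = np.hstack([W_target, W_other])
    mask = (W_target @ H[:n_target]) / (W @ H + eps)
    return mask * V
```

A typical call passes a magnitude or power spectrogram as V and the bases learned from isolated recordings of the target instrument as W_target. Because the mask lies between 0 and 1, the estimate never exceeds the mixture in any time-frequency bin.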
The document describes a proposed hybrid method for multichannel signal separation using supervised nonnegative matrix factorization (SNMF). The method combines directional clustering for spatial separation with SNMF incorporating spectrogram restoration for spectral separation. Experiments show the hybrid method achieves better separation performance than conventional single-channel SNMF or multichannel NMF methods, as measured by signal-to-distortion ratio. The optimal divergence for the SNMF component involves a tradeoff between separation ability and ability to restore missing spectral components.
This document proposes a flexible microphone array system using informed source separation methods for a rescue robot. It aims to detect victim speech in disaster areas using multiple microphones on the robot's flexible body. The proposed method uses supervised rank-1 nonnegative matrix factorization (NMF) and statistical signal estimation to address two key problems: ego-noise basis mismatch due to the robot's self-vibrations, and speech model ambiguity. Experiments show the proposed approach outperforms conventional independent vector analysis and single-channel NMF, improving speech detection even with mismatched ego-noise recordings.
Shoichi Koyama, Naoki Murata, and Hiroshi Saruwatari. "Super-resolution in sound field recording and reproduction based on sparse representation"
Presented at the 5th Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan (28 Nov. - 2 Dec. 2016, Honolulu, USA)
Shoichi Koyama, "Source-Location-Informed Sound Field Recording and Reproduction: A Generalization to Arrays of Arbitrary Geometry"
Presented at the 2016 AES International Conference on Sound Field Control (July 18-20, 2016, Guildford, UK)
独立性に基づくブラインド音源分離の発展と独立低ランク行列分析 History of independence-based blind source sep..., Daichi Kitamura
Seminar, Department of Information Physics and Computing, The University of Tokyo
Monday, February 27, 2017, 15:00-16:30
北村大地, "独立性に基づくブラインド音源分離の発展と独立低ランク行列分析," 東京大学 システム情報学専攻 談話会, 2月27日, 2017年.
Daichi Kitamura, "History of independence-based blind source separation and independent low-rank matrix analysis," The University of Tokyo, Department of Information Physics and Computing, Seminar, 27th Feb., 2017.
The document describes a real-time DNN voice conversion system with feedback to acquire character traits. It proposes a method to provide real-time feedback of the converted voice to the speaker to encourage speech modification (prosody and emphasis) towards the target speaker's character. Subjective evaluations from the first-person (user) perspective and third-person perspective found that the system improved the reproduction of the target speaker's character, especially for inexperienced users. Providing only pitch feedback was already quite effective.
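Examples of recorded speech and HMM training
• Examples of recorded speech
• HMM training (correction)
– Using speech that contains reading errors degrades the quality of the synthesized speech
– → Use only the recorded utterances whose HMM likelihood is relatively high
Utterance text / Speaker 1 / Speaker 2 / Speaker 3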
There is no mine and there are no miners.
Do you often take them for a walk?
That’s interesting.