TR2015-153
A sampling-based speaker clustering using utterance-oriented Dirichlet process mixture model and its evaluation on large scale data
- "A Sampling-Based Speaker Clustering Using Utterance-Oriented Dirichlet Process Mixture Model and Its Evaluation on Large Scale Data", APSIPA Transactions on Signal and Information Processing, DOI: 10.1017/ATSIP.2015.19, Vol. 4, October 2015.
@article{Tawara2015oct,
  author = {Tawara, N. and Ogawa, T. and Watanabe, S. and Nakamura, A. and Kobayashi, T.},
  title = {A Sampling-Based Speaker Clustering Using Utterance-Oriented Dirichlet Process Mixture Model and Its Evaluation on Large Scale Data},
  journal = {APSIPA Transactions on Signal and Information Processing},
  year = 2015,
  volume = 4,
  month = oct,
  doi = {10.1017/ATSIP.2015.19},
  issn = {2048-7703},
  url = {https://www.merl.com/publications/TR2015-153}
}
Abstract:
An infinite mixture model is applied to model-based speaker clustering, with sampling-based optimization, so that the number of speakers can be estimated from the data. For this purpose, a nonparametric Bayesian framework is implemented with Markov chain Monte Carlo (MCMC) sampling and incorporated into the utterance-oriented speaker model. The proposed model is called the utterance-oriented Dirichlet process mixture model (UO-DPMM). The paper demonstrates that UO-DPMM can be applied successfully to large-scale data and that it outperforms conventional hierarchical agglomerative clustering, especially when the number of utterances is large.
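The general idea of sampling cluster assignments under a Dirichlet process prior, so that the number of clusters is inferred rather than fixed in advance, can be illustrated with a minimal collapsed Gibbs sampler. This is a toy sketch and not the UO-DPMM of the paper: it assumes each utterance is summarized by a single scalar feature, uses a Gaussian likelihood with known variance and a conjugate Gaussian prior on each cluster mean, and the names `log_predictive`, `gibbs_dpmm`, `alpha`, `sigma2`, `mu0`, and `tau2` are hypothetical choices made for illustration.

```python
import math
import random


def log_predictive(x, n, s, sigma2=1.0, mu0=0.0, tau2=100.0):
    """Log density of x under a cluster with n members whose values sum to s.

    Assumes a Gaussian likelihood with known variance sigma2 and a
    N(mu0, tau2) prior on the cluster mean, which is integrated out.
    """
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)
    post_mean = post_var * (mu0 / tau2 + s / sigma2)
    var = post_var + sigma2
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - post_mean) ** 2 / var)


def gibbs_dpmm(data, alpha=1.0, sweeps=50, seed=0):
    """Collapsed Gibbs sampling of cluster labels under a CRP prior."""
    rng = random.Random(seed)
    z = [0] * len(data)          # start with every item in one cluster
    counts = {0: len(data)}      # cluster id -> number of members
    sums = {0: sum(data)}        # cluster id -> sum of member values
    next_id = 1                  # id to use if a new cluster is opened
    for _ in range(sweeps):
        for i, x in enumerate(data):
            # Remove item i from its current cluster.
            k = z[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]
            # Score each existing cluster, then one brand-new cluster.
            candidates = list(counts) + [next_id]
            logp = [math.log(counts[c]) + log_predictive(x, counts[c], sums[c])
                    for c in counts]
            logp.append(math.log(alpha) + log_predictive(x, 0, 0.0))
            # Normalize in log space for numerical stability, then sample.
            m = max(logp)
            weights = [math.exp(lp - m) for lp in logp]
            k = rng.choices(candidates, weights=weights)[0]
            if k == next_id:
                next_id += 1
            z[i] = k
            counts[k] = counts.get(k, 0) + 1
            sums[k] = sums.get(k, 0.0) + x
    return z


# Toy data: two well-separated groups standing in for two speakers.
data = [-10.0 + 0.05 * j for j in range(20)] + [10.0 + 0.05 * j for j in range(20)]
assignments = gibbs_dpmm(data)
print("estimated number of clusters:", len(set(assignments)))
```

The number of clusters is never passed in: each item may open a new cluster with weight proportional to `alpha`, which is what lets a sampler of this kind estimate the number of speakers. A model closer to the one described above would replace the scalar Gaussian with an utterance-level likelihood over acoustic frames.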