TR2003-88

Multi-Channel Source Separation by Beamforming Trained with Factorial HMMs

- Reyes-Gomez, M.J., Raj, B., Ellis, D.P.W., "Multi-Channel Source Separation by Beamforming Trained with Factorial HMMs", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), October 2003, pp. 13-16.
    @inproceedings{Reyes-Gomez2003oct,
      author = {Reyes-Gomez, M.J. and Raj, B. and Ellis, D.P.W.},
      title = {{Multi-Channel Source Separation by Beamforming Trained with Factorial HMMs}},
      booktitle = {IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
      year = 2003,
      pages = {13--16},
      month = oct,
      url = {https://www.merl.com/publications/TR2003-88}
    }
Research Areas:

Artificial Intelligence, Speech & Audio

Abstract:

Speaker separation has conventionally been treated as a problem of Blind Source Separation (BSS). This approach does not use any knowledge of the statistical characteristics of the signals to be separated; it relies mainly on the independence between the signals to separate them. Maximum-likelihood techniques, on the other hand, use knowledge of the a priori probability distributions of the speakers' signals to effect separation. In [5] we presented a maximum-likelihood speaker separation technique that uses detailed statistical information about the signals to be separated, represented in the form of hidden Markov models (HMMs), to estimate the parameters of a filter-and-sum processor for signal separation. In this paper we show that the filters estimated for any utterance by a speaker generalize well to other utterances by the same speaker, provided the locations of the various speakers remain constant. Thus, filters estimated from a "training" utterance with a known transcript can be used to separate all future signals by that speaker from mixtures of speech signals in an unsupervised manner. The filters are, however, ineffective for other speakers, indicating that they capture the spatio-frequency characteristics of the speaker.
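The separation stage described in the abstract is a filter-and-sum processor: each microphone signal passes through its own FIR filter and the filtered channels are summed. The NumPy sketch below illustrates only that stage, assuming time-domain FIR filters of equal length per channel. The function name `filter_and_sum` and the `(channels, samples)` array layout are illustrative assumptions, not the paper's implementation, and the factorial-HMM maximum-likelihood estimation of the filter taps is not shown.

```python
import numpy as np


def filter_and_sum(x, h):
    """Hypothetical filter-and-sum beamformer (illustrative sketch).

    x: array of shape (channels, samples), one row per microphone.
    h: array of shape (channels, taps), one FIR filter per microphone.
    Returns y[n] = sum_i sum_k h[i, k] * x[i, n - k], truncated to the
    input length so the output stays aligned with the input samples.
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    if x.ndim != 2 or h.ndim != 2 or x.shape[0] != h.shape[0]:
        raise ValueError("expected 2-D arrays with one filter per channel")
    n = x.shape[1]
    y = np.zeros(n)
    for xi, hi in zip(x, h):
        # Filter this channel with its own FIR filter, then accumulate.
        y += np.convolve(xi, hi)[:n]
    return y


# Usage: two microphones whose impulses arrive one sample apart.
# The (hypothetical) filters delay channel 0 by one sample so both align.
x = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
h = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(filter_and_sum(x, h))  # [0. 2. 0. 0.]
```

In the setting the abstract describes, `h` would be estimated once from a training utterance and then reused, unchanged, on later mixtures, which is only expected to work while the speakers stay at the same locations.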

Related News & Events

NEWS WASPAA 2003: 2 publications by MERL researchers and others
Date: October 20, 2003
Where: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Brief
- The papers "Non-negative Matrix Factorization for Polyphonic Music Transcription" by Smaragdis, P. and Brown, J.C. and "Multi-Channel Source Separation by Beamforming Trained with Factorial HMMs" by Reyes-Gomez, M.J., Raj, B. and Ellis, D.P.W. were presented at the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
