TR2017-107

Multichannel End-to-end Speech Recognition

Ochiai, T., Watanabe, S., Hori, T., Hershey, J.R., "Multichannel End-to-end Speech Recognition", International Conference on Machine Learning (ICML), August 2017.

@inproceedings{Ochiai2017aug,
author = {Ochiai, Tsubasa and Watanabe, Shinji and Hori, Takaaki and Hershey, John R.},
title = {{Multichannel End-to-end Speech Recognition}},
booktitle = {International Conference on Machine Learning (ICML)},
year = 2017,
month = aug,
url = {https://www.merl.com/publications/TR2017-107}
}

Research Areas:

Artificial Intelligence, Speech & Audio

Abstract:

The field of speech recognition is in the midst of a paradigm shift: end-to-end neural networks are challenging the dominance of hidden Markov models as a core technology. Using an attention mechanism in a recurrent encoder-decoder architecture solves the dynamic time alignment problem, allowing joint end-to-end training of the acoustic and language modeling components. In this paper we extend the end-to-end framework to encompass microphone array signal processing for noise suppression and speech enhancement within the acoustic encoding network. This allows the beamforming components to be optimized jointly within the recognition architecture to improve the end-to-end speech recognition objective. Experiments on the noisy speech benchmarks (CHiME-4 and AMI) show that our multichannel end-to-end system outperformed the attention-based baseline with input from a conventional adaptive beamformer.

Related Publications

Ochiai, T., Watanabe, S., Katagiri, S., Hori, T., Hershey, J.R., "Speaker Adaptation for Multichannel End-to-End Speech Recognition", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP.2018.8462161, April 2018, pp. 6707-6711.

@inproceedings{Ochiai2018apr,
author = {Ochiai, Tsubasa and Watanabe, Shinji and Katagiri, Shigeru and Hori, Takaaki and Hershey, John R.},
title = {{Speaker Adaptation for Multichannel End-to-End Speech Recognition}},
booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
year = 2018,
pages = {6707--6711},
month = apr,
doi = {10.1109/ICASSP.2018.8462161},
url = {https://www.merl.com/publications/TR2018-006}
}

Ochiai, T., Watanabe, S., Hori, T., Hershey, J.R., "Multichannel End-to-end Speech Recognition", arXiv, March 2017.

@article{Ochiai2017mar,
author = {Ochiai, Tsubasa and Watanabe, Shinji and Hori, Takaaki and Hershey, John R.},
title = {{Multichannel End-to-end Speech Recognition}},
journal = {arXiv},
year = 2017,
month = mar,
url = {https://arxiv.org/abs/1703.04783}
}
