TR2021-094

Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition

- Moritz, N., Hori, T., Le Roux, J., "Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition", Interspeech, DOI: 10.21437/Interspeech.2021-1693, August 2021, pp. 1822-1826.
@inproceedings{Moritz2021aug,
author = {Moritz, Niko and Hori, Takaaki and {Le Roux}, Jonathan},
title = {{Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition}},
booktitle = {Interspeech},
year = 2021,
pages = {1822--1826},
month = aug,
doi = {10.21437/Interspeech.2021-1693},
url = {https://www.merl.com/publications/TR2021-094}
}
MERL Contact:

Jonathan Le Roux

Research Areas:

Artificial Intelligence, Machine Learning, Speech & Audio

Abstract:

Attention-based end-to-end automatic speech recognition (ASR) systems have recently demonstrated state-of-the-art results for numerous tasks. However, the application of self-attention and attention-based encoder-decoder models remains challenging for streaming ASR, where each word must be recognized shortly after it was spoken. In this work, we present the dual causal/non-causal self-attention (DCN) architecture, which analyzes a fixed number of look-ahead frames at each self-attention layer of a deep neural network, without causing the overall context to grow beyond the look-ahead of a single layer when using multiple DCN layers. DCN is compared to chunk-based and restricted self-attention using streaming transformer and conformer architectures, showing improved ASR performance over restricted self-attention and competitive ASR results compared to chunk-based self-attention, while providing the advantage of frame-synchronously processing input and output frames. Combined with triggered attention, the proposed streaming end-to-end ASR systems obtained state-of-the-art results on the LibriSpeech, HKUST, and Switchboard ASR tasks.
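
The look-ahead claim in the abstract can be illustrated with a small dependency-tracking sketch. It follows, for every frame, the latest input frame that a layer's output can depend on. The function names and the parameter `eps` (look-ahead frames per layer) are hypothetical. The restricted self-attention case follows directly from stacking layers that each see `eps` future frames. The dual-stream wiring is an assumption, not something stated in the abstract: the causal stream is taken to attend only to causal outputs at past and current frames, and the non-causal stream to non-causal outputs at past and current frames plus causal outputs at the `eps` look-ahead frames. This is not the authors' implementation.

```python
def restricted_lookahead(num_layers, eps, num_frames):
    """Per-frame look-ahead (in input frames) after stacking restricted
    self-attention layers that each attend to `eps` future frames."""
    # Layer 0: the representation of frame t depends on input frame t only.
    reach = list(range(num_frames))
    for _ in range(num_layers):
        # Frame t attends to frames 0..t+eps of the previous layer.
        reach = [
            max(reach[: min(t + eps, num_frames - 1) + 1])
            for t in range(num_frames)
        ]
    return [r - t for t, r in enumerate(reach)]


def dcn_lookahead(num_layers, eps, num_frames):
    """Per-frame look-ahead of the non-causal stream under the ASSUMED
    dual-stream wiring described in the lead-in."""
    causal = list(range(num_frames))
    noncausal = list(range(num_frames))
    for _ in range(num_layers):
        # Causal stream: causal outputs of frames 0..t only.
        new_causal = [max(causal[: t + 1]) for t in range(num_frames)]
        new_noncausal = []
        for t in range(num_frames):
            # Past and current frames: non-causal outputs (assumption).
            past = max(noncausal[: t + 1])
            # Look-ahead frames t+1..t+eps: causal outputs (assumption).
            future = causal[t + 1 : min(t + eps, num_frames - 1) + 1]
            new_noncausal.append(max([past] + future))
        causal, noncausal = new_causal, new_noncausal
    return [r - t for t, r in enumerate(noncausal)]


if __name__ == "__main__":
    # Four layers, two look-ahead frames per layer, 50 input frames.
    print(restricted_lookahead(4, 2, 50)[0])  # 8
    print(dcn_lookahead(4, 2, 50)[0])         # 2
```

Under these assumptions the look-ahead of stacked restricted self-attention grows linearly with depth, while the dual-stream variant stays at the look-ahead of a single layer, which is the property the abstract describes.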

Related Publication

Moritz, N., Hori, T., Le Roux, J., "Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition", arXiv, July 2021.


@article{Moritz2021jul,
author = {Moritz, Niko and Hori, Takaaki and {Le Roux}, Jonathan},
title = {{Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition}},
journal = {arXiv},
year = 2021,
month = jul,
url = {https://arxiv.org/abs/2107.01269}
}

MERL Contact:

Jonathan Le Roux
