TR2024-122

ZeroST: Zero-Shot Speech Translation

- Khurana, S., Hori, C., Laurent, A., Wichern, G., Le Roux, J., "ZeroST: Zero-Shot Speech Translation", Interspeech, DOI: 10.21437/Interspeech.2024-1088, September 2024, pp. 392-396.
  @inproceedings{Khurana2024sep,
    author = {Khurana, Sameer and Hori, Chiori and Laurent, Antoine and Wichern, Gordon and {Le Roux}, Jonathan},
    title = {{ZeroST: Zero-Shot Speech Translation}},
    booktitle = {Interspeech},
    year = 2024,
    pages = {392--396},
    month = sep,
    doi = {10.21437/Interspeech.2024-1088},
    issn = {2958-1796},
    url = {https://www.merl.com/publications/TR2024-122}
  }
Research Areas:

Artificial Intelligence, Speech & Audio

Abstract:

Our work introduces the Zero-Shot Speech Translation (ZeroST) framework, leveraging the synergistic potential of pretrained multilingual speech and text foundation models. Inspired by recent advances in multimodal foundation models, ZeroST utilizes a Query Transformer (Q-Former) to seamlessly connect a speech foundation model, such as Whisper or Massively Multilingual Speech (MMS), with a text translation model like No-Language-Left-Behind (NLLB). Our proposed learning framework enables the model to perform the speech-to-text translation in a zero-shot manner, bypassing the need for explicit supervision from expensive-to-collect speech-text translation pairs during training. Our extensive experiments, notably on the Europarl-ST benchmark, demonstrate that ZeroST achieves results comparable to those of a strong cascaded translation system and significantly outperforms baseline models. This promising approach paves the way for future research in zero-shot speech translation.
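
The bridging idea in the abstract can be illustrated with a toy sketch. It is not the ZeroST implementation. It only shows the core mechanism usually associated with a Q-Former: a fixed set of query vectors cross-attends over a variable-length sequence of speech-encoder frames and returns a fixed number of embeddings that a text model could consume. The sizes `DIM` and `NUM_QUERIES`, the helper `fake_speech_features`, and the random values are all hypothetical. A real Q-Former adds learned projections, multiple heads and layers, and trained query parameters.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, features):
    """Single-head cross-attention with no learned projections (a simplification).

    Each query scores every feature frame, and the output for that query is the
    attention-weighted average of the frames.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * fi for qi, fi in zip(q, frame)) for frame in features]
        weights = softmax(scores)
        pooled = [sum(w * frame[d] for w, frame in zip(weights, features)) for d in range(dim)]
        outputs.append(pooled)
    return outputs


rng = random.Random(0)
DIM, NUM_QUERIES = 8, 4  # hypothetical sizes, chosen only for illustration

# In a trained model these queries would be learned parameters; here they are random.
queries = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_QUERIES)]


def fake_speech_features(num_frames):
    """Stand-in for frozen speech-encoder output (hypothetical, random values)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(num_frames)]


short_utterance = cross_attend(queries, fake_speech_features(50))
long_utterance = cross_attend(queries, fake_speech_features(300))

# The number of output embeddings depends on the queries, not on the audio length.
print(len(short_utterance), len(long_utterance))
```

Under these assumptions, both utterances are mapped to the same number of embeddings regardless of how many frames they contain.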

MERL Contacts:

Chiori Hori, Gordon Wichern, Jonathan Le Roux