TR2024-124

PARIS: Pseudo-AutoRegressIve Siamese Training for Online Speech Separation


    •  Pan, Z., Wichern, G., Germain, F.G., Saijo, K., Le Roux, J., "PARIS: Pseudo-AutoRegressIve Siamese Training for Online Speech Separation", Interspeech, September 2024.
      @inproceedings{Pan2024sep,
        author = {Pan, Zexu and Wichern, Gordon and Germain, François G. and Saijo, Kohei and Le Roux, Jonathan},
        title = {PARIS: Pseudo-AutoRegressIve Siamese Training for Online Speech Separation},
        booktitle = {Interspeech},
        year = 2024,
        month = sep,
        url = {https://www.merl.com/publications/TR2024-124}
      }
  • Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio

Abstract:

While offline speech separation models have made significant advances, the streaming regime remains less explored and is typically limited to causal modifications of existing offline networks. This study focuses on empowering a streaming speech separation model with autoregressive capability, in which the separation at the current step is conditioned on separated samples from past steps. To do so, we introduce pseudo-autoregressive Siamese (PARIS) training: with only two forward passes through a Siamese-style network for each batch, PARIS avoids the training-inference mismatch in teacher forcing and the need for numerous autoregressive steps during training. The proposed PARIS training improves the recent online SkiM model by 1.5 dB in SI-SNR on the WSJ0-2mix dataset, with minimal change to the network architecture and inference time.
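To make the two-forward-pass idea concrete, here is a minimal numpy sketch. Everything in it is an assumption for illustration: `toy_separator` is a hypothetical linear stand-in for the actual separation network, the conditioning is simply the previous chunk's estimate, and the loss/gradient steps of real training are omitted. The point it shows is that pass 1 runs without autoregressive context, and pass 2 conditions each chunk on pass-1 outputs (pseudo-labels) rather than ground truth, so training-time conditioning matches inference.

```python
import numpy as np

def toy_separator(chunk, context, w):
    # Hypothetical tiny linear "separator": maps a mixture chunk plus an
    # autoregressive conditioning context to two estimated source chunks.
    x = np.concatenate([chunk, context])
    return (w @ x).reshape(2, -1)

def paris_two_pass(mixture_chunks, w, chunk_len):
    zero_ctx = np.zeros(2 * chunk_len)
    # Pass 1: separate every chunk with no autoregressive context.
    pass1 = [toy_separator(c, zero_ctx, w) for c in mixture_chunks]
    # Pass 2: condition each chunk on the pass-1 estimate of the previous
    # chunk (a proxy for separated past samples). Using the model's own
    # pass-1 outputs instead of ground truth avoids the teacher-forcing
    # train/inference mismatch, with only two forward passes per batch.
    pass2, prev = [], zero_ctx
    for c, p1 in zip(mixture_chunks, pass1):
        pass2.append(toy_separator(c, prev, w))
        prev = p1.reshape(-1)  # pseudo-label context, not ground truth
    return pass1, pass2

# Toy demo with random weights and three mixture chunks.
rng = np.random.default_rng(0)
chunk_len = 4
w = rng.standard_normal((2 * chunk_len, 3 * chunk_len))
chunks = [rng.standard_normal(chunk_len) for _ in range(3)]
pass1, pass2 = paris_two_pass(chunks, w, chunk_len)
```

In a real setup both passes would go through weight-sharing (Siamese) copies of the separator and a loss would be applied to the pass-2 outputs; this sketch only traces the data flow.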