Task-aware Unified Source Separation - Audio Examples

Unifying all source separation tasks under a single task-aware model.

MERL Researchers: Jonathan Le Roux, François Germain, Gordon Wichern (Speech & Audio).

This page provides audio examples for the Task-aware Unified Source Separation (TUSS) model introduced in the paper "Task-aware Unified Source Separation", by Kohei Saijo, Janek Ebbers, François G. Germain, Gordon Wichern, and Jonathan Le Roux, presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2025 (TR2025-032).

As shown in Fig. 1, the model takes learnable prompts that specify which sources to separate, and it changes its behavior according to the prompts it receives.

The audio examples include both synthetic mixtures and real recordings.

Audio examples

We provide audio examples for the following configurations (for some tasks, the specialist model is omitted).

All examples use the medium model, and M* denotes the model ID in Table III of the paper.

  • Mixture: Input mixture.
  • Conventional (task-)specialist model (M2): Unlike the TUSS model, this model takes no prompts. A separate model is trained for each task using only the data for that task, with the number of output channels set to the number of sources in the mixtures.
  • Conventional unified model (M4): The network architecture is almost the same as that of the conventional specialist, but the model always outputs 4 sources. It is trained on all the datasets for all the tasks, and it is trained to output zeros when a mixture contains fewer than 4 sources.
  • Prompting unified model (M5): The proposed TUSS model. It receives several prompts and outputs the specified sources, so the number of outputs equals the number of prompts. Note that the model was trained with at most 4 prompts, yet some of the examples below show successful separation with 5 prompts.
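The difference between the fixed-output unified model (M4) and the prompting model (M5) can be illustrated by how training targets might be assembled. The following is a minimal sketch, not the paper's code: the function names, the prompt strings ("speech", "sfx", "drums"), and the use of plain Python lists as signals are all assumptions made for illustration.

```python
# Illustrative sketch only: names, prompt strings, and data layout are
# assumptions, not taken from the TUSS implementation.
from typing import Dict, List

MAX_SOURCES = 4  # the conventional unified model (M4) always emits 4 outputs


def pad_targets_with_silence(targets: List[List[float]],
                             max_sources: int = MAX_SOURCES) -> List[List[float]]:
    """Pad equal-length target signals with all-zero signals up to max_sources."""
    if not targets:
        raise ValueError("need at least one target source")
    if len(targets) > max_sources:
        raise ValueError("more targets than output channels")
    num_samples = len(targets[0])
    silence = [[0.0] * num_samples for _ in range(max_sources - len(targets))]
    return [list(t) for t in targets] + silence


def select_targets_by_prompt(sources: Dict[str, List[float]],
                             prompts: List[str]) -> List[List[float]]:
    """Return one target per prompt, in prompt order (prompt-driven setup, M5)."""
    return [list(sources[p]) for p in prompts]


# Toy three-sample "signals" standing in for audio waveforms.
speech = [0.1, -0.2, 0.3]
sfx = [0.0, 0.5, -0.5]

# M4-style targets: two real sources plus two silent channels.
fixed_targets = pad_targets_with_silence([speech, sfx])

# M5-style targets: exactly as many outputs as prompts.
prompted_targets = select_targets_by_prompt({"speech": speech, "sfx": sfx}, ["speech"])

print(len(fixed_targets), len(prompted_targets))  # 4 1
```

In this sketch the fixed-output setup must learn to emit silence on unused channels, whereas the prompt-driven setup ties the output count directly to the number of prompts.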

Example 1 on DnR dataset

Example 2 on DnR dataset

Example on FMA dataset

Example on WHAM! noise (Speech + SFX)

Example on WHAM! noise (Drums + SFX)


MERL Publications

  •  Saijo, K., Ebbers, J., Germain, F.G., Wichern, G., Le Roux, J., "Task-Aware Unified Source Separation", arXiv, October 2024.
    @article{Saijo2024oct,
      author = {Saijo, Kohei and Ebbers, Janek and Germain, François G and Wichern, Gordon and {Le Roux}, Jonathan},
      title = {{Task-Aware Unified Source Separation}},
      journal = {arXiv},
      year = 2024,
      month = oct,
      url = {https://arxiv.org/abs/2410.23987v1}
    }
  •  Saijo, K., Ebbers, J., Germain, F.G., Wichern, G., Le Roux, J., "Task-Aware Unified Source Separation", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2025.
    @inproceedings{Saijo2025mar,
      author = {Saijo, Kohei and Ebbers, Janek and Germain, François G and Wichern, Gordon and {Le Roux}, Jonathan},
      title = {{Task-Aware Unified Source Separation}},
      booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
      year = 2025,
      month = mar,
      url = {https://www.merl.com/publications/TR2025-032}
    }