TR2017-010

Deep Clustering and Conventional Networks for Music Separation: Strong Together


    •  Luo, Y., Chen, Z., Hershey, J.R., Le Roux, J., Mesgarani, N., "Deep Clustering and Conventional Networks for Music Separation: Strong Together", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2017.
      BibTeX TR2017-010 PDF
      @inproceedings{Luo2017mar,
        author = {Luo, Yi and Chen, Zhuo and Hershey, John R. and Le Roux, Jonathan and Mesgarani, Nima},
        title = {Deep Clustering and Conventional Networks for Music Separation: Strong Together},
        booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
        year = 2017,
        month = mar,
        url = {https://www.merl.com/publications/TR2017-010}
      }
  • Research Areas: Artificial Intelligence, Speech & Audio

Abstract:

Deep clustering is the first method to handle general audio separation scenarios with multiple sources of the same type and an arbitrary number of sources, performing impressively in speaker-independent speech separation tasks. However, little is known about its effectiveness in other challenging situations such as music source separation. Contrary to conventional networks that directly estimate the source signals, deep clustering generates an embedding for each time-frequency bin, and separates sources by clustering the bins in the embedding space. We show that deep clustering outperforms conventional networks on a singing voice separation task, in both matched and mismatched conditions, even though conventional networks have the advantage of end-to-end training for best signal approximation, presumably because its more flexible objective engenders better regularization. Since the strengths of deep clustering and conventional network architectures appear complementary, we explore combining them in a single hybrid network trained via an approach akin to multi-task learning. Remarkably, the combination significantly outperforms either of its components.
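The separation step described above can be illustrated with a small sketch (not the authors' code): a trained network would emit one embedding per time-frequency bin, and sources are recovered by clustering those embeddings and turning the cluster assignments into binary spectrogram masks. Here the embeddings are random stand-ins, and a simple k-means written with numpy plays the role of the clustering step.

```python
import numpy as np

def cluster_bins(embeddings, n_sources, n_iter=50, seed=0):
    """Cluster per-bin embeddings into n_sources groups; return binary masks.

    embeddings: (T, F, D) array, one D-dim embedding per time-frequency bin,
    standing in for the output of a trained deep clustering network.
    """
    T, F, D = embeddings.shape
    X = embeddings.reshape(-1, D)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_sources, replace=False)]
    for _ in range(n_iter):
        # Assign each T-F bin to its nearest centroid.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # Update each centroid to the mean of its assigned bins.
        for k in range(n_sources):
            pts = X[labels == k]
            if len(pts):
                centers[k] = pts.mean(0)
    # One binary mask per source; each mask selects the mixture's
    # spectrogram bins attributed to that source.
    return np.stack([(labels == k).reshape(T, F) for k in range(n_sources)])

# Toy usage: 100 frames, 257 frequency bins, 20-dim embeddings, 2 sources.
emb = np.random.default_rng(1).normal(size=(100, 257, 20))
masks = cluster_bins(emb, n_sources=2)
```

Each bin is assigned to exactly one source, so the masks partition the spectrogram; multiplying the mixture spectrogram by each mask (and inverting) yields the separated signals.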

 

  • Related News & Events

    •  NEWS   MERL to present 10 papers at ICASSP 2017
      Date: March 5, 2017 - March 9, 2017
      Where: New Orleans
      MERL Contacts: Petros T. Boufounos; Takaaki Hori; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Anthony Vetro; Ye Wang
      Research Areas: Computer Vision, Computational Sensing, Digital Video, Information Security, Speech & Audio
      Brief
      • MERL researchers will present 10 papers at the upcoming IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), to be held in New Orleans from March 5-9, 2017. Topics to be presented include recent advances in speech recognition and audio processing; graph signal processing; computational imaging; and privacy-preserving data analysis.

        ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.