Deep NMF for Speech Separation


Non-negative matrix factorization (NMF) has been widely used for challenging single-channel audio source separation tasks. However, inference in NMF-based models relies on iterative inference methods, typically formulated as multiplicative updates. We propose "deep NMF", a novel non-negative deep network architecture which results from unfolding the NMF iterations and untying its parameters. This architecture can be discriminatively trained for optimal separation performance. To optimize its non-negative parameters, we show how a new form of back-propagation, based on multiplicative updates, can be used to preserve non-negativity, without the need for constrained optimization. We show on a challenging speech separation task that deep NMF improves in terms of accuracy upon NMF and is competitive with conventional sigmoid deep neural networks, while requiring a tenth of the number of parameters.


  • Related News & Events

    •  NEWS   John Hershey gives talk at MIT on Deep Unfolding
      Date: April 28, 2015
      • MERL researcher and speech team leader, John Hershey, gave a talk at MIT entitled, "Deep Unfolding: Deriving Novel Deep Network Architectures from Model-based Inference Methods" on April 28, 2015.

        Abstract: Model-based methods and deep neural networks have both been tremendously successful paradigms in machine learning. In model-based methods, problem domain knowledge can be built into the constraints of the model, typically at the expense of difficulties during inference. In contrast, deterministic deep neural networks are constructed in such a way that inference is straightforward, but their architectures are rather generic and it can be unclear how to incorporate problem domain knowledge. This work aims to obtain some of the advantages of both approaches. To do so, we start with a model-based approach and unfold the iterations of its inference method to form a layer-wise structure. This results in novel neural-network-like architectures that incorporate our model-based constraints, but can be trained discriminatively to perform fast and accurate inference. This framework allows us to view conventional sigmoid networks as a special case of unfolding Markov random field inference, and leads to other interesting generalizations. We show how it can be applied to other models, such as non-negative matrix factorization, to obtain a new kind of non-negative deep neural network that can be trained using a multiplicative back propagation-style update algorithm. In speech enhancement experiments we show that our approach is competitive with conventional neural networks, while using fewer parameters.
    •  NEWS   Multimedia Group researchers presented 8 papers at ICASSP 2015
      Date: April 19, 2015 - April 24, 2015
      Where: IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP)
      MERL Contacts: Anthony Vetro; Hassan Mansour; Petros T. Boufounos; Jonathan Le Roux
      • Multimedia Group researchers have presented 8 papers at the recent IEEE International Conference on Acoustics, Speech & Signal Processing, which was held in Brisbane, Australia from April 19-24, 2015.