TR2013-020

Effectiveness of Discriminative Training and Feature Transformation for Reverberated and Noisy Speech


    •  Tachioka, Y., Watanabe, S., Hershey, J.R., "Effectiveness of Discriminative Training and Feature Transformation for Reverberated and Noisy Speech", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2013.
      BibTeX:
      @inproceedings{Tachioka2013may,
        author = {Tachioka, Y. and Watanabe, S. and Hershey, J.R.},
        title = {Effectiveness of Discriminative Training and Feature Transformation for Reverberated and Noisy Speech},
        booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
        year = 2013,
        month = may,
        url = {https://www.merl.com/publications/TR2013-020}
      }
  • Research Areas:

    Artificial Intelligence, Speech & Audio

Abstract:

Automatic speech recognition in the presence of non-stationary interference and reverberation remains a challenging problem. The 2nd Annual Speech Separation and Recognition Challenge introduces a new and difficult task with time-varying reverberation and non-stationary interference, including natural background speech, home noises, and music. This paper establishes baselines using state-of-the-art ASR techniques, such as discriminative training and various feature transformations, on the middle-vocabulary sub-task of this challenge. In addition, we propose an augmented discriminative feature transformation that introduces arbitrary features to a discriminative feature transformation. We present experimental results showing that discriminative training of model parameters and feature transforms is highly effective for this task, and that the augmented feature transformation provides some preliminary benefits. The training code will be released as an advanced ASR baseline.

 

  • Related News & Events

    •  NEWS    ICASSP 2013: 9 publications by Jonathan Le Roux, Dehong Liu, Robert A. Cohen, Dong Tian, Shantanu D. Rane, Jianlin Guo, John R. Hershey, Shinji Watanabe, Petros T. Boufounos, Zafer Sahinoglu and Anthony Vetro
      Date: May 26, 2013
      Where: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
      MERL Contacts: Dehong Liu; Jianlin Guo; Anthony Vetro; Petros T. Boufounos; Jonathan Le Roux
      Brief
      • The following papers were presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP):
        • "Stereo-based Feature Enhancement Using Dictionary Learning" by Watanabe, S. and Hershey, J.R.
        • "Effectiveness of Discriminative Training and Feature Transformation for Reverberated and Noisy Speech" by Tachioka, Y., Watanabe, S. and Hershey, J.R.
        • "Non-negative Dynamical System with Application to Speech and Audio" by Fevotte, C., Le Roux, J. and Hershey, J.R.
        • "Source Localization in Reverberant Environments using Sparse Optimization" by Le Roux, J., Boufounos, P.T., Kang, K. and Hershey, J.R.
        • "A Keypoint Descriptor for Alignment-Free Fingerprint Matching" by Garg, R. and Rane, S.
        • "Transient Disturbance Detection for Power Systems with a General Likelihood Ratio Test" by Song, JX., Sahinoglu, Z. and Guo, J.
        • "Disparity Estimation of Misaligned Images in a Scanline Optimization Framework" by Rzeszutek, R., Tian, D. and Vetro, A.
        • "Screen Content Coding for HEVC Using Edge Modes" by Hu, S., Cohen, R.A., Vetro, A. and Kuo, C.C.J.
        • "Random Steerable Arrays for Synthetic Aperture Imaging" by Liu, D. and Boufounos, P.T.