- Date: June 1, 2013
Where: International Workshop on Machine Listening in Multisource Environments (CHiME)
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - The paper "Discriminative Methods for Noise Robust Speech Recognition: A CHiME Challenge Benchmark" by Tachioka, Y., Watanabe, S., Le Roux, J. and Hershey, J.R. was presented at the International Workshop on Machine Listening in Multisource Environments (CHiME).
-
- Date: June 1, 2013
Awarded to: Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux and John R. Hershey
Awarded for: "Discriminative Methods for Noise Robust Speech Recognition: A CHiME Challenge Benchmark"
Awarded by: International Workshop on Machine Listening in Multisource Environments (CHiME)
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - The results of the 2nd CHiME Speech Separation and Recognition Challenge are out! The team formed by MELCO researcher Yuuki Tachioka and MERL Speech & Audio team researchers Shinji Watanabe, Jonathan Le Roux and John Hershey obtained the best results in the continuous speech recognition task (Track 2). This very challenging task consisted of recognizing speech corrupted by highly non-stationary noise recorded in a real living room. Our proposal, which also included a simple yet highly efficient denoising front-end, focused on investigating and developing state-of-the-art automatic speech recognition back-end techniques: feature transformation methods, as well as discriminative training methods for acoustic and language modeling. Our system significantly outperformed those of the other participants. Our code has since been released as an improved baseline for the community to use.
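As a rough illustration of what a simple denoising front-end can look like, here is a spectral-subtraction sketch in Python. This is a generic textbook technique, not the team's actual front-end; the over-subtraction factor, spectral floor, and toy spectrogram sizes are all illustrative assumptions.

```python
import numpy as np

def spectral_subtraction(noisy_mag, noise_mag, alpha=2.0, floor=0.05):
    # Subtract an over-estimated noise magnitude from the noisy magnitude
    # spectrogram, then floor the result at a fraction of the noisy
    # magnitude to avoid negative values (a common musical-noise control).
    clean = noisy_mag - alpha * noise_mag
    return np.maximum(clean, floor * noisy_mag)

rng = np.random.default_rng(0)
noisy = rng.random((257, 100)) + 1.0      # toy |STFT| magnitudes, 257 bins x 100 frames
noise = np.full((257, 1), 0.2)            # noise estimate, e.g. from non-speech frames
enhanced = spectral_subtraction(noisy, noise)
```

In practice the enhanced magnitudes would be recombined with the noisy phase and inverted back to a waveform before recognition.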
-
- Date: May 26, 2013
Where: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
MERL Contacts: Dehong Liu; Jianlin Guo; Anthony Vetro; Petros T. Boufounos; Jonathan Le Roux
Brief - The papers "Stereo-based Feature Enhancement Using Dictionary Learning" by Watanabe, S. and Hershey, J.R., "Effectiveness of Discriminative Training and Feature Transformation for Reverberated and Noisy Speech" by Tachioka, Y., Watanabe, S. and Hershey, J.R., "Non-negative Dynamical System with Application to Speech and Audio" by Fevotte, C., Le Roux, J. and Hershey, J.R., "Source Localization in Reverberant Environments using Sparse Optimization" by Le Roux, J., Boufounos, P.T., Kang, K. and Hershey, J.R., "A Keypoint Descriptor for Alignment-Free Fingerprint Matching" by Garg, R. and Rane, S., "Transient Disturbance Detection for Power Systems with a General Likelihood Ratio Test" by Song, JX., Sahinoglu, Z. and Guo, J., "Disparity Estimation of Misaligned Images in a Scanline Optimization Framework" by Rzeszutek, R., Tian, D. and Vetro, A., "Screen Content Coding for HEVC Using Edge Modes" by Hu, S., Cohen, R.A., Vetro, A. and Kuo, C.C.J. and "Random Steerable Arrays for Synthetic Aperture Imaging" by Liu, D. and Boufounos, P.T. were presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
-
- Date & Time: Tuesday, May 7, 2013; 2:30 PM
Speaker: Dr. Yotaro Kubo, NTT Communication Science Laboratories, Kyoto, Japan
Research Area: Speech & Audio
Abstract - Kernel methods are important for achieving both convexity in estimation and the ability to represent nonlinear classification. However, kernel methods have not conventionally been widely used in the field of automatic speech recognition. In this presentation, I will introduce several attempts to practically incorporate kernel methods into acoustic models for automatic speech recognition. The presentation will consist of two parts. The first part describes maximum entropy discrimination and its application to kernel machine training. The second part describes dimensionality reduction of kernel-based features.
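To give a concrete flavor of dimensionality reduction for kernel-based features, here is a minimal Nyström-approximation sketch in Python. This is a standard kernel-approximation technique used as an illustration, not necessarily the speaker's method; the RBF kernel, landmark count, and feature sizes are assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def nystroem_features(X, landmarks, gamma=0.5):
    # Low-dimensional explicit features whose inner products approximate
    # the full kernel matrix: Z Z^T ~= K(X, X).
    K_mm = rbf_kernel(landmarks, landmarks, gamma)
    K_nm = rbf_kernel(X, landmarks, gamma)
    w, U = np.linalg.eigh(K_mm)           # eigendecomposition of landmark kernel
    w = np.maximum(w, 1e-12)              # guard against tiny eigenvalues
    return K_nm @ U / np.sqrt(w)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 13))        # e.g. 200 frames of 13-dim MFCC-like features
Z = nystroem_features(X, X[:20])          # 20 landmark frames -> 20-dim features
K_approx = Z @ Z.T
K_full = rbf_kernel(X, X)
```

The resulting 20-dimensional features can then feed a linear classifier, keeping the nonlinearity of the kernel at a fraction of the cost.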
-
- Date & Time: Monday, January 28, 2013; 11:00 AM
Speaker: Prof. Jen-Tzung Chien, National Chiao Tung University, Taiwan
Research Area: Speech & Audio
Abstract - Bayesian learning provides attractive tools to model, analyze, search, recognize and understand real-world data. In this talk, I will introduce a new Bayesian group sparse learning approach and its application to speech recognition and signal separation. First, I present the group sparse hidden Markov models (GS-HMMs), in which a sequence of acoustic features is driven by a Markov chain and each feature vector is represented by two groups of basis vectors, one capturing features across states and the other capturing features within states. The sparse prior is imposed by introducing the Laplacian scale mixture (LSM) distribution, and the resulting robustness of speech recognition is illustrated. The LSM distribution is also incorporated into Bayesian group sparse learning based on nonnegative matrix factorization (NMF). This approach is developed to estimate the reconstructed rhythmic and harmonic music signals from a single-channel source signal, and a Monte Carlo procedure is presented to infer the two groups of parameters. Future work on Bayesian learning will also be discussed.
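As a simple sketch of the two-group NMF idea, the following Python snippet factors a magnitude spectrogram with the basis split into two groups (e.g. harmonic vs. rhythmic). Plain Lee-Seung multiplicative updates stand in here for the talk's Bayesian LSM-prior inference via Monte Carlo; the group sizes and toy data are assumptions.

```python
import numpy as np

def nmf_two_groups(V, r1=4, r2=4, n_iter=200, seed=0):
    # Factor V ~= W H with nonnegative W, H, where the first r1 basis
    # columns form group 1 and the remaining r2 columns form group 2.
    rng = np.random.default_rng(seed)
    F, T = V.shape
    r = r1 + r2
    W = rng.random((F, r)) + 1e-3
    H = rng.random((r, T)) + 1e-3
    for _ in range(n_iter):
        # Standard multiplicative updates for the Frobenius-norm objective
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    V1 = W[:, :r1] @ H[:r1]     # reconstruction from group-1 bases
    V2 = W[:, r1:] @ H[r1:]     # reconstruction from group-2 bases
    return V1, V2

rng = np.random.default_rng(1)
V = rng.random((64, 100))                 # toy magnitude spectrogram
V1, V2 = nmf_two_groups(V)
```

The two partial reconstructions V1 and V2 are what a separation system would turn back into the two source signals.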
-
- Date & Time: Tuesday, December 11, 2012; 12:00 PM
Speaker: Takahiro Oku, NHK Science & Technology Research Laboratories
Research Area: Speech & Audio
Abstract - In this talk, I will present human-friendly broadcasting research conducted at NHK and research on speech recognition for real-time closed-captioning. The goal of human-friendly broadcasting research is to make broadcasting more accessible and enjoyable for everyone, including children, the elderly, and physically challenged persons. The automatic speech recognition technology that NHK has developed makes it possible to create captions for the hearing impaired automatically in real time. For sports programs such as professional sumo wrestling, a closed-captioning system has already been implemented in which captions are created by applying speech recognition to a captioning re-speaker. In 2011, NHK General Television started broadcasting closed captions for the information program "Morning Market". After introducing the implemented closed-captioning system, I will describe a recent improvement obtained with an adaptation method that creates a more effective acoustic model from error-correction results, so that recognition error tendencies are reflected more effectively.
-
- Date: December 6, 2012
Where: APSIPA Transactions on Signal and Information Processing
Research Area: Speech & Audio
Brief - The article "Bayesian Approaches to Acoustic Modeling: A Review" by Watanabe, S. and Nakamura, A. was published in APSIPA Transactions on Signal and Information Processing.
-
- Date: November 1, 2012
Where: IEEE Signal Processing Magazine
Research Area: Speech & Audio
Brief - The article "Structured Discriminative Models For Speech Recognition" by Gales, M., Watanabe, S. and Fosler-Lussier, E. was published in IEEE Signal Processing Magazine.
-
- Date & Time: Thursday, September 6, 2012; 12:00 PM
Speaker: Dr. Daisuke Saito, The University of Tokyo
Research Area: Speech & Audio
Abstract - In voice conversion studies, realizing conversion from or to an arbitrary speaker's voice is an important objective. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice Gaussian mixture model (EV-GMM) was proposed. In EVC, similarly to speaker recognition approaches, a speaker space is constructed from GMM supervectors, which are high-dimensional vectors derived by concatenating the mean vectors of each speaker GMM. In this space, each speaker is represented by a small number of weights on eigen-supervectors. In this talk, we revisit the construction of the speaker space by introducing tensor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond to the Gaussian components and the dimensions of the mean vectors, respectively, and the speaker space is derived by tensor analysis of the set of these matrices. Our approach solves an inherent problem of the supervector representation and improves the performance of voice conversion. Experimental results on one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
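To make the supervector representation concrete, here is a toy Python sketch of the conventional eigenvoice construction that the talk revisits: stack each speaker's GMM mean vectors into a supervector, run PCA over training speakers, and represent a speaker by a few weights on the eigen-supervectors. All sizes are hypothetical, and this sketches the baseline supervector approach, not the proposed tensor analysis.

```python
import numpy as np

# Hypothetical sizes: 30 training speakers, M = 8 Gaussian components,
# D = 13-dimensional mean vectors per component.
rng = np.random.default_rng(0)
n_speakers, M, D = 30, 8, 13
means = rng.standard_normal((n_speakers, M, D))   # toy speaker-GMM means

# Supervector: concatenate the M mean vectors of each speaker GMM.
S = means.reshape(n_speakers, M * D)

# PCA over training speakers yields the eigen-supervectors.
mu = S.mean(axis=0)
U, s, Vt = np.linalg.svd(S - mu, full_matrices=False)

k = 5                                     # keep a small number of eigenvoices
weights = (S - mu) @ Vt[:k].T             # each speaker -> k weight parameters
S_hat = mu + weights @ Vt[:k]             # supervectors reconstructed from k weights
```

Adapting to a new target speaker then amounts to estimating just the k weights, rather than all M * D mean parameters.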
-
- Date: March 15, 2011
Where: Machine Vision and Applications
Research Area: Machine Learning
Brief - The article "In-vehicle Camera Traffic Sign Detection and Recognition" by Ruta, A., Porikli, F.M., Watanabe, S. and Li, Y. was published in Machine Vision and Applications.
-
- Date: May 20, 2009
Where: IAPR Conference on Machine Vision Applications (MVA)
Research Area: Machine Learning
Brief - The paper "A New Approach for In-Vehicle Camera Traffic Sign Detection and Recognition" by Ruta, A., Porikli, F., Li, Y., Watanabe, S., Kage, H. and Sumi, K. was presented at the IAPR Conference on Machine Vision Applications (MVA).
-