News & Events

NEWS Jonathan Le Roux gives Plenary Lecture at the JSALT 2020 Summer Workshop
Date: July 10, 2020
Where: Virtual Baltimore, MD
MERL Contact: Jonathan Le Roux
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief
- MERL Senior Principal Research Scientist and Speech and Audio Senior Team Leader Jonathan Le Roux was invited by the Center for Language and Speech Processing at Johns Hopkins University to give a plenary lecture at the 2020 Frederick Jelinek Memorial Summer Workshop on Speech and Language Technology (JSALT). The talk, entitled "Deep Learning for Multifarious Speech Processing: Tackling Multiple Speakers, Microphones, and Languages", presented an overview of deep learning techniques developed at MERL towards the goal of cracking the Tower of Babel version of the cocktail party problem, that is, separating and/or recognizing the speech of multiple unknown speakers speaking simultaneously in multiple languages, in both single-channel and multi-channel scenarios: from deep clustering to chimera networks, phasebook and friends, and from seamless ASR to MIMO-Speech and Transformer-based multi-speaker ASR.
  
  JSALT 2020 is the seventh in a series of six-week-long research workshops on Machine Learning for Speech Language and Computer Vision Technology. A continuation of the well known Johns Hopkins University summer workshops, these workshops bring together diverse "dream teams" of leading professionals, graduate students, and undergraduates, in a truly cooperative, intensive, and substantive effort to advance the state of the science. MERL researchers led such teams in the JSALT 2015 workshop, on "Far-Field Speech Enhancement and Recognition in Mismatched Settings", and the JSALT 2018 workshop, on "Multi-lingual End-to-End Speech Recognition for Incomplete Data".
NEWS MERL presenting 13 papers and an industry talk at ICASSP 2020
Date: May 4, 2020 - May 8, 2020
Where: Virtual Barcelona
MERL Contacts: Petros T. Boufounos; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Yanting Ma; Hassan Mansour; Philip V. Orlik; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
Research Areas: Computational Sensing, Computer Vision, Machine Learning, Signal Processing, Speech & Audio
Brief
- MERL researchers are presenting 13 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held virtually from May 4-8, 2020. Petros Boufounos is also presenting a talk on the Computational Sensing Revolution in Array Processing (video) in ICASSP’s Industry Track, and Siheng Chen is co-organizing and chairing a special session on a Signal-Processing View of Graph Neural Networks.
  
  Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, array processing, and parameter estimation. Videos for all talks are available on MERL's YouTube channel, with corresponding links in the references below.
  
  This year again, MERL is a sponsor of the conference and will be participating in the Student Job Fair; please join us to learn about our internship program and career opportunities.
  
  ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year. Originally planned to be held in Barcelona, Spain, ICASSP has moved to a fully virtual setting due to the COVID-19 crisis, with free registration for participants not covering a paper.
AWARD Best Paper Award at the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
Date: December 18, 2019
Awarded to: Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe
MERL Contact: Jonathan Le Roux
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief
- MERL researcher Jonathan Le Roux and co-authors Xuankai Chang, Shinji Watanabe (Johns Hopkins University), Wangyou Zhang, and Yanmin Qian (Shanghai Jiao Tong University) won the Best Paper Award at the 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019), for the paper "MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition". MIMO-Speech is a fully neural end-to-end framework that can transcribe the text of multiple speakers speaking simultaneously from multi-channel input. The system consists of a monaural masking network, a multi-source neural beamformer, and a multi-output speech recognition model, which are jointly optimized only via an automatic speech recognition (ASR) criterion. The award was received by lead author Xuankai Chang during the conference, which was held in Sentosa, Singapore from December 14-18, 2019.
EVENT SANE 2019 - Speech and Audio in the Northeast
Date: Thursday, October 24, 2019
Location: Columbia University, New York, NY
MERL Contact: Jonathan Le Roux
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief
- SANE 2019, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, was held on Thursday October 24, 2019 at Columbia University, in New York City.
  
  It was the 8th edition in the SANE series of workshops, which started in 2012 and has been held every year alternately in Boston and New York. Since the first edition, the audience has steadily grown, with a previous record of 180 participants in 2017 and 2018, and a new record of 200 participants and 45 posters in 2019.
  
  This year's SANE conveniently took place in conjunction both with the WASPAA workshop, held October 20-23 in upstate New York, and with the DCASE workshop, held October 25-26 in Brooklyn, NY, for a full week of speech and audio enlightenment and delight.
  
  SANE 2019 featured invited talks by seven leading researchers from the Northeast as well as from the international community: Brian Kingsbury (IBM TJ Watson Research Center), Kristen Grauman (University of Texas at Austin, Facebook AI Research), Simon Doclo (University of Oldenburg), Karen Livescu (TTI-Chicago), Gabriel Synnaeve (Facebook AI Research), Hirokazu Kameoka (NTT Communication Science Laboratories), Ron Weiss (Google Brain). It also featured live demonstrations by Jonathan Le Roux (MERL) and Andrew Titus (Apple), and a lively poster session with 45 posters in Columbia University's Low Memorial Library, a National Historic Landmark.
  
  SANE 2019 was co-organized by Jonathan Le Roux (MERL), Nima Mesgarani (Columbia), John R. Hershey (Google), Shinji Watanabe (Johns Hopkins), and Steven J. Rennie (Pryon Inc.). SANE remained a free event thanks to generous sponsorship by Columbia University, MERL, Google, Apple, and Amazon.
  
  Slides and videos of the talks are available from the SANE workshop website.
NEWS MERL Speech & Audio Researchers Presenting 7 Papers and a Tutorial at Interspeech 2019
Date: September 15, 2019 - September 19, 2019
Where: Graz, Austria
MERL Contacts: Chiori Hori; Jonathan Le Roux; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief
- MERL Speech & Audio Team researchers will be presenting 7 papers at the 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019), which is being held in Graz, Austria from September 15-19, 2019. Topics to be presented include recent advances in end-to-end speech recognition, speech separation, and audio-visual scene-aware dialog. Takaaki Hori is also co-presenting a tutorial on end-to-end speech processing.
  
  Interspeech is the world's largest and most comprehensive conference on the science and technology of spoken language processing. It gathers around 2000 participants from all over the world.
NEWS MERL presenting 16 papers at ICASSP 2019
Date: May 12, 2019 - May 17, 2019
Where: Brighton, UK
MERL Contacts: Petros T. Boufounos; Anoop Cherian; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Tim K. Marks; Philip V. Orlik; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
Research Areas: Computational Sensing, Computer Vision, Machine Learning, Signal Processing, Speech & Audio
Brief
- MERL researchers will be presenting 16 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Brighton, UK from May 12-17, 2019. Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, and parameter estimation. MERL is also a sponsor of the conference and will be participating in the student career luncheon; please join us at the lunch to learn about our internship program and career opportunities.
  
  ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
NEWS MERL's seamless speech recognition technology featured in Mitsubishi Electric Corporation press release
Date: February 13, 2019
Where: Tokyo, Japan
MERL Contacts: Jonathan Le Roux; Gordon Wichern
Research Area: Speech & Audio
Brief
- Mitsubishi Electric Corporation announced that it has developed the world's first technology capable of highly accurate multilingual speech recognition without being informed which language is being spoken. The novel technology, Seamless Speech Recognition, incorporates Mitsubishi Electric's proprietary Maisart compact AI technology and is built on a single system that can simultaneously identify and understand spoken languages. In tests involving 5 languages, the system achieved recognition with over 90 percent accuracy, without being informed which language was being spoken. When incorporating 5 more languages with lower resources, accuracy remained above 80 percent. The technology can also understand multiple people speaking either the same or different languages simultaneously. A live demonstration involving a multilingual airport guidance system took place on February 13 in Tokyo, Japan. It was widely covered by the Japanese media, with reports by all six main Japanese TV stations and multiple articles in print and online newspapers, including in Japan's top newspaper, Asahi Shimbun. The technology is based on recent research by MERL's Speech and Audio team.
  
  Link:
  
  Mitsubishi Electric Corporation Press Release
  
  Media Coverage:
  
  NHK, News (Japanese)
  NHK World, News (English), video report (starting at 4'38")
  TV Asahi, ANN news (Japanese)
  Nippon TV, News24 (Japanese)
  Fuji TV, Prime News Alpha (Japanese)
  TV Tokyo, World Business Satellite (Japanese)
  TV Tokyo, Morning Satellite (Japanese)
  TBS, News, N Studio (Japanese)
  The Asahi Shimbun (Japanese)
  The Nikkei Shimbun (Japanese)
  Nikkei xTech (Japanese)
  Response (Japanese).
EVENT SANE 2018 - Speech and Audio in the Northeast
Date: Thursday, October 18, 2018
Location: Google, Cambridge, MA
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief
- SANE 2018, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Thursday October 18, 2018 at Google, in Cambridge, MA. MERL is one of the organizers and sponsors of the workshop.
  
  It is the 7th edition in the SANE series of workshops, which started at MERL in 2012. Since the first edition, the audience has steadily grown, with a record 180 participants in 2017.
  
  SANE 2018 will feature invited talks by leading researchers from the Northeast, as well as from the international community. It will also feature a lively poster session, open to both students and researchers.
NEWS Takaaki Hori leads speech technology workshop
Date: June 25, 2018 - August 3, 2018
Where: Johns Hopkins University, Baltimore, MD
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief
- MERL Speech & Audio Team researcher Takaaki Hori led a team of 27 senior researchers and Ph.D. students from different organizations around the world, working on "Multi-lingual End-to-End Speech Recognition for Incomplete Data" as part of the Jelinek Memorial Summer Workshop on Speech and Language Technology (JSALT). The JSALT workshop is a renowned 6-week hands-on workshop held yearly since 1995. This year, the workshop was held at Johns Hopkins University in Baltimore from June 25 to August 3, 2018. Takaaki's team developed new methods for end-to-end Automatic Speech Recognition (ASR) with a focus on low-resource languages with limited labelled data.
  
  End-to-end ASR can significantly reduce the burden of developing ASR systems for new languages, by eliminating the need for linguistic information such as pronunciation dictionaries. Some end-to-end systems have recently achieved performance comparable to or better than conventional systems in several tasks. However, current model training algorithms require paired data, i.e., speech data and the corresponding transcription. A sufficient amount of such complete data is usually unavailable for minor languages, and creating such data sets is very expensive and time-consuming.
  
  The goal of Takaaki's team project was to expand the applicability of end-to-end models to multilingual ASR, and to develop new technology that would make it possible to build highly accurate systems even for low-resource languages without a large amount of paired data. Some major accomplishments of the team include building multi-lingual end-to-end ASR systems for 17 languages, developing novel architectures and training methods for end-to-end ASR, building an end-to-end ASR-TTS (text-to-speech) chain for unpaired data training, and developing ESPnet, an open-source end-to-end speech processing toolkit. Three papers stemming from the team's work have already been accepted to the 2018 IEEE Spoken Language Technology Workshop (SLT), with several more to be submitted to upcoming conferences.
AWARD Best Student Paper Award at IEEE ICASSP 2018
Date: April 17, 2018
Awarded to: Zhong-Qiu Wang
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief
- Former MERL intern Zhong-Qiu Wang (Ph.D. Candidate at Ohio State University) has received a Best Student Paper Award at the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018) for the paper "Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation" by Zhong-Qiu Wang, Jonathan Le Roux, and John Hershey. The paper presents work performed during Zhong-Qiu's internship at MERL in the summer of 2017, extending MERL's pioneering Deep Clustering framework for speech separation to a multi-channel setup. The award was received on behalf of Zhong-Qiu by MERL researcher and co-author Jonathan Le Roux during the conference, held in Calgary April 15-20.
NEWS MERL presenting 9 papers at ICASSP 2018
Date: April 15, 2018 - April 20, 2018
Where: Calgary, AB
MERL Contacts: Petros T. Boufounos; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Philip V. Orlik; Pu (Perry) Wang
Research Areas: Computational Sensing, Digital Video, Speech & Audio
Brief
- MERL researchers are presenting 9 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Calgary from April 15-20, 2018. Topics to be presented include recent advances in speech recognition, audio processing, and computational sensing. MERL is also a sponsor of the conference.
  
  ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
TALK Theory and Applications of Sparse Model-Based Recurrent Neural Networks
Date & Time: Tuesday, March 6, 2018; 12:00 PM
Speaker: Scott Wisdom, Affectiva
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
Abstract
- Recurrent neural networks (RNNs) are effective, data-driven models for sequential data, such as audio and speech signals. However, like many deep networks, RNNs are essentially black boxes; though they are effective, their weights and architecture are not directly interpretable by practitioners. A major component of my dissertation research is explaining the success of RNNs and constructing new RNN architectures through the process of "deep unfolding," which can construct and explain deep network architectures using an equivalence to inference in statistical models. Deep unfolding yields principled initializations for training deep networks, provides insight into their effectiveness, and assists with interpretation of what these networks learn.
  
  In particular, I will show how RNNs with rectified linear units and residual connections are a particular deep unfolding of a sequential version of the iterative shrinkage-thresholding algorithm (ISTA), a simple and classic algorithm for solving L1-regularized least-squares. This equivalence allows interpretation of state-of-the-art unitary RNNs (uRNNs) as an unfolded sparse coding algorithm. I will also describe a new type of RNN architecture called deep recurrent nonnegative matrix factorization (DR-NMF). DR-NMF is an unfolding of a sparse NMF model of nonnegative spectrograms for audio source separation. Both of these networks outperform conventional LSTM networks while also providing interpretability for practitioners.
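
For readers unfamiliar with ISTA, the following is a minimal sketch of the basic (non-sequential) algorithm for L1-regularized least-squares. It is not the speaker's implementation: the function names `soft_threshold` and `ista`, the step-size choice, and the toy problem are illustrative assumptions. Each iteration applies a linear map followed by an elementwise nonlinearity, which is what makes unfolding the iterations into network layers natural.

```python
def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal operator of t * L1 norm)."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def ista(A, y, lam, n_iter=200):
    """Approximately minimize 0.5 * ||A x - y||^2 + lam * ||x||_1.

    A is a list of rows (m x n), y has length m. Hypothetical helper for
    illustration only.
    """
    m, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of A^T A,
    # so 1 / L is a safe (if conservative) step size.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(n_iter):
        # Gradient of the least-squares term: A^T (A x - y).
        residual = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by shrinkage.
        x = soft_threshold([x[j] - grad[j] / L for j in range(n)], lam / L)
    return x


# Toy problem (assumed for illustration): with A equal to the identity,
# the minimizer is simply y soft-thresholded by lam.
x_hat = ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

On this toy problem `x_hat` converges to approximately `[2.0, 0.0]`. For nonnegative inputs, soft-thresholding reduces to a rectified linear unit with a bias, which gives some intuition for the link to ReLU networks described in the abstract.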
NEWS MERL's speech research featured in NPR's All Things Considered
Date: February 5, 2018
Where: National Public Radio (NPR)
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief
- MERL's speech separation technology was featured in NPR's All Things Considered, as part of an episode of All Tech Considered on artificial intelligence, "Can Computers Learn Like Humans?". An example separating the overlapped speech of two of the show's hosts was played on the air.
  The technology is based on a proprietary deep learning method called Deep Clustering. It is the world's first technology that separates in real time the simultaneous speech of multiple unknown speakers recorded with a single microphone. It is a key step towards building machines that can interact in noisy environments, in the same way that humans can have meaningful conversations in the presence of many other conversations.
  A live demonstration was featured in Mitsubishi Electric Corporation's Annual R&D Open House last year, and was also covered in international media at the time.
  
  (Photo credit: Sam Rowe for NPR)
  
  Link:
  "Can Computers Learn Like Humans?" (NPR, All Things Considered)
  MERL Deep Clustering Demo.
NEWS MERL presents 3 papers at ASRU 2017, John Hershey serves as general chair
Date: December 16, 2017 - December 20, 2017
Where: Okinawa, Japan
MERL Contacts: Chiori Hori; Jonathan Le Roux
Research Area: Speech & Audio
Brief
- MERL presented three papers at the 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), which was held in Okinawa, Japan from December 16-20, 2017. ASRU is the premier speech workshop, bringing together researchers from academia and industry in an intimate and collegial setting. More than 270 people attended the event this year, a record number. MERL's Speech and Audio Team was a key part of the organization of the workshop, with John Hershey serving as General Chair, Chiori Hori as Sponsorship Chair, and Jonathan Le Roux as Demonstration Chair. Two of the papers by MERL were selected among the 10 finalists for the best paper award. Mitsubishi Electric and MERL were also Platinum sponsors of the conference, with MERL awarding the MERL Best Student Paper Award.
EVENT SANE 2017 - Speech and Audio in the Northeast
Date: Thursday, October 19, 2017
Location: Google, New York, NY
MERL Contact: Jonathan Le Roux
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief
- SANE 2017, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, was held on Thursday October 19, 2017 at Google, in New York, NY. It broke the attendance record for a SANE event, with 180 participants.
  
  It was a follow-up to SANE 2012, held at Mitsubishi Electric Research Labs (MERL), SANE 2013, held at Columbia University, SANE 2014, held at MIT CSAIL, SANE 2015, (already!) held at Google NY, and SANE 2016, held at MIT's McGovern Institute for Brain Research. Since the first edition, the audience has steadily grown, gathering over 100 researchers and students in recent editions.
  
  As in 2013 and 2015, this year's SANE took place in conjunction with the WASPAA workshop, held October 15-18 in upstate New York. Many WASPAA attendees (around 70!) also attended SANE.
  
  SANE 2017 featured invited talks by seven leading researchers from the Northeast and beyond: Sacha Krstulović (Audio Analytic), Yusuf Aytar (Google DeepMind), Florian Metze (CMU), Gunnar Evermann (Apple), Eric Humphrey (Spotify), Aaron Courville (University of Montreal), Aäron van den Oord (Google DeepMind). It also featured a live demo session with presentations by Jonathan Le Roux (MERL), Dan Ellis (Google), Arlo Faria (Remeeting), Tatsuya Komatsu (NEC), and a lively poster session with 26 posters.
  
  SANE 2017 was co-organized by Jonathan Le Roux (MERL), Dan Ellis (Google), Michael I. Mandel (CUNY), Hank Liao (Google), and John R. Hershey (MERL). SANE remained a free event thanks to generous sponsorship by Google and MERL.
  
  Slides and videos of the talks are available from the SANE workshop website.
NEWS MERL's breakthrough speech separation technology featured in Mitsubishi Electric Corporation's Annual R&D Open House
Date: May 24, 2017
Where: Tokyo, Japan
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief
- Mitsubishi Electric Corporation announced that it has created the world's first technology that separates in real time the simultaneous speech of multiple unknown speakers recorded with a single microphone. It's a key step towards building machines that can interact in noisy environments, in the same way that humans can have meaningful conversations in the presence of many other conversations. In tests, the simultaneous speech of two and three people was separated with up to 90 and 80 percent accuracy, respectively. The novel technology, which was realized with Mitsubishi Electric's proprietary "Deep Clustering" method based on artificial intelligence (AI), is expected to contribute to more intelligible voice communications and more accurate automatic speech recognition. A characteristic feature of this approach is its versatility, in the sense that voices can be separated regardless of their language or the gender of the speakers. A live speech separation demonstration that took place on May 24 in Tokyo, Japan, was widely covered by the Japanese media, with reports by three of the main Japanese TV stations and multiple articles in print and online newspapers. The technology is based on recent research by MERL's Speech and Audio team.
  
  Links:
  Mitsubishi Electric Corporation Press Release
  MERL Deep Clustering Demo
  
  Media Coverage:
  
  Fuji TV, News, "Minna no Mirai" (Japanese)
  The Nikkei (Japanese)
  Nikkei Technology Online (Japanese)
  Sankei Biz (Japanese)
  EE Times Japan (Japanese)
  ITpro (Japanese)
  Nikkan Sports (Japanese)
  Nikkan Kogyo Shimbun (Japanese)
  Dempa Shimbun (Japanese)
  Il Sole 24 Ore (Italian)
  IEEE Spectrum (English).
NEWS MERL to present 10 papers at ICASSP 2017
Date: March 5, 2017 - March 9, 2017
Where: New Orleans
MERL Contacts: Petros T. Boufounos; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Anthony Vetro; Ye Wang
Research Areas: Computer Vision, Computational Sensing, Digital Video, Information Security, Speech & Audio
Brief
- MERL researchers will present 10 papers at the upcoming IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), to be held in New Orleans from March 5-9, 2017. Topics to be presented include recent advances in speech recognition and audio processing; graph signal processing; computational imaging; and privacy-preserving data analysis.
  
  ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
EVENT John Hershey to present tutorial at the 2016 IEEE SLT Workshop
Date: Tuesday, December 13, 2016
Location: 2016 IEEE Spoken Language Technology Workshop, San Diego, California
Speaker: John Hershey, MERL
MERL Contact: Jonathan Le Roux
Research Areas: Machine Learning, Speech & Audio
Brief
- MERL researcher John Hershey presents an invited tutorial at the 2016 IEEE Workshop on Spoken Language Technology, in San Diego, California. The topic, "developing novel deep neural network architectures from probabilistic models", stems from MERL work with collaborators Jonathan Le Roux and Shinji Watanabe on a principled framework that seeks to improve our understanding of deep neural networks, and draws inspiration for new types of deep network from the arsenal of principles and tools developed over the years for conventional probabilistic models. The tutorial covers a range of parallel ideas in the literature that have formed a recent trend, as well as their application to speech and language.
EVENT SANE 2016 - Speech and Audio in the Northeast
Date: Friday, October 21, 2016
Location: MIT, McGovern Institute for Brain Research, Cambridge, MA
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief
- SANE 2016, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Friday October 21, 2016 at MIT's Brain and Cognitive Sciences Department, at the McGovern Institute for Brain Research, in Cambridge, MA.
  
  It is a follow-up to SANE 2012 (Mitsubishi Electric Research Labs - MERL), SANE 2013 (Columbia University), SANE 2014 (MIT CSAIL), and SANE 2015 (Google NY). Since the first edition, the audience has steadily grown, gathering 140 researchers and students in 2015.
  
  SANE 2016 will feature invited talks by leading researchers: Juan P. Bello (NYU), William T. Freeman (MIT/Google), Nima Mesgarani (Columbia University), Dan Ellis (Google), Shinji Watanabe (MERL), Josh McDermott (MIT), and Jesse Engel (Google). It will also feature a lively poster session during lunch time, open to both students and researchers.
  
  SANE 2016 is organized by Jonathan Le Roux (MERL), Josh McDermott (MIT), Jim Glass (MIT), and John R. Hershey (MERL).
NEWS MERL Speech & Audio researchers present two sold-out tutorials at Interspeech 2016
Date: September 8, 2016
Where: Interspeech 2016, San Francisco, CA
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief
- MERL Speech and Audio Team researchers Shinji Watanabe and Jonathan Le Roux presented two tutorials on September 8 at the Interspeech 2016 conference, held in San Francisco, CA. Shinji collaborated with Marc Delcroix (NTT Communication Science Laboratories, Japan) to deliver a three-hour lecture on "Recent Advances in Distant Speech Recognition", drawing upon their experience organizing and participating in six different recent robust speech processing challenges. Jonathan teamed with Emmanuel Vincent (Inria, France) and Hakan Erdogan (Sabanci University, Microsoft Research) to give an in-depth tour of the latest advances in "Learning-based Approaches to Speech Enhancement And Separation". This collaboration stemmed from extensive stays at MERL by Emmanuel and Hakan, Emmanuel as a summer visitor, and Hakan as a MERL visiting research scientist for over a year while on sabbatical.
  
  Both tutorials were sold out, each attracting more than 100 researchers and students in related fields, and received high praise from audience members.
NEWS MERL researchers present 12 papers at ICASSP 2016
Date: March 20, 2016 - March 25, 2016
Where: Shanghai, China
MERL Contacts: Petros T. Boufounos; Chiori Hori; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Philip V. Orlik; Anthony Vetro
Research Areas: Computational Sensing, Digital Video, Speech & Audio, Communications, Signal Processing
Brief
- MERL researchers have presented 12 papers at the recent IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which was held in Shanghai, China from March 20-25, 2016. ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing, with more than 1200 papers presented and over 2000 participants.
NEWS John Hershey gives invited talk at Johns Hopkins University on MERL's "Deep Clustering" breakthrough
Date: March 4, 2016
Where: Johns Hopkins Center for Language and Speech Processing
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief
- MERL researcher and speech team leader, John Hershey, was invited by the Center for Language and Speech Processing at Johns Hopkins University to give a talk on MERL's breakthrough audio separation work, known as "Deep Clustering". The talk was entitled "Speech Separation by Deep Clustering: Towards Intelligent Audio Analysis and Understanding," and was given on March 4, 2016.
  
  This is work conducted by MERL researchers John Hershey, Jonathan Le Roux, and Shinji Watanabe, and MERL interns Zhuo Chen of Columbia University and Yusuf Isik of Sabanci University.
AWARD MERL's Speech Team Achieves World's 2nd Best Performance at the Third CHiME Speech Separation and Recognition Challenge
Date: December 15, 2015
Awarded to: John R. Hershey, Takaaki Hori, Jonathan Le Roux and Shinji Watanabe
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief
- The results of the third 'CHiME' Speech Separation and Recognition Challenge were publicly announced on December 15 at the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2015) held in Scottsdale, Arizona, USA. MERL's Speech and Audio Team, in collaboration with SRI, ranked 2nd out of 26 teams from Europe, Asia and the US. The task this year was to recognize speech recorded using a tablet in real environments such as cafes, buses, or busy streets. Due to the high levels of noise and the distance from the speaker's mouth to the microphones, this is a very challenging task, for which the baseline system only achieved a 33.4% word error rate. The MERL/SRI system featured state-of-the-art techniques including multi-channel front-end, noise-robust feature extraction, and deep learning for speech enhancement, acoustic modeling, and language modeling, leading to a dramatic 73% relative reduction in word error rate, down to 9.1%. The core of the system has since been released as a new official challenge baseline for the community to use.
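
As a quick check of the figures quoted above, the 73% reduction is relative to the baseline word error rate rather than a difference in percentage points. The sketch below shows the arithmetic; the helper name `relative_reduction` is an illustrative assumption and not part of the challenge tooling.

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Word error rates quoted in the entry above: 33.4% baseline, 9.1% MERL/SRI system.
reduction = relative_reduction(33.4, 9.1)
print(f"{reduction:.0%}")  # prints "73%"
```

In absolute terms, the same improvement is 24.3 percentage points.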
EVENT SANE 2015 - Speech and Audio in the Northeast
Date: Thursday, October 22, 2015
Location: Google, New York City, NY
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief
- SANE 2015, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Thursday October 22, 2015 at Google, in New York City, NY.
  
  It is a follow-up to SANE 2012, held at Mitsubishi Electric Research Labs (MERL), SANE 2013, held at Columbia University, and SANE 2014, held at MIT, which each gathered 70 to 90 researchers and students.
  
  SANE 2015 will feature invited talks by leading researchers from the Northeast, as well as from the international community: Rohit Prasad (Amazon), Michael Mandel (Brooklyn College, CUNY), Ron Weiss (Google), John Hershey (MERL), Pablo Sprechmann (NYU), Tuomas Virtanen (Tampere University of Technology), and Paris Smaragdis (UIUC). It will also feature a lively poster session during lunch time, open to both students and researchers.
  
  SANE 2015 is organized by Jonathan Le Roux (MERL), Hank Liao (Google), Andrew Senior (Google), and John R. Hershey (MERL).
NEWS Multimedia Group researchers presented 8 papers at ICASSP 2015
Date: April 19, 2015 - April 24, 2015
Where: IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP)
MERL Contacts: Anthony Vetro; Hassan Mansour; Petros T. Boufounos; Jonathan Le Roux
Brief
- Multimedia Group researchers have presented 8 papers at the recent IEEE International Conference on Acoustics, Speech & Signal Processing, which was held in Brisbane, Australia from April 19-24, 2015.
