News & Events

68 News items, Awards, Events or Talks found.



Learn about the MERL Seminar Series.



  •  TALK    [MERL Seminar Series 2022] Prof. Chuang Gan presents talk titled Learning to Perceive Physical Scenes from Multi-Sensory Data
    Date & Time: Tuesday, September 6, 2022; 12:00 PM EDT
    Speaker: Chuang Gan, UMass Amherst & MIT-IBM Watson AI Lab
    MERL Host: Jonathan Le Roux
    Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
    Abstract
    • Human sensory perception of the physical world is rich and multimodal and can flexibly integrate input from all five sensory modalities -- vision, touch, smell, hearing, and taste. However, in AI, attention has primarily focused on visual perception. In this talk, I will introduce my efforts in connecting vision with sound, which will allow machine perception systems to see objects and infer physics from multi-sensory data. In the first part of my talk, I will introduce a. self-supervised approach that could learn to parse images and separate the sound sources by watching and listening to unlabeled videos without requiring additional manual supervision. In the second part of my talk, I will show we may further infer the underlying causal structure in 3D environments through visual and auditory observations. This enables agents to seek the sound source of repeating environmental sound (e.g., alarm) or identify what object has fallen, and where, from an intermittent impact sound.
  •  
  •  NEWS    MERL congratulates Prof. Alex Waibel on receiving 2023 IEEE James L. Flanagan Speech and Audio Processing Award
    Date: August 22, 2022
    MERL Contacts: Chiori Hori; Jonathan Le Roux; Anthony Vetro
    Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
    Brief
    • IEEE has announced that the recipient of the 2023 IEEE James L. Flanagan Speech and Audio Processing Award will be Prof. Alex Waibel (CMU/Karlsruhe Institute of Technology), “For pioneering contributions to spoken language translation and supporting technologies.” Mitsubishi Electric Research Laboratories (MERL), which has become the new sponsor of this prestigious award in 2022, extends our warmest congratulations to Prof. Waibel.

      MERL Senior Principal Research Scientist Dr. Chiori Hori, who worked with Dr. Waibel at Carnegie Mellon University and collaborated with him as part of national projects on speech summarization and translation, comments on his invaluable contributions to the field: “He has contributed not only to the invention of groundbreaking technology in speech and spoken language processing but also to the promotion of an abundance of research projects through international research consortiums by linking American, European, and Asian research communities. Many of his former laboratory members and collaborators are now leading R&D in the AI field.”

      The IEEE Board of Directors established the IEEE James L. Flanagan Speech and Audio Processing Award in 2002 for outstanding contributions to the advancement of speech and/or audio signal processing. This award has recognized the contributions of some of the most renowned pioneers and leaders in their respective fields. MERL is proud to support the recognition of outstanding contributions to the field of speech and audio processing through its sponsorship of this award.
  •  
  •  NEWS    MERL presenting 8 papers at ICASSP 2022
    Date: May 22, 2022 - May 27, 2022
    Where: Singapore
    MERL Contacts: Anoop Cherian; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Tim K. Marks; Philip V. Orlik; Kuan-Chuan Peng; Pu (Perry) Wang; Gordon Wichern
    Research Areas: Artificial Intelligence, Computer Vision, Signal Processing, Speech & Audio
    Brief
    • MERL researchers are presenting 8 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Singapore from May 22-27, 2022. A week of virtual presentations also took place earlier this month.

      Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, and classification.

      ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
  •  
  •  NEWS    MERL work on scene-aware interaction featured in IEEE Spectrum
    Date: March 1, 2022
    MERL Contacts: Anoop Cherian; Chiori Hori; Jonathan Le Roux; Tim K. Marks; Anthony Vetro
    Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
    Brief
    • MERL's research on scene-aware interaction was recently featured in an IEEE Spectrum article. The article, titled "At Last, A Self-Driving Car That Can Explain Itself" and authored by MERL Senior Principal Research Scientist Chiori Hori and MERL Director Anthony Vetro, gives an overview of MERL's efforts towards developing a system that can analyze multimodal sensing information for highly natural and intuitive interaction with humans through context-dependent generation of natural language. The technology recognizes contextual objects and events based on multimodal sensing information, such as images and video captured with cameras, audio information recorded with microphones, and localization information measured with LiDAR.

      Scene-Aware Interaction for car navigation, one target application that the article focuses on, will provide drivers with intuitive route guidance. Scene-Aware Interaction technology is expected to have wide applicability, including human-machine interfaces for in-vehicle infotainment, interaction with service robots in building and factory automation systems, systems that monitor the health and well-being of people, surveillance systems that interpret complex scenes for humans and encourage social distancing, support for touchless operation of equipment in public areas, and much more. MERL's Scene-Aware Interaction Technology had previously been featured in a Mitsubishi Electric Corporation Press Release.

      IEEE Spectrum is the flagship magazine and website of the IEEE, the world’s largest professional organization devoted to engineering and the applied sciences. IEEE Spectrum has a circulation of over 400,000 engineers worldwide, making it one of the leading science and engineering magazines.
  •  
  •  NEWS    Jonathan Le Roux discusses MERL's audio source separation work on popular machine learning podcast
    Date: January 24, 2022
    Where: The TWIML AI Podcast
    MERL Contact: Jonathan Le Roux
    Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
    Brief
    • MERL Speech & Audio Senior Team Leader Jonathan Le Roux was featured in an extended interview on the popular TWIML AI Podcast, presenting MERL's work towards solving the "cocktail party problem". Humans have the extraordinary ability to focus on particular sounds of interest within a complex acoustic scene, such as a cocktail party. MERL's Speech & Audio Team has been at the forefront of the field's effort to develop algorithms giving machines similar abilities. Jonathan talked with host Sam Charrington about the group's decade-long journey on this topic, from early pioneering work using deep learning for speech enhancement and speech separation, to recent works on weakly-supervised separation, hierarchical sound separation, as well as the separation of real-world soundtracks into speech, music, and sound effects (aka the "cocktail fork problem").

      The TWIML AI podcast, formerly known as This Week in Machine Learning & AI, was created in 2016 and is followed by more than 10,000 subscribers on Youtube and Twitter. Jonathan's interview marks the 555th episode of the podcast.
  •  
  •  NEWS    MERL Congratulates Recipients of 2022 IEEE Technical Field Awards in Signal Processing
    Date: July 26, 2021
    MERL Contacts: Petros T. Boufounos; Jonathan Le Roux; Philip V. Orlik; Anthony Vetro
    Research Areas: Signal Processing, Speech & Audio
    Brief
    • IEEE has announced that the recipients of the 2022 IEEE James L. Flanagan Speech and Audio Processing Award will be Hervé Bourlard (EPFL/Idiap Research Institute) and Nelson Morgan (ICSI), "For contributions to neural networks for statistical speech recognition," and the recipient of the 2022 IEEE Fourier Award for Signal Processing will be Ali Sayed (EPFL), "For contributions to the theory and practice of adaptive signal processing." More details about the contributions of Prof. Bourlard and Prof. Morgan can be found in the announcements by ICSI and EPFL, and those of Prof. Sayed in EPFL's announcement. Mitsubishi Electric Research Laboratories (MERL) has recently become the new sponsor of these two prestigious awards, and extends our warmest congratulations to all of the 2022 award recipients.

      The IEEE Board of Directors established the IEEE James L. Flanagan Speech and Audio Processing Award in 2002 for outstanding contributions to the advancement of speech and/or audio signal processing, while the IEEE Fourier Award for Signal Processing was established in 2012 for outstanding contribution to the advancement of signal processing, other than in the areas of speech and audio processing. Both awards have recognized the contributions of some of the most renowned pioneers and leaders in their respective fields. MERL is proud to support the recognition of outstanding contributions to the signal processing field through its sponsorship of these awards.
  •  
  •  NEWS    MERL becomes new sponsor of two prestigious IEEE Technical Field Awards in Signal Processing
    Date: July 9, 2021
    MERL Contacts: Petros T. Boufounos; Jonathan Le Roux; Philip V. Orlik; Anthony Vetro
    Research Areas: Signal Processing, Speech & Audio
    Brief
    • Mitsubishi Electric Research Laboratories (MERL) has become the new sponsor of two prestigious IEEE Technical Field Awards in Signal Processing, the IEEE James L. Flanagan Speech and Audio Processing Award and the IEEE Fourier Award for Signal Processing, for the years 2022-2031. "MERL is proud to support the recognition of outstanding contributions to signal processing by sponsoring both the IEEE James L. Flanagan Speech and Audio Processing Award and the IEEE Fourier Award for Signal Processing. These awards celebrate the creativity and innovation in the field that touch many aspects of our lives and drive our society forward" said Dr. Anthony Vetro, VP and Director at MERL.

      The IEEE Board of Directors established the IEEE James L. Flanagan Speech and Audio Processing Award in 2002 for outstanding contributions to the advancement of speech and/or audio signal processing, while the IEEE Fourier Award for Signal Processing was established in 2012 for outstanding contribution to the advancement of signal processing, other than in the areas of speech and audio processing. Both awards have since recognized the contributions of some of the most renowned pioneers and leaders in their respective fields.

      By underwriting these IEEE Technical Field Awards, MERL continues to make a mark by supporting the advancement of technology that makes lasting changes in the world.
  •  
  •  AWARD    Best Poster Award and Best Video Award at the International Society for Music Information Retrieval Conference (ISMIR) 2020
    Date: October 15, 2020
    Awarded to: Ethan Manilow, Gordon Wichern, Jonathan Le Roux
    MERL Contacts: Jonathan Le Roux; Gordon Wichern
    Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
    Brief
    • Former MERL intern Ethan Manilow and MERL researchers Gordon Wichern and Jonathan Le Roux won Best Poster Award and Best Video Award at the 2020 International Society for Music Information Retrieval Conference (ISMIR 2020) for the paper "Hierarchical Musical Source Separation". The conference was held October 11-14 in a virtual format. The Best Poster Awards and Best Video Awards were awarded by popular vote among the conference attendees.

      The paper proposes a new method for isolating individual sounds in an audio mixture that accounts for the hierarchical relationship between sound sources. Many sounds we are interested in analyzing are hierarchical in nature, e.g., during a music performance, a hi-hat note is one of many such hi-hat notes, which is one of several parts of a drumkit, itself one of many instruments in a band, which might be playing in a bar with other sounds occurring. Inspired by this, the paper re-frames the audio source separation problem as hierarchical, combining similar sounds together at certain levels while separating them at other levels, and shows on a musical instrument separation task that a hierarchical approach outperforms non-hierarchical models while also requiring less training data. The paper, poster, and video can be seen on the paper page on the ISMIR website.
  •  
  •  NEWS    MERL's Scene-Aware Interaction Technology Featured in Mitsubishi Electric Corporation Press Release
    Date: July 22, 2020
    Where: Tokyo, Japan
    MERL Contacts: Anoop Cherian; Chiori Hori; Jonathan Le Roux; Tim K. Marks; Anthony Vetro
    Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
    Brief
    • Mitsubishi Electric Corporation announced that the company has developed what it believes to be the world’s first technology capable of highly natural and intuitive interaction with humans based on a scene-aware capability to translate multimodal sensing information into natural language.

      The novel technology, Scene-Aware Interaction, incorporates Mitsubishi Electric’s proprietary Maisart® compact AI technology to analyze multimodal sensing information for highly natural and intuitive interaction with humans through context-dependent generation of natural language. The technology recognizes contextual objects and events based on multimodal sensing information, such as images and video captured with cameras, audio information recorded with microphones, and localization information measured with LiDAR.

      Scene-Aware Interaction for car navigation, one target application, will provide drivers with intuitive route guidance. The technology is also expected to have applicability to human-machine interfaces for in-vehicle infotainment, interaction with service robots in building and factory automation systems, systems that monitor the health and well-being of people, surveillance systems that interpret complex scenes for humans and encourage social distancing, support for touchless operation of equipment in public areas, and much more. The technology is based on recent research by MERL's Speech & Audio and Computer Vision groups.
  •  
  •  NEWS    Jonathan Le Roux gives Plenary Lecture at the JSALT 2020 Summer Workshop
    Date: July 10, 2020
    Where: Virtual Baltimore, MD
    MERL Contact: Jonathan Le Roux
    Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
    Brief
    • MERL Senior Principal Research Scientist and Speech and Audio Senior Team Leader Jonathan Le Roux was invited by the Center for Language and Speech Processing at Johns Hopkins University to give a plenary lecture at the 2020 Frederick Jelinek Memorial Summer Workshop on Speech and Language Technology (JSALT). The talk, entitled "Deep Learning for Multifarious Speech Processing: Tackling Multiple Speakers, Microphones, and Languages", presented an overview of deep learning techniques developed at MERL towards the goal of cracking the Tower of Babel version of the cocktail party problem, that is, separating and/or recognizing the speech of multiple unknown speakers speaking simultaneously in multiple languages, in both single-channel and multi-channel scenarios: from deep clustering to chimera networks, phasebook and friends, and from seamless ASR to MIMO-Speech and Transformer-based multi-speaker ASR.

      JSALT 2020 is the seventh in a series of six-week-long research workshops on Machine Learning for Speech Language and Computer Vision Technology. A continuation of the well known Johns Hopkins University summer workshops, these workshops bring together diverse "dream teams" of leading professionals, graduate students, and undergraduates, in a truly cooperative, intensive, and substantive effort to advance the state of the science. MERL researchers led such teams in the JSALT 2015 workshop, on "Far-Field Speech Enhancement and Recognition in Mismatched Settings", and the JSALT 2018 workshop, on "Multi-lingual End-to-End Speech Recognition for Incomplete Data".
  •  
  •  NEWS    MERL presenting 13 papers and an industry talk at ICASSP 2020
    Date: May 4, 2020 - May 8, 2020
    Where: Virtual Barcelona
    MERL Contacts: Karl Berntorp; Petros T. Boufounos; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Yanting Ma; Hassan Mansour; Philip V. Orlik; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
    Research Areas: Computational Sensing, Computer Vision, Machine Learning, Signal Processing, Speech & Audio
    Brief
    • MERL researchers are presenting 13 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held virtually from May 4-8, 2020. Petros Boufounos is also presenting a talk on the Computational Sensing Revolution in Array Processing (video) in ICASSP’s Industry Track, and Siheng Chen is co-organizing and chairing a special session on a Signal-Processing View of Graph Neural Networks.

      Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, array processing, and parameter estimation. Videos for all talks are available on MERL's YouTube channel, with corresponding links in the references below.

      This year again, MERL is a sponsor of the conference and will be participating in the Student Job Fair; please join us to learn about our internship program and career opportunities.

      ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year. Originally planned to be held in Barcelona, Spain, ICASSP has moved to a fully virtual setting due to the COVID-19 crisis, with free registration for participants not covering a paper.
  •  
  •  AWARD    Best Paper Award at the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
    Date: December 18, 2019
    Awarded to: Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe
    MERL Contact: Jonathan Le Roux
    Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
    Brief
    • MERL researcher Jonathan Le Roux and co-authors Xuankai Chang, Shinji Watanabe (Johns Hopkins University), Wangyou Zhang, and Yanmin Qian (Shanghai Jiao Tong University) won the Best Paper Award at the 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019), for the paper "MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition". MIMO-Speech is a fully neural end-to-end framework that can transcribe the text of multiple speakers speaking simultaneously from multi-channel input. The system is comprised of a monaural masking network, a multi-source neural beamformer, and a multi-output speech recognition model, which are jointly optimized only via an automatic speech recognition (ASR) criterion. The award was received by lead author Xuankai Chang during the conference, which was held in Sentosa, Singapore from December 14-18, 2019.
  •  
  •  NEWS    MERL Speech & Audio Researchers Presenting 7 Papers and a Tutorial at Interspeech 2019
    Date: September 15, 2019 - September 19, 2019
    Where: Graz, Austria
    MERL Contacts: Chiori Hori; Jonathan Le Roux; Gordon Wichern
    Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
    Brief
    • MERL Speech & Audio Team researchers will be presenting 7 papers at the 20th Annual Conference of the International Speech Communication Association INTERSPEECH 2019, which is being held in Graz, Austria from September 15-19, 2019. Topics to be presented include recent advances in end-to-end speech recognition, speech separation, and audio-visual scene-aware dialog. Takaaki Hori is also co-presenting a tutorial on end-to-end speech processing.

      Interspeech is the world's largest and most comprehensive conference on the science and technology of spoken language processing. It gathers around 2000 participants from all over the world.
  •  
  •  NEWS    MERL presenting 16 papers at ICASSP 2019
    Date: May 12, 2019 - May 17, 2019
    Where: Brighton, UK
    MERL Contacts: Petros T. Boufounos; Anoop Cherian; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Tim K. Marks; Philip V. Orlik; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
    Research Areas: Computational Sensing, Computer Vision, Machine Learning, Signal Processing, Speech & Audio
    Brief
    • MERL researchers will be presenting 16 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Brighton, UK from May 12-17, 2019. Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, and parameter estimation. MERL is also a sponsor of the conference and will be participating in the student career luncheon; please join us at the lunch to learn about our internship program and career opportunities.

      ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
  •  
  •  NEWS    MERL's seamless speech recognition technology featured in Mitsubishi Electric Corporation press release
    Date: February 13, 2019
    Where: Tokyo, Japan
    MERL Contacts: Jonathan Le Roux; Gordon Wichern
    Research Area: Speech & Audio
    Brief
  •  
  •  EVENT    SANE 2018 - Speech and Audio in the Northeast
    Date: Thursday, October 18, 2018
    Location: Google, Cambridge, MA
    MERL Contact: Jonathan Le Roux
    Research Area: Speech & Audio
    Brief
    • SANE 2018, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Thursday October 18, 2018 at Google, in Cambridge, MA. MERL is one of the organizers and sponsors of the workshop.

      It is the 7th edition in the SANE series of workshops, which started at MERL in 2012. Since the first edition, the audience has steadily grown, with a record 180 participants in 2017.

      SANE 2018 will feature invited talks by leading researchers from the Northeast, as well as from the international community. It will also feature a lively poster session, open to both students and researchers.
  •  
  •  NEWS    Takaaki Hori leads speech technology workshop
    Date: June 25, 2018 - August 3, 2018
    Where: Johns Hopkins University, Baltimore, MD
    MERL Contact: Jonathan Le Roux
    Research Area: Speech & Audio
    Brief
    • MERL Speech & Audio Team researcher Takaaki Hori led a team of 27 senior researchers and Ph.D. students from different organizations around the world, working on "Multi-lingual End-to-End Speech Recognition for Incomplete Data" as part of the Jelinek Memorial Summer Workshop on Speech and Language Technology (JSALT). The JSALT workshop is a renowned 6-week hands-on workshop held yearly since 1995. This year, the workshop was held at Johns Hopkins University in Baltimore from June 25 to August 3, 2018. Takaaki's team developed new methods for end-to-end Automatic Speech Recognition (ASR) with a focus on low-resource languages with limited labelled data.

      End-to-end ASR can significantly reduce the burden of developing ASR systems for new languages, by eliminating the need for linguistic information such as pronunciation dictionaries. Some end-to-end systems have recently achieved performance comparable to or better than conventional systems in several tasks. However, the current model training algorithms basically require paired data, i.e., speech data and the corresponding transcription. Sufficient amount of such complete data is usually unavailable for minor languages, and creating such data sets is very expensive and time consuming.

      The goal of Takaaki's team project was to expand the applicability of end-to-end models to multilingual ASR, and to develop new technology that would make it possible to build highly accurate systems even for low-resource languages without a large amount of paired data. Some major accomplishments of the team include building multi-lingual end-to-end ASR systems for 17 languages, developing novel architectures and training methods for end-to-end ASR, building end-to-end ASR-TTS (Text-to-speech) chain for unpaired data training, and developing ESPnet, an open-source end-to-end speech processing toolkit. Three papers stemming from the team's work have already been accepted to the 2018 IEEE Spoken Language Technology Workshop (SLT), with several more to be submitted to upcoming conferences.
  •  
  •  AWARD    Best Student Paper Award at IEEE ICASSP 2018
    Date: April 17, 2018
    Awarded to: Zhong-Qiu Wang
    MERL Contact: Jonathan Le Roux
    Research Area: Speech & Audio
    Brief
    • Former MERL intern Zhong-Qiu Wang (Ph.D. Candidate at Ohio State University) has received a Best Student Paper Award at the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018) for the paper "Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation" by Zhong-Qiu Wang, Jonathan Le Roux, and John Hershey. The paper presents work performed during Zhong-Qiu's internship at MERL in the summer 2017, extending MERL's pioneering Deep Clustering framework for speech separation to a multi-channel setup. The award was received on behalf on Zhong-Qiu by MERL researcher and co-author Jonathan Le Roux during the conference, held in Calgary April 15-20.
  •  
  •  NEWS    MERL presenting 9 papers at ICASSP 2018
    Date: April 15, 2018 - April 20, 2018
    Where: Calgary, AB
    MERL Contacts: Petros T. Boufounos; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Philip V. Orlik; Pu (Perry) Wang
    Research Areas: Computational Sensing, Digital Video, Speech & Audio
    Brief
    • MERL researchers are presenting 9 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Calgary from April 15-20, 2018. Topics to be presented include recent advances in speech recognition, audio processing, and computational sensing. MERL is also a sponsor of the conference.

      ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
  •  
  •  TALK    Theory and Applications of Sparse Model-Based Recurrent Neural Networks
    Date & Time: Tuesday, March 6, 2018; 12:00 PM
    Speaker: Scott Wisdom, Affectiva
    MERL Host: Jonathan Le Roux
    Research Area: Speech & Audio
    Abstract
    • Recurrent neural networks (RNNs) are effective, data-driven models for sequential data, such as audio and speech signals. However, like many deep networks, RNNs are essentially black boxes; though they are effective, their weights and architecture are not directly interpretable by practitioners. A major component of my dissertation research is explaining the success of RNNs and constructing new RNN architectures through the process of "deep unfolding," which can construct and explain deep network architectures using an equivalence to inference in statistical models. Deep unfolding yields principled initializations for training deep networks, provides insight into their effectiveness, and assists with interpretation of what these networks learn.

      In particular, I will show how RNNs with rectified linear units and residual connections are a particular deep unfolding of a sequential version of the iterative shrinkage-thresholding algorithm (ISTA), a simple and classic algorithm for solving L1-regularized least-squares. This equivalence allows interpretation of state-of-the-art unitary RNNs (uRNNs) as an unfolded sparse coding algorithm. I will also describe a new type of RNN architecture called deep recurrent nonnegative matrix factorization (DR-NMF). DR-NMF is an unfolding of a sparse NMF model of nonnegative spectrograms for audio source separation. Both of these networks outperform conventional LSTM networks while also providing interpretability for practitioners.
  •  
  •  NEWS    MERL's speech research featured in NPR's All Things Considered
    Date: February 5, 2018
    Where: National Public Radio (NPR)
    MERL Contact: Jonathan Le Roux
    Research Area: Speech & Audio
    Brief
    • MERL's speech separation technology was featured in NPR's All Things Considered, as part of an episode of All Tech Considered on artificial intelligence, "Can Computers Learn Like Humans?". An example separating the overlapped speech of two of the show's hosts was played on the air.
      The technology is based on a proprietary deep learning method called Deep Clustering. It is the world's first technology that separates in real time the simultaneous speech of multiple unknown speakers recorded with a single microphone. It is a key step towards building machines that can interact in noisy environments, in the same way that humans can have meaningful conversations in the presence of many other conversations.
      A live demonstration was featured in Mitsubishi Electric Corporation's Annual R&D Open House last year, and was also covered in international media at the time.

      (Photo credit: Sam Rowe for NPR)

      Link:
      "Can Computers Learn Like Humans?" (NPR, All Things Considered)
      MERL Deep Clustering Demo.
  •  
  •  NEWS    MERL presents 3 papers at ASRU 2017, John Hershey serves as general chair
    Date: December 16, 2017 - December 20, 2017
    Where: Okinawa, Japan
    MERL Contacts: Chiori Hori; Jonathan Le Roux
    Research Area: Speech & Audio
    Brief
    • MERL presented three papers at the 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), which was held in Okinawa, Japan from December 16-20, 2017. ASRU is the premier speech workshop, bringing together researchers from academia and industry in an intimate and collegial setting. More than 270 people attended the event this year, a record number. MERL's Speech and Audio Team was a key part of the organization of the workshop, with John Hershey serving as General Chair, Chiori Hori as Sponsorship Chair, and Jonathan Le Roux as Demonstration Chair. Two of the papers by MERL were selected among the 10 finalists for the best paper award. Mitsubishi Electric and MERL were also Platinum sponsors of the conference, with MERL awarding the MERL Best Student Paper Award.
  •  
  •  NEWS    MERL's breakthrough speech separation technology featured in Mitsubishi Electric Corporation's Annual R&D Open House
    Date: May 24, 2017
    Where: Tokyo, Japan
    MERL Contact: Jonathan Le Roux
    Research Area: Speech & Audio
    Brief
    • Mitsubishi Electric Corporation announced that it has created the world's first technology that separates in real time the simultaneous speech of multiple unknown speakers recorded with a single microphone. It's a key step towards building machines that can interact in noisy environments, in the same way that humans can have meaningful conversations in the presence of many other conversations. In tests, the simultaneous speeches of two and three people were separated with up to 90 and 80 percent accuracy, respectively. The novel technology, which was realized with Mitsubishi Electric's proprietary "Deep Clustering" method based on artificial intelligence (AI), is expected to contribute to more intelligible voice communications and more accurate automatic speech recognition. A characteristic feature of this approach is its versatility, in the sense that voices can be separated regardless of their language or the gender of the speakers. A live speech separation demonstration that took place on May 24 in Tokyo, Japan, was widely covered by the Japanese media, with reports by three of the main Japanese TV stations and multiple articles in print and online newspapers. The technology is based on recent research by MERL's Speech and Audio team.
      Links:
      Mitsubishi Electric Corporation Press Release
      MERL Deep Clustering Demo

      Media Coverage:

      Fuji TV, News, "Minna no Mirai" (Japanese)
      The Nikkei (Japanese)
      Nikkei Technology Online (Japanese)
      Sankei Biz (Japanese)
      EE Times Japan (Japanese)
      ITpro (Japanese)
      Nikkan Sports (Japanese)
      Nikkan Kogyo Shimbun (Japanese)
      Dempa Shimbun (Japanese)
      Il Sole 24 Ore (Italian)
      IEEE Spectrum (English).
  •  
  •  NEWS    MERL to present 10 papers at ICASSP 2017
    Date: March 5, 2017 - March 9, 2017
    Where: New Orleans
    MERL Contacts: Petros T. Boufounos; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Anthony Vetro; Ye Wang
    Research Areas: Computer Vision, Computational Sensing, Digital Video, Information Security, Speech & Audio
    Brief
    • MERL researchers will presented 10 papers at the upcoming IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), to be held in New Orleans from March 5-9, 2017. Topics to be presented include recent advances in speech recognition and audio processing; graph signal processing; computational imaging; and privacy-preserving data analysis.

      ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
  •  
  •  EVENT    John Hershey to present tutorial at the 2016 IEEE SLT Workshop
    Date: Tuesday, December 13, 2016
    Location: 2016 IEEE Spoken Language Technology Workshop, San Diego, California
    Speaker: John Hershey, MERL
    MERL Contact: Jonathan Le Roux
    Research Areas: Machine Learning, Speech & Audio
    Brief
    • MERL researcher John Hershey presents an invited tutorial at the 2016 IEEE Workshop on Spoken Language Technology, in San Diego, California. The topic, "developing novel deep neural network architectures from probabilistic models" stems from MERL work with collaborators Jonathan Le Roux and Shinji Watanabe, on a principled framework that seeks to improve our understanding of deep neural networks, and draws inspiration for new types of deep network from the arsenal of principles and tools developed over the years for conventional probabilistic models. The tutorial covers a range of parallel ideas in the literature that have formed a recent trend, as well as their application to speech and language.
  •