- Date: July 22, 2020
Where: Tokyo, Japan
MERL Contacts: Anoop Cherian; Chiori Hori; Jonathan Le Roux; Tim K. Marks; Anthony Vetro
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
Brief - Mitsubishi Electric Corporation announced that the company has developed what it believes to be the world’s first technology capable of highly natural and intuitive interaction with humans based on a scene-aware capability to translate multimodal sensing information into natural language.
The novel technology, Scene-Aware Interaction, incorporates Mitsubishi Electric’s proprietary Maisart® compact AI technology to analyze multimodal sensing information for highly natural and intuitive interaction with humans through context-dependent generation of natural language. The technology recognizes contextual objects and events based on multimodal sensing information, such as images and video captured with cameras, audio information recorded with microphones, and localization information measured with LiDAR.
Scene-Aware Interaction for car navigation, one target application, will provide drivers with intuitive route guidance. The technology is also expected to have applicability to human-machine interfaces for in-vehicle infotainment, interaction with service robots in building and factory automation systems, systems that monitor the health and well-being of people, surveillance systems that interpret complex scenes for humans and encourage social distancing, support for touchless operation of equipment in public areas, and much more. The technology is based on recent research by MERL's Speech & Audio and Computer Vision groups.
-
- Date: May 4, 2020 - May 8, 2020
Where: Virtual Barcelona
MERL Contacts: Karl Berntorp; Petros T. Boufounos; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Yanting Ma; Hassan Mansour; Philip V. Orlik; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
Research Areas: Computational Sensing, Computer Vision, Machine Learning, Signal Processing, Speech & Audio
Brief - MERL researchers are presenting 13 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held virtually from May 4-8, 2020. Petros Boufounos is also presenting a talk on the Computational Sensing Revolution in Array Processing (video) in ICASSP’s Industry Track, and Siheng Chen is co-organizing and chairing a special session on a Signal-Processing View of Graph Neural Networks.
Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, array processing, and parameter estimation. Videos for all talks are available on MERL's YouTube channel, with corresponding links in the references below.
This year again, MERL is a sponsor of the conference and will be participating in the Student Job Fair; please join us to learn about our internship program and career opportunities.
ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year. Originally planned to be held in Barcelona, Spain, ICASSP has moved to a fully virtual setting due to the COVID-19 crisis, with free registration for participants not covering a paper.
-
- Date: November 9, 2019
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief - Takaaki Hori has been elected to serve on the Speech and Language Processing Technical Committee (SLTC) of the IEEE Signal Processing Society for a 3-year term.
The SLTC promotes and influences all the technical areas of speech and language processing such as speech recognition, speech synthesis, spoken language understanding, speech to speech translation, spoken dialog management, speech indexing, information extraction from audio, and speaker and language recognition.
-
- Date: September 15, 2019 - September 19, 2019
Where: Graz, Austria
MERL Contacts: Chiori Hori; Jonathan Le Roux; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief - MERL Speech & Audio Team researchers will be presenting 7 papers at the 20th Annual Conference of the International Speech Communication Association INTERSPEECH 2019, which is being held in Graz, Austria from September 15-19, 2019. Topics to be presented include recent advances in end-to-end speech recognition, speech separation, and audio-visual scene-aware dialog. Takaaki Hori is also co-presenting a tutorial on end-to-end speech processing.
Interspeech is the world's largest and most comprehensive conference on the science and technology of spoken language processing. It gathers around 2000 participants from all over the world.
-
- Date: May 12, 2019 - May 17, 2019
Where: Brighton, UK
MERL Contacts: Petros T. Boufounos; Anoop Cherian; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Tim K. Marks; Philip V. Orlik; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
Research Areas: Computational Sensing, Computer Vision, Machine Learning, Signal Processing, Speech & Audio
Brief - MERL researchers will be presenting 16 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Brighton, UK from May 12-17, 2019. Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, and parameter estimation. MERL is also a sponsor of the conference and will be participating in the student career luncheon; please join us at the lunch to learn about our internship program and career opportunities.
ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
-
- Date: February 13, 2019
Where: Tokyo, Japan
MERL Contacts: Jonathan Le Roux; Gordon Wichern
Research Area: Speech & Audio
Brief - Mitsubishi Electric Corporation announced that it has developed the world's first technology capable of highly accurate multilingual speech recognition without being informed which language is being spoken. The novel technology, Seamless Speech Recognition, incorporates Mitsubishi Electric's proprietary Maisart compact AI technology and is built on a single system that can simultaneously identify and understand spoken languages. In tests involving 5 languages, the system achieved recognition with over 90 percent accuracy, without being informed which language was being spoken. When incorporating 5 more languages with lower resources, accuracy remained above 80 percent. The technology can also understand multiple people speaking either the same or different languages simultaneously. A live demonstration involving a multilingual airport guidance system took place on February 13 in Tokyo, Japan. It was widely covered by the Japanese media, with reports by all six main Japanese TV stations and multiple articles in print and online newspapers, including in Japan's top newspaper, Asahi Shimbun. The technology is based on recent research by MERL's Speech and Audio team.
Link:
Mitsubishi Electric Corporation Press Release
Media Coverage:
NHK, News (Japanese)
NHK World, News (English), video report (starting at 4'38")
TV Asahi, ANN news (Japanese)
Nippon TV, News24 (Japanese)
Fuji TV, Prime News Alpha (Japanese)
TV Tokyo, World Business Satellite (Japanese)
TV Tokyo, Morning Satellite (Japanese)
TBS, News, N Studio (Japanese)
The Asahi Shimbun (Japanese)
The Nikkei Shimbun (Japanese)
Nikkei xTech (Japanese)
Response (Japanese).
-
- Date: Thursday, October 18, 2018
Location: Google, Cambridge, MA
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - SANE 2018, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Thursday October 18, 2018 at Google, in Cambridge, MA. MERL is one of the organizers and sponsors of the workshop.
It is the 7th edition in the SANE series of workshops, which started at MERL in 2012. Since the first edition, the audience has steadily grown, with a record 180 participants in 2017.
SANE 2018 will feature invited talks by leading researchers from the Northeast, as well as from the international community. It will also feature a lively poster session, open to both students and researchers.
-
- Date: June 25, 2018 - August 3, 2018
Where: Johns Hopkins University, Baltimore, MD
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - MERL Speech & Audio Team researcher Takaaki Hori led a team of 27 senior researchers and Ph.D. students from different organizations around the world, working on "Multi-lingual End-to-End Speech Recognition for Incomplete Data" as part of the Jelinek Memorial Summer Workshop on Speech and Language Technology (JSALT). The JSALT workshop is a renowned 6-week hands-on workshop held yearly since 1995. This year, the workshop was held at Johns Hopkins University in Baltimore from June 25 to August 3, 2018. Takaaki's team developed new methods for end-to-end Automatic Speech Recognition (ASR) with a focus on low-resource languages with limited labelled data.
End-to-end ASR can significantly reduce the burden of developing ASR systems for new languages, by eliminating the need for linguistic information such as pronunciation dictionaries. Some end-to-end systems have recently achieved performance comparable to or better than conventional systems in several tasks. However, the current model training algorithms basically require paired data, i.e., speech data and the corresponding transcription. Sufficient amount of such complete data is usually unavailable for minor languages, and creating such data sets is very expensive and time consuming.
The goal of Takaaki's team project was to expand the applicability of end-to-end models to multilingual ASR, and to develop new technology that would make it possible to build highly accurate systems even for low-resource languages without a large amount of paired data. Some major accomplishments of the team include building multi-lingual end-to-end ASR systems for 17 languages, developing novel architectures and training methods for end-to-end ASR, building end-to-end ASR-TTS (Text-to-speech) chain for unpaired data training, and developing ESPnet, an open-source end-to-end speech processing toolkit. Three papers stemming from the team's work have already been accepted to the 2018 IEEE Spoken Language Technology Workshop (SLT), with several more to be submitted to upcoming conferences.
-
- Date: April 15, 2018 - April 20, 2018
Where: Calgary, AB
MERL Contacts: Petros T. Boufounos; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Philip V. Orlik; Pu (Perry) Wang
Research Areas: Computational Sensing, Digital Video, Speech & Audio
Brief - MERL researchers are presenting 9 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Calgary from April 15-20, 2018. Topics to be presented include recent advances in speech recognition, audio processing, and computational sensing. MERL is also a sponsor of the conference.
ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
-
- Date: December 16, 2017 - December 20, 2017
Where: Okinawa, Japan
MERL Contacts: Chiori Hori; Jonathan Le Roux
Research Area: Speech & Audio
Brief - MERL presented three papers at the 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), which was held in Okinawa, Japan from December 16-20, 2017. ASRU is the premier speech workshop, bringing together researchers from academia and industry in an intimate and collegial setting. More than 270 people attended the event this year, a record number. MERL's Speech and Audio Team was a key part of the organization of the workshop, with John Hershey serving as General Chair, Chiori Hori as Sponsorship Chair, and Jonathan Le Roux as Demonstration Chair. Two of the papers by MERL were selected among the 10 finalists for the best paper award. Mitsubishi Electric and MERL were also Platinum sponsors of the conference, with MERL awarding the MERL Best Student Paper Award.
-
- Date: Sunday, December 10, 2017
Location: Hyatt Regency, Long Beach, CA
MERL Contact: Chiori Hori
Research Area: Speech & Audio
Brief - MERL researcher Chiori Hori led the organization of the 6th edition of the Dialog System Technology Challenges (DSTC6). This year's edition of DSTC is split into three tracks: End-to-End Goal Oriented Dialog Learning, End-to-End Conversation Modeling, and Dialogue Breakdown Detection. A total of 23 teams from all over the world competed in the various tracks, and will meet at the Hyatt Regency in Long Beach, CA, USA on December 10 to present their results at a dedicated workshop colocated with NIPS 2017.
MERL's Speech and Audio Team and Mitsubishi Electric Corporation jointly submitted a set of systems to the End-to-End Conversation Modeling Track, obtaining the best rank among 19 submissions in terms of objective metrics.
-
- Date: March 5, 2017 - March 9, 2017
Where: New Orleans
MERL Contacts: Petros T. Boufounos; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Anthony Vetro; Ye Wang
Research Areas: Computer Vision, Computational Sensing, Digital Video, Information Security, Speech & Audio
Brief - MERL researchers will presented 10 papers at the upcoming IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), to be held in New Orleans from March 5-9, 2017. Topics to be presented include recent advances in speech recognition and audio processing; graph signal processing; computational imaging; and privacy-preserving data analysis.
ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
-
- Date: March 20, 2016 - March 25, 2016
Where: Shanghai, China
MERL Contacts: Petros T. Boufounos; Chiori Hori; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Philip V. Orlik; Anthony Vetro
Research Areas: Computational Sensing, Digital Video, Speech & Audio, Communications, Signal Processing
Brief - MERL researchers have presented 12 papers at the recent IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which was held in Shanghai, China from March 20-25, 2016. ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing, with more than 1200 papers presented and over 2000 participants.
-
- Date: December 15, 2015
Awarded to: John R. Hershey, Takaaki Hori, Jonathan Le Roux and Shinji Watanabe
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - The results of the third 'CHiME' Speech Separation and Recognition Challenge were publicly announced on December 15 at the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2015) held in Scottsdale, Arizona, USA. MERL's Speech and Audio Team, in collaboration with SRI, ranked 2nd out of 26 teams from Europe, Asia and the US. The task this year was to recognize speech recorded using a tablet in real environments such as cafes, buses, or busy streets. Due to the high levels of noise and the distance from the speaker's mouth to the microphones, this is very challenging task, where the baseline system only achieved 33.4% word error rate. The MERL/SRI system featured state-of-the-art techniques including multi-channel front-end, noise-robust feature extraction, and deep learning for speech enhancement, acoustic modeling, and language modeling, leading to a dramatic 73% reduction in word error rate, down to 9.1%. The core of the system has since been released as a new official challenge baseline for the community to use.
-