News & Events

158 News items, Awards, Events or Talks found.


  •  TALK    [MERL Seminar Series 2021] Dr. Ruohan Gao presents talk at MERL entitled Look and Listen: From Semantic to Spatial Audio-Visual Perception
    Date & Time: Tuesday, September 28, 2021; 1:00 PM EST
    Speaker: Dr. Ruohan Gao, Stanford University
    MERL Host: Gordon Wichern
    Research Areas: Computer Vision, Machine Learning, Speech & Audio
    Abstract
    • While computer vision has made significant progress by "looking" (detecting objects, actions, or people based on their appearance), it often does not listen. Yet cognitive science tells us that perception develops by making use of all our senses without intensive supervision. Towards this goal, in this talk I will present my research on audio-visual learning: we disentangle object sounds from unlabeled video, use audio as an efficient preview for action recognition in untrimmed video, decode the monaural soundtrack into its binaural counterpart by injecting visual spatial information, and use echoes to interact with the environment for spatial image representation learning. Together, these are steps towards a multimodal understanding of the visual world, in which audio serves as both a semantic and a spatial signal. I will conclude by briefly presenting our latest work on multisensory learning for robotics.
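
      As a rough illustration of the mono-to-binaural idea mentioned in the abstract, the sketch below conditions a recurrent audio encoder on a visual feature vector to predict left/right channel masks. It is not Dr. Gao's actual architecture; the layer sizes, the mask-based output, and the assumption that a visual feature has already been extracted are simplifications made for this example.

      # Hypothetical sketch, not the speaker's model: mono spectrogram + visual
      # feature -> left/right spectrograms via visually conditioned masks.
      import torch
      import torch.nn as nn

      class Mono2Binaural(nn.Module):
          def __init__(self, n_freq=257, visual_dim=512, hidden=256):
              super().__init__()
              self.audio_enc = nn.GRU(n_freq, hidden, batch_first=True)
              self.visual_proj = nn.Linear(visual_dim, hidden)
              self.head = nn.Linear(2 * hidden, 2 * n_freq)  # left and right masks

          def forward(self, mono_spec, visual_feat):
              # mono_spec: (B, T, n_freq) magnitude spectrogram of the mono mix
              # visual_feat: (B, visual_dim) feature from some visual encoder
              a, _ = self.audio_enc(mono_spec)                 # (B, T, hidden)
              v = self.visual_proj(visual_feat).unsqueeze(1)   # (B, 1, hidden)
              v = v.expand(-1, a.size(1), -1)                  # broadcast over time
              masks = torch.sigmoid(self.head(torch.cat([a, v], dim=-1)))
              left, right = masks.chunk(2, dim=-1)
              return left * mono_spec, right * mono_spec

      # Example with random tensors:
      model = Mono2Binaural()
      left, right = model(torch.rand(4, 100, 257), torch.rand(4, 512))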
  •  
  •  NEWS    MERL Congratulates Recipients of 2022 IEEE Technical Field Awards in Signal Processing
    Date: July 26, 2021
    MERL Contacts: Petros T. Boufounos; Jonathan Le Roux; Philip V. Orlik; Anthony Vetro
    Research Areas: Signal Processing, Speech & Audio
    Brief
    • IEEE has announced that the recipients of the 2022 IEEE James L. Flanagan Speech and Audio Processing Award will be Hervé Bourlard (EPFL/Idiap Research Institute) and Nelson Morgan (ICSI), "For contributions to neural networks for statistical speech recognition," and the recipient of the 2022 IEEE Fourier Award for Signal Processing will be Ali Sayed (EPFL), "For contributions to the theory and practice of adaptive signal processing." More details about the contributions of Prof. Bourlard and Prof. Morgan can be found in the announcements by ICSI and EPFL, and those of Prof. Sayed in EPFL's announcement. Mitsubishi Electric Research Laboratories (MERL) has recently become the new sponsor of these two prestigious awards, and extends its warmest congratulations to all of the 2022 award recipients.

      The IEEE Board of Directors established the IEEE James L. Flanagan Speech and Audio Processing Award in 2002 for outstanding contributions to the advancement of speech and/or audio signal processing, while the IEEE Fourier Award for Signal Processing was established in 2012 for outstanding contribution to the advancement of signal processing, other than in the areas of speech and audio processing. Both awards have recognized the contributions of some of the most renowned pioneers and leaders in their respective fields. MERL is proud to support the recognition of outstanding contributions to the signal processing field through its sponsorship of these awards.
  •  
  •  NEWS    MERL becomes new sponsor of two prestigious IEEE Technical Field Awards in Signal Processing
    Date: July 9, 2021
    MERL Contacts: Petros T. Boufounos; Jonathan Le Roux; Philip V. Orlik; Anthony Vetro
    Research Areas: Signal Processing, Speech & Audio
    Brief
    • Mitsubishi Electric Research Laboratories (MERL) has become the new sponsor of two prestigious IEEE Technical Field Awards in Signal Processing, the IEEE James L. Flanagan Speech and Audio Processing Award and the IEEE Fourier Award for Signal Processing, for the years 2022-2031. "MERL is proud to support the recognition of outstanding contributions to signal processing by sponsoring both the IEEE James L. Flanagan Speech and Audio Processing Award and the IEEE Fourier Award for Signal Processing. These awards celebrate the creativity and innovation in the field that touch many aspects of our lives and drive our society forward," said Dr. Anthony Vetro, VP and Director at MERL.

      The IEEE Board of Directors established the IEEE James L. Flanagan Speech and Audio Processing Award in 2002 for outstanding contributions to the advancement of speech and/or audio signal processing, while the IEEE Fourier Award for Signal Processing was established in 2012 for outstanding contribution to the advancement of signal processing, other than in the areas of speech and audio processing. Both awards have since recognized the contributions of some of the most renowned pioneers and leaders in their respective fields.

      By underwriting these IEEE Technical Field Awards, MERL continues to make a mark by supporting the advancement of technology that makes lasting changes in the world.
  •  
  •  NEWS    Chiori Hori will give keynote on scene understanding via multimodal sensing at AI Electronics Symposium
    Date: February 15, 2021
    Where: The 2nd International Symposium on AI Electronics
    MERL Contact: Chiori Hori
    Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
    Brief
    • Chiori Hori, a Senior Principal Researcher in MERL's Speech and Audio Team, will be a keynote speaker at the 2nd International Symposium on AI Electronics, alongside Alex Acero, Senior Director of Apple Siri, Roberto Cipolla, Professor of Information Engineering at the University of Cambridge, and Hiroshi Amano, Professor at Nagoya University and winner of the Nobel Prize in Physics for his work on blue light-emitting diodes. The symposium, organized by Tohoku University, will be held online on February 15, 2021, 10am-4pm (JST).

      Chiori's talk, titled "Human Perspective Scene Understanding via Multimodal Sensing", will present MERL's work towards the development of scene-aware interaction. An important capability that is still missing from human-machine interaction is natural, context-aware interaction, in which machines understand their surrounding scene from the human perspective and can share that understanding with humans using natural language. To bridge this communication gap, MERL has been working at the intersection of research fields such as spoken dialog, audio-visual understanding, sensor signal understanding, and robotics to build a new AI paradigm, called scene-aware interaction, that enables machines to translate their perception and understanding of a scene into natural language and respond accordingly, so that they can interact more effectively with humans. The talk will survey these technologies and introduce an application to future car navigation.
  •  
  •  EVENT    MERL Virtual Open House 2020
    Date & Time: Wednesday, December 9, 2020; 1:00-5:00PM EST
    Location: Virtual
    MERL Contacts: Elizabeth Phillips; Anthony Vetro
    Research Areas: Applied Physics, Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Control, Data Analytics, Dynamical Systems, Electric Systems, Electronic and Photonic Devices, Machine Learning, Multi-Physical Modeling, Optimization, Robotics, Signal Processing, Speech & Audio
    Brief
  •  
  •  AWARD    Best Poster Award and Best Video Award at the International Society for Music Information Retrieval Conference (ISMIR) 2020
    Date: October 15, 2020
    Awarded to: Ethan Manilow, Gordon Wichern, Jonathan Le Roux
    MERL Contacts: Jonathan Le Roux; Gordon Wichern
    Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
    Brief
    • Former MERL intern Ethan Manilow and MERL researchers Gordon Wichern and Jonathan Le Roux won the Best Poster Award and the Best Video Award at the 2020 International Society for Music Information Retrieval Conference (ISMIR 2020) for the paper "Hierarchical Musical Source Separation". The conference was held October 11-14 in a virtual format. Both awards were determined by popular vote among the conference attendees.

      The paper proposes a new method for isolating individual sounds in an audio mixture that accounts for the hierarchical relationship between sound sources. Many sounds we are interested in analyzing are hierarchical in nature, e.g., during a music performance, a hi-hat note is one of many such hi-hat notes, which is one of several parts of a drumkit, itself one of many instruments in a band, which might be playing in a bar with other sounds occurring. Inspired by this, the paper re-frames the audio source separation problem as hierarchical, combining similar sounds together at certain levels while separating them at other levels, and shows on a musical instrument separation task that a hierarchical approach outperforms non-hierarchical models while also requiring less training data. The paper, poster, and video can be seen on the paper page on the ISMIR website.
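
      The sketch below is a minimal illustration of that hierarchical framing, not the authors' actual model: a shared separator body with one mask head per hierarchy level, plus a loss term encouraging the coarse-level estimate (e.g., "drums") to match the sum of its fine-level children (e.g., individual drum-kit parts). All layer sizes and the specific consistency term are assumptions made for this example.

      # Illustrative sketch of hierarchical source separation (assumed design).
      import torch
      import torch.nn as nn

      class HierarchicalSeparator(nn.Module):
          def __init__(self, n_freq=513, hidden=300, n_children=3):
              super().__init__()
              self.body = nn.LSTM(n_freq, hidden, num_layers=2,
                                  batch_first=True, bidirectional=True)
              self.parent_head = nn.Linear(2 * hidden, n_freq)              # coarse source
              self.child_head = nn.Linear(2 * hidden, n_children * n_freq)  # fine sources

          def forward(self, mix_spec):
              # mix_spec: (B, T, n_freq) magnitude spectrogram of the mixture
              h, _ = self.body(mix_spec)
              parent = torch.sigmoid(self.parent_head(h)) * mix_spec
              child_masks = torch.sigmoid(self.child_head(h))
              B, T, F = mix_spec.shape
              children = child_masks.view(B, T, -1, F) * mix_spec.unsqueeze(2)
              return parent, children                          # (B, T, F), (B, T, S, F)

      def hierarchy_consistency_loss(parent, children):
          # Encourage the coarse estimate to equal the sum of its fine-level children.
          return torch.mean((parent - children.sum(dim=2)) ** 2)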
  •  
  •  NEWS    Anoop Cherian gave an invited talk at the Multi-modal Video Analysis Workshop, ECCV 2020
    Date: August 23, 2020
    Where: European Conference on Computer Vision (ECCV), online, 2020
    MERL Contact: Anoop Cherian
    Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
    Brief
    • MERL Principal Research Scientist Anoop Cherian gave an invited talk titled "Sound2Sight: Audio-Conditioned Visual Imagination" at the Multi-modal Video Analysis workshop held in conjunction with the European Conference on Computer Vision (ECCV), 2020. The talk was based on a recent ECCV paper that describes a new multimodal reasoning task called Sound2Sight and a generative adversarial machine learning algorithm for producing plausible video sequences conditioned on sound and visual context.
  •  
  •  NEWS    MERL's Scene-Aware Interaction Technology Featured in Mitsubishi Electric Corporation Press Release
    Date: July 22, 2020
    Where: Tokyo, Japan
    MERL Contacts: Anoop Cherian; Chiori Hori; Jonathan Le Roux; Tim K. Marks; Anthony Vetro
    Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
    Brief
    • Mitsubishi Electric Corporation announced that the company has developed what it believes to be the world’s first technology capable of highly natural and intuitive interaction with humans based on a scene-aware capability to translate multimodal sensing information into natural language.

      The novel technology, Scene-Aware Interaction, incorporates Mitsubishi Electric’s proprietary Maisart® compact AI technology to analyze multimodal sensing information for highly natural and intuitive interaction with humans through context-dependent generation of natural language. The technology recognizes contextual objects and events based on multimodal sensing information, such as images and video captured with cameras, audio information recorded with microphones, and localization information measured with LiDAR.

      Scene-Aware Interaction for car navigation, one target application, will provide drivers with intuitive route guidance. The technology is also expected to have applicability to human-machine interfaces for in-vehicle infotainment, interaction with service robots in building and factory automation systems, systems that monitor the health and well-being of people, surveillance systems that interpret complex scenes for humans and encourage social distancing, support for touchless operation of equipment in public areas, and much more. The technology is based on recent research by MERL's Speech & Audio and Computer Vision groups.
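
      A highly simplified, hypothetical sketch of that sensing-to-language idea is shown below: features from each modality (video, audio, localization) are projected into a shared space, and a small decoder cross-attends over them to produce natural-language tokens. This is not Mitsubishi Electric's actual Scene-Aware Interaction system or its Maisart implementation; all module names, dimensions, and the fusion scheme are assumptions made for illustration.

      # Hypothetical multimodal-sensing-to-language sketch (assumed design).
      import torch
      import torch.nn as nn

      class SceneToLanguage(nn.Module):
          def __init__(self, video_dim=2048, audio_dim=128, lidar_dim=64,
                       d_model=256, vocab_size=10000):
              super().__init__()
              self.video_proj = nn.Linear(video_dim, d_model)
              self.audio_proj = nn.Linear(audio_dim, d_model)
              self.lidar_proj = nn.Linear(lidar_dim, d_model)
              self.embed = nn.Embedding(vocab_size, d_model)
              layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
              self.decoder = nn.TransformerDecoder(layer, num_layers=2)
              self.out = nn.Linear(d_model, vocab_size)

          def forward(self, video_feat, audio_feat, lidar_feat, prev_tokens):
              # video_feat: (B, Tv, video_dim), audio_feat: (B, Ta, audio_dim),
              # lidar_feat: (B, Tl, lidar_dim), prev_tokens: (B, L) token ids.
              memory = torch.cat([self.video_proj(video_feat),
                                  self.audio_proj(audio_feat),
                                  self.lidar_proj(lidar_feat)], dim=1)
              h = self.decoder(self.embed(prev_tokens), memory)  # causal mask omitted
              return self.out(h)                                 # (B, L, vocab) logits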
  •  
  •  NEWS    Jonathan Le Roux gives Plenary Lecture at the JSALT 2020 Summer Workshop
    Date: July 10, 2020
    Where: Virtual Baltimore, MD
    MERL Contact: Jonathan Le Roux
    Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
    Brief
    • MERL Senior Principal Research Scientist and Speech and Audio Senior Team Leader Jonathan Le Roux was invited by the Center for Language and Speech Processing at Johns Hopkins University to give a plenary lecture at the 2020 Frederick Jelinek Memorial Summer Workshop on Speech and Language Technology (JSALT). The talk, entitled "Deep Learning for Multifarious Speech Processing: Tackling Multiple Speakers, Microphones, and Languages", presented an overview of deep learning techniques developed at MERL towards the goal of cracking the Tower of Babel version of the cocktail party problem, that is, separating and/or recognizing the speech of multiple unknown speakers speaking simultaneously in multiple languages, in both single-channel and multi-channel scenarios: from deep clustering to chimera networks, phasebook and friends, and from seamless ASR to MIMO-Speech and Transformer-based multi-speaker ASR.

      JSALT 2020 is the seventh in a series of six-week-long research workshops on Machine Learning for Speech, Language, and Computer Vision Technology. A continuation of the well-known Johns Hopkins University summer workshops, these workshops bring together diverse "dream teams" of leading professionals, graduate students, and undergraduates, in a truly cooperative, intensive, and substantive effort to advance the state of the science. MERL researchers led such teams in the JSALT 2015 workshop, on "Far-Field Speech Enhancement and Recognition in Mismatched Settings", and the JSALT 2018 workshop, on "Multi-lingual End-to-End Speech Recognition for Incomplete Data".
  •  
  •  NEWS    Zhong-Qiu Wang joins MERL's Speech and Audio Team
    Date: June 22, 2020
    Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
    Brief
    • We are excited to announce that Dr. Zhong-Qiu Wang, who recently obtained his Ph.D. from The Ohio State University, has joined MERL's Speech and Audio Team as a Visiting Research Scientist. Zhong-Qiu brings strong expertise in microphone array processing, speech enhancement, blind source/speaker separation, and robust automatic speech recognition, for which he has developed some of the most advanced machine learning and deep learning methods.

      Prior to joining MERL, Zhong-Qiu received the B.Eng. degree in 2013 from Harbin Institute of Technology, Harbin, China, and the M.Sc. and Ph.D. degrees in 2017 and 2020, respectively, from The Ohio State University, Columbus, USA, all in Computer Science. He was a summer research intern at Microsoft Research, Mitsubishi Electric Research Laboratories, and Google AI. He received a Best Student Paper Award at ICASSP 2018 for his work as an intern at MERL, and a Graduate Research Award from the OSU Department of Computer Science and Engineering in 2020.
  •  
  •  NEWS    MERL presenting 13 papers and an industry talk at ICASSP 2020
    Date: May 4, 2020 - May 8, 2020
    Where: Virtual Barcelona
    MERL Contacts: Karl Berntorp; Petros T. Boufounos; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Yanting Ma; Hassan Mansour; Philip V. Orlik; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
    Research Areas: Computational Sensing, Computer Vision, Machine Learning, Signal Processing, Speech & Audio
    Brief
    • MERL researchers are presenting 13 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held virtually from May 4-8, 2020. Petros Boufounos is also presenting a talk on the Computational Sensing Revolution in Array Processing (video) in ICASSP’s Industry Track, and Siheng Chen is co-organizing and chairing a special session on a Signal-Processing View of Graph Neural Networks.

      Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, array processing, and parameter estimation. Videos for all talks are available on MERL's YouTube channel, with corresponding links in the references below.

      This year again, MERL is a sponsor of the conference and will be participating in the Student Job Fair; please join us to learn about our internship program and career opportunities.

      ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year. Originally planned to be held in Barcelona, Spain, ICASSP has moved to a fully virtual setting due to the COVID-19 crisis, with free registration for participants not presenting a paper.
  •  
  •  AWARD    Best Paper Award at the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
    Date: December 18, 2019
    Awarded to: Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe
    MERL Contact: Jonathan Le Roux
    Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
    Brief
    • MERL researcher Jonathan Le Roux and co-authors Xuankai Chang, Shinji Watanabe (Johns Hopkins University), Wangyou Zhang, and Yanmin Qian (Shanghai Jiao Tong University) won the Best Paper Award at the 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019), for the paper "MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition". MIMO-Speech is a fully neural end-to-end framework that can transcribe the text of multiple speakers speaking simultaneously from multi-channel input. The system comprises a monaural masking network, a multi-source neural beamformer, and a multi-output speech recognition model, which are jointly optimized solely via an automatic speech recognition (ASR) criterion. The award was received by lead author Xuankai Chang during the conference, which was held in Sentosa, Singapore, from December 14-18, 2019.
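
      The sketch below illustrates only the data flow of that pipeline (mask estimation, mask-driven beamforming, then per-speaker ASR), with every stage differentiable so that an ASR loss alone can train the whole chain. The modules are small placeholders invented for this illustration, not the authors' networks; in particular, the "beamformer" here is a crude stand-in for the MVDR-style beamformer derived from mask-based spatial statistics in the paper.

      # Placeholder sketch of the MIMO-Speech data flow (assumed module designs).
      import torch
      import torch.nn as nn

      class MaskNet(nn.Module):
          def __init__(self, n_freq=257, n_spk=2, hidden=300):
              super().__init__()
              self.rnn = nn.LSTM(n_freq, hidden, batch_first=True, bidirectional=True)
              self.mask = nn.Linear(2 * hidden, n_spk * n_freq)

          def forward(self, ref_spec):                       # (B, T, F), reference channel
              h, _ = self.rnn(ref_spec)
              m = torch.sigmoid(self.mask(h))
              return m.view(m.size(0), m.size(1), -1, ref_spec.size(-1))  # (B, T, S, F)

      class PlaceholderBeamformer(nn.Module):
          # Stand-in for the neural beamformer: weights each channel by the
          # speaker's mask and averages over channels.
          def forward(self, multichannel_spec, masks):       # (B, C, T, F), (B, T, S, F)
              w = masks.permute(0, 2, 1, 3).unsqueeze(1)     # (B, 1, S, T, F)
              return (multichannel_spec.unsqueeze(2) * w).mean(dim=1)  # (B, S, T, F)

      class TinyASR(nn.Module):
          def __init__(self, n_freq=257, vocab=500, hidden=256):
              super().__init__()
              self.enc = nn.LSTM(n_freq, hidden, batch_first=True)
              self.out = nn.Linear(hidden, vocab)

          def forward(self, spec):                           # (B, T, F) -> (B, T, vocab)
              h, _ = self.enc(spec)
              return self.out(h)

      # End-to-end flow for a 2-speaker, 4-channel mixture:
      mix = torch.rand(1, 4, 100, 257)
      masks = MaskNet()(mix[:, 0])                           # masks from channel 0
      streams = PlaceholderBeamformer()(mix, masks)          # one stream per speaker
      logits = [TinyASR()(streams[:, s]) for s in range(2)]  # fed to the ASR loss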
  •  
  •  NEWS    Takaaki Hori elected to IEEE Technical Committee on Speech and Language Processing
    Date: November 9, 2019
    Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
    Brief
    • Takaaki Hori has been elected to serve on the Speech and Language Processing Technical Committee (SLTC) of the IEEE Signal Processing Society for a 3-year term.

      The SLTC promotes and influences all the technical areas of speech and language processing, such as speech recognition, speech synthesis, spoken language understanding, speech-to-speech translation, spoken dialog management, speech indexing, information extraction from audio, and speaker and language recognition.
  •  
  •  NEWS    MERL Speech & Audio Researchers Presenting 7 Papers and a Tutorial at Interspeech 2019
    Date: September 15, 2019 - September 19, 2019
    Where: Graz, Austria
    MERL Contacts: Chiori Hori; Jonathan Le Roux; Gordon Wichern
    Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
    Brief
    • MERL Speech & Audio Team researchers will be presenting 7 papers at the 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019), which is being held in Graz, Austria from September 15-19, 2019. Topics to be presented include recent advances in end-to-end speech recognition, speech separation, and audio-visual scene-aware dialog. Takaaki Hori is also co-presenting a tutorial on end-to-end speech processing.

      Interspeech is the world's largest and most comprehensive conference on the science and technology of spoken language processing. It gathers around 2000 participants from all over the world.
  •  
  •  NEWS    MERL presenting 16 papers at ICASSP 2019
    Date: May 12, 2019 - May 17, 2019
    Where: Brighton, UK
    MERL Contacts: Petros T. Boufounos; Anoop Cherian; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Tim K. Marks; Philip V. Orlik; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
    Research Areas: Computational Sensing, Computer Vision, Machine Learning, Signal Processing, Speech & Audio
    Brief
    • MERL researchers will be presenting 16 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Brighton, UK from May 12-17, 2019. Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, and parameter estimation. MERL is also a sponsor of the conference and will be participating in the student career luncheon; please join us at the lunch to learn about our internship program and career opportunities.

      ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
  •  
  •  NEWS    MERL's seamless speech recognition technology featured in Mitsubishi Electric Corporation press release
    Date: February 13, 2019
    Where: Tokyo, Japan
    MERL Contacts: Jonathan Le Roux; Gordon Wichern
    Research Area: Speech & Audio
    Brief
  •  
  •  EVENT    MERL 3rd Annual Open House
    Date & Time: Thursday, November 29, 2018; 4-6pm
    Location: 201 Broadway, 8th floor, Cambridge, MA
    MERL Contacts: Elizabeth Phillips; Anthony Vetro
    Research Areas: Applied Physics, Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Control, Data Analytics, Dynamical Systems, Electric Systems, Electronic and Photonic Devices, Machine Learning, Multi-Physical Modeling, Optimization, Robotics, Signal Processing, Speech & Audio
    Brief
    • Snacks, demos, science: On Thursday 11/29, Mitsubishi Electric Research Labs (MERL) will host an open house for graduate+ students interested in internships, post-docs, and research scientist positions. The event will be held from 4-6pm and will feature demos & short presentations in our main areas of research, including artificial intelligence, robotics, computer vision, speech processing, optimization, machine learning, data analytics, signal processing, communications, sensing, control and dynamical systems, as well as multi-physical modeling and electronic devices. MERL is a high-impact, publication-oriented research lab with very extensive internship and university collaboration programs. Most internships lead to publication; many of our interns and staff have gone on to notable careers at MERL and in academia. Come mix with our researchers, see our state-of-the-art technologies, and learn about our research opportunities. Dress code: casual, with resumes.

      Pre-registration for the event is strongly encouraged:
      merlopenhouse.eventbrite.com

      Current internship and employment openings:
      www.merl.com/internship/openings
      www.merl.com/employment/employment

      Information about working at MERL:
      www.merl.com/employment.
  •  
  •  EVENT    SANE 2018 - Speech and Audio in the Northeast
    Date: Thursday, October 18, 2018
    Location: Google, Cambridge, MA
    MERL Contact: Jonathan Le Roux
    Research Area: Speech & Audio
    Brief
    • SANE 2018, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Thursday October 18, 2018 at Google, in Cambridge, MA. MERL is one of the organizers and sponsors of the workshop.

      It is the 7th edition in the SANE series of workshops, which started at MERL in 2012. Since the first edition, the audience has steadily grown, with a record 180 participants in 2017.

      SANE 2018 will feature invited talks by leading researchers from the Northeast, as well as from the international community. It will also feature a lively poster session, open to both students and researchers.
  •  
  •  NEWS    Takaaki Hori leads speech technology workshop
    Date: June 25, 2018 - August 3, 2018
    Where: Johns Hopkins University, Baltimore, MD
    MERL Contact: Jonathan Le Roux
    Research Area: Speech & Audio
    Brief
    • MERL Speech & Audio Team researcher Takaaki Hori led a team of 27 senior researchers and Ph.D. students from different organizations around the world, working on "Multi-lingual End-to-End Speech Recognition for Incomplete Data" as part of the Jelinek Memorial Summer Workshop on Speech and Language Technology (JSALT). The JSALT workshop is a renowned 6-week hands-on workshop held yearly since 1995. This year, the workshop was held at Johns Hopkins University in Baltimore from June 25 to August 3, 2018. Takaaki's team developed new methods for end-to-end Automatic Speech Recognition (ASR) with a focus on low-resource languages with limited labelled data.

      End-to-end ASR can significantly reduce the burden of developing ASR systems for new languages by eliminating the need for linguistic information such as pronunciation dictionaries. Some end-to-end systems have recently achieved performance comparable to or better than conventional systems on several tasks. However, current model training algorithms require paired data, i.e., speech data and the corresponding transcription. A sufficient amount of such complete data is usually unavailable for minor languages, and creating such data sets is very expensive and time-consuming.

      The goal of Takaaki's team project was to expand the applicability of end-to-end models to multilingual ASR, and to develop new technology that would make it possible to build highly accurate systems even for low-resource languages without a large amount of paired data. Some major accomplishments of the team include building multi-lingual end-to-end ASR systems for 17 languages, developing novel architectures and training methods for end-to-end ASR, building an end-to-end ASR-TTS (text-to-speech) chain for unpaired data training, and developing ESPnet, an open-source end-to-end speech processing toolkit. Three papers stemming from the team's work have already been accepted to the 2018 IEEE Spoken Language Technology Workshop (SLT), with several more to be submitted to upcoming conferences.
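
      The sketch below is a toy illustration of the ASR-TTS chain idea for unpaired data: unpaired speech is passed through ASR and then TTS and compared against itself, and unpaired text is passed through TTS and then ASR and compared against itself, yielding training signals without transcripts. The modules and loss terms are placeholders chosen for this example, not the workshop team's implementation.

      # Toy ASR<->TTS chain for unpaired data (placeholder modules and losses).
      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class Chain(nn.Module):
          def __init__(self, asr: nn.Module, tts: nn.Module):
              super().__init__()
              self.asr, self.tts = asr, tts

          def speech_cycle_loss(self, speech_feats):
              # speech -> (soft) text posteriors -> reconstructed speech features
              recon = self.tts(self.asr(speech_feats))
              return F.mse_loss(recon, speech_feats)

          def text_cycle_loss(self, text_onehot):
              # text -> synthesized speech features -> recognized text posteriors
              post = self.asr(self.tts(text_onehot))          # (B, L, vocab)
              return F.cross_entropy(post.transpose(1, 2), text_onehot.argmax(dim=-1))

      # Trivial stand-ins (real systems use encoder-decoder networks):
      chain = Chain(asr=nn.Linear(80, 100), tts=nn.Linear(100, 80))
      speech_loss = chain.speech_cycle_loss(torch.rand(4, 50, 80))
      text_loss = chain.text_cycle_loss(F.one_hot(torch.randint(0, 100, (4, 30)), 100).float())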
  •  
  •  AWARD    Best Student Paper Award at IEEE ICASSP 2018
    Date: April 17, 2018
    Awarded to: Zhong-Qiu Wang
    MERL Contact: Jonathan Le Roux
    Research Area: Speech & Audio
    Brief
    • Former MERL intern Zhong-Qiu Wang (Ph.D. Candidate at Ohio State University) has received a Best Student Paper Award at the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018) for the paper "Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation" by Zhong-Qiu Wang, Jonathan Le Roux, and John Hershey. The paper presents work performed during Zhong-Qiu's internship at MERL in the summer of 2017, extending MERL's pioneering Deep Clustering framework for speech separation to a multi-channel setup. The award was received on behalf of Zhong-Qiu by MERL researcher and co-author Jonathan Le Roux during the conference, held in Calgary, April 15-20.
  •  
  •  NEWS    MERL presenting 9 papers at ICASSP 2018
    Date: April 15, 2018 - April 20, 2018
    Where: Calgary, AB
    MERL Contacts: Petros T. Boufounos; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Philip V. Orlik; Pu (Perry) Wang
    Research Areas: Computational Sensing, Digital Video, Speech & Audio
    Brief
    • MERL researchers are presenting 9 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Calgary from April 15-20, 2018. Topics to be presented include recent advances in speech recognition, audio processing, and computational sensing. MERL is also a sponsor of the conference.

      ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
  •  
  •  TALK    Theory and Applications of Sparse Model-Based Recurrent Neural Networks
    Date & Time: Tuesday, March 6, 2018; 12:00 PM
    Speaker: Scott Wisdom, Affectiva
    MERL Host: Jonathan Le Roux
    Research Area: Speech & Audio
    Abstract
    • Recurrent neural networks (RNNs) are effective, data-driven models for sequential data, such as audio and speech signals. However, like many deep networks, RNNs are essentially black boxes; though they are effective, their weights and architecture are not directly interpretable by practitioners. A major component of my dissertation research is explaining the success of RNNs and constructing new RNN architectures through the process of "deep unfolding," which can construct and explain deep network architectures using an equivalence to inference in statistical models. Deep unfolding yields principled initializations for training deep networks, provides insight into their effectiveness, and assists with interpretation of what these networks learn.

      In particular, I will show how RNNs with rectified linear units and residual connections are a particular deep unfolding of a sequential version of the iterative shrinkage-thresholding algorithm (ISTA), a simple and classic algorithm for solving L1-regularized least-squares. This equivalence allows interpretation of state-of-the-art unitary RNNs (uRNNs) as an unfolded sparse coding algorithm. I will also describe a new type of RNN architecture called deep recurrent nonnegative matrix factorization (DR-NMF). DR-NMF is an unfolding of a sparse NMF model of nonnegative spectrograms for audio source separation. Both of these networks outperform conventional LSTM networks while also providing interpretability for practitioners.
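
      As a concrete anchor for the unfolding argument, here is a minimal NumPy sketch of ISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1. Each iteration is a linear step followed by a soft-threshold; under a nonnegativity constraint the soft-threshold reduces to a shifted ReLU, which is why unfolding the iterations yields a deep network of linear layers with ReLU-like activations. The example problem and parameter values are illustrative.

      # Minimal ISTA sketch for L1-regularized least squares.
      import numpy as np

      def soft_threshold(v, t):
          return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

      def ista(A, b, lam=0.1, n_iter=100):
          L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
          x = np.zeros(A.shape[1])
          for _ in range(n_iter):              # each loop = one "layer" when unfolded
              grad = A.T @ (A @ x - b)
              x = soft_threshold(x - grad / L, lam / L)
          return x

      # Recover a sparse vector from noisy linear measurements:
      rng = np.random.default_rng(0)
      A = rng.standard_normal((60, 100))
      x_true = np.zeros(100)
      x_true[rng.choice(100, 5, replace=False)] = 1.0
      x_hat = ista(A, A @ x_true + 0.01 * rng.standard_normal(60), lam=0.2, n_iter=300)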
  •  
  •  NEWS    MERL's speech research featured in NPR's All Things Considered
    Date: February 5, 2018
    Where: National Public Radio (NPR)
    MERL Contact: Jonathan Le Roux
    Research Area: Speech & Audio
    Brief
    • MERL's speech separation technology was featured in NPR's All Things Considered, as part of an episode of All Tech Considered on artificial intelligence, "Can Computers Learn Like Humans?". An example separating the overlapped speech of two of the show's hosts was played on the air.
      The technology is based on a proprietary deep learning method called Deep Clustering. It is the world's first technology that separates in real time the simultaneous speech of multiple unknown speakers recorded with a single microphone. It is a key step towards building machines that can interact in noisy environments, in the same way that humans can have meaningful conversations in the presence of many other conversations.
      A live demonstration was featured in Mitsubishi Electric Corporation's Annual R&D Open House last year, and was also covered in international media at the time.
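
      As a rough illustration of the deep clustering idea (not MERL's production system; the embedding network below is a toy placeholder with assumed sizes), a network maps each time-frequency bin of the mixture to an embedding, the embeddings are clustered with k-means, and the cluster assignments become per-speaker masks:

      # Toy deep-clustering-style inference (assumed network sizes).
      import torch
      import torch.nn as nn
      from sklearn.cluster import KMeans

      class EmbeddingNet(nn.Module):
          def __init__(self, n_freq=129, emb_dim=20, hidden=300):
              super().__init__()
              self.rnn = nn.LSTM(n_freq, hidden, num_layers=2,
                                 batch_first=True, bidirectional=True)
              self.proj = nn.Linear(2 * hidden, n_freq * emb_dim)
              self.emb_dim = emb_dim

          def forward(self, mix_spec):                     # (B, T, F) log-magnitude
              h, _ = self.rnn(mix_spec)
              v = self.proj(h).view(mix_spec.size(0), -1, self.emb_dim)
              return nn.functional.normalize(v, dim=-1)    # (B, T*F, emb_dim)

      def separate(mix_spec, net, n_speakers=2):
          with torch.no_grad():
              emb = net(mix_spec)[0].numpy()               # single utterance
          labels = KMeans(n_clusters=n_speakers, n_init=10).fit_predict(emb)
          ids = torch.from_numpy(labels).view(mix_spec.shape[1], -1)  # (T, F)
          return [(ids == s).float() * mix_spec[0] for s in range(n_speakers)]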

      (Photo credit: Sam Rowe for NPR)

      Link:
      "Can Computers Learn Like Humans?" (NPR, All Things Considered)
      MERL Deep Clustering Demo.
  •  
  •  TALK    Advances in Accelerated Computing
    Date & Time: Friday, February 2, 2018; 12:00
    Speaker: Dr. David Kaeli, Northeastern University
    MERL Host: Abraham Goldsmith
    Research Areas: Control, Optimization, Machine Learning, Speech & Audio
    Abstract
    • GPU computing is alive and well! The GPU has allowed researchers to overcome a number of computational barriers in important problem domains. Still, challenges remain in using GPUs for more general-purpose applications. GPUs achieve impressive speedups when compared to CPUs, since GPUs have a large number of compute cores and high memory bandwidth. Recent GPU performance is approaching 10 teraflops of single precision performance on a single device. In this talk we will discuss current trends with GPUs, including some advanced features that allow them to exploit multi-context grains of parallelism. Further, we consider how GPUs can be treated as cloud-based resources, enabling a GPU-enabled server to deliver HPC cloud services by leveraging virtualization and collaborative filtering. Finally, we argue for new heterogeneous workloads and discuss the role of the Heterogeneous Systems Architecture (HSA), a standard that further supports integration of the CPU and GPU into a common framework. We present a new class of benchmarks specifically tailored to evaluate the benefits of features supported in the new HSA programming model.
  •  
  •  NEWS    Chiori Hori elected to IEEE Technical Committee on Speech and Language Processing
    Date: January 31, 2018
    MERL Contact: Chiori Hori
    Research Area: Speech & Audio
    Brief
    • Chiori Hori has been elected to serve on the Speech and Language Processing Technical Committee (SLTC) of the IEEE Signal Processing Society for a 3-year term.

      The SLTC promotes and influences all the technical areas of speech and language processing, such as speech recognition, speech synthesis, spoken language understanding, speech-to-speech translation, spoken dialog management, speech indexing, information extraction from audio, and speaker and language recognition.
  •