Artificial Intelligence
Making machines smarter for improved safety, efficiency and comfort.
Our AI research encompasses advances in computer vision, speech and audio processing, and data analytics. Key research themes include improved perception based on machine learning techniques, learning control policies through model-based reinforcement learning, and cognition and reasoning based on learned semantic representations. We apply our work to a broad range of automotive and robotics applications, as well as to building and home systems.
Researchers
Jonathan Le Roux
Toshiaki Koike-Akino
Ye Wang
Gordon Wichern
Anoop Cherian
Chiori Hori
Tim K. Marks
Michael J. Jones
Daniel N. Nikovski
Kieran Parsons
Devesh K. Jha
François Germain
Suhas Lohit
Philip V. Orlik
Matthew Brand
Diego Romeres
Petros T. Boufounos
Pu (Perry) Wang
Hassan Mansour
Moitreya Chatterjee
Siddarth Jain
Sameer Khurana
William S. Yerazunis
Kuan-Chuan Peng
Mouhacine Benosman
Radu Corcodel
Arvind Raghunathan
Jing Liu
Hongbo Sun
Yebin Wang
Jianlin Guo
Chungwei Lin
Yanting Ma
Bingnan Wang
Stefano Di Cairano
James Queeney
Anthony Vetro
Jinyun Zhang
Jose Amaya
Karl Berntorp
Ankush Chakrabarty
Vedang M. Deshpande
Dehong Liu
Pedro Miraldo
Saviz Mowlavi
Wataru Tsujita
Abraham P. Vinod
Janek Ebbers
Ryo Hase
Shinya Tsuruta
Ryoma Yataka
Awards

AWARD: Jonathan Le Roux elevated to IEEE Fellow
Date: January 1, 2024
Awarded to: Jonathan Le Roux
MERL Contact: Jonathan Le Roux
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL Distinguished Scientist and Speech & Audio Senior Team Leader Jonathan Le Roux has been elevated to IEEE Fellow, effective January 2024, "for contributions to multi-source speech and audio processing."
Mitsubishi Electric celebrated Dr. Le Roux's elevation and that of another researcher from the company, Dr. Shumpei Kameyama, with a worldwide news release on February 15.
Dr. Jonathan Le Roux has made fundamental contributions to the field of multi-speaker speech processing, especially to the areas of speech separation and multi-speaker end-to-end automatic speech recognition (ASR). His contributions constituted a major advance in realizing a practically usable solution to the cocktail party problem, enabling machines to replicate humans’ ability to concentrate on a specific sound source, such as a certain speaker within a complex acoustic scene—a long-standing challenge in the speech signal processing community. Additionally, he has made key contributions to the measures used for training and evaluating audio source separation methods, developing several new objective functions to improve the training of deep neural networks for speech enhancement, and analyzing the impact of metrics used to evaluate the signal reconstruction quality. Dr. Le Roux’s technical contributions have been crucial in promoting the widespread adoption of multi-speaker separation and end-to-end ASR technologies across various applications, including smart speakers, teleconferencing systems, hearables, and mobile devices.
IEEE Fellow is the highest grade of membership of the IEEE. It honors members with an outstanding record of technical achievements, contributing importantly to the advancement or application of engineering, science and technology, and bringing significant value to society. Each year, following a rigorous evaluation procedure, the IEEE Fellow Committee recommends a select group of recipients for elevation to IEEE Fellow. Less than 0.1% of voting members are selected annually for this member grade elevation.
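Among the evaluation measures mentioned above, a widely used metric for judging signal reconstruction quality in source separation is the scale-invariant signal-to-distortion ratio (SI-SDR). As a generic illustration of that metric (a minimal sketch of the standard definition, not MERL's implementation; the variable names are our own):

```python
import numpy as np

def si_sdr(reference, estimate, eps=1e-8):
    """Scale-invariant SDR in dB: project the estimate onto the reference,
    then compare the power of the scaled target to the residual noise."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # Optimal scaling of the reference toward the estimate.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps)
                           / (np.dot(noise, noise) + eps))

rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)               # 1 s of "clean" signal at 16 kHz
est = ref + 0.1 * rng.standard_normal(16000)   # estimate with 10% added noise
print(round(si_sdr(ref, est), 1))              # roughly 20 dB for this noise level
```

Because of the projection step, rescaling the estimate leaves the score unchanged, which is exactly the property that makes the metric robust as a training objective.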
AWARD: Honorable Mention Award at NeurIPS 2023 Instruction Workshop
Date: December 15, 2023
Awarded to: Lingfeng Sun, Devesh K. Jha, Chiori Hori, Siddarth Jain, Radu Corcodel, Xinghao Zhu, Masayoshi Tomizuka and Diego Romeres
MERL Contacts: Radu Corcodel; Chiori Hori; Siddarth Jain; Devesh K. Jha; Diego Romeres
Research Areas: Artificial Intelligence, Machine Learning, Robotics
Brief: MERL researchers received an Honorable Mention Award at the Workshop on Instruction Tuning and Instruction Following at the NeurIPS 2023 conference in New Orleans. The workshop focused on instruction tuning and instruction following for large language models (LLMs). MERL researchers presented their work on interactive planning using LLMs for partially observable robotic tasks during the workshop's oral presentation session.
AWARD: MERL team wins the Audio-Visual Speech Enhancement (AVSE) 2023 Challenge
Date: December 16, 2023
Awarded to: Zexu Pan, Gordon Wichern, Yoshiki Masuyama, Francois Germain, Sameer Khurana, Chiori Hori, and Jonathan Le Roux
MERL Contacts: François Germain; Chiori Hori; Sameer Khurana; Jonathan Le Roux; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL's Speech & Audio team ranked 1st out of 12 teams in the 2nd COG-MHEAR Audio-Visual Speech Enhancement (AVSE) Challenge. The team was led by Zexu Pan, and also included Gordon Wichern, Yoshiki Masuyama, Francois Germain, Sameer Khurana, Chiori Hori, and Jonathan Le Roux.
The AVSE challenge aims to design better speech enhancement systems by harnessing the visual aspects of speech (such as lip movements and gestures), in a manner similar to the brain's multi-modal integration strategies. MERL's system was a scenario-aware audio-visual TF-GridNet that incorporates the face recording of a target speaker as a conditioning factor and also recognizes whether the predominant interference signal is speech or background noise. In addition to outperforming all competing systems on objective metrics by a wide margin, MERL's model achieved the best overall word intelligibility score in a listening test: 84.54%, compared to 57.56% for the baseline and 80.41% for the next best team. Fisher's least significant difference (LSD) was 2.14%, indicating that the model offered statistically significant speech intelligibility improvements over all other systems.
See All Awards for Artificial Intelligence
News & Events
NEWS: MERL at the International Conference on Robotics and Automation (ICRA) 2024
Date: May 13, 2024 - May 17, 2024
Where: Yokohama, Japan
MERL Contacts: Anoop Cherian; Radu Corcodel; Stefano Di Cairano; Chiori Hori; Siddarth Jain; Devesh K. Jha; Jonathan Le Roux; Diego Romeres; William S. Yerazunis
Research Areas: Artificial Intelligence, Machine Learning, Optimization, Robotics, Speech & Audio
Brief: MERL made significant contributions to both the organization and the technical program of the International Conference on Robotics and Automation (ICRA) 2024, held in Yokohama, Japan from May 13th to May 17th.
MERL was a Bronze sponsor of the conference, and exhibited a live robotic demonstration, which attracted a large audience. The demonstration showcased an Autonomous Robotic Assembly technology executed on MELCO's Assista robot arm and was the collaborative effort of the Optimization and Robotics Team together with the Advanced Technology department at Mitsubishi Electric.
MERL researchers from the Optimization and Robotics, Speech & Audio, and Control for Autonomy teams also presented 8 papers and gave 2 invited talks covering robotic assembly, applications of LLMs to robotics, human-robot interaction, safe and robust path planning for autonomous drones, transfer learning, perception, and tactile sensing.
TALK: [MERL Seminar Series 2024] Chuchu Fan presents talk titled "Neural Certificates and LLMs in Large-Scale Autonomy Design"
Date & Time: Wednesday, May 29, 2024; 12:00 PM
Speaker: Chuchu Fan, MIT
MERL Host: Abraham P. Vinod
Research Areas: Artificial Intelligence, Control, Machine Learning
Abstract: Learning-enabled control systems have demonstrated impressive empirical performance on challenging control problems in robotics. However, this performance often comes at the cost of diminished transparency and a lack of guarantees on the safety and stability of the learned controllers. In recent years, new techniques have emerged to provide these guarantees by learning certificates alongside control policies; these certificates provide concise, data-driven proofs that guarantee the safety and stability of the learned control system. These methods not only allow the user to verify the safety of a learned controller but also provide supervision during training, allowing safety and stability requirements to influence the training process itself. In this talk, we present two exciting updates on neural certificates. In the first work, we explore the use of graph neural networks to learn collision-avoidance certificates that can generalize to unseen and very crowded environments. The second work presents a novel reinforcement learning approach that produces certificate functions together with the policies while addressing instability issues in the optimization process. Finally, if time permits, I will also discuss my group's recent work using LLMs and domain-specific task and motion planners to allow natural language as input for robot planning.
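The certificate idea in the abstract can be illustrated in miniature. Below is a hedged sketch of the core check, under assumed toy dynamics (a fixed quadratic certificate for a stable linear system; the speaker's work instead learns neural certificates alongside policies):

```python
import numpy as np

# Toy discrete-time system x_{t+1} = A x_t with a stable A (spectral norm < 1).
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])

def V(x):
    # Candidate quadratic certificate V(x) = ||x||^2; in the learned setting
    # this would be a neural network trained jointly with the policy.
    return float(x @ x)

def certificate_holds(A, V, n_samples=1000, margin=1e-6, seed=0):
    """Empirically check the Lyapunov decrease condition V(Ax) < V(x)
    on random samples, with a small relative margin."""
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0, size=2)
        if V(A @ x) >= (1.0 - margin) * V(x):
            return False
    return True

print(certificate_holds(A, V))  # True: V certifies stability of this A
```

The same sampled check used as a penalty during training is what lets safety and stability requirements shape the learned controller, as the abstract describes; a formal certificate additionally requires verifying the condition over the whole state region, not just samples.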
See All News & Events for Artificial Intelligence

Research Highlights
- TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
- Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-Aware Spatio-Temporal Sampling
- Steered Diffusion
- Robust Machine Learning
- mmWave Beam-SNR Fingerprinting (mmBSF)
- Video Anomaly Detection
- Biosignal Processing for Human-Machine Interaction
Internships
SA2073: Multimodal scene-understanding
We are looking for a graduate student interested in helping advance the field of multimodal scene understanding, with a focus on scene understanding using natural language for robot dialog and/or indoor monitoring using a large language model. The intern will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in deep learning for audio-visual, signal, and natural language processing. Good programming skills in Python and knowledge of deep learning frameworks such as PyTorch are essential. Multiple positions are available with flexible start dates (not just Spring/Summer but throughout 2024) and durations (typically 3-6 months).
ST2083: Deep Learning for Radar Perception
The Computational Sensing team at MERL is seeking a highly motivated intern to conduct fundamental research in radar perception. Expertise in deep learning-based object detection, multiple object tracking, data association, and representation learning (detection points, heatmaps, and raw radar waveforms) is required. Previous hands-on experience with open indoor/outdoor radar datasets is a plus. Familiarity with the concepts of FMCW, MIMO, and the range-Doppler-angle spectrum is an asset. The intern will collaborate with a small group of MERL researchers to develop novel algorithms, design experiments with MERL's in-house testbed, and prepare results for patents and publication. The expected duration of the internship is 3 months, with a flexible start date.
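The range-Doppler spectrum mentioned in this posting is typically formed with two FFTs over an FMCW frame. As a hedged illustration (toy synthetic data with assumed bin placements, not MERL's testbed or pipeline):

```python
import numpy as np

# Simulate one FMCW frame of beat signals: n_chirps (slow time) x n_samples
# (fast time). Toy setup: a single target whose range maps to beat-frequency
# bin 20 and whose velocity maps to Doppler bin 5.
n_chirps, n_samples = 64, 128
range_bin, doppler_bin = 20, 5
t = np.arange(n_samples)
c = np.arange(n_chirps)
cube = np.exp(2j * np.pi * (range_bin * t[None, :] / n_samples
                            + doppler_bin * c[:, None] / n_chirps))

# Range FFT along fast time, then Doppler FFT along slow time.
range_doppler = np.fft.fft(np.fft.fft(cube, axis=1), axis=0)
power = np.abs(range_doppler) ** 2

# The peak of the map recovers the target's (Doppler, range) cell.
peak = np.unravel_index(np.argmax(power), power.shape)
print(peak)
```

Deep radar perception models of the kind described above often consume exactly such range-Doppler (or range-Doppler-angle) maps, or the raw detection points derived from their peaks.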
OR2103: Human Robot Collaboration in Assembly Tasks
MERL is looking for a self-motivated and qualified candidate to work on human-robot interaction for collaborative manipulation and assembly scenarios. The ideal candidate is a Ph.D. student with experience and a publication record in one or more of the following areas: 1) control, estimation, and perception for robotic manipulation; 2) task and motion planning; 3) learning-from-demonstration algorithms applied to robotic manipulation; 4) machine learning techniques for modeling and control as well as regression and classification problems; 5) experience working with robotic systems and familiarity with physics-engine simulators such as MuJoCo, Isaac Gym, and PyBullet. The successful candidate will be expected to develop, in collaboration with MERL employees, state-of-the-art algorithms to solve complex manipulation tasks that involve human-robot collaboration. Proficiency in Python and ROS is required. The expectation is that the research will lead to one or more scientific publications. The expected duration is 3-4 months, with a flexible start date.
See All Internships for Artificial Intelligence

Recent Publications
- "SuperLoRA: Parameter-Efficient Unified Adaptation for Large Vision Models", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2024. BibTeX TR2024-062 PDF
  @inproceedings{Chen2024jun,
    author = {Chen, Xiangyu and Liu, Jing and Wang, Ye and Wang, Pu and Brand, Matthew and Wang, Guanghui and Koike-Akino, Toshiaki},
    title = {SuperLoRA: Parameter-Efficient Unified Adaptation for Large Vision Models},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = 2024,
    month = jun,
    url = {https://www.merl.com/publications/TR2024-062}
  }
- "Long-Tailed Anomaly Detection with Learnable Class Names", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2024. BibTeX TR2024-040 PDF Video Presentation
  @inproceedings{Ho2024jun,
    author = {Ho, Chih-Hui and Peng, Kuan-Chuan and Vasconcelos, Nuno},
    title = {Long-Tailed Anomaly Detection with Learnable Class Names},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = 2024,
    month = jun,
    url = {https://www.merl.com/publications/TR2024-040}
  }
- "TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2024. BibTeX TR2024-059 PDF Video Software Presentation
  @inproceedings{Ni2024jun,
    author = {Ni, Haomiao and Egger, Bernhard and Lohit, Suhas and Cherian, Anoop and Wang, Ye and Koike-Akino, Toshiaki and Huang, Sharon X. and Marks, Tim K.},
    title = {TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = 2024,
    month = jun,
    url = {https://www.merl.com/publications/TR2024-059}
  }
- "Adversarial Imitation Learning from Visual Observations using Latent Information", Transactions on Machine Learning Research (TMLR), June 2024. BibTeX TR2024-068 PDF
  @article{Giammarino2024jun,
    author = {Giammarino, Vittorio and Queeney, James and Paschalidis, Ioannis Ch.},
    title = {Adversarial Imitation Learning from Visual Observations using Latent Information},
    journal = {Transactions on Machine Learning Research (TMLR)},
    year = 2024,
    month = jun,
    url = {https://www.merl.com/publications/TR2024-068}
  }
- "Deep Neural Room Acoustics Primitive", International Conference on Machine Learning (ICML), June 2024. BibTeX TR2024-072 PDF
  @inproceedings{He2024jun,
    author = {He, Yuhang and Cherian, Anoop and Wichern, Gordon and Markham, Andrew},
    title = {Deep Neural Room Acoustics Primitive},
    booktitle = {International Conference on Machine Learning (ICML)},
    year = 2024,
    month = jun,
    url = {https://www.merl.com/publications/TR2024-072}
  }
- "Human Action Understanding-based Robot Planning using Multimodal LLM", IEEE International Conference on Robotics and Automation (ICRA), June 2024. BibTeX TR2024-066 PDF
  @inproceedings{Kambara2024jun,
    author = {Kambara, Motonari and Hori, Chiori and Sugiura, Komei and Ota, Kei and Jha, Devesh K. and Khurana, Sameer and Jain, Siddarth and Corcodel, Radu and Romeres, Diego and Le Roux, Jonathan},
    title = {Human Action Understanding-based Robot Planning using Multimodal LLM},
    booktitle = {IEEE International Conference on Robotics and Automation (ICRA) Workshop},
    year = 2024,
    month = jun,
    url = {https://www.merl.com/publications/TR2024-066}
  }
- "Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), May 2024. BibTeX TR2024-042 PDF Videos
  @inproceedings{Liu2024may,
    author = {Liu, Xinhang and Tai, Yu-wing and Tang, Chi-Keung and Miraldo, Pedro and Lohit, Suhas and Chatterjee, Moitreya},
    title = {Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = 2024,
    month = may,
    url = {https://www.merl.com/publications/TR2024-042}
  }
- "Tracklet-based Explainable Video Anomaly Localization", IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, May 2024. BibTeX TR2024-057 PDF
  @inproceedings{Singh2024may,
    author = {Singh, Ashish and Jones, Michael J. and Learned-Miller, Erik},
    title = {Tracklet-based Explainable Video Anomaly Localization},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    year = 2024,
    month = may,
    url = {https://www.merl.com/publications/TR2024-057}
  }
Videos

Software & Data Downloads
- Pixel-Grounded Prototypical Part Networks
- Sound Event Bounding Boxes
- Long-Tailed Anomaly Detection (LTAD) Dataset
- neural-IIR-field
- DeepBornFNO
- Steered Diffusion
- Hyperbolic Audio Source Separation
- Simple Multimodal Algorithmic Reasoning Task Dataset
- SOurce-free Cross-modal KnowledgE Transfer
- Audio-Visual-Language Embodied Navigation in 3D Environments
- Nonparametric Score Estimators
- 3D MOrphable STyleGAN
- Instance Segmentation GAN
- Audio Visual Scene-Graph Segmentor
- Generalized One-class Discriminative Subspaces
- Goal directed RL with Safety Constraints
- Hierarchical Musical Instrument Separation
- Generating Visual Dynamics from Sound and Context
- Adversarially-Contrastive Optimal Transport
- Online Feature Extractor Network
- MotionNet
- FoldingNet++
- Quasi-Newton Trust Region Policy Optimization
- Landmarks' Location, Uncertainty, and Visibility Likelihood
- Robust Iterative Data Estimation
- Gradient-based Nikaido-Isoda
- Discriminative Subspace Pooling
- Partial Group Convolutional Neural Networks