Artificial Intelligence
Making machines smarter for improved safety, efficiency and comfort.
Our AI research encompasses advances in computer vision, speech and audio processing, and data analytics. Key research themes include improved perception based on machine learning techniques, learning control policies through model-based reinforcement learning, and cognition and reasoning based on learned semantic representations. We apply our work to a broad range of automotive and robotics applications, as well as building and home systems.
Researchers
Jonathan Le Roux
Toshiaki Koike-Akino
Ye Wang
Gordon Wichern
Anoop Cherian
Tim K. Marks
Chiori Hori
Michael J. Jones
Kieran Parsons
Daniel N. Nikovski
Devesh K. Jha
François Germain
Suhas Lohit
Philip V. Orlik
Matthew Brand
Diego Romeres
Petros T. Boufounos
Pu (Perry) Wang
Moitreya Chatterjee
Siddarth Jain
Sameer Khurana
Hassan Mansour
Kuan-Chuan Peng
William S. Yerazunis
Mouhacine Benosman
Jing Liu
Radu Corcodel
Arvind Raghunathan
Hongbo Sun
Yebin Wang
Jianlin Guo
Chungwei Lin
Yanting Ma
Bingnan Wang
Ryo Aihara
Stefano Di Cairano
Pedro Miraldo
Saviz Mowlavi
James Queeney
Anthony Vetro
Jinyun Zhang
Jose Amaya
Karl Berntorp
Ankush Chakrabarty
Vedang M. Deshpande
Janek Ebbers
Dehong Liu
Alexander Schperberg
Wataru Tsujita
Abraham P. Vinod
Na Li
Awards
AWARD MERL team wins the Listener Acoustic Personalisation (LAP) 2024 Challenge
Date: August 29, 2024
Awarded to: Yoshiki Masuyama, Gordon Wichern, Francois G. Germain, Christopher Ick, and Jonathan Le Roux
MERL Contacts: François Germain; Jonathan Le Roux; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL's Speech & Audio team ranked 1st out of 7 teams in Task 2 of the 1st SONICOM Listener Acoustic Personalisation (LAP) Challenge, which focused on "Spatial upsampling for obtaining a high-spatial-resolution HRTF from a very low number of directions". The team was led by Yoshiki Masuyama, and also included Gordon Wichern, Francois Germain, MERL intern Christopher Ick, and Jonathan Le Roux.
The LAP Challenge workshop and award ceremony were hosted by the 32nd European Signal Processing Conference (EUSIPCO 24) on August 29, 2024 in Lyon, France. Yoshiki Masuyama presented the team's method, "Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization", and received the award from Prof. Michele Geronazzo (University of Padova, IT, and Imperial College London, UK), Chair of the Challenge's Organizing Committee.
The LAP challenge aims to explore challenges in the field of personalized spatial audio, with the first edition focusing on the spatial upsampling and interpolation of head-related transfer functions (HRTFs). HRTFs with dense spatial grids are required for immersive audio experiences, but their recording is time-consuming. Although HRTF spatial upsampling has recently shown remarkable progress with approaches involving neural fields, HRTF estimation accuracy remains limited when upsampling from only a few measured directions, e.g., 3 or 5 measurements. The MERL team tackled this problem by proposing a retrieval-augmented neural field (RANF). RANF retrieves a subject whose HRTFs are close to those of the target subject at the measured directions from a library of subjects. The HRTF of the retrieved subject at the target direction is fed into the neural field in addition to the desired sound source direction. The team also developed a neural network architecture that can handle an arbitrary number of retrieved subjects, inspired by a multi-channel processing technique called transform-average-concatenate.
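To make the retrieval-augmented idea concrete, here is a minimal sketch in Python of the two components described above: retrieving a library subject from a few measured directions, and conditioning a neural field on the retrieved subject's HRTF. All names, shapes, and the mean-squared-error retrieval metric are illustrative assumptions rather than the team's actual implementation, and the transform-average-concatenate mechanism for handling multiple retrieved subjects is omitted.

import torch
import torch.nn as nn

def retrieve_nearest_subject(target_measured, library, measured_idx):
    """Return the library subject whose HRTFs best match the target's few
    measured directions (e.g., 3 or 5 measurements).
    target_measured: (num_measured, num_freq_bins) tensor; each library
    value: (num_all_dirs, num_freq_bins) tensor on a dense grid."""
    best_id, best_err = None, float("inf")
    for subject_id, hrtfs in library.items():
        # Compare only at the directions where the target was measured.
        err = torch.mean((hrtfs[measured_idx] - target_measured) ** 2).item()
        if err < best_err:
            best_id, best_err = subject_id, err
    return best_id

class RetrievalConditionedField(nn.Module):
    """MLP mapping a source direction, plus the retrieved subject's HRTF at
    that direction, to the target subject's HRTF (hypothetical architecture)."""
    def __init__(self, num_freq_bins=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + num_freq_bins, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_freq_bins),
        )

    def forward(self, direction_xyz, retrieved_hrtf):
        # Condition the field on the retrieved HRTF in addition to the
        # desired sound source direction, as described in the brief.
        return self.net(torch.cat([direction_xyz, retrieved_hrtf], dim=-1))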
AWARD Jonathan Le Roux elevated to IEEE Fellow
Date: January 1, 2024
Awarded to: Jonathan Le Roux
MERL Contact: Jonathan Le Roux
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL Distinguished Scientist and Speech & Audio Senior Team Leader Jonathan Le Roux has been elevated to IEEE Fellow, effective January 2024, "for contributions to multi-source speech and audio processing."
Mitsubishi Electric celebrated Dr. Le Roux's elevation and that of another researcher from the company, Dr. Shumpei Kameyama, with a worldwide news release on February 15.
Dr. Jonathan Le Roux has made fundamental contributions to the field of multi-speaker speech processing, especially to the areas of speech separation and multi-speaker end-to-end automatic speech recognition (ASR). His contributions constituted a major advance in realizing a practically usable solution to the cocktail party problem, enabling machines to replicate humans’ ability to concentrate on a specific sound source, such as a certain speaker within a complex acoustic scene—a long-standing challenge in the speech signal processing community. Additionally, he has made key contributions to the measures used for training and evaluating audio source separation methods, developing several new objective functions to improve the training of deep neural networks for speech enhancement, and analyzing the impact of metrics used to evaluate the signal reconstruction quality. Dr. Le Roux’s technical contributions have been crucial in promoting the widespread adoption of multi-speaker separation and end-to-end ASR technologies across various applications, including smart speakers, teleconferencing systems, hearables, and mobile devices.
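One concrete example of the kind of training objective referred to above is the scale-invariant signal-to-distortion ratio (SI-SDR), widely used for training and evaluating source separation and enhancement networks. The sketch below is the textbook formulation, offered for illustration rather than as the exact variant used in any particular MERL system.

import numpy as np

def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB between an estimated and a target signal."""
    # Project the estimate onto the target to find the optimal scaling,
    # making the measure invariant to the estimate's overall gain.
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    scaled_target = alpha * target
    noise = estimate - scaled_target
    return 10 * np.log10(
        (np.dot(scaled_target, scaled_target) + eps) / (np.dot(noise, noise) + eps)
    )

# Example: a lightly corrupted copy of the target scores around 20 dB.
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
print(si_sdr(target + 0.1 * rng.standard_normal(16000), target))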
IEEE Fellow is the highest grade of membership of the IEEE. It honors members with an outstanding record of technical achievements, contributing importantly to the advancement or application of engineering, science and technology, and bringing significant value to society. Each year, following a rigorous evaluation procedure, the IEEE Fellow Committee recommends a select group of recipients for elevation to IEEE Fellow. Less than 0.1% of voting members are selected annually for this member grade elevation.
AWARD Honorable Mention Award at NeurIPS 23 Instruction Workshop
Date: December 15, 2023
Awarded to: Lingfeng Sun, Devesh K. Jha, Chiori Hori, Siddarth Jain, Radu Corcodel, Xinghao Zhu, Masayoshi Tomizuka and Diego Romeres
MERL Contacts: Radu Corcodel; Chiori Hori; Siddarth Jain; Devesh K. Jha; Diego Romeres
Research Areas: Artificial Intelligence, Machine Learning, Robotics
Brief: MERL researchers received an Honorable Mention Award at the Workshop on Instruction Tuning and Instruction Following at the NeurIPS 2023 conference in New Orleans. The workshop focused on instruction tuning and instruction following for large language models (LLMs). MERL researchers presented their work on interactive planning using LLMs for partially observable robotic tasks during the workshop's oral presentation session.
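As a rough illustration of the interactive-planning pattern mentioned in the brief, here is a hypothetical Python sketch in which an LLM proposes one action at a time and the accumulating observation history stands in for the unobserved parts of the state. The prompt format, the query_llm stub, and the toy environment are illustrative assumptions, not the method of the awarded paper.

def query_llm(prompt):
    """Stand-in for a call to a large language model; in practice this
    would be replaced by a real LLM API client."""
    return "inspect(object_1)"

class DummyEnv:
    """Toy environment that succeeds after one action, for illustration."""
    def step(self, action):
        return f"result of {action}", True

def interactive_plan(goal, env, max_steps=10):
    """Alternate between asking the LLM for an action and executing it,
    feeding observations back so the plan adapts under partial observability."""
    history = []
    for _ in range(max_steps):
        prompt = (
            f"Goal: {goal}\n"
            f"Observations and actions so far: {history}\n"
            "Propose the single next action."
        )
        action = query_llm(prompt)
        observation, done = env.step(action)
        history.append((action, observation))
        if done:
            break
    return history

print(interactive_plan("find the red block", DummyEnv()))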
See All Awards for Artificial Intelligence
News & Events
TALK [MERL Seminar Series 2024] Zhaojian Li presents talk titled A Multi-Arm Robotic System for Robotic Apple Harvesting
Date & Time: Wednesday, October 2, 2024; 1:00 PM
Speaker: Zhaojian Li, Michigan State University
MERL Host: Yebin Wang
Research Areas: Artificial Intelligence, Computer Vision, Control, Robotics
Abstract: Harvesting labor is the single largest cost in apple production in the U.S. The surging cost and growing shortage of labor have forced the apple industry to seek automated harvesting solutions. Despite considerable progress in recent years, existing robotic harvesting systems still fall short of performance expectations, lacking robustness and proving inefficient or overly complex for practical commercial deployment. In this talk, I will present the development and evaluation of a new dual-arm robotic apple harvesting system. This work is the result of an ongoing collaboration between Michigan State University and the U.S. Department of Agriculture.
-
TALK [MERL Seminar Series 2024] Tom Griffiths presents talk titled Tools from cognitive science to understand the behavior of large language models
Date & Time: Wednesday, September 18, 2024; 1:00 PM
Speaker: Tom Griffiths, Princeton University
MERL Host: Mouhacine Benosman
Research Areas: Artificial Intelligence, Data Analytics, Machine Learning, Human-Computer Interaction
Abstract: Large language models have been found to have surprising capabilities, even what have been called “sparks of artificial general intelligence.” However, understanding these models involves some significant challenges: their internal structure is extremely complicated, their training data is often opaque, and getting access to the underlying mechanisms is becoming increasingly difficult. As a consequence, researchers often have to resort to studying these systems based on their behavior. This situation is, of course, one that cognitive scientists are very familiar with: human brains are complicated systems trained on opaque data and typically difficult to study mechanistically. In this talk I will summarize some of the tools of cognitive science that are useful for understanding the behavior of large language models. Specifically, I will talk about how thinking about different levels of analysis (and Bayesian inference) can help us understand some behaviors that don’t seem particularly intelligent, how tasks like similarity judgment can be used to probe internal representations, how axiom violations can reveal interesting mechanisms, and how associations can reveal biases in systems that have been trained to be unbiased.
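To illustrate one of the tools mentioned in the abstract, the sketch below probes a model's representations by comparing embedding similarities against human similarity judgments. The embeddings and the "human" ratings here are fabricated purely for illustration; in practice the embeddings would come from the model under study and the ratings from a behavioral experiment.

import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical embeddings for a few words (stand-ins for model hidden states).
rng = np.random.default_rng(0)
embeddings = {w: rng.standard_normal(64) for w in ["cat", "dog", "car", "truck"]}
pairs = [("cat", "dog"), ("car", "truck"), ("cat", "car"), ("dog", "truck")]

model_sims = [cosine(embeddings[a], embeddings[b]) for a, b in pairs]
human_sims = [0.9, 0.85, 0.2, 0.15]  # illustrative human similarity ratings

# Rank agreement between the model's similarity structure and human judgments.
rho, _ = spearmanr(model_sims, human_sims)
print(f"Spearman correlation: {rho:.2f}")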
See All News & Events for Artificial Intelligence
Research Highlights
- PS-NeuS: A Probability-guided Sampler for Neural Implicit Surface Rendering
- TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
- Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-Aware Spatio-Temporal Sampling
- Steered Diffusion
- Robust Machine Learning
- mmWave Beam-SNR Fingerprinting (mmBSF)
- Video Anomaly Detection
- Biosignal Processing for Human-Machine Interaction
Internships
SA0041: Internship - Audio separation, generation, and analysis
We are seeking graduate students interested in helping advance the fields of generative audio, source separation, speech enhancement, spatial audio, and robust ASR in challenging multi-source and far-field scenarios. The interns will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in some of the following: audio signal processing, microphone array processing, spatial audio reproduction, probabilistic modeling, deep generative modeling, and physics-informed machine learning techniques (e.g., neural fields, PINNs, sound field and reverberation modeling). Multiple positions are available with flexible start dates (not just Spring/Summer but throughout 2025) and duration (typically 3-6 months).
SA0040: Internship - Sound event and anomaly detection
We are seeking graduate students interested in helping advance the fields of sound event detection/localization, anomaly detection, and physics-informed deep learning for machine sounds. The interns will collaborate with MERL researchers to derive and implement novel algorithms, record data, conduct experiments, integrate audio signals with other sensors (electrical, vision, vibration, etc.), and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in some of the following: audio signal processing, microphone array processing, physics-informed machine learning, outlier detection, and unsupervised learning. Multiple positions are available with flexible start dates (not just Spring/Summer but throughout 2025) and duration (typically 3-6 months).
SA0045: Internship - Universal Audio Compression and Generation
We are seeking graduate students interested in helping advance the fields of universal audio compression and generation. We aim to build a single generative model that can perform multiple audio generation tasks conditioned on multimodal context. The interns will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are Ph.D. students with experience in some of the following: deep generative modeling, large language models, and neural audio codecs. The internship typically lasts 3-6 months.
See All Internships for Artificial Intelligence
Recent Publications
- "Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection", European Conference on Computer Vision (ECCV), September 2024.BibTeX TR2024-130 PDF Video Presentation
- @inproceedings{Hegde2024sep,
- author = {{Hegde, Deepti and Lohit, Suhas and Peng, Kuan-Chuan and Jones, Michael J. and Patel, Vishal M.}},
- title = {Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection},
- booktitle = {European Conference on Computer Vision (ECCV)},
- year = 2024,
- month = sep,
- url = {https://www.merl.com/publications/TR2024-130}
- }
, - "A Probability-guided Sampler for Neural Implicit Surface Rendering", European Conference on Computer Vision (ECCV), September 2024.BibTeX TR2024-129 PDF
- @inproceedings{Pais2024sep,
- author = {Pais, Goncalo and Piedade, Valter and Chatterjee, Moitreya and Greiff, Marcus and Miraldo, Pedro}},
- title = {A Probability-guided Sampler for Neural Implicit Surface Rendering},
- booktitle = {European Conference on Computer Vision (ECCV)},
- year = 2024,
- month = sep,
- url = {https://www.merl.com/publications/TR2024-129}
- }
, - "TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement", International Workshop on Acoustic Signal Enhancement (IWAENC), September 2024.BibTeX TR2024-126 PDF Software
- @inproceedings{Saijo2024sep2,
- author = {Saijo, Kohei and Wichern, Gordon and Germain, François G and Pan, Zexu and Le Roux, Jonathan}},
- title = {TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement},
- booktitle = {International Workshop on Acoustic Signal Enhancement (IWAENC)},
- year = 2024,
- month = sep,
- url = {https://www.merl.com/publications/TR2024-126}
- }
, - "Few-shot Transparent Instance Segmentation for Bin Picking", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 2024.BibTeX TR2024-127 PDF
- @inproceedings{Cherian2024sep,
- author = {Cherian, Anoop and Jain, Siddarth and Marks, Tim K.}},
- title = {Few-shot Transparent Instance Segmentation for Bin Picking},
- booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
- year = 2024,
- month = sep,
- url = {https://www.merl.com/publications/TR2024-127}
- }
, - "Disentangled Acoustic Fields For Multimodal Physical Scene Understanding", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 2024.BibTeX TR2024-125 PDF
- @inproceedings{Yin2024sep,
- author = {Yin, Jie and Luo, Andrew and Du, Yilun and Cherian, Anoop and Marks, Tim K. and Le Roux, Jonathan and Gan, Chuang}},
- title = {Disentangled Acoustic Fields For Multimodal Physical Scene Understanding},
- booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
- year = 2024,
- month = sep,
- url = {https://www.merl.com/publications/TR2024-125}
- }
, - "Speech Dereverberation Constrained on Room Impulse Response Characteristics", Interspeech, DOI: 10.21437/Interspeech.2024-1173, September 2024, pp. 622-626.BibTeX TR2024-121 PDF
- @inproceedings{Bahrman2024sep,
- author = {Bahrman, Louis and Fontaine, Mathieu and Le Roux, Jonathan and Richard, Gaël}},
- title = {Speech Dereverberation Constrained on Room Impulse Response Characteristics},
- booktitle = {Interspeech},
- year = 2024,
- pages = {622--626},
- month = sep,
- doi = {10.21437/Interspeech.2024-1173},
- issn = {2958-1796},
- url = {https://www.merl.com/publications/TR2024-121}
- }
, - "Sound Event Bounding Boxes", Interspeech, DOI: 10.21437/Interspeech.2024-2075, September 2024, pp. 562-566.BibTeX TR2024-118 PDF Software
- @inproceedings{Ebbers2024sep,
- author = {Ebbers, Janek and Germain, François G and Wichern, Gordon and Le Roux, Jonathan}},
- title = {Sound Event Bounding Boxes},
- booktitle = {Interspeech},
- year = 2024,
- pages = {562--566},
- month = sep,
- doi = {10.21437/Interspeech.2024-2075},
- issn = {2958-1796},
- url = {https://www.merl.com/publications/TR2024-118}
- }
, - "ZeroST: Zero-Shot Speech Translation", Interspeech, DOI: 10.21437/Interspeech.2024-1088, September 2024, pp. 392-396.BibTeX TR2024-122 PDF
- @inproceedings{Khurana2024sep,
- author = {Khurana, Sameer and Hori, Chiori and Laurent, Antoine and Wichern, Gordon and Le Roux, Jonathan}},
- title = {ZeroST: Zero-Shot Speech Translation},
- booktitle = {Interspeech},
- year = 2024,
- pages = {392--396},
- month = sep,
- doi = {10.21437/Interspeech.2024-1088},
- issn = {2958-1796},
- url = {https://www.merl.com/publications/TR2024-122}
- }
Videos
Software & Data Downloads
- DeepBornFNO
- Transformer-based model with LOcal-modeling by COnvolution
- Sound Event Bounding Boxes
- Enhanced Reverberation as Supervision
- Gear Extensions of Neural Radiance Fields
- Long-Tailed Anomaly Detection (LTAD) Dataset
- neural-IIR-field
- Target-Speaker SEParation
- Pixel-Grounded Prototypical Part Networks
- Steered Diffusion
- Hyperbolic Audio Source Separation
- Simple Multimodal Algorithmic Reasoning Task Dataset
- Partial Group Convolutional Neural Networks
- SOurce-free Cross-modal KnowledgE Transfer
- Audio-Visual-Language Embodied Navigation in 3D Environments
- Nonparametric Score Estimators
- 3D MOrphable STyleGAN
- Instance Segmentation GAN
- Audio Visual Scene-Graph Segmentor
- Generalized One-class Discriminative Subspaces
- Goal directed RL with Safety Constraints
- Hierarchical Musical Instrument Separation
- Generating Visual Dynamics from Sound and Context
- Adversarially-Contrastive Optimal Transport
- Online Feature Extractor Network
- MotionNet
- FoldingNet++
- Quasi-Newton Trust Region Policy Optimization
- Landmarks’ Location, Uncertainty, and Visibility Likelihood
- Robust Iterative Data Estimation
- Gradient-based Nikaido-Isoda
- Discriminative Subspace Pooling