Artificial Intelligence
Making machines smarter for improved safety, efficiency and comfort.
Our AI research spans computer vision, speech and audio processing, and data analytics. Key research themes include improved perception based on machine learning techniques, learning control policies through model-based reinforcement learning, and cognition and reasoning based on learned semantic representations. We apply our work to a broad range of automotive and robotics applications, as well as building and home systems.
-
Researchers

Jonathan Le Roux
Toshiaki Koike-Akino
Ye Wang
Gordon Wichern
Anoop Cherian
Tim K. Marks
Chiori Hori
Michael J. Jones
Jing Liu
Kieran Parsons
Suhas Lohit
Daniel N. Nikovski
Yoshiki Masuyama
Kuan-Chuan Peng
Matthew Brand
Pu (Perry) Wang
Moitreya Chatterjee
Philip V. Orlik
Siddarth Jain
Hassan Mansour
Petros T. Boufounos
Radu Corcodel
Pedro Miraldo
William S. Yerazunis
Christoph Boeddeker
Yebin Wang
Jianlin Guo
Arvind Raghunathan
Hongbo Sun
Stefano Di Cairano
Chungwei Lin
Yanting Ma
Saviz Mowlavi
Bingnan Wang
Takahiro Edo
Christopher R. Laughman
Lalit Manam
Julius Richter
Alexander Schperberg
Anthony Vetro
Jinyun Zhang
Vedang M. Deshpande
Kaen Kogashi
Dehong Liu
Kei Suzuki
Abraham P. Vinod
Kenji Inomata
-
Awards
-
AWARD MERL team wins the Generative Data Augmentation of Room Acoustics (GenDARA) 2025 Challenge
Date: April 7, 2025
Awarded to: Christopher Ick, Gordon Wichern, Yoshiki Masuyama, François G. Germain, and Jonathan Le Roux
MERL Contacts: Jonathan Le Roux; Yoshiki Masuyama; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL's Speech & Audio team ranked 1st out of 3 teams in the Generative Data Augmentation of Room Acoustics (GenDARA) 2025 Challenge, which focused on "generating room impulse responses (RIRs) to supplement a small set of measured examples and using the augmented data to train speaker distance estimation (SDE) models". The team was led by MERL intern Christopher Ick, and also included Gordon Wichern, Yoshiki Masuyama, François G. Germain, and Jonathan Le Roux.
The GenDARA Challenge was organized as part of the Generative Data Augmentation (GenDA) workshop at the 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025), and held on April 7, 2025 in Hyderabad, India. Yoshiki Masuyama presented the team's method, "Data Augmentation Using Neural Acoustic Fields With Retrieval-Augmented Pre-training".
The GenDARA challenge aims to promote the use of generative AI to synthesize RIRs from limited room data, as collecting or simulating RIR datasets at scale remains a significant challenge due to high costs and trade-offs between accuracy and computational efficiency. The challenge asked participants to first develop RIR generation systems capable of expanding a sparse set of labeled room impulse responses by generating RIRs at new source–receiver positions. They were then tasked with using this augmented dataset to train speaker distance estimation systems. Ranking was determined by the overall performance on the downstream SDE task.
MERL’s approach to the GenDARA challenge centered on a geometry-aware neural acoustic field model that was first pre-trained on a large external RIR dataset to learn generalizable mappings from 3D room geometry to room impulse responses. For each challenge room, the model was then adapted or fine-tuned using the small number of provided RIRs, enabling high-fidelity generation of RIRs at unseen source–receiver locations. These augmented RIR sets were subsequently used to train the SDE system, improving speaker distance estimation by providing richer and more diverse acoustic training data.
-
AWARD MERL Wins Awards at NeurIPS LLM Privacy Challenge
Date: December 15, 2024
Awarded to: Jing Liu, Ye Wang, Toshiaki Koike-Akino, Tsunato Nakai, Kento Oonishi, Takuya Higashi
MERL Contacts: Toshiaki Koike-Akino; Jing Liu; Ye Wang
Research Areas: Artificial Intelligence, Machine Learning, Information Security
Brief: The Mitsubishi Electric Privacy Enhancing Technologies (MEL-PETs) team, a collaboration between MERL and Mitsubishi Electric researchers, won awards at the NeurIPS 2024 Large Language Model (LLM) Privacy Challenge. In the Blue Team track of the challenge, we won the 3rd Place Award, and in the Red Team track, we won the Special Award for Practical Attack.
-
AWARD MERL team wins the Listener Acoustic Personalisation (LAP) 2024 Challenge
Date: August 29, 2024
Awarded to: Yoshiki Masuyama, Gordon Wichern, François G. Germain, Christopher Ick, and Jonathan Le Roux
MERL Contacts: Jonathan Le Roux; Gordon Wichern; Yoshiki Masuyama
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL's Speech & Audio team ranked 1st out of 7 teams in Task 2 of the 1st SONICOM Listener Acoustic Personalisation (LAP) Challenge, which focused on "Spatial upsampling for obtaining a high-spatial-resolution HRTF from a very low number of directions". The team was led by Yoshiki Masuyama, and also included Gordon Wichern, François G. Germain, MERL intern Christopher Ick, and Jonathan Le Roux.
The LAP Challenge workshop and award ceremony were hosted by the 32nd European Signal Processing Conference (EUSIPCO 24) on August 29, 2024 in Lyon, France. Yoshiki Masuyama presented the team's method, "Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization", and received the award from Prof. Michele Geronazzo (University of Padova, IT, and Imperial College London, UK), Chair of the Challenge's Organizing Committee.
The LAP challenge aims to explore challenges in the field of personalized spatial audio, with the first edition focusing on the spatial upsampling and interpolation of head-related transfer functions (HRTFs). HRTFs with dense spatial grids are required for immersive audio experiences, but their recording is time-consuming. Although HRTF spatial upsampling has recently shown remarkable progress with approaches involving neural fields, HRTF estimation accuracy remains limited when upsampling from only a few measured directions, e.g., 3 or 5 measurements. The MERL team tackled this problem by proposing a retrieval-augmented neural field (RANF). RANF retrieves a subject whose HRTFs are close to those of the target subject at the measured directions from a library of subjects. The HRTF of the retrieved subject at the target direction is fed into the neural field in addition to the desired sound source direction. The team also developed a neural network architecture that can handle an arbitrary number of retrieved subjects, inspired by a multi-channel processing technique called transform-average-concatenate.
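The retrieval step and the pooling over an arbitrary number of retrieved subjects described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the team's implementation: the function names (`retrieve_subjects`, `tac_pool`), the mean-squared-error distance, the array shapes, and the random untrained weights are all hypothetical, and the actual RANF model is a trained neural field.

```python
import numpy as np


def retrieve_subjects(target_meas, library_meas, k=1):
    """Return indices of the k library subjects closest to the target.

    target_meas:  (n_dirs, n_feat) HRTF features of the target subject
                  at the few measured directions.
    library_meas: (n_subjects, n_dirs, n_feat) the same features for
                  every subject in the library.
    The distance is a mean squared error (an assumption of this sketch).
    """
    diff = library_meas - target_meas[None]
    dist = np.mean(diff ** 2, axis=(1, 2))
    return np.argsort(dist)[:k]


def tac_pool(feats, w_transform, w_average, w_concat):
    """Transform-average-concatenate over retrieved subjects.

    feats: (k, d_in), one feature vector per retrieved subject.
    The same weights are shared by all subjects, so k can be any
    positive integer and the output has shape (k, d_out).
    """
    def relu(x):
        return np.maximum(x, 0.0)

    h = relu(feats @ w_transform)                           # transform: (k, d_hid)
    avg = relu(h.mean(axis=0, keepdims=True) @ w_average)   # average:   (1, d_hid)
    avg = np.repeat(avg, h.shape[0], axis=0)                # broadcast: (k, d_hid)
    cat = np.concatenate([h, avg], axis=1)                  # concat:    (k, 2 * d_hid)
    return relu(cat @ w_concat)                             # (k, d_out)


# Toy usage with random data standing in for HRTF features.
rng = np.random.default_rng(0)
library = rng.normal(size=(10, 3, 8))        # 10 subjects, 3 directions, 8 features
target = library[4] + 0.01 * rng.normal(size=(3, 8))
idx = retrieve_subjects(target, library, k=2)

feats = library[idx].reshape(len(idx), -1)   # (2, 24)
w_t = rng.normal(size=(24, 16))
w_a = rng.normal(size=(16, 16))
w_c = rng.normal(size=(32, 16))
pooled = tac_pool(feats, w_t, w_a, w_c)      # (2, 16)
```

Because the averaging step is the only place where subjects interact, the same weights apply whether one or many subjects are retrieved, and reordering the retrieved subjects only reorders the output rows.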
-
News & Events
-
NEWS MERL Presents 4 Main Conference Papers and 6 Workshop Papers at ICML 2026
Date: July 6, 2026 - July 11, 2026
Where: COEX, Seoul, South Korea
MERL Contacts: Moitreya Chatterjee; Anoop Cherian; Stefano Di Cairano; Toshiaki Koike-Akino; Christopher R. Laughman; Jing Liu; Suhas Lohit; Kuan-Chuan Peng; Alexander Schperberg; Ye Wang; Gordon Wichern
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Signal Processing
Brief: MERL researchers are proud to present 4 main conference papers and 6 workshop papers at ICML 2026. ICML, taking place from July 6-11 in Seoul, South Korea, is a premier international conference in machine learning.
Main Conference Papers with MERL Authors:
1. Understanding Dynamic Compute Allocation in Recurrent Transformers by Ibraheem Muhammad Moosa, Suhas Lohit, Ye Wang, Moitreya Chatterjee, and Wenpeng Yin.
2. LLawCo: Learning Laws of Cooperation for Modeling Embodied Multi-Agent Behavior by Qinhong Zhou, Chuang Gan, and Anoop Cherian.
3. Memory-Distilled Selection for Noise-Robust Anomaly Detection by Sirojbek Safarov, Jaewoo Park, Yoon G. Jung, Kuan-Chuan Peng, Wonchul Kim, Seongdeok Bang, and Octavia Camps.
4. Partial Ring Scan: Revisiting Scan Order in Vision State Space Models by Yi-Kuan Hsieh, Kuan-Chuan Peng, Xin Li, Ming-Ching Chang, Yu-Chee Tseng, and Jun-Wei Hsieh.
Workshop Papers with MERL Authors:
1. WISE: Weighted Iterative Society-of-Experts for Multimodal Multi-Agent Debate with Probabilistic Consensus by Anoop Cherian, Suhas Lohit, and Kuan-Chuan Peng. (Workshop on Scalable Learning and Optimization for Efficient Multimodal AI Agents (SCALE))
2. MIRROR: Multisensory Implicit Rejection-sampled RObotic policy by Amisha Bhaskar, Pratap Tokekar, Stefano Di Cairano, and Alexander Schperberg. (Workshop on Structured Probabilistic Inference & Generative Modeling)
3. Reinforced Neural Processes: Memory-Efficient Time-Series Forecasting with a World-Feedback-Trained Memory Policy by Nibraas Khan, Gordon Wichern, and Christopher R. Laughman. (Workshop on Reinforcement Learning from World Feedback (RLxF))
4. Connecting Low-Rank Adapters and Policy Stability in GRPO Fine-Tuning by Antonin Rottman, Francesco Tonin, Yongtao Wu, Toshiaki Koike-Akino, and Volkan Cevher. (Workshop on Connecting Low-rank Representations in AI (CoLorAI))
5. EinSort: Sorting is All We Need for Tensorizing LLM by Toshiaki Koike-Akino, Jing Liu, and Ye Wang. (Workshop on Connecting Low-rank Representations in AI (CoLorAI))
6. Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment by Ye Wang, Jing Liu, and Toshiaki Koike-Akino. (Workshop on Agents in the Wild: Safety, Security, and Beyond)
-
NEWS MERL researchers present 9 papers at IEEE ICRA 2026
Date: June 1, 2026 - June 5, 2026
Where: Vienna, Austria
MERL Contacts: Radu Corcodel; Stefano Di Cairano; Purnanand Elango; Siddarth Jain; Alexander Schperberg; Kento Tomita
Research Areas: Artificial Intelligence, Computer Vision, Control, Dynamical Systems, Machine Learning, Optimization, Robotics
Brief: MERL researchers presented nine papers at the recently concluded IEEE International Conference on Robotics and Automation (ICRA) 2026 in Vienna, Austria. The papers covered a broad set of topics in robotics, including robot perception, visuo-tactile sensing, contact and pose estimation, manipulation, reinforcement learning, diffusion policies, loco-manipulation, contact-implicit trajectory optimization, legged locomotion, localization, and perception-aware planning.
IEEE ICRA is the flagship conference of the IEEE Robotics and Automation Society and the world’s largest and most comprehensive technical conference focused on research advances and the latest technological developments in robotics. The event attracts nearly 8,000 participants and receives more than 5,000 paper submissions.
-
Research Highlights
-
LLawCo: Learning Laws of Cooperation for Modeling Embodied Multi-Agent Behavior
Point4Cast: Streaming Dynamic Scene Reconstruction and Forecasting
AssemblyBench: Physics-Aware Assembly of Complex Industrial Objects
SLAM-MER: Revisiting Monocular SLAM with Spatio-Temporal Scene Modeling
Parallel Rigidity Matters for Bundle Adjustment
LLMPhy: Parameter-Identifiable Physical Reasoning Combining Large Language Models and Physics Engines
PS-NeuS: A Probability-guided Sampler for Neural Implicit Surface Rendering
Quantum AI Technology
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-Aware Spatio-Temporal Sampling
Private, Secure, and Reliable Artificial Intelligence
Steered Diffusion
Sustainable AI
Robust Machine Learning
mmWave Beam-SNR Fingerprinting (mmBSF)
Video Anomaly Detection
Biosignal Processing for Human-Machine Interaction
Task-aware Unified Source Separation - Audio Examples
-
-
Internships
-
EA0234: Internship - Multi-modal sensor fusion for predictive maintenance
-
SA0191: Internship - Human-Robot Interaction Based on Multimodal Scene Understanding
-
CV0101: Internship - Multimodal Algorithmic Reasoning
-
Openings
-
CI0177: Postdoctoral Research Fellow - Agentic AI
-
SA0297: Postdoctoral Research Fellow - AI for Science
-
Recent Publications
- Ye Wang, Jing Liu, Toshiaki Koike-Akino, "Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment", International Conference on Machine Learning (ICML) Workshop on Agents in the Wild: Safety, Security, and Beyond, July 2026. TR2026-094
  @inproceedings{Wang2026jul,
    author = {Wang, Ye and Liu, Jing and Koike-Akino, Toshiaki},
    title = {{Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment}},
    booktitle = {International Conference on Machine Learning (ICML) Workshop on Agents in the Wild: Safety, Security, and Beyond},
    year = 2026,
    month = jul,
    url = {https://www.merl.com/publications/TR2026-094}
  }
- Toshiaki Koike-Akino, Jing Liu, Ye Wang, "EinSort: Sorting is All We Need for Tensorizing LLM", International Conference on Machine Learning (ICML) Workshop, July 2026. TR2026-093
  @inproceedings{Koike-Akino2026jul,
    author = {Koike-Akino, Toshiaki and Liu, Jing and Wang, Ye},
    title = {{EinSort: Sorting is All We Need for Tensorizing LLM}},
    booktitle = {International Conference on Machine Learning (ICML) Workshop},
    year = 2026,
    month = jul,
    url = {https://www.merl.com/publications/TR2026-093}
  }
- Antonin Rottman, Francesco Tonin, Yongtao Wu, Toshiaki Koike-Akino, Volkan Cevher, "Connecting Low-Rank Adapters and Policy Stability in GRPO Fine-Tuning", International Conference on Machine Learning (ICML) Workshop, July 2026. TR2026-092
  @inproceedings{Rottman2026jul,
    author = {Rottman, Antonin and Tonin, Francesco and Wu, Yongtao and Koike-Akino, Toshiaki and Cevher, Volkan},
    title = {{Connecting Low-Rank Adapters and Policy Stability in GRPO Fine-Tuning}},
    booktitle = {International Conference on Machine Learning (ICML) Workshop},
    year = 2026,
    month = jul,
    url = {https://www.merl.com/publications/TR2026-092}
  }
- Amisha Bhaskar, Pratap Tokekar, Stefano Di Cairano, Alexander Schperberg, "MIRROR: Multisensory Implicit Rejection-sampled RObotic policy", ICML 2026 Workshop on Structured Probabilistic Inference & Generative Modeling, July 2026. TR2026-096
  @inproceedings{Bhaskar2026jul,
    author = {Bhaskar, Amisha and Tokekar, Pratap and {Di Cairano}, Stefano and Schperberg, Alexander},
    title = {{MIRROR: Multisensory Implicit Rejection-sampled RObotic policy}},
    booktitle = {ICML 2026 Workshop on Structured Probabilistic Inference \& Generative Modeling},
    year = 2026,
    month = jul,
    url = {https://www.merl.com/publications/TR2026-096}
  }
- Yi-Kuan Hsieh, Kuan-Chuan Peng, Xin Li, Ming-Ching Chang, Yu-Chee Tseng, Jun-Wei Hsieh, "Partial Ring Scan: Revisiting Scan Order in Vision State Space Models", International Conference on Machine Learning (ICML), July 2026. TR2026-091
  @inproceedings{Hsieh2026jul,
    author = {Hsieh, Yi-Kuan and Peng, Kuan-Chuan and Li, Xin and Chang, Ming-Ching and Tseng, Yu-Chee and Hsieh, Jun-Wei},
    title = {{Partial Ring Scan: Revisiting Scan Order in Vision State Space Models}},
    booktitle = {International Conference on Machine Learning (ICML)},
    year = 2026,
    month = jul,
    url = {https://www.merl.com/publications/TR2026-091}
  }
- Nibraas Khan, Gordon Wichern, Christopher R. Laughman, "Reinforced Neural Processes: Memory-Efficient Time-Series Forecasting with a World-Feedback-Trained Memory Policy", ICML Workshop on Reinforcement Learning from World Feedback (RLxF), July 2026. TR2026-095
  @inproceedings{Khan2026jul,
    author = {Khan, Nibraas and Wichern, Gordon and Laughman, Christopher R.},
    title = {{Reinforced Neural Processes: Memory-Efficient Time-Series Forecasting with a World-Feedback-Trained Memory Policy}},
    booktitle = {ICML Workshop on Reinforcement Learning from World Feedback (RLxF)},
    year = 2026,
    month = jul,
    url = {https://www.merl.com/publications/TR2026-095}
  }
- Ibraheem Muhammad Moosa, Suhas Lohit, Ye Wang, Moitreya Chatterjee, Wenpeng Yin, "Understanding Dynamic Compute Allocation in Recurrent Transformers", International Conference on Machine Learning (ICML), July 2026. TR2026-090
  @inproceedings{Moosa2026jul,
    author = {Moosa, Ibraheem Muhammad and Lohit, Suhas and Wang, Ye and Chatterjee, Moitreya and Yin, Wenpeng},
    title = {{Understanding Dynamic Compute Allocation in Recurrent Transformers}},
    booktitle = {International Conference on Machine Learning (ICML)},
    year = 2026,
    month = jul,
    url = {https://www.merl.com/publications/TR2026-090}
  }
- Sirojbek Safarov, Jaewoo Park, Yoon G. Jung, Kuan-Chuan Peng, Wonchul Kim, Seongdeok Bang, Octavia Camps, "Memory-Distilled Selection for Noise-Robust Anomaly Detection", International Conference on Machine Learning (ICML), July 2026. TR2026-089
  @inproceedings{Safarov2026jul,
    author = {Safarov, Sirojbek and Park, Jaewoo and Jung, Yoon G. and Peng, Kuan-Chuan and Kim, Wonchul and Bang, Seongdeok and Camps, Octavia},
    title = {{Memory-Distilled Selection for Noise-Robust Anomaly Detection}},
    booktitle = {International Conference on Machine Learning (ICML)},
    year = 2026,
    month = jul,
    url = {https://www.merl.com/publications/TR2026-089}
  }
-
Software & Data Downloads
-
Understanding Dynamic Compute Allocation in Recurrent Transformers
Physics-Aware Assembly of Complex Industrial Objects
Mitsubishi Electric Research framework for visual SLAM
Parameter-Identifiable Physical Reasoning Combining Large Language Models and Physics Engines
MMHOI Dataset: Modeling Complex 3D Multi-Human Multi-Object Interactions
Embracing Cacophony
Subject- and Dataset-Aware Neural Field for HRTF Modeling
Open Vocabulary Attribute Detection Dataset
Long-Tailed Online Anomaly Detection dataset
Group Representation Networks
Task-Aware Unified Source Separation
Local Density-Based Anomaly Score Normalization for Domain Generalization
Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization
Self-Monitored Inference-Time INtervention for Generative Music Transformers
MEL-PETs Defense for LLM Privacy Challenge
MEL-PETs Joint-Context Attack for LLM Privacy Challenge
Transformer-based model with LOcal-modeling by COnvolution
Sound Event Bounding Boxes
Enhanced Reverberation as Supervision
Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Gear Extensions of Neural Radiance Fields
Long-Tailed Anomaly Detection Dataset
Neural IIR Filter Field for HRTF Upsampling and Personalization
Target-Speaker SEParation
Pixel-Grounded Prototypical Part Networks
Steered Diffusion
Learned Born Operator for Reflection Tomographic Imaging
Hyperbolic Audio Source Separation
Simple Multimodal Algorithmic Reasoning Task Dataset
Partial Group Convolutional Neural Networks
SOurce-free Cross-modal KnowledgE Transfer
Audio-Visual-Language Embodied Navigation in 3D Environments
Nonparametric Score Estimators
3D MOrphable STyleGAN
Instance Segmentation GAN
Audio Visual Scene-Graph Segmentor
Generalized One-class Discriminative Subspaces
Goal directed RL with Safety Constraints
Hierarchical Musical Instrument Separation
Generating Visual Dynamics from Sound and Context
Adversarially-Contrastive Optimal Transport
Online Feature Extractor Network
MotionNet
FoldingNet++
Quasi-Newton Trust Region Policy Optimization
Landmarks’ Location, Uncertainty, and Visibility Likelihood
Robust Iterative Data Estimation
Gradient-based Nikaido-Isoda
Discriminative Subspace Pooling
-