Machine Learning
Data-driven approaches to design intelligent algorithms.
MERL has a long history of research activity in machine learning, including developing various boosting algorithms and contributing to the theory and practice of highly scalable collaborative filtering. Our recent work has focused on deep learning and reinforcement learning, with applications in a wide range of areas including automotive, robotics, factory automation, transportation, and building and home systems.
-
Researchers
Toshiaki Koike-Akino
Ye Wang
Jonathan Le Roux
Gordon Wichern
Anoop Cherian
Tim K. Marks
Michael J. Jones
Pu (Perry) Wang
Kieran Parsons
Christopher R. Laughman
Stefano Di Cairano
Philip V. Orlik
Daniel N. Nikovski
Diego Romeres
Chiori Hori
Suhas Anand Lohit
Jing Liu
Bingnan Wang
Yebin Wang
Hassan Mansour
Matthew Brand
Petros T. Boufounos
Kuan-Chuan Peng
Moitreya Chatterjee
Yoshiki Masuyama
Abraham P. Vinod
Arvind Raghunathan
Vedang M. Deshpande
Jianlin Guo
Siddarth Jain
Pedro Miraldo
Hongtao Qiao
Scott A. Bortoff
Saviz Mowlavi
Radu Corcodel
William S. Yerazunis
Chungwei Lin
Dehong Liu
Hongbo Sun
Joshua Rapp
Wael H. Ali
Yanting Ma
Anthony Vetro
Jinyun Zhang
Christoph Benedikt Josef Boeddeker
Purnanand Elango
Abraham Goldsmith
Zhaolin Ren
Alexander Schperberg
Avishai Weiss
Kenji Inomata
Kei Suzuki
-
Awards
-
AWARD MERL team wins the Generative Data Augmentation of Room Acoustics (GenDARA) 2025 Challenge
Date: April 7, 2025
Awarded to: Christopher Ick, Gordon Wichern, Yoshiki Masuyama, François G. Germain, and Jonathan Le Roux
MERL Contacts: Jonathan Le Roux; Yoshiki Masuyama; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
MERL's Speech & Audio team ranked 1st out of 3 teams in the Generative Data Augmentation of Room Acoustics (GenDARA) 2025 Challenge, which focused on “generating room impulse responses (RIRs) to supplement a small set of measured examples and using the augmented data to train speaker distance estimation (SDE) models”. The team was led by MERL intern Christopher Ick and also included Gordon Wichern, Yoshiki Masuyama, François G. Germain, and Jonathan Le Roux.
The GenDARA Challenge was organized as part of the Generative Data Augmentation (GenDA) workshop at the 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025), held on April 7, 2025, in Hyderabad, India. Yoshiki Masuyama presented the team's method, "Data Augmentation Using Neural Acoustic Fields With Retrieval-Augmented Pre-training".
The GenDARA challenge aims to promote the use of generative AI to synthesize RIRs from limited room data, as collecting or simulating RIR datasets at scale remains a significant challenge due to high costs and trade-offs between accuracy and computational efficiency. The challenge asked participants to first develop RIR generation systems capable of expanding a sparse set of labeled room impulse responses by generating RIRs at new source–receiver positions. They were then tasked with using this augmented dataset to train speaker distance estimation systems. Ranking was determined by the overall performance on the downstream SDE task.
MERL’s approach to the GenDARA challenge centered on a geometry-aware neural acoustic field model that was first pre-trained on a large external RIR dataset to learn generalizable mappings from 3D room geometry to room impulse responses. For each challenge room, the model was then adapted or fine-tuned using the small number of provided RIRs, enabling high-fidelity generation of RIRs at unseen source–receiver locations. These augmented RIR sets were subsequently used to train the SDE system, improving speaker distance estimation by providing richer and more diverse acoustic training data.
-
AWARD Mitsubishi Electric Team Wins Awards at GalFer Contest
Date: June 23, 2025
Awarded to: Bingnan Wang, Tatsuya Yamamoto, Yusuke Sakamoto, Siyuan Sun, Toshiaki Koike-Akino, and Ye Wang
MERL Contacts: Toshiaki Koike-Akino; Bingnan Wang; Ye Wang
Research Areas: Machine Learning, Multi-Physical Modeling, Optimization
The MELSUR (Mitsubishi Electric SURrogate) team, consisting of a group of MERL and Mitsubishi Electric researchers, ranked first in two out of three categories in the GalFer Contest.
The GalFer (Galileo Ferraris) contest aims to compare the accuracy and efficiency of data-driven methodologies for the multi-physics simulation of traction electric machines. A total of 26 teams from around the world participated in the contest, which consisted of three categories. The MELSUR team, including MERL staff Bingnan Wang, Toshiaki Koike-Akino, and Ye Wang, MERL intern Siyuan Sun, and Mitsubishi Electric researchers Tatsuya Yamamoto and Yusuke Sakamoto, ranked first in the "Novelty" and "Interpolation" categories. The results were announced during an award ceremony at the COMPUMAG 2025 conference in Naples, Italy.
-
AWARD MERL work receives IEEE Transactions on Automation Science and Engineering Best New Application Paper Award from IEEE Robotics and Automation Society
Date: May 19, 2025
Awarded to: Yehan Ma, Yebin Wang, Stefano Di Cairano, Toshiaki Koike-Akino, Jianlin Guo, Philip Orlik, Xinping Guan and Chenyang Lu
MERL Contacts: Stefano Di Cairano; Jianlin Guo; Toshiaki Koike-Akino; Philip V. Orlik; Yebin Wang
Research Areas: Communications, Control, Machine Learning
The paper “Smart Actuation for End-Edge Industrial Control Systems”, co-authored by MERL intern Yehan Ma, MERL researchers Yebin Wang, Stefano Di Cairano, Toshiaki Koike-Akino, Jianlin Guo, and Philip Orlik, and academic collaborators Xinping Guan and Chenyang Lu, was recognized as the Best New Application Paper of the IEEE Transactions on Automation Science and Engineering (T-ASE), for "a new industrial automation solution that ensures safety operation through coordinated co-design of edge model predictive control and local actuation".
The award recognizes the best application paper published in T-ASE over the previous calendar year, for the significance of new applications, technical merit, originality, potential impact on the field, and clarity of presentation.
See All Awards for Machine Learning
-
News & Events
-
NEWS MERL Researchers at NeurIPS 2025 presented 2 conference papers, 5 workshop papers, and organized a workshop.
Date: December 2, 2025 - December 7, 2025
Where: San Diego
MERL Contacts: Petros T. Boufounos; Anoop Cherian; Radu Corcodel; Stefano Di Cairano; Chiori Hori; Christopher R. Laughman; Suhas Anand Lohit; Pedro Miraldo; Saviz Mowlavi; Kuan-Chuan Peng; Arvind Raghunathan; Diego Romeres; Yuki Shirai; Abraham P. Vinod; Pu (Perry) Wang
Research Areas: Artificial Intelligence, Computational Sensing, Computer Vision, Control, Data Analytics, Dynamical Systems, Machine Learning, Multi-Physical Modeling, Optimization, Robotics, Signal Processing, Speech & Audio
MERL researchers presented 2 main-conference papers and 5 workshop papers and organized a workshop at NeurIPS 2025.
Main Conference Papers:
1) Sorachi Kato, Ryoma Yataka, Pu Wang, Pedro Miraldo, Takuya Fujihashi, and Petros Boufounos, "RAPTR: Radar-based 3D Pose Estimation using Transformer", Code available at: https://github.com/merlresearch/radar-pose-transformer
2) Runyu Zhang, Arvind Raghunathan, Jeff Shamma, and Na Li, "Constrained Optimization From a Control Perspective via Feedback Linearization"
Workshop Papers:
1) Yuyou Zhang, Radu Corcodel, Chiori Hori, Anoop Cherian, and Ding Zhao, "SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs", NeurIPS 2025 Workshop on SPACE in Vision, Language, and Embodied AI (SpaVLE) (Best Paper Runner-up)
2) Xiaoyu Xie, Saviz Mowlavi, and Mouhacine Benosman, "Smooth and Sparse Latent Dynamics in Operator Learning with Jerk Regularization", Workshop on Machine Learning and the Physical Sciences (ML4PS)
3) Spencer Hutchinson, Abraham Vinod, François Germain, Stefano Di Cairano, Christopher Laughman, and Ankush Chakrabarty, "Quantile-SMPC for Grid-Interactive Buildings with Multivariate Temporal Fusion Transformers", Workshop on UrbanAI: Harnessing Artificial Intelligence for Smart Cities (UrbanAI)
4) Yuki Shirai, Kei Ota, Devesh Jha, and Diego Romeres, "Sim-to-Real Contact-Rich Pivoting via Optimization-Guided RL with Vision and Touch", Workshop on Embodied World Models for Decision Making
5) Mark Van der Merwe and Devesh Jha, "In-Context Policy Iteration for Dynamic Manipulation", Workshop on Embodied World Models for Decision Making
Workshop Organized:
MERL members co-organized the Multimodal Algorithmic Reasoning (MAR) Workshop (https://marworkshop.github.io/neurips25/). Organizers: Anoop Cherian (Mitsubishi Electric Research Laboratories), Kuan-Chuan Peng (Mitsubishi Electric Research Laboratories), Suhas Lohit (Mitsubishi Electric Research Laboratories), Honglu Zhou (Salesforce AI Research), Kevin Smith (Massachusetts Institute of Technology), and Joshua B. Tenenbaum (Massachusetts Institute of Technology).
-
EVENT SANE 2025 - Speech and Audio in the Northeast
Date: Friday, November 7, 2025
Location: Google, New York, NY
MERL Contacts: Jonathan Le Roux; Yoshiki Masuyama
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
SANE 2025, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, was held on Friday, November 7, 2025, at Google in New York, NY.
It was the 12th edition of the SANE series of workshops, which started in 2012 and is typically held annually, alternating between Boston and New York. Since the first edition, the event has grown to about 200 participants and 50 posters each year, and SANE has established itself as a vibrant, must-attend event for the speech and audio community across the Northeast and beyond.
SANE 2025 featured invited talks by six leading researchers from the Northeast as well as from the wider community: Dan Ellis (Google DeepMind), Leibny Paola Garcia Perera (Johns Hopkins University), Yuki Mitsufuji (Sony AI), Julia Hirschberg (Columbia University), Yoshiki Masuyama (MERL), and Robin Scheibler (Google DeepMind). It also featured a lively poster session with 50 posters.
MERL Speech and Audio Team's Yoshiki Masuyama presented a well-received overview of the team's recent work on "Neural Fields for Spatial Audio Modeling". His talk highlighted how neural fields are reshaping spatial audio research by enabling flexible, data-driven interpolation of head-related transfer functions and room impulse responses. He also discussed the integration of sound-propagation physics into neural field models through physics-informed neural networks, showcasing MERL’s advances at the intersection of acoustics and deep learning.
SANE 2025 was co-organized by Jonathan Le Roux (MERL), Quan Wang (Google DeepMind), and John R. Hershey (Google DeepMind). SANE remained a free event thanks to generous sponsorship by Google, MERL, Apple, Bose, and Carnegie Mellon University.
Slides and videos of the talks are available from the SANE workshop website and via a YouTube playlist.
See All News & Events for Machine Learning
-
Research Highlights
-
SAC-GNC: SAmple Consensus for adaptive Graduated Non-Convexity
PS-NeuS: A Probability-guided Sampler for Neural Implicit Surface Rendering
Quantum AI Technology
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-Aware Spatio-Temporal Sampling
Private, Secure, and Reliable Artificial Intelligence
Steered Diffusion
Sustainable AI
Edge-Assisted Internet of Vehicles for Smart Mobility
Robust Machine Learning
mmWave Beam-SNR Fingerprinting (mmBSF)
Video Anomaly Detection
Biosignal Processing for Human-Machine Interaction
MERL Shopping Dataset
Task-aware Unified Source Separation - Audio Examples
-
-
Internships
-
ST0247: Internship - Geometry-Aware Surrogate Modeling for Fluid Dynamics
-
SA0191: Internship - Human-Robot Interaction Based on Multimodal Scene Understanding
-
EA0226: Internship - Sample Efficient Safe Reinforcement Learning
See All Internships for Machine Learning
-
Openings
-
CI0177: Postdoctoral Research Fellow - Agentic AI
-
MS0268: Research Scientist - Multiphysical Systems
-
CA0093: Research Scientist - Control for Autonomous Systems
See All Openings at MERL
-
Recent Publications
- Hori, C., Masuyama, Y., Jain, S., Corcodel, R., Jha, D.K., Romeres, D., Le Roux, J., "Robot Confirmation Generation and Action Planning Using Long-context Q-Former Integrated with Multimodal LLM", IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), December 2025 (TR2025-167).
@inproceedings{Hori2025dec,
  author = {Hori, Chiori and Masuyama, Yoshiki and Jain, Siddarth and Corcodel, Radu and Jha, Devesh K. and Romeres, Diego and {Le Roux}, Jonathan},
  title = {{Robot Confirmation Generation and Action Planning Using Long-context Q-Former Integrated with Multimodal LLM}},
  booktitle = {IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)},
  year = 2025,
  month = dec,
  url = {https://www.merl.com/publications/TR2025-167}
}
- Xie, X., Mowlavi, S., Benosman, M., "Smooth and Sparse Latent Dynamics in Operator Learning with Jerk Regularization", Advances in Neural Information Processing Systems (NeurIPS) workshop on Machine Learning and the Physical Sciences (ML4PS), December 2025 (TR2025-166).
@inproceedings{Xie2025dec,
  author = {Xie, Xiaoyu and Mowlavi, Saviz and Benosman, Mouhacine},
  title = {{Smooth and Sparse Latent Dynamics in Operator Learning with Jerk Regularization}},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS) workshop on Machine Learning and the Physical Sciences (ML4PS)},
  year = 2025,
  month = dec,
  url = {https://www.merl.com/publications/TR2025-166}
}
- Xiang, X., Peng, K.-C., Lohit, S., Jones, M.J., Zhang, J., "Towards Open-Vocabulary Multimodal 3D Object Detection with Attributes", British Machine Vision Conference (BMVC), November 2025 (TR2025-162).
@inproceedings{Xiang2025nov,
  author = {Xiang, Xinhao and Peng, Kuan-Chuan and Lohit, Suhas and Jones, Michael J. and Zhang, Jiawei},
  title = {{Towards Open-Vocabulary Multimodal 3D Object Detection with Attributes}},
  booktitle = {British Machine Vision Conference (BMVC)},
  year = 2025,
  month = nov,
  url = {https://www.merl.com/publications/TR2025-162}
}
- Masuyama, Y., "Neural Fields for Spatial Audio Modeling", Tech. Rep. TR2025-171, Speech and Audio in the Northeast (SANE), November 2025.
@techreport{Masuyama2025nov,
  author = {Masuyama, Yoshiki},
  title = {{Neural Fields for Spatial Audio Modeling}},
  institution = {Speech and Audio in the Northeast (SANE)},
  year = 2025,
  month = nov,
  url = {https://www.merl.com/publications/TR2025-171}
}
- Wilkinghoff, K., Fujimura, T., Imoto, K., Le Roux, J., Tan, Z.-H., Toda, T., "Handling Domain Shifts for Anomalous Sound Detection: A Review of DCASE-Related Work", Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), DOI: 10.5281/zenodo.17251589, October 2025, pp. 20-24 (TR2025-157).
@inproceedings{Wilkinghoff2025oct,
  author = {Wilkinghoff, Kevin and Fujimura, Takuya and Imoto, Keisuke and {Le Roux}, Jonathan and Tan, Zheng-Hua and Toda, Tomoki},
  title = {{Handling Domain Shifts for Anomalous Sound Detection: A Review of DCASE-Related Work}},
  booktitle = {Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE)},
  year = 2025,
  pages = {20--24},
  month = oct,
  doi = {10.5281/zenodo.17251589},
  isbn = {978-84-09-77652-8},
  url = {https://www.merl.com/publications/TR2025-157}
}
- Chakrabarty, A., Wichern, G., Deshpande, V.M., Vinod, A.P., Berntorp, K., Laughman, C.R., "Meta-Learning for Physically-Constrained Neural System Identification", Neurocomputing, DOI: 10.1016/j.neucom.2025.130945, Vol. 651, pp. 130945, October 2025 (TR2025-159).
@article{Chakrabarty2025nov,
  author = {Chakrabarty, Ankush and Wichern, Gordon and Deshpande, Vedang M. and Vinod, Abraham P. and Berntorp, Karl and Laughman, Christopher R.},
  title = {{Meta-Learning for Physically-Constrained Neural System Identification}},
  journal = {Neurocomputing},
  year = 2025,
  volume = 651,
  pages = 130945,
  month = nov,
  doi = {10.1016/j.neucom.2025.130945},
  issn = {0925-2312},
  url = {https://www.merl.com/publications/TR2025-159}
}
- Sun, H., Otake, Y., Matsuyama, K., Raghunathan, A., "Switchgear Partial Discharge Diagnosis Using Scarce Fault Records", IEEE PES Innovative Smart Grid Technologies Conference - Europe (ISGT Europe), October 2025 (TR2025-155).
@inproceedings{Sun2025oct,
  author = {Sun, Hongbo and Otake, Yasutomo and Matsuyama, Kotaro and Raghunathan, Arvind},
  title = {{Switchgear Partial Discharge Diagnosis Using Scarce Fault Records}},
  booktitle = {IEEE PES Innovative Smart Grid Technologies Conference - Europe (ISGT Europe)},
  year = 2025,
  month = oct,
  url = {https://www.merl.com/publications/TR2025-155}
}
- Yataka, R., Wang, P., Boufounos, P.T., Takahashi, R., "Radar-Conditioned 3D Bounding Box Diffusion for Indoor Human Perception", IEEE International Conference on Computer Vision (ICCV) Workshop, October 2025 (TR2025-154).
@inproceedings{Yataka2025oct,
  author = {Yataka, Ryoma and Wang, Pu and Boufounos, Petros T. and Takahashi, Ryuhei},
  title = {{Radar-Conditioned 3D Bounding Box Diffusion for Indoor Human Perception}},
  booktitle = {IEEE International Conference on Computer Vision (ICCV) Workshop},
  year = 2025,
  month = oct,
  url = {https://www.merl.com/publications/TR2025-154}
}
-
Videos
-
Software & Data Downloads
-
MEL-PETs Defense for LLM Privacy Challenge
MMHOI Dataset: Modeling Complex 3D Multi-Human Multi-Object Interactions
Generalization in Deep RL with a Robust Adaptation Module
MEL-PETs Joint-Context Attack for LLM Privacy Challenge
Subject- and Dataset-Aware Neural Field for HRTF Modeling
Radar-based 3D Pose Estimation using Transformer
Learned Born Operator for Reflection Tomographic Imaging
Open Vocabulary Attribute Detection Dataset
Long-Tailed Online Anomaly Detection dataset
Group Representation Networks
Stabilizing Subject Transfer in EEG Classification with Divergence Estimation
Task-Aware Unified Source Separation
Local Density-Based Anomaly Score Normalization for Domain Generalization
Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization
ComplexVAD Dataset
Self-Monitored Inference-Time INtervention for Generative Music Transformers
Radar dEtection TRansformer
Millimeter-wave Multi-View Radar Dataset
Gear Extensions of Neural Radiance Fields
Long-Tailed Anomaly Detection Dataset
Target-Speaker SEParation
Pixel-Grounded Prototypical Part Networks
Steered Diffusion
BAyesian Network for adaptive SAmple Consensus
Meta-Learning State Space Models
Explainable Video Anomaly Localization
Simple Multimodal Algorithmic Reasoning Task Dataset
Partial Group Convolutional Neural Networks
SOurce-free Cross-modal KnowledgE Transfer
Audio-Visual-Language Embodied Navigation in 3D Environments
Nonparametric Score Estimators
3D MOrphable STyleGAN
Instance Segmentation GAN
Audio Visual Scene-Graph Segmentor
Generalized One-class Discriminative Subspaces
Hierarchical Musical Instrument Separation
Generating Visual Dynamics from Sound and Context
Adversarially-Contrastive Optimal Transport
Online Feature Extractor Network
MotionNet
FoldingNet++
Quasi-Newton Trust Region Policy Optimization
Landmarks’ Location, Uncertainty, and Visibility Likelihood
Robust Iterative Data Estimation
Gradient-based Nikaido-Isoda
Circular Maze Environment
Discriminative Subspace Pooling
Kernel Correlation Network
Fast Resampling on Point Clouds via Graphs
FoldingNet
Deep Category-Aware Semantic Edge Detection
MERL Shopping Dataset
-