Artificial Intelligence
Making machines smarter for improved safety, efficiency and comfort.
Our AI research encompasses advances in computer vision, speech and audio processing, and data analytics. Key research themes include improved perception based on machine learning techniques, learning control policies through model-based reinforcement learning, and cognition and reasoning based on learned semantic representations. We apply our work to a broad range of automotive and robotics applications, as well as building and home systems.
Researchers
Jonathan Le Roux
Toshiaki Koike-Akino
Ye Wang
Gordon Wichern
Anoop Cherian
Tim K. Marks
Chiori Hori
Michael J. Jones
Kieran Parsons
François Germain
Daniel N. Nikovski
Devesh K. Jha
Jing Liu
Suhas Lohit
Matthew Brand
Philip V. Orlik
Diego Romeres
Pu (Perry) Wang
Petros T. Boufounos
Moitreya Chatterjee
Siddarth Jain
Hassan Mansour
Kuan-Chuan Peng
William S. Yerazunis
Radu Corcodel
Yoshiki Masuyama
Arvind Raghunathan
Pedro Miraldo
Hongbo Sun
Yebin Wang
Ankush Chakrabarty
Jianlin Guo
Chungwei Lin
Yanting Ma
Bingnan Wang
Ryo Aihara
Stefano Di Cairano
Saviz Mowlavi
Anthony Vetro
Jinyun Zhang
Vedang M. Deshpande
Christopher R. Laughman
Dehong Liu
Alexander Schperberg
Wataru Tsujita
Abraham P. Vinod
Kenji Inomata
Na Li
Awards
AWARD: MERL Wins Awards at NeurIPS LLM Privacy Challenge
Date: December 15, 2024
Awarded to: Jing Liu, Ye Wang, Toshiaki Koike-Akino, Tsunato Nakai, Kento Oonishi, Takuya Higashi
MERL Contacts: Toshiaki Koike-Akino; Jing Liu; Ye Wang
Research Areas: Artificial Intelligence, Machine Learning, Information Security
Brief: The Mitsubishi Electric Privacy Enhancing Technologies (MEL-PETs) team, a collaboration of MERL and Mitsubishi Electric researchers, won awards at the NeurIPS 2024 Large Language Model (LLM) Privacy Challenge: the 3rd Place Award in the Blue Team track, and the Special Award for Practical Attack in the Red Team track.
AWARD: University of Padua and MERL team wins the AI Olympics with RealAIGym competition at IROS 2024
Date: October 17, 2024
Awarded to: Niccolò Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli, Diego Romeres
MERL Contact: Diego Romeres
Research Areas: Artificial Intelligence, Dynamical Systems, Machine Learning, Robotics
Brief: A team composed of the control group at the University of Padua and MERL's Optimization and Robotics team ranked 1st among the 4 finalist teams in the 2nd AI Olympics with RealAIGym competition at IROS 2024, which focused on control of under-actuated robots. The team consisted of Niccolò Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli, and Diego Romeres. The competition was organized by the German Research Center for Artificial Intelligence (DFKI), Technical University of Darmstadt, and Chalmers University of Technology.
The competition and award ceremony were hosted by the IEEE International Conference on Intelligent Robots and Systems (IROS) on October 17, 2024 in Abu Dhabi, UAE. Diego Romeres presented the team's method, based on a model-based reinforcement learning algorithm called MC-PILCO; a sketch of the general idea follows below.
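MC-PILCO learns a probabilistic dynamics model from real transitions and improves the control policy by differentiating through Monte Carlo rollouts of that learned model. Below is a minimal, hypothetical PyTorch sketch of that style of policy update; all names (PolicyNet, rollout_cost, the dynamics interface) are illustrative assumptions, not the team's actual implementation.

```python
# Hypothetical sketch of a Monte Carlo model-based policy update in the
# spirit of MC-PILCO. Not MERL's implementation.
import torch

class PolicyNet(torch.nn.Module):
    """Small deterministic policy u = pi(x)."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(state_dim, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, action_dim), torch.nn.Tanh())

    def forward(self, x):
        return self.net(x)

def rollout_cost(policy, dynamics, x0, horizon, cost_fn):
    """Average cost over Monte Carlo rollouts of the *learned* model.

    `dynamics(x, u)` is assumed to return (mean, std) of the predicted next
    state, e.g. from a Gaussian-process or ensemble model fit to real
    transitions. Reparameterized sampling keeps the rollout differentiable
    with respect to the policy parameters.
    """
    x = x0  # (n_particles, state_dim): a batch of state particles
    total = 0.0
    for _ in range(horizon):
        u = policy(x)
        mean, std = dynamics(x, u)
        x = mean + std * torch.randn_like(std)  # reparameterized sample
        total = total + cost_fn(x).mean()       # average over particles
    return total / horizon

# One policy-improvement step: gradient descent on the simulated cost.
# opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
# loss = rollout_cost(policy, learned_model, particles, horizon=50, cost_fn=goal_cost)
# opt.zero_grad(); loss.backward(); opt.step()
```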
AWARD: MERL team wins the Listener Acoustic Personalisation (LAP) 2024 Challenge
Date: August 29, 2024
Awarded to: Yoshiki Masuyama, Gordon Wichern, Francois G. Germain, Christopher Ick, and Jonathan Le Roux
MERL Contacts: François Germain; Jonathan Le Roux; Gordon Wichern; Yoshiki Masuyama
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL's Speech & Audio team ranked 1st out of 7 teams in Task 2 of the 1st SONICOM Listener Acoustic Personalisation (LAP) Challenge, which focused on "Spatial upsampling for obtaining a high-spatial-resolution HRTF from a very low number of directions". The team was led by Yoshiki Masuyama, and also included Gordon Wichern, Francois Germain, MERL intern Christopher Ick, and Jonathan Le Roux.
The LAP Challenge workshop and award ceremony was hosted by the 32nd European Signal Processing Conference (EUSIPCO 24) on August 29, 2024 in Lyon, France. Yoshiki Masuyama presented the team's method, "Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization", and received the award from Prof. Michele Geronazzo (University of Padova, IT, and Imperial College London, UK), Chair of the Challenge's Organizing Committee.
The LAP challenge aims to explore challenges in the field of personalized spatial audio, with the first edition focusing on the spatial upsampling and interpolation of head-related transfer functions (HRTFs). HRTFs with dense spatial grids are required for immersive audio experiences, but their recording is time-consuming. Although HRTF spatial upsampling has recently shown remarkable progress with approaches involving neural fields, HRTF estimation accuracy remains limited when upsampling from only a few measured directions, e.g., 3 or 5 measurements. The MERL team tackled this problem by proposing a retrieval-augmented neural field (RANF). RANF retrieves a subject whose HRTFs are close to those of the target subject at the measured directions from a library of subjects. The HRTF of the retrieved subject at the target direction is fed into the neural field in addition to the desired sound source direction. The team also developed a neural network architecture that can handle an arbitrary number of retrieved subjects, inspired by a multi-channel processing technique called transform-average-concatenate.
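To make the two ingredients above concrete, here is a minimal, hypothetical PyTorch sketch of (i) a transform-average-concatenate (TAC) block that fuses features from an arbitrary number of retrieved subjects, and (ii) a neural field that predicts an HRTF from a sound direction conditioned on the retrieved subjects' HRTFs. All module names, feature sizes, and input conventions are illustrative assumptions; the actual RANF architecture is described in the team's paper.

```python
# Hypothetical sketch of RANF-style conditioning, not the actual model.
import torch

class TACBlock(torch.nn.Module):
    """Transform-average-concatenate: permutation-invariant fusion over a
    variable-size set of retrieved-subject features."""
    def __init__(self, dim):
        super().__init__()
        self.transform = torch.nn.Sequential(torch.nn.Linear(dim, dim), torch.nn.ReLU())
        self.merge = torch.nn.Sequential(torch.nn.Linear(2 * dim, dim), torch.nn.ReLU())

    def forward(self, feats):                 # feats: (n_subjects, dim)
        h = self.transform(feats)             # transform each subject independently
        avg = h.mean(dim=0, keepdim=True).expand_as(h)  # average across subjects
        return self.merge(torch.cat([h, avg], dim=-1))  # concatenate and mix

class RANFSketch(torch.nn.Module):
    """Neural field: (direction, retrieved HRTFs at that direction) -> HRTF."""
    def __init__(self, feat_dim=32, hrtf_dim=2):
        super().__init__()
        self.embed_dir = torch.nn.Linear(3, feat_dim)         # unit direction vector
        self.embed_hrtf = torch.nn.Linear(hrtf_dim, feat_dim)
        self.tac = TACBlock(feat_dim)
        self.head = torch.nn.Linear(2 * feat_dim, hrtf_dim)

    def forward(self, direction, retrieved_hrtfs):
        # direction: (3,); retrieved_hrtfs: (n_subjects, hrtf_dim)
        d = self.embed_dir(direction)
        r = self.tac(self.embed_hrtf(retrieved_hrtfs)).mean(dim=0)
        return self.head(torch.cat([d, r], dim=-1))

# Example query: two retrieved subjects conditioning one target direction.
model = RANFSketch()
pred = model(torch.tensor([0.0, 0.0, 1.0]), torch.randn(2, 2))
```

Because the averaging inside the TAC block runs over the subject axis, the same weights handle any number of retrieved subjects, which is the property highlighted above.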
See All Awards for Artificial Intelligence
News & Events
EVENT: SANE 2023 - Speech and Audio in the Northeast
Date: Thursday, October 26, 2023
Location: New York University, Brooklyn, NY
MERL Contacts: Jonathan Le Roux; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: SANE 2023, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, was held on Thursday, October 26, 2023 at NYU in Brooklyn, New York.
It was the 10th edition in the SANE series of workshops, which started in 2012 and is typically held every year alternately in Boston and New York. Since the first edition, the audience has steadily grown, and SANE 2023 broke SANE 2019's record with 200 participants and 51 posters.
This year's SANE took place in conjunction with the WASPAA workshop, held October 22-25 in upstate New York.
SANE 2023 featured invited talks by seven leading researchers from the Northeast and beyond: Arsha Nagrani (Google), Gaël Richard (Télécom Paris), Gordon Wichern (MERL), Kyunghyun Cho (NYU / Prescient Design), Anna Huang (Google DeepMind / MILA), Wenwu Wang (University of Surrey), and Yuan Gong (MIT). It also featured a lively poster session with 51 posters.
SANE 2023 was co-organized by Jonathan Le Roux (MERL), Juan P. Bello (NYU), and John R. Hershey (Google). SANE remained a free event thanks to generous sponsorship by NYU, MERL, Google, Adobe, Bose, Meta Reality Labs, and Amazon.
Slides and videos of the talks are available from the SANE workshop website.
NEWS: Suhas Lohit presents invited talk at Boston Symmetry Day 2025
Date: March 31, 2025
Where: Northeastern University, Boston, MA
MERL Contact: Suhas Lohit
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief: MERL researcher Suhas Lohit was an invited speaker at Boston Symmetry Day, held at Northeastern University. Boston Symmetry Day, an annual workshop organized by researchers at MIT and Northeastern, brought together attendees interested in symmetry-informed machine learning and its applications. Suhas' talk, titled "Efficiency for Equivariance, and Efficiency through Equivariance," discussed recent MERL work that shows how to build general and efficient equivariant neural networks, and how equivariance can be utilized in self-supervised learning to yield improved 3D object detection. The abstract and slides can be found in the link below.
See All News & Events for Artificial Intelligence
Research Highlights
- PS-NeuS: A Probability-guided Sampler for Neural Implicit Surface Rendering
- Quantum AI Technology
- TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
- Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-Aware Spatio-Temporal Sampling
- Steered Diffusion
- Sustainable AI
- Robust Machine Learning
- mmWave Beam-SNR Fingerprinting (mmBSF)
- Video Anomaly Detection
- Biosignal Processing for Human-Machine Interaction
- Task-aware Unified Source Separation - Audio Examples
Internships
CV0101: Internship - Multimodal Algorithmic Reasoning
MERL is looking for a self-motivated intern to research problems at the intersection of multimodal large language models and neural algorithmic reasoning. An ideal intern would be a Ph.D. student with a strong background in machine learning and computer vision. The candidate must have prior experience with training multimodal LLMs for solving vision-and-language tasks. Experience participating in and winning mathematical Olympiads is desired. Publications in theoretical machine learning venues would be a strong plus. The intern is expected to collaborate with researchers in the computer vision team at MERL to develop algorithms and prepare manuscripts for scientific publications.
Required Specific Experience
- Experience with training large vision-and-language models
- Experience with solving mathematical reasoning problems
- Experience with programming in Python using PyTorch
- Enrolled in a PhD program
- Strong track record of publications in top-tier computer vision and machine learning venues (e.g., CVPR, NeurIPS).
CV0075: Internship - Multimodal Embodied AI
MERL is looking for a self-motivated intern to work on problems at the intersection of multimodal large language models and embodied AI in dynamic indoor environments. The ideal candidate would be a PhD student with a strong background in machine learning and computer vision, as demonstrated by top-tier publications. The candidate must have prior experience in designing synthetic scenes (e.g., 3D games) using popular graphics software, embodied AI, large language models, reinforcement learning, and the use of simulators such as Habitat/SoundSpaces. Hands-on experience using animated 3D human shape models (e.g., SMPL and variants) is desired. The intern is expected to collaborate with researchers in computer vision at MERL to develop algorithms and prepare manuscripts for scientific publications.
Required Specific Experience
- Experience in designing 3D interactive scenes
- Experience with vision-based embodied AI using simulators (implementation on real robotic hardware would be a plus)
- Experience training large language models on multimodal data
- Experience with training reinforcement learning algorithms
- Strong foundations in machine learning and programming
- Strong track record of publications in top-tier computer vision and machine learning venues (e.g., CVPR, NeurIPS).
ST0105: Internship - Surrogate Modeling for Sound Propagation
MERL is seeking a motivated and qualified individual to work on fast surrogate models for sound emission and propagation from complex vibrating structures, with applications in HVAC noise reduction. The ideal candidate will be a PhD student in engineering or related fields with a solid background in frequency-domain acoustic modeling and numerical techniques for partial differential equations (PDEs). Preferred skills include knowledge of the boundary element method (BEM), data-driven modeling, and physics-informed machine learning. Publication of the results obtained during the internship is expected. The duration is expected to be at least 3 months with a flexible start date.
See All Internships for Artificial Intelligence
Openings
See All Openings at MERL
Recent Publications
- "Quantum-PEFT: Ultra Parameter-Efficient Fine-Tuning", International Conference on Learning Representations (ICLR), April 2025.BibTeX TR2025-051 PDF
- @inproceedings{Koike-Akino2025apr,
- author = {Koike-Akino, Toshiaki and Tonin,Francesco and Wu,Yongtao and Wu,Frank Zhengqing and Candogan,Leyla Naz and Cevher, Volkan},
- title = {{Quantum-PEFT: Ultra Parameter-Efficient Fine-Tuning}},
- booktitle = {International Conference on Learning Representations (ICLR)},
- year = 2025,
- month = apr,
- url = {https://www.merl.com/publications/TR2025-051}
- }
, - "Programmatic Video Prediction Using Large Language Models", International Conference on Learning Representations Workshops (ICLRW), April 2025.BibTeX TR2025-049 PDF
- @inproceedings{Tang2025apr,
- author = {Tang, Hao and Ellis, Kevin and Lohit, Suhas and Jones, Michael J. and Chatterjee, Moitreya},
- title = {{Programmatic Video Prediction Using Large Language Models}},
- booktitle = {International Conference on Learning Representations Workshops (ICLRW)},
- year = 2025,
- month = apr,
- url = {https://www.merl.com/publications/TR2025-049}
- }
, - "30+ Years of Source Separation Research: Achievements and Future Challenges", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2025.BibTeX TR2025-036 PDF
- @inproceedings{Araki2025mar,
- author = {Araki, Shoko and Ito, Nobutaka and Haeb-Umbach, Reinhold and Wichern, Gordon and Wang, Zhong-Qiu and Mitsufuji, Yuki},
- title = {{30+ Years of Source Separation Research: Achievements and Future Challenges}},
- booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
- year = 2025,
- month = mar,
- url = {https://www.merl.com/publications/TR2025-036}
- }
, - "No Class Left Behind: A Closer Look at Class Balancing for Audio Tagging", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2025.BibTeX TR2025-037 PDF
- @inproceedings{Ebbers2025mar,
- author = {Ebbers, Janek and Germain, François G and Wilkinghoff, Kevin and Wichern, Gordon and {Le Roux}, Jonathan},
- title = {{No Class Left Behind: A Closer Look at Class Balancing for Audio Tagging}},
- booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
- year = 2025,
- month = mar,
- url = {https://www.merl.com/publications/TR2025-037}
- }
, - "O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2025.BibTeX TR2025-031 PDF
- @inproceedings{Gruttadauria2025mar,
- author = {Gruttadauria, Elio and Fontaine, Mathieu and {Le Roux}, Jonathan and Essid, Slim},
- title = {{O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization}},
- booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
- year = 2025,
- month = mar,
- url = {https://www.merl.com/publications/TR2025-031}
- }
, - "Interactive Robot Action Replanning using Multimodal LLM Trained from Human Demonstration Videos", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2025.BibTeX TR2025-034 PDF
- @inproceedings{Hori2025mar,
- author = {Hori, Chiori and Kambara, Motonari and Sugiura, Komei and Ota, Kei and Khurana, Sameer and Jain, Siddarth and Corcodel, Radu and Jha, Devesh K. and Romeres, Diego and {Le Roux}, Jonathan},
- title = {{Interactive Robot Action Replanning using Multimodal LLM Trained from Human Demonstration Videos}},
- booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
- year = 2025,
- month = mar,
- url = {https://www.merl.com/publications/TR2025-034}
- }
, - "Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2025.BibTeX TR2025-029 PDF Software
- @inproceedings{Masuyama2025mar,
- author = {Masuyama, Yoshiki and Wichern, Gordon and Germain, François G and Ick, Christopher and {Le Roux}, Jonathan},
- title = {{Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization}},
- booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
- year = 2025,
- month = mar,
- url = {https://www.merl.com/publications/TR2025-029}
- }
, - "Leveraging Audio-Only Data for Text-Queried Target Sound Extraction", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2025.BibTeX TR2025-033 PDF
- @inproceedings{Saijo2025mar2,
- author = {Saijo, Kohei and Ebbers, Janek and Germain, François G and Khurana, Sameer and Wichern, Gordon and {Le Roux}, Jonathan},
- title = {{Leveraging Audio-Only Data for Text-Queried Target Sound Extraction}},
- booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
- year = 2025,
- month = mar,
- url = {https://www.merl.com/publications/TR2025-033}
- }
,
- "Quantum-PEFT: Ultra Parameter-Efficient Fine-Tuning", International Conference on Learning Representations (ICLR), April 2025.
-
Videos
Software & Data Downloads
- MEL-PETs Joint-Context Attack for LLM Privacy Challenge
- MEL-PETs Defense for LLM Privacy Challenge
- Learned Born Operator for Reflection Tomographic Imaging
- Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization
- Self-Monitored Inference-Time INtervention for Generative Music Transformers
- Transformer-based model with LOcal-modeling by COnvolution
- Sound Event Bounding Boxes
- Enhanced Reverberation as Supervision
- Gear Extensions of Neural Radiance Fields
- Long-Tailed Anomaly Detection Dataset
- Neural IIR Filter Field for HRTF Upsampling and Personalization
- Target-Speaker SEParation
- Pixel-Grounded Prototypical Part Networks
- Steered Diffusion
- Hyperbolic Audio Source Separation
- Simple Multimodal Algorithmic Reasoning Task Dataset
- Partial Group Convolutional Neural Networks
- SOurce-free Cross-modal KnowledgE Transfer
- Audio-Visual-Language Embodied Navigation in 3D Environments
- Nonparametric Score Estimators
- 3D MOrphable STyleGAN
- Instance Segmentation GAN
- Audio Visual Scene-Graph Segmentor
- Generalized One-class Discriminative Subspaces
- Goal directed RL with Safety Constraints
- Hierarchical Musical Instrument Separation
- Generating Visual Dynamics from Sound and Context
- Adversarially-Contrastive Optimal Transport
- Online Feature Extractor Network
- MotionNet
- FoldingNet++
- Quasi-Newton Trust Region Policy Optimization
- Landmarks’ Location, Uncertainty, and Visibility Likelihood
- Robust Iterative Data Estimation
- Gradient-based Nikaido-Isoda
- Discriminative Subspace Pooling