Artificial Intelligence
Making machines smarter for improved safety, efficiency and comfort.
Our AI research encompasses advances in computer vision, speech and audio processing, as well as data analytics. Key research themes include improved perception based on machine learning techniques, learning control policies through model-based reinforcement learning, as well as cognition and reasoning based on learned semantic representations. We apply our work to a broad range of automotive and robotics applications, as well as building and home systems.
Researchers
- Jonathan Le Roux
- Toshiaki Koike-Akino
- Ye Wang
- Gordon Wichern
- Anoop Cherian
- Tim K. Marks
- Chiori Hori
- Michael J. Jones
- Kieran Parsons
- François Germain
- Daniel N. Nikovski
- Devesh K. Jha
- Jing Liu
- Suhas Lohit
- Matthew Brand
- Philip V. Orlik
- Diego Romeres
- Pu (Perry) Wang
- Petros T. Boufounos
- Siddarth Jain
- Moitreya Chatterjee
- Hassan Mansour
- Kuan-Chuan Peng
- William S. Yerazunis
- Radu Corcodel
- Yoshiki Masuyama
- Arvind Raghunathan
- Hongbo Sun
- Yebin Wang
- Ankush Chakrabarty
- Jianlin Guo
- Chungwei Lin
- Yanting Ma
- Pedro Miraldo
- Bingnan Wang
- Ryo Aihara
- Stefano Di Cairano
- Saviz Mowlavi
- James Queeney
- Anthony Vetro
- Jinyun Zhang
- Vedang M. Deshpande
- Christopher R. Laughman
- Dehong Liu
- Alexander Schperberg
- Wataru Tsujita
- Abraham P. Vinod
- Na Li
Awards
AWARD MERL Wins Awards at NeurIPS LLM Privacy Challenge
Date: December 15, 2024
Awarded to: Jing Liu, Ye Wang, Toshiaki Koike-Akino, Tsunato Nakai, Kento Oonishi, Takuya Higashi
MERL Contacts: Toshiaki Koike-Akino; Jing Liu; Ye Wang
Research Areas: Artificial Intelligence, Machine Learning, Information Security
Brief: The Mitsubishi Electric Privacy Enhancing Technologies (MEL-PETs) team, a collaboration between MERL and Mitsubishi Electric researchers, won awards at the NeurIPS 2024 Large Language Model (LLM) Privacy Challenge. In the Blue Team track of the challenge, we won the 3rd Place Award, and in the Red Team track, we won the Special Award for Practical Attack.
AWARD University of Padua and MERL team wins the AI Olympics with RealAIGym competition at IROS 2024
Date: October 17, 2024
Awarded to: Niccolò Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli, Diego Romeres
MERL Contact: Diego Romeres
Research Areas: Artificial Intelligence, Dynamical Systems, Machine Learning, Robotics
Brief: A team composed of the control group at the University of Padua and MERL's Optimization and Robotics team ranked 1st among the 4 finalist teams of the 2nd AI Olympics with RealAIGym competition at IROS 2024, which focused on the control of under-actuated robots. The team consisted of Niccolò Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli, and Diego Romeres. The competition was organized by the German Research Center for Artificial Intelligence (DFKI), Technical University of Darmstadt, and Chalmers University of Technology.
The competition and award ceremony were hosted by the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) on October 17, 2024, in Abu Dhabi, UAE. Diego Romeres presented the team's method, which is based on a model-based reinforcement learning algorithm called MC-PILCO.
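Model-based policy search methods in the PILCO family, to which MC-PILCO belongs, score a candidate policy by simulating many state "particles" through a learned probabilistic dynamics model and averaging the resulting cost. The Python sketch below illustrates only that Monte Carlo evaluation idea, under loudly simplified assumptions: a hypothetical scalar linear-Gaussian model stands in for the Gaussian-process dynamics models MC-PILCO learns, the policy is a hypothetical linear feedback gain, and a coarse grid search replaces MC-PILCO's gradient-based policy optimization. It is not the team's competition code.

```python
import numpy as np


def estimate_policy_cost(gain, x0_mean=1.0, x0_std=0.1, a=1.0, b=1.0,
                         noise_std=0.01, horizon=30, n_particles=500, seed=0):
    """Monte Carlo estimate of the expected cumulative cost of u = -gain * x.

    Hypothetical learned model: x_next = a * x + b * u + Gaussian noise.
    Each particle is one sampled trajectory; averaging the per-step cost over
    particles approximates the expectation under initial-state and model noise.
    """
    rng = np.random.default_rng(seed)  # same seed -> same noise for every gain
    x = x0_mean + x0_std * rng.standard_normal(n_particles)
    total = 0.0
    for _ in range(horizon):
        u = -gain * x                                   # linear feedback policy
        x = a * x + b * u + noise_std * rng.standard_normal(n_particles)
        total += np.mean(1.0 - np.exp(-x ** 2))         # saturating cost in [0, 1)
    return total


# Coarse grid search over gains (stand-in for gradient-based optimization).
gains = np.linspace(0.0, 1.5, 16)
costs = [estimate_policy_cost(g) for g in gains]
best_gain = gains[int(np.argmin(costs))]
print(f"best gain: {best_gain:.1f}, cost: {min(costs):.3f}")
```

Under these assumptions the search should favor a gain close to 1.0, which cancels the hypothetical open-loop dynamics in a single step; the saturating cost keeps each per-step term bounded even for particles that drift far from the origin.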
AWARD MERL team wins the Listener Acoustic Personalisation (LAP) 2024 Challenge
Date: August 29, 2024
Awarded to: Yoshiki Masuyama, Gordon Wichern, François G. Germain, Christopher Ick, and Jonathan Le Roux
MERL Contacts: François Germain; Jonathan Le Roux; Gordon Wichern; Yoshiki Masuyama
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL's Speech & Audio team ranked 1st out of 7 teams in Task 2 of the 1st SONICOM Listener Acoustic Personalisation (LAP) Challenge, which focused on "Spatial upsampling for obtaining a high-spatial-resolution HRTF from a very low number of directions". The team was led by Yoshiki Masuyama and also included Gordon Wichern, François Germain, MERL intern Christopher Ick, and Jonathan Le Roux.
The LAP Challenge workshop and award ceremony were hosted by the 32nd European Signal Processing Conference (EUSIPCO 2024) on August 29, 2024, in Lyon, France. Yoshiki Masuyama presented the team's method, "Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization", and received the award from Prof. Michele Geronazzo (University of Padova, IT, and Imperial College London, UK), Chair of the Challenge's Organizing Committee.
The LAP Challenge explores open problems in personalized spatial audio; its first edition focused on the spatial upsampling and interpolation of head-related transfer functions (HRTFs). Immersive audio experiences require HRTFs on dense spatial grids, but measuring them is time-consuming. Although neural-field approaches have recently driven remarkable progress in HRTF spatial upsampling, estimation accuracy remains limited when only a few directions are measured, e.g., 3 or 5. The MERL team tackled this problem by proposing a retrieval-augmented neural field (RANF). From a library of subjects, RANF retrieves a subject whose HRTFs at the measured directions are close to those of the target subject. The retrieved subject's HRTF at the target direction is then fed into the neural field together with the desired sound source direction. The team also developed a neural network architecture that can handle an arbitrary number of retrieved subjects, inspired by a multi-channel processing technique called transform-average-concatenate (TAC).
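The two ingredients of RANF described above, retrieving similar subjects and fusing a variable number of them, can be illustrated with a small NumPy sketch. Everything in it is an assumption for illustration: the library is random toy data shaped (subjects, directions, frequency bins), similarity is mean squared error at the measured directions, and fusion is a single transform-average-concatenate (TAC) layer with ReLU activations and a residual connection. The function names are hypothetical, and the sketch does not reproduce RANF's actual architecture or training.

```python
import numpy as np


def retrieve_nearest_subjects(library, target_measured, measured_dirs, k):
    """Indices of the k library subjects whose HRTFs are closest (MSE) to the
    target's at the measured directions."""
    diff = library[:, measured_dirs, :] - target_measured[None, :, :]
    dists = np.mean(diff ** 2, axis=(1, 2))
    return np.argsort(dists)[:k]


def relu(z):
    return np.maximum(z, 0.0)


def tac_layer(feats, W_transform, W_average, W_concat):
    """Transform-average-concatenate over a variable number of channels.

    feats: (n_channels, d). Weights are shared by every channel, so the layer
    accepts any n_channels and is permutation-equivariant.
    """
    h = relu(feats @ W_transform)                       # per-channel transform
    g = relu(h.mean(axis=0) @ W_average)                # average, then transform
    g = np.broadcast_to(g, h.shape)                     # copy pooled vector to every channel
    out = relu(np.concatenate([h, g], axis=1) @ W_concat)  # concatenate, then transform
    return feats + out                                  # residual connection


rng = np.random.default_rng(0)
n_subjects, n_dirs, n_freqs, hidden = 10, 50, 8, 16
# Toy stand-in for an HRTF library (random numbers, not real HRTFs).
library = rng.standard_normal((n_subjects, n_dirs, n_freqs))
measured_dirs = np.array([0, 7, 21])                    # e.g. 3 measured directions
target = library[3] + 0.01 * rng.standard_normal((n_dirs, n_freqs))
nearest = retrieve_nearest_subjects(library, target[measured_dirs], measured_dirs, k=2)

# Input per retrieved subject: its HRTF at a hypothetical query direction.
query_dir = 30
feats = library[nearest, query_dir, :]                  # (k, n_freqs)
W_t = 0.1 * rng.standard_normal((n_freqs, hidden))
W_a = 0.1 * rng.standard_normal((hidden, hidden))
W_c = 0.1 * rng.standard_normal((2 * hidden, n_freqs))
fused = tac_layer(feats, W_t, W_a, W_c)
print(nearest, fused.shape)
```

Because the TAC weights are shared across channels and the pooling is a mean, the same layer accepts 2, 5, or any other number of retrieved subjects, and reordering the subjects only reorders the output rows.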
News & Events
NEWS Yuki Shirai appointed as an Associate Editor for IEEE Robotics and Automation Letters (RA-L).
Date: March 4, 2025
Where: IEEE Robotics and Automation Society (RAS)
MERL Contact: Yuki Shirai
Research Areas: Artificial Intelligence, Optimization, Robotics
Brief: MERL researcher Yuki Shirai has been appointed to the editorial board of the IEEE Robotics and Automation Letters (RA-L) as an Associate Editor.
IEEE RA-L publishes peer-reviewed articles in robotics and automation; its papers can also be presented at the annual flagship conferences of the IEEE Robotics and Automation Society (RAS), including the IEEE International Conference on Robotics and Automation (ICRA) and the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
NEWS MERL Papers and Workshops at AAAI 2025
Date: February 25, 2025 - March 4, 2025
Where: The Association for the Advancement of Artificial Intelligence (AAAI)
MERL Contacts: Ankush Chakrabarty; Toshiaki Koike-Akino; Jing Liu; Kuan-Chuan Peng; Diego Romeres; Ye Wang
Research Areas: Artificial Intelligence, Machine Learning, Optimization
Brief: MERL researchers presented 2 conference papers, 2 workshop papers, and co-organized 1 workshop at the AAAI 2025 conference, which was held in Philadelphia from Feb. 25 to Mar. 4, 2025. AAAI is one of the most prestigious and competitive international conferences in artificial intelligence (AI). Details of MERL contributions are provided below.
- AAAI Papers in Main Tracks:
1. "Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage" by M.R.U. Rashid, J. Liu, T. Koike-Akino, Y. Wang, and S. Mehnaz. [Oral Presentation]
This work proposes a novel unlearning-based model poisoning method that amplifies privacy breaches during fine-tuning. Extensive empirical studies show the proposed method's efficacy for both membership inference and data extraction attacks. The attack is stealthy enough to bypass detection-based defenses, and differential privacy cannot effectively defend against it without significantly impacting model utility.
Paper: https://www.merl.com/publications/TR2025-017
2. "User-Preference Meets Pareto-Optimality: Multi-Objective Bayesian Optimization with Local Gradient Search" by J.H.S. Ip, A. Chakrabarty, A. Mesbah, and D. Romeres. [Poster Presentation]
This paper introduces a sample-efficient multi-objective Bayesian optimization method that integrates user preferences with gradient-based search to find near-Pareto-optimal solutions. The proposed method achieves high utility and reduces the distance to Pareto-front solutions on both synthetic and real-world problems, underscoring the importance of minimizing gradient uncertainty during gradient-based optimization. Additionally, the study introduces a novel utility function that respects Pareto dominance and effectively captures diverse user preferences.
Paper: https://www.merl.com/publications/TR2025-018
- AAAI Workshop Papers:
1. "Quantum Diffusion Models for Few-Shot Learning" by R. Wang, Y. Wang, J. Liu, and T. Koike-Akino.
This work presents the quantum diffusion model (QDM) as an approach to overcome the challenges of quantum few-shot learning (QFSL). It introduces three novel algorithms developed from complementary data-driven and algorithmic perspectives to enhance the performance of QFSL tasks. The extensive experiments demonstrate that these algorithms achieve significant performance gains over traditional baselines, underscoring the potential of QDM to advance QFSL by effectively leveraging quantum noise modeling and label guidance.
Paper: https://www.merl.com/publications/TR2025-025
2. "Quantum Implicit Neural Compression" by T. Fujihashi and T. Koike-Akino.
This work introduces a quantum counterpart of implicit neural representation (quINR), which leverages the exponentially rich expressivity of quantum neural networks to improve classical INR-based signal compression methods. Evaluations on benchmark datasets show that the proposed quINR-based compression can improve rate-distortion performance in image compression compared with traditional codecs and classical INR-based coding methods.
Paper: https://www.merl.com/publications/TR2025-024
- AAAI Workshops Contributed by MERL:
1. "Scalable and Efficient Artificial Intelligence Systems (SEAS)"
K.-C. Peng co-organized this workshop, which offers a timely forum for experts to share their perspectives on designing and developing robust computer vision (CV), machine learning (ML), and artificial intelligence (AI) algorithms, and on translating them into real-world solutions.
Workshop link: https://seasworkshop.github.io/aaai25/index.html
2. "Quantum Computing and Artificial Intelligence"
T. Koike-Akino served as session chair for the Quantum Neural Network session of this workshop, which seeks contributions encompassing theoretical and applied advances in quantum AI, quantum computing (QC) to enhance classical AI, and classical AI to tackle various aspects of QC.
Workshop link: https://sites.google.com/view/qcai2025/
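The summary of "User-Preference Meets Pareto-Optimality" relies on the notion of Pareto dominance; the short Python sketch below makes it concrete. It assumes every objective is minimized. `dominates`, `pareto_front`, and `weighted_utility` are illustrative helpers, and the positive weighted sum is merely a textbook example of a utility that respects dominance (if a dominates b, a scores higher), not the novel utility function proposed in the paper.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b when every objective is minimized:
    a is no worse in all objectives and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates (a point never dominates itself)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def weighted_utility(p: Sequence[float], weights: Sequence[float]) -> float:
    """Illustrative utility: negative weighted sum with strictly positive weights.
    Strict positivity guarantees that a dominating point gets a higher utility."""
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")
    return -sum(w * x for w, x in zip(weights, p))


candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 5.0)]
print(pareto_front(candidates))  # [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```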
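The classical INR-based compression that quINR ("Quantum Implicit Neural Compression") aims to improve stores a signal as the parameters of a coordinate-to-value model rather than as samples: the rate is driven by the number of parameters and the distortion by the reconstruction error. The Python sketch below shows that trade-off with the simplest possible coordinate model, a sinusoidal positional encoding followed by one linear layer fitted in closed form by least squares. The encoding, the toy signal, and the helper names are assumptions; practical INR codecs train deeper networks and quantize and entropy-code the weights, and no quantum component is modelled here.

```python
import numpy as np


def encode(x, n_freqs):
    """Sinusoidal positional encoding of coordinates x in [0, 1):
    a bias column plus sin/cos at integer frequencies 1..n_freqs."""
    k = np.arange(1, n_freqs + 1)
    angles = 2.0 * np.pi * np.outer(x, k)                 # (n_samples, n_freqs)
    return np.hstack([np.ones((x.size, 1)), np.sin(angles), np.cos(angles)])


def fit_inr(signal, n_freqs):
    """Fit the linear readout by least squares; the weights are the 'code'."""
    x = np.arange(signal.size) / signal.size
    weights, *_ = np.linalg.lstsq(encode(x, n_freqs), signal, rcond=None)
    return weights


def decode(weights, n_samples, n_freqs):
    """Reconstruct the signal by querying the model at every sample coordinate."""
    x = np.arange(n_samples) / n_samples
    return encode(x, n_freqs) @ weights


def psnr(ref, est):
    """Peak signal-to-noise ratio in dB, using the peak magnitude of ref."""
    mse = np.mean((ref - est) ** 2)
    peak = np.max(np.abs(ref))
    return 10.0 * np.log10(peak ** 2 / mse)


n = 256
t = np.arange(n) / n
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sign(np.sin(2 * np.pi * 5 * t))
for n_freqs in (4, 16, 64):
    w = fit_inr(signal, n_freqs)
    print(n_freqs, w.size, round(psnr(signal, decode(w, n, n_freqs)), 1))
```

Because each larger encoding contains all the frequencies of the smaller ones, the least-squares fit can only improve as parameters are added, so PSNR rises with the parameter count while the code (2 × n_freqs + 1 weights) stays smaller than the 256 samples.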
Research Highlights
- PS-NeuS: A Probability-guided Sampler for Neural Implicit Surface Rendering
- Quantum AI Technology
- TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
- Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-Aware Spatio-Temporal Sampling
- Steered Diffusion
- Sustainable AI
- Robust Machine Learning
- mmWave Beam-SNR Fingerprinting (mmBSF)
- Video Anomaly Detection
- Biosignal Processing for Human-Machine Interaction
- Task-aware Unified Source Separation - Audio Examples
Internships
OR0127: Internship - Deep Learning for Robotic Manipulation
MERL is looking for a highly motivated and qualified intern to work on deep learning methods for object detection and pose estimation using vision and tactile sensing in manufacturing and assembly environments. This role involves developing, fine-tuning, and deploying models on existing hardware. The methods will be applied to robotic manipulation, where accurate knowledge of the position and orientation of objects in the scene allows the robot to interact with them. The ideal candidate would be a Ph.D. student familiar with state-of-the-art methods for object pose estimation and tracking. The successful candidate will work closely with MERL researchers to develop and implement novel algorithms, conduct experiments, and publish research findings at a top-tier conference. The start date and expected duration of the internship are flexible. Interested candidates are encouraged to apply with their updated CV and a list of relevant publications.
Required Specific Experience
- Prior experience in Computer Vision and Robotic Manipulation.
- Experience with ROS and deep learning frameworks such as PyTorch is essential.
- Strong programming skills in Python.
- Experience with simulation tools such as PyBullet, Isaac Lab, or MuJoCo.
CI0080: Internship - Efficient AI
We are on the lookout for passionate and skilled interns to join our cutting-edge research team focused on developing efficient machine learning techniques for sustainability. This is an exciting opportunity to make a real impact in the field of AI and environmental conservation, with the aim of publishing at leading AI research venues.
What We're Looking For:
- Advanced research experience in generative models and computationally efficient models
- Hands-on experience with large language models (LLMs), vision language models (VLMs), large multi-modal models (LMMs), and foundation models (FoMo)
- Deep understanding of state-of-the-art machine learning methods
- Proficiency in Python and PyTorch
- Familiarity with various deep learning frameworks
- Ph.D. candidates who have completed at least half of their program
Internship Details:
- Duration: approximately 3 months
- Flexible start dates available
- Objective: publish research results at leading AI research venues
If you are a highly motivated individual with a passion for applying AI to sustainability challenges, we want to hear from you! This internship offers a unique chance to work on meaningful projects at the intersection of machine learning and environmental sustainability.
EA0076: Internship - Machine Learning for Electric Motor Design
MERL is seeking a motivated and qualified intern to conduct research on machine-learning-based electric motor design and optimization. Ideal candidates should be Ph.D. students with a solid background and publication record in electric machine design, optimization, and machine learning. Hands-on experience implementing optimization algorithms, machine learning, and deep learning methods is required. Strong programming skills in Python/PyTorch are expected. Knowledge of and experience with electric machine principles, design, and finite-element analysis are highly desirable. The start date for this internship is flexible, and the duration is about 3 months.
Recent Publications
- Shoko Araki, Nobutaka Ito, Reinhold Haeb-Umbach, Gordon Wichern, Zhong-Qiu Wang, Yuki Mitsufuji, "30+ Years of Source Separation Research: Achievements and Future Challenges", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2025. TR2025-036: https://www.merl.com/publications/TR2025-036
- Janek Ebbers, François G. Germain, Kevin Wilkinghoff, Gordon Wichern, Jonathan Le Roux, "No Class Left Behind: A Closer Look at Class Balancing for Audio Tagging", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2025. TR2025-037: https://www.merl.com/publications/TR2025-037
- Elio Gruttadauria, Mathieu Fontaine, Jonathan Le Roux, Slim Essid, "O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2025. TR2025-031: https://www.merl.com/publications/TR2025-031
- Chiori Hori, Motonari Kambara, Komei Sugiura, Kei Ota, Sameer Khurana, Siddarth Jain, Radu Corcodel, Devesh K. Jha, Diego Romeres, Jonathan Le Roux, "Interactive Robot Action Replanning using Multimodal LLM Trained from Human Demonstration Videos", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2025. TR2025-034: https://www.merl.com/publications/TR2025-034
- Yoshiki Masuyama, Gordon Wichern, François G. Germain, Christopher Ick, Jonathan Le Roux, "Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2025. TR2025-029: https://www.merl.com/publications/TR2025-029
- Kohei Saijo, Janek Ebbers, François G. Germain, Sameer Khurana, Gordon Wichern, Jonathan Le Roux, "Leveraging Audio-Only Data for Text-Queried Target Sound Extraction", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2025. TR2025-033: https://www.merl.com/publications/TR2025-033
- Kohei Saijo, Janek Ebbers, François G. Germain, Gordon Wichern, Jonathan Le Roux, "Task-Aware Unified Source Separation", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2025. TR2025-032: https://www.merl.com/publications/TR2025-032
- Jinchuan Tian, Jiatong Shi, William Chen, Siddhant Arora, Yoshiki Masuyama, Takashi Maekaku, Yihan Wu, Junyi Peng, Shikhar Bharadwaj, Yiwen Zhao, Samuele Cornell, Yifan Peng, Xiang Yue, Chao-Han H. Yang, Graham Neubig, Shinji Watanabe, "ESPnet-SpeechLM: An Open Speech Language Model Toolkit", NAACL-HLT (system demonstration track), March 2025. TR2025-038: https://www.merl.com/publications/TR2025-038
Software & Data Downloads
- MEL-PETs Joint-Context Attack for LLM Privacy Challenge
- MEL-PETs Defense for LLM Privacy Challenge
- Learned Born Operator for Reflection Tomographic Imaging
- Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization
- Self-Monitored Inference-Time INtervention for Generative Music Transformers
- Transformer-based model with LOcal-modeling by COnvolution
- Sound Event Bounding Boxes
- Enhanced Reverberation as Supervision
- Gear Extensions of Neural Radiance Fields
- Long-Tailed Anomaly Detection Dataset
- Neural IIR Filter Field for HRTF Upsampling and Personalization
- Target-Speaker SEParation
- Pixel-Grounded Prototypical Part Networks
- Steered Diffusion
- Hyperbolic Audio Source Separation
- Simple Multimodal Algorithmic Reasoning Task Dataset
- Partial Group Convolutional Neural Networks
- SOurce-free Cross-modal KnowledgE Transfer
- Audio-Visual-Language Embodied Navigation in 3D Environments
- Nonparametric Score Estimators
- 3D MOrphable STyleGAN
- Instance Segmentation GAN
- Audio Visual Scene-Graph Segmentor
- Generalized One-class Discriminative Subspaces
- Goal directed RL with Safety Constraints
- Hierarchical Musical Instrument Separation
- Generating Visual Dynamics from Sound and Context
- Adversarially-Contrastive Optimal Transport
- Online Feature Extractor Network
- MotionNet
- FoldingNet++
- Quasi-Newton Trust Region Policy Optimization
- Landmarks’ Location, Uncertainty, and Visibility Likelihood
- Robust Iterative Data Estimation
- Gradient-based Nikaido-Isoda
- Discriminative Subspace Pooling