Machine Learning
Data-driven approaches to designing intelligent algorithms.
MERL has a long history of research activity in machine learning, including the development of various boosting algorithms and contributions to the theory and practice of highly scalable collaborative filtering. Our recent work has focused on deep learning and reinforcement learning, with applications in a wide range of domains, including automotive, robotics, factory automation, transportation, and building and home systems.
Researchers
Toshiaki Koike-Akino
Ye Wang
Jonathan Le Roux
Ankush Chakrabarty
Anoop Cherian
Gordon Wichern
Tim K. Marks
Philip V. Orlik
Michael J. Jones
Stefano Di Cairano
Kieran Parsons
Daniel N. Nikovski
Christopher R. Laughman
Devesh K. Jha
Pu (Perry) Wang
Diego Romeres
Chiori Hori
Bingnan Wang
Yebin Wang
Suhas Lohit
Hassan Mansour
Matthew Brand
Petros T. Boufounos
Arvind Raghunathan
Moitreya Chatterjee
Abraham P. Vinod
Jing Liu
Jianlin Guo
Siddarth Jain
Kuan-Chuan Peng
Scott A. Bortoff
Vedang M. Deshpande
Hongtao Qiao
William S. Yerazunis
Radu Corcodel
François Germain
Chungwei Lin
Pedro Miraldo
Saviz Mowlavi
Dehong Liu
Hongbo Sun
Wataru Tsujita
Sameer Khurana
James Queeney
Ryo Aihara
Yanting Ma
Joshua Rapp
Anthony Vetro
Jinyun Zhang
Jose Amaya
Purnanand Elango
Abraham Goldsmith
Alexander Schperberg
Avishai Weiss
Janek Ebbers
Awards
AWARD University of Padua and MERL team wins the AI Olympics with RealAIGym competition at IROS 2024
Date: October 17, 2024
Awarded to: Niccolò Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli, Diego Romeres
MERL Contact: Diego Romeres
Research Areas: Artificial Intelligence, Dynamical Systems, Machine Learning, Robotics
Brief
A team composed of the control group at the University of Padua and MERL's Optimization and Robotic team ranked 1st among the 4 finalist teams of the 2nd AI Olympics with RealAIGym competition at IROS 2024, which focused on the control of underactuated robots. The team members were Niccolò Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli, and Diego Romeres. The competition was organized by the German Research Center for Artificial Intelligence (DFKI), Technical University of Darmstadt, and Chalmers University of Technology.
The competition and award ceremony were hosted by the IEEE International Conference on Intelligent Robots and Systems (IROS) on October 17, 2024 in Abu Dhabi, UAE. Diego Romeres presented the team's method, which is based on a model-based reinforcement learning algorithm called MC-PILCO.
AWARD MERL team wins the Listener Acoustic Personalisation (LAP) 2024 Challenge
Date: August 29, 2024
Awarded to: Yoshiki Masuyama, Gordon Wichern, François G. Germain, Christopher Ick, and Jonathan Le Roux
MERL Contacts: François Germain; Jonathan Le Roux; Gordon Wichern; Yoshiki Masuyama
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief
MERL's Speech & Audio team ranked 1st out of 7 teams in Task 2 of the 1st SONICOM Listener Acoustic Personalisation (LAP) Challenge, which focused on "Spatial upsampling for obtaining a high-spatial-resolution HRTF from a very low number of directions". The team was led by Yoshiki Masuyama and also included Gordon Wichern, François Germain, MERL intern Christopher Ick, and Jonathan Le Roux.
The LAP Challenge workshop and award ceremony were hosted by the 32nd European Signal Processing Conference (EUSIPCO 24) on August 29, 2024 in Lyon, France. Yoshiki Masuyama presented the team's method, "Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization", and received the award from Prof. Michele Geronazzo (University of Padova, IT, and Imperial College London, UK), Chair of the Challenge's Organizing Committee.
The LAP Challenge explores open problems in personalized spatial audio; its first edition focused on the spatial upsampling and interpolation of head-related transfer functions (HRTFs). HRTFs on dense spatial grids are required for immersive audio experiences, but measuring them is time-consuming. Although HRTF spatial upsampling has recently shown remarkable progress with approaches involving neural fields, HRTF estimation accuracy remains limited when upsampling from only a few measured directions, e.g., 3 or 5 measurements. The MERL team tackled this problem by proposing a retrieval-augmented neural field (RANF). From a library of subjects, RANF retrieves a subject whose HRTFs are close to those of the target subject at the measured directions. The retrieved subject's HRTF at the target direction is then fed into the neural field together with the desired sound source direction. The team also developed a neural network architecture, inspired by a multi-channel processing technique called transform-average-concatenate, that can handle an arbitrary number of retrieved subjects.
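The retrieval and aggregation steps described above can be sketched as follows. This is a toy illustration in plain Python, not MERL's RANF implementation: the data layout, the mean-squared-difference retrieval metric, the function names (`retrieve_nearest_subjects`, `tac_aggregate`), and the fixed tanh transform standing in for a learned layer are all assumptions, and the neural field itself is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical data layout: a direction is an (azimuth, elevation) pair in
# degrees, and each subject maps directions to an HRTF magnitude vector.
Direction = Tuple[int, int]
HRTFSet = Dict[Direction, List[float]]


def hrtf_distance(a: HRTFSet, b: HRTFSet, directions: Sequence[Direction]) -> float:
    """Mean squared difference between two subjects' HRTFs, computed only
    at the given directions (an assumed metric, not necessarily RANF's)."""
    total, count = 0.0, 0
    for d in directions:
        for x, y in zip(a[d], b[d]):
            total += (x - y) ** 2
            count += 1
    return total / count


def retrieve_nearest_subjects(target: HRTFSet, library: Dict[str, HRTFSet],
                              k: int) -> List[str]:
    """IDs of the k library subjects closest to the target at the directions
    where the target subject was actually measured."""
    measured = list(target)
    ranked = sorted(library, key=lambda sid: hrtf_distance(target, library[sid], measured))
    return ranked[:k]


def tac_aggregate(channels: List[List[float]], scale: float = 0.5) -> List[List[float]]:
    """Transform-average-concatenate over any number of channels (here, one
    channel per retrieved subject). The elementwise tanh 'transform' is a
    hypothetical stand-in for a learned layer."""
    transformed = [[math.tanh(scale * v) for v in ch] for ch in channels]
    n = len(transformed)
    # Averaging makes the shared part independent of the number and order
    # of retrieved subjects.
    mean = [sum(vals) / n for vals in zip(*transformed)]
    # Each channel keeps its own features and is extended with the average.
    return [ch + mean for ch in transformed]


# Toy example: the target subject was measured at a single direction.
library = {
    "s1": {(0, 0): [1.0, 2.0], (180, 0): [0.3, 0.2]},
    "s2": {(0, 0): [5.0, 5.0], (180, 0): [3.0, 3.0]},
}
target = {(0, 0): [1.1, 2.1]}
nearest = retrieve_nearest_subjects(target, library, k=1)
# Retrieved HRTFs at the unmeasured direction (180, 0) would be extra inputs,
# alongside that direction, to the neural field (not shown).
features = tac_aggregate([library[sid][(180, 0)] for sid in nearest])
```

Because `tac_aggregate` averages across channels before concatenating, the same code accepts one, two, or any number of retrieved subjects, which is the property the text attributes to the transform-average-concatenate-inspired architecture.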
AWARD Jonathan Le Roux elevated to IEEE Fellow
Date: January 1, 2024
Awarded to: Jonathan Le Roux
MERL Contact: Jonathan Le Roux
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief
MERL Distinguished Scientist and Speech & Audio Senior Team Leader Jonathan Le Roux has been elevated to IEEE Fellow, effective January 2024, "for contributions to multi-source speech and audio processing."
Mitsubishi Electric celebrated Dr. Le Roux's elevation and that of another researcher from the company, Dr. Shumpei Kameyama, with a worldwide news release on February 15.
Dr. Jonathan Le Roux has made fundamental contributions to the field of multi-speaker speech processing, especially to the areas of speech separation and multi-speaker end-to-end automatic speech recognition (ASR). His contributions constituted a major advance in realizing a practically usable solution to the cocktail party problem, enabling machines to replicate humans’ ability to concentrate on a specific sound source, such as a certain speaker within a complex acoustic scene—a long-standing challenge in the speech signal processing community. Additionally, he has made key contributions to the measures used for training and evaluating audio source separation methods, developing several new objective functions to improve the training of deep neural networks for speech enhancement, and analyzing the impact of metrics used to evaluate the signal reconstruction quality. Dr. Le Roux’s technical contributions have been crucial in promoting the widespread adoption of multi-speaker separation and end-to-end ASR technologies across various applications, including smart speakers, teleconferencing systems, hearables, and mobile devices.
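One widely used example of the measures mentioned above for training and evaluating source separation is the scale-invariant signal-to-distortion ratio (SI-SDR). The sketch below computes it in plain Python under the usual zero-mean convention; it is an illustrative implementation, not code from MERL or from any specific toolkit. Its defining property is that rescaling the estimate does not change the score.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float]) -> float:
    """Scale-invariant signal-to-distortion ratio (SI-SDR) in dB."""
    if len(estimate) != len(reference) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    # Remove the mean so constant offsets do not affect the score.
    mean_est = sum(estimate) / len(estimate)
    mean_ref = sum(reference) / len(reference)
    est = [x - mean_est for x in estimate]
    ref = [x - mean_ref for x in reference]
    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0.0:
        raise ValueError("reference must not be constant")
    # Project the estimate onto the reference: alpha * ref is the part of the
    # estimate explained by the reference, whatever the estimate's gain.
    alpha = sum(e * r for e, r in zip(est, ref)) / ref_energy
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0.0:
        return math.inf  # perfect reconstruction up to a scale factor
    return 10.0 * math.log10(target_energy / residual_energy)


reference = [1.0, -1.0, 2.0, -2.0]
estimate = [1.1, -0.9, 1.9, -2.1]  # reference plus a small zero-mean error
print(round(si_sdr(estimate, reference), 2))  # prints 23.98
```

Multiplying `estimate` by any nonzero gain leaves the result unchanged, which is why this kind of measure is preferred when a separation system's output level is arbitrary.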
IEEE Fellow is the highest grade of membership of the IEEE. It honors members with an outstanding record of technical achievements, contributing importantly to the advancement or application of engineering, science and technology, and bringing significant value to society. Each year, following a rigorous evaluation procedure, the IEEE Fellow Committee recommends a select group of recipients for elevation to IEEE Fellow. Fewer than 0.1% of voting members are selected annually for this member grade elevation.
News & Events
TALK [MERL Seminar Series 2024] Samuel Clarke presents a talk titled "Audio for Object and Spatial Awareness"
Date & Time: Wednesday, October 30, 2024; 1:00 PM
Speaker: Samuel Clarke, Stanford University
MERL Host: Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Robotics, Speech & Audio
Abstract
Acoustic perception is invaluable to humans and robots in understanding objects and events in their environments. The sounds they hear depend on properties of the source, the environment, and the receiver. Many humans possess remarkable intuition, both for inferring key properties of each of these three aspects from a sound and for forming expectations of how these aspects would affect the sound they hear. In order to equip robots and AI agents with similar, if not stronger, capabilities, our research has taken a two-fold path. First, we collect high-fidelity datasets in both controlled and uncontrolled environments which capture real sounds of objects and rooms. Second, we introduce differentiable physics-based models that can estimate acoustic properties of objects and rooms from minimal amounts of real audio data, and then predict new sounds from these objects and rooms under novel, “unseen” conditions.
TALK [MERL Seminar Series 2024] Tom Griffiths presents a talk titled "Tools from cognitive science to understand the behavior of large language models"
Date & Time: Wednesday, September 18, 2024; 1:00 PM
Speaker: Tom Griffiths, Princeton University
Research Areas: Artificial Intelligence, Data Analytics, Machine Learning, Human-Computer Interaction
Abstract
Large language models have been found to have surprising capabilities, even what have been called “sparks of artificial general intelligence.” However, understanding these models involves some significant challenges: their internal structure is extremely complicated, their training data is often opaque, and getting access to the underlying mechanisms is becoming increasingly difficult. As a consequence, researchers often have to resort to studying these systems based on their behavior. This situation is, of course, one that cognitive scientists are very familiar with: human brains are complicated systems trained on opaque data and typically difficult to study mechanistically. In this talk I will summarize some of the tools of cognitive science that are useful for understanding the behavior of large language models. Specifically, I will talk about how thinking about different levels of analysis (and Bayesian inference) can help us understand some behaviors that don’t seem particularly intelligent, how tasks like similarity judgment can be used to probe internal representations, how axiom violations can reveal interesting mechanisms, and how associations can reveal biases in systems that have been trained to be unbiased.
Research Highlights
PS-NeuS: A Probability-guided Sampler for Neural Implicit Surface Rendering
Quantum AI Technology
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-Aware Spatio-Temporal Sampling
Steered Diffusion
Sustainable AI
Edge-Assisted Internet of Vehicles for Smart Mobility
Robust Machine Learning
mmWave Beam-SNR Fingerprinting (mmBSF)
Video Anomaly Detection
Biosignal Processing for Human-Machine Interaction
MERL Shopping Dataset
Internships
MS0098: Internship - Control and Estimation for Large-Scale Thermofluid Systems
MERL is seeking a motivated graduate student to research methods for state and parameter estimation and optimization of large-scale systems for process applications. Representative applications include large vapor-compression cycles and other multiphysical systems for energy conversion that couple thermodynamic, fluid, and electrical domains. The ideal candidate would have a solid background in control and estimation, numerical methods, and optimization; strong programming skills and experience with Julia, Python, or MATLAB are also expected. Knowledge of the fundamental physics of thermofluid flows (e.g., thermodynamics, heat transfer, and fluid mechanics), nonlinear dynamics, or equation-oriented languages (Modelica, gPROMS) is a plus. The expected duration of this internship is 3 months.
OR0115: Internship - Whole-body dexterous manipulation
MERL is looking for a highly motivated individual to work on whole-body dexterous manipulation. The research will develop robot motor skills for whole-body dexterous manipulation using optimization and/or learning algorithms. The ideal candidate should have experience in one or more of the following topics: optimization algorithms for contact systems, reinforcement learning, control through contact, and behavioral cloning. Senior PhD students in robotics and engineering with a focus on contact-rich manipulation are encouraged to apply. Prior experience working with physical robotic systems (and vision and tactile sensors) is required, as results need to be implemented on physical hardware. Good coding skills with Python ML libraries such as PyTorch and/or relevant optimization packages are required. A successful internship will result in the submission of results to a peer-reviewed robotics journal in collaboration with MERL researchers. The expected duration of the internship is 4-5 months, with a start date in May/June 2025. Onsite work at MERL is preferred for this internship.
Required Specific Experience
- Prior experience working with physical hardware systems is required.
- Prior publication experience in robotics venues such as ICRA, RSS, or CoRL.
CV0100: Internship - Simulation for Human-Robot Interaction
MERL is looking for a self-motivated intern to develop a simulation platform to train vision-and-language models for dynamic human-robot interaction. The ideal intern must have a strong background in computer graphics, computer vision, and machine learning, as well as experience in using the latest graphics simulation toolboxes and physics engines. Working knowledge of recent multimodal generative AI methods is desired. The intern is expected to collaborate with researchers in the computer vision team at MERL to develop algorithms and prepare manuscripts for scientific publications.
Required Specific Experience
- Experience in designing novel realistic 3D interactive scenes for robot learning
- Experience with extending vision-based embodied AI simulators
- Strong foundations in machine learning and programming
- Foundations in optimization, specifically scheduling algorithms, would be a strong plus.
- Strong track record of publications in top-tier computer vision and machine learning venues (such as CVPR, NeurIPS, etc.)
- Must be enrolled in a graduate program, ideally towards a Ph.D.
Openings
CA0093: Research Scientist - Control for Autonomous Systems
EA0042: Research Scientist - Control & Learning
Recent Publications
- "Decentralized, Safe, Multi-agent Motion Planning for Drones Under Uncertainty via Filtered Reinforcement Learning", IEEE Transactions on Control Systems Technology, DOI: 10.1109/TCST.2024.3433229, Vol. 32, No. 6, pp. 2492-2499, January 2025 (TR2024-136).
  @article{Vinod2025jan,
    author = {Vinod, Abraham P. and Safaoui, Sleiman and Summers, Tyler and Yoshikawa, Nobuyuki and Di Cairano, Stefano},
    title = {Decentralized, Safe, Multi-agent Motion Planning for Drones Under Uncertainty via Filtered Reinforcement Learning},
    journal = {IEEE Transactions on Control Systems Technology},
    year = 2025,
    volume = 32,
    number = 6,
    pages = {2492--2499},
    month = jan,
    doi = {10.1109/TCST.2024.3433229},
    url = {https://www.merl.com/publications/TR2024-136}
  }
- "Slaying the HyDRA: Parameter-Efficient Hyper Networks with Low-Displacement Rank Adaptation", Advances in Neural Information Processing Systems (NeurIPS), December 2024 (TR2024-157).
  @inproceedings{Chen2024dec,
    author = {Chen, Xiangyu and Wang, Ye and Brand, Matthew and Wang, Pu and Liu, Jing and Koike-Akino, Toshiaki},
    title = {Slaying the HyDRA: Parameter-Efficient Hyper Networks with Low-Displacement Rank Adaptation},
    booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
    year = 2024,
    month = dec,
    url = {https://www.merl.com/publications/TR2024-157}
  }
- "SuperLoRA: Parameter-Efficient Unified Adaptation of Large Foundation Models", British Machine Vision Conference (BMVC), November 2024 (TR2024-156).
  @inproceedings{Chen2024nov,
    author = {Chen, Xiangyu and Liu, Jing and Wang, Ye and Wang, Pu and Brand, Matthew and Wang, Guanghui and Koike-Akino, Toshiaki},
    title = {SuperLoRA: Parameter-Efficient Unified Adaptation of Large Foundation Models},
    booktitle = {British Machine Vision Conference (BMVC)},
    year = 2024,
    month = nov,
    url = {https://www.merl.com/publications/TR2024-156}
  }
- "RETR: Multi-View Radar Detection Transformer for Indoor Perception", Advances in Neural Information Processing Systems (NeurIPS), November 2024 (TR2024-159).
  @inproceedings{Yataka2024nov3,
    author = {Yataka, Ryoma and Cardace, Adriano and Wang, Pu and Boufounos, Petros T. and Takahashi, Ryuhei},
    title = {RETR: Multi-View Radar Detection Transformer for Indoor Perception},
    booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
    year = 2024,
    month = nov,
    url = {https://www.merl.com/publications/TR2024-159}
  }
- "Single-pixel imaging of spatio-temporal flows using differentiable latent dynamics", IEEE Transactions on Computational Imaging, October 2024 (TR2024-151).
  @article{Sholokhov2024oct,
    author = {Sholokhov, Aleksei and Nabi, Saleh and Rapp, Joshua and Brunton, Steven and Kutz, Nathan and Boufounos, Petros T. and Mansour, Hassan},
    title = {Single-pixel imaging of spatio-temporal flows using differentiable latent dynamics},
    journal = {IEEE Transactions on Computational Imaging},
    year = 2024,
    month = oct,
    url = {https://www.merl.com/publications/TR2024-151}
  }
- "AI-assisted Field Plate Design of GaN HEMT Device", Advanced Theory and Simulation, October 2024 (TR2024-152).
  @article{Xiang2024oct,
    author = {Xiang, Xiaofeng and Palash, Rafid and Yagyu, Eiji and Dunham, Scott and Teo, Koon Hoo and Chowdhury, Nadim},
    title = {AI-assisted Field Plate Design of GaN HEMT Device},
    journal = {Advanced Theory and Simulation},
    year = 2024,
    month = oct,
    url = {https://www.merl.com/publications/TR2024-152}
  }
- "Learning control of underactuated double pendulum with Model-Based Reinforcement Learning", Competition: AI Olympics With RealAIGym, October 2024 (TR2024-142).
  @inproceedings{DallaLibera2024oct,
    author = {Dalla Libera, Alberto and Turcato, Niccolò and Giacomuzzo, Giulio and Carli, Ruggero and Romeres, Diego},
    title = {Learning control of underactuated double pendulum with Model-Based Reinforcement Learning},
    booktitle = {Competition: AI Olympics With RealAIGym},
    year = 2024,
    month = oct,
    url = {https://www.merl.com/publications/TR2024-142}
  }
- "Analyzing Inference Privacy Risks Through Gradients In Machine Learning", ACM Conference on Computer and Communications Security (CCS), October 2024 (TR2024-141).
  @inproceedings{Li2024oct,
    author = {Li, Zhuohang and Lowy, Andrew and Liu, Jing and Koike-Akino, Toshiaki and Parsons, Kieran and Malin, Bradley and Wang, Ye},
    title = {Analyzing Inference Privacy Risks Through Gradients In Machine Learning},
    booktitle = {ACM Conference on Computer and Communications Security (CCS)},
    year = 2024,
    month = oct,
    url = {https://www.merl.com/publications/TR2024-141}
  }
Software & Data Downloads
DeepBornFNO
eeg-subject-transfer
ComplexVAD Dataset
Millimeter-wave Multi-View Radar Dataset
Gear Extensions of Neural Radiance Fields
Long-Tailed Anomaly Detection (LTAD) Dataset
Target-Speaker SEParation
Pixel-Grounded Prototypical Part Networks
Steered Diffusion
BAyesian Network for adaptive SAmple Consensus
Simple Multimodal Algorithmic Reasoning Task Dataset
Partial Group Convolutional Neural Networks
SOurce-free Cross-modal KnowledgE Transfer
Audio-Visual-Language Embodied Navigation in 3D Environments
Nonparametric Score Estimators
3D MOrphable STyleGAN
Instance Segmentation GAN
Audio Visual Scene-Graph Segmentor
Generalized One-class Discriminative Subspaces
Hierarchical Musical Instrument Separation
Generating Visual Dynamics from Sound and Context
Adversarially-Contrastive Optimal Transport
Online Feature Extractor Network
MotionNet
FoldingNet++
Quasi-Newton Trust Region Policy Optimization
Landmarks’ Location, Uncertainty, and Visibility Likelihood
Robust Iterative Data Estimation
Gradient-based Nikaido-Isoda
Circular Maze Environment
Discriminative Subspace Pooling
Kernel Correlation Network
Fast Resampling on Point Clouds via Graphs
FoldingNet
Deep Category-Aware Semantic Edge Detection
MERL Shopping Dataset