Machine Learning
Data-driven approaches to design intelligent algorithms.
MERL has a long history of research activity in machine learning, including the development of various boosting algorithms and contributions to the theory and practice of highly scalable collaborative filtering. Our recent work has focused on deep learning and reinforcement learning, with applications in a wide range of areas including automotive, robotics, factory automation, transportation, and building and home systems.
Researchers
Toshiaki Koike-Akino
Jonathan Le Roux
Ye Wang
Ankush Chakrabarty
Anoop Cherian
Gordon Wichern
Philip V. Orlik
Michael J. Jones
Tim K. Marks
Daniel N. Nikovski
Kieran Parsons
Devesh K. Jha
Stefano Di Cairano
Diego Romeres
Chiori Hori
Christopher R. Laughman
Pu (Perry) Wang
Karl Berntorp
Yebin Wang
Bingnan Wang
Mouhacine Benosman
Suhas Lohit
Hassan Mansour
Matthew Brand
Petros T. Boufounos
Arvind Raghunathan
Moitreya Chatterjee
Jianlin Guo
Siddarth Jain
Kuan-Chuan Peng
Abraham P. Vinod
William S. Yerazunis
Scott A. Bortoff
Radu Corcodel
Vedang M. Deshpande
François Germain
Chungwei Lin
Dehong Liu
Saviz Mowlavi
Hongtao Qiao
Hongbo Sun
Wataru Tsujita
Sameer Khurana
Jing Liu
Pedro Miraldo
Koon Hoo Teo
Anthony Vetro
Ryoma Yataka
Jinyun Zhang
Jose Amaya
Abraham Goldsmith
Yanting Ma
James Queeney
Joshua Rapp
Avishai Weiss
Janek Ebbers
Ryo Hase
Zexu Pan
Shinya Tsuruta
-
Awards
-
AWARD Jonathan Le Roux elevated to IEEE Fellow
Date: January 1, 2024
Awarded to: Jonathan Le Roux
MERL Contact: Jonathan Le Roux
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL Distinguished Scientist and Speech & Audio Senior Team Leader Jonathan Le Roux has been elevated to IEEE Fellow, effective January 2024, "for contributions to multi-source speech and audio processing."
Mitsubishi Electric celebrated Dr. Le Roux's elevation and that of another researcher from the company, Dr. Shumpei Kameyama, with a worldwide news release on February 15.
Dr. Jonathan Le Roux has made fundamental contributions to the field of multi-speaker speech processing, especially to the areas of speech separation and multi-speaker end-to-end automatic speech recognition (ASR). His contributions constituted a major advance in realizing a practically usable solution to the cocktail party problem, enabling machines to replicate humans’ ability to concentrate on a specific sound source, such as a certain speaker within a complex acoustic scene—a long-standing challenge in the speech signal processing community. Additionally, he has made key contributions to the measures used for training and evaluating audio source separation methods, developing several new objective functions to improve the training of deep neural networks for speech enhancement, and analyzing the impact of metrics used to evaluate the signal reconstruction quality. Dr. Le Roux’s technical contributions have been crucial in promoting the widespread adoption of multi-speaker separation and end-to-end ASR technologies across various applications, including smart speakers, teleconferencing systems, hearables, and mobile devices.
IEEE Fellow is the highest grade of membership of the IEEE. It honors members with an outstanding record of technical achievements, contributing importantly to the advancement or application of engineering, science and technology, and bringing significant value to society. Each year, following a rigorous evaluation procedure, the IEEE Fellow Committee recommends a select group of recipients for elevation to IEEE Fellow. Less than 0.1% of voting members are selected annually for this member grade elevation.
-
AWARD Honorable Mention Award at NeurIPS 23 Instruction Workshop
Date: December 15, 2023
Awarded to: Lingfeng Sun, Devesh K. Jha, Chiori Hori, Siddarth Jain, Radu Corcodel, Xinghao Zhu, Masayoshi Tomizuka and Diego Romeres
MERL Contacts: Radu Corcodel; Chiori Hori; Siddarth Jain; Devesh K. Jha; Diego Romeres
Research Areas: Artificial Intelligence, Machine Learning, Robotics
Brief: MERL researchers received an "Honorable Mention" award at the Workshop on Instruction Tuning and Instruction Following at the NeurIPS 2023 conference in New Orleans. The workshop focused on instruction tuning and instruction following for Large Language Models (LLMs). MERL researchers presented their work on interactive planning using LLMs for partially observable robotic tasks during the oral presentation session at the workshop.
-
AWARD MERL team wins the Audio-Visual Speech Enhancement (AVSE) 2023 Challenge
Date: December 16, 2023
Awarded to: Zexu Pan, Gordon Wichern, Yoshiki Masuyama, François Germain, Sameer Khurana, Chiori Hori, and Jonathan Le Roux
MERL Contacts: François Germain; Chiori Hori; Sameer Khurana; Jonathan Le Roux; Zexu Pan; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL's Speech & Audio team ranked 1st out of 12 teams in the 2nd COG-MHEAR Audio-Visual Speech Enhancement Challenge (AVSE). The team was led by Zexu Pan, and also included Gordon Wichern, Yoshiki Masuyama, François Germain, Sameer Khurana, Chiori Hori, and Jonathan Le Roux.
The AVSE challenge aims to design better speech enhancement systems by harnessing the visual aspects of speech (such as lip movements and gestures) in a manner similar to the brain’s multi-modal integration strategies. MERL’s system was a scenario-aware audio-visual TF-GridNet that incorporates the face recording of a target speaker as a conditioning factor and also recognizes whether the predominant interference signal is speech or background noise. In addition to outperforming all competing systems in terms of objective metrics by a wide margin, MERL’s model achieved the best overall word intelligibility score in a listening test: 84.54%, compared to 57.56% for the baseline and 80.41% for the next best team. The Fisher’s least significant difference (LSD) was 2.14%, indicating that our model offered statistically significant speech intelligibility improvements compared to all other systems.
-
News & Events
-
NEWS Diego Romeres gave an invited talk in the University of Padua's seminar series on "AI in Action"
Date: April 9, 2024
MERL Contact: Diego Romeres
Research Areas: Artificial Intelligence, Dynamical Systems, Machine Learning, Optimization, Robotics
Brief: Diego Romeres, Principal Research Scientist and Team Leader in the Optimization and Robotics Team, was invited to speak as a guest lecturer in the seminar series on "AI in Action" in the Department of Management and Engineering at the University of Padua.
The talk, entitled "Machine Learning for Robotics and Automation", described MERL's recent research on machine learning and model-based reinforcement learning applied to robotics and automation.
-
NEWS Saviz Mowlavi gave an invited talk at North Carolina State University
Date: April 12, 2024
MERL Contact: Saviz Mowlavi
Research Areas: Control, Dynamical Systems, Machine Learning, Optimization
Brief: Saviz Mowlavi was invited to present remotely at the Computational and Applied Mathematics seminar series in the Department of Mathematics at North Carolina State University.
The talk, entitled "Model-based and data-driven prediction and control of spatio-temporal systems", described the use of temporal smoothness to regularize the training of fast surrogate models for PDEs, user-friendly methods for PDE-constrained optimization, and efficient strategies for learning feedback controllers for PDEs.
-
Research Highlights
-
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Steered Diffusion
Edge-Assisted Internet of Vehicles for Smart Mobility
Robust Machine Learning
mmWave Beam-SNR Fingerprinting (mmBSF)
Video Anomaly Detection
Biosignal Processing for Human-Machine Interaction
MERL Shopping Dataset
-
-
Internships
-
OR2103: Human Robot Collaboration in Assembly Tasks
MERL is looking for a self-motivated and qualified candidate to work on human-robot interaction for collaborative manipulation and assembly scenarios. The ideal candidate is a PhD student with experience and a track record in one or more of the following areas: 1) control, estimation, and perception for robotic manipulation; 2) task and motion planning; 3) learning-from-demonstration algorithms applied to robotic manipulation; 4) machine learning techniques for modeling and control, as well as for regression and classification problems; 5) working with robotic systems and physics engine simulators such as MuJoCo, Isaac Gym, or PyBullet. The successful candidate will be expected to develop, in collaboration with MERL employees, state-of-the-art algorithms to solve complex manipulation tasks that involve human-robot collaboration. Proficiency in Python and ROS is required. The expectation is that the research will lead to one or more scientific publications. The expected duration is 3-4 months, with a flexible starting date.
-
SA2073: Multimodal scene-understanding
We are looking for a graduate student interested in helping advance the field of multimodal scene understanding, with a focus on scene understanding using natural language for robot dialog and/or indoor monitoring using a large language model. The intern will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in deep learning for audio-visual, signal, and natural language processing. Good programming skills in Python and knowledge of deep learning frameworks such as PyTorch are essential. Multiple positions are available with a flexible start date (not just Spring/Summer but throughout 2024) and duration (typically 3-6 months).
-
CV2119: Conditional Video Generation
We seek a highly motivated intern to conduct original research in generative models for conditional video generation. We are interested in applications to various tasks such as video generation from text, images, and diagrams. The successful candidate will collaborate with MERL researchers to design and implement new models, conduct experiments, and prepare results for publication. The candidate should be a PhD student (or postdoc) in computer vision and machine learning with a strong publication record including at least one paper in a top-tier computer vision or machine learning venue such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, AAAI, or TPAMI. Strong programming skills, experience developing and implementing new models in deep learning platforms such as PyTorch, and broad knowledge of machine learning and deep learning methods are expected, including experience in the latest advances in conditional video generation. Start date is flexible; duration should be at least 3 months.
-
Openings
-
OR2137: Research Scientist - Optimization & Intelligent Robotics
-
EA2051: Research Scientist - Electric Systems Automation
-
Recent Publications
- "Long-Tailed Anomaly Detection with Learnable Class Names", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2024. TR2024-040
  @inproceedings{Ho2024jun,
    author = {Ho, Chih-Hui and Peng, Kuan-Chuan and Vasconcelos, Nuno},
    title = {Long-Tailed Anomaly Detection with Learnable Class Names},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = 2024,
    month = jun,
    url = {https://www.merl.com/publications/TR2024-040}
  }
- "SIRA: Scalable Inter-frame Relation and Association for Radar Perception", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2024. TR2024-041
  @inproceedings{Yataka2024jun,
    author = {Yataka, Ryoma and Wang, Pu and Boufounos, Petros T. and Takahashi, Ryuhei},
    title = {SIRA: Scalable Inter-frame Relation and Association for Radar Perception},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = 2024,
    month = jun,
    url = {https://www.merl.com/publications/TR2024-041}
  }
- "Optimal Transport Perturbations for Safe Reinforcement Learning with Robustness Guarantees", Transactions on Machine Learning Research (TMLR), April 2024. TR2024-037
  @article{Queeney2024apr,
    author = {Queeney, James and Ozcan, Erhan Can and Paschalidis, Ioannis Ch. and Cassandras, Christos G.},
    title = {Optimal Transport Perturbations for Safe Reinforcement Learning with Robustness Guarantees},
    journal = {Transactions on Machine Learning Research (TMLR)},
    year = 2024,
    month = apr,
    issn = {2835-8856},
    url = {https://www.merl.com/publications/TR2024-037}
  }
- "LMI-Based Neural Observer for State and Nonlinear Function Estimation", International Journal of Robust and Nonlinear Control, DOI: 10.1002/rnc.7327, April 2024. TR2024-036
  @article{Jeon2024apr,
    author = {Jeon, Woongsun and Chakrabarty, Ankush and Zemouche, Ali and Rajamani, Rajesh},
    title = {LMI-Based Neural Observer for State and Nonlinear Function Estimation},
    journal = {International Journal of Robust and Nonlinear Control},
    year = 2024,
    month = apr,
    doi = {10.1002/rnc.7327},
    url = {https://www.merl.com/publications/TR2024-036}
  }
- "Understanding and Controlling Generative Music Transformers by Probing Individual Attention Heads", IEEE ICASSP Satellite Workshop on Explainable Machine Learning for Speech and Audio (XAI-SA), April 2024. TR2024-032
  @inproceedings{Koo2024apr,
    author = {Koo, Junghyun and Wichern, Gordon and Germain, François G and Khurana, Sameer and Le Roux, Jonathan},
    title = {Understanding and Controlling Generative Music Transformers by Probing Individual Attention Heads},
    booktitle = {IEEE ICASSP Satellite Workshop on Explainable Machine Learning for Speech and Audio (XAI-SA)},
    year = 2024,
    month = apr,
    url = {https://www.merl.com/publications/TR2024-032}
  }
- "Physics-informed shape optimization using coordinate projection", Scientific Reports, DOI: 10.1038/s41598-024-57137-4, Vol. 14, pp. 6537, April 2024. TR2024-035
  @article{Zhang2024apr,
    author = {Zhang, Zhizhou and Lin, Chungwei and Wang, Bingnan},
    title = {Physics-informed shape optimization using coordinate projection},
    journal = {Scientific Reports},
    year = 2024,
    volume = 14,
    pages = 6537,
    month = apr,
    doi = {10.1038/s41598-024-57137-4},
    url = {https://www.merl.com/publications/TR2024-035}
  }
- "Multi-level Reasoning for Robotic Assembly: From Sequence Inference to Contact Selection", IEEE International Conference on Robotics and Automation (ICRA), March 2024. TR2024-033
  @inproceedings{Zhu2024mar,
    author = {Zhu, Xinghao and Jha, Devesh K. and Romeres, Diego and Sun, Lingfeng and Tomizuka, Masayoshi and Cherian, Anoop},
    title = {Multi-level Reasoning for Robotic Assembly: From Sequence Inference to Contact Selection},
    booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
    year = 2024,
    month = mar,
    url = {https://www.merl.com/publications/TR2024-033}
  }
- "Oriented-grid Encoder for 3D Implicit Representations", International Conference on 3D Vision (3DV), March 2024. TR2024-031
  @inproceedings{Gaur2024mar,
    author = {Gaur, Arihant and Pais, Goncalo and Miraldo, Pedro},
    title = {Oriented-grid Encoder for 3D Implicit Representations},
    booktitle = {International Conference on 3D Vision (3DV)},
    year = 2024,
    month = mar,
    url = {https://www.merl.com/publications/TR2024-031}
  }
-
Software & Data Downloads
-
Long-Tailed Anomaly Detection (LTAD) Dataset
Pixel-Grounded Prototypical Part Networks
DeepBornFNO
BAyesian Network for adaptive SAmple Consensus
Simple Multimodal Algorithmic Reasoning Task Dataset
SOurce-free Cross-modal KnowledgE Transfer
Audio-Visual-Language Embodied Navigation in 3D Environments
Nonparametric Score Estimators
3D MOrphable STyleGAN
Instance Segmentation GAN
Audio Visual Scene-Graph Segmentor
Generalized One-class Discriminative Subspaces
Hierarchical Musical Instrument Separation
Generating Visual Dynamics from Sound and Context
Adversarially-Contrastive Optimal Transport
Online Feature Extractor Network
MotionNet
FoldingNet++
Quasi-Newton Trust Region Policy Optimization
Landmarks’ Location, Uncertainty, and Visibility Likelihood
Robust Iterative Data Estimation
Gradient-based Nikaido-Isoda
Circular Maze Environment
Discriminative Subspace Pooling
Kernel Correlation Network
Fast Resampling on Point Clouds via Graphs
FoldingNet
Deep Category-Aware Semantic Edge Detection
MERL Shopping Dataset
Partial Group Convolutional Neural Networks
-