Machine Learning
Data-driven approaches to designing intelligent algorithms.
MERL has a long history of research activity in machine learning, including the development of various boosting algorithms and contributions to the theory and practice of highly scalable collaborative filtering. Our recent work has focused on deep learning and reinforcement learning, with applications across a wide range of domains, including automotive, robotics, factory automation, transportation, and building and home systems.
Researchers
Toshiaki Koike-Akino
Jonathan Le Roux
Ye Wang
Ankush Chakrabarty
Anoop Cherian
Gordon Wichern
Philip V. Orlik
Michael J. Jones
Tim K. Marks
Daniel N. Nikovski
Kieran Parsons
Stefano Di Cairano
Devesh K. Jha
Chiori Hori
Christopher R. Laughman
Diego Romeres
Karl Berntorp
Pu (Perry) Wang
Yebin Wang
Mouhacine Benosman
Bingnan Wang
Suhas Lohit
Hassan Mansour
Matthew Brand
Arvind Raghunathan
Petros T. Boufounos
Moitreya Chatterjee
Jianlin Guo
Siddarth Jain
Kuan-Chuan Peng
Abraham P. Vinod
William S. Yerazunis
Scott A. Bortoff
Radu Corcodel
Vedang M. Deshpande
Dehong Liu
Saviz Mowlavi
Hongtao Qiao
Hongbo Sun
Wataru Tsujita
Chungwei Lin
Jing Liu
François Germain
Sameer Khurana
Koon Hoo Teo
Anthony Vetro
Jinyun Zhang
Jose Amaya
Abraham Goldsmith
Yanting Ma
Pedro Miraldo
Avishai Weiss
Janek Ebbers
Ryo Hase
Zexu Pan
Shinya Tsuruta
Ryoma Yataka
-
Awards
-
AWARD Jonathan Le Roux elevated to IEEE Fellow
Date: January 1, 2024
Awarded to: Jonathan Le Roux
MERL Contact: Jonathan Le Roux
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL Distinguished Scientist and Speech & Audio Senior Team Leader Jonathan Le Roux has been elevated to IEEE Fellow, effective January 2024, "for contributions to multi-source speech and audio processing."
Mitsubishi Electric celebrated Dr. Le Roux's elevation and that of another researcher from the company, Dr. Shumpei Kameyama, with a worldwide news release on February 15.
Dr. Jonathan Le Roux has made fundamental contributions to the field of multi-speaker speech processing, especially to the areas of speech separation and multi-speaker end-to-end automatic speech recognition (ASR). His contributions constituted a major advance in realizing a practically usable solution to the cocktail party problem, enabling machines to replicate humans’ ability to concentrate on a specific sound source, such as a certain speaker within a complex acoustic scene—a long-standing challenge in the speech signal processing community. Additionally, he has made key contributions to the measures used for training and evaluating audio source separation methods, developing several new objective functions to improve the training of deep neural networks for speech enhancement, and analyzing the impact of metrics used to evaluate the signal reconstruction quality. Dr. Le Roux’s technical contributions have been crucial in promoting the widespread adoption of multi-speaker separation and end-to-end ASR technologies across various applications, including smart speakers, teleconferencing systems, hearables, and mobile devices.
IEEE Fellow is the highest grade of membership of the IEEE. It honors members with an outstanding record of technical achievements, contributing importantly to the advancement or application of engineering, science and technology, and bringing significant value to society. Each year, following a rigorous evaluation procedure, the IEEE Fellow Committee recommends a select group of recipients for elevation to IEEE Fellow. Less than 0.1% of voting members are selected annually for this member grade elevation.
-
AWARD Honorable Mention Award at NeurIPS 23 Instruction Workshop
Date: December 15, 2023
Awarded to: Lingfeng Sun, Devesh K. Jha, Chiori Hori, Siddarth Jain, Radu Corcodel, Xinghao Zhu, Masayoshi Tomizuka, and Diego Romeres
MERL Contacts: Radu Corcodel; Chiori Hori; Siddarth Jain; Devesh K. Jha; Diego Romeres
Research Areas: Artificial Intelligence, Machine Learning, Robotics
Brief: MERL researchers received an Honorable Mention award at the Workshop on Instruction Tuning and Instruction Following at the NeurIPS 2023 conference in New Orleans. The workshop covered instruction tuning and instruction following for Large Language Models (LLMs). MERL researchers presented their work on interactive planning using LLMs for partially observable robotic tasks during the workshop's oral presentation session.
-
AWARD MERL team wins the Audio-Visual Speech Enhancement (AVSE) 2023 Challenge
Date: December 16, 2023
Awarded to: Zexu Pan, Gordon Wichern, Yoshiki Masuyama, François Germain, Sameer Khurana, Chiori Hori, and Jonathan Le Roux
MERL Contacts: François Germain; Chiori Hori; Sameer Khurana; Jonathan Le Roux; Zexu Pan; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL's Speech & Audio team ranked 1st out of 12 teams in the 2nd COG-MHEAR Audio-Visual Speech Enhancement Challenge (AVSE). The team was led by Zexu Pan and also included Gordon Wichern, Yoshiki Masuyama, François Germain, Sameer Khurana, Chiori Hori, and Jonathan Le Roux.
The AVSE challenge aims to design better speech enhancement systems by harnessing the visual aspects of speech (such as lip movements and gestures) in a manner similar to the brain’s multi-modal integration strategies. MERL’s system was a scenario-aware audio-visual TF-GridNet that incorporates the face recording of a target speaker as a conditioning factor and also recognizes whether the predominant interference signal is speech or background noise. In addition to outperforming all competing systems by a wide margin on objective metrics, MERL’s model achieved the best overall word intelligibility score in a listening test: 84.54%, compared to 57.56% for the baseline and 80.41% for the next best team. Fisher’s least significant difference (LSD) was 2.14%, which is smaller than the 4.13-point gap to the next best team, indicating that our model offered statistically significant speech intelligibility improvements compared to all other systems.
-
News & Events
-
TALK [MERL Seminar Series 2024] Melanie Mitchell presents talk titled "The Debate Over 'Understanding' in AI's Large Language Models"
Date & Time: Tuesday, February 13, 2024; 1:00 PM
Speaker: Melanie Mitchell, Santa Fe Institute
MERL Host: Suhas Lohit
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Human-Computer Interaction
Abstract: I will survey a current, heated debate in the AI research community on whether large pre-trained language models can be said to "understand" language -- and the physical and social situations language encodes -- in any important sense. I will describe arguments that have been made for and against such understanding, and, more generally, will discuss what methods can be used to fairly evaluate understanding and intelligence in AI systems. I will conclude with key questions for the broader sciences of intelligence that have arisen in light of these discussions.
-
TALK [MERL Seminar Series 2024] Greta Tuckute presents talk titled "Computational models of human auditory and language processing"
Date & Time: Wednesday, January 31, 2024; 12:00 PM
Speaker: Greta Tuckute, MIT
MERL Host: Sameer Khurana
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Abstract: Advances in machine learning have led to powerful models for audio and language, proficient in tasks like speech recognition and fluent language generation. Beyond their immense utility in engineering applications, these models offer valuable tools for cognitive science and neuroscience. In this talk, I will demonstrate how these artificial neural network models can be used to understand how the human brain processes language. The first part of the talk will cover how audio neural networks serve as computational accounts for brain activity in the auditory cortex. The second part will focus on the use of large language models, such as those in the GPT family, to non-invasively control brain activity in the human language system.
-
Internships
-
CA2132: Optimization Algorithms for Motion Planning and Predictive Control
MERL is looking for a highly motivated and qualified individual to work on tailored computational algorithms for optimization-based motion planning and predictive control applications in autonomous systems (vehicles, mobile robots). The ideal candidate should have experience in one or more of the following topics: convex and non-convex optimization, stochastic predictive control (e.g., scenario trees), interaction-aware motion planning, machine learning, learning-based model predictive control, mathematical programs with complementarity constraints (MPCCs), optimal control, and real-time optimization. PhD students in engineering or mathematics, especially those whose research relates to any of the above topics, are encouraged to apply. Publication of relevant results in conference proceedings or journals is expected. The ability to implement the designs and algorithms in MATLAB/Python is required; coding parts of the algorithms in C/C++ is a plus. The expected duration of the internship is 3 months, and the start date is flexible.
-
OR2116: Collaborative robotic manipulation
MERL is offering a new research internship opportunity in the field of robotic manipulation. The position requires a robotics background, excellent programming skills, and experience with deep RL and computer vision. The position is open only to graduate students on a PhD track, and the length of the internship is three months, with the possibility of an extension if required. The intern is expected to disseminate this research at top-tier scientific conferences such as RSS, IROS, and ICRA and, if applicable, help with filing associated patents. Start and end dates are flexible.
-
SA2114: Multilayer broadband metalenses
MERL is seeking a talented researcher to collaborate in the development of design algorithms for metalenses that are freeform, multilayer, and broadband. The ideal applicant will have a strong background in the relevant physics and mathematics and some fluency with the topology optimization and EM simulation tools commonly used in metasurface optics. Also desirable: familiarity with machine learning / AI tools and methods.
-
Openings
-
OR2137: Research Scientist - Optimization & Intelligent Robotics
-
EA2051: Research Scientist - Electric Systems Automation
-
Recent Publications
- "Object Trajectory Estimation with Multi-Band Wi-Fi Neural Dynamic Fusion", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2024.
  @inproceedings{Kato2024mar,
    author = {Kato, Sorachi and Wang, Pu and Koike-Akino, Toshiaki and Fujihashi, Takuya and Mansour, Hassan and Boufounos, Petros T.},
    title = {Object Trajectory Estimation with Multi-Band Wi-Fi Neural Dynamic Fusion},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2024,
    month = mar,
    url = {https://www.merl.com/publications/TR2024-019}
  }
- "Wi-Fi based Indoor Monitoring Enhanced by Multimodal Fusion", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2024.
  @inproceedings{Hori2024mar,
    author = {Hori, Chiori and Wang, Pu and Rahman, Mahbub and Vaca-Rubio, Cristian and Khurana, Sameer and Cherian, Anoop and Le Roux, Jonathan},
    title = {Wi-Fi based Indoor Monitoring Enhanced by Multimodal Fusion},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2024,
    month = mar,
    url = {https://www.merl.com/publications/TR2024-012}
  }
- "Why Does Differential Privacy with Large ε Defend Against Practical Membership Inference Attacks?", AAAI Workshop on Privacy-Preserving Artificial Intelligence, February 2024.
  @inproceedings{Lowy2024feb2,
    author = {Lowy, Andrew and Li, Zhuohang and Liu, Jing and Koike-Akino, Toshiaki and Parsons, Kieran and Wang, Ye},
    title = {Why Does Differential Privacy with Large ε Defend Against Practical Membership Inference Attacks?},
    booktitle = {AAAI Workshop on Privacy-Preserving Artificial Intelligence},
    year = 2024,
    month = feb,
    url = {https://www.merl.com/publications/TR2024-009}
  }
- "TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings", IEEE/ACM Transactions on Audio, Speech, and Language Processing, DOI: 10.1109/TASLP.2024.3350887, Vol. 32, pp. 1185-1197, February 2024.
  @article{Boeddeker2024feb,
    author = {Boeddeker, Christoph and Subramanian, Aswin Shanmugam and Wichern, Gordon and Haeb-Umbach, Reinhold and Le Roux, Jonathan},
    title = {TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings},
    journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    year = 2024,
    volume = 32,
    pages = {1185--1197},
    month = feb,
    doi = {10.1109/TASLP.2024.3350887},
    issn = {2329-9304},
    url = {https://www.merl.com/publications/TR2024-006}
  }
- "Pixel-Grounded Prototypical Part Networks", IEEE Winter Conference on Applications of Computer Vision (WACV), January 2024.
  @inproceedings{Carmichael2024jan,
    author = {Carmichael, Zachariah and Lohit, Suhas and Cherian, Anoop and Jones, Michael J. and Scheirer, Walter},
    title = {Pixel-Grounded Prototypical Part Networks},
    booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
    year = 2024,
    month = jan,
    url = {https://www.merl.com/publications/TR2024-002}
  }
- "CAVEN: An Embodied Conversational Agent for Efficient Audio-Visual Navigation in Noisy Environments", AAAI Conference on Artificial Intelligence, December 2023.
  @inproceedings{Liu2023dec2,
    author = {Liu, Xiulong and Paul, Sudipta and Chatterjee, Moitreya and Cherian, Anoop},
    title = {CAVEN: An Embodied Conversational Agent for Efficient Audio-Visual Navigation in Noisy Environments},
    booktitle = {AAAI Conference on Artificial Intelligence},
    year = 2023,
    month = dec,
    url = {https://www.merl.com/publications/TR2023-154}
  }
- "Stochastic Learning Manipulation of Object Pose With Under-Actuated Impulse Generator Arrays", International Conference on Machine Learning and Applications (ICMLA), DOI: 10.1109/ICMLA58977.2023.00024, December 2023, pp. 112-119.
  @inproceedings{Kong2023dec,
    author = {Kong, Chuizheng and Yerazunis, William S. and Nikovski, Daniel},
    title = {Stochastic Learning Manipulation of Object Pose With Under-Actuated Impulse Generator Arrays},
    booktitle = {International Conference on Machine Learning and Applications (ICMLA)},
    year = 2023,
    pages = {112--119},
    month = dec,
    doi = {10.1109/ICMLA58977.2023.00024},
    url = {https://www.merl.com/publications/TR2023-151}
  }
- "LoDA: Low-Dimensional Adaptation of Large Language Models", Advances in Neural Information Processing Systems (NeurIPS) workshop, December 2023.
  @inproceedings{Liu2023dec,
    author = {Liu, Jing and Koike-Akino, Toshiaki and Wang, Pu and Brand, Matthew and Wang, Ye and Parsons, Kieran},
    title = {LoDA: Low-Dimensional Adaptation of Large Language Models},
    booktitle = {Advances in Neural Information Processing Systems (NeurIPS) workshop},
    year = 2023,
    month = dec,
    url = {https://www.merl.com/publications/TR2023-150}
  }
-
Software & Data Downloads
- DeepBornFNO
- Pixel-Grounded Prototypical Part Networks
- BAyesian Network for adaptive SAmple Consensus
- Simple Multimodal Algorithmic Reasoning Task Dataset
- SOurce-free Cross-modal KnowledgE Transfer
- Audio-Visual-Language Embodied Navigation in 3D Environments
- Nonparametric Score Estimators
- Instance Segmentation GAN
- Audio Visual Scene-Graph Segmentor
- Generalized One-class Discriminative Subspaces
- Hierarchical Musical Instrument Separation
- Generating Visual Dynamics from Sound and Context
- Adversarially-Contrastive Optimal Transport
- Online Feature Extractor Network
- MotionNet
- FoldingNet++
- Quasi-Newton Trust Region Policy Optimization
- Landmarks’ Location, Uncertainty, and Visibility Likelihood
- Robust Iterative Data Estimation
- Gradient-based Nikaido-Isoda
- Circular Maze Environment
- Discriminative Subspace Pooling
- Kernel Correlation Network
- Fast Resampling on Point Clouds via Graphs
- FoldingNet
- Deep Category-Aware Semantic Edge Detection
- MERL Shopping Dataset
- Partial Group Convolutional Neural Networks
-