Artificial Intelligence
Making machines smarter for improved safety, efficiency and comfort.
Our AI research encompasses advances in computer vision, speech and audio processing, as well as data analytics. Key research themes include improved perception based on machine learning techniques, learning control policies through model-based reinforcement learning, as well as cognition and reasoning based on learned semantic representations. We apply our work to a broad range of automotive and robotics applications, as well as building and home systems.
Researchers
Jonathan Le Roux
Toshiaki Koike-Akino
Ye Wang
Gordon Wichern
Anoop Cherian
Tim K. Marks
Chiori Hori
Michael J. Jones
Daniel N. Nikovski
Kieran Parsons
Devesh K. Jha
Philip V. Orlik
Suhas Lohit
Petros T. Boufounos
Matthew Brand
Hassan Mansour
Diego Romeres
Siddarth Jain
William S. Yerazunis
Moitreya Chatterjee
Francois Germain
Pu (Perry) Wang
Mouhacine Benosman
Kuan-Chuan Peng
Arvind Raghunathan
Radu Corcodel
Hongbo Sun
Yebin Wang
Jianlin Guo
Chungwei Lin
Yanting Ma
Bingnan Wang
Stefano Di Cairano
Anthony Vetro
Jinyun Zhang
Jose Amaya
Karl Berntorp
Ankush Chakrabarty
Vedang M. Deshpande
Marcus Greiff
Sameer Khurana
Dehong Liu
Wataru Tsujita
Abraham P. Vinod
Ryo Hase
Jing Liu
Zexu Pan
James Queeney
Shinya Tsuruta
Ryoma Yataka
-
Awards
-
AWARD Joint University of Padua-MERL team wins Challenge 'AI Olympics With RealAIGym'
Date: August 25, 2023
Awarded to: Alberto Dalla Libera, Niccolò Turcato, Giulio Giacomuzzo, Ruggero Carli, Diego Romeres
MERL Contact: Diego Romeres
Research Areas: Artificial Intelligence, Machine Learning, Robotics
Brief: A joint team from the University of Padua and MERL ranked 1st in the IJCAI 2023 Challenge "AI Olympics With RealAIGym: Is AI Ready for Athletic Intelligence in the Real World?". The team was composed of MERL researcher Diego Romeres and a University of Padua (UniPD) group consisting of Alberto Dalla Libera, Ph.D., Ph.D. candidates Niccolò Turcato and Giulio Giacomuzzo, and Prof. Ruggero Carli.
The International Joint Conference on Artificial Intelligence (IJCAI) is a premier gathering for AI researchers and organizes several competitions. This year the competition CC7 "AI Olympics With RealAIGym: Is AI Ready for Athletic Intelligence in the Real World?" consisted of two stages: simulation and real-robot experiments on two under-actuated robotic systems. The two robotic systems were treated as separate tracks, and one final winner was selected for each track based on specific performance criteria in the control tasks.
The UniPD-MERL team competed and won in both tracks. The team's system made strong use of a model-based reinforcement learning algorithm called MC-PILCO, which we recently published in the journal IEEE Transactions on Robotics.
-
AWARD MERL Intern and Researchers Win ICASSP 2023 Best Student Paper Award
Date: June 9, 2023
Awarded to: Darius Petermann, Gordon Wichern, Aswin Subramanian, Jonathan Le Roux
MERL Contacts: Jonathan Le Roux; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: Former MERL intern Darius Petermann (Ph.D. Candidate at Indiana University) has received a Best Student Paper Award at the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023) for the paper "Hyperbolic Audio Source Separation", co-authored with MERL researchers Gordon Wichern and Jonathan Le Roux, and former MERL researcher Aswin Subramanian. The paper presents work performed during Darius's internship at MERL in the summer of 2022. It introduces a framework for audio source separation using embeddings on a hyperbolic manifold that compactly represent the hierarchical relationship between sound sources and time-frequency features. The code associated with the paper is publicly available at https://github.com/merlresearch/hyper-unmix.
ICASSP is the flagship conference of the IEEE Signal Processing Society (SPS). ICASSP 2023 was held on the Greek island of Rhodes from June 04 to June 10, 2023, and it was the largest ICASSP in history, with more than 4000 participants, 6128 submitted papers and 2709 accepted papers. Darius’s paper was first recognized as one of the Top 3% of all papers accepted at the conference, before receiving one of only 5 Best Student Paper Awards during the closing ceremony.
-
AWARD MERL’s Paper on Wi-Fi Sensing Earns Top 3% Paper Recognition at ICASSP 2023, Selected as a Best Student Paper Award Finalist
Date: June 9, 2023
Awarded to: Cristian J. Vaca-Rubio, Pu Wang, Toshiaki Koike-Akino, Ye Wang, Petros Boufounos and Petar Popovski
MERL Contacts: Petros T. Boufounos; Toshiaki Koike-Akino; Pu (Perry) Wang; Ye Wang
Research Areas: Artificial Intelligence, Communications, Computational Sensing, Dynamical Systems, Machine Learning, Signal Processing
Brief: A MERL paper on Wi-Fi sensing was recognized as a Top 3% Paper among all 2709 accepted papers at the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023). Co-authored by Cristian Vaca-Rubio and Petar Popovski from Aalborg University, Denmark, and MERL researchers Pu Wang, Toshiaki Koike-Akino, Ye Wang, and Petros Boufounos, the paper "MmWave Wi-Fi Trajectory Estimation with Continuous-Time Neural Dynamic Learning" was also a Best Student Paper Award finalist.
Performed during Cristian’s stay at MERL, first as a visiting Marie Skłodowska-Curie Fellow and then as a full-time intern in 2022, this work capitalizes on standards-compliant Wi-Fi signals to perform indoor localization and sensing. The paper uses a neural dynamic learning framework to address technical issues such as low sampling rate and irregular sampling intervals.
ICASSP, a flagship conference of the IEEE Signal Processing Society (SPS), was hosted on the Greek island of Rhodes from June 04 to June 10, 2023. ICASSP 2023 marked the largest ICASSP in history, with over 4000 participants and 6128 submitted papers, of which 2709 were accepted.
-
News & Events
-
NEWS MERL Researchers give a Tutorial Talk on Quantum Machine Learning for Sensing and Communications at IEEE VCC
Date: November 28, 2023 - November 30, 2023
Where: Virtual
MERL Contacts: Toshiaki Koike-Akino; Pu (Perry) Wang
Research Areas: Artificial Intelligence, Communications, Computational Sensing, Machine Learning, Signal Processing
Brief: On November 28, 2023, MERL researchers Toshiaki Koike-Akino and Pu (Perry) Wang will give a 3-hour tutorial presentation at the first IEEE Virtual Conference on Communications (VCC). The talk, titled "Post-Deep Learning Era: Emerging Quantum Machine Learning for Sensing and Communications," addresses recent trends, challenges, and advances in sensing and communications. P. Wang presents use cases, industry trends, signal processing, and deep learning for Wi-Fi integrated sensing and communications (ISAC), while T. Koike-Akino discusses the future of deep learning, giving a comprehensive overview of artificial intelligence (AI) technologies, natural computing, emerging quantum AI, and their diverse applications. The tutorial is conducted virtually.
IEEE VCC is a new, fully virtual conference launched by the IEEE Communications Society, gathering researchers from academia and industry who are unable to travel but wish to present their recent scientific results and engage in interactive discussions with fellow researchers working in their fields. It is designed to address potential hardships such as pandemic restrictions, visa issues, travel problems, or financial difficulties.
-
NEWS Anoop Cherian gives a podcast interview with AI Business
Date: September 26, 2023
Where: Virtual
MERL Contact: Anoop Cherian
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief: Anoop Cherian, a Senior Principal Research Scientist in the Computer Vision team at MERL, gave a podcast interview with award-winning journalist Deborah Yao. Deborah is the editor of AI Business, a leading content platform for artificial intelligence and its applications in the real world, which delivers up-to-the-minute insights into how AI technologies are currently affecting the global economy and society. The podcast was based on recent research by Anoop and his MERL colleagues with collaborators at MIT; this research attempts to objectively answer the question: are current deep neural networks smarter than second graders? The podcast discusses the shortcomings this research reveals in recent artificial general intelligence systems with regard to knowledge abstraction, learning, and generalization.
-
Internships
-
CI2075: Human-Machine Interface with Biosignal Processing
MERL is seeking an intern to work on human-machine interface research with multi-modal biosensors. The ideal candidate is an experienced Ph.D. student or post-graduate researcher with an excellent background in brain-machine interface (BMI), deep learning, mixed reality (XR), remote robot manipulation, bionics, and biosensing. The expected duration of the internship is 3-6 months, with a flexible start date.
-
OR2110: Shared Autonomy for Human-Robot Interaction
MERL is looking for a highly motivated and qualified intern to work on human-robot interaction (HRI) research. The ideal candidate would be a Ph.D. student with a strong background in HRI, focusing on robotic manipulation, deep learning, probabilistic modeling, or reinforcement learning. Several topics are available for consideration, including Intent Recognition in Multi-Object Scenes, Shared Autonomy, Cooperative Manipulation, Human-Robot Handovers, and Representation Learning for HRI. Experience working with robotics hardware and physics engine simulators like PyBullet, Isaac Gym, or MuJoCo is preferred. Proficiency in Python programming is necessary, and experience with ROS is a plus. The successful candidate will collaborate with MERL researchers, and publication of the relevant results is expected. The start date is flexible, and the expected duration of the internship is 3-4 months. Interested candidates are encouraged to apply with their recent CV and a list of publications in related topics.
-
OR2111: Deep Learning for Robotic Manipulation
MERL is seeking a highly motivated and qualified intern to work on deep learning for visual feedback in robotic manipulation. The ideal candidate would be a Ph.D. student with a strong background in deep learning and robotic manipulation. Several topics are available for consideration, including Object Pose Estimation, Goal-driven Grasping, Diffusion Policy for Industrial Tasks, and Deformable Object Manipulation. The project requires the development of novel algorithms with implementation and evaluation on a robotic platform. Preferred qualifications include experience working with a physics engine simulator like PyBullet, Isaac Gym, or MuJoCo, proficiency in Python programming, and experience with ROS. The successful candidate will collaborate with MERL researchers, and publication of relevant results is expected. The start date is flexible, and the expected duration of the internship is 3-4 months. Interested candidates are encouraged to apply with their recent CV and a list of publications in related topics.
-
Recent Publications
- "On the Use of Pretrained Deep Audio Encoders for Automated Audio Captioning Tasks", International Symposium on Future Active Safety Technology toward zero traffic accidents (FAST-zero), November 2023. TR2023-141
  @inproceedings{Wu2023nov,
    author = {Wu, Shih-Lun and Chang, Xuankai and Wichern, Gordon and Jung, Jee-weon and Germain, François G and Le Roux, Jonathan and Watanabe, Shinji},
    title = {On the Use of Pretrained Deep Audio Encoders for Automated Audio Captioning Tasks},
    booktitle = {International Symposium on Future Active Safety Technology toward zero traffic accidents (FAST-zero)},
    year = 2023,
    month = nov,
    url = {https://www.merl.com/publications/TR2023-141}
  }
- "Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis", IEEE International Conference on Computer Vision (ICCV), October 2023. TR2023-126
  @inproceedings{Nair2023sep,
    author = {Nair, Nithin Gopalakrishnan and Cherian, Anoop and Lohit, Suhas and Wang, Ye and Koike-Akino, Toshiaki and Patel, Vishal M. and Marks, Tim K.},
    title = {Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis},
    booktitle = {IEEE International Conference on Computer Vision (ICCV)},
    year = 2023,
    month = oct,
    url = {https://www.merl.com/publications/TR2023-126}
  }
- "Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection", IEEE International Conference on Computer Vision Workshops (ICCV), October 2023, pp. 924-932. TR2023-125
  @inproceedings{Sharma2023oct,
    author = {Sharma, Manish and Chatterjee, Moitreya and Peng, Kuan-Chuan and Lohit, Suhas and Jones, Michael J.},
    title = {Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection},
    booktitle = {IEEE International Conference on Computer Vision Workshops (ICCV)},
    year = 2023,
    pages = {924--932},
    month = oct,
    url = {https://www.merl.com/publications/TR2023-125}
  }
- "EARL: Eye-on-Hand Reinforcement Learner for Dynamic Grasping with Active Pose Estimation", 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 2023. TR2023-118
  @inproceedings{Huang2023oct,
    author = {Huang, Baichuan and Yu, Jingjin and Jain, Siddarth},
    title = {EARL: Eye-on-Hand Reinforcement Learner for Dynamic Grasping with Active Pose Estimation},
    booktitle = {2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    year = 2023,
    month = oct,
    url = {https://www.merl.com/publications/TR2023-118}
  }
- "Location as supervision for weakly supervised multi-channel source separation of machine sounds", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), DOI: 10.1109/WASPAA58266.2023.10248128, September 2023. TR2023-119
  @inproceedings{FalconPerez2023aug,
    author = {Falcon Perez, Ricardo and Wichern, Gordon and Germain, Francois and Le Roux, Jonathan},
    title = {Location as supervision for weakly supervised multi-channel source separation of machine sounds},
    booktitle = {IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
    year = 2023,
    month = sep,
    publisher = {IEEE},
    doi = {10.1109/WASPAA58266.2023.10248128},
    issn = {1947-1629},
    isbn = {979-8-3503-2372-6},
    url = {https://www.merl.com/publications/TR2023-119}
  }
- "Hyperbolic Unsupervised Anomalous Sound Detection", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), DOI: 10.1109/WASPAA58266.2023.10248092, September 2023. TR2023-108
  @inproceedings{Germain2023aug,
    author = {Germain, Francois and Wichern, Gordon and Le Roux, Jonathan},
    title = {Hyperbolic Unsupervised Anomalous Sound Detection},
    booktitle = {IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
    year = 2023,
    month = sep,
    publisher = {IEEE},
    doi = {10.1109/WASPAA58266.2023.10248092},
    issn = {1947-1629},
    isbn = {979-8-3503-2372-6},
    url = {https://www.merl.com/publications/TR2023-108}
  }
- "Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks", IEEE/ACM Transactions on Audio, Speech, and Language Processing, DOI: 10.1109/TASLP.2023.3290428, Vol. 31, pp. 2592-2605, September 2023. TR2023-113
  @article{Petermann2023sep,
    author = {Petermann, Darius and Wichern, Gordon and Subramanian, Aswin Shanmugam and Wang, Zhong-Qiu and Le Roux, Jonathan},
    title = {Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks},
    journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    year = 2023,
    volume = 31,
    pages = {2592--2605},
    month = sep,
    doi = {10.1109/TASLP.2023.3290428},
    issn = {2329-9304},
    url = {https://www.merl.com/publications/TR2023-113}
  }
- "Overview of the Tenth Dialog System Technology Challenge: DSTC10", IEEE/ACM Transactions on Audio, Speech, and Language Processing, DOI: 10.1109/TASLP.2023.3293030, pp. 1-14, August 2023. TR2023-109
  @article{Yoshino2023aug,
    author = {Yoshino, Koichiro and Chen, Yun-Nung and Crook, Paul and Kottur, Satwik and Li, Jinchao and Hedayatnia, Behnam and Moon, Seungwhan and Fe, Zhengcong and Li, Zekang and Zhang, Jinchao and Fen, Yang and Zhou, Jie and Kim, Seokhwan and Liu, Yang and Jin, Di and Papangelis, Alexandros and Gopalakrishnan, Karthik and Hakkani-Tur, Dilek and Damavandi, Babak and Geramifard, Alborz and Hori, Chiori and Shah, Ankit and Zhang, Chen and Li, Haizhou and Sedoc, João and D’Haro, Luis F. and Banchs, Rafael and Rudnicky, Alexander},
    title = {Overview of the Tenth Dialog System Technology Challenge: DSTC10},
    journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    year = 2023,
    pages = {1--14},
    month = aug,
    doi = {10.1109/TASLP.2023.3293030},
    issn = {2329-9290},
    url = {https://www.merl.com/publications/TR2023-109}
  }
-
Videos
- [MERL Seminar Series Fall 2023] The Confluence of Vision, Language, and Robotics
- Are Deep Neural Networks SMARTer than Second Graders?
- [MERL Seminar Series Spring 2023] Fine-grained wildlife sound recognition: Towards the accuracy of a naturalist
- [MERL Seminar Series Spring 2023] Pitfalls and Opportunities in Interpretable Machine Learning
- Human Perspective Scene Understanding via Multimodal Sensing
- [MERL Seminar Series Spring 2022] Self-Supervised Scene Representation Learning
- [MERL Seminar Series Spring 2022] Learning Speech Representations with Multimodal Self-Supervision
- [MERL Seminar Series 2021] Deep probabilistic regression
- [MERL Seminar Series 2021] Learning to See by Moving: Self-supervising 3D scene representations for perception, control, and visual reasoning
- [MERL Seminar Series 2021] Look and Listen: From Semantic to Spatial Audio-Visual Perception
- Application of Deep Learning for Nanophotonic Device Design (Invited)
- Machine Learning Power Amplifier
- Scene-Aware Interaction Technology
-
Software & Data Downloads
- DeepBornFNO
- Hyperbolic Audio Source Separation
- Simple Multimodal Algorithmic Reasoning Task Dataset
- SOurce-free Cross-modal KnowledgE Transfer
- Audio-Visual-Language Embodied Navigation in 3D Environments
- Nonparametric Score Estimators
- Instance Segmentation GAN
- Audio Visual Scene-Graph Segmentor
- Generalized One-class Discriminative Subspaces
- Goal directed RL with Safety Constraints
- Hierarchical Musical Instrument Separation
- Generating Visual Dynamics from Sound and Context
- Adversarially-Contrastive Optimal Transport
- Online Feature Extractor Network
- MotionNet
- FoldingNet++
- Quasi-Newton Trust Region Policy Optimization
- Landmarks’ Location, Uncertainty, and Visibility Likelihood
- Robust Iterative Data Estimation
- Gradient-based Nikaido-Isoda
- Discriminative Subspace Pooling
- Partial Group Convolutional Neural Networks