-
SA0191: Human-Robot Interaction Based on Multimodal Scene Understanding
We are looking for a graduate student interested in advancing the field of multimodal scene understanding, focusing on scene understanding using natural language for robot dialog and/or indoor monitoring with a large language model. The intern will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in deep learning for audio-visual, signal, and natural language processing. Good programming skills in Python and knowledge of deep learning frameworks such as PyTorch are essential. Multiple positions are available with a flexible start date (not just Spring/Summer but throughout 2026) and duration (typically 3-6 months).
Required Specific Experience
- Experience with ROS2, C/C++, Python, and deep learning frameworks such as PyTorch is essential.
The pay range for this internship position will be 6-8K per month.
- Research Areas: Artificial Intelligence, Machine Learning, Robotics, Speech & Audio
- Host: Chiori Hori
-
SA0187: Internship - Sound event and anomaly detection
We are seeking graduate students interested in helping advance the fields of machine sound source separation, sound event detection/localization, anomaly detection, and physics informed deep learning for machine sounds in extremely noisy conditions. The interns will collaborate with MERL researchers to derive and implement novel algorithms, record data, conduct experiments, integrate audio signals with other sensors (electrical, vision, vibration, etc.), and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work.
The ideal candidates are senior Ph.D. students with experience in some of the following: audio signal processing, audio source separation (music, speech, or general sounds), microphone array processing, sound event localization and detection, anomaly detection, and physics informed machine learning.
Multiple positions are available with flexible start dates (not just Spring/Summer but throughout 2026) and duration (typically 3-6 months).
The pay range for this internship position will be 6-8K per month.
- Research Areas: Speech & Audio, Signal Processing, Machine Learning, Artificial Intelligence
- Host: Gordon Wichern
-
SA0186: Internship - Neural Spatial Audio Processing and Understanding
We are seeking graduate students interested in advancing the fields of spatial audio, room acoustics, physics informed machine learning, and scene understanding (e.g., sound source localization and spatial-aware captioning). The interns will work closely with MERL researchers to develop novel algorithms, record data, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in some of the following: microphone array processing, physics informed machine learning, and 3D modeling in computer vision. Multiple positions are available with a flexible start date (not just Spring/Summer but throughout 2026) and duration (typically 3-6 months).
The pay range for this internship position will be 6-8K per month.
- Research Areas: Speech & Audio, Machine Learning, Signal Processing
- Host: Yoshiki Masuyama
-
SA0188: Internship - Audio separation, generation, and analysis
We are seeking graduate students interested in helping advance the fields of generative audio, source separation, speech enhancement, and robust ASR in challenging multi-source and far-field scenarios. The interns will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work.
The ideal candidates are senior Ph.D. students with experience in some of the following: audio signal processing, microphone array processing, probabilistic modeling, and deep generative modeling.
Multiple positions are available with flexible start dates (not just Spring/Summer but throughout 2026) and duration (typically 3-6 months).
The pay range for this internship position will be 6-8K per month.
- Research Areas: Speech & Audio, Machine Learning, Artificial Intelligence
- Host: Jonathan Le Roux