Internship Openings

SA0302: Internship - Audio Processing for Moving Sounds
- We are seeking graduate students interested in helping advance our understanding of how to apply sophisticated audio processing techniques (e.g., source separation, localization, anomalous sound detection) to moving sound sources (e.g., vehicles). The interns will collaborate with MERL researchers to derive and implement novel algorithms, record data, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in some of the following: audio signal processing, microphone array processing, source separation, physics-informed machine learning, outlier detection, and unsupervised learning.
  The pay range for this internship position will be 6-8K per month.
- Research Areas: Artificial Intelligence, Speech & Audio, Machine Learning
- Host: Gordon Wichern
SA0307: Internship - Neural Spatial Audio Processing
- We are seeking graduate students interested in advancing the fields of spatial audio, room acoustics, physics-informed machine learning, and scene understanding (e.g., sound source localization and spatial-aware captioning). The interns will work closely with MERL researchers to develop novel algorithms, record data, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in some of the following: microphone array processing, physics-informed machine learning, and 3D modeling in computer vision. Multiple positions are available with a flexible start date (not just Spring/Summer but throughout 2026) and duration (typically 3-6 months).
  Required Specific Experience
  - Experience with deep learning frameworks (e.g., PyTorch and JAX)
  - Research experience in spatial audio and/or array signal processing
  The pay range for this internship position will be 6-8K per month.
- Research Area: Speech & Audio
- Host: Yoshiki Masuyama
SA0191: Internship - Human-Robot Interaction Based on Multimodal Scene Understanding
- We are looking for a graduate student interested in advancing the field of multimodal scene understanding, focusing on scene understanding using natural language for robot dialog and/or indoor monitoring with a large language model. The intern will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in deep learning for audio-visual, signal, and natural language processing. Good programming skills in Python and knowledge of deep learning frameworks such as PyTorch are essential. Multiple positions are available with a flexible start date (not just Spring/Summer but throughout 2026) and duration (typically 3-6 months).
  Required Specific Experience
  - Experience with ROS2, C/C++, Python, and deep learning frameworks such as PyTorch is essential.
  The pay range for this internship position will be 6-8K per month.
- Research Areas: Artificial Intelligence, Machine Learning, Robotics, Speech & Audio
- Host: Chiori Hori
CV0075: Internship - Multimodal Embodied AI
- MERL is looking for a self-motivated intern to work on problems at the intersection of multimodal large language models and embodied AI in dynamic indoor environments. The ideal candidate would be a PhD student with a strong background in machine learning and computer vision, as demonstrated by top-tier publications. The candidate must have prior experience in designing synthetic scenes (e.g., 3D games) using popular graphics software, embodied AI, large language models, reinforcement learning, and the use of simulators such as Habitat/SoundSpaces. Hands-on experience in using animated 3D human shape models (e.g., SMPL and variants) is desired. The intern is expected to collaborate with researchers in computer vision at MERL to develop algorithms and prepare manuscripts for scientific publications.
  Required Specific Experience
  - Experience in designing 3D interactive scenes
  - Experience with vision-based embodied AI using simulators (implementation on real robotic hardware would be a plus).
  - Experience training large language models on multimodal data
  - Experience with training reinforcement learning algorithms
  - Strong foundations in machine learning and programming
  - Strong track record of publications in top-tier computer vision and machine learning venues (e.g., CVPR, NeurIPS).
- Research Areas: Artificial Intelligence, Computer Vision, Speech & Audio, Robotics, Machine Learning
- Host: Anoop Cherian