-
SA0044: Internship - Multimodal scene-understanding
We are looking for a graduate student interested in helping advance the field of multimodal scene understanding, focusing on scene understanding using natural language for robot dialog and/or indoor monitoring using a large language model. The intern will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in deep learning for audio-visual, signal, and natural language processing. Good programming skills in Python and knowledge of deep learning frameworks such as PyTorch are essential. Multiple positions are available with flexible start dates (not just Spring/Summer but throughout 2024) and duration (typically 3-6 months).
- Research Areas: Artificial Intelligence, Computer Vision, Control, Machine Learning, Robotics, Speech & Audio
- Host: Chiori Hori
-
SA0045: Internship - Universal Audio Compression and Generation
We are seeking graduate students interested in helping advance the fields of universal audio compression and generation. We aim to build a single generative model that can perform multiple audio generation tasks conditioned on multimodal context. The interns will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are Ph.D. students with experience in some of the following: deep generative modeling, large language models, neural audio codecs. The internship typically lasts 3-6 months.
- Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
- Host: Sameer Khurana
-
SA0040: Internship - Sound event and anomaly detection
We are seeking graduate students interested in helping advance the fields of sound event detection/localization, anomaly detection, and physics-informed deep learning for machine sounds. The interns will collaborate with MERL researchers to derive and implement novel algorithms, record data, conduct experiments, integrate audio signals with other sensors (electrical, vision, vibration, etc.), and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in some of the following: audio signal processing, microphone array processing, physics-informed machine learning, outlier detection, and unsupervised learning. Multiple positions are available with flexible start dates (not just Spring/Summer but throughout 2025) and duration (typically 3-6 months).
- Research Areas: Artificial Intelligence, Speech & Audio, Machine Learning, Data Analytics
- Host: Gordon Wichern
-
SA0041: Internship - Audio separation, generation, and analysis
We are seeking graduate students interested in helping advance the fields of generative audio, source separation, speech enhancement, spatial audio, and robust ASR in challenging multi-source and far-field scenarios. The interns will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in some of the following: audio signal processing, microphone array processing, spatial audio reproduction, probabilistic modeling, deep generative modeling, and physics-informed machine learning techniques (e.g., neural fields, PINNs, sound field and reverberation modeling). Multiple positions are available with flexible start dates (not just Spring/Summer but throughout 2025) and duration (typically 3-6 months).
- Research Areas: Speech & Audio, Machine Learning, Artificial Intelligence
- Host: Jonathan Le Roux