-
ST0096: Internship - Multimodal Tracking and Imaging
MERL is seeking a motivated intern to assist in developing hardware and algorithms for multimodal imaging applications. The project involves integration of radar, camera, and depth sensors in a variety of sensing scenarios. The ideal candidate should have experience with FMCW radar and/or depth sensing, and be fluent in Python and scripting methods. Familiarity with optical tracking of humans and experience with hardware prototyping are desired. Good knowledge of computational imaging and/or radar imaging methods is a plus.
Required Specific Experience
- Experience with Python and Python Deep Learning Frameworks.
- Experience with FMCW radar and/or Depth Sensors.
- Research Areas: Computer Vision, Machine Learning, Signal Processing, Computational Sensing
- Host: Petros Boufounos
-
CA0129: Internship - LLM-guided Active SLAM for Mobile Robots
MERL is seeking interns passionate about robotics to contribute to the development of an Active Simultaneous Localization and Mapping (Active SLAM) framework guided by Large Language Models (LLMs). The core objective is to achieve autonomous behavior for mobile robots. The methods will be implemented and evaluated in high-performance simulators and (time permitting) on actual robotic platforms, such as legged and wheeled robots. The expectation at the end of the internship is a publication at a top-tier robotics or computer vision conference and/or journal.
The internship has a flexible start date (Spring/Summer 2025), with a duration of 3-6 months depending on agreed scope and intermediate progress.
Required Specific Experience
- Current/Past Enrollment in a PhD Program in Computer Engineering, Computer Science, Electrical Engineering, Mechanical Engineering, or related field
- Experience with employing and fine-tuning LLMs and/or Visual Language Models (VLMs) for high-level context-aware planning and navigation
- 2+ years of experience with 3D computer vision (e.g., point clouds, voxels, camera pose estimation) and mapping, filter-based methods (e.g., EKF), and at least some of: motion planning algorithms, factor graphs, control, and optimization
- Excellent programming skills in Python and/or C/C++, with prior knowledge of ROS2 and high-fidelity simulators such as Gazebo, Isaac Lab, and/or MuJoCo
Additional Desired Experience
- Prior experience with implementation and/or development of SLAM algorithms on robotic hardware, including acquisition, processing, and fusion of multimodal sensor data such as proprioceptive and exteroceptive sensors
- Research Areas: Artificial Intelligence, Computer Vision, Control, Machine Learning, Optimization, Robotics
- Host: Alexander Schperberg
-
SA0044: Internship - Multimodal scene-understanding
We are looking for a graduate student interested in helping advance the field of multimodal scene understanding, focusing on scene understanding using natural language for robot dialog and/or indoor monitoring using a large language model. The intern will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in deep learning for audio-visual, signal, and natural language processing. Good programming skills in Python and knowledge of deep learning frameworks such as PyTorch are essential. Multiple positions are available with a flexible start date (not just Spring/Summer but throughout 2024) and duration (typically 3-6 months).
Required Specific Experience
- Experience with ROS2, C/C++, Python, and deep learning frameworks such as PyTorch is essential.
- Research Areas: Artificial Intelligence, Computer Vision, Control, Machine Learning, Robotics, Speech & Audio
- Host: Chiori Hori
-
OR0127: Internship - Deep Learning for Robotic Manipulation
MERL is looking for a highly motivated and qualified intern to work on deep learning methods for detection and pose estimation of objects using vision and tactile sensing, in manufacturing and assembly environments. This role involves developing, fine-tuning, and deploying models on existing hardware. The method will be applied to robotic manipulation, where accurate knowledge of the position and orientation of objects in the scene allows the robot to interact with them. The ideal candidate would be a Ph.D. student familiar with the state-of-the-art methods for pose estimation and tracking of objects. The successful candidate will work closely with MERL researchers to develop and implement novel algorithms, conduct experiments, and publish research findings at a top-tier conference. The start date and expected duration of the internship are flexible. Interested candidates are encouraged to apply with their updated CV and list of relevant publications.
Required Specific Experience
- Prior experience in Computer Vision and Robotic Manipulation.
- Experience with ROS and deep learning frameworks such as PyTorch is essential.
- Strong programming skills in Python.
- Experience with simulation tools, such as PyBullet, Isaac Lab, or MuJoCo.
- Research Areas: Computer Vision, Robotics, Artificial Intelligence
- Host: Siddarth Jain
-
OR0088: Internship - Robot Learning
MERL is looking for a highly motivated and qualified PhD student in the areas of machine learning and robotics to participate in research on advanced algorithms for learning control of robots and other mechanisms. A solid background and hands-on experience with various machine learning algorithms are expected, particularly with deep learning algorithms for image processing and object detection. Exposure to deep reinforcement learning and/or learning from demonstration is highly desirable. Familiarity with the use of machine learning algorithms for system identification of mechanical systems would be a plus, along with background in other areas of automatic control. Solid experimental skills and hands-on experience in coding in Python, PyTorch, and OpenCV are required for the position. Some experience with ROS2 and familiarity with classical mechanics and computational physics engines would be helpful, but are not required. The position will provide opportunities for exploring fundamental problems in incremental learning in humans and machines, leading to publishable results. The duration of the internship is 3 to 5 months, with a flexible starting date.
Required Specific Experience
- Python, PyTorch, OpenCV
- Research Areas: Artificial Intelligence, Computer Vision, Control, Machine Learning, Robotics
- Host: Daniel Nikovski
-
OR0087: Internship - Human-Robot Collaboration with Shared Autonomy
MERL is looking for a highly motivated and qualified intern to contribute to research in human-robot interaction (HRI). The ideal candidate is a Ph.D. student with expertise in robotic manipulation, perception, deep learning, probabilistic modeling, or reinforcement learning. We have several research topics available, including assistive teleoperation, visual scene reconstruction, safety in HRI, shared autonomy, intent recognition, cooperative manipulation, and robot learning. The selected intern will work closely with MERL researchers to develop and implement novel algorithms, conduct experiments, and present research findings. We publish our research at top-tier conferences. Start date is flexible, and the expected duration of the internship is 3-4 months. Interested candidates are encouraged to apply with their updated CV and list of publications.
Required Specific Experience
- Experience with ROS and deep learning frameworks such as PyTorch is essential.
- Strong programming skills in Python and/or C/C++
- Experience with simulation tools, such as PyBullet, Isaac Lab, or MuJoCo.
- Prior experience in human-robot interaction, perception, or robotic manipulation.
- Research Areas: Robotics, Computer Vision, Machine Learning
- Host: Siddarth Jain
-
CV0056: Internship - "Small" Large Generative Models for Vision and Language
MERL is looking for research interns to conduct research into novel architectures for "small" large generative models. We are currently exploring 0.5-2 billion parameter language models, text-to-image models, and text-to-video models. Interesting research directions include (a) efficient learning for such models that improves the Pareto front of current scaling laws for these sizes, (b) enhancing current transformer-based architectures, and (c) new architectural paradigms beyond transformers, such as incorporating explicitly temporal designs. Prior experience with machine learning/computer vision/natural language processing research and proficiency in building and experimenting with machine learning models using a framework like PyTorch are required. Candidates well into their PhD program with publications in top-tier machine learning, natural language processing, or computer vision venues, ideally connected to building generative models, are strongly preferred. Candidates are also expected to collaborate with MERL researchers in preparing manuscripts for scientific publications based on the results obtained during the internship. The duration of the internship is 3 months with a flexible start date.
Required Specific Experience
- Research experience with recent vision and text generative models
- Deep understanding of neural network architectures
- Proficiency in machine learning frameworks like PyTorch
- Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
- Host: Suhas Lohit
-
CV0079: Internship - Novel View Synthesis of Dynamic Scenes
MERL is looking for a highly motivated intern to work on an original research project in rendering dynamic scenes from novel views. A strong background in 3D computer vision and/or computer graphics is required. Experience with the latest advances in volumetric rendering, such as neural radiance fields (NeRFs) and Gaussian Splatting (GS), is desired. The successful candidate is expected to have published at least one paper in a top-tier computer vision/graphics or machine learning venue, such as CVPR, ECCV, ICCV, SIGGRAPH, 3DV, ICML, ICLR, NeurIPS or AAAI, and possess solid programming skills in Python and popular deep learning frameworks like PyTorch. The candidate will collaborate with MERL researchers to develop algorithms and prepare manuscripts for scientific publications. The position is available for graduate students on a Ph.D. track or those who have recently graduated with a Ph.D. Duration and start date are flexible, but the internship is expected to last for at least 3 months.
Required Specific Experience
- Prior publications in top computer vision/graphics and/or machine learning venues, such as CVPR, ECCV, ICCV, SIGGRAPH, 3DV, ICML, ICLR, NeurIPS or AAAI.
- Experience with the latest novel-view synthesis approaches such as Neural Radiance Fields (NeRFs) or Gaussian Splatting (GS).
- Proficiency in coding (particularly scripting languages like Python) and familiarity with deep learning frameworks, such as PyTorch or TensorFlow.
- Research Areas: Computer Vision, Artificial Intelligence, Machine Learning
- Host: Moitreya Chatterjee
-
CV0064: Internship - Robust Estimation for Computer Vision
MERL is looking for a self-motivated graduate student to work on robust estimation in Computer Vision. Based on the candidate’s interests, the intern can work on a variety of topics such as (but not limited to) camera pose estimation, 3D registration, camera calibration, pose-graph optimization, and transformation averaging. The ideal candidate would be a PhD student with a strong background in 3D computer vision, RANSAC, and graduated non-convexity algorithms, and good programming skills in C/C++ and/or Python. The candidate must have published at least one paper in a top-tier computer vision, machine learning, or robotics venue, such as CVPR, ECCV, ICCV, NeurIPS, ICRA, or IROS. The intern will collaborate with MERL researchers to derive and implement new algorithms for V-SLAM, conduct experiments, and report findings. A submission to a top-tier conference is expected. The duration of the internship and start date are flexible.
Required Specific Experience
- Experience with 3D computer vision, RANSAC, or graduated non-convexity algorithms for computer vision.
- Research Areas: Computer Vision, Computational Sensing, Robotics
- Host: Pedro Miraldo
-
CV0051: Internship - Visual-LiDAR fused object detection and recognition
MERL is looking for a self-motivated intern to work on visual-LiDAR fused object detection and recognition using computer vision. Relevant topics include (but are not limited to): open-vocabulary visual-LiDAR object detection and recognition, domain adaptation or generalization in visual-LiDAR object detection, data-efficient methods for visual-LiDAR object detection, and small object detection with visual-LiDAR input. Candidates with experience in object recognition in LiDAR are strongly preferred. The ideal candidate would be a PhD student with a strong background in computer vision and machine learning, and is expected to have published at least one paper in a top-tier computer vision, machine learning, or artificial intelligence venue, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, or AAAI. Proficiency in Python programming and familiarity with at least one deep learning framework are necessary. The intern is expected to collaborate with MERL researchers to develop algorithms and prepare manuscripts for scientific publications. The duration of the internship is ideally at least 3 months, with a flexible start date.
Required Specific Experience
- Experience with Python, PyTorch, and datasets with both images and LiDAR (e.g. the nuScenes dataset).
- Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
- Host: Kuan-Chuan Peng
-
CV0101: Internship - Multimodal Algorithmic Reasoning
MERL is looking for a self-motivated intern to conduct research on problems at the intersection of multimodal large language models and neural algorithmic reasoning. An ideal intern would be a Ph.D. student with a strong background in machine learning and computer vision. The candidate must have prior experience with training multimodal LLMs for solving vision-and-language tasks. Experience participating in and winning mathematical Olympiads is desired. Publications in theoretical machine learning venues would be a strong plus. The intern is expected to collaborate with researchers in the computer vision team at MERL to develop algorithms and prepare manuscripts for scientific publications.
Required Specific Experience
- Experience with training large vision-and-language models
- Experience with solving mathematical reasoning problems
- Experience with programming in Python using PyTorch
- Enrolled in a PhD program
- Strong track record of publications in top-tier computer vision and machine learning venues (such as CVPR, NeurIPS, etc.).
- Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
- Host: Anoop Cherian
-
CV0063: Internship - Visual Simultaneous Localization and Mapping
MERL is looking for a self-motivated graduate student to work on Visual Simultaneous Localization and Mapping (V-SLAM). Based on the candidate’s interests, the intern can work on a variety of topics such as (but not limited to): camera pose estimation, feature detection and matching, visual-LiDAR data fusion, pose-graph optimization, loop closure detection, and image-based camera relocalization. The ideal candidate would be a PhD student with a strong background in 3D computer vision and good programming skills in C/C++ and/or Python. The candidate must have published at least one paper in a top-tier computer vision, machine learning, or robotics venue, such as CVPR, ECCV, ICCV, NeurIPS, ICRA, or IROS. The intern will collaborate with MERL researchers to derive and implement new algorithms for V-SLAM, conduct experiments, and report findings. A submission to a top-tier conference is expected. The duration of the internship and start date are flexible.
Required Specific Experience
- Experience with 3D Computer Vision and Simultaneous Localization & Mapping.
- Research Areas: Computer Vision, Robotics, Control
- Host: Pedro Miraldo
-
CV0075: Internship - Multimodal Embodied AI
MERL is looking for a self-motivated intern to work on problems at the intersection of multimodal large language models and embodied AI in dynamic indoor environments. The ideal candidate would be a PhD student with a strong background in machine learning and computer vision, as demonstrated by top-tier publications. The candidate must have prior experience in designing synthetic scenes (e.g., 3D games) using popular graphics software, embodied AI, large language models, reinforcement learning, and the use of simulators such as Habitat/SoundSpaces. Hands-on experience in using animated 3D human shape models (e.g., SMPL and variants) is desired. The intern is expected to collaborate with researchers in computer vision at MERL to develop algorithms and prepare manuscripts for scientific publications.
Required Specific Experience
- Experience in designing 3D interactive scenes
- Experience with vision-based embodied AI using simulators (implementation on real robotic hardware would be a plus).
- Experience training large language models on multimodal data
- Experience with training reinforcement learning algorithms
- Strong foundations in machine learning and programming
- Strong track record of publications in top-tier computer vision and machine learning venues (such as CVPR, NeurIPS, etc.).
- Research Areas: Artificial Intelligence, Computer Vision, Speech & Audio, Robotics, Machine Learning
- Host: Anoop Cherian
-
CV0078: Internship - Audio-Visual Learning with Limited Labeled Data
MERL is looking for a highly motivated intern to work on an original research project on multimodal learning, such as audio-visual learning, using limited labeled data. A strong background in computer vision and deep learning is required. Experience in audio-visual (multimodal) learning, weakly/self-supervised learning, continual learning, and large (vision-) language models is a plus and will be valued. The successful candidate is expected to have published at least one paper in a top-tier computer vision or machine learning venue, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS or AAAI, and possess solid programming skills in Python and popular deep learning frameworks such as PyTorch. The intern will collaborate with MERL researchers to develop and implement novel algorithms and prepare manuscripts for scientific publications. Successful applicants are typically graduate students on a Ph.D. track or recent Ph.D. graduates. Duration and start date are flexible, but the internship is expected to last for at least 3 months.
Required Specific Experience
- Prior publications in top-tier computer vision and/or machine learning venues, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS or AAAI.
- Knowledge of the latest self-supervised and weakly-supervised learning techniques.
- Experience with Large (Vision-) Language Models.
- Proficiency in scripting languages, such as Python, and deep learning frameworks such as PyTorch or TensorFlow.
- Research Areas: Computer Vision, Machine Learning, Speech & Audio, Artificial Intelligence
- Host: Moitreya Chatterjee
-
CV0060: Internship - Video Anomaly Detection
MERL is looking for a self-motivated intern to work on the problem of video anomaly detection. The intern will help to develop new ideas for improving the state of the art in detecting anomalous activity in videos. The ideal candidate would be a Ph.D. student with a strong background in machine learning and computer vision and some experience with video anomaly detection in particular. Proficiency in Python programming and PyTorch is necessary. The successful candidate is expected to have published at least one paper in a top-tier computer vision or machine learning venue, such as CVPR, ECCV, ICCV, WACV, ICML, ICLR, NeurIPS or AAAI. The intern will collaborate with MERL researchers to develop and test algorithms and prepare manuscripts for scientific publications. The internship is for 3 months and the start date is flexible.
Required Specific Experience
- Graduate student in a Ph.D. program.
- Experience with PyTorch.
- Prior publication in computer vision or machine learning conference/journal.
- Research Area: Computer Vision
- Host: Mike Jones