News & Events

11 News items, Awards, Events or Talks found.



Learn about the MERL Seminar Series.



  •  NEWS    MERL researchers presenting five papers at NeurIPS 2022
    Date: November 29, 2022 - December 9, 2022
    Where: NeurIPS 2022
    MERL Contacts: Moitreya Chatterjee; Anoop Cherian; Michael J. Jones; Suhas Lohit
    Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
    Brief
    • MERL researchers are presenting 5 papers at the NeurIPS Conference, which will be held in New Orleans from Nov 29-Dec 1st, with virtual presentations in the following week. NeurIPS is one of the most prestigious and competitive international conferences in machine learning.

      MERL papers in NeurIPS 2022:

      1. “AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments” by Sudipta Paul, Amit Roy-Chowdhary, and Anoop Cherian

      This work proposes a unified multimodal task for audio-visual embodied navigation where the navigating agent can also interact and seek help from a human/oracle in natural language when it is uncertain of its navigation actions. We propose a multimodal deep hierarchical reinforcement learning framework for solving this challenging task that allows the agent to learn when to seek help and how to use the language instructions. AVLEN agents can interact anywhere in the 3D navigation space and demonstrate state-of-the-art performances when the audio-goal is sporadic or when distractor sounds are present.

      2. “Learning Partial Equivariances From Data” by David W. Romero and Suhas Lohit

      Group equivariance serves as a good prior improving data efficiency and generalization for deep neural networks, especially in settings with data or memory constraints. However, if the symmetry groups are misspecified, equivariance can be overly restrictive and lead to bad performance. This paper shows how to build partial group convolutional neural networks that learn to adapt the equivariance levels at each layer that are suitable for the task at hand directly from data. This improves performance while retaining equivariance properties approximately.

      3. “Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation” by Moitreya Chatterjee, Narendra Ahuja, and Anoop Cherian

      There often exist strong correlations between the 3D motion dynamics of a sounding source and its sound being heard, especially when the source is moving towards or away from the microphone. In this paper, we propose an audio-visual scene-graph that learns and leverages such correlations for improved visually-guided audio separation from an audio mixture, while also allowing predicting the direction of motion of the sound source.

      4. “What Makes a "Good" Data Augmentation in Knowledge Distillation - A Statistical Perspective” by Huan Wang, Suhas Lohit, Michael Jones, and Yun Fu

      This paper presents theoretical and practical results for understanding what makes a particular data augmentation technique (DA) suitable for knowledge distillation (KD). We design a simple metric that works very well in practice to predict the effectiveness of DA for KD. Based on this metric, we also propose a new data augmentation technique that outperforms other methods for knowledge distillation in image recognition networks.

      5. “FeLMi : Few shot Learning with hard Mixup” by Aniket Roy, Anshul Shah, Ketul Shah, Prithviraj Dhar, Anoop Cherian, and Rama Chellappa

      Learning from only a few examples is a fundamental challenge in machine learning. Recent approaches show benefits by learning a feature extractor on the abundant and labeled base examples and transferring these to the fewer novel examples. However, the latter stage is often prone to overfitting due to the small size of few-shot datasets. In this paper, we propose a novel uncertainty-based criteria to synthetically produce “hard” and useful data by mixing up real data samples. Our approach leads to state-of-the-art results on various computer vision few-shot benchmarks.
  •  
  •  TALK    [MERL Seminar Series 2022] Prof. Jiajun Wu presents talk titled Understanding the Visual World Through Naturally Supervised Code
    Date & Time: Tuesday, November 1, 2022; 1:00 PM
    Speaker: Jiajun Wu, Stanford University
    MERL Host: Anoop Cherian
    Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
    Abstract
    • The visual world has its inherent structure: scenes are made of multiple identical objects; different objects may have the same color or material, with a regular layout; each object can be symmetric and have repetitive parts. How can we infer, represent, and use such structure from raw data, without hampering the expressiveness of neural networks? In this talk, I will demonstrate that such structure, or code, can be learned from natural supervision. Here, natural supervision can be from pixels, where neuro-symbolic methods automatically discover repetitive parts and objects for scene synthesis. It can also be from objects, where humans during fabrication introduce priors that can be leveraged by machines to infer regular intrinsics such as texture and material. When solving these problems, structured representations and neural nets play complementary roles: it is more data-efficient to learn with structured representations, and they generalize better to new scenarios with robustly captured high-level information; neural nets effectively extract complex, low-level features from cluttered and noisy visual data.
  •  
  •  EVENT    SANE 2022 - Speech and Audio in the Northeast
    Date: Thursday, October 6, 2022
    Location: Kendall Square, Cambridge, MA
    MERL Contacts: Anoop Cherian; Jonathan Le Roux
    Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
    Brief
    • SANE 2022, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, was held on Thursday October 6, 2022 in Kendall Square, Cambridge, MA.

      It was the 9th edition in the SANE series of workshops, which started in 2012 and was held every year alternately in Boston and New York until 2019. Since the first edition, the audience has grown to a record 200 participants and 45 posters in 2019. After a 2-year hiatus due to the pandemic, SANE returned with an in-person gathering of 140 students and researchers.

      SANE 2022 featured invited talks by seven leading researchers from the Northeast: Rupal Patel (Northeastern/VocaliD), Wei-Ning Hsu (Meta FAIR), Scott Wisdom (Google), Tara Sainath (Google), Shinji Watanabe (CMU), Anoop Cherian (MERL), and Chuang Gan (UMass Amherst/MIT-IBM Watson AI Lab). It also featured a lively poster session with 29 posters.

      SANE 2022 was co-organized by Jonathan Le Roux (MERL), Arnab Ghoshal (Apple), John Hershey (Google), and Shinji Watanabe (CMU). SANE remained a free event thanks to generous sponsorship by Bose, Google, MERL, and Microsoft.

      Slides and videos of the talks will be released on the SANE workshop website.
  •  
  •  NEWS    MERL presenting 8 papers at ICASSP 2022
    Date: May 22, 2022 - May 27, 2022
    Where: Singapore
    MERL Contacts: Anoop Cherian; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Tim K. Marks; Philip V. Orlik; Kuan-Chuan Peng; Pu (Perry) Wang; Gordon Wichern
    Research Areas: Artificial Intelligence, Computer Vision, Signal Processing, Speech & Audio
    Brief
    • MERL researchers are presenting 8 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Singapore from May 22-27, 2022. A week of virtual presentations also took place earlier this month.

      Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, and classification.

      ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
  •  
  •  NEWS    MERL work on scene-aware interaction featured in IEEE Spectrum
    Date: March 1, 2022
    MERL Contacts: Anoop Cherian; Chiori Hori; Jonathan Le Roux; Tim K. Marks; Anthony Vetro
    Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
    Brief
    • MERL's research on scene-aware interaction was recently featured in an IEEE Spectrum article. The article, titled "At Last, A Self-Driving Car That Can Explain Itself" and authored by MERL Senior Principal Research Scientist Chiori Hori and MERL Director Anthony Vetro, gives an overview of MERL's efforts towards developing a system that can analyze multimodal sensing information for highly natural and intuitive interaction with humans through context-dependent generation of natural language. The technology recognizes contextual objects and events based on multimodal sensing information, such as images and video captured with cameras, audio information recorded with microphones, and localization information measured with LiDAR.

      Scene-Aware Interaction for car navigation, one target application that the article focuses on, will provide drivers with intuitive route guidance. Scene-Aware Interaction technology is expected to have wide applicability, including human-machine interfaces for in-vehicle infotainment, interaction with service robots in building and factory automation systems, systems that monitor the health and well-being of people, surveillance systems that interpret complex scenes for humans and encourage social distancing, support for touchless operation of equipment in public areas, and much more. MERL's Scene-Aware Interaction Technology had previously been featured in a Mitsubishi Electric Corporation Press Release.

      IEEE Spectrum is the flagship magazine and website of the IEEE, the world’s largest professional organization devoted to engineering and the applied sciences. IEEE Spectrum has a circulation of over 400,000 engineers worldwide, making it one of the leading science and engineering magazines.
  •  
  •  NEWS    Anoop Cherian gave an invited talk at the Department of Computer Science at the University of Bristol, UK
    Date: September 7, 2021
    MERL Contact: Anoop Cherian
    Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
    Brief
    • Anoop Cherian, a Principal Research Scientist in MERL's Computer Vision group, gave an invited virtual talk on "InSeGAN: An Unsupervised Approach to Identical Instance Segmentation" at the Visual Information Laboratory of University of Bristol, UK. The talk described a new approach to segmenting varied appearances of nearly identical 3D objects in depth images. More details of the talk can be found in the following paper https://arxiv.org/abs/2108.13865, which will be presented at the International Conference on Computer Vision (ICCV'21).
  •  
  •  NEWS    Anoop Cherian gave an invited talk at the Multi-modal Video Analysis Workshop, ECCV 2020
    Date: August 23, 2020
    Where: European Conference on Computer Vision (ECCV), online, 2020
    MERL Contact: Anoop Cherian
    Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
    Brief
    • MERL Principal Research Scientist Anoop Cherian gave an invited talk titled "Sound2Sight: Audio-Conditioned Visual Imagination" at the Multi-modal Video Analysis workshop held in conjunction with the European Conference on Computer Vision (ECCV), 2020. The talk was based on a recent ECCV paper that describes a new multimodal reasoning task called Sound2Sight and a generative adversarial machine learning algorithm for producing plausible video sequences conditioned on sound and visual context.
  •  
  •  NEWS    MERL's Scene-Aware Interaction Technology Featured in Mitsubishi Electric Corporation Press Release
    Date: July 22, 2020
    Where: Tokyo, Japan
    MERL Contacts: Anoop Cherian; Chiori Hori; Jonathan Le Roux; Tim K. Marks; Anthony Vetro
    Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
    Brief
    • Mitsubishi Electric Corporation announced that the company has developed what it believes to be the world’s first technology capable of highly natural and intuitive interaction with humans based on a scene-aware capability to translate multimodal sensing information into natural language.

      The novel technology, Scene-Aware Interaction, incorporates Mitsubishi Electric’s proprietary Maisart® compact AI technology to analyze multimodal sensing information for highly natural and intuitive interaction with humans through context-dependent generation of natural language. The technology recognizes contextual objects and events based on multimodal sensing information, such as images and video captured with cameras, audio information recorded with microphones, and localization information measured with LiDAR.

      Scene-Aware Interaction for car navigation, one target application, will provide drivers with intuitive route guidance. The technology is also expected to have applicability to human-machine interfaces for in-vehicle infotainment, interaction with service robots in building and factory automation systems, systems that monitor the health and well-being of people, surveillance systems that interpret complex scenes for humans and encourage social distancing, support for touchless operation of equipment in public areas, and much more. The technology is based on recent research by MERL's Speech & Audio and Computer Vision groups.
  •  
  •  NEWS    MERL researchers presenting three papers at ICML 2020
    Date: July 12, 2020 - July 18, 2020
    Where: Vienna, Austria (virtual this year)
    MERL Contacts: Mouhacine Benosman; Anoop Cherian; Devesh K. Jha; Daniel N. Nikovski
    Research Areas: Artificial Intelligence, Computer Vision, Data Analytics, Dynamical Systems, Machine Learning, Optimization, Robotics
    Brief
    • MERL researchers are presenting three papers at the International Conference on Machine Learning (ICML 2020), which is virtually held this year from 12-18th July. ICML is one of the top-tier conferences in machine learning with an acceptance rate of 22%. The MERL papers are:

      1) "Finite-time convergence in Continuous-Time Optimization" by Orlando Romero and Mouhacine Benosman.

      2) "Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?" by Kei Ota, Tomoaki Oiki, Devesh Jha, Toshisada Mariyama, and Daniel Nikovski.

      3) "Representation Learning Using Adversarially-Contrastive Optimal Transport" by Anoop Cherian and Shuchin Aeron.
  •  
  •  NEWS    MERL researchers presenting four papers and organizing two workshops at CVPR 2020 conference
    Date: June 14, 2020 - June 19, 2020
    MERL Contacts: Anoop Cherian; Michael J. Jones; Toshiaki Koike-Akino; Tim K. Marks; Kuan-Chuan Peng; Ye Wang
    Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
    Brief
    • MERL researchers are presenting four papers (two oral papers and two posters) and organizing two workshops at the IEEE/CVF Computer Vision and Pattern Recognition (CVPR 2020) conference.

      CVPR 2020 Orals with MERL authors:
      1. "Dynamic Multiscale Graph Neural Networks for 3D Skeleton Based Human Motion Prediction," by Maosen Li, Siheng Chen, Yangheng Zhao, Ya Zhang, Yanfeng Wang, Qi Tian
      2. "Collaborative Motion Prediction via Neural Motion Message Passing," by Yue Hu, Siheng Chen, Ya Zhang, Xiao Gu

      CVPR 2020 Posters with MERL authors:
      3. "LUVLi Face Alignment: Estimating Landmarks’ Location, Uncertainty, and Visibility Likelihood," by Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Ye Wang, Michael Jones, Anoop Cherian, Toshiaki Koike-Akino, Xiaoming Liu, Chen Feng
      4. "MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird’s Eye View Maps," by Pengxiang Wu, Siheng Chen, Dimitris N. Metaxas

      CVPR 2020 Workshops co-organized by MERL researchers:
      1. Fair, Data-Efficient and Trusted Computer Vision
      2. Deep Declarative Networks.
  •  
  •  NEWS    MERL presenting 16 papers at ICASSP 2019
    Date: May 12, 2019 - May 17, 2019
    Where: Brighton, UK
    MERL Contacts: Petros T. Boufounos; Anoop Cherian; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Tim K. Marks; Philip V. Orlik; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
    Research Areas: Computational Sensing, Computer Vision, Machine Learning, Signal Processing, Speech & Audio
    Brief
    • MERL researchers will be presenting 16 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Brighton, UK from May 12-17, 2019. Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, and parameter estimation. MERL is also a sponsor of the conference and will be participating in the student career luncheon; please join us at the lunch to learn about our internship program and career opportunities.

      ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
  •