Moitreya Chatterjee

- Phone: 617-621-7592
- Email:

Position:
Research Scientist, Research / Technical Staff

Education:
Ph.D., University of Illinois at Urbana-Champaign, 2022

Research Area:
Computer vision and multimodal machine learning
Biography
Moitreya's research interests are in computer vision and multimodal machine learning, with a particular emphasis on learning from audio-visual data. His PhD work received the Joan and Lalit Bahl Fellowship and the Thomas and Margaret Huang Research Award. Earlier, he earned an M.S. degree in Computer Science from the University of Southern California (USC), during which he received an Outstanding Paper Award at the ACM International Conference on Multimodal Interaction (ICMI).
Recent News & Events
NEWS: MERL researchers presenting five papers at NeurIPS 2022
Date: November 29, 2022 - December 9, 2022
Where: NeurIPS 2022
MERL Contacts: Moitreya Chatterjee; Anoop Cherian; Michael J. Jones; Suhas Lohit
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
Brief: MERL researchers are presenting five papers at the NeurIPS Conference, which will be held in New Orleans from Nov 29 to Dec 1, with virtual presentations in the following week. NeurIPS is one of the most prestigious and competitive international conferences in machine learning.
MERL papers in NeurIPS 2022:
1. “AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments” by Sudipta Paul, Amit Roy-Chowdhury, and Anoop Cherian
This work proposes a unified multimodal task for audio-visual embodied navigation, in which the navigating agent can also interact with and seek help from a human/oracle in natural language when it is uncertain of its navigation actions. The authors propose a multimodal deep hierarchical reinforcement learning framework for solving this challenging task, allowing the agent to learn when to seek help and how to use the language instructions. AVLEN agents can interact anywhere in the 3D navigation space and demonstrate state-of-the-art performance when the audio goal is sporadic or when distractor sounds are present.
2. “Learning Partial Equivariances From Data” by David W. Romero and Suhas Lohit
Group equivariance serves as a good prior that improves data efficiency and generalization for deep neural networks, especially in settings with data or memory constraints. However, if the symmetry groups are misspecified, equivariance can be overly restrictive and degrade performance. This paper shows how to build partial group convolutional neural networks that learn, directly from data, the level of equivariance at each layer that is suitable for the task at hand. This improves performance while approximately retaining equivariance properties.
3. “Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation” by Moitreya Chatterjee, Narendra Ahuja, and Anoop Cherian
There often exist strong correlations between the 3D motion dynamics of a sounding source and its sound being heard, especially when the source is moving towards or away from the microphone. In this paper, we propose an audio-visual scene-graph that learns and leverages such correlations for improved visually-guided audio separation from an audio mixture, while also allowing predicting the direction of motion of the sound source.
4. “What Makes a "Good" Data Augmentation in Knowledge Distillation - A Statistical Perspective” by Huan Wang, Suhas Lohit, Michael Jones, and Yun Fu
This paper presents theoretical and practical results for understanding what makes a particular data augmentation technique (DA) suitable for knowledge distillation (KD). We design a simple metric that works very well in practice to predict the effectiveness of DA for KD. Based on this metric, we also propose a new data augmentation technique that outperforms other methods for knowledge distillation in image recognition networks.
5. “FeLMi: Few shot Learning with hard Mixup” by Aniket Roy, Anshul Shah, Ketul Shah, Prithviraj Dhar, Anoop Cherian, and Rama Chellappa
Learning from only a few examples is a fundamental challenge in machine learning. Recent approaches show benefits by learning a feature extractor on the abundant, labeled base examples and transferring it to the fewer novel examples. However, the latter stage is often prone to overfitting due to the small size of few-shot datasets. In this paper, we propose a novel uncertainty-based criterion to synthetically produce “hard” and useful data by mixing up real data samples. Our approach leads to state-of-the-art results on various computer vision few-shot benchmarks.
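The full group equivariance that paper 2 above relaxes can be illustrated with plain group averaging over the four 90-degree rotations (the group C4). The sketch below is generic background on why averaging a feature over a group orbit yields exact invariance; it is not the paper's partial-equivariance construction, and all names in it are illustrative:

```python
import numpy as np

KERNEL = np.arange(64, dtype=float).reshape(8, 8)  # arbitrary fixed filter

def feature(img):
    """A plain (non-invariant) scalar feature: correlation with a fixed filter."""
    return float((img * KERNEL).sum())

def c4_invariant(img):
    """Average the feature over all four 90-degree rotations (the C4 orbit).

    Rotating the input only permutes the four terms of the average,
    so the output is exactly invariant to C4 rotations.
    """
    return float(np.mean([feature(np.rot90(img, k)) for k in range(4)]))

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8))
# c4_invariant(img) and c4_invariant(np.rot90(img)) agree exactly.
```

Learning *partial* equivariance, as the paper proposes, amounts to letting each layer learn how much of this orbit-averaging constraint to keep.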
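The standard mixup operation that paper 5 above builds on can be sketched as follows. This is generic background with illustrative names; the paper's uncertainty-based selection of "hard" mixtures is not shown:

```python
import numpy as np

def mixup(x1, x2, y1, y2, lam):
    """Convexly combine two samples and their one-hot labels.

    In practice lam is typically drawn from a Beta(alpha, alpha)
    distribution; here it is passed explicitly for clarity.
    """
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

# Mixing an all-zeros sample of class 0 with an all-ones sample of class 1:
x, y = mixup(np.zeros(3), np.ones(3),
             np.array([1.0, 0.0]), np.array([0.0, 1.0]), lam=0.25)
# x is all 0.75; the soft label y is [0.25, 0.75].
```

Selecting which pairs to mix (and which mixtures count as "hard") is where FeLMi's uncertainty-based criterion comes in.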
MERL Publications
- "Active Sparse Conversations for Improved Audio-Visual Embodied Navigation", arXiv, June 2023.
Other Publications
- "A hierarchical variational neural uncertainty model for stochastic video prediction", Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 9751-9761.

  @Inproceedings{chatterjee2021hierarchical,
    author    = {Chatterjee, Moitreya and Ahuja, Narendra and Cherian, Anoop},
    title     = {A hierarchical variational neural uncertainty model for stochastic video prediction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
    year      = {2021},
    pages     = {9751--9761}
  }
- "Visual scene graphs for audio source separation", Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 1204-1213.

  @Inproceedings{chatterjee2021visual,
    author    = {Chatterjee, Moitreya and Le Roux, Jonathan and Ahuja, Narendra and Cherian, Anoop},
    title     = {Visual scene graphs for audio source separation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
    year      = {2021},
    pages     = {1204--1213}
  }
- "Dynamic graph representation learning for video dialog via multi-modal shuffled transformers", Proceedings of the AAAI Conference on Artificial Intelligence, 2021, vol. 35, no. 2, pp. 1415-1423.

  @Inproceedings{geng2021dynamic,
    author    = {Geng, Shijie and Gao, Peng and Chatterjee, Moitreya and Hori, Chiori and Le Roux, Jonathan and Zhang, Yongfeng and Li, Hongsheng and Cherian, Anoop},
    title     = {Dynamic graph representation learning for video dialog via multi-modal shuffled transformers},
    booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
    year      = {2021},
    volume    = {35},
    number    = {2},
    pages     = {1415--1423}
  }
- "Sound2sight: Generating visual dynamics from sound and context", European Conference on Computer Vision, 2020, pp. 701-719.

  @Inproceedings{chatterjee2020sound2sight,
    author       = {Chatterjee, Moitreya and Cherian, Anoop},
    title        = {Sound2sight: Generating visual dynamics from sound and context},
    booktitle    = {European Conference on Computer Vision},
    year         = {2020},
    pages        = {701--719},
    organization = {Springer}
  }
- "Coreset-based neural network compression", Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 454-470.

  @Inproceedings{dubey2018coreset,
    author    = {Dubey, Abhimanyu and Chatterjee, Moitreya and Ahuja, Narendra},
    title     = {Coreset-based neural network compression},
    booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
    year      = {2018},
    pages     = {454--470}
  }
- "Deep neural networks with inexact matching for person re-identification", Advances in Neural Information Processing Systems, vol. 29, 2016.

  @Article{subramaniam2016deep,
    author  = {Subramaniam, Arulkumar and Chatterjee, Moitreya and Mittal, Anurag},
    title   = {Deep neural networks with inexact matching for person re-identification},
    journal = {Advances in Neural Information Processing Systems},
    year    = {2016},
    volume  = {29}
  }
- "Combining two perspectives on classifying multimodal data for recognizing speaker traits", Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, 2015, pp. 7-14.

  @Inproceedings{chatterjee2015combining,
    author    = {Chatterjee, Moitreya and Park, Sunghyun and Morency, Louis-Philippe and Scherer, Stefan},
    title     = {Combining two perspectives on classifying multimodal data for recognizing speaker traits},
    booktitle = {Proceedings of the 2015 ACM on International Conference on Multimodal Interaction},
    year      = {2015},
    pages     = {7--14}
  }