- Date & Time: Tuesday, November 2, 2021; 1:00 PM EST
Speaker: Dr. Hsiao-Yu (Fish) Tung, MIT BCS
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Robotics
Abstract
Current state-of-the-art CNNs can localize and name objects in internet photos, yet they lack the basic knowledge that a two-year-old toddler possesses: objects persist over time despite changes in the observer’s viewpoint or during cross-object occlusions; objects have 3D extent; solid objects do not pass through each other. In this talk, I will introduce neural architectures that learn to parse video streams of a static scene into world-centric 3D feature maps by disentangling camera motion from scene appearance. I will show that the proposed architectures learn object permanence, can imagine RGB views from novel viewpoints in truly novel scenes, can conduct basic spatial reasoning and planning, can infer affordability in sentences, and can learn geometry-aware 3D concepts that allow pose-aware object recognition to happen with weak/sparse labels. Our experiments suggest that the proposed architectures are essential for the models to generalize across objects and locations, and that they overcome many limitations of 2D CNNs. I will show how we can use the proposed 3D representations to build machine perception and physical understanding closer to that of humans.
-
- Date & Time: Tuesday, October 12, 2021; 1:00 PM EST
Speaker: Prof. Greg Ongie, Marquette University
MERL Host: Hassan Mansour
Research Areas: Computational Sensing, Machine Learning, Signal Processing
Abstract
Deep learning is emerging as a powerful tool to solve challenging inverse problems in computational imaging, including basic image restoration tasks like denoising and deblurring, as well as image reconstruction problems in medical imaging. This talk will give an overview of the state-of-the-art supervised learning techniques in this area and discuss two recent innovations: deep equilibrium architectures, which allow one to train an effectively infinite-depth reconstruction network; and model adaptation methods, which allow one to adapt a pre-trained reconstruction network to changes in the imaging forward model at test time.
-
- Date & Time: Tuesday, September 28, 2021; 1:00 PM EST
Speaker: Dr. Ruohan Gao, Stanford University
MERL Host: Gordon Wichern
Research Areas: Computer Vision, Machine Learning, Speech & Audio
Abstract
While computer vision has made significant progress by "looking" (detecting objects, actions, or people based on their appearance), it often does not listen. Yet cognitive science tells us that perception develops by making use of all our senses without intensive supervision. Towards this goal, in this talk I will present my research on audio-visual learning: we disentangle object sounds from unlabeled video, use audio as an efficient preview for action recognition in untrimmed video, decode the monaural soundtrack into its binaural counterpart by injecting visual spatial information, and use echoes to interact with the environment for spatial image representation learning. Together, these are steps towards multimodal understanding of the visual world, where audio serves as both a semantic and a spatial signal. Finally, I will briefly talk about our latest work on multisensory learning for robotics.
-
- Date & Time: Tuesday, September 14, 2021; 1:00 PM EST
Speaker: Prof. David Bergman, University of Connecticut
MERL Host: Arvind Raghunathan
Research Areas: Data Analytics, Machine Learning, Optimization
Abstract
The integration of machine learning and optimization opens the door to new modeling paradigms that have already proven successful across a broad range of industries. Sports betting is a particularly exciting application area, where recent advances in both analytics and optimization can provide a lucrative edge. In this talk we will discuss three algorithmic sports betting games where combinations of machine learning and optimization have netted me significant winnings.
-