- Date: June 17, 2024 - June 21, 2024
Where: Seattle, WA
MERL Contacts: Petros T. Boufounos; Moitreya Chatterjee; Anoop Cherian; Michael J. Jones; Toshiaki Koike-Akino; Jonathan Le Roux; Suhas Lohit; Tim K. Marks; Pedro Miraldo; Jing Liu; Kuan-Chuan Peng; Pu (Perry) Wang; Ye Wang; Matthew Brand
Research Areas: Artificial Intelligence, Computational Sensing, Computer Vision, Machine Learning, Speech & Audio
Brief - MERL researchers are presenting 5 conference papers, 3 workshop papers, and are co-organizing two workshops at the CVPR 2024 conference, which will be held in Seattle, June 17-21. CVPR is one of the most prestigious and competitive international conferences in computer vision. Details of MERL contributions are provided below.
CVPR Conference Papers:
1. "TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models" by H. Ni, B. Egger, S. Lohit, A. Cherian, Y. Wang, T. Koike-Akino, S. X. Huang, and T. K. Marks
This work enables a pretrained text-to-video (T2V) diffusion model to be additionally conditioned on an input image (first video frame), yielding a text+image to video (TI2V) model. Other than using the pretrained T2V model, our method requires no ("zero") training or fine-tuning. The paper uses a "repeat-and-slide" method and diffusion resampling to synthesize videos from a given starting image and text describing the video content.
Paper: https://www.merl.com/publications/TR2024-059
Project page: https://merl.com/research/highlights/TI2V-Zero
2. "Long-Tailed Anomaly Detection with Learnable Class Names" by C.-H. Ho, K.-C. Peng, and N. Vasconcelos
This work aims to identify defects across various classes without relying on hard-coded class names. We introduce the concept of long-tailed anomaly detection, addressing challenges like class imbalance and dataset variability. Our proposed method combines reconstruction and semantic modules, learning pseudo-class names and utilizing a variational autoencoder for feature synthesis to improve performance in long-tailed datasets, outperforming existing methods in experiments.
Paper: https://www.merl.com/publications/TR2024-040
3. "Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling" by X. Liu, Y-W. Tai, C-T. Tang, P. Miraldo, S. Lohit, and M. Chatterjee
This work presents a new strategy for rendering dynamic scenes from novel viewpoints. Our approach stratifies the scene into regions according to their extent of motion, which is determined automatically. Regions with more motion are given denser spatio-temporal sampling for more faithful rendering of the scene. Additionally, to the best of our knowledge, ours is the first work to enable tracking of objects in the scene from novel views, with the object of interest selected by a user's click.
Paper: https://www.merl.com/publications/TR2024-042
4. "SIRA: Scalable Inter-frame Relation and Association for Radar Perception" by R. Yataka, P. Wang, P. T. Boufounos, and R. Takahashi
To overcome limitations on radar feature extraction such as low spatial resolution, multipath reflection, and motion blur, this paper proposes SIRA (Scalable Inter-frame Relation and Association) for scalable radar perception with two designs: 1) extended temporal relation, generalizing the existing temporal relation layer from two frames to multiple inter-frames with temporally regrouped window attention for scalability; and 2) motion consistency track with a pseudo-tracklet generated from observational data for better object association.
Paper: https://www.merl.com/publications/TR2024-041
5. "RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation" by Z. Yang, J. Liu, P. Chen, A. Cherian, T. K. Marks, J. Le Roux, and C. Gan
We leverage large language models (LLMs) for zero-shot semantic audio-visual navigation. Specifically, by employing multi-modal models to process sensory data, we instruct an LLM-based planner to actively explore the environment by adaptively evaluating and dismissing inaccurate perceptual descriptions.
Paper: https://www.merl.com/publications/TR2024-043
CVPR Workshop Papers:
1. "CoLa-SDF: Controllable Latent StyleSDF for Disentangled 3D Face Generation" by R. Dey, B. Egger, V. Boddeti, Y. Wang, and T. K. Marks
This paper proposes a new method for generating 3D faces and rendering them to images by combining the controllability of nonlinear 3DMMs with the high fidelity of implicit 3D GANs. Inspired by StyleSDF, our model uses a similar architecture but enforces the latent space to match the interpretable and physical parameters of the nonlinear 3D morphable model MOST-GAN.
Paper: https://www.merl.com/publications/TR2024-045
2. “Tracklet-based Explainable Video Anomaly Localization” by A. Singh, M. J. Jones, and E. Learned-Miller
This paper describes a new method for localizing anomalous activity in video of a scene given sample videos of normal activity from the same scene. The method is based on detecting and tracking objects in the scene and estimating high-level attributes of the objects such as their location, size, short-term trajectory and object class. These high-level attributes can then be used to detect unusual activity as well as to provide a human-understandable explanation for what is unusual about the activity.
Paper: https://www.merl.com/publications/TR2024-057
3. "SuperLoRA: Parameter-Efficient Unified Adaptation for Large Vision Models" by X. Chen, J. Liu, Y. Wang, P. Wang, M. Brand, G. Wang, and T. Koike-Akino
This paper proposes a generalized framework called SuperLoRA that unifies and extends different variants of low-rank adaptation (LoRA). Introducing new options with grouping, folding, shuffling, projection, and tensor decomposition, SuperLoRA offers high flexibility and demonstrates superior performance, with up to a 10-fold gain in parameter efficiency for transfer learning tasks.
Paper: https://www.merl.com/publications/TR2024-062
MERL co-organized workshops:
1. "Multimodal Algorithmic Reasoning Workshop" by A. Cherian, K-C. Peng, S. Lohit, M. Chatterjee, H. Zhou, K. Smith, T. K. Marks, J. Matthiesen, and J. Tenenbaum
Workshop link: https://marworkshop.github.io/cvpr24/index.html
2. "The 5th Workshop on Fair, Data-Efficient, and Trusted Computer Vision" by K-C. Peng, et al.
Workshop link: https://fadetrcv.github.io/2024/
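For readers unfamiliar with low-rank adaptation, the following is a minimal sketch of a plain LoRA-style weight update, the basic idea that the SuperLoRA workshop paper listed above generalizes. It is not the SuperLoRA method itself (no grouping, folding, shuffling, projection, or tensor decomposition), and the function names, the scaling convention `alpha / r`, and the example sizes are assumptions made for illustration.

```python
# Minimal sketch of a plain LoRA-style update (not SuperLoRA itself).
# All names and sizes here are illustrative assumptions.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_weight(w, b, a, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)  # A has shape (r, d_in); B has shape (d_out, r)
    delta = matmul(b, a)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_params(d_out, d_in, r):
    """Parameter counts for full fine-tuning vs. a rank-r adapter."""
    return d_out * d_in, r * (d_out + d_in)

# Frozen 2x3 weight with a rank-1 adapter.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]        # shape (2, 1)
A = [[0.5, 0.0, -0.5]]    # shape (1, 3)
W_adapted = lora_weight(W, B, A, alpha=1.0)

# For a hypothetical 768x768 layer, a rank-8 adapter trains far fewer weights.
full, low_rank = trainable_params(768, 768, 8)
```

Only `A` and `B` would be trained in such a scheme while `W` stays frozen, which is where the parameter savings come from.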
-
- Date: September 26, 2023
Where: Virtual
MERL Contact: Anoop Cherian
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief - Anoop Cherian, a Senior Principal Research Scientist in the Computer Vision team at MERL, gave a podcast interview with award-winning journalist, Deborah Yao. Deborah is the editor of AI Business -- a leading content platform for artificial intelligence and its applications in the real world, delivering its readers up-to-the-minute insights into how AI technologies are currently affecting the global economy and society. The podcast was based on the recent research that Anoop and his colleagues did at MERL with his collaborators at MIT; this research attempts to objectively answer the pertinent question: are current deep neural networks smarter than second graders? The podcast discusses shortcomings in the recent artificial general intelligence systems with regard to their capabilities for knowledge abstraction, learning, and generalization, which are brought out by this research.
-
- Date: October 2, 2023 - October 6, 2023
Where: Paris, France
MERL Contacts: Moitreya Chatterjee; Anoop Cherian; Michael J. Jones; Toshiaki Koike-Akino; Suhas Lohit; Tim K. Marks; Pedro Miraldo; Kuan-Chuan Peng; Ye Wang
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief - MERL researchers are presenting 4 papers and organizing the VLAR-SMART-101 workshop at the ICCV 2023 conference, which will be held in Paris, France, October 2-6. ICCV is one of the most prestigious and competitive international conferences in computer vision. Details are provided below.
1. Conference paper: “Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis,” by Nithin Gopalakrishnan Nair, Anoop Cherian, Suhas Lohit, Ye Wang, Toshiaki Koike-Akino, Vishal Patel, and Tim K. Marks
Conditional generative models typically demand large annotated training sets to achieve high-quality synthesis. As a result, there has been significant interest in plug-and-play generation, i.e., using a pre-defined model to guide the generative process. In this paper, we introduce Steered Diffusion, a generalized framework for fine-grained photorealistic zero-shot conditional image generation using a diffusion model trained for unconditional generation. The key idea is to steer the image generation of the diffusion model during inference via designing a loss using a pre-trained inverse model that characterizes the conditional task. Our model shows clear qualitative and quantitative improvements over state-of-the-art diffusion-based plug-and-play models, while adding negligible computational cost.
2. Conference paper: "BANSAC: A dynamic BAyesian Network for adaptive SAmple Consensus," by Valter Piedade and Pedro Miraldo
We derive a dynamic Bayesian network that updates individual data points' inlier scores while iterating RANSAC. At each iteration, we apply weighted sampling using the updated scores. Our method works with or without prior data point scorings. In addition, we use the updated inlier/outlier scoring for deriving a new stopping criterion for the RANSAC loop. Our method outperforms the baselines in accuracy while needing less computational time.
3. Conference paper: "Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes," by Fabien Delattre, David Dirnfeld, Phat Nguyen, Stephen Scarano, Michael J. Jones, Pedro Miraldo, and Erik Learned-Miller
We present a novel approach to estimating camera rotation in crowded, real-world scenes captured using a handheld monocular video camera. Our method uses a novel generalization of the Hough transform on SO(3) to efficiently find the camera rotation most compatible with the optical flow. Because the setting is not addressed well by other datasets, we provide a new dataset and benchmark, with high-accuracy and rigorously annotated ground truth on 17 video sequences. Our method is almost 40 percent more accurate than the next best method.
4. Workshop paper: "Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection" by Manish Sharma*, Moitreya Chatterjee*, Kuan-Chuan Peng, Suhas Lohit, and Michael Jones
While state-of-the-art object detection methods for RGB images have reached some level of maturity, the same is not true for infrared (IR) images. The primary bottleneck in bridging this gap is the lack of sufficient labeled IR training data. To address this issue, we present TensorFact, a novel tensor decomposition method which splits the convolution kernels of a CNN into low-rank factor matrices with fewer parameters. This compressed network is first pre-trained on RGB images and then augmented with only a few parameters. The augmented network is then trained on IR images, while freezing the weights trained on RGB. This prevents it from over-fitting, allowing it to generalize better. Experiments show that our method outperforms the state of the art.
5. “Vision-and-Language Algorithmic Reasoning (VLAR) Workshop and SMART-101 Challenge” by Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Tim K. Marks, Ram Ramrakhya, Honglu Zhou, Kevin A. Smith, Joanna Matthiesen, and Joshua B. Tenenbaum
MERL researchers, along with researchers from MIT, Georgia Tech, Math Kangaroo USA, and Rutgers University, are jointly organizing a workshop on vision-and-language algorithmic reasoning at ICCV 2023 and conducting a challenge based on the SMART-101 puzzles described in the paper "Are Deep Neural Networks SMARTer than Second Graders?" A focus of this workshop is to bring together outstanding faculty/researchers working at the intersections of vision, language, and cognition to provide their opinions on the recent breakthroughs in large language models and artificial general intelligence, as well as showcase their cutting-edge research that could inspire the audience to search for the missing pieces in our quest towards solving the puzzle of artificial intelligence.
Workshop link: https://wvlar.github.io/iccv23/
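The BANSAC summary above describes updating per-point inlier scores while RANSAC iterates and using them for weighted sampling. The sketch below illustrates only that general idea on a toy line-fitting problem. The score update here is a simple running inlier frequency, not the dynamic Bayesian network derived in the paper, and it omits the paper's stopping criterion; the function names, tolerance, and data are hypothetical.

```python
import random

def weighted_pair(weights, rng):
    """Draw two distinct indices with probability proportional to weights."""
    idx = list(range(len(weights)))
    i = rng.choices(idx, weights=weights, k=1)[0]
    rest = [k for k in idx if k != i]
    j = rng.choices(rest, weights=[weights[k] for k in rest], k=1)[0]
    return i, j

def ransac_line(points, iters=200, tol=0.1, seed=0):
    """Fit y = m*x + c with RANSAC, sampling points by their inlier scores."""
    rng = random.Random(seed)
    hits = [1.0] * len(points)  # pseudo-count of "was an inlier" per point
    best_model, best_inliers = None, []
    for t in range(1, iters + 1):
        # Heuristic score (an assumption, not the paper's update): smoothed
        # fraction of past iterations in which the point was an inlier.
        scores = [h / (t + 1) for h in hits]
        i, j = weighted_pair(scores, rng)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue  # vertical line: skipped in this simplified sketch
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        inliers = [k for k, (x, y) in enumerate(points)
                   if abs(y - (m * x + c)) <= tol]
        for k in inliers:
            hits[k] += 1.0
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (m, c), inliers
    final_scores = [h / (iters + 1) for h in hits]
    return best_model, best_inliers, final_scores

# Ten points on y = 2x + 1 plus four gross outliers.
inlier_pts = [(float(x), 2.0 * x + 1.0) for x in range(10)]
outlier_pts = [(1.0, 9.0), (3.0, -4.0), (6.0, 0.0), (8.0, 30.0)]
model, inliers, scores = ransac_line(inlier_pts + outlier_pts)
```

Because points that repeatedly agree with past hypotheses accumulate higher scores, later iterations tend to draw their minimal samples from likely inliers rather than uniformly at random.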
-
- Date: June 18, 2023 - June 22, 2023
Where: Vancouver, Canada
MERL Contacts: Anoop Cherian; Michael J. Jones; Suhas Lohit; Kuan-Chuan Peng
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief - MERL researchers are presenting 4 papers and co-organizing a workshop at the CVPR 2023 conference, which will be held in Vancouver, Canada, June 18-22. CVPR is one of the most prestigious and competitive international conferences in computer vision. Details are provided below.
1. “Are Deep Neural Networks SMARTer than Second Graders?” by Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Kevin Smith, and Joshua B. Tenenbaum
We present SMART: a Simple Multimodal Algorithmic Reasoning Task and the associated SMART-101 dataset for evaluating the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed for children in the 6-8 age group. Our experiments using SMART-101 reveal that powerful deep models are not better than random accuracy when analyzed for generalization. We also evaluate large language models (including ChatGPT) on a subset of SMART-101 and find that while these models show convincing reasoning abilities, their answers are often incorrect.
Paper: https://arxiv.org/abs/2212.09993
2. “EVAL: Explainable Video Anomaly Localization,” by Ashish Singh, Michael J. Jones, and Erik Learned-Miller
This work presents a method for detecting unusual activities in videos by building a high-level model of activities found in nominal videos of a scene. The high-level features used in the model are human understandable and include attributes such as the object class and the directions and speeds of motion. Such high-level features allow our method to not only detect anomalous activity but also to provide explanations for why it is anomalous.
Paper: https://arxiv.org/abs/2212.07900
3. "Aligning Step-by-Step Instructional Diagrams to Video Demonstrations," by Jiahao Zhang, Anoop Cherian, Yanbin Liu, Yizhak Ben-Shabat, Cristian Rodriguez, and Stephen Gould
The rise of do-it-yourself (DIY) videos on the web has made it possible even for an unskilled person (or a skilled robot) to imitate and follow instructions to complete complex real world tasks. In this paper, we consider the novel problem of aligning instruction steps that are depicted as assembly diagrams (commonly seen in Ikea assembly manuals) with video segments from in-the-wild videos. We present a new dataset: Ikea Assembly in the Wild (IAW) and propose a contrastive learning framework for aligning instruction diagrams with video clips.
Paper: https://arxiv.org/pdf/2303.13800.pdf
4. "HaLP: Hallucinating Latent Positives for Skeleton-Based Self-Supervised Learning of Actions," by Anshul Shah, Aniket Roy, Ketul Shah, Shlok Kumar Mishra, David Jacobs, Anoop Cherian, and Rama Chellappa
In this work, we propose a new contrastive learning approach to train models for skeleton-based action recognition without labels. Our key contribution is a simple module, HaLP: Hallucinating Latent Positives for contrastive learning. HaLP explores the latent space of poses in suitable directions to generate new positives. Our experiments using HaLP demonstrate strong empirical improvements.
Paper: https://arxiv.org/abs/2304.00387
The 4th Workshop on Fair, Data-Efficient, and Trusted Computer Vision
MERL researcher Kuan-Chuan Peng is co-organizing the fourth Workshop on Fair, Data-Efficient, and Trusted Computer Vision (https://fadetrcv.github.io/2023/) in conjunction with CVPR 2023 on June 18, 2023. This workshop provides a focused venue for discussing and disseminating research in the areas of fairness, bias, and trust in computer vision, as well as adjacent domains such as computational social science and public policy.
-
- Date: May 29, 2023 - June 2, 2023
Where: 2023 IEEE International Conference on Robotics and Automation (ICRA)
MERL Contacts: Anoop Cherian; Radu Corcodel; Siddarth Jain; Devesh K. Jha; Toshiaki Koike-Akino; Tim K. Marks; Daniel N. Nikovski; Arvind Raghunathan; Diego Romeres
Research Areas: Computer Vision, Machine Learning, Optimization, Robotics
Brief - MERL researchers will present thirteen papers, including eight main conference papers and five workshop papers, at the 2023 IEEE International Conference on Robotics and Automation (ICRA) to be held in London, UK from May 29 to June 2. ICRA is one of the largest and most prestigious conferences in the robotics community. The papers cover a broad set of topics in Robotics including estimation, manipulation, vision-based object recognition and segmentation, tactile estimation and tool manipulation, robotic food handling, robot skill learning, and model-based reinforcement learning.
In addition to the paper presentations, MERL robotics researchers will also host an exhibition booth and look forward to discussing our research with visitors.
-
- Date: January 12, 2023
Awarded to: William T. Freeman, Thouis R. Jones, and Egon C. Pasztor
Awarded by: IEEE Computer Society
Research Areas: Computer Vision, Machine Learning
Brief - The MERL paper entitled, "Example-Based Super-Resolution" by William T. Freeman, Thouis R. Jones, and Egon C. Pasztor, published in a 2002 issue of IEEE Computer Graphics and Applications, has been awarded a 2021 Test of Time Award by the IEEE Computer Society. This work was done while the principal investigator, Prof. Freeman, was a research scientist at MERL; he is now a Professor of Electrical Engineering and Computer Science at MIT.
This best paper award recognizes regular or special issue papers published by the magazine that have made profound and long-lasting research impacts in bridging the theory and practice of computer graphics. "This paper is an early example of using learning for a low-level vision task and we are very proud of the pioneering work that MERL has done in this area prior to the deep learning revolution," says Anthony Vetro, VP & Director at MERL.
-
- Date: November 29, 2022 - December 9, 2022
Where: NeurIPS 2022
MERL Contacts: Moitreya Chatterjee; Anoop Cherian; Michael J. Jones; Suhas Lohit
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
Brief - MERL researchers are presenting 5 papers at the NeurIPS Conference, which will be held in New Orleans from Nov 29-Dec 1st, with virtual presentations in the following week. NeurIPS is one of the most prestigious and competitive international conferences in machine learning.
MERL papers in NeurIPS 2022:
1. “AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments” by Sudipta Paul, Amit Roy-Chowdhury, and Anoop Cherian
This work proposes a unified multimodal task for audio-visual embodied navigation where the navigating agent can also interact and seek help from a human/oracle in natural language when it is uncertain of its navigation actions. We propose a multimodal deep hierarchical reinforcement learning framework for solving this challenging task that allows the agent to learn when to seek help and how to use the language instructions. AVLEN agents can interact anywhere in the 3D navigation space and demonstrate state-of-the-art performances when the audio-goal is sporadic or when distractor sounds are present.
2. “Learning Partial Equivariances From Data” by David W. Romero and Suhas Lohit
Group equivariance serves as a good prior improving data efficiency and generalization for deep neural networks, especially in settings with data or memory constraints. However, if the symmetry groups are misspecified, equivariance can be overly restrictive and lead to bad performance. This paper shows how to build partial group convolutional neural networks that learn to adapt the equivariance levels at each layer that are suitable for the task at hand directly from data. This improves performance while retaining equivariance properties approximately.
3. “Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation” by Moitreya Chatterjee, Narendra Ahuja, and Anoop Cherian
There often exist strong correlations between the 3D motion dynamics of a sounding source and its sound being heard, especially when the source is moving towards or away from the microphone. In this paper, we propose an audio-visual scene-graph that learns and leverages such correlations for improved visually-guided audio separation from an audio mixture, while also allowing predicting the direction of motion of the sound source.
4. “What Makes a "Good" Data Augmentation in Knowledge Distillation - A Statistical Perspective” by Huan Wang, Suhas Lohit, Michael Jones, and Yun Fu
This paper presents theoretical and practical results for understanding what makes a particular data augmentation technique (DA) suitable for knowledge distillation (KD). We design a simple metric that works very well in practice to predict the effectiveness of DA for KD. Based on this metric, we also propose a new data augmentation technique that outperforms other methods for knowledge distillation in image recognition networks.
5. “FeLMi : Few shot Learning with hard Mixup” by Aniket Roy, Anshul Shah, Ketul Shah, Prithviraj Dhar, Anoop Cherian, and Rama Chellappa
Learning from only a few examples is a fundamental challenge in machine learning. Recent approaches show benefits by learning a feature extractor on the abundant and labeled base examples and transferring these to the fewer novel examples. However, the latter stage is often prone to overfitting due to the small size of few-shot datasets. In this paper, we propose a novel uncertainty-based criterion to synthetically produce “hard” and useful data by mixing up real data samples. Our approach leads to state-of-the-art results on various computer vision few-shot benchmarks.
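As background for the FeLMi summary above, "mixing up" data samples usually refers to mixup-style interpolation. The sketch below shows standard mixup on a pair of feature vectors and one-hot labels. It does not implement the paper's uncertainty-based selection of "hard" samples, and the function names and the Beta parameter `alpha=0.4` are assumptions for illustration.

```python
import random

def mixup(x1, y1, x2, y2, lam):
    """Convexly combine two samples and their one-hot labels with weight lam."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y

def sample_lambda(alpha=0.4, rng=random):
    """Draw the mixing weight from Beta(alpha, alpha), as in standard mixup."""
    return rng.betavariate(alpha, alpha)

# Two samples from different classes, mixed with a fixed weight of 0.75.
x_mix, y_mix = mixup([1.0, 0.0], [1.0, 0.0],
                     [0.0, 1.0], [0.0, 1.0], lam=0.75)
```

The mixed label stays a valid probability vector, so the synthetic sample can be used with an ordinary cross-entropy loss.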
-
- Date: September 21, 2022
MERL Contacts: Philip V. Orlik; Anthony Vetro
Research Areas: Applied Physics, Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Control, Data Analytics, Dynamical Systems, Electric Systems, Electronic and Photonic Devices, Machine Learning, Multi-Physical Modeling, Optimization, Robotics, Signal Processing, Speech & Audio
Brief - Mitsubishi Electric Research Laboratories (MERL) invites qualified postdoctoral candidates to apply for the position of Postdoctoral Research Fellow. This position provides early career scientists the opportunity to work at a unique, academically-oriented industrial research laboratory. Successful candidates will be expected to define and pursue their own original research agenda, explore connections to established laboratory initiatives, and publish high impact articles in leading venues. Please refer to our web page for further details.
-
- Date: May 22, 2022 - May 27, 2022
Where: Singapore
MERL Contacts: Anoop Cherian; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Tim K. Marks; Philip V. Orlik; Kuan-Chuan Peng; Pu (Perry) Wang; Gordon Wichern
Research Areas: Artificial Intelligence, Computer Vision, Signal Processing, Speech & Audio
Brief - MERL researchers are presenting 8 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Singapore from May 22-27, 2022. A week of virtual presentations also took place earlier this month.
Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, and classification.
ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
-
- Date: May 16, 2022 - May 20, 2022
Where: Seoul, Korea
MERL Contacts: Jianlin Guo; Toshiaki Koike-Akino; Philip V. Orlik; Kieran Parsons; Pu (Perry) Wang; Ye Wang
Research Areas: Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Machine Learning, Signal Processing
Brief - MERL Connectivity & Information Processing Team scientists remotely presented 5 papers at the IEEE International Conference on Communications (ICC) 2022, held in Seoul, Korea on May 16-20, 2022. Topics presented include recent advancements in communications technologies, deep learning methods, and quantum machine learning (QML). Presentation videos are also available on our YouTube channel. In addition, K. J. Kim organized the "Industrial Private 5G-and-beyond Wireless Networks Workshop" at the conference.
IEEE ICC is one of the IEEE Communications Society’s two flagship conferences (ICC and Globecom). Each year, close to 2,000 attendees from over 70 countries attend IEEE ICC to take advantage of a program that consists of exciting keynote sessions, robust technical paper sessions, innovative tutorials and workshops, and engaging industry sessions. This 5-day event is known for bringing together audiences from both industry and academia to learn about the latest research and innovations in communications and networking technology, share ideas and best practices, and collaborate on future projects.
-
- Date: May 4, 2022
MERL Contact: Radu Corcodel
Research Areas: Computer Vision, Robotics
Brief - Radu Corcodel, a Principal Research Scientist in MERL's Computer Vision Group, will present an overview of the Robot Perception research published by MERL for advanced manipulation. The talk will mainly cover topics pertaining to robotic manipulation in unstructured environments such as machine vision, tactile sensing and autonomous grasping. The seminar will also cover specific perception problems in non-prehensile interactions such as Contact-Implicit Trajectory Optimization and Tactile Classification, and is intended for a broader audience.
-
- Date: March 1, 2022
MERL Contacts: Anoop Cherian; Chiori Hori; Jonathan Le Roux; Tim K. Marks; Anthony Vetro
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
Brief - MERL's research on scene-aware interaction was recently featured in an IEEE Spectrum article. The article, titled "At Last, A Self-Driving Car That Can Explain Itself" and authored by MERL Senior Principal Research Scientist Chiori Hori and MERL Director Anthony Vetro, gives an overview of MERL's efforts towards developing a system that can analyze multimodal sensing information for highly natural and intuitive interaction with humans through context-dependent generation of natural language. The technology recognizes contextual objects and events based on multimodal sensing information, such as images and video captured with cameras, audio information recorded with microphones, and localization information measured with LiDAR.
Scene-Aware Interaction for car navigation, one target application that the article focuses on, will provide drivers with intuitive route guidance. Scene-Aware Interaction technology is expected to have wide applicability, including human-machine interfaces for in-vehicle infotainment, interaction with service robots in building and factory automation systems, systems that monitor the health and well-being of people, surveillance systems that interpret complex scenes for humans and encourage social distancing, support for touchless operation of equipment in public areas, and much more. MERL's Scene-Aware Interaction Technology had previously been featured in a Mitsubishi Electric Corporation Press Release.
IEEE Spectrum is the flagship magazine and website of the IEEE, the world’s largest professional organization devoted to engineering and the applied sciences. IEEE Spectrum has a circulation of over 400,000 engineers worldwide, making it one of the leading science and engineering magazines.
-
- Date: September 7, 2021
MERL Contact: Anoop Cherian
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief - Anoop Cherian, a Principal Research Scientist in MERL's Computer Vision group, gave an invited virtual talk on "InSeGAN: An Unsupervised Approach to Identical Instance Segmentation" at the Visual Information Laboratory of the University of Bristol, UK. The talk described a new approach to segmenting varied appearances of nearly identical 3D objects in depth images. More details of the talk can be found in the following paper: https://arxiv.org/abs/2108.13865, which will be presented at the International Conference on Computer Vision (ICCV'21).
-
- Date: August 12, 2021
MERL Contact: Anthony Vetro
Research Areas: Artificial Intelligence, Computer Vision, Control, Dynamical Systems, Machine Learning, Optimization, Robotics
Brief - Anthony Vetro gave a keynote at the inaugural IEEE Conference on Autonomous Systems (ICAS), which was held virtually from August 11-13, 2021. The talk focused on challenges and recent progress in the area of robotic manipulation. The conference is sponsored by IEEE Signal Processing Society (SPS) through the SPS Autonomous Systems Initiative.
Abstract: Human-level manipulation continues to be beyond the capabilities of today’s robotic systems. Not only do current industrial robots require significant time to program a specific task, but they lack the flexibility to generalize to other tasks and be robust to changes in the environment. While collaborative robots help to reduce programming effort and improve the user interface, they still fall short on generalization and robustness. This talk will highlight recent advances in a number of key areas to improve the manipulation capabilities of autonomous robots, including methods to accurately model the dynamics of the robot and contact forces, sensors and signal processing algorithms to provide improved perception, optimization-based decision-making and control techniques, as well as new methods of interactivity to accelerate and enhance robot learning.
-
- Date: February 15, 2021
Where: The 2nd International Symposium on AI Electronics
MERL Contact: Chiori Hori
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
Brief - Chiori Hori, a Senior Principal Researcher in MERL's Speech and Audio Team, will be a keynote speaker at the 2nd International Symposium on AI Electronics, alongside Alex Acero, Senior Director of Apple Siri, Roberto Cipolla, Professor of Information Engineering at the University of Cambridge, and Hiroshi Amano, Professor at Nagoya University and winner of the Nobel Prize in Physics for his work on blue light-emitting diodes. The symposium, organized by Tohoku University, will be held online on February 15, 2021, 10am-4pm (JST).
Chiori's talk, titled "Human Perspective Scene Understanding via Multimodal Sensing", will present MERL's work towards the development of scene-aware interaction. One important piece of technology that is still missing for human-machine interaction is natural and context-aware interaction, where machines understand their surrounding scene from the human perspective, and they can share their understanding with humans using natural language. To bridge this communications gap, MERL has been working at the intersection of research fields such as spoken dialog, audio-visual understanding, sensor signal understanding, and robotics technologies in order to build a new AI paradigm, called scene-aware interaction, that enables machines to translate their perception and understanding of a scene and respond to it using natural language to interact more effectively with humans. In this talk, the technologies will be surveyed, and an application for future car navigation will be introduced.
-
- Date: January 6, 2021
Awarded to: Rushil Anirudh, Suhas Lohit, Pavan Turaga
MERL Contact: Suhas Lohit
Research Areas: Computational Sensing, Computer Vision, Machine Learning
Brief - A team of researchers from Mitsubishi Electric Research Laboratories (MERL), Lawrence Livermore National Laboratory (LLNL) and Arizona State University (ASU) received the Best Paper Honorable Mention Award at WACV 2021 for their paper "Generative Patch Priors for Practical Compressive Image Recovery".
The paper proposes a novel model of natural images as a composition of small patches which are obtained from a deep generative network. This is unlike prior approaches where the networks attempt to model image-level distributions and are unable to generalize outside training distributions. The key idea in this paper is that learning patch-level statistics is far easier. As the authors demonstrate, this model can then be used to efficiently solve challenging inverse problems in imaging such as compressive image recovery and inpainting even from very few measurements for diverse natural scenes.
-
- Date: October 13, 2020
MERL Contact: Siddarth Jain
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Robotics
Brief - Computer vision and robotics researcher Siddarth Jain has been appointed to the editorial board of the IEEE Robotics and Automation Letters (RA-L) as an Associate Editor. Siddarth joined MERL in September 2019 after obtaining his Ph.D. in robotics from Northwestern University, where he developed novel robotics systems to help people with motor impairments perform activities of daily living.
RA-L publishes peer-reviewed articles in areas of robotics and automation. RA-L also offers authors a unique opportunity to publish a paper in a peer-reviewed journal and present the same paper at one of the annual flagship robotics conferences of IEEE RAS, including ICRA, IROS, and CASE.
-
- Date: August 23, 2020
Where: European Conference on Computer Vision (ECCV), online, 2020
MERL Contact: Anoop Cherian
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
Brief - MERL Principal Research Scientist Anoop Cherian gave an invited talk titled "Sound2Sight: Audio-Conditioned Visual Imagination" at the Multi-modal Video Analysis workshop held in conjunction with the European Conference on Computer Vision (ECCV), 2020. The talk was based on a recent ECCV paper that describes a new multimodal reasoning task called Sound2Sight and a generative adversarial machine learning algorithm for producing plausible video sequences conditioned on sound and visual context.
-
- Date: July 22, 2020
Where: Tokyo, Japan
MERL Contacts: Anoop Cherian; Chiori Hori; Jonathan Le Roux; Tim K. Marks; Anthony Vetro
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
Brief - Mitsubishi Electric Corporation announced that the company has developed what it believes to be the world’s first technology capable of highly natural and intuitive interaction with humans based on a scene-aware capability to translate multimodal sensing information into natural language.
The novel technology, Scene-Aware Interaction, incorporates Mitsubishi Electric’s proprietary Maisart® compact AI technology to analyze multimodal sensing information for highly natural and intuitive interaction with humans through context-dependent generation of natural language. The technology recognizes contextual objects and events based on multimodal sensing information, such as images and video captured with cameras, audio information recorded with microphones, and localization information measured with LiDAR.
Scene-Aware Interaction for car navigation, one target application, will provide drivers with intuitive route guidance. The technology is also expected to have applicability to human-machine interfaces for in-vehicle infotainment, interaction with service robots in building and factory automation systems, systems that monitor the health and well-being of people, surveillance systems that interpret complex scenes for humans and encourage social distancing, support for touchless operation of equipment in public areas, and much more. The technology is based on recent research by MERL's Speech & Audio and Computer Vision groups.
-
- Date: July 12, 2020 - July 18, 2020
Where: Vienna, Austria (virtual this year)
MERL Contacts: Mouhacine Benosman; Anoop Cherian; Devesh K. Jha; Daniel N. Nikovski
Research Areas: Artificial Intelligence, Computer Vision, Data Analytics, Dynamical Systems, Machine Learning, Optimization, Robotics
Brief - MERL researchers are presenting three papers at the International Conference on Machine Learning (ICML 2020), which is being held virtually this year from July 12-18. ICML is one of the top-tier conferences in machine learning with an acceptance rate of 22%. The MERL papers are:
1) "Finite-time convergence in Continuous-Time Optimization" by Orlando Romero and Mouhacine Benosman.
2) "Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?" by Kei Ota, Tomoaki Oiki, Devesh Jha, Toshisada Mariyama, and Daniel Nikovski.
3) "Representation Learning Using Adversarially-Contrastive Optimal Transport" by Anoop Cherian and Shuchin Aeron.
-
- Date: June 14, 2020 - June 19, 2020
MERL Contacts: Anoop Cherian; Michael J. Jones; Toshiaki Koike-Akino; Tim K. Marks; Kuan-Chuan Peng; Ye Wang
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief - MERL researchers are presenting four papers (two oral papers and two posters) and organizing two workshops at the IEEE/CVF Computer Vision and Pattern Recognition (CVPR 2020) conference.
CVPR 2020 Orals with MERL authors:
1. "Dynamic Multiscale Graph Neural Networks for 3D Skeleton Based Human Motion Prediction," by Maosen Li, Siheng Chen, Yangheng Zhao, Ya Zhang, Yanfeng Wang, Qi Tian
2. "Collaborative Motion Prediction via Neural Motion Message Passing," by Yue Hu, Siheng Chen, Ya Zhang, Xiao Gu
CVPR 2020 Posters with MERL authors:
3. "LUVLi Face Alignment: Estimating Landmarks’ Location, Uncertainty, and Visibility Likelihood," by Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Ye Wang, Michael Jones, Anoop Cherian, Toshiaki Koike-Akino, Xiaoming Liu, Chen Feng
4. "MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird’s Eye View Maps," by Pengxiang Wu, Siheng Chen, Dimitris N. Metaxas
CVPR 2020 Workshops co-organized by MERL researchers:
1. Fair, Data-Efficient and Trusted Computer Vision
2. Deep Declarative Networks
-
- Date: May 4, 2020 - May 8, 2020
Where: Virtual Barcelona
MERL Contacts: Karl Berntorp; Petros T. Boufounos; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Yanting Ma; Hassan Mansour; Philip V. Orlik; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
Research Areas: Computational Sensing, Computer Vision, Machine Learning, Signal Processing, Speech & Audio
Brief - MERL researchers are presenting 13 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held virtually from May 4-8, 2020. Petros Boufounos is also presenting a talk on the Computational Sensing Revolution in Array Processing (video) in ICASSP’s Industry Track, and Siheng Chen is co-organizing and chairing a special session on a Signal-Processing View of Graph Neural Networks.
Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, array processing, and parameter estimation. Videos for all talks are available on MERL's YouTube channel, with corresponding links in the references below.
This year again, MERL is a sponsor of the conference and will be participating in the Student Job Fair; please join us to learn about our internship program and career opportunities.
ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological developments in signal and information processing. The event attracts more than 2000 participants each year. Originally planned to be held in Barcelona, Spain, ICASSP has moved to a fully virtual setting due to the COVID-19 crisis, with free registration for participants not covering a paper.
-
- Date: December 9, 2019 - December 13, 2019
Where: Waikoloa, Hawaii, USA
MERL Contacts: Jianlin Guo; Toshiaki Koike-Akino; Philip V. Orlik; Pu (Perry) Wang
Research Areas: Communications, Computer Vision, Machine Learning, Signal Processing, Information Security
Brief - MERL Signal Processing scientists and collaborators will be presenting 11 papers at the IEEE Global Communications Conference (GLOBECOM) 2019, which is being held in Waikoloa, Hawaii from December 9-13, 2019. Topics to be presented include recent advances in power amplifiers, MIMO algorithms, WiFi sensing, video casting, visible light communications, user authentication, vehicular communications, secrecy, and relay systems, including sophisticated machine learning applications. A number of these papers are the result of successful collaboration between MERL and world-leading universities, including Osaka University, University of New South Wales, Oxford University, Princeton University, South China University of Technology, Massachusetts Institute of Technology, and Aalborg University.
GLOBECOM is one of the IEEE Communications Society’s two flagship conferences dedicated to driving innovation in nearly every aspect of communications. Each year, more than 3000 scientific researchers and their management submit proposals for program sessions to be held at the annual conference. Themed “Revolutionizing Communications,” GLOBECOM 2019 will feature a comprehensive high-quality technical program including 13 symposia and a variety of tutorials and workshops to share visions and ideas, obtain updates on the latest technologies, and expand professional and social networking.
-
- Date: October 27, 2019
Awarded to: Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Chen Feng, Xiaoming Liu
MERL Contact: Tim K. Marks
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief - MERL researcher Tim Marks, former MERL interns Abhinav Kumar and Wenxuan Mou, and MERL consultants Professor Chen Feng (NYU) and Professor Xiaoming Liu (MSU) received the Best Oral Paper Award at the IEEE/CVF International Conference on Computer Vision (ICCV) 2019 Workshop on Statistical Deep Learning in Computer Vision (SDL-CV) held in Seoul, Korea. Their paper, entitled "UGLLI Face Alignment: Estimating Uncertainty with Gaussian Log-Likelihood Loss," describes a method which, given an image of a face, estimates not only the locations of facial landmarks but also the uncertainty of each landmark location estimate.
-
- Date: April 23, 2019
Awarded to: Teng-yok Lee
Research Areas: Artificial Intelligence, Computer Vision, Data Analytics, Machine Learning
Brief - MERL researcher Teng-yok Lee has won the Best Visualization Note Award at the PacificVis 2019 conference held in Bangkok, Thailand, from April 23-26, 2019. The paper, entitled "Space-Time Slicing: Visualizing Object Detector Performance in Driving Video Sequences," presents a visualization method called Space-Time Slicing to assist a human developer in the development of object detectors for driving applications without requiring labeled data. Space-Time Slicing reveals patterns in the detection data that can suggest the presence of false positives and false negatives.
-