NEWS MERL Papers and Workshops at CVPR 2024

Date released: May 10, 2024

Date:

June 17, 2024 - June 21, 2024
Where:

Seattle, WA
Description:

MERL researchers are presenting 5 conference papers and 3 workshop papers, and are co-organizing two workshops, at the CVPR 2024 conference, which will be held in Seattle, June 17-21. CVPR is one of the most prestigious and competitive international conferences in computer vision. Details of MERL's contributions are provided below.

CVPR Conference Papers:

1. "TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models" by H. Ni, B. Egger, S. Lohit, A. Cherian, Y. Wang, T. Koike-Akino, S. X. Huang, and T. K. Marks

This work enables a pretrained text-to-video (T2V) diffusion model to be additionally conditioned on an input image (first video frame), yielding a text+image to video (TI2V) model. Other than using the pretrained T2V model, our method requires no ("zero") training or fine-tuning. The paper uses a "repeat-and-slide" method and diffusion resampling to synthesize videos from a given starting image and text describing the video content.

Paper: https://www.merl.com/publications/TR2024-059
Project page: https://merl.com/research/highlights/TI2V-Zero
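To make the "repeat-and-slide" idea concrete, here is a minimal sketch of the frame-queue bookkeeping only, under clearly stated assumptions. `denoise_next_frame` is a hypothetical placeholder for one conditioned sampling pass of the pretrained T2V model (it merely perturbs the latest frame), the window length of 4 is arbitrary, and diffusion resampling is left out entirely. This is not the TI2V-Zero implementation.

```python
from collections import deque

import numpy as np


def denoise_next_frame(window: np.ndarray, text: str, rng: np.random.Generator) -> np.ndarray:
    """HYPOTHETICAL stand-in for a T2V diffusion pass.

    In the real method, the pretrained model would denoise a clip whose leading
    frames are tied to the known frames in `window`; here we only perturb the
    most recent frame so that the queue mechanics can be run end to end.
    """
    return window[-1] + 0.01 * rng.standard_normal(window[-1].shape)


def repeat_and_slide(first_frame: np.ndarray, text: str, window_len: int = 4,
                     num_new_frames: int = 8, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    # "Repeat": fill the conditioning window with copies of the input image.
    queue = deque([first_frame.copy() for _ in range(window_len)], maxlen=window_len)
    video = [first_frame]
    for _ in range(num_new_frames):
        new_frame = denoise_next_frame(np.stack(list(queue)), text, rng)
        # "Slide": appending to a full deque with maxlen drops the oldest frame.
        queue.append(new_frame)
        video.append(new_frame)
    return np.stack(video)


frames = repeat_and_slide(np.zeros((8, 8, 3)), "a boat drifting on a lake")
print(frames.shape)  # (9, 8, 8, 3): the input frame plus 8 generated frames
```

The queue always holds the most recent `window_len` frames, so every new frame is conditioned on the starting image at first and on previously generated frames later on.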

2. "Long-Tailed Anomaly Detection with Learnable Class Names" by C.-H. Ho, K.-C. Peng, and N. Vasconcelos

This work aims to identify defects across various classes without relying on hard-coded class names. We introduce the concept of long-tailed anomaly detection, addressing challenges like class imbalance and dataset variability. Our proposed method combines reconstruction and semantic modules, learning pseudo-class names and utilizing a variational autoencoder for feature synthesis to improve performance in long-tailed datasets, outperforming existing methods in experiments.

Paper: https://www.merl.com/publications/TR2024-040

3. "Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling" by X. Liu, Y.-W. Tai, C.-K. Tang, P. Miraldo, S. Lohit, and M. Chatterjee

This work presents a new strategy for rendering dynamic scenes from novel viewpoints. Our approach stratifies the scene into regions according to their extent of motion, which is determined automatically. Regions with more motion receive denser spatio-temporal sampling, yielding a more faithful rendering of the scene. Additionally, to the best of our knowledge, ours is the first work to enable tracking of objects in the scene from novel views, where the user indicates the object of interest with a click.

Paper: https://www.merl.com/publications/TR2024-042
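The motion-stratified sampling described above can be illustrated with a small sketch. It assumes a per-region motion magnitude has already been estimated, splits regions into a hypothetical number of "gear" levels at motion quantiles, and doubles the sample budget with each gear. The quantile thresholds, the doubling rule, and both function names are our illustrative choices, not Gear-NeRF's actual settings.

```python
import numpy as np


def assign_gears(motion: np.ndarray, num_gears: int = 4) -> np.ndarray:
    """Map per-region motion magnitudes to integer gear levels 0..num_gears-1.

    Thresholds sit at evenly spaced quantiles of the observed motion, so each
    gear receives roughly the same number of regions (an assumption of this sketch).
    """
    inner_quantiles = np.linspace(0.0, 1.0, num_gears + 1)[1:-1]
    thresholds = np.quantile(motion, inner_quantiles)
    return np.digitize(motion, thresholds)


def samples_per_region(gears: np.ndarray, base_samples: int = 16) -> np.ndarray:
    # Higher gear (more motion) -> denser spatio-temporal sampling; here the
    # budget simply doubles per gear level (hypothetical rule).
    return base_samples * 2 ** gears


motion = np.array([0.0, 0.1, 0.5, 2.0])  # toy motion magnitudes for four regions
gears = assign_gears(motion)
print(gears.tolist(), samples_per_region(gears).tolist())  # [0, 1, 2, 3] [16, 32, 64, 128]
```

Static background regions stay in the lowest gear and cost little, while the sample budget is concentrated where the scene actually changes.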

4. "SIRA: Scalable Inter-frame Relation and Association for Radar Perception" by R. Yataka, P. Wang, P. T. Boufounos, and R. Takahashi

To overcome the limitations of radar feature extraction, such as low spatial resolution, multipath reflection, and motion blur, this paper proposes SIRA (Scalable Inter-frame Relation and Association) for scalable radar perception. SIRA has two designs: 1) an extended temporal relation, which generalizes the existing temporal relation layer from two frames to multiple inter-frames, using temporally regrouped window attention for scalability; and 2) a motion consistency track, which uses a pseudo-tracklet generated from observational data for better object association.

Paper: https://www.merl.com/publications/TR2024-041
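As a rough illustration of attention that spans several frames while staying scalable, the sketch below restricts self-attention to tokens that share a spatial window, pooling those tokens across all input frames. It is a simplified stand-in for SIRA's temporally regrouped window attention, not the paper's layer: the function name, the fixed window partition, and the use of raw features as queries, keys, and values (no learned projections, a single head) are assumptions made for brevity.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_window_attention(feats: np.ndarray, window: int) -> np.ndarray:
    """Self-attention among tokens that share a spatial window across all frames.

    feats: (T, H, W, C) feature maps from T radar frames; H and W must be
    divisible by `window`. Raw features serve as queries, keys, and values.
    """
    T, H, W, C = feats.shape
    nh, nw = H // window, W // window
    # (T, nh, win, nw, win, C) -> (nh, nw, T, win, win, C): group tokens by spatial window.
    x = feats.reshape(T, nh, window, nw, window, C).transpose(1, 3, 0, 2, 4, 5)
    tokens = x.reshape(nh * nw, T * window * window, C)
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(C)
    out = softmax(scores) @ tokens
    # Undo the grouping to recover the (T, H, W, C) layout.
    out = out.reshape(nh, nw, T, window, window, C).transpose(2, 0, 3, 1, 4, 5)
    return out.reshape(T, H, W, C)


rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 8, 8, 16))  # 3 frames, 8x8 grid, 16 channels
out = temporal_window_attention(feats, window=4)
print(out.shape)  # (3, 8, 8, 16)
```

Because each window attends only within itself, the cost grows linearly with the number of windows instead of quadratically with the total number of tokens across all frames, which is what makes adding more frames affordable.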

5. "RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation" by Z. Yang, J. Liu, P. Chen, A. Cherian, T. K. Marks, J. Le Roux, and C. Gan

We leverage large language models (LLMs) for zero-shot semantic audio-visual navigation. Specifically, using multimodal models to process sensory data, we instruct an LLM-based planner to actively explore the environment while adaptively evaluating and dismissing inaccurate perceptual descriptions.

Paper: https://www.merl.com/publications/TR2024-043

CVPR Workshop Papers:

1. "CoLa-SDF: Controllable Latent StyleSDF for Disentangled 3D Face Generation" by R. Dey, B. Egger, V. Boddeti, Y. Wang, and T. K. Marks

This paper proposes a new method for generating 3D faces and rendering them to images that combines the controllability of nonlinear 3D morphable models (3DMMs) with the high fidelity of implicit 3D GANs. Inspired by StyleSDF, our model uses a similar architecture but constrains its latent space to match the interpretable, physical parameters of the nonlinear 3D morphable model MOST-GAN.

Paper: https://www.merl.com/publications/TR2024-045

2. "Tracklet-based Explainable Video Anomaly Localization" by A. Singh, M. J. Jones, and E. Learned-Miller

This paper describes a new method for localizing anomalous activity in video of a scene given sample videos of normal activity from the same scene. The method is based on detecting and tracking objects in the scene and estimating high-level attributes of the objects such as their location, size, short-term trajectory and object class. These high-level attributes can then be used to detect unusual activity as well as to provide a human-understandable explanation for what is unusual about the activity.

Paper: https://www.merl.com/publications/TR2024-057
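The attribute-based detection and explanation step can be sketched as a nearest-neighbour comparison against tracklets observed in normal video. The attribute list, the assumption that attributes are pre-normalized, the Euclidean distance, the restriction to exemplars of the same object class, and the "largest single-attribute deviation" explanation rule are all our illustrative assumptions; the paper's actual scoring may differ.

```python
from dataclasses import dataclass

import numpy as np

# Hypothetical attribute layout: location, size, and short-term motion.
ATTRIBUTES = ["x", "y", "width", "height", "dx", "dy"]


@dataclass
class Tracklet:
    object_class: str
    features: np.ndarray  # one value per entry in ATTRIBUTES, assumed pre-normalized


def score_and_explain(test: Tracklet, normal: list[Tracklet]) -> tuple[float, str]:
    """Return (anomaly score, human-readable explanation) for one test tracklet."""
    same_class = [t.features for t in normal if t.object_class == test.object_class]
    if not same_class:
        return float("inf"), f"no '{test.object_class}' objects seen in normal video"
    exemplars = np.stack(same_class)
    dists = np.linalg.norm(exemplars - test.features, axis=1)
    nearest = exemplars[np.argmin(dists)]
    # Explain the anomaly by the attribute that deviates most from the closest normal exemplar.
    worst = int(np.argmax(np.abs(test.features - nearest)))
    return float(dists.min()), f"unusual {ATTRIBUTES[worst]} for a {test.object_class}"


normal = [
    Tracklet("person", np.array([0.5, 0.5, 0.1, 0.3, 0.01, 0.0])),
    Tracklet("person", np.array([0.2, 0.6, 0.1, 0.3, 0.0, 0.01])),
    Tracklet("car", np.array([0.7, 0.4, 0.3, 0.2, 0.1, 0.0])),
]
runner = Tracklet("person", np.array([0.5, 0.5, 0.1, 0.3, 0.2, 0.0]))
score, why = score_and_explain(runner, normal)
print(round(score, 3), why)  # 0.19 unusual dx for a person
```

Working with interpretable attributes rather than raw pixels is what allows the detector to say *why* something is unusual, here a person moving faster horizontally than any person seen in the normal footage.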

MERL co-organized workshops:

1. "Multimodal Algorithmic Reasoning Workshop" by A. Cherian, K.-C. Peng, S. Lohit, M. Chatterjee, H. Zhou, K. Smith, T. K. Marks, J. Matthiesen, and J. Tenenbaum

Workshop link: https://marworkshop.github.io/cvpr24/index.html

2. "The 5th Workshop on Fair, Data-Efficient, and Trusted Computer Vision" by K.-C. Peng et al.

Workshop link: https://fadetrcv.github.io/2024/

CVPR Workshop Papers (continued):

3. "SuperLoRA: Parameter-Efficient Unified Adaptation for Large Vision Models" by X. Chen, J. Liu, Y. Wang, P. Wang, M. Brand, G. Wang, and T. Koike-Akino

This paper proposes SuperLoRA, a generalized framework that unifies and extends different variants of low-rank adaptation (LoRA). By introducing new options such as grouping, folding, shuffling, projection, and tensor decomposition, SuperLoRA offers high flexibility and demonstrates superior performance, with up to a 10-fold gain in parameter efficiency on transfer learning tasks.

Paper: https://www.merl.com/publications/TR2024-062
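For readers unfamiliar with LoRA, the sketch below contrasts the standard per-layer low-rank update (each weight matrix W gets its own trainable pair B, A with ΔW = BA) with one simplified reading of the "grouping" option: the updates for several layers are generated jointly from a single low-rank pair and then cut back into per-layer shapes. The function names, the near-square reshaping, and the toy layer sizes are our assumptions; folding, shuffling, projection, and tensor decomposition are not shown, and the numbers are unrelated to the paper's reported results.

```python
import numpy as np


def lora_params(shapes, rank):
    """Trainable parameters when every layer gets its own LoRA pair (B: d_out x r, A: r x d_in)."""
    return sum(rank * (d_out + d_in) for d_out, d_in in shapes)


def grouped_lora(shapes, rank, rng):
    """Generate the updates for a group of layers from a single low-rank pair.

    The joint update is produced as one near-square matrix B @ A, flattened,
    and cut back into the per-layer shapes. This is a simplified stand-in for
    SuperLoRA's grouping and reshaping, not its actual implementation.
    """
    total = sum(d_out * d_in for d_out, d_in in shapes)
    rows = int(np.ceil(np.sqrt(total)))
    cols = int(np.ceil(total / rows))
    A = rng.standard_normal((rank, cols)) * 0.01
    B = np.zeros((rows, rank))  # zero init, as in LoRA, so training starts from the pretrained weights
    flat = (B @ A).ravel()[:total]
    deltas, start = [], 0
    for d_out, d_in in shapes:
        deltas.append(flat[start:start + d_out * d_in].reshape(d_out, d_in))
        start += d_out * d_in
    return deltas, A.size + B.size


rng = np.random.default_rng(0)
shapes = [(64, 64)] * 4  # hypothetical: four 64x64 projection matrices
deltas, grouped = grouped_lora(shapes, rank=4, rng=rng)
print(lora_params(shapes, rank=4), grouped)  # 2048 1024
```

At the same rank, sharing one factor pair across the group halves the trainable parameters in this toy setting; SuperLoRA's additional options trade off such savings against expressiveness in more flexible ways.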
External Link:

https://cvpr.thecvf.com/
Research Areas:

Artificial Intelligence, Computational Sensing, Computer Vision, Machine Learning, Speech & Audio
Related Publications:
  Ni, H., Egger, B., Lohit, S., Cherian, A., Wang, Y., Koike-Akino, T., Huang, S.X., Marks, T.K., "TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2024, pp. 9015-9025.
  @inproceedings{Ni2024jun,
  author = {Ni, Haomiao and Egger, Bernhard and Lohit, Suhas and Cherian, Anoop and Wang, Ye and Koike-Akino, Toshiaki and Huang, Sharon X. and Marks, Tim K.},
  title = {{TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models}},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = 2024,
  pages = {9015--9025},
  month = jun,
  url = {https://www.merl.com/publications/TR2024-059}
  }
  Singh, A., Jones, M.J., Learned-Miller, E., "Tracklet-based Explainable Video Anomaly Localization", IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, May 2024, pp. 3992-4001.
  @inproceedings{Singh2024may,
  author = {Singh, Ashish and Jones, Michael J. and Learned-Miller, Erik},
  title = {{Tracklet-based Explainable Video Anomaly Localization}},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  year = 2024,
  pages = {3992--4001},
  month = may,
  url = {https://www.merl.com/publications/TR2024-057}
  }
  Dey, R., Egger, B., Boddeti, V., Wang, Y., Marks, T.K., "CoLa-SDF: Controllable Latent StyleSDF for Disentangled 3D Face Generation", IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), DOI: 10.1109/CVPRW63382.2024.00291, June 2024, pp. 2852-2861.
  @inproceedings{Dey2024apr,
  author = {Dey, Rahul and Egger, Bernhard and Boddeti, Vishnu and Wang, Ye and Marks, Tim K.},
  title = {{CoLa-SDF: Controllable Latent StyleSDF for Disentangled 3D Face Generation}},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year = 2024,
  pages = {2852--2861},
  month = apr,
  publisher = {IEEE},
  doi = {10.1109/CVPRW63382.2024.00291},
  isbn = {979-8-3503-6547-4},
  url = {https://www.merl.com/publications/TR2024-045}
  }
  Yang, Z., Liu, J., Chen, P., Cherian, A., Marks, T.K., Le Roux, J., Gan, C., "RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), April 2024, pp. 16251-16261.
  @inproceedings{Yang2024apr,
  author = {Yang, Zeyuan and Liu, Jiageng and Chen, Peihao and Cherian, Anoop and Marks, Tim K. and {Le Roux}, Jonathan and Gan, Chuang},
  title = {{RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation}},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = 2024,
  pages = {16251--16261},
  month = apr,
  publisher = {CVF},
  url = {https://www.merl.com/publications/TR2024-043}
  }
  Liu, X., Tai, Y.-W., Tang, C.-K., Miraldo, P., Lohit, S., Chatterjee, M., "Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), May 2024, pp. 19667-19679.
  @inproceedings{Liu2024may,
  author = {Liu, Xinhang and Tai, Yu-Wing and Tang, Chi-Keung and Miraldo, Pedro and Lohit, Suhas and Chatterjee, Moitreya},
  title = {{Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling}},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = 2024,
  pages = {19667--19679},
  month = may,
  publisher = {IEEE},
  url = {https://www.merl.com/publications/TR2024-042}
  }
  Yataka, R., Wang, P., Boufounos, P.T., Takahashi, R., "SIRA: Scalable Inter-frame Relation and Association for Radar Perception", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2024, pp. 15024-15034.
  @inproceedings{Yataka2024jun,
  author = {Yataka, Ryoma and Wang, Pu and Boufounos, Petros T. and Takahashi, Ryuhei},
  title = {{SIRA: Scalable Inter-frame Relation and Association for Radar Perception}},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = 2024,
  pages = {15024--15034},
  month = jun,
  url = {https://www.merl.com/publications/TR2024-041}
  }
  Ho, C.-H., Peng, K.-C., Vasconcelos, N., "Long-Tailed Anomaly Detection with Learnable Class Names", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Farhadi, A. and Crandall, D. and Sato, I. and Wu, J. and Pless, R. and Akata, Z., Eds., DOI: 10.1109/CVPR52733.2024.01182, June 2024, pp. 12435-12446.
  @inproceedings{Ho2024jun,
  author = {Ho, Chih-Hui and Peng, Kuan-Chuan and Vasconcelos, Nuno},
  title = {{Long-Tailed Anomaly Detection with Learnable Class Names}},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = 2024,
  editor = {Farhadi, A. and Crandall, D. and Sato, I. and Wu, J. and Pless, R. and Akata, Z.},
  pages = {12435--12446},
  month = jun,
  publisher = {IEEE},
  doi = {10.1109/CVPR52733.2024.01182},
  issn = {2575-7075},
  isbn = {979-8-3503-5300-6},
  url = {https://www.merl.com/publications/TR2024-040}
  }
  Chen, X., Liu, J., Wang, Y., Wang, P., Brand, M., Wang, G., Koike-Akino, T., "SuperLoRA: Parameter-Efficient Unified Adaptation for Large Vision Models", IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), DOI: 10.1109/CVPRW63382.2024.00804, June 2024, pp. 8050-8055.
  @inproceedings{Chen2024jun,
  author = {Chen, Xiangyu and Liu, Jing and Wang, Ye and Wang, Pu and Brand, Matthew and Wang, Guanghui and Koike-Akino, Toshiaki},
  title = {{SuperLoRA: Parameter-Efficient Unified Adaptation for Large Vision Models}},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year = 2024,
  pages = {8050--8055},
  month = jun,
  publisher = {IEEE},
  doi = {10.1109/CVPRW63382.2024.00804},
  url = {https://www.merl.com/publications/TR2024-062}
  }