TR2024-175

GPT Sonography: Hand Gesture Decoding from Forearm Ultrasound Images via VLM


Abstract:

Vision-language models (VLMs), such as the Generative Pre-trained Transformer
4-omni (GPT-4o), are emerging multi-modal foundation models with great potential as powerful artificial-intelligence (AI) assistance tools for a myriad of applications in the healthcare, industrial, and academic sectors. Although such foundation models perform well on a wide range of general tasks, their capability without fine-tuning is often limited on specialized tasks. However, full fine-tuning of large foundation models is challenging due to enormous computation, memory, and dataset requirements. We show that GPT-4o can decode hand gestures from forearm ultrasound data even with no fine-tuning, and that its performance improves with few-shot, retrieval-augmented in-context learning.
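
As a rough illustration of what such a pipeline might look like (a minimal sketch, not the report's actual implementation), the code below retrieves the nearest labeled ultrasound frames by feature distance and passes them as few-shot context in a GPT-4o vision prompt via the OpenAI API. The file paths, feature vectors, and gesture label set are illustrative assumptions; the report's exact prompt and retrieval scheme may differ.

```python
# Sketch of few-shot, retrieval-augmented in-context learning with GPT-4o.
# Paths, features, and labels below are hypothetical placeholders.
import base64
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def encode_image(path: str) -> str:
    """Base64-encode an ultrasound frame for the vision API."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def retrieve_examples(query_feat, bank_feats, bank_items, k=4):
    """Return the k labeled frames whose features are nearest to the query."""
    dists = np.linalg.norm(bank_feats - query_feat, axis=1)
    return [bank_items[i] for i in np.argsort(dists)[:k]]

def classify_gesture(query_path, examples, labels):
    """Ask GPT-4o for the gesture, conditioned on retrieved labeled frames."""
    content = [{"type": "text",
                "text": "Each ultrasound image below is labeled with a hand "
                        "gesture. Classify the final image as one of: "
                        + ", ".join(labels) + ". Answer with the label only."}]
    for path, label in examples:  # few-shot context: image plus its label
        content.append({"type": "image_url",
                        "image_url": {"url": "data:image/png;base64,"
                                             + encode_image(path)}})
        content.append({"type": "text", "text": f"Gesture: {label}"})
    content.append({"type": "image_url",
                    "image_url": {"url": "data:image/png;base64,"
                                         + encode_image(query_path)}})
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": content}],
    )
    return resp.choices[0].message.content.strip()
```

In this sketch, no model weights are updated: adaptation to the specialized ultrasound domain comes entirely from the retrieved in-context examples prepended to the query image.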