TR2004-115

A Unified Framework for Video Summarization, Browsing and Retrieval


    •  Yong Rui, Ziyou Xiong, Regunathan Radhakrishnan, Ajay Divakaran, Thomas S. Huang, "A Unified Framework for Video Summarization, Browsing and Retrieval", Tech. Rep. TR2004-115, Mitsubishi Electric Research Laboratories, Cambridge, MA, September 2004.
      @techreport{MERL_TR2004-115,
        author = {Yong Rui and Ziyou Xiong and Regunathan Radhakrishnan and Ajay Divakaran and Thomas S. Huang},
        title = {A Unified Framework for Video Summarization, Browsing and Retrieval},
        institution = {MERL - Mitsubishi Electric Research Laboratories},
        address = {Cambridge, MA 02139},
        number = {TR2004-115},
        month = sep,
        year = 2004,
        url = {https://www.merl.com/publications/TR2004-115/}
      }
Abstract:

Video content can be accessed using either a top-down approach or a bottom-up approach [1, 2, 3, 4]. The top-down approach, i.e., video browsing, is useful when we need to grasp the "essence" of the content. The bottom-up approach, i.e., video retrieval, is useful when we know exactly what we are looking for in the content, as shown in Fig. 1. In video summarization, the "essence" a summary should capture depends on whether the content is scripted or unscripted. Because scripted content, such as news, drama, and movies, is carefully structured as a sequence of semantic units, one can convey its essence by enabling a traversal through representative items from these semantic units. Hence, Table of Contents (ToC) based video browsing suits the summarization of scripted content. For instance, a news video composed of a sequence of stories can be summarized and browsed using a key-frame representation for each of the shots in a story. In contrast, summarizing unscripted content, such as surveillance and sports video, requires a "highlights" extraction framework that captures only the remarkable events that constitute the summary.
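To make the contrast between the two summarization styles concrete, here is a minimal Python sketch under stated assumptions; it is an illustration, not the report's implementation. Scripted content is modeled as a ToC: a list of stories, each holding shots with one representative key frame, and browsing is a walk over that structure. Unscripted content is modeled as a list of segments, each carrying a numeric `score`; keeping the segments above a threshold stands in for highlight extraction. The names `Shot`, `Story`, `Segment`, `toc_summary`, and `extract_highlights`, the `score` field, and the threshold rule are all hypothetical, since the abstract does not specify how remarkable events are detected.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Shot:
    start: float    # shot start time in seconds
    end: float      # shot end time in seconds
    key_frame: int  # index of the frame chosen to represent the shot (hypothetical)


@dataclass
class Story:
    title: str
    shots: List[Shot] = field(default_factory=list)


def toc_summary(stories: List[Story]) -> List[Tuple[str, int]]:
    """Scripted content: traverse the ToC and emit one (story, key frame)
    entry per shot, in story order."""
    return [(story.title, shot.key_frame) for story in stories for shot in story.shots]


@dataclass
class Segment:
    start: float
    end: float
    score: float  # hypothetical "remarkableness" score from an unspecified event detector


def extract_highlights(segments: List[Segment], threshold: float) -> List[Segment]:
    """Unscripted content: keep only segments whose score exceeds the threshold
    (a stand-in for a real highlights framework)."""
    return [seg for seg in segments if seg.score > threshold]


if __name__ == "__main__":
    # Hypothetical news video with two stories.
    news = [
        Story("Weather", [Shot(0.0, 4.2, 30), Shot(4.2, 9.8, 150)]),
        Story("Sports", [Shot(9.8, 15.0, 300)]),
    ]
    print(toc_summary(news))

    # Hypothetical sports video split into scored segments.
    game = [Segment(0.0, 30.0, 0.1), Segment(30.0, 45.0, 0.9), Segment(45.0, 90.0, 0.3)]
    print([(s.start, s.end) for s in extract_highlights(game, threshold=0.5)])
```

The design choice mirrors the abstract's distinction: the ToC summary keeps every semantic unit and only compresses each shot to a key frame, whereas highlight extraction discards most of the content and keeps only what an event detector flags as remarkable.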