Multimedia Search and Summarization


What are the respective affordances of computers and humans in satisfying information needs and in facilitating access to large-scale multimedia archives?


A mobile framework for landmark image recognition and classification has three major components in its architecture: a recognition and classification engine, a series of web services, and a mobile application. The engine resides on the server along with the web services, which expose its functionality to the application running on a compatible mobile device (in this case, an iPhone). The efficacy of the framework in end-to-end image classification and annotation was assessed with both automatic and user-centered evaluations.


As part of DCU's TRECVid Interactive Search activities, I developed an interactive search system that integrated with the K-Space video search engine to enable a user to query, retrieve and select shots relevant to a topic of interest. The traditional approach to presenting video search results is to maximize recall by offering a user as many potentially relevant shots as possible within a limited amount of time. 'Context'-oriented systems instead allocate a portion of the results presentation space to additional contextual cues about the returned results. In video retrieval these cues often include temporal information such as a shot's location within the overall video broadcast and/or its neighboring shots. We developed two interfaces with identical retrieval functionality in order to measure the effects of such context on user performance. The first system had a 'recall-oriented' interface, where results from a query were presented as a ranked list of shots. The second was 'context-oriented', with results presented as a ranked list of broadcasts. In the 2007 TRECVid evaluation, 10 users participated in the experiments, of whom 8 were novices and 2 were experts. Participants completed a number of retrieval topics using both the recall-oriented and context-oriented systems. In 2008, we completed a multi-site, multi-interface experiment. Three institutes took part, with 36 users in total: 12 each from Dublin City University (DCU, Ireland), University of Glasgow (GU, Scotland) and Centrum Wiskunde & Informatica (CWI, the Netherlands).
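The difference between the two presentations can be illustrated with a minimal sketch. It is not the system's actual code: the shot fields (`id`, `broadcast`, `start`, `score`) are hypothetical, and ranking each broadcast by its best-scoring shot is an assumption, since the ranking rule for broadcasts is not stated here.

```python
from collections import defaultdict


def recall_oriented(shots):
    """Present results as one flat list of shots, best score first."""
    return sorted(shots, key=lambda s: s["score"], reverse=True)


def context_oriented(shots):
    """Present results as a ranked list of broadcasts.

    Assumption: a broadcast is ranked by its best-scoring shot, and the
    shots inside a broadcast are shown in temporal order so that each
    shot appears next to its neighbours.
    """
    by_broadcast = defaultdict(list)
    for shot in shots:
        by_broadcast[shot["broadcast"]].append(shot)
    ranked = sorted(
        by_broadcast.items(),
        key=lambda item: max(s["score"] for s in item[1]),
        reverse=True,
    )
    return [
        (broadcast, sorted(group, key=lambda s: s["start"]))
        for broadcast, group in ranked
    ]


# Hypothetical query results: two shots from broadcast A, one from B.
example_shots = [
    {"id": "a1", "broadcast": "A", "start": 10, "score": 0.4},
    {"id": "b1", "broadcast": "B", "start": 5, "score": 0.9},
    {"id": "a2", "broadcast": "A", "start": 2, "score": 0.7},
]
```

With the same retrieval scores, the first function yields a shot-level ranking, while the second trades some screen space for temporal context by keeping shots grouped under their broadcast.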


As part of the TRECVid Summarization effort, I developed two solutions for summarizing BBC Rushes content. Rushes are the raw material (extra video, B-roll footage) used to produce a video. 20 to 40 times as much material may be shot as actually becomes part of the finished product. Within the TRECVid summarization task, given a video from the rushes test collection, the goal is to automatically create a summary clip less than or equal to 2% of the original video's duration. The goal of the research was to balance brevity of the summary with overall coherence and salience. Both the summarization approach and the visual composition and arrangement of the on-screen elements were informed by a number of human-centered concepts.
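To make the 2% constraint concrete: a 30-minute rushes video (1800 seconds) allows a summary of at most 36 seconds. The sketch below shows one simple way to respect such a budget. It is not either of the two solutions described above: the greedy selection by salience score and the `(start, end, score)` segment format are assumptions made for illustration.

```python
def summary_budget_seconds(original_seconds, ratio=0.02):
    """Maximum summary duration allowed for a video of the given length."""
    return original_seconds * ratio


def select_segments(segments, original_seconds, ratio=0.02):
    """Greedily pick the highest-scoring segments that fit the budget.

    segments: list of (start, end, score) tuples, times in seconds.
    The score is a hypothetical salience value supplied by the caller.
    Returns the chosen segments in temporal order, which helps coherence.
    """
    budget = summary_budget_seconds(original_seconds, ratio)
    chosen = []
    used = 0.0
    for start, end, score in sorted(segments, key=lambda s: s[2], reverse=True):
        length = end - start
        if used + length <= budget:
            chosen.append((start, end, score))
            used += length
    return sorted(chosen)


# Hypothetical candidate segments from a 1800-second rushes video.
candidates = [
    (0, 10, 0.5),
    (100, 120, 0.9),
    (300, 310, 0.8),
    (500, 520, 0.7),
]
summary = select_segments(candidates, original_seconds=1800)
```

Here the two highest-scoring segments use 30 of the 36 available seconds, and neither remaining candidate fits in the 6 seconds left, so a greedy pass can leave part of the budget unused.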


‘iBingo’ is a collaborative video search system for mobile devices. It supports division of labour among users, providing search results to co-located iPod Touch devices.
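One simple way to divide labour among co-located searchers is to make sure no two devices receive the same result. The sketch below assumes a round-robin split of a shared ranked list; how iBingo actually distributes results is not described here, and the device and shot identifiers are hypothetical.

```python
def divide_results(ranked_results, device_ids):
    """Deal a ranked result list out to devices in round-robin order.

    Each result goes to exactly one device, so users do not duplicate
    each other's effort, and every device gets a mix of high-ranked and
    lower-ranked results.
    """
    if not device_ids:
        raise ValueError("at least one device is required")
    assignment = {device: [] for device in device_ids}
    for index, result in enumerate(ranked_results):
        device = device_ids[index % len(device_ids)]
        assignment[device].append(result)
    return assignment


# Hypothetical ranked shots shared between two iPod Touch devices.
shared = divide_results(
    ["shot1", "shot2", "shot3", "shot4", "shot5"],
    ["ipod-1", "ipod-2"],
)
```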