The project's interactive approaches will focus on general users, with a straightforward retrieval paradigm that requires the minimum of retrieval effort from the user. As the majority of text searches are composed of only 2-3 keywords, this is an important consideration.
The digital content used in the project is composed of television news, newspaper archives, museum photos and personal digital photos. Material is supplied by the British Library, the Victoria & Albert Museum, the BBC, the University of Waikato and Imperial College London.
- Develop a query-by-example mode using automated content-based analysis which ultimately extracts salient visual and textual features from the multimedia objects and creates corresponding indices
- Explore new browsing paradigms which make use of the above similarity indices
- Devise new search paradigms based on lateral browsing
- Present and summarise results adequately through document clustering; automated natural language summarisation of text and speech; storyboard generation of video material
- Define new interfaces which integrate and synchronise the different modes for resource discovery in digital libraries
The UK Multimedia Knowledge Management Network consists of research teams from seven UK universities who work in this new interdisciplinary field. The aim of the network is to enhance communication between experts in both academia and industry, and to maintain shared resources for the direct benefit of the research community. The network is hosted at and maintained by Imperial College London.
- promote communication through workshops and meetings, e-mail lists and web sites
- develop a research roadmap for multimedia knowledge management and its research applications
- create and maintain shared resources for the direct benefit of the research community
- assist in training and development of new and existing researchers in the field
- promote the Network activity to potential members, beneficiaries, collaborators and users
- develop and implement a strategy for continuation of network activity after 3 years of EPSRC funding
Relevant research topics within Multimedia Knowledge Management include, but are not limited to, multimedia analysis, indexing, storage and delivery, needs elicitation and analysis, retrieval, summarisation, presentation and personalisation, crafting appropriate access environments, capitalisation and evaluation.
This research is jointly sponsored by the NSF and the EU from 2002 to 2004.
M Carey, D Heesch and S Rüger: Info Navigator: A visualization tool for document searching and browsing. Proc of Intl Conf on Distributed Multimedia Systems (DMS Sep 2003), 2003
P Au, M Carey, S Sewraz, Y Guo and S Rüger: New paradigms in information visualization. Int'l ACM Information Retrieval Conf (SIGIR, Athens, Greece), pp 307–309, ISBN 1-58113-226-3, Jul 2000
S Sewraz and S Rüger: A visual information-retrieval navigator. European Colloquium on Information Retrieval Research (ECIR, Cambridge, UK), pp 222–231, Apr 2000
This project has made a number of contributions to the area of multimedia information retrieval, ranging from i) the development and evaluation of simple image features such as texture, colour and shape, based on psychological, signal-processing and statistical methods; through ii) a novel polyphonic symbolic music representation that allows the use of ordinary text search technology such as Google's to index and search music repositories by humming; and iii) the introduction of novel automated structuring principles such as lateral similarity and search-result clustering that allow the user to browse (sub)collections intuitively; to iv) novel video summarisation schemes that are suitable, eg, for news search engines.
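The music-as-text idea in ii) can be illustrated by a deliberately simple monophonic sketch: turning a melody's pitch intervals into word-like tokens that any off-the-shelf text engine can index (the token format below is invented for illustration; the project's actual polyphonic representation is more elaborate).

```python
def notes_to_text(midi_pitches):
    """Encode a melody as a space-separated string of pitch intervals,
    so a hummed query and a stored tune match regardless of key.
    Token format "i+2" etc. is a hypothetical choice for this sketch."""
    return " ".join(f"i{b - a:+d}" for a, b in zip(midi_pitches, midi_pitches[1:]))

# The same motif in two keys yields identical "text", so a plain
# text index retrieves it from a hummed transposition.
print(notes_to_text([60, 62, 64, 60]))  # → i+2 i+2 i-4
print(notes_to_text([65, 67, 69, 65]))  # → i+2 i+2 i-4
```

Because intervals are key-invariant, an ordinary inverted-index engine can then treat melodies as documents and hummed queries as search strings.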
The overriding goal of this research has been to create an easy, intuitive and user-friendly content-based multimedia search engine. To that end, a number of research prototypes of music, image and video search engines were successfully developed and integrated into a multimedia search platform. This platform has undergone extensive metric-based evaluation in international collaborative evaluation conferences (such as TRECVID and ImageCLEF), where it has consistently proven to be amongst the top systems worldwide.
The research we have carried out so far during this fellowship has resulted in a well-designed and robust general framework for multimedia searches which lends itself to deployment in specific application areas. Ultimately, those results are bound to improve searching, browsing, discovery and access in areas such as arts and media through imaginative navigation modes; crime prevention through automated analysis of CCTV footage; intellectual property through detection of trademark duplication or copyright infringement; journalism through content-based image searches and resource discovery; medical diagnosis through finding similar images in a database; and, in general, web repositories, cultural heritage collections and multimedia digital libraries.
This research is sponsored through the award of an EPSRC Advanced Research Fellowship from Oct 1999 to Sept 2004.
We are pleased to report that considerable progress has been made, if along lines slightly different from the ones originally outlined in the proposal. Task 1 was completed early in the project, and the resulting algorithm was shown to cope well with monophonic signals. However, the methods suggested in the proposal to extend this method to handle polyphony proved impractical. We were thus forced to return to more fundamental studies of pitch detection algorithms. Substantial theoretical and experimental investigations were carried out into existing algorithms, and novel algorithms were developed and implemented that are capable of detecting notes in polyphonic music and which, we believe, represent significant advances over the current state of the art in many respects. Thus task 3, which was the most difficult fundamental part of the project, was successfully completed.
A two-step approach was adopted which divides the task of note recognition into two subtasks: (A) short-time spectral estimation of the musical signal, resulting in a time-frequency spectrum, and (B) note extraction based on the resulting spectra. Novel approaches have been developed for both the spectral analysis as well as the pattern recognition part of the note identification problem; for the former, the main novelty lies in the use of auto-regressive as opposed to conventional Fourier spectral estimators, for the latter in a combination of data classification methods and a topological approach to note identification which emphasises connectivity patterns in both time and pitch.
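As a minimal illustration of the auto-regressive idea in step (A): for a single pure tone, a least-squares AR(2) fit x[n] ≈ a1·x[n-1] + a2·x[n-2] recovers the pitch from a1 = 2·cos(ω). This is a sketch only; polyphonic transcription requires much higher model orders plus the note-extraction stage described above.

```python
import math

def ar2_frequency(x, fs):
    """Estimate the frequency of a (near-)sinusoidal signal by fitting an
    AR(2) model with least squares.  For a pure sinusoid the exact
    recurrence is x[n] = 2*cos(w)*x[n-1] - x[n-2], so w = acos(a1/2)."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for n in range(2, len(x)):           # accumulate normal equations
        s11 += x[n-1] * x[n-1]
        s12 += x[n-1] * x[n-2]
        s22 += x[n-2] * x[n-2]
        t1 += x[n] * x[n-1]
        t2 += x[n] * x[n-2]
    det = s11 * s22 - s12 * s12
    a1 = (t1 * s22 - t2 * s12) / det     # Cramer's rule for a1
    w = math.acos(max(-1.0, min(1.0, a1 / 2.0)))  # radians per sample
    return w * fs / (2 * math.pi)

# A 440 Hz tone sampled at 8 kHz
fs = 8000.0
tone = [math.sin(2 * math.pi * 440 * n / fs) for n in range(1024)]
print(round(ar2_frequency(tone, fs)))  # → 440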
The resulting algorithms were coded in Mathematica and successfully tested with digitised recordings of both mono- and polyphonic piano music with up to 3 tones occurring simultaneously. At the time of writing one paper has been published , a second one is in preparation which will contain the major part of our results , and more technical issues are contained in an as yet unpublished report .
 T von Schroeter (1998): Frequency Warping with Arbitrary Allpass Maps. IEEE Signal Processing Letters, 6, pp 116-118
 T von Schroeter and J Darlington (in preparation): Connectivity in auto-regressive spectra of polyphonic piano music - a topological approach to automated transcription.
 T von Schroeter: Auto-regressive spectral line analysis of piano tones, Technical report.
The aim of this research is to enhance multimedia retrieval applications by combining both knowledge and statistical data in a learning framework to extract semantic information from multimedia. We will approach the problem as a Bayesian learning problem divided in three parts:
- Multimedia mining: mines the feature space for the problem’s most common patterns and learns the causality relations between the occurrence of these patterns and keywords.
- Multi-modal information fusion: multi-modal features will be combined in a statistical framework to increase the prediction accuracy of keywords in new unseen content. and
- Semantic information extraction: improve the inference results obtained in the previous steps by using knowledge about keywords co-occurrences.
This project is focussed on issues around the scalability of content-based image retrieval systems. Specifically I am looking at how to efficiently index high dimensional feature vectors.
This project is concerned with learning robust statistical models of invariant image properties for automatically annotating unseen images with relevant keywords. These annotations are intended for providing text-based search access to large collections of unlabelled images and videos. We have built a prototype search engine based on these principles.
Keywords: automatic image annotation, automated image annotation, learning image captions, statistics of natural images
Finished PhD projects
This PhD project investigated the application of iterative Monte Carlo methods to the problem of parameter estimation for models of maximum entropy, minimum divergence, and maximum likelihood among the class of exponential-family densities. It showed how to apply such models to large domains in which exact computation is not practically possible.
My PhD thesis, which I successfully defended in mid-November 2005, was in the area of image analysis and image retrieval. It addressed the specific problem of how to represent the semantic richness of images and how to learn combinations of visual features that optimally model human perception of similarity. The abstract is given below:
Retrieval of images from large image archives based solely on their visual similarity to a query image provides an exciting alternative to conventional text -based search. For content-based retrieval images are represented in terms of visual features. The question of how to combine these for similarity computati on is typically addressed by eliciting relevance feedback from the user on the retrieved images. We argue in this thesis that the prevailing approach to relevance feedback suffers from three significant shortcomings: firstly, it leaves unsolved the question of how to combine features for the first retrieval; se condly, the advantage of automated content-extraction over manual annotation is greatest for large collections but if the query image is not constrained to come from the indexed collection, content-based retrieval entails imagewise comparisons leading to prohibitive response times; thirdly, users may only have vaguely defined information needs or may change their needs in the course of the interaction. The large majority of relevance feedback techniques are ill-suited for such undirected exploration. We propose a new framework of user interaction that addresses these limitations. It is centred on what we call the NNk idea. The NNk of an image are all those images that are most similar to it under some combination of features. They can be viewed as representatives of the possible semantic facets an image may exhibit to different users. The NNk idea is first applied to the problem of automated retrieval where it suggests a two-step method of relevance feedback that is shown to outperform existing techniques. In the continuation of the thesis we broaden the view and introduce NNk Networks as static structures for browsing image collections. NNk Networks are directed graphs in which every image is connected to all its NNk. NNk Networks obviate the need to articulate an information need pictorially. 
Moreover, by being entirely precomputed we achieve interaction times that are independent of the collection size. We investigate topological properties of the networks and analyse how well they capture the semantic structure of a collection. These formal analyses are complemented by a large-scale quantitative evaluation on 32,000 images and a set of realistic search tasks. Both approaches suggest that NNk Networks provide a very effective alternative to automated retrieval both for directed searching and undirected browsing.
The work is supported by AT&T Laboratories, Cambridge.
MIR Design StudiesWe are currently building a multimedia information retrieval system as a framework for research and demonstrator for applications.
Every day, around the world, events take place that change history. Because people are curious, technologies have evolved that make this information available for immediate consumption, and because people are also sentimental, for eternity. First, newspapers dominated the scene, then television was invented, and now, the Internet is ubiquitous, embedded in our lives like the very air we breathe. As a result, information is available always and everywhere. Thus, a shift has occurred - the question is no longer "Where can I find information?", the question has evolved to "How can I find the information I want?".
This report introduces GeoBrowser, a web-based graphical user interface which implements a new method of navigating through large amounts of news material.
The Image Browser is expected to be screen-aware (ie utilise the full physical screen of the user), band-width aware (ie, decrease resolution for slow connections and pre-load/cache the images which are likely to be viewed next), dynamic (so that results from an image search engine can be viewed), context-aware (so that annotations, if any, can be displayed along with the images). We have several pure image collections of up to 32,000 images which are partially annotated. There exist backend image search engines which can be integrated into the browsing process.
Video Browsing should initially be the same as image browsing, ie, one can
expect that a video has been dissected into "shots" each of which is
represented by a key-frame image. When one clicks on a key-frame, the clip
should be played in this window.
We have some 100 hours of videos dissected into shots and key-frames with
annotations from speech recognition or teletext subtitles for these
We first investigate the principles of high dimensional indexing, choose an index structure for implementing and verify our choice. Then present our design and implementation of the indexing structure. Finally, the indexing system is evaluated by intensive performance tests.
This individual study option is a study into the application of Support Vector Machines (SVMs) and a number of feature selection to Content Based Image Retrieval (CBIR). We investigate the effects of different feature selection algorithm on an image representation designed by Tieu and Viola and observe differences in retrieval performance using SVMs.
This project has a theoretical and a practical sections. The theoretical section investigates different approaches to CBIR; the practical section presents the experimental results of this study.
This project proposes to build a system which address summarization at a multimedia level. In one short sentence ths system could be described as:
"Watch the news while I was away and tell me what happened."
This project combines a Video scene change algorithm, with the current text segmentation and summarization techniques to build an automatic news summarization and extraction system.
Television broadcast news are captured both in Video/Audio format with the accompanying subtitles in text format. News stories are identified, extracted from the video, and summarized in a short paragraph which reduces the amount of information into a manageable size. Individual news video clips can be retrieved effectively by a combination of video and text, using a reversed indexed search engine to provide distilled information such as a summarized version of the orginal text and highlights important key words in the text.
The aim of this project is to develop an extractive text summarization system, and to examine how such a system might personalize its output to the level of knowledge of its user.
Its overall aims are to allow full content search and retrieval of video The system performs a number of functions. It records TV material and the respective subtitles of the teletext system, identifies video scenes from analysis of colour histograms and motion vectors, and then automatically indexes these 'video paragraphs' according to significant words detected in the subtitles. A query is typically submitted as text input. Thumbnails of keyframes are then displayed with the option to show a sentence describing the content of each shot, extracted from the subtitles, or to play back the shot itself.
This group project implements feature extraction methods for images submitted in jpeg format. These features are used to search a database of pre-computed features for images in a data bank. The database was implemented using R-trees to facilitate multi-dimensional searches.
This project is to visually present a large set of documents (returned by a search engine) in a way that the users can easily spot subsets of documents they are interested in. This to be integrated into a search engine and implemented as a web based distributed application.
This project involved producing such a system. The speech recordings took the form of BBC News broadcasts. In order to allow the indexing and then retrieval of the broadcasts, they needed to be firstly transcribed into text form. This means that a conventional text retrieval system can be used for the indexing and retrieval. Secondly the news broadcasts need to be segmented into indivdual stories which allows the retrieval to be in a more managable form for user.
The project is based on identifying relevant words in the set of hit documents and on a clustering of the hit-document set; it shall make use of the clustering to visually group the documents returned from the search and label the groups with their respective related words. Also, the navigator shall be able to browse cluster information as well as drill up or down in one or more clusters and refine the search using one or more of the suggested related keywords.