M Sukthankar 2000: Spoken-Document Retrieval

 


A Spoken Document Retrieval System automatically indexes and then retrieves relevant items from a large collection of speech recordings in response to a user query.

This project involved producing such a system. The speech recordings took the form of BBC News broadcasts. To allow the broadcasts to be indexed and retrieved, they first needed to be transcribed into text, so that a conventional text retrieval system could be used for the indexing and retrieval. Second, the news broadcasts needed to be segmented into individual stories, so that results are returned to the user in a more manageable form.
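
The indexing and retrieval stage described above can be illustrated with a minimal sketch. It assumes the broadcasts have already been transcribed and segmented into stories, and it uses a simple TF-IDF inverted index as a stand-in for the "conventional text retrieval system"; the project's actual retrieval method is not stated here. The function names, story identifiers, and transcript text are all hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(stories):
    """Map each term to {story_id: term count} over the story transcripts."""
    index = defaultdict(dict)
    for story_id, text in stories.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][story_id] = count
    return index


def search(index, num_stories, query):
    """Rank story ids by a simple TF-IDF score against the query."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term never occurs in any transcript
        # Rarer terms get a higher weight (assumed weighting, not the project's).
        idf = math.log(1 + num_stories / len(postings))
        for story_id, tf in postings.items():
            scores[story_id] += tf * idf
    # Highest score first; ties broken by story id for a stable order.
    return sorted(scores, key=lambda s: (-scores[s], s))


# Hypothetical transcribed and segmented stories (not real broadcast data).
stories = {
    "story-1": "the chancellor announced new interest rates today",
    "story-2": "heavy rain caused flooding across the north of england",
    "story-3": "interest in the football final remains high",
}
index = build_index(stories)
results = search(index, len(stories), "interest rates")
```

Because each indexed unit is a single story rather than a whole broadcast, a query returns short, self-contained items, which is the benefit of the segmentation step.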