A multimodal data analysis is applied to concurrent visual and auditory signals and, when available, to closed-caption text. The analysis is general and unstructured; it can be applied, for example, to broadcast video in any language. We have applied it to automatically identify and segment news broadcasts, although the methods apply to other broadcast types as well. They break the news broadcast stream into separate stories and determine key frames for each story; when closed captions are available, each story can also be labeled with its topic. The story segmentation is robust and has been applied to broadcasts in both Japanese and English; for the latter we have assembled a very large and rich collection spanning multiple months.
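The output of such segmentation can be represented simply. The sketch below is a minimal, hypothetical illustration (the names `to_stories`, `boundaries`, and the midpoint key-frame rule are assumptions, not the actual method): it turns detected story-boundary times into (start, end) segments and picks a naive representative key frame for each.

```python
def to_stories(boundaries, duration):
    """Turn detected story-boundary times (seconds) into story segments.

    Each story is a dict with its start/end offsets and, as a naive
    placeholder for key-frame selection, the midpoint timestamp.
    """
    edges = [0.0] + sorted(boundaries) + [duration]
    stories = []
    for start, end in zip(edges, edges[1:]):
        stories.append({
            "start": start,
            "end": end,
            "key_frame": (start + end) / 2.0,  # stand-in for real key-frame choice
        })
    return stories
```

In the real system the boundaries come from the multimodal analysis itself, and key frames are chosen by visual criteria rather than a midpoint; a topic label from closed captions would be attached to each story dict when available.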
The figure shows topics arranged by interestingness for a given period of time. The hottest topics (those most widely reported) appear in the central column; the side columns hold topics of lesser impact by the interestingness measure. When the user presses the play button, time progresses and topics flow upward, moving into or out of the central column as they heat up or cool down. At any point, the user can click on a frame and get the whole story. Because topics rather than channels are displayed, the user gets a fast overview of the news over any period of time, and the display scales to any number of channels. Measures other than hotness could also be used; for example, the display could highlight topics reported by only one or a few stations, perhaps indicating local news. This work is directed by Jianping Fan.
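The column assignment described above can be sketched as a simple ranking rule. This is an illustrative assumption, not the system's actual layout code: `interestingness` here is just a story count, `n_central` is a hypothetical parameter, and the remaining topics simply alternate between the two side columns.

```python
def layout_columns(topics, n_central=3):
    """Place the hottest topics in the central column.

    `topics` is a list of (name, interestingness) pairs, where
    interestingness might be, e.g., how often the topic was reported.
    Returns (left, central, right) lists of topic names; topics beyond
    the top n_central alternate between the side columns.
    """
    ranked = sorted(topics, key=lambda t: t[1], reverse=True)
    central = [name for name, _ in ranked[:n_central]]
    left, right = [], []
    for i, (name, _) in enumerate(ranked[n_central:]):
        (left if i % 2 == 0 else right).append(name)
    return left, central, right
```

Swapping in a different scoring function (for instance, one that rewards topics covered by only a few stations) would redirect the central column toward local news without changing the layout logic.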