Robust and Efficient Multimedia Retrieval for Music and Motion Data (14/06/07)
In this talk, we introduce concepts and algorithms for robust and efficient multimedia retrieval in the presence of variations. Using two different types of multimedia data, waveform-based music data and human motion data, we discuss strategies for handling object deformations and variability in the given data. To illustrate the kinds of problems encountered in content-based retrieval, we outline some typical query-by-example scenarios.
In the music domain, one often has multiple realizations of the same piece of music, such as audio recordings of different interpretations and arrangements. Given an excerpt of a specific audio recording as a query, say, the first twenty seconds of Bernstein's interpretation of Beethoven's Fifth Symphony, the objective is to find all corresponding audio clips within a given music database. In the case of Beethoven's Fifth, this includes the repetition of the theme in the exposition or in the recapitulation within the same interpretation, as well as the corresponding excerpts in all recordings of the same piece conducted, e.g., by Toscanini or Karajan. It is even more challenging to also include arrangements such as Liszt's piano transcription of Beethoven's Fifth or a synthesized version of a corresponding MIDI file. The main difficulty in such a matching scenario is that two audio clips, even though similar from a musical point of view, may exhibit significant variations in dynamics, timbre, execution of note groups, musical key, articulation, or tempo.
Switching to the motion domain, we consider a motion capture database containing a variety of human motions performed by different actors in various styles. Given a short motion clip as a query, the task is to automatically locate all database motion fragments that are in some sense similar to the query. For example, when querying for a kicking motion, one may want to retrieve all kicking motions in the database irrespective of the specific motion speed or the direction and height of the kick. Here, the variations to be handled in the retrieval process concern the spatial as well as the temporal domain.