DSPRelated.com
Forums

Finding start position of one audio file in another

Started by evan...@gmail.com March 4, 2007
Just a novice question: if I have two audio files, one of which is, for instance, 5 min long and the other a 20 s segment of the first, how would I go about finding the starting position of the second within the first? I'm looking for something like an IndexOf or InStr function, but for audio files. Any ideas on where to start, or are there any COM or .NET components that would help?
Evan-

I suggest researching cross-correlation, or a closely related technique sometimes referred to as "matched filtering". In
this case your 20 s segment acts as the filter, and you are looking for the offset in the long recording where its
correlation with that filter is highest. However, 20 s at a 44.1 kHz sampling rate is 882,000 samples, and a brute-force
search has to evaluate a correlation of that length at each of the roughly 12 million candidate offsets in a 5 min file.
That is enough to make real-time cross-correlation out of the question even on the fastest PCs, so you will need to make
some tradeoffs.

Have fun :-)

-Jeff
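
A minimal Python sketch of the cross-correlation (matched filter) search described above, assuming both files have already been decoded to mono lists of samples at the same sampling rate (decoding is not shown, and `find_offset` is a hypothetical helper name). It slides the short segment along the long signal and returns the lag with the highest normalized correlation, so a copy of the segment at a different playback level still scores close to 1.0. This brute-force form costs about len(needle) × len(haystack) multiply-adds, which is exactly why a 20 s segment at 44.1 kHz forces tradeoffs; a practical implementation would typically compute the same correlation with an FFT.

```python
import math
from typing import Sequence


def find_offset(haystack: Sequence[float], needle: Sequence[float]) -> int:
    """Return the sample index in `haystack` where `needle` matches best.

    Brute-force normalized cross-correlation: the needle acts as a matched
    filter and is compared against every window of the same length.
    """
    n = len(needle)
    if n == 0 or n > len(haystack):
        raise ValueError("needle must be non-empty and no longer than haystack")

    needle_energy = math.sqrt(sum(x * x for x in needle))
    best_lag, best_score = 0, -math.inf
    for lag in range(len(haystack) - n + 1):
        window = haystack[lag:lag + n]
        dot = sum(a * b for a, b in zip(window, needle))
        window_energy = math.sqrt(sum(x * x for x in window))
        if window_energy == 0.0 or needle_energy == 0.0:
            continue  # silent window: correlation undefined, skip it
        score = dot / (window_energy * needle_energy)  # always in [-1, 1]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


if __name__ == "__main__":
    import random

    random.seed(1)
    long_signal = [random.uniform(-1.0, 1.0) for _ in range(2000)]
    segment = long_signal[1234:1334]  # a 100-sample excerpt
    print(find_offset(long_signal, segment))  # prints 1234
```

The normalization by both energies is the design choice that makes the score independent of level; dropping it gives the raw cross-correlation, which favours loud passages of the long file.
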
On Sun, Mar 04, 2007 at 08:07:52PM -0600, Jeff Brower wrote:

> However, 20 sec at 44.1 kHz sampling rate is a lot of data, enough to make real-time cross-correlation out of the
> question even with fastest PCs, so you will need to make some tradeoffs.

You can apply a text-search approach here. Just split the shorter fragment into blocks of, let's say, 0.1 s. Correlate the
first block; if the value is high, append the next block and correlate again; if not, just move on to the next position in
the longer file. If the material is music, downsampling by 4 (or even 8) before correlating may also reduce the
computational power required.

--
Grzegorz Kraszewski
http://teleinfo.pb.bialystok.pl/~krashan
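
A minimal sketch of the block-by-block "text search" combined with downsampling, again assuming decoded mono sample lists at a common rate. The block length, the 0.9 acceptance threshold and the boxcar-averaging decimator are illustrative assumptions rather than values from the post (0.1 s is 4410 samples at 44.1 kHz, or about 1100 after downsampling by 4), and `decimate`, `ncc` and `block_search` are hypothetical helper names. Each candidate offset is screened with the first block only and rejected as soon as any block falls below the threshold, so most offsets cost a single short correlation.

```python
import math
from typing import List, Optional, Sequence


def decimate(x: Sequence[float], factor: int) -> List[float]:
    """Downsample by `factor`, averaging each group of samples first.

    The boxcar average is a crude anti-aliasing low-pass (an assumption
    for illustration); a real design would use a proper FIR/IIR filter.
    """
    return [sum(x[i:i + factor]) / factor
            for i in range(0, len(x) - factor + 1, factor)]


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized correlation of two equal-length blocks, in [-1, 1]."""
    ea = math.sqrt(sum(v * v for v in a))
    eb = math.sqrt(sum(v * v for v in b))
    if ea == 0.0 or eb == 0.0:
        return 0.0
    return sum(p * q for p, q in zip(a, b)) / (ea * eb)


def block_search(haystack: Sequence[float], needle: Sequence[float],
                 block: int, threshold: float = 0.9) -> Optional[int]:
    """Return the first offset where every `block`-sample chunk of `needle`
    correlates with `haystack` at or above `threshold`, else None."""
    n = len(needle)
    for start in range(len(haystack) - n + 1):
        for b in range(0, n, block):
            chunk = needle[b:b + block]
            window = haystack[start + b:start + b + len(chunk)]
            if ncc(window, chunk) < threshold:
                break  # this block fails: reject the candidate early
        else:
            return start  # every block passed
    return None


if __name__ == "__main__":
    import random

    random.seed(2)
    long_signal = [random.uniform(-1.0, 1.0) for _ in range(4000)]
    segment = long_signal[1200:1600]
    factor = 4  # "downsampling by 4" from the post
    coarse = block_search(decimate(long_signal, factor),
                          decimate(segment, factor), block=25)
    print(coarse * factor if coarse is not None else "not found")
```

Because the search runs on decimated data, the offset it returns is only resolved to about `factor` samples at the original rate; a short full-rate correlation around that position can refine it.
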