
Comparing two similar audio files, FFT?

Started by kieran October 8, 2008
Hello,
I am trying to compare two similar audio files (WAV). From what I have
read, I need to sample both audio files at certain frequencies, run
them through an FFT, and then compare the results. Can anyone advise me
whether this is the correct approach, and also describe the steps I
need to take to get to the stage where I can compare the files?
TIA,
Kieran
kieran wrote:
> [kieran's original post quoted in full]
WAV files contain sampled data (at any of a variety of rates). What would sampling them involve? What does it mean to compare similar sounds? Can you define similarity with software?

Jerry
--
Engineering is the art of making what you want from things you can get.
On Oct 8, 10:15 am, Jerry Avins <j...@ieee.org> wrote:
> WAV files contain sampled data (at any of a variety of rates). What would sampling them involve?
I think that is what he is trying to figure out. :)
> What does it mean to compare similar sounds? Can you define similarity with software?
Again, I think he is asking for help from someone to do that for him :)

To OP:

1. Do whatever is necessary to convert the .wav files to their discrete-time signals: http://www.sonicspot.com/guide/wavefiles.html
2. Time-warping might or might not be necessary, depending on the difference between the two sample rates: http://en.wikipedia.org/wiki/Dynamic_time_warping
3. After time-warping, truncate both signals so that their durations are equal.
4. Compute the normalized energy spectral density (ESD) from the DFT of each signal: http://en.wikipedia.org/wiki/Power_spectrum
5. Compute the mean-square error (MSE) between the normalized ESDs of the two signals: http://en.wikipedia.org/wiki/Mean_squared_error

The MSE between the normalized ESDs of two signals is a good metric of closeness. If you have, say, 10 .wav files and 2 of them are nearly the same while the others are not, the two that are close should have a relatively low MSE. Two perfectly identical signals will obviously have an MSE of zero. Ideally, two "equivalent" signals with different time scales (a 20-second human talking versus a 5-second chipmunk), different energies (a soft-spoken human versus a yelling chipmunk), and different phases (sampling began at a slightly different instant relative to the continuous-time input) should still have an MSE of zero, but quantization errors inherent in DSP will yield an MSE slightly greater than zero.

http://en.wikipedia.org/wiki/Minimum_mean-square_error

-Le Chaud Lapin-
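For concreteness, here is a minimal sketch of steps 1, 4 and 5 in Python with NumPy/SciPy. It assumes both files use the same sample rate, the file names and FFT length are placeholders, and the time-warping of step 2 is omitted:

```python
# Minimal sketch of steps 1, 4 and 5: read two WAV files, compute each one's
# normalized energy spectral density, and take the MSE between them.
# Assumes equal sample rates; "a.wav" and "b.wav" are placeholder names.
import numpy as np
from scipy.io import wavfile

def normalized_esd(x, n_fft=4096):
    x = np.asarray(x, dtype=np.float64)
    x = x - x.mean()                      # remove any DC offset
    X = np.fft.rfft(x, n=n_fft)           # DFT, zero-padded/truncated to n_fft points
    esd = np.abs(X) ** 2                  # energy spectral density
    return esd / esd.sum()                # normalize so the total energy is 1

rate_a, a = wavfile.read("a.wav")
rate_b, b = wavfile.read("b.wav")
if a.ndim > 1:                            # mix stereo down to mono
    a = a.mean(axis=1)
if b.ndim > 1:
    b = b.mean(axis=1)

mse = np.mean((normalized_esd(a) - normalized_esd(b)) ** 2)
print("MSE between normalized ESDs:", mse)
```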
On Oct 8, 6:48 pm, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> [Le Chaud Lapin's step-by-step reply quoted in full]
Hello,
Thanks for your reply; this approach seems to be the way forward. I have been working on this for the past three weeks, and much of my time has been spent learning about DSP and the FFT. When I thought I was ready to put what I had learned into Perl, I had no luck with the modules I was trying to install.
I have installed lots of modules over the past couple of weeks, but the audio-related modules have not been installing properly for me. Perhaps the modules I have been using are old and no longer supported, but I have not been able to install them properly.
Has anyone used the following modules?
Audio::FLAC::Decoder, Audio::Mad::Resample, Audio::MPEG, Audio::SNDFile.
Or can anyone suggest any modules to do the following: downsample/decode WAV files and apply a low-pass filter to an audio sample?
Hope you can help,
kieran
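For what it's worth, the same decode / low-pass / downsample chain is short in Python with SciPy. This is only an illustration of the operations, not one of the Perl modules asked about, and the file names are placeholders:

```python
# Decode a WAV file, then low-pass filter and downsample it.
# scipy.signal.decimate() applies an anti-aliasing low-pass filter before
# keeping every q-th sample, so it covers both operations at once.
import numpy as np
from scipy.io import wavfile
from scipy.signal import decimate

rate, x = wavfile.read("input.wav")        # decode (placeholder file name)
x = np.asarray(x, dtype=np.float64)
if x.ndim > 1:
    x = x.mean(axis=1)                     # mix to mono

q = 4                                      # e.g. 44100 Hz -> 11025 Hz
y = decimate(x, q)
wavfile.write("downsampled.wav", rate // q, y.astype(np.int16))
```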
On Oct 23, 4:07 pm, kieran <kieranoc...@gmail.com> wrote:
> [previous replies quoted in full]
Sorry guys, that last one was really Perl-specific. Thanks for your help.
Kieran
On Oct 8, 7:49 pm, kieran <kieranoc...@gmail.com> wrote:
> [kieran's original post quoted in full]
My two-pence: in case the wave files contain speech, your problem looks quite similar to a standard speech-recognition problem, and you may be able to take some inspiration from speech-recognition techniques. Speech recognizers usually use LPC/cepstral coefficients rather than the raw FFT for preprocessing. You may or may not require the VQ (vector quantization) / HMM (hidden Markov model) part, depending on the exact application. Dr. Lawrence Rabiner's papers/book on this subject may be a good starting point.
Regards
Piyush
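As a sketch of what "cepstral coefficients" means in practice, here is the real cepstrum of a single frame in Python/NumPy. The MFCC and LPC-cepstrum features used in actual speech recognizers add a mel filterbank or LPC analysis on top of this basic idea, and the frame and FFT sizes below are placeholders:

```python
# Real cepstrum of one frame: inverse FFT of the log magnitude spectrum.
import numpy as np

def real_cepstrum(frame, n_fft=512):
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed, n=n_fft)) + 1e-12)  # avoid log(0)
    return np.fft.irfft(log_mag)

# Typical use: keep only the first dozen or so coefficients of each frame as a
# feature vector, then compare the feature sequences of two utterances
# (for example with dynamic time warping) instead of comparing raw spectra.
```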
Hi, I'm trying to address a similar, yet still different, problem. I
will have hundreds of recordings, each consisting of about 350 ms of
data. The phase between them will differ, because recording is started
by a carrier-operated squelch trigger and captures the 100 ms buffered
before the trigger plus the 250 ms after it. As such, the actual
beginning of the recording could be off by up to 30 or 40 ms. The
waveforms, viewed on a scope, are nearly identical if they come from
the same source. The waveform from a different source is visually
different and has a different "fingerprint".

Would what you describe below be able to address my scenario?

Thanks in advance,
Jason

On Oct 8, 10:48 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> [Le Chaud Lapin's step-by-step reply quoted in full]
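One detail worth adding for the fixed 30 to 40 ms start-time offset described above: before computing any spectral distance, the two recordings can be lined up with a plain cross-correlation. This is a sketch of that idea, not something proposed in the thread:

```python
# Sketch: estimate and remove a fixed time offset between two recordings by
# cross-correlation, so a later spectral or waveform comparison sees aligned data.
import numpy as np

def align(a, b):
    """Shift b so its content lines up with a (both 1-D float arrays)."""
    corr = np.correlate(a - a.mean(), b - b.mean(), mode="full")
    lag = int(corr.argmax()) - (len(b) - 1)  # > 0: b's content leads a's by `lag` samples
    if lag > 0:
        b = np.concatenate([np.zeros(lag), b])   # delay b
    else:
        b = b[-lag:]                             # advance b
    n = min(len(a), len(b))
    return a[:n], b[:n]
```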
On Oct 30, 11:32 pm, jleg...@proxime.net wrote:
> [Jason's question quoted in full]
Hi,

I just took a look at the cepstral method for the first time, and it seems the results would be better, as indicated by other posters. It makes sense, as it takes into account the logarithmic nature of the "similarity" between two utterances, whereas the straight MMSE method does not.

Still, the MMSE method, with normalization, is a good place to start, as it is the Swiss-army knife of signal estimation. In fact, it appears that the cepstral method uses the same MMSE concept, but in a different domain (roughly, the spectrum of the log of the PSD of the original signal), which kind of makes sense, as hearing/speech sensitivity is physiologically logarithmic anyway.

On a related note, one can regard the cepstral method as one of a class of algorithms where the MMSE technique is applied not to the PSD of the signal itself but to some transformation of it.

So the answer is yes, you should get some positive results, but the cepstral method should definitely be investigated to see just how much better it is.

-Le Chaud Lapin-
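To make the "MMSE on a log scale" idea above concrete, the step-5 comparison changes very little: take the MSE over log magnitude spectra instead of raw ESDs. A sketch, with the FFT length as a placeholder:

```python
# Compare two signals with an MSE over log magnitude spectra rather than raw
# ESDs; removing each spectrum's mean also makes the measure gain-invariant.
import numpy as np

def log_spectrum(x, n_fft=4096):
    x = np.asarray(x, dtype=np.float64)
    X = np.fft.rfft(x - x.mean(), n=n_fft)
    return np.log(np.abs(X) + 1e-12)

def log_spectral_mse(a, b, n_fft=4096):
    la = log_spectrum(a, n_fft)
    lb = log_spectrum(b, n_fft)
    la -= la.mean()
    lb -= lb.mean()
    return np.mean((la - lb) ** 2)
```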
Excellent.  Thanks!  I'll be progressing on this over the next few
weeks as a side project.


On Oct 31, 1:14 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> [the earlier exchange and Le Chaud Lapin's reply on the cepstral method, quoted in full]
Well, I ended up using cohere() in Octave, and it compares exactly
what you mentioned. The issue is that when I look at two waveforms I
know are "different", the initial onset starts at a different point in
the cycle on each (one at about 90 degrees, the other at about 240
degrees). Essentially, we are trying to fingerprint some transmitters.
The visual waveforms are indeed unique per radio, but the MSE between
them approaches 0, to the point that it's nearly equal to the MSE
between two waveforms from the same radio on some samples. The false
positive rate is a little on the high side. Is it acceptable to take
sliding differentials on the waveform, with sufficient overlap, and
use that as another data point?

On Oct 31, 12:57 am, jleg...@proxime.net wrote:
> [the earlier exchange quoted in full]
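One way to read "sliding differentials with sufficient overlap" is to frame each recording into overlapping windows, take a log spectrum per frame, and append the frame-to-frame difference as an extra feature. This is only one interpretation, sketched below; the window, hop, and FFT sizes are placeholders:

```python
# Sketch: overlapping-frame log spectra plus their frame-to-frame differences,
# as a per-recording "fingerprint" that keeps some timing information.
import numpy as np

def framed_log_spectra(x, frame=256, hop=64, n_fft=256):
    x = np.asarray(x, dtype=np.float64)
    frames = []
    for start in range(0, len(x) - frame + 1, hop):
        seg = x[start:start + frame] * np.hanning(frame)
        frames.append(np.log(np.abs(np.fft.rfft(seg, n=n_fft)) + 1e-12))
    return np.array(frames)                     # shape: (n_frames, n_fft // 2 + 1)

def fingerprint(x):
    S = framed_log_spectra(x)
    dS = np.diff(S, axis=0)                     # "sliding differential" between frames
    return np.concatenate([S[1:], dS], axis=1)  # static + delta features per frame

# Two recordings from the same transmitter should give fingerprints with a small
# mean-square difference over the frames they share after alignment.
```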