
Comparing two similar audio files, FFT?

Started by kieran October 8, 2008
Hello,
I am trying to compare two similar audio files (WAV). From what I have
read, I need to sample both audio files at certain frequencies, run
them through an FFT, and then compare the results. Can anyone advise me
whether this is the correct approach, and also describe the steps I
need to take to get to the stage where I can compare the files?
TIA,
Kieran
kieran wrote:
> [kieran's original post quoted in full]
WAV files contain sampled data (at any of a variety of rates). What would sampling them involve? What does it mean to compare similar sounds? Can you define similarity with software?

Jerry
--
Engineering is the art of making what you want from things you can get.
On Oct 8, 10:15 am, Jerry Avins <j...@ieee.org> wrote:
> WAV files contain sampled data (at any of a variety of rates). What would sampling them involve?
I think that is what he is trying to figure out. :)
> What does it mean to compare similar sounds? Can you define similarity with software?
Again, I think he is asking for help from someone to do that for him :)

To OP:

1. Do whatever is necessary to convert the .wav files to their discrete-time signals: http://www.sonicspot.com/guide/wavefiles.html
2. Time-warping might or might not be necessary, depending on the difference between the two sample rates: http://en.wikipedia.org/wiki/Dynamic_time_warping
3. After time-warping, truncate both signals so that their durations are equal.
4. Compute the normalized energy spectral density (ESD) from the DFT of each signal: http://en.wikipedia.org/wiki/Power_spectrum
5. Compute the mean-square error (MSE) between the normalized ESDs of the two signals: http://en.wikipedia.org/wiki/Mean_squared_error

The MSE between the normalized ESDs of two signals is a good metric of closeness. If you have, say, 10 .wav files and 2 of them are nearly the same while the others are not, the two that are close should have a relatively low MSE. Two perfectly identical signals will obviously have an MSE of zero. Ideally, two "equivalent" signals with different time scales (a 20-second human talking versus a 5-second chipmunk), different energies (a soft-spoken human versus a yelling chipmunk), and different phases (sampling began at a slightly different instant relative to the continuous-time input) should still have an MSE of zero, but quantization errors inherent in DSP will yield an MSE slightly greater than zero.

http://en.wikipedia.org/wiki/Minimum_mean-square_error

-Le Chaud Lapin-
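For concreteness, here is a minimal sketch of steps 1, 4 and 5 in Python with NumPy/SciPy. It assumes both files use the same sample rate, the file names and FFT length are placeholders, and the time-warping of step 2 is omitted:

```python
# Minimal sketch of steps 1, 4 and 5: read two WAV files, compute each one's
# normalized energy spectral density, and take the MSE between them.
# Assumes equal sample rates; "a.wav" and "b.wav" are placeholder names.
import numpy as np
from scipy.io import wavfile

def normalized_esd(x, n_fft=4096):
    x = np.asarray(x, dtype=np.float64)
    x = x - x.mean()                      # remove any DC offset
    X = np.fft.rfft(x, n=n_fft)           # DFT, zero-padded/truncated to n_fft points
    esd = np.abs(X) ** 2                  # energy spectral density
    return esd / esd.sum()                # normalize so the total energy is 1

rate_a, a = wavfile.read("a.wav")
rate_b, b = wavfile.read("b.wav")
if a.ndim > 1:                            # mix stereo down to mono
    a = a.mean(axis=1)
if b.ndim > 1:
    b = b.mean(axis=1)

mse = np.mean((normalized_esd(a) - normalized_esd(b)) ** 2)
print("MSE between normalized ESDs:", mse)
```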
On Oct 8, 6:48 pm, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> [Le Chaud Lapin's step-by-step reply quoted in full]
Hello,
Thanks for your reply; this approach seems to be the way forward. I have been working on this for the past three weeks, and much of my time has been spent learning about DSP and the FFT. When I thought I was ready to put what I had learned into Perl, I had no luck with the modules I was trying to install.
I have installed lots of modules over the past couple of weeks, but the audio-related modules have not been installing properly for me. Perhaps the modules I have been using are old and no longer supported, but I have not been able to install them properly.
Has anyone used the following modules?
Audio::FLAC::Decoder, Audio::Mad::Resample, Audio::MPEG, Audio::SNDFile.
Or can anyone suggest any modules to do the following: downsample/decode WAV files and apply a low-pass filter to an audio sample?
Hope you can help,
kieran
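For what it's worth, the same decode / low-pass / downsample chain is short in Python with SciPy. This is only an illustration of the operations, not one of the Perl modules asked about, and the file names are placeholders:

```python
# Decode a WAV file, then low-pass filter and downsample it.
# scipy.signal.decimate() applies an anti-aliasing low-pass filter before
# keeping every q-th sample, so it covers both operations at once.
import numpy as np
from scipy.io import wavfile
from scipy.signal import decimate

rate, x = wavfile.read("input.wav")        # decode (placeholder file name)
x = np.asarray(x, dtype=np.float64)
if x.ndim > 1:
    x = x.mean(axis=1)                     # mix to mono

q = 4                                      # e.g. 44100 Hz -> 11025 Hz
y = decimate(x, q)
wavfile.write("downsampled.wav", rate // q, y.astype(np.int16))
```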
On Oct 23, 4:07 pm, kieran <kieranoc...@gmail.com> wrote:
> [previous replies quoted in full]
Sorry guys, that last one was really Perl-specific. Thanks for your help.
Kieran
On Oct 8, 7:49 pm, kieran <kieranoc...@gmail.com> wrote:
> [kieran's original post quoted in full]
My two-pence: in case the wave files contain speech, your problem looks quite similar to a standard speech-recognition problem, and you may be able to take some inspiration from speech-recognition techniques. Speech recognizers usually use LPC/cepstral coefficients rather than the raw FFT for preprocessing. You may or may not require the VQ (vector quantization) / HMM (hidden Markov model) part, depending on the exact application. Dr. Lawrence Rabiner's papers/book on this subject may be a good starting point.
Regards
Piyush
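As a sketch of what "cepstral coefficients" means in practice, here is the real cepstrum of a single frame in Python/NumPy. The MFCC and LPC-cepstrum features used in actual speech recognizers add a mel filterbank or LPC analysis on top of this basic idea, and the frame and FFT sizes below are placeholders:

```python
# Real cepstrum of one frame: inverse FFT of the log magnitude spectrum.
import numpy as np

def real_cepstrum(frame, n_fft=512):
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed, n=n_fft)) + 1e-12)  # avoid log(0)
    return np.fft.irfft(log_mag)

# Typical use: keep only the first dozen or so coefficients of each frame as a
# feature vector, then compare the feature sequences of two utterances
# (for example with dynamic time warping) instead of comparing raw spectra.
```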
Hi, I'm trying to address a similar, yet still different, problem. I
will have hundreds of recordings, each consisting of about 350 ms of
data. The phase between them will differ, because recording is started
by a carrier-operated squelch trigger and captures the 100 ms buffered
before the trigger plus the 250 ms after it. As such, the actual
beginning of the recording could be off by up to 30 or 40 ms. The
waveforms, viewed on a scope, are nearly identical if they come from
the same source. The waveform from a different source is visually
different and has a different "fingerprint".

Would what you describe below be able to address my scenario?

Thanks in advance,
Jason

On Oct 8, 10:48 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> [Le Chaud Lapin's step-by-step reply quoted in full]
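One detail worth adding for the fixed 30 to 40 ms start-time offset described above: before computing any spectral distance, the two recordings can be lined up with a plain cross-correlation. This is a sketch of that idea, not something proposed in the thread:

```python
# Sketch: estimate and remove a fixed time offset between two recordings by
# cross-correlation, so a later spectral or waveform comparison sees aligned data.
import numpy as np

def align(a, b):
    """Shift b so its content lines up with a (both 1-D float arrays)."""
    corr = np.correlate(a - a.mean(), b - b.mean(), mode="full")
    lag = int(corr.argmax()) - (len(b) - 1)  # > 0: b's content leads a's by `lag` samples
    if lag > 0:
        b = np.concatenate([np.zeros(lag), b])   # delay b
    else:
        b = b[-lag:]                             # advance b
    n = min(len(a), len(b))
    return a[:n], b[:n]
```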
On Oct 30, 11:32 pm, jleg...@proxime.net wrote:
> [Jason's question quoted in full]
Hi,

I just took a look at the cepstral method for the first time, and it seems the results would be better, as indicated by other posters. It makes sense, as it takes into account the logarithmic nature of the "similarity" between two utterances, whereas the straight MMSE method does not.

Still, the MMSE method, with normalization, is a good place to start, as it is the Swiss-army knife of signal estimation. In fact, it appears that the cepstral method uses the same MMSE concept, but in a different domain (roughly, the spectrum of the log of the PSD of the original signal), which kind of makes sense, as hearing/speech sensitivity is physiologically logarithmic anyway.

On a related note, one can regard the cepstral method as one of a class of algorithms where the MMSE technique is applied not to the PSD of the signal itself but to some transformation of it.

So the answer is yes, you should get some positive results, but the cepstral method should definitely be investigated to see just how much better it is.

-Le Chaud Lapin-
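To make the "MMSE on a log scale" idea above concrete, the step-5 comparison changes very little: take the MSE over log magnitude spectra instead of raw ESDs. A sketch, with the FFT length as a placeholder:

```python
# Compare two signals with an MSE over log magnitude spectra rather than raw
# ESDs; removing each spectrum's mean also makes the measure gain-invariant.
import numpy as np

def log_spectrum(x, n_fft=4096):
    x = np.asarray(x, dtype=np.float64)
    X = np.fft.rfft(x - x.mean(), n=n_fft)
    return np.log(np.abs(X) + 1e-12)

def log_spectral_mse(a, b, n_fft=4096):
    la = log_spectrum(a, n_fft)
    lb = log_spectrum(b, n_fft)
    la -= la.mean()
    lb -= lb.mean()
    return np.mean((la - lb) ** 2)
```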
Excellent.  Thanks!  I'll be progressing on this over the next few
weeks as a side project.


On Oct 31, 1:14 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> [the earlier exchange and Le Chaud Lapin's reply on the cepstral method, quoted in full]
Well, I ended up using cohere() in Octave, and it compares exactly
what you mentioned. The issue is that when I look at two waveforms I
know are "different", the initial onset starts at a different point in
the cycle on each (one at about 90 degrees, the other at about 240
degrees). Essentially, we are trying to fingerprint some transmitters.
The visual waveforms are indeed unique per radio, but the MSE between
them approaches 0, to the point that it's nearly equal to the MSE
between two waveforms from the same radio on some samples. The false
positive rate is a little on the high side. Is it acceptable to take
sliding differentials on the waveform, with sufficient overlap, and
use that as another data point?

On Oct 31, 12:57 am, jleg...@proxime.net wrote:
> [the earlier exchange quoted in full]
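One way to read "sliding differentials with sufficient overlap" is to frame each recording into overlapping windows, take a log spectrum per frame, and append the frame-to-frame difference as an extra feature. This is only one interpretation, sketched below; the window, hop, and FFT sizes are placeholders:

```python
# Sketch: overlapping-frame log spectra plus their frame-to-frame differences,
# as a per-recording "fingerprint" that keeps some timing information.
import numpy as np

def framed_log_spectra(x, frame=256, hop=64, n_fft=256):
    x = np.asarray(x, dtype=np.float64)
    frames = []
    for start in range(0, len(x) - frame + 1, hop):
        seg = x[start:start + frame] * np.hanning(frame)
        frames.append(np.log(np.abs(np.fft.rfft(seg, n=n_fft)) + 1e-12))
    return np.array(frames)                     # shape: (n_frames, n_fft // 2 + 1)

def fingerprint(x):
    S = framed_log_spectra(x)
    dS = np.diff(S, axis=0)                     # "sliding differential" between frames
    return np.concatenate([S[1:], dS], axis=1)  # static + delta features per frame

# Two recordings from the same transmitter should give fingerprints with a small
# mean-square difference over the frames they share after alignment.
```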