I realized I should also clarify. The waveform will consistently
start at a different phase than the one from another radio, will
progress through a particular number of cycles, and will damp at a
very similar ratio per cycle. Yet even after I decimate by 8 from the
44.1 kHz WAV file (the frequencies in play are below 5 kHz), the MSE
still loves to approach 0 (probably by design, yet they can't control
the individual components and how they contribute to the onset wave).
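A minimal Python/SciPy sketch of the decimate-by-8 step described above (the 1 kHz test tone is an illustrative stand-in, not data from the thread). One caveat worth noting: 44100 / 8 gives a new rate of 5512.5 Hz, so the new Nyquist is about 2756 Hz, and content between that and 5 kHz is removed by the anti-aliasing filter.

```python
# Anti-alias filter and decimate a 44.1 kHz recording by 8, as in the
# post above. scipy.signal.decimate applies a zero-phase IIR low-pass
# filter by default before downsampling.
import numpy as np
from scipy.signal import decimate

fs = 44100
n = 15435                                 # about 350 ms at 44.1 kHz
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 1000 * t)          # illustrative 1 kHz tone
y = decimate(x, 8)                        # filtered, then downsampled by 8
print(len(x), len(y))                     # length shrinks by a factor of ~8
```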
On Nov 2, 10:12 am, jleg...@proxime.net wrote:
> Well, I ended up using cohere() in Octave, and it compares exactly
> what you mentioned.  The issue is, when I look at two waveforms that I
> know are "different", i.e., the initial onset waveform starts at a
> different point in the cycle than on the other (one starts at about
> 90 degrees, the other at about 240 degrees).  Essentially, we are
> trying to fingerprint some transmitters, and the visual waveforms are
> indeed unique per radio, but the MSE between them approaches 0 (to the
> point that it's nearly equal to the MSE between two waveforms from the
> same radio on some samples).  The false positive rate is a little on
> the high side.  Is it acceptable to take sliding differentials on the
> waveform with sufficient overlap and use that as another data point?
>
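A minimal sketch of the sliding-differential idea asked about above, in Python/NumPy rather than Octave; the window length (256) and hop (64) are illustrative assumptions, not values from the thread.

```python
# Compare first differences of overlapping windows instead of raw
# samples; differencing emphasizes local shape rather than absolute
# level, which may help separate near-identical onset waveforms.
import numpy as np

def windowed_diff_features(x, win=256, hop=64):
    """Stack first differences of overlapping windows (hop < win => overlap)."""
    d = np.diff(np.asarray(x, dtype=float))   # the sliding differential
    frames = [d[i:i + win] for i in range(0, len(d) - win + 1, hop)]
    return np.vstack(frames)

def diff_mse(x, y, win=256, hop=64):
    """MSE between the differential features of two recordings."""
    fx = windowed_diff_features(x, win, hop)
    fy = windowed_diff_features(y, win, hop)
    n = min(len(fx), len(fy))                 # compare the common frames
    return np.mean((fx[:n] - fy[:n]) ** 2)
```

This score is meant as the "another data point" the post asks about, alongside (not instead of) the spectral MSE.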
> On Oct 31, 12:57 am, jleg...@proxime.net wrote:
>
> > Excellent.  Thanks!  I'll be progressing on this over the next few
> > weeks as a side project.
>
> > On Oct 31, 1:14 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
>
> > > On Oct 30, 11:32 pm, jleg...@proxime.net wrote:
>
> > > > Hi, I'm trying to address a similar, yet still different, problem.  I
> > > > will have hundreds of recordings consisting of about 350 ms worth of
> > > > data.  The phase between them will be different, since the beginning
> > > > of the recording uses a carrier-operated squelch trigger, buffering
> > > > the 100 ms before the trigger in addition to the 250 ms after the
> > > > trigger.  As such, the actual beginning of the recording could be off
> > > > by up to 30 or 40 ms.  The waveforms, if viewed on a scope, are nearly
> > > > identical if coming from the same source.  The waveform from a
> > > > different source will be visually different, and have a different
> > > > "fingerprint".
>
> > > > Would using what you describe below be able to address my scenario?
>
> > > > Thanks in advance,
> > > > Jason
>
> > > > On Oct 8, 10:48 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
>
> > > > > On Oct 8, 10:15 am, Jerry Avins <j...@ieee.org> wrote:
>
> > > > > > kieran wrote:
> > > > > > > Hello,
> > > > > > > I am trying to compare two similar audio files (WAV). From what I have
> > > > > > > read, I need to sample both audio files at certain frequencies, run
> > > > > > > these through an FFT, and then compare the results. Can anyone advise me
> > > > > > > whether this is the correct approach, and also describe the steps I need
> > > > > > > to take to get to the stage where I can compare the files?
>
> > > > > > WAV files contain sampled data (at any of a variety of rates). What
> > > > > > would sampling them involve?
>
> > > > > I think that is what he is trying to figure out. :)
>
> > > > > > What does it mean to compare similar sounds? Can you define similarity
> > > > > > with software?
>
> > > > > Again, I think he is asking for help from someone to do that for
> > > > > him :)
>
> > > > > To OP:
>
> > > > > 1. Do whatever is necessary to convert the .wav files to their
> > > > > discrete-time signals:
>
> > > > > http://www.sonicspot.com/guide/wavefiles.html
>
> > > > > 2. Time-warping might or might not be necessary, depending on the
> > > > > difference between the two sample rates:
>
> > > > > http://en.wikipedia.org/wiki/Dynamic_time_warping
>
> > > > > 3. After time warping, truncate both signals so that their durations
> > > > > are equivalent.
>
> > > > > 4. Compute the normalized energy spectral density (ESD) from the DFTs
> > > > > of the two signals:
>
> > > > > http://en.wikipedia.org/wiki/Power_spectrum
>
> > > > > 5. Compute the mean squared error (MSE) between the normalized ESDs
> > > > > of the two signals:
>
> > > > > http://en.wikipedia.org/wiki/Mean_squared_error
>
> > > > > The MSE between the normalized ESDs of two signals is a good metric
> > > > > of closeness. If you have, say, 10 .wav files, and 2 of them are
> > > > > nearly the same but the others are not, the two that are close should
> > > > > have a relatively low MSE. Two perfectly identical signals will
> > > > > obviously have an MSE of zero. Ideally, two "equivalent" signals with
> > > > > different time scales (20-second human talking versus 5-second
> > > > > chipmunk), different energies (soft-spoken human versus yelling
> > > > > chipmunk), and different phases (sampling began at a slightly
> > > > > different instant relative to the continuous-time input) should still
> > > > > have an MSE of zero, but quantization errors inherent in DSP will
> > > > > yield an MSE slightly greater than zero.
>
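The recipe above, minus the time-warping step and assuming two mono signals already at the same sample rate, can be sketched in Python/NumPy; none of this code is from the thread:

```python
# Steps 3-5 above: truncate to equal length, compute normalized ESDs
# via the DFT, and take the MSE between them.
import numpy as np

def normalized_esd(x):
    """Energy spectral density |X[k]|^2, scaled to unit total energy."""
    esd = np.abs(np.fft.rfft(x)) ** 2
    return esd / esd.sum()

def esd_mse(x, y):
    """MSE between the normalized ESDs of two signals."""
    n = min(len(x), len(y))               # truncate to equal durations
    ex = normalized_esd(np.asarray(x[:n], dtype=float))
    ey = normalized_esd(np.asarray(y[:n], dtype=float))
    return np.mean((ex - ey) ** 2)
```

As the post says, identical signals give an MSE of exactly zero, and because the ESD discards phase, the score is insensitive to where sampling began.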
> > > Hi,
>
> > > I just took a look at the cepstral method for the first time, and it
> > > seems that the results would be better, as indicated by other
> > > posters.  It makes sense, as it takes into account the logarithmic
> > > nature of "similarity" of two utterances, whereas the straight MMSE
> > > method does not.
>
> > > Still, the MMSE method, with normalization, is a good place to start,
> > > as it is the Swiss-army knife of signal estimation.  In fact, it
> > > appears that the cepstral method uses the same concept of MMSE, but in
> > > a different domain, that domain being the PSD of a signal that is the
> > > log of the PSD of the original signal.  That kind of makes sense, as
> > > hearing/speech sensitivity is physiologically logarithmic anyway.
>
> > > On a related note, one can regard the cepstral method as one of a
> > > class of algorithms where the MMSE technique is applied not to the
> > > PSD of the signal itself, but to some transformation thereof.
>
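A minimal Python/NumPy sketch of the cepstral-domain variant described above: take the log of the power spectrum, transform again, and apply MMSE in that domain. The number of retained coefficients (40) is an illustrative assumption, not a value from the thread.

```python
# Real cepstrum: inverse FFT of the log power spectrum, compared by MSE
# over the first few coefficients.
import numpy as np

def real_cepstrum(x, eps=1e-12):
    """Inverse FFT of log |X[k]|^2; eps guards against log(0)."""
    psd = np.abs(np.fft.fft(x)) ** 2
    return np.real(np.fft.ifft(np.log(psd + eps)))

def cepstral_mse(x, y, n_coeff=40):
    """MMSE over the first n_coeff cepstral coefficients."""
    cx = real_cepstrum(np.asarray(x, dtype=float))[:n_coeff]
    cy = real_cepstrum(np.asarray(y, dtype=float))[:n_coeff]
    return np.mean((cx - cy) ** 2)
```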
> > > So the answer is yes, you should get some positive results, but the
> > > cepstral method should definitely be investigated to see just how
> > > much better it is.
>
> > > -Le Chaud Lapin-