DSPRelated.com
Forums

Comparing two similar audio files, FFT?

Started by kieran October 8, 2008
I realized I should also clarify. The waveform will consistently
start at a different phase from another radio, and will progress a
particular number of cycles, damping at a very similar ratio per
cycle. Yet even after I decimate by 8 from the 44.1 kHz WAV file (the
frequencies in play are below 5 kHz), the MSE seems to love to approach 0
(probably by design, yet they can't control the individual components
and how they contribute to the onset wave).
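For what it's worth, one way to take the trigger jitter out of the comparison before computing any error metric is to align the recordings on their onsets via cross-correlation. This is a suggestion of my own, not something proposed upthread; a minimal NumPy sketch, using a toy burst signal:

```python
import numpy as np

def align(ref, sig):
    """Circularly shift sig so it best lines up with ref (cross-correlation peak)."""
    corr = np.correlate(sig, ref, mode="full")
    lag = np.argmax(corr) - (len(ref) - 1)   # samples by which sig lags ref
    return np.roll(sig, -lag)

ref = np.zeros(100)
ref[20:30] = 1.0                 # a burst starting at n = 20
sig = np.roll(ref, 7)            # same burst, delayed 7 samples by the trigger
aligned = align(ref, sig)
print(np.argmax(aligned))        # 20: onset now coincides with ref
```

With real recordings you would align on the onset segment only and trim rather than roll circularly, but the lag estimate works the same way.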

On Nov 2, 10:12 am, jleg...@proxime.net wrote:
> Well, I ended up using cohere() in Octave, and it compares exactly
> what you mentioned. The issue is, when I look at two waveforms that I
> know are "different", i.e., the initial onset waveform starts at a
> different point in the cycle than on the other (one starts at about
> 90 degrees, the other at about 240 degrees). Essentially, we are trying to
> fingerprint some transmitters, and the visual waveforms are indeed
> unique per radio, but the MSE between them approaches 0 (to the point
> that it's nearly equal to the MSE between two waveforms from the same
> radio on some samples). The false positive rate is a little on the
> high side. Is it acceptable to take sliding differentials on the
> waveform with sufficient overlap and use that as another datapoint?
>
> On Oct 31, 12:57 am, jleg...@proxime.net wrote:
> > Excellent. Thanks! I'll be progressing on this over the next few
> > weeks as a side project.
> >
> > On Oct 31, 1:14 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> > > On Oct 30, 11:32 pm, jleg...@proxime.net wrote:
> > > > Hi, I'm trying to address a similar, yet still different problem. I
> > > > will have hundreds of recordings consisting of about 350 ms worth of
> > > > data. The phase between them will be different, since the recording
> > > > is triggered by a carrier-operated squelch, buffering the 100 ms
> > > > before the trigger in addition to the 250 ms after it. As such, the
> > > > actual beginning of the recording could be off by up to 30 or 40 ms.
> > > > The waveforms, if viewed on a scope, are nearly identical if coming
> > > > from the same source. The waveform from a different source will be
> > > > visually different, and have a different "fingerprint".
> > > >
> > > > Would using what you describe below be able to address my scenario?
> > > >
> > > > Thanks in advance,
> > > > Jason
> > > >
> > > > On Oct 8, 10:48 am, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> > > > > On Oct 8, 10:15 am, Jerry Avins <j...@ieee.org> wrote:
> > > > > > kieran wrote:
> > > > > > > Hello,
> > > > > > > I am trying to compare two similar audio files (WAV). From what I have
> > > > > > > read, I need to sample both audio files at certain frequencies, run
> > > > > > > these through an FFT, and then compare the results. Can anyone advise me
> > > > > > > if this is the correct approach, and also describe the steps I need to
> > > > > > > take to get to the stage where I can compare the files?
> > > > > >
> > > > > > WAV files contain sampled data (at any of a variety of rates). What
> > > > > > would sampling them involve?
> > > > >
> > > > > I think that is what he is trying to figure out. :)
> > > > >
> > > > > > What does it mean to compare similar sounds? Can you define similarity
> > > > > > with software?
> > > > >
> > > > > Again, I think he is asking for help from someone to do that for him. :)
> > > > >
> > > > > To OP:
> > > > >
> > > > > 1. Do whatever is necessary to convert .wav files to their
> > > > > discrete-time signals:
> > > > > http://www.sonicspot.com/guide/wavefiles.html
> > > > >
> > > > > 2. Time-warping might or might not be necessary depending on the
> > > > > difference between the two sample rates:
> > > > > http://en.wikipedia.org/wiki/Dynamic_time_warping
> > > > >
> > > > > 3. After time warping, truncate both signals so that their durations
> > > > > are equivalent.
> > > > >
> > > > > 4. Compute the normalized energy spectral density (ESD) from the DFTs
> > > > > of the two signals:
> > > > > http://en.wikipedia.org/wiki/Power_spectrum
> > > > >
> > > > > 5. Compute the mean-square error (MSE) between the normalized ESDs of
> > > > > the two signals:
> > > > > http://en.wikipedia.org/wiki/Mean_squared_error
> > > > >
> > > > > The MSE between the normalized ESDs of two signals is a good metric of
> > > > > closeness. If you have, say, 10 .wav files, and 2 of them are nearly
> > > > > the same but the others are not, the two that are close should have a
> > > > > relatively low MSE. Two perfectly identical signals will obviously
> > > > > have an MSE of zero. Ideally, two "equivalent" signals with different
> > > > > time scales (20-second human talking versus 5-second chipmunk),
> > > > > different energies (soft-spoken human versus yelling chipmunk), and
> > > > > different phases (sampling began at slightly different instants against
> > > > > the continuous-time input) should still have an MSE of zero, but
> > > > > quantization errors inherent in DSP will yield an MSE slightly greater
> > > > > than zero.
> > >
> > > Hi,
> > >
> > > I just took a look at the cepstral method for the first time, and it
> > > seems that the results would be better, as indicated by other
> > > posters. It makes sense, as it takes into account the logarithmic
> > > nature of "similarity" of two utterances, whereas the straight MMSE
> > > method does not.
> > >
> > > Still, the MMSE method, with normalization, is a good place to start,
> > > as it is the swiss-army knife of signal estimation. In fact, it
> > > appears that the cepstral method uses the same concept of MMSE, but in a
> > > different domain, that domain being the PSD of a signal that is the log
> > > of the PSD of the original signal, which kind of makes sense, as
> > > hearing/speech sensitivity is physiologically logarithmic anyway.
> > >
> > > On a related note, one can regard the cepstral method as one of a
> > > class of algorithms where the MMSE technique is applied not to the PSD
> > > of the signal, but to some transformation thereof.
> > >
> > > So the answer is yes, you should get some positive results, but the
> > > cepstral method should definitely be investigated to see just how much
> > > better it is.
> > >
> > > -Le Chaud Lapin-
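The normalized-ESD/MSE recipe quoted above can be sketched in a few lines. This is my own minimal NumPy rendering under the assumption of equal-length, already-truncated signals (the thread itself uses Octave); the toy tones sit exactly on DFT bins so leakage does not muddy the comparison:

```python
import numpy as np

def normalized_esd(x):
    """Energy spectral density of x, normalized to unit total energy."""
    esd = np.abs(np.fft.rfft(x)) ** 2
    return esd / esd.sum()

def esd_mse(x, y):
    """MSE between the normalized ESDs of two equal-length signals."""
    ex, ey = normalized_esd(x), normalized_esd(y)
    return np.mean((ex - ey) ** 2)

# A phase shift leaves the normalized ESD (and hence the MSE) untouched,
# while a different tone does not.
n = np.arange(1024)
a = np.cos(2 * np.pi * 56 * n / 1024)        # tone on DFT bin 56
b = np.cos(2 * np.pi * 56 * n / 1024 + 2.0)  # same tone, shifted phase
c = np.cos(2 * np.pi * 80 * n / 1024)        # different tone (bin 80)
print(esd_mse(a, b))   # ~0
print(esd_mse(a, c))   # clearly nonzero
```

The normalization step is what delivers the energy invariance mentioned above; the magnitude-squared DFT is what delivers the phase invariance.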
On Nov 2, 12:12 pm, jleg...@proxime.net wrote:
> Well, I ended up using cohere() in Octave, and it compares exactly
> what you mentioned. The issue is, when I look at two waveforms that I
> know are "different", i.e., the initial onset waveform starts at a
> different point in the cycle than on the other (one starts at about
> 90 degrees, the other at about 240 degrees). Essentially, we are trying to
> fingerprint some transmitters, and the visual waveforms are indeed
> unique per radio, but the MSE between them approaches 0 (to the point
> that it's nearly equal to the MSE between two waveforms from the same
> radio on some samples). The false positive rate is a little on the
> high side. Is it acceptable to take sliding differentials on the
> waveform with sufficient overlap and use that as another datapoint?
What do you mean by the "initial onset"? Are you saying that you have two essentially identical waveforms that differ only by a small time shift? If so, the MSE on the DFT will be close to zero because the waveforms are the same. Also, are you calculating the MSE on s[n] or on the DFT of s[n]? And, per your other post about the decay, IIUC, you are working with waveforms that are essentially s[n] = b[n] * r^n, where r is the damping ratio. Correct?

If you take the MSE of s1[n] versus s2[n], where s1[n] = s2[n + tau], the result will be markedly different from the MSE of (DFT{s1[n]} - DFT{s2[n]}). For example, imagine s1[n] is a cosine. Then if s2[n] is the negative of that cosine, s1[n] and s2[n] are essentially equivalent except for a phase shift. But the humps are 180 degrees out of phase, so when you take the mean of the squared error, you will get something significantly greater than 0, relatively speaking. But the MSE between the DFT magnitudes of those same waves will be zero.

-Le Chaud Lapin-
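The cosine-versus-negated-cosine example above is easy to check numerically. A small NumPy sketch of my own (the thread does not supply code), contrasting the time-domain MSE with the MSE on DFT magnitudes:

```python
import numpy as np

n = np.arange(1024)
s1 = np.cos(2 * np.pi * 16 * n / 1024)  # 16 full cycles on a DFT bin
s2 = -s1                                # same tone, 180 degrees out of phase

# Time-domain MSE: large, because the humps line up anti-phase.
# (s1 - s2) = 2*cos, so the mean square is 4 * 1/2 = 2.0 exactly.
mse_time = np.mean((s1 - s2) ** 2)

# MSE on DFT magnitudes: zero, because the magnitude discards phase.
m1 = np.abs(np.fft.rfft(s1))
m2 = np.abs(np.fft.rfft(s2))
mse_spec = np.mean((m1 - m2) ** 2)

print(mse_time)   # 2.0
print(mse_spec)   # ~0
```

This is exactly why the magnitude-spectrum comparison cannot distinguish two radios whose fingerprints differ mainly in onset phase.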
Have you tried Sevana AquA software?

http://www.sevana.fi/voice_quality_testing_measurement_analysis.php
