DSPRelated.com
Forums

Sound comparison: practical approach

Started by Rob Vermeulen June 1, 2006
Hi Folks,

It's been a while since I looked at this NG but now I think I can use some 
fellow thinkers :)

I've been asked to think about comparing 2 audio signals to see if the audio 
is roughly (but with high probability) the same.
One input is a broadcast signal (over air or internet), the other one is 
its reference (the original signal).
So the quality of both signals can differ, and there can be a short delay 
(because of transmission time).

For clarity: It is not a quality comparison, but a content-comparator. e.g. 
if the source input contains 'Radio 1' and the reference input sounds like 
'classical FM', a trigger must be fired saying: NO_MATCH! :-)

I was planning on building a prototype in software (Windows, C++) and I do 
have some ideas I'd like to work out, but I'd appreciate the insights of you 
guys. Maybe your expertise adds value :-)

Currently I roughly see 2 options:
- time domain : Peak Response / Impulse Response
- freq. domain: Big-O comparison of freq. deltas

I think the hard part lies with the possible (but constant) delay.
Any other insights on this?

I'd appreciate any response!

Best regards,

Rob Vermeulen 


There are several levels of difficulty to this problem. If the two
signals are separated by a time-invariant linear system (which is more
general than a simple constant delay), I would try an adaptive filter.
A measure of equality of the two signals is then the rate of change of
the filter coefficients, or the rate of change of some statistic based
on the filter coefficients (I think the reflection coefficients are far
more stable to subtle signal changes than the tapped delay
coefficients, for example).

However, lossy audio coding, sample rate changes and transmission
errors (all of which can occur for internet transmissions) are in
general neither time-invariant nor linear transformations. I don't know
how robust the adaptive filter approach would be in that case.

Regards,
Andor
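A minimal sketch of the adaptive-filter idea, assuming a normalised LMS (NLMS) update on a plain tapped-delay FIR filter. The reply does not name a specific algorithm, so NLMS, the function names, the tap count and the step size `mu` are all illustrative assumptions. Python is used for brevity, although the prototype discussed in the thread was planned in C++. The per-sample size of the coefficient update serves as the "rate of change" statistic: it should shrink towards zero when the received signal is a linearly filtered copy of the reference, and stay large otherwise.

```python
import math
import random


def nlms_coefficient_drift(reference, received, taps=16, mu=0.5, eps=1e-8):
    """Adapt an FIR filter that predicts `received` from `reference`.

    Returns the final weights and, per sample, the Euclidean norm of the
    weight update (a simple "rate of change" statistic).
    """
    weights = [0.0] * taps
    drift = []
    for n in range(taps - 1, min(len(reference), len(received))):
        # Most recent `taps` reference samples, newest first.
        window = [reference[n - k] for k in range(taps)]
        predicted = sum(w * x for w, x in zip(weights, window))
        error = received[n] - predicted
        # Normalised LMS: scale the step by the input power in the window.
        power = sum(x * x for x in window) + eps
        step = [mu * error * x / power for x in window]
        weights = [w + s for w, s in zip(weights, step)]
        drift.append(math.sqrt(sum(s * s for s in step)))
    return weights, drift


def mean_tail(values, count=200):
    """Average of the last `count` values."""
    tail = values[-count:]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    rng = random.Random(1)
    reference = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    # Matching content: the reference delayed by 3 samples at 0.8 gain.
    delayed = [0.0] * 3 + [0.8 * s for s in reference[:-3]]
    # Non-matching content: independent noise.
    other = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    print("drift, match:   ", mean_tail(nlms_coefficient_drift(reference, delayed)[1]))
    print("drift, no match:", mean_tail(nlms_coefficient_drift(reference, other)[1]))
```

The filter can only absorb delays shorter than `taps` samples, so a long transmission delay would have to be removed first. The sketch also says nothing about how the statistic behaves under lossy coding, which is the open question raised in the reply.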
Thanks for the reply, Andor.

Here's a bit of a naive thought, (and only 8 hours of work, which my boss 
would prefer:-)) but is this an option:

Say I have overcome the delay problem, so I am able to compare 2 signals 
that are fed to the inputs in a similar time scale, and I only have to worry 
about quality differences.
Would it be sufficient to low-pass both signals (so as to erase 
high-freq distortion caused by encoding and air-transmission), normalize 
them and then compare their average energy level over the past 1.5 seconds? 
Then, when the difference reaches a certain threshold, I can fire a trigger.

Rob.
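The energy-comparison idea can be sketched roughly as follows, assuming the two inputs are already time-aligned. The one-pole low-pass, the whole-buffer RMS normalisation and all function names are illustrative assumptions, not something specified in the post. The window length is given in samples, so a 1.5 second window would be `int(1.5 * sample_rate)`.

```python
import math


def one_pole_lowpass(samples, alpha):
    """Simple recursive low-pass; smaller alpha means a lower cutoff."""
    out, state = [], 0.0
    for s in samples:
        state += alpha * (s - state)
        out.append(state)
    return out


def normalize_rms(samples):
    """Scale to unit RMS over the whole buffer so overall gain drops out."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return [s / rms for s in samples] if rms > 0.0 else list(samples)


def window_energies(samples, window):
    """Mean energy of consecutive, non-overlapping windows."""
    return [
        sum(s * s for s in samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


def energy_mismatch(reference, received, window, alpha=0.1):
    """Largest per-window energy difference after low-pass and normalisation."""
    ref = window_energies(normalize_rms(one_pole_lowpass(reference, alpha)), window)
    rec = window_energies(normalize_rms(one_pole_lowpass(received, alpha)), window)
    return max(abs(a - b) for a, b in zip(ref, rec))
```

A trigger would then be something like `energy_mismatch(ref, rec, window) > threshold`, with the threshold found by experiment. Note that the normalisation has to span several windows: if it were applied to a single 1.5 second window, both energies would be 1 by construction and the comparison would never fire.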
I don't know if this method is robust enough for your purposes. Try it
out.

I can imagine that it could be hard to differentiate modern CDs with
this method, because they are all over-compressed - you won't find much
local energy fluctuations to trigger on.

Another problem could be posed by broadcast limiters, which also tend
to flatten out the energy curve (to improve the reach and "punch" of
the radio station).

Regards,
Andor
Andor,

> I can imagine that it could be hard to differentiate modern CDs with
> this method, because they are all over-compressed - you won't find much
> local energy fluctuations to trigger on.
I see what you mean. I had similar problems when I tried to detect
bass drums in modern dance music (no distinct difference between energy
levels throughout the song).

How about my suggestion to compare frequency deltas between two signals?
Say I have a frequency splitter (using FFT) that separates a signal into
'n' bands (groups of FFT bins) and I compare each band's delta with the
delta from my reference signal's corresponding band.

I think I'll just have to try it out :)
But I'm still open to good suggestions.

Best regards!
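One possible reading of the band-delta idea, sketched under stated assumptions: "delta" is taken to mean the frame-to-frame change of each band's energy, and two signals are compared by how often those changes point in the same direction. That interpretation, the frame length, the band layout and the function names are all assumptions made for illustration. A naive DFT keeps the example dependency-free; a real prototype would use an FFT library.

```python
import cmath
import math


def band_energies(frame, n_bands):
    """Energy in `n_bands` equal-width groups of DFT bins (DC excluded)."""
    n = len(frame)
    half = n // 2
    power = []
    for k in range(1, half + 1):
        # Naive DFT bin; a real implementation would use an FFT library.
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2)
    size = half // n_bands
    return [sum(power[b * size:(b + 1) * size]) for b in range(n_bands)]


def band_deltas(signal, frame_len, n_bands):
    """Frame-to-frame change of each band's energy."""
    energies = [
        band_energies(signal[i:i + frame_len], n_bands)
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]
    return [
        [cur[b] - prev[b] for b in range(n_bands)]
        for prev, cur in zip(energies, energies[1:])
    ]


def delta_agreement(reference, received, frame_len=64, n_bands=4):
    """Fraction of (frame, band) cells whose energy moved in the same direction."""
    agree = total = 0
    pairs = zip(band_deltas(reference, frame_len, n_bands),
                band_deltas(received, frame_len, n_bands))
    for ref_row, rec_row in pairs:
        for d_ref, d_rec in zip(ref_row, rec_row):
            total += 1
            agree += (d_ref > 0) == (d_rec > 0)
    return agree / total if total else 0.0
```

Comparing only the sign of each delta makes the score independent of overall gain. Unrelated material should land somewhere around 0.5, so a match threshold would sit well above that. Like the energy method, this assumes the inputs are already aligned to within a fraction of a frame.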
Low pass filter to some upper limit you're pretty sure is present in both
streams. FFT both, then convert to magnitude/phase. Set the phase in each
to some predetermined, same values, maybe all 0. That would avoid the
likely phase differences existing between the two streams. Then IFFT the
results to get the time domain sequences back. Then perform an
autocorrelation. A single large peak should mean MATCH.

Greg Knox
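A rough sketch of the zero-phase recipe for a single block of samples, with the low-pass step omitted. It reads the final "autocorrelation" step as a normalised correlation between the two zero-phase sequences, which is an interpretation rather than something the post spells out. The function names are hypothetical, and a naive DFT stands in for a real FFT.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) DFT or inverse DFT; stands in for a real FFT."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def zero_phase(samples):
    """Keep each bin's magnitude, set its phase to 0, return to the time domain."""
    magnitudes = [abs(c) for c in dft(samples)]
    return [v.real for v in dft(magnitudes, inverse=True)]


def normalized_correlation(a, b):
    """Correlation at lag 0, scaled so that identical shapes give 1.0."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0.0 else 0.0


def zero_phase_similarity(reference, received):
    """Similarity of two equal-length blocks after discarding phase."""
    return normalized_correlation(zero_phase(reference), zero_phase(received))
```

Once the phase is zeroed, both sequences peak at lag 0, so only that lag is evaluated here. The result is exactly shift-invariant only for a circular shift within the block; a real delay moves different audio into the block. Since all magnitudes are non-negative, unrelated broadband signals can still score well above zero, so the match threshold would need to be set with that in mind.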
Greg,

Thanks!
What you suggest looks like the way I already planned it, but your 
suggestion for eliminating phase differences hadn't crossed my mind yet, 
and looks like a decent solution for small phase differences.

Have you got any idea how to overcome a LARGE delay like, say, 1.5 seconds? 
That won't work with just ignoring the phases of every FFT bin.

Regards,

Rob Vermeulen
I am doing something very similar: I compute the cross-correlation
function over all samples and then normalize its output by the maximum
of the autocorrelation function. In this way, at best (i.e. both
signals exactly the same) you would obtain a maximum value of 1, so you
would be searching for a value of 0.99 as your trigger and not just any
single large peak.

Jeremy
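A minimal sketch of a normalised cross-correlation search over a range of candidate delays, which also covers the case of a large but constant delay. As an assumed variant of the normalisation described above, the peak is divided by the geometric mean of the two zero-lag autocorrelations (the signal energies), so that a pure gain difference does not pull the peak below 1. The function names and the lag range are illustrative.

```python
import math


def normalized_xcorr_peak(reference, received, max_lag):
    """Search lags 0..max_lag for the best alignment of `received` to `reference`.

    Returns (best_lag, peak); peak is close to 1.0 when `received` contains
    a complete, scaled, delayed copy of `reference`.
    """
    # Zero-lag autocorrelations, i.e. the energies of the two signals.
    energy_ref = sum(s * s for s in reference)
    energy_rec = sum(s * s for s in received)
    norm = math.sqrt(energy_ref * energy_rec)
    if norm == 0.0:
        return 0, 0.0
    best_lag, best_peak = 0, -1.0
    for lag in range(max_lag + 1):
        # received[n + lag] is compared with reference[n].
        overlap = min(len(reference), len(received) - lag)
        acc = sum(reference[n] * received[n + lag] for n in range(max(overlap, 0)))
        peak = acc / norm
        if peak > best_peak:
            best_lag, best_peak = lag, peak
    return best_lag, best_peak


def is_match(reference, received, max_lag, threshold=0.99):
    """Trigger decision; the threshold is a tunable assumption."""
    return normalized_xcorr_peak(reference, received, max_lag)[1] >= threshold
```

For a 1.5 second delay, `max_lag` would be `int(1.5 * sample_rate)`. The direct search above costs on the order of signal length times `max_lag` multiplications, so an FFT-based correlation is the usual way to speed it up. Lossy coding and processing in the broadcast chain will lower the peak, so a threshold of 0.99 may be too strict in practice and is best tuned on real recordings.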
