comp.dsp | Sound comparison: practical approach

Hi Folks,

It's been a while since I looked at this NG but now I think I can use some 
fellow thinkers :)

I've been asked to think about comparing 2 audio signals to see if the audio 
is roughly (but with high probability) the same.
One input is a broadcast signal (over the air or internet), the other one is 
its reference (the original signal).
So the quality of the two signals can differ, and there can be a short delay 
(because of transmission time).

For clarity: it is not a quality comparison but a content comparator. E.g., 
if the source input contains 'Radio 1' and the reference input sounds like 
'classical FM', a trigger must be fired saying: NO_MATCH! :-)

I was planning on building a prototype in software (Windows, C++) and I do 
have some ideas I'd like to work out, but I'd appreciate your insights. 
Maybe your expertise adds value :-)

Currently I roughly see 2 options:
- time domain : Peak Response / Impulse Response
- freq. domain: Big-O comparison of freq. deltas

I think the hard part lies with the possible (but constant) delay.
Any other insights on this?

I'd appreciate any response!

Best regards,

Rob Vermeulen

Reply by Andor ● June 1, 2006

Rob Vermeulen wrote:
...
> I've been asked to think about comparing 2 audio signals to see if the audio
> is roughly (but with high probability) the same.
>
> One input is a broadcasted signal (over air or internet), the other one is
> its reference (the original signal).
> So the quality of both signals can differ, and there can be a short delay
> (because of transmission time).
>
> For clarity: It is not a quality comparison, but a content-comparator. e.g.
> if the source input contains 'Radio 1' and the reference input sounds like
> 'classical FM', a trigger must be fired saying: NO_MATCH! :-)
>
> I was planning on building a prototype in software (Windows, C++) and I do
> have some ideas I'd like to work out, but I'd appreciate the insights of you
> guys. Maybe your expertises add value :-)
>
> Currently I roughly see 2 options:
> - time domain : Peak Response / Impulse Response
> - freq. domain: Big-O comparison of freq. delta's
>
> I think the hard part lies with the possible (but constant) delay.
> Any other insights on this?

There are several levels of difficulty to this problem. If the two
signals are separated by a time-invariant linear system (which is more
general than a simple constant delay), I would try an adaptive filter.
A measure of equality of the two signals is then the rate of change of
the filter coefficients, or the rate of change of some statistic based
on the filter coefficients (I think the reflection coefficients are far
more stable under subtle signal changes than the tapped-delay-line
coefficients, for example). If the content matches, the filter converges
and the coefficients settle; if it does not, they keep moving.
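
A minimal sketch of the adaptive-filter idea, assuming a normalized LMS
(NLMS) update, which the post does not specify. The struct `NlmsMatcher`,
the helper `mean_tail_change`, the step size and the tap count are all
illustrative choices, and the sketch tracks the tapped-delay-line
coefficients rather than reflection coefficients. The reference drives the
filter, the broadcast is the target, and the size of each coefficient
update serves as the "rate of change":

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Normalized LMS filter that tries to predict the broadcast signal from the
// reference. All names and default values here are illustrative assumptions.
struct NlmsMatcher {
    std::vector<double> w;     // adaptive FIR coefficients (tapped delay line)
    std::vector<double> hist;  // most recent reference samples, newest first
    double mu;                 // step size, 0 < mu < 2 for NLMS
    double eps;                // regularizer to avoid division by zero

    explicit NlmsMatcher(std::size_t taps, double step = 0.5, double reg = 1e-9)
        : w(taps, 0.0), hist(taps, 0.0), mu(step), eps(reg) {
        assert(taps > 0);
    }

    // Feed one reference sample and one broadcast sample. Returns the squared
    // norm of the coefficient update, i.e. how far the filter moved.
    double update(double ref, double broadcast) {
        // Shift the delay line and insert the newest reference sample.
        for (std::size_t i = hist.size() - 1; i > 0; --i) hist[i] = hist[i - 1];
        hist[0] = ref;

        double y = 0.0, power = eps;
        for (std::size_t i = 0; i < w.size(); ++i) {
            y += w[i] * hist[i];
            power += hist[i] * hist[i];
        }
        const double err = broadcast - y;   // prediction error
        const double g = mu * err / power;  // normalized step

        double change = 0.0;
        for (std::size_t i = 0; i < w.size(); ++i) {
            const double dw = g * hist[i];
            w[i] += dw;
            change += dw * dw;
        }
        return change;
    }
};

// Average coefficient change over the last `tail` samples of a run.
double mean_tail_change(const std::vector<double>& ref,
                        const std::vector<double>& broadcast,
                        std::size_t taps, std::size_t tail) {
    assert(ref.size() == broadcast.size() && tail > 0 && tail <= ref.size());
    NlmsMatcher f(taps);
    double sum = 0.0;
    for (std::size_t n = 0; n < ref.size(); ++n) {
        const double c = f.update(ref[n], broadcast[n]);
        if (n + tail >= ref.size()) sum += c;
    }
    return sum / static_cast<double>(tail);
}
```

With matching content the returned value should fall toward zero once the
filter has converged; with unrelated content it stays well above zero. The
threshold between the two would have to be found experimentally.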

However, lossy audio coding, sample rate changes and transmission
errors (all of which can occur for internet transmissions) are in
general neither time-invariant nor linear transformations. I don't know
how robust the adaptive filter approach would be in that case.

Regards,
Andor

Reply by Rob Vermeulen ● June 1, 2006

Thanks for the reply, Andor.

Here's a bit of a naive thought, (and only 8 hours of work, which my boss 
would prefer:-)) but is this an option:

Say I have overcome the delay problem, so I am able to compare 2 signals 
that arrive at the inputs time-aligned, and I only have to worry about 
quality differences.
Would it be sufficient to low-pass both signals (to remove the 
high-frequency distortion caused by encoding and over-the-air transmission), 
normalize them, and then compare their average energy levels over the past 
1.5 seconds? 
Then, when the difference exceeds a certain threshold, I can fire a trigger.
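
A rough sketch of this energy comparison, under a few assumptions that are
not in the post: a one-pole low-pass stands in for the real filter, the
window is split into short blocks so that the energy contour (not just one
average) is compared, and each contour is scaled to sum to 1 as the
normalization step. Function names, the block length and the filter
coefficient `a` are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One-pole low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1]), with 0 < a <= 1.
std::vector<double> one_pole_lowpass(const std::vector<double>& x, double a) {
    assert(a > 0.0 && a <= 1.0);
    std::vector<double> y(x.size());
    double state = 0.0;
    for (std::size_t n = 0; n < x.size(); ++n) {
        state += a * (x[n] - state);
        y[n] = state;
    }
    return y;
}

// Energy per block of `block` samples, scaled so the values sum to 1.
// The scaling removes any overall gain difference between the two paths.
std::vector<double> normalized_block_energy(const std::vector<double>& x,
                                            std::size_t block) {
    assert(block > 0 && x.size() >= block);
    std::vector<double> e(x.size() / block, 0.0);
    double total = 0.0;
    for (std::size_t b = 0; b < e.size(); ++b) {
        for (std::size_t n = 0; n < block; ++n) {
            const double v = x[b * block + n];
            e[b] += v * v;
        }
        total += e[b];
    }
    if (total > 0.0)
        for (double& v : e) v /= total;
    return e;
}

// Sum of absolute differences between the two normalized energy contours.
// 0 means identical contours; 2 is the largest possible value.
double envelope_distance(const std::vector<double>& ref,
                         const std::vector<double>& broadcast,
                         std::size_t block, double a) {
    assert(ref.size() == broadcast.size());
    const std::vector<double> er =
        normalized_block_energy(one_pole_lowpass(ref, a), block);
    const std::vector<double> eb =
        normalized_block_energy(one_pole_lowpass(broadcast, a), block);
    double d = 0.0;
    for (std::size_t i = 0; i < er.size(); ++i) d += std::fabs(er[i] - eb[i]);
    return d;
}
```

At an 8 kHz sample rate, for instance, 1.5 seconds is 12000 samples, which
could be split into thirty 400-sample blocks. The trigger threshold on the
returned distance has to be tuned on real material.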

Rob.

Reply by Andor ● June 1, 2006

Rob Vermeulen wrote:
> Thanks for the reply, Andor.
>
> Here's a bit of a naive thought, (and only 8 hours of work, which my boss
> would prefer:-)) but is this an option:
>
> Say I have overcome the delay problem, so I am able to compare 2 signals
> that are fed to the inputs in a similar time scale, and I only have to worry
> about quality differences.
> Would it be sufficient enough to low-pass both signals (so to erase
> high-freq distortion caused by encoding and air-transmission), normalize
> them and then compare their average energy-level from the past 1.5 seconds?
> Then, when the difference reaches a certain threshold, I can fire a trigger.

I don't know if this method is robust enough for your purposes. Try it
out.

I can imagine that it could be hard to differentiate modern CDs with
this method, because they are all over-compressed: you won't find many
local energy fluctuations to trigger on.

Another problem could be posed by broadcast limiters, which also tend
to flatten out the energy curve (to improve the reach and "punch" of
the radio station).

Regards,
Andor

Reply by Rob Vermeulen ● June 1, 2006

Andor,

> I can imagine that it could be hard to differentiate modern CDs with
> this method, because they are all over-compressed - you won't find much
> local energy fluctuations to trigger on.

I see what you mean. I had similar problems when I tried to detect 
bass drums in modern dance music (no distinct difference between energy 
levels throughout the song).

How about my suggestion to compare frequency deltas between the two signals?
Say I have a frequency splitter (using an FFT) that separates a signal into 
'n' bands (groups of FFT bins), and I compare each band's delta with the 
delta of the corresponding band in my reference signal.
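
A possible sketch of the band-delta comparison, taking "delta" to mean the
change in a band's log energy from one frame to the next (the post does not
define it). Working with log energies makes the deltas insensitive to a
constant gain difference, as long as every band carries some energy. All
function names are hypothetical, and a plain O(N^2) DFT stands in for the
FFT to keep the example dependency-free.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Energy in `bands` equal-width groups of DFT bins for one frame.
std::vector<double> band_energies(const std::vector<double>& x,
                                  std::size_t start, std::size_t frame,
                                  std::size_t bands) {
    assert(start + frame <= x.size());
    const std::size_t bins = frame / 2;  // bins 0 .. frame/2 - 1
    assert(bands > 0 && bins >= bands);
    const double pi = std::acos(-1.0);
    std::vector<double> e(bands, 0.0);
    for (std::size_t k = 0; k < bins; ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t n = 0; n < frame; ++n) {
            const double ph = 2.0 * pi * static_cast<double>(k * n) /
                              static_cast<double>(frame);
            re += x[start + n] * std::cos(ph);
            im -= x[start + n] * std::sin(ph);
        }
        e[k * bands / bins] += re * re + im * im;  // bin k goes to its band
    }
    return e;
}

// Per-band change in log energy between consecutive, non-overlapping frames.
std::vector<double> band_deltas(const std::vector<double>& x,
                                std::size_t frame, std::size_t bands) {
    std::vector<double> deltas, prev;
    for (std::size_t s = 0; s + frame <= x.size(); s += frame) {
        std::vector<double> cur = band_energies(x, s, frame, bands);
        for (double& v : cur) v = std::log(v + 1e-12);  // floor avoids log(0)
        if (!prev.empty())
            for (std::size_t b = 0; b < bands; ++b)
                deltas.push_back(cur[b] - prev[b]);
        prev = cur;
    }
    return deltas;
}

// Mean absolute difference between the band deltas of the two signals.
// Small values suggest the same content; large values suggest a mismatch.
double mean_delta_difference(const std::vector<double>& ref,
                             const std::vector<double>& broadcast,
                             std::size_t frame, std::size_t bands) {
    assert(ref.size() == broadcast.size());
    const std::vector<double> dr = band_deltas(ref, frame, bands);
    const std::vector<double> db = band_deltas(broadcast, frame, bands);
    assert(!dr.empty());  // needs at least two full frames
    double sum = 0.0;
    for (std::size_t i = 0; i < dr.size(); ++i) sum += std::fabs(dr[i] - db[i]);
    return sum / static_cast<double>(dr.size());
}
```

The frame length, the number of bands and the decision threshold are all
tuning parameters here, and the two inputs are assumed to be time-aligned
already.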

I think I'll just have to try it out :)

But I'm still open to good suggestions.

Best regards!

Reply by Greg ● June 1, 2006

Low-pass filter both streams to some upper limit you're pretty sure is
present in both. FFT both, then convert to magnitude/phase. Set the phase
in each to the same predetermined values, maybe all 0. That would avoid the
likely phase differences between the two streams. Then IFFT the results to
get the time-domain sequences back. Then cross-correlate the two sequences.
A single large peak should mean MATCH.
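
One observation that may simplify this: a zero-phase sequence has a real,
non-negative spectrum, so the circular cross-correlation of two such
sequences always has its largest value at lag 0, and that value is
(1/N) * sum over k of |X[k]| * |Y[k]|. The height of the peak can therefore
be computed from the magnitude spectra directly, without the IFFT. The
sketch below does that for one frame and normalizes the result to the range
0..1; the function names are hypothetical, and a plain DFT stands in for an
FFT library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Magnitudes of DFT bins 0 .. N/2 of one real-valued frame.
// A plain O(N^2) DFT is used only to keep the sketch free of dependencies.
std::vector<double> magnitude_spectrum(const std::vector<double>& x) {
    const std::size_t N = x.size();
    assert(N >= 2);
    const double pi = std::acos(-1.0);
    std::vector<double> mag(N / 2 + 1);
    for (std::size_t k = 0; k < mag.size(); ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t n = 0; n < N; ++n) {
            const double ph = 2.0 * pi * static_cast<double>(k * n) /
                              static_cast<double>(N);
            re += x[n] * std::cos(ph);
            im -= x[n] * std::sin(ph);
        }
        mag[k] = std::sqrt(re * re + im * im);
    }
    return mag;
}

// Normalized zero-lag correlation of the two zero-phase sequences, computed
// from the magnitude spectra. 1 means identical spectral shape (regardless
// of gain or phase); 0 means the spectra do not overlap at all.
double zero_phase_peak(const std::vector<double>& a,
                       const std::vector<double>& b) {
    assert(a.size() == b.size());
    const std::size_t N = a.size();
    const std::vector<double> ma = magnitude_spectrum(a);
    const std::vector<double> mb = magnitude_spectrum(b);
    double dot = 0.0, ea = 0.0, eb = 0.0;
    for (std::size_t k = 0; k < ma.size(); ++k) {
        // Bins other than DC and Nyquist appear twice in the full spectrum.
        const double w = (k == 0 || 2 * k == N) ? 1.0 : 2.0;
        dot += w * ma[k] * mb[k];
        ea += w * ma[k] * ma[k];
        eb += w * mb[k] * mb[k];
    }
    if (ea == 0.0 || eb == 0.0) return 0.0;  // silence in either frame
    return dot / std::sqrt(ea * eb);
}
```

Because the peak sits at lag 0 for any pair of inputs, it is the normalized
height, not the existence of a peak, that separates MATCH from NO_MATCH.
Two different programs with similar spectra would still score high, so
short frames and a strict threshold are probably needed.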

Greg Knox

Reply by Rob Vermeulen ● June 2, 2006

Greg,

Thanks!
What you suggest looks like the way I already planned it, but your 
suggestion for eliminating phase differences hadn't crossed my mind yet, 
and it looks like a decent solution for small phase differences.

Have you got any idea how to overcome a LARGE delay like, say, 1.5 seconds? 
Just ignoring the phase of every FFT bin won't work for that.

Regards,

Rob Vermeulen

Reply by jere...@gmail.com ● June 2, 2006

I am doing something very similar: I compute the cross-correlation function
over all samples and then normalize its output by the maximum of the
autocorrelation function. That way, at best (i.e. both signals exactly
the same) you obtain a maximum value of 1, so you would use a value such
as 0.99 as your trigger and not just any single large peak.
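
A small sketch of such a normalized cross-correlation with a search over
lags, which also covers the question about larger delays. One assumption
differs from the description above: since the two paths can differ in
level, the sketch divides by the geometric mean of the two zero-lag
autocorrelation values (the energies of the two windows), which reduces to
the normalization described when both signals have the same energy. The
names `CorrPeak` and `best_normalized_correlation` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct CorrPeak {
    std::size_t lag;  // delay of the broadcast relative to the reference
    double value;     // normalized correlation at that lag, in [-1, 1]
};

// Slide a window of `win` reference samples along the broadcast signal and
// return the lag in [0, max_lag] with the highest normalized correlation.
CorrPeak best_normalized_correlation(const std::vector<double>& ref,
                                     const std::vector<double>& broadcast,
                                     std::size_t win, std::size_t max_lag) {
    assert(win > 0 && ref.size() >= win);
    assert(broadcast.size() >= win + max_lag);

    double e_ref = 0.0;  // zero-lag autocorrelation of the reference window
    for (std::size_t n = 0; n < win; ++n) e_ref += ref[n] * ref[n];

    CorrPeak best{0, -1.0};
    for (std::size_t lag = 0; lag <= max_lag; ++lag) {
        double dot = 0.0, e_b = 0.0;
        for (std::size_t n = 0; n < win; ++n) {
            const double b = broadcast[n + lag];
            dot += ref[n] * b;
            e_b += b * b;
        }
        const double denom = std::sqrt(e_ref * e_b);
        const double v = denom > 0.0 ? dot / denom : 0.0;
        if (v > best.value) best = CorrPeak{lag, v};
    }
    return best;
}
```

A 1.5 second delay at 8 kHz corresponds to `max_lag = 12000`. The direct
search costs about `win * max_lag` multiply-adds, so an FFT-based
correlation is likely the better choice for long lags. A lossy codec will
also pull the best value below 1, so a trigger level of 0.99 may be too
strict in practice.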

Jeremy
