
Sound Identification/Matching - good starting point?

Started by roschler November 22, 2008
On Nov 23, 1:46 pm, "Fred Marshall" <fmarshallx@remove_the_x.acm.org> wrote:

> Well, I don't want to complicate things, but you might consider doing
> pattern recognition on a 2-D "image" of spectral density vs. time with a
> particular set of temporal characteristics. That would bring in image
> processing techniques, but is somewhat the idea in identifying sounds - to
> look at the time variation of the spectral density as a pattern.
>
> When you add the complexity of the "uninteresting" TV, then I can imagine
> it tuned to "The Dog Whisperer" while you're trying to tell if your own
> dog is barking!! Anyway, that suggests perhaps a noise canceller using
> the TV output as a pre-processing step. I'm not sure how hard or easy
> that would be, as you'd likely have to delay the classification stream to
> accommodate dealing with the rapidly varying TV output. I don't know if
> that's been done. It might look like this:
>
> microphone >> delay           >> summing point
> TV         >> adaptive filter >>      ^
>
> The delay is necessary so that the delay of the adaptive filter doesn't
> misalign the TV signal at the summing point. You need the cancellation to
> be aligned, I do believe.
>
> It's a lot less complicated without this... and it's still complicated!!
>
> Fred
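Fred's diagram translates to something like the following - a minimal NumPy sketch under illustrative assumptions (the function name, delay, filter length, and step size are all placeholders; in practice the step size needs tuning against the signal power, and a normalised-LMS update is usually safer):

import numpy as np

def lms_canceller(mic, tv, delay=32, taps=64, mu=1e-3):
    """Delayed-reference LMS noise canceller, per Fred's diagram.

    mic: microphone signal (wanted sound + TV leakage)
    tv:  reference taken directly from the TV output
    Both are equal-length 1-D arrays. The microphone path is delayed
    so the adaptive filter's own delay does not misalign the two
    signals at the summing point.
    """
    n = len(mic)
    mic_delayed = np.concatenate([np.zeros(delay), mic])[:n]
    w = np.zeros(taps)                      # adaptive filter weights
    out = np.zeros(n)                       # error = TV-cancelled output
    for k in range(taps, n):
        x = tv[k - taps:k][::-1]            # most recent TV samples first
        e = mic_delayed[k] - np.dot(w, x)   # subtract estimated TV leakage
        w += mu * e * x                     # LMS weight update
        out[k] = e
    return out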
Fred,

Well then, I'll plan on at least doing a spectrum-vs-time analysis, rather than a static snapshot of one long sound sample. The great thing about a problem like this is that even if I don't solve it completely, it's still incredibly fun to try.

Thanks,
Robert
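For concreteness, a spectrum-vs-time analysis of this kind is just a short-time Fourier transform. A minimal sketch, with purely illustrative frame and hop sizes:

import numpy as np

def spectrogram(x, fft_size=1024, hop=256):
    """Magnitude spectrogram: spectral density vs. time as a 2-D array
    (rows = time frames, columns = frequency bins), i.e. the 'image'
    Fred suggests doing pattern recognition on."""
    window = np.hanning(fft_size)
    frames = [np.abs(np.fft.rfft(x[s:s + fft_size] * window))
              for s in range(0, len(x) - fft_size + 1, hop)]
    return np.array(frames)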
Rune, Rune, Rune...

Now you've motivated me to take action.  Look for the new post. 

On Nov 24, 7:58 am, Rune Allnor <all...@tele.ntnu.no> wrote:
> On 23 Nov, 19:06, HardySpicer <gyansor...@gmail.com> wrote:
>
> > I cannot agree that because humans can do it then it would take a
> > machine eons to do likewise. Speech recognition is a hard task and is
> > just about there now. Some engines are very accurate - say 99%, albeit
> > in an environment with a high SNR. That has taken about 50 years.
>
> The problem in speech processing is rather well-conditioned
> compared to most other applications:
>
> - The signal is constrained (e.g. a phone line), so one
>   knows the source is human
> - The characteristics of human speech can be identified
>   from experiments
> - Such experiments can be performed in an ideal environment
>   (anechoic chamber)
>
> and *still* it's a non-trivial exercise.
>
> Steve said that "it's easy for the human brain" to do
> these identifications; I'm thinking it takes a human
> brain to achieve it.
>
> Rune
Speech recognition is a bloody hard problem. It has taken whole teams of linguists, computer scientists and engineers to sort it out.

Hardy
On Nov 22, 12:54 pm, roschler <robert.osch...@gmail.com> wrote:
> Can anyone recommend a good starting point for creating code that does
> Sound Identification/Matching? I was going to start creating a library
> of FFT snapshots consisting of varying time window lengths for the
> sounds I want to identify in audio streams. Then I was going to start
> matching up sounds that way by comparing snapshots taken from incoming
> audio, seeing how close they are to the library, but then I thought I'd
> better ask to see if there are any known techniques, papers, etc. that
> would help me avoid reinventing the wheel. Anything with source code
> examples would be appreciated.
>
> I know there's a lot of stuff out there, but I'm not technical enough
> to quickly sift out the best technique or to understand the possible
> caveats of using one technique over another; especially if the answer
> is given in high-level math form. That's why I'm asking for some tips.
>
> Thanks,
> Robert
Hi, I couldn't be bothered to read every post, so if what I have to add has already come forward, feel free to ignore it.

I've written software which does exactly what you are referring to, and it was divided into two main parts: a DSP part and a database part. It works like this: for any length of audio I want to create a description of, the DSP part analyzes the incoming audio, creating overlapping snapshots of FFT data. These snapshots are then added together, and this summing helps to amplify the most distinctive frequencies in the signal.

The next step was storing the data. The global snapshot was now processed into frequency buckets and a float number constructed for each of the buckets, its value based on the amplitude of its frequencies. Each bucket's number now denotes a fictional length. From the lengths of all the buckets, multidimensional vectors are constructed.

Vectors would be a good fit for your project since they work on a nearest-neighbour approach. This basically means that the vector constructed from one recording of a church bell doesn't need to be exactly the same for it to match another recording of a church bell; i.e., the difference between the sounds is the distance between the vectors describing them.

I hope some of this makes sense.
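A rough sketch of how that pipeline might look (an interpretation of the description above, not Esteban's actual code; the function name, FFT parameters, and bucket count are illustrative assumptions):

import numpy as np

def describe(x, fft_size=2048, hop=512, n_buckets=32):
    """Sum overlapping FFT snapshots, split the summed spectrum into
    frequency buckets, and take one float per bucket as the vector."""
    window = np.hanning(fft_size)
    summed = np.zeros(fft_size // 2 + 1)
    for start in range(0, len(x) - fft_size + 1, hop):
        summed += np.abs(np.fft.rfft(x[start:start + fft_size] * window))
    buckets = np.array_split(summed, n_buckets)   # equal-width buckets
    vec = np.array([b.sum() for b in buckets])    # one float per bucket
    return vec / np.linalg.norm(vec)              # compare by shape, not level

Normalising the vector means two recordings of the same sound at different levels still land close together under the nearest-neighbour comparison described above.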
On Nov 26, 4:58 am, Esteban Gaviz <esteban.supr...@gmail.com> wrote:
> <snip - roschler's question and Esteban's reply, quoted in full above>
Esteban,

Yes it does, and thank you for your suggestions. What classification method did you use to compare vectors? I thought about using PCA (Principal Components Analysis) on the spectral profiles I created for each sound, and then comparing new profiles against the resulting eigenvectors. Also, how did the FFT profile created by using your overlap/add method compare to that of one created by performing an FFT over a time window that encompassed the entire sound?

Thanks,
Robert
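The PCA idea Robert floats might look like this - a hypothetical sketch only (pca_basis and project are made-up names; this is not anything implemented in this thread):

import numpy as np

def pca_basis(profiles, n_components=8):
    """profiles: (n_sounds, n_bins) matrix, one spectral profile per row.
    Returns the mean profile and the top principal components."""
    mean = profiles.mean(axis=0)
    # SVD of the centred data: the rows of vt are the eigenvectors of
    # the covariance matrix, ordered by explained variance
    _, _, vt = np.linalg.svd(profiles - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(profile, mean, components):
    """Coordinates of a profile in the reduced PCA space; two sounds can
    then be compared by the distance between their projections."""
    return components @ (profile - mean)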
roschler wrote:
> Can anyone recommend a good starting point for creating code that does
> Sound Identification/Matching?
NB: those are two distinct jobs. You may be able to match one sound with another, without being able to identify either of them.

Keywords for searching: spectral modelling; audio content analysis.

I finally tracked down a source for the reference I think may be a good starting point, the key subject being MPEG-7 sound descriptors:

"A proposal for the description of audio in the context of MPEG-7"
http://en.scientificcommons.org/541620
(download via the citeseer link)

And the full MPEG-7 Reference Software (mostly C++) is available on a DVD with the moderately expensive book:

"Introduction to MPEG-7: Multimedia Content Description Interface"
http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0471486787.html

I downloaded it all only a year ago for my archives, so the software package ~may~ still be available online somewhere. Whether it counts as a "good starting point" for creating code I leave to you to decide.

Someone wrote earlier that "the brain can do it easily". It has to be pointed out, however, that the brain is also easily deceived, in different ways depending (among other things) on the absence or presence of visual support. Most of the sounds in a film, for example, are not produced by what you see on the screen!

Richard Dobson
On Nov 26, 7:01 pm, roschler <robert.osch...@gmail.com> wrote:
> <snip - the preceding exchange with Esteban, quoted in full above>
> What classification method did you use to compare vectors?
For development I used simple Euclidean distance, but there are a number of high-dimensional database approaches that can match the vectors more effectively. Since I am not really following the latest developments in the field, however, and the different techniques are tailored for different things (e.g. speed vs. size of data set vs. number of false positives), I'll leave googling them to you.
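For reference, the brute-force version of that matching is tiny (a sketch; the high-dimensional index structures Esteban alludes to would replace this linear scan for large libraries):

import numpy as np

def nearest(query, library):
    """Brute-force nearest neighbour: the best match is the library row
    at the smallest Euclidean distance from the query vector.
    library: (n_entries, n_dims) array; query: (n_dims,) array."""
    d = np.linalg.norm(library - query, axis=1)
    best = int(np.argmin(d))
    return best, float(d[best])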
> how did the FFT profile created by using your overlap/add method
> compare to that of one created by performing an FFT over a time window
> that encompassed the entire sound?
The simpler the input, the more similar the FFT you describe would be; the more complex, the more dissimilar. What I wanted to do was describe complex polyphonic signals, and the overlapping served to draw out the most relevant frequencies, since these contained all the information I wanted. Keep in mind that these overlapping slices are tiny.

PCA did not apply, since my frequency buckets were bound to certain values based on the logarithmic scale of musical notes, and I presume PCA would have skewed that scale. Might work, though.
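A hedged sketch of that kind of note-aligned bucketing (an illustration of the idea, not Esteban's code; f0, the note count, and the FFT parameters are assumptions):

import numpy as np

def semitone_buckets(spectrum, fs=44100, fft_size=4096, f0=55.0, n_notes=72):
    """Sum FFT magnitude bins into semitone-spaced buckets, one per
    musical note from f0 upward. spectrum must have fft_size // 2 + 1
    bins, as produced by np.fft.rfft."""
    freqs = np.fft.rfftfreq(fft_size, d=1.0 / fs)   # bin centre frequencies
    # note index of each bin on the 12-notes-per-octave log scale
    note = np.floor(12.0 * np.log2(np.maximum(freqs, 1e-9) / f0)).astype(int)
    out = np.zeros(n_notes)
    for i, k in enumerate(note):
        if 0 <= k < n_notes:
            out[k] += spectrum[i]   # DC and out-of-range bins are dropped
    return out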
On Nov 26, 2:34 pm, Richard Dobson <richarddob...@blueyonder.co.uk> wrote:

> <snip - Richard's post, quoted in full above>
Thanks, Richard. I'm going to check those references out now.

Robert
> On Nov 26, 4:58 am, Esteban Gaviz <esteban.supr...@gmail.com> wrote:
> <snip - Esteban's replies, quoted in full above>
Thanks, Esteban. Definite food for thought.

Robert