comp.dsp | Sound Identification/Matching - good starting point?| page 2

Reply by SteveSmith ● November 22, 2008

Rune: See-- you are capable of writing a good argument that deserves the
respect of your peers.  I give it an 8 out of 10.   I'd rate it higher,
but it is based on a faulty premise: Why do you assume that I don't work
in this field and that my career isn't in the balance?


Jerry:   Thanks for the comments; you must have been following the other
thread.   I think it's important to recognize why the Laplace and Fourier
Transforms are so successful: they are based on complex exponentials, which
are the solution to differential equations.  This makes them exquisite
tools for handling the broad list of physical systems that follow
differential equations, from electromagnetism to heat flow to planetary
movement.  

But what about physical systems that don't follow relatively simple
differential equations, such as the heart beat and stock market prices?
It's not surprising that Fourier techniques work somewhere between poor
and not at all.  Other basis functions?  As with you and Rune, I have found
they typically don't work well; there is no general reason that they should
be useful in these systems.   Since the brain can easily handle these types
of problems, and our current technology cannot handle them hardly at all,
we know that there is an enormous body of algorithms that are waiting to be
discovered.  Perhaps the study of basis functions will lead us to that
knowledge, or maybe it won't. I don't have any issue with the posts
that describe how difficult this problem is or the lack of success in the
past.  My beef is newcomers being told to give up before they start,
because the older and wiser already know it all.        
Regards,
Steve

Reply by roschler ● November 22, 2008

On Nov 22, 3:31 pm, "Fred Marshall" <fmarshallx@remove_the_x.acm.org>
wrote:

> Robert,
>
> Well, I have some experience with this - although in a different application
> area.  Each application area will have its challenges and likely some things
> that are easier to deal with as compared to other applications.
>
<snip>

Hello Fred (and others),

Thanks for responding to my post.  I hope my <snip> was proper forum
etiquette.  If not let me know.

I know sound identification is a big field, but I have learned (the
hard way)  that frequently there are established techniques for
certain complex pattern recognition tasks and I often don't know what
they are.  For example, I recently added DTMF detection to my free
software package for robot owners.  I originally was going to use
FFT's for my analysis work,  but fortunately during my digging I came
across the Goertzel algorithm.  This led to a big time savings both in
development and processing time.
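
For readers who have not met it, here is a minimal sketch of the Goertzel recurrence applied to DTMF. It is an illustration, not the code from the robot software: the 8000 Hz sample rate, the 205-sample block, and the helper names (goertzel_power, detect_dtmf, make_tone) are all assumptions made for the example. Only the eight DTMF frequencies and the keypad layout are standard.

```python
import math

def goertzel_power(samples, sample_rate, target_hz):
    """Squared magnitude of the signal's spectrum at one frequency (Goertzel)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * target_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        # Second-order recurrence: one multiply and two adds per sample.
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

# Standard DTMF row/column frequencies (Hz) and keypad layout.
DTMF_ROWS = (697, 770, 852, 941)
DTMF_COLS = (1209, 1336, 1477, 1633)
KEYS = ("123A", "456B", "789C", "*0#D")

def detect_dtmf(samples, sample_rate=8000):
    """Pick the strongest row tone and column tone; return the matching key."""
    row = max(DTMF_ROWS, key=lambda f: goertzel_power(samples, sample_rate, f))
    col = max(DTMF_COLS, key=lambda f: goertzel_power(samples, sample_rate, f))
    return KEYS[DTMF_ROWS.index(row)][DTMF_COLS.index(col)]

def make_tone(f1, f2, sample_rate=8000, n=205):
    """Hypothetical test helper: sum of two sine waves, n samples long."""
    return [math.sin(2 * math.pi * f1 * t / sample_rate)
            + math.sin(2 * math.pi * f2 * t / sample_rate) for t in range(n)]
```

Calling detect_dtmf(make_tone(770, 1336)) picks the "5" key. The saving over an FFT comes from evaluating only the eight frequencies of interest instead of every bin. A real detector would also need energy thresholds and twist checks, which this sketch leaves out.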

I am not doing speech recognition.  I already have that feature thanks
to the open source (BSD license) CMU Sphinx 3.5 engine and it works
fine.  Instead, I want to identify certain characteristic sounds
around the home; especially pet sounds.  So I'm not trying to identify
the particular sound profile of an enemy submarine (however I did see
an interesting paper on using Fuzzy Sets with Neural Nets for that
purpose), but hopefully the simpler task of telling Dog, from Cat,
from Washing Machine, etc.  In other words, sounds that I would expect
to have fairly gross differences in even their global FFT profiles (an
FFT taken across the entire sound window, from start of sound to end
of sound), and even more drastic differences when doing the kind of
frame analysis that speech engines do where the transition
probabilities between tiny FFT window slices are pre-calculated and
used during the analysis.
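
To make the "global FFT profile" idea concrete, here is a rough sketch under stated assumptions: each sound is reduced to a normalized band-energy profile over the whole clip, and an unknown clip is assigned to the stored template with the highest cosine similarity. The eight-band split, the similarity measure, the naive DFT (used only to avoid external libraries), and all function names are choices made for illustration, not an established recipe for pet sounds.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0 .. N/2 - 1 (fine for short clips)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples)))
            for k in range(n // 2)]

def band_profile(samples, n_bands=8):
    """Fraction of total spectral energy falling in each of n_bands bands."""
    mags = magnitude_spectrum(samples)
    width = len(mags) // n_bands
    energies = [sum(m * m for m in mags[b * width:(b + 1) * width])
                for b in range(n_bands)]
    total = sum(energies) or 1.0
    return [e / total for e in energies]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(samples, templates):
    """Return the name of the template profile closest to this clip."""
    profile = band_profile(samples)
    return max(templates, key=lambda name: cosine_similarity(profile, templates[name]))

def sine(freq_hz, sample_rate=8000, n=256):
    """Hypothetical stand-in for a recorded clip: a single sine wave."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]
```

With templates built from sine(200) and sine(3000), a clip such as sine(250) lands on the low template. Real dog, cat, and washing-machine recordings will be far less cleanly separated than pure tones, so treat this only as the shape of the approach.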

A home can be a difficult sound analysis environment, especially when
a loud stereo or TV is on, but if I could be successful in
environments free of those complex polyphonic sound sources I
would be happy.  I think it should be doable to identify the set of
sounds I mentioned above, especially since they are short sounds and
not like the Gettysburg address or the like.

Final note.  This is not a commercial project.  As I said, I do all
this crazy stuff for the robot control software I give to people
freely.

Thanks,
Robert

Reply by Jerry Avins ● November 23, 2008

SteveSmith wrote:
> Rune: See-- you are capable of writing a good argument that deserves the
> respect of your peers.  I give it an 8 out of 10.   I'd rate it higher,
> but it is based on a faulty premise: Why do you assume that I don't work
> in this field and that my career isn't in the balance?
> 
> 
> Jerry:   Thanks for the comments; you must have been following the other
> thread.   I think it's important to recognize why the Laplace and Fourier
> Transforms are so successful: they are based on complex exponentials, which
> are the solution to differential equations.  This makes them exquisite
> tools for handling the broad list of physical systems that follow
> differential equations, from electromagnetism to heat flow to planetary
> movement.  
> 
> But what about physical systems that don't follow relatively simple
> differential equations, such as the heart beat and stock market prices?
> It's not surprising that Fourier techniques work somewhere between poor
> and not at all.  Other basis functions?  As with you and Rune, I have found
> they typically don't work well; there is no general reason that they should
> be useful in these systems.   Since the brain can easily handle these types
> of problems, and our current technology cannot handle them hardly at all,
> we know that there is an enormous body of algorithms that are waiting to be
> discovered.  Perhaps the study of basis functions will lead us to that
> knowledge, or maybe it won't. I don't have any issue with the posts
> that describe how difficult this problem is or the lack of success in the
> past.  My beef is newcomers being told to give up before they start,
> because the older and wiser already know it all.     

New paradigms always excite practitioners, but often prove disappointing 
at the end. One example is the vector cardiogram, which generated a stir 
around 1960. Instead of plotting two leads against time, a practice that 
went back to string galvanometer instruments, each was plotted against 
the other as ordinate and abscissa. It was claimed that more detail 
could be seen in the resulting plot even though (or especially because) 
the trace only approximately closed. My father-in-law, who was chief of 
cardiology at a major hospital, was skeptical but open-minded, and asked 
if I might shed some light on the issue. I showed him how a "vector" 
plot without time ticks actually hid some information, and how to 
construct a vector plot from the usual time-base display. Despite its 
revealing to an inexperienced eye detail that might be missed in a 
time-base plot, vector cardiograms slid from hype to relative obscurity.

Jerry
-- 
Engineering is the art of making what you want from things you can get.
&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;

Reply by Rune Allnor ● November 23, 2008

On 23 Nov, 00:58, "SteveSmith" <Steve.Smi...@SpectrumSDI.com> wrote:
> Rune: See-- you are capable of writing a good argument that deserves the
> respect of your peers.  I give it an 8 out of 10.   I'd rate it higher,
> but it is based on a faulty premise: Why do you assume that I don't work
> in this field and that my career isn't in the balance?

Because your arguments don't make sense and you would never
present them as you do if you had the slightest experience
with 'physical' data analysis. Or if anything of even the
slightest value to yourself was at stake.

Let me guess - if you are at all affiliated with DSP, you
are in a managerial, sales type position. You have no
day-to-day experience with the nuts'n bolts of DSP, but you
have some previous exposure, presumably DSP was a peripheral
subject to something you did for your degree (PhD?) in
university?

I wouldn't be the least surprised if you sell these insane
projects and then complain about "it's impossible to get
skilled staff" when (not if) they crash and burn.

....
>   Since the brain can easily handle these types
> of problems, and our current technology cannot handle them hardly at all,

On that we agree.

> we know that there is an enormous body of algorithms that are waiting to be
> discovered.

Well, I think the limitation is a bit more fundamental
than mere algorithms. Homo Sapiens is the only species
to have developed a language in the billion or more
years life has existed on this planet. I believe the
ability to process language and identify and recognize
objects in the real world go hand in hand.

Even 'higher' species work mainly on stimulus and
response. People are taught how to behave around
predators to avoid setting off the kill reflexes.
Predators don't plan. They don't reason. They act
in set patterns based on familiar stimuli. If you
see a hunt, you will generally know what to expect
based on the species of predator: Some hunt in packs,
others are solitary. Some ambush, others charge
head-on.

If you want to replicate Homo Sapiens' analysis
skills you will need to replicate Homo Sapiens'
brain. Since these abilities have evolved exactly
once in the history of the Earth, I believe one
would be wise to approach the task with some
humility.

> Perhaps the study of basis functions will lead us to that
> knowledge, or maybe it won't.

The basis functions won't. Studying basis functions is
a trivial concept from both the mathematical and the
technical point of view. Take an intro class on Real
Analysis.

That is, if taking an intro class isn't beneath the
dignity of your degree...

> I don't have any issue with the posts
> that describe how difficult this problem is or the lack of success in the
> past.  My beef is newcomers being told to give up before they start,
> because the older and wiser already know it all.

Again, this is the ramblings of the fool with no relevant
experience. The reason why kids are able to survive in this
world is that the 'older and wiser' show them what works
and what doesn't.

Do you teach your kids what works in the world, or do
you leave them to find out everything for themselves?
Give them enough 'leeway' and you might come under
scrutiny for your fitness to keep responsibility for
them. Screw up enough and you will lose custody.

Not quite that dramatic between tutors and students,
but it's the same general mechanisms at work.

DSP has a very definite scope of use. The techniques that
work reasonably well within this scope don't work nearly
as well (or at all) outside. Again, this is something the
practitioner with even the slightest hint of experience
knows.

If you want to solve new problems, you need to know what
the tools at your disposal can and cannot do. Detecting
faint sounds in the sea has been a problem for decades,
and people have come up with the weirdest ideas. One idea
for *passive* sonar included dropping 100-200 explosive
charges of 0.5-1 kg TNT as part of the system. Madness.
I don't know what was most insane; that the idea was
proposed at all or that it wasn't dismissed hands down.

I just sat down and saw what the problem was all about.
By doing that I first of all discovered why the task is
so difficult. And believe me, it is. But I also found a
trivial method to improve the detection threshold by maybe
as much as 10 dB, under certain conditions. Nowhere near
a perfect solution, but the conditions I discovered were
general enough and my method robust enough for my method
to make a serious impact in the real world (people went
ballistic if they smelled a 0.5 dB improvement).

But the community was dominated by fools who wanted
their names to be listed among the likes of Einstein
and Newton, people who for years had made a point of not
wasting their precious time with 'easy' or 'standard' stuff;
let alone 'mere technicalities.' Hence, they knew
nowhere near enough basic DSP (or even practical data
analysis) to recognize the improvement when it was in
their face.

Rune

Reply by Richard Owlett ● November 23, 2008

SteveSmith wrote:
> My beef is newcomers being told to give up before they start,
> because the older and wiser already know it all.        
> Regards,
> Steve      

*Hallelujah*
I've been a target of that in a certain language forum.
[incipient rant deleted ;]

Reply by Richard Owlett ● November 23, 2008

Fred Marshall wrote:

> 
> Have you considered something like Dragon Speaking?  Voice to text already 
> works.  

Depends on what you mean by "works" ;/
Market forces have driven the field in the direction of "real time" 
"continuous speech" "large vocabulary".

I have a need for "small/medium vocabulary" "discrete speech" "inexpensive".

My application is inherently discrete speech, but current models use 
"timing" information as part of their *language* model. They have mixed 
together phonemic, lexical and contextual/linguistic (not quite right 
word) analysis. I've had experienced Dragon resellers tell me don't 
bother with current generation as my absolute need for "discrete speech" 
destroys the recognition rate. Older versions (if they were available) 
might be satisfactory for my particular application.

Cf. electric motorcars. Great-grandma could have one. BUT market 
*DEMANDED* (speed+range). Maybe our grandchildren will have them.

Reply by Richard Owlett ● November 23, 2008

roschler wrote:

> On Nov 22, 3:31 pm, "Fred Marshall" <fmarshallx@remove_the_x.acm.org>
> wrote:
> 
> 
>>Robert,
>>
>>Well, I have some experience with this - although in a different application
>>area.  Each application area will have its challenges and likely some things
>>that are easier to deal with as compared to other applications.
>>
> 
> <snip>
> 
> Hello Fred (and others),
> 
> Thanks for responding to my post.  I hope my <snip> was proper forum
> etiquette.  If not let me know.
> 

Not to worry :)

As to etiquette: How do you address your customers, boss, coworkers, 
family, social peers?

As to topic: Everything from skull structure of woodpeckers to 
historical linguistics. They start near DSP and end near epistemology.

Reply by HardySpicer ● November 23, 2008

On Nov 23, 9:02 pm, Rune Allnor <all...@tele.ntnu.no> wrote:
> On 23 Nov, 00:58, "SteveSmith" <Steve.Smi...@SpectrumSDI.com> wrote:
>
> > Rune: See-- you are capable of writing a good argument that deserves the
> > respect of your peers.  I give it an 8 out of 10.   I'd rate it higher,
> > but it is based on a faulty premise: Why do you assume that I don't work
> > in this field and that my career isn't in the balance?
>
> Because your arguments don't make sense and you would never
> present them as you do if you had the slightest experience
> with 'physical' data analysis. Or if anything of even the
> slightest value to yourself was at stake.
>
> Let me guess - if you are at all affiliated with DSP, you
> are in a managerial, sales type position. You have no
> day-to-day experience with the nuts'n bolts of DSP, but you
> have some previous exposure, presumably DSP was a peripheral
> subject to something you did for your degree (PhD?) in
> university?
>
> I wouldn't be the least surprised if you sell these insane
> projects and then complain about "it's impossible to get
> skilled staff" when (not if) they crash and burn.
>
> ....
>
> >   Since the brain can easily handle these types
> > of problems, and our current technology cannot handle them hardly at all,
>
> On that we agree.
>
> > we know that there is an enormous body of algorithms that are waiting to be
> > discovered.
>
> Well, I think the limitation is a bit more fundamental
> than mere algorithms. Homo Sapiens is the only species
> to have developed a language in the billion or more
> years life has existed on this planet. I believe the
> ability to process language and identify and recognize
> objects in the real world go hand in hand.
>
> Even 'higher' species work mainly on stimulus and
> response. People are taught how to behave around
> predators to avoid setting off the kill reflexes.
> Predators don't plan. They don't reason. They act
> in set patterns based on familiar stimuli. If you
> see a hunt, you will generally know what to expect
> based on the species of predator: Some hunt in packs,
> others are solitary. Some ambush, others charge
> head-on.
>
> If you want to replicate Homo Sapiens' analysis
> skills you will need to replicate Homo Sapiens'
> brain. Since these abilities have evolved exactly
> once in the history of the Earth, I believe one
> would be wise to approach the task with some
> humility.
>
> > Perhaps the study of basis functions will lead us to that
> > knowledge, or maybe it won't.
>
> The basis functions won't. Studying basis functions is
> a trivial concept from both the mathematical and the
> technical point of view. Take an intro class on Real
> Analysis.
>
> That is, if taking an intro class isn't beneath the
> dignity of your degree...
>
> > I don't have any issue with the posts
> > that describe how difficult this problem is or the lack of success in the
> > past.  My beef is newcomers being told to give up before they start,
> > because the older and wiser already know it all.
>
> Again, this is the ramblings of the fool with no relevant
> experience. The reason why kids are able to survive in this
> world is that the 'older and wiser' show them what works
> and what doesn't.
>
> Do you teach your kids what works in the world, or do
> you leave them to find out everything for themselves?
> Give them enough 'leeway' and you might come under
> scrutiny for your fitness to keep responsibility for
> them. Screw up enough and you will lose custody.
>
> Not quite that dramatic between tutors and students,
> but it's the same general mechanisms at work.
>
> DSP has a very definite scope of use. The techniques that
> work reasonably well within this scope don't work nearly
> as well (or at all) outside. Again, this is something the
> practitioner with even the slightest hint of experience
> knows.
>
> If you want to solve new problems, you need to know what
> the tools at your disposal can and cannot do. Detecting
> faint sounds in the sea has been a problem for decades,
> and people have come up with the weirdest ideas. One idea
> for *passive* sonar included dropping 100-200 explosive
> charges of 0.5-1 kg TNT as part of the system. Madness.
> I don't know what was most insane; that the idea was
> proposed at all or that it wasn't dismissed hands down.
>
> I just sat down and saw what the problem was all about.
> By doing that I first of all discovered why the task is
> so difficult. And believe me, it is. But I also found a
> trivial method to improve the detection threshold by maybe
> as much as 10 dB, under certain conditions. Nowhere near
> a perfect solution, but the conditions I discovered were
> general enough and my method robust enough for my method
> to make a serious impact in the real world (people went
> ballistic if they smelled a 0.5 dB improvement).
>
> But the community was dominated by fools who wanted
> their names to be listed among the likes of Einstein
> and Newton, people who for years had made a point of not
> wasting their precious time with 'easy' or 'standard' stuff;
> let alone 'mere technicalities.' Hence, they knew
> nowhere near enough basic DSP (or even practical data
> analysis) to recognize the improvement when it was in
> their face.
>
> Rune

I cannot agree that because humans can do it then it would take a
machine eons to do likewise. Speech recognition is a hard task and is
just about there now. Some engines are very accurate - say 99%, albeit
in an environment with a high SNR. That has taken about 50 years.


Hardy

Reply by Fred Marshall ● November 23, 2008

roschler wrote:
> On Nov 22, 3:31 pm, "Fred Marshall" <fmarshallx@remove_the_x.acm.org>
> wrote:
>
>> Robert,
>>
>> Well, I have some experience with this - although in a different
>> application area. Each application area will have its challenges and
>> likely some things that are easier to deal with as compared to other
>> applications.
>>
> <snip>
>
> Hello Fred (and others),
>
> Thanks for responding to my post.  I hope my <snip> was proper forum
> etiquette.  If not let me know.
>
> I know sound identification is a big field, but I have learned (the
> hard way)  that frequently there are established techniques for
> certain complex pattern recognition tasks and I often don't know what
> they are.  For example, I recently added DTMF detection to my free
> software package for robot owners.  I originally was going to use
> FFT's for my analysis work,  but fortunately during my digging I came
> across the Goertzel algorithm.  This led to a big time savings both in
> development and processing time.
>
> I am not doing speech recognition.  I already have that feature thanks
> to the open source (BSD license) CMU Sphinx 3.5 engine and it works
> fine.  Instead, I want to identify certain characteristic sounds
> around the home; especially pet sounds.  So I'm not trying to identify
> the particular sound profile of an enemy submarine (however I did see
> an interesting paper on using Fuzzy Sets with Neural Nets for that
> purpose), but hopefully the simpler task of telling Dog, from Cat,
> from Washing Machine, etc.  In other words, sounds that I would expect
> to have fairly gross differences in even their global FFT profiles (an
> FFT taken across the entire sound window, from start of sound to end
> of sound), and even more drastic differences when doing the kind of
> frame analysis that speech engines do where the transition
> probabilities between tiny FFT window slices are pre-calculated and
> used during the analysis.
>
> A home can be a difficult sound analysis environment, especially when
> a loud stereo or TV is on, but if I could be successful in
> environments free of those complex polyphonic sound sources I
> would be happy.  I think it should be doable to identify the set of
> sounds I mentioned above, especially since they are short sounds and
> not like the Gettysburg address or the like.
>
> Final note.  This is not a commercial project.  As I said, I do all
> this crazy stuff for the robot control software I give to people
> freely.
>
> Thanks,
> Robert

Well, I don't want to complicate things but you might consider doing pattern 
recognition on a 2-D "image" of spectral density vs. time with a particular 
set of temporal characteristics.  That would bring in image processing 
techniques but is somewhat the idea in identifying sounds - to look at the 
time variation of the spectral density as a pattern.
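
As a rough illustration of that 2-D "image", the sketch below builds a small spectrogram (rows are time frames, columns are frequency bins) and then traces the strongest bin in each frame. The frame length, hop size, Hann window, naive DFT, and the function names are assumptions chosen to keep the example short and free of external libraries. A real system would use an FFT and a proper pattern matcher on the resulting grid.

```python
import cmath
import math

def spectrogram(samples, frame_len=64, hop=32):
    """Power spectrum of overlapping Hann-windowed frames: rows = time, cols = bins."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    # Precompute DFT twiddle factors for bins 0 .. frame_len/2 - 1.
    twiddle = [[cmath.exp(-2j * math.pi * k * t / frame_len) for t in range(frame_len)]
               for k in range(frame_len // 2)]
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        rows.append([abs(sum(x * w for x, w in zip(frame, tw))) ** 2 for tw in twiddle])
    return rows

def peak_track(spec):
    """Index of the strongest frequency bin in each time frame."""
    return [max(range(len(row)), key=row.__getitem__) for row in spec]

def two_tone_demo(sample_rate=8000, n_each=256):
    """Hypothetical test signal: 500 Hz for the first half, 2000 Hz for the second."""
    low = [math.sin(2 * math.pi * 500 * t / sample_rate) for t in range(n_each)]
    high = [math.sin(2 * math.pi * 2000 * t / sample_rate) for t in range(n_each)]
    return low + high
```

Running peak_track(spectrogram(two_tone_demo())) shows the dominant bin jumping from the 500 Hz bin to the 2000 Hz bin partway through, which is the kind of temporal pattern a classifier would look for.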

When you add the complexity of the "uninteresting" TV then I can imagine it 
tuned to "The Dog Whisperer" while you're trying to tell if your own dog is 
barking!!  Anyway, that suggests perhaps a noise canceller using the TV 
output as a pre-processing step.  I'm not sure how hard or easy that would 
be as you'd likely have to delay the classification stream to accommodate 
dealing with the rapidly varying TV output.  I don't know if that's been 
done.  It might look like this:

microphone >> delay           >> summing point
TV         >> adaptive filter >>      ^

The delay is necessary so that the delay of the adaptive filter doesn't 
misalign the TV signal at the summing point.  You need the cancellation to 
be aligned I do believe.
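
A minimal sketch of that block diagram, assuming a normalized LMS (NLMS) adaptive filter, which is one common choice and not necessarily the one intended here. The TV feed is the reference input, the microphone is the signal to be cleaned, and the output of the summing point is the residual that would go on to classification. The tap count, step size, simulated two-tap room path, and all names are assumptions for the demo. mic_delay mirrors the "delay" box and is left at zero because the simulated room already delays the TV sound.

```python
import math
import random

def nlms_cancel(mic, ref, n_taps=8, mu=0.5, mic_delay=0, eps=1e-8):
    """Subtract an adaptively filtered copy of ref (TV) from mic; return the residual."""
    delayed_mic = [0.0] * mic_delay + list(mic[:len(mic) - mic_delay])
    weights = [0.0] * n_taps
    history = [0.0] * n_taps          # most recent reference samples, newest first
    residual = []
    for d, x in zip(delayed_mic, ref):
        history = [x] + history[:-1]
        tv_estimate = sum(w * h for w, h in zip(weights, history))
        err = d - tv_estimate         # output of the summing point
        norm = eps + sum(h * h for h in history)
        weights = [w + mu * err * h / norm for w, h in zip(weights, history)]
        residual.append(err)
    return residual

def simulate_room(n=4000, seed=0):
    """Hypothetical scene: TV noise reaches the mic via a 2-tap path; a 'bark' at t=3000."""
    rng = random.Random(seed)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mic = []
    for t in range(n):
        tv = 0.6 * (ref[t - 2] if t >= 2 else 0.0) + 0.3 * (ref[t - 3] if t >= 3 else 0.0)
        bark = 0.2 * math.sin(2 * math.pi * 0.05 * t) if 3000 <= t < 3500 else 0.0
        mic.append(tv + bark)
    return mic, ref

def mean_power(xs):
    return sum(x * x for x in xs) / len(xs)
```

In this simulation the residual is nearly silent once the filter has converged and the bark survives when it arrives. A real room has a much longer, time-varying impulse response, so far more taps and careful tuning would be needed.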

It's a lot less complicated without this...... and it's still complicated!!

Fred

Reply by Rune Allnor ● November 23, 2008

On 23 Nov, 19:06, HardySpicer <gyansor...@gmail.com> wrote:

> I cannot agree that because humans can do it then it would take a
> machine eons to do likewise. Speech recognition is a hard task and is
> just about there now. Some engines are very accurate - say 99%, albeit
> in an environment with a high SNR. That has taken about 50 years.

The problem in speech processing is rather well-conditioned
compared to most other applications:

- The signal is constrained (e.g. a phone line), so one
  knows the source is human
- The characteristics of human speech can be identified
  from experiments
- Such experiments can be performed in an ideal environment
  (anechoic chamber)

and *still* it's a non-trivial exercise.

Steve said that "it's easy for the human brain" to do
these identifications; I'm thinking it takes a human
brain to achieve it.

Rune