DSPRelated.com
Forums

Sound Identification/Matching - good starting point?

Started by roschler November 22, 2008
Rune: See-- you are capable of writing a good argument that deserves the
respect of your peers.  I give it an 8 out of 10.   I’d rate it higher,
but it is based on a faulty premise: Why do you assume that I don’t work
in this field and that my career isn’t in the balance?


Jerry:   Thanks for the comments; you must have been following the other
thread.   I think it’s important to recognize why the Laplace and Fourier
Transforms are so successful: they are based on complex exponentials, which
are the solution to differential equations.  This makes them exquisite
tools for handling the broad list of physical systems that follow
differential equations, from electromagnetism to heat flow to planetary
movement.  
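
The point about complex exponentials can be made concrete with a short worked equation. It assumes a linear, constant-coefficient differential equation, a restriction the post does not state explicitly; the symbols $a_k$, $x(t)$, $y(t)$ and $H(s)$ are introduced here only for illustration.

\[
\sum_{k=0}^{n} a_k \frac{d^k y}{dt^k} = x(t), \qquad x(t) = e^{st}
\]

Trying $y(t) = H(s)\,e^{st}$ and using $\frac{d^k}{dt^k} e^{st} = s^k e^{st}$ gives

\[
H(s)\left(\sum_{k=0}^{n} a_k s^k\right) e^{st} = e^{st}
\quad\Longrightarrow\quad
H(s) = \frac{1}{\sum_{k=0}^{n} a_k s^k},
\]

provided the denominator is nonzero. The exponential comes out of such a system with its shape unchanged, only scaled by $H(s)$; setting $s = j\omega$ gives the Fourier case.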

But what about physical systems that don’t follow relatively simple
differential equations, such as the heart beat and stock market prices? 
It’s not surprising that Fourier techniques work somewhere between poor
and not at all.  Other basis functions?  As with you and Rune, I have found
they typically don't work well; there is no general reason that they should
be useful in these systems.   Since the brain can easily handle these types
of problems, and our current technology cannot handle them hardly at all,
we know that there is an enormous body of algorithms that are waiting to be
discovered.  Perhaps the study of basis functions will lead us to that
knowledge, or maybe it won’t. I don’t have any issue with the posts
that describe how difficult this problem is or the lack of success in the
past.  My beef is newcomers being told to give up before they start,
because the older and wiser already know it all.        
Regards,
Steve      
On Nov 22, 3:31 pm, "Fred Marshall" <fmarshallx@remove_the_x.acm.org>
wrote:

> Robert,
>
> Well, I have some experience with this - although in a different application
> area.  Each application area will have its challenges and likely some things
> that are easier to deal with as compared to other applications.
>
<snip>

Hello Fred (and others),

Thanks for responding to my post. I hope my <snip> was proper forum
etiquette. If not let me know.

I know sound identification is a big field, but I have learned (the
hard way) that frequently there are established techniques for
certain complex pattern recognition tasks and I often don't know what
they are. For example, I recently added DTMF detection to my free
software package for robot owners. I originally was going to use
FFT's for my analysis work, but fortunately during my digging I came
across the Goertzel algorithm. This led to a big time savings both in
development and processing time.

I am not doing speech recognition. I already have that feature thanks
to the open source (BSD license) CMU Sphinx 3.5 engine and it works
fine. Instead, I want to identify certain characteristic sounds
around the home; especially pet sounds. So I'm not trying to identify
the particular sound profile of an enemy submarine (however I did see
an interesting paper on using Fuzzy Sets with Neural Nets for that
purpose), but hopefully the simpler task of telling Dog, from Cat,
from Washing Machine, etc. In other words, sounds that I would expect
to have fairly gross differences in even their global FFT profiles (an
FFT taken across the entire sound window, from start of sound to end
of sound), and even more drastic differences when doing the kind of
frame analysis that speech engines do where the transition
probabilities between tiny FFT window slices are pre-calculated and
used during the analysis.

A home can be a difficult sound analysis environment, especially when
a loud stereo or TV is on, but if I could be successful in
environments free of those complex polyphonic sound sources I
would be happy. I think it should be doable to identify the set of
sounds I mentioned above, especially since they are short sounds and
not like the Gettysburg address or the like.

Final note. This is not a commercial project. As I said, I do all
this crazy stuff for the robot control software I give to people
freely.

Thanks,
Robert
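
The DTMF remark in the post above can be illustrated with a minimal Goertzel sketch. It is not Robert's code: the function names, the 8 kHz sample rate and the 205-sample block length are assumptions chosen for illustration, and a usable detector would also need energy thresholds to reject silence, speech and other non-DTMF sounds. The idea is that only eight frequencies matter, so eight cheap second-order recurrences replace a full FFT.

```python
import math

DTMF_LOW = (697, 770, 852, 941)        # row frequencies in Hz
DTMF_HIGH = (1209, 1336, 1477, 1633)   # column frequencies in Hz
DTMF_KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, sample_rate, target_freq):
    """Squared spectral magnitude of `samples` at one frequency (Goertzel)."""
    omega = 2.0 * math.pi * target_freq / sample_rate
    coeff = 2.0 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        # Second-order recurrence: one multiply and two adds per sample.
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_dtmf(samples, sample_rate=8000):
    """Return the key whose row and column tones carry the most energy.

    Assumes a DTMF tone is actually present; there is no rejection logic.
    """
    low = max(DTMF_LOW, key=lambda f: goertzel_power(samples, sample_rate, f))
    high = max(DTMF_HIGH, key=lambda f: goertzel_power(samples, sample_rate, f))
    return DTMF_KEYS[DTMF_LOW.index(low)][DTMF_HIGH.index(high)]


def make_tone(f1, f2, sample_rate=8000, n=205):
    """Synthetic two-tone test signal (hypothetical helper for the demo)."""
    return [math.sin(2 * math.pi * f1 * i / sample_rate)
            + math.sin(2 * math.pi * f2 * i / sample_rate)
            for i in range(n)]


print(detect_dtmf(make_tone(770, 1336)))  # key "5"
```

The cost per block is proportional to the number of frequencies tested times the block length, which is where the processing-time saving over a general FFT comes from when only a handful of bins are of interest.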
SteveSmith wrote:
> Rune: See-- you are capable of writing a good argument that deserves the
> respect of your peers.  I give it an 8 out of 10.   I’d rate it higher,
> but it is based on a faulty premise: Why do you assume that I don’t work
> in this field and that my career isn’t in the balance?
>
>
> Jerry:   Thanks for the comments; you must have been following the other
> thread.   I think it’s important to recognize why the Laplace and Fourier
> Transforms are so successful: they are based on complex exponentials, which
> are the solution to differential equations.  This makes them exquisite
> tools for handling the broad list of physical systems that follow
> differential equations, from electromagnetism to heat flow to planetary
> movement.
>
> But what about physical systems that don’t follow relatively simple
> differential equations, such as the heart beat and stock market prices?
> It’s not surprising that Fourier techniques work somewhere between poor
> and not at all.  Other basis functions?  As with you and Rune, I have found
> they typically don't work well; there is no general reason that they should
> be useful in these systems.   Since the brain can easily handle these types
> of problems, and our current technology cannot handle them hardly at all,
> we know that there is an enormous body of algorithms that are waiting to be
> discovered.  Perhaps the study of basis functions will lead us to that
> knowledge, or maybe it won’t. I don’t have any issue with the posts
> that describe how difficult this problem is or the lack of success in the
> past.  My beef is newcomers being told to give up before they start,
> because the older and wiser already know it all.
New paradigms always excite practitioners, but often prove disappointing
in the end. One example is the vector cardiogram, which generated a stir
around 1960. Instead of plotting two leads against time, a practice that
went back to string galvanometer instruments, each was plotted against the
other as ordinate and abscissa. It was claimed that more detail could be
seen in the resulting plot even though (or especially because) the trace
only approximately closed.

My father-in-law, who was chief of cardiology at a major hospital, was
skeptical but open minded, and asked if I might shed some light on the
issue. I showed him how a "vector" plot without time ticks actually hid
some information, and how to construct a vector plot from the usual
time-base display. Despite its revealing to an inexperienced eye detail
that might be missed in a time-base plot, vector cardiograms slid from
hype to relative obscurity.

Jerry
--
Engineering is the art of making what you want from things you can get.
On 23 Nov, 00:58, "SteveSmith" <Steve.Smi...@SpectrumSDI.com> wrote:
> Rune: See-- you are capable of writing a good argument that deserves the
> respect of your peers. I give it an 8 out of 10. I’d rate it higher,
> but it is based on a faulty premise: Why do you assume that I don’t work
> in this field and that my career isn’t in the balance?
Because your arguments don't make sense and you would never
present them as you do if you had the slightest experience
with 'physical' data analysis. Or if anything of even the
slightest value to yourself was at stake.

Let me guess - if you are at all affiliated with DSP, you
are in a managerial, sales-type position. You have no
day-to-day experience with the nuts 'n' bolts of DSP, but you
have some previous exposure; presumably DSP was a peripheral
subject to something you did for your degree (PhD?) in
university?

I wouldn't be the least surprised if you sell these insane
projects and then complain about "it's impossible to get
skilled staff" when (not if) they crash and burn.

....
> Since the brain can easily handle these types
> of problems, and our current technology cannot handle them hardly at all,
On that we agree.
> we know that there is an enormous body of algorithms that are waiting to be
> discovered.
Well, I think the limitation is a bit more fundamental
than mere algorithms. Homo Sapiens is the only species
to have developed a language in the billion or more
years life has existed on this planet. I believe the
ability to process language and the ability to identify
and recognize objects in the real world go hand in hand.

Even 'higher' species work mainly on stimulus and
response. People are taught how to behave around
predators to avoid setting off the kill reflexes.
Predators don't plan. They don't reason. They act
in set patterns based on familiar stimuli. If you
see a hunt, you will generally know what to expect
based on the species of predator: Some hunt in packs,
others are solitary. Some ambush, others charge
head-on.

If you want to replicate Homo Sapiens' analysis
skills you will need to replicate Homo Sapiens'
brain. Since these abilities have evolved exactly
once in the history of the Earth, I believe one
would be wise to approach the task with some
humility.
> Perhaps the study of basis functions will lead us to that
> knowledge, or maybe it won’t.
The basis functions won't. Studying basis functions is
a trivial concept from both the mathematical and the
technical point of view. Take an intro class on Real
Analysis.

That is, if taking an intro class isn't beneath the
dignity of your degree...
> I don’t have any issue with the posts
> that describe how difficult this problem is or the lack of success in the
> past. My beef is newcomers being told to give up before they start,
> because the older and wiser already know it all.
Again, this is the rambling of a fool with no relevant
experience. The reason why kids are able to survive in this
world is that the 'older and wiser' show them what works
and what doesn't.

Do you teach your kids what works in the world, or do
you leave them to find out everything for themselves?
Give them enough 'leeway' and you might come under
scrutiny for your fitness to keep responsibility for
them. Screw up enough and you will lose custody.

Not quite that dramatic between tutors and students,
but it's the same general mechanisms at work.

DSP has a very definite scope of use. The techniques that
work reasonably well within this scope don't work nearly
as well (or at all) outside. Again, this is something the
practitioner with even the slightest hint of experience
knows.

If you want to solve new problems, you need to know what
the tools at your disposal can and cannot do. Detecting
faint sounds in the sea has been a problem for decades,
and people have come up with the weirdest ideas. One idea
for *passive* sonar included dropping 100-200 explosive
charges of 0.5-1 kg TNT as part of the system. Madness.
I don't know which was more insane: that the idea was
proposed at all or that it wasn't dismissed out of hand.

I just sat down and saw what the problem was all about.
By doing that I first of all discovered why the task is
so difficult. And believe me, it is. But I also found a
trivial method to improve the detection threshold by maybe
as much as 10 dB, under certain conditions. Nowhere near
a perfect solution, but the conditions I discovered were
general enough and my method robust enough for it
to make a serious impact in the real world (people went
ballistic if they smelled a 0.5 dB improvement).

But the community was dominated by fools who wanted
their names to be listed among the likes of Einstein
and Newton, people who for years had made a point of not
wasting their precious time with 'easy' or 'standard' stuff,
let alone 'mere technicalities.' Hence, they knew
nowhere near enough basic DSP (or even practical data
analysis) to recognize the improvement when it was in
their face.

Rune
SteveSmith wrote:
> My beef is newcomers being told to give up before they start,
> because the older and wiser already know it all.
> Regards,
> Steve
*Hallelujah* I've been a target of that in a certain language forum. [incipient rant deleted ;]
Fred Marshall wrote:

>
> Have you considered something like Dragon Speaking? Voice to text already
> works.
Depends on what you mean by "works" ;/

Market forces have driven the field in the direction of
"real time" "continuous speech" "large vocabulary".
I have a need for
"small/medium vocabulary" "discrete speech" "inexpensive".

My application is inherently discrete speech, but current models use
"timing" information as part of their *language* model. They have
mixed together phonemic, lexical and contextual/linguistic (not quite
the right word) analysis.

I've had experienced Dragon resellers tell me not to bother with the
current generation, as my absolute need for "discrete speech" destroys
the recognition rate. Older versions (if they were available) might be
satisfactory for my particular application.

Cf. electric motorcars. Great-grandma could have had one. BUT the
market *DEMANDED* (speed+range). Maybe our grandchildren will have them.
roschler wrote:

> On Nov 22, 3:31 pm, "Fred Marshall" <fmarshallx@remove_the_x.acm.org>
> wrote:
>
>
>>Robert,
>>
>>Well, I have some experience with this - although in a different application
>>area.  Each application area will have its challenges and likely some things
>>that are easier to deal with as compared to other applications.
>>
>
> <snip>
>
> Hello Fred (and others),
>
> Thanks for responding to my post. I hope my <snip> was proper forum
> etiquette. If not let me know.
>
Not to worry :)

As to etiquette: How do you address your customers, boss, coworkers,
family, social peers?

As to topic: Everything from skull structure of woodpeckers to
historical linguistics. They start near DSP and end near epistemology.
On Nov 23, 9:02 pm, Rune Allnor <all...@tele.ntnu.no> wrote:
<snip>
> If you want to replicate Homo Sapiens' analysis
> skills you will need to replicate Homo Sapiens'
> brain. Since these abilities have evolved exactly
> once in the history of the Earth, I believe one
> would be wise to approach the task with some
> humility.
<snip>
I cannot agree that because humans can do it then it would take a
machine eons to do likewise. Speech recognition is a hard task and is
just about there now. Some engines are very accurate - say 99%, albeit
in an environment with a high SNR. That has taken about 50 years.

Hardy
roschler wrote:
<snip>
> I am not doing speech recognition. I already have that feature thanks
> to the open source (BSD license) CMU Sphinx 3.5 engine and it works
> fine. Instead, I want to identify certain characteristic sounds
> around the home; especially pet sounds. So I'm not trying to identify
> the particular sound profile of an enemy submarine (however I did see
> an interesting paper on using Fuzzy Sets with Neural Nets for that
> purpose), but hopefully the simpler task of telling Dog, from Cat,
> from Washing Machine, etc. In other words, sounds that I would expect
> to have fairly gross differences in even their global FFT profiles (an
> FFT taken across the entire sound window, from start of sound to end
> of sound), and even more drastic differences when doing the kind of
> frame analysis that speech engines do where the transition
> probabilities between tiny FFT window slices are pre-calculated and
> used during the analysis.
>
> A home can be a difficult sound analysis environment, especially when
> a loud stereo or TV is on, but if I could be successful in
> environments when absent of those complex polyphonic sound sources I
> would be happy. I think it should be doable to identify the set of
> sounds I mentioned above, especially since they are short sounds and
> not like the Gettysburg address or the like.
<snip>
Well, I don't want to complicate things but you might consider doing
pattern recognition on a 2-D "image" of spectral density vs. time with a
particular set of temporal characteristics. That would bring in image
processing techniques but is somewhat the idea in identifying sounds - to
look at the time variation of the spectral density as a pattern.

When you add the complexity of the "uninteresting" TV then I can imagine
it tuned to "The Dog Whisperer" while you're trying to tell if your own
dog is barking!! Anyway, that suggests perhaps a noise canceller using
the TV output as a pre-processing step. I'm not sure how hard or easy
that would be as you'd likely have to delay the classification stream to
accommodate dealing with the rapidly varying TV output. I don't know if
that's been done. It might look like this:

microphone >> delay >>    summing point
TV >> adaptive filter >>        ^

The delay is necessary so that the delay of the adaptive filter doesn't
misalign the TV signal at the summing point. You need the cancellation
to be aligned I do believe.

It's a lot less complicated without this...... and it's still
complicated!!

Fred
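
Fred's block diagram can be sketched as a normalized LMS (NLMS) noise canceller. This is one common way to build the "adaptive filter" box, not necessarily what Fred had in mind: the function name, tap count, step size and the `mic_delay` parameter are all assumptions for illustration, and the sketch presumes a clean electrical copy of the TV audio is available as the reference.

```python
import math
import random


def nlms_cancel(mic, reference, num_taps=8, mu=0.1, mic_delay=0, eps=1e-8):
    """Subtract an adaptively filtered reference (e.g. the TV feed) from the mic.

    Returns the residual at the summing point, which is the signal a
    sound classifier would then look at.
    """
    # "delay" box: shift the microphone path by mic_delay samples.
    delayed_mic = [0.0] * mic_delay + list(mic[:len(mic) - mic_delay])
    weights = [0.0] * num_taps
    history = [0.0] * num_taps            # newest reference sample first
    residual = []
    for d, x in zip(delayed_mic, reference):
        history = [x] + history[:-1]
        # "adaptive filter" box: current estimate of the TV sound at the mic.
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate              # output of the summing point
        # NLMS update: step size normalized by the reference energy.
        norm = eps + sum(h * h for h in history)
        weights = [w + mu * error * h / norm
                   for w, h in zip(weights, history)]
        residual.append(error)
    return residual
```

A quick way to try it is to build `mic` as a quiet target sound plus a filtered copy of a noise-like `reference`, then compare the residual with the target once the weights have settled. The filter can only remove what is linearly predictable from the reference, so room changes, loudspeaker distortion and a long reverberation tail all limit how much cancellation is achievable in practice.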
On 23 Nov, 19:06, HardySpicer <gyansor...@gmail.com> wrote:

> I cannot agree that because humans can do it then it would take a
> machine eons to do likewise. Speech recognition is a hard task and is
> just about there now. Some engines are very accurate - say 99%, albeit
> in an environment with a high SNR. That has taken about 50 years.
The problem in speech processing is rather well-conditioned
compared to most other applications:

- The signal is constrained (e.g. a phone line), so one
  knows the source is human
- The characteristics of human speech can be identified
  from experiments
- Such experiments can be performed in an ideal environment
  (anechoic chamber)

and *still* it's a non-trivial exercise.

Steve said that "it's easy for the human brain" to do
these identifications; I'm thinking it takes a human
brain to achieve it.

Rune