DSPRelated.com
Forums

speech recognition

Started by RichD January 19, 2012
On Jan 18, 10:51 pm, RichD <r_delaney2...@yahoo.com> wrote:
> http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2012/01/16/BU8C1MOO2...
>
> He claims he can filter speech from background noise.
>
> I recall discussing this possibility years ago.  Someone said, these
> filters already exist.  They do - they're notch filters!  It's close
> to brain dead, believing that constitutes 'voice filtering'.
>
> Dr. Watts has been working on this for years, so I was wondering
> what techniques he's using, how much is public domain.  Anyone
> here know anything about the subject, or this product?
> Is it neural nets, DSP filters, or what?
>
> --
> Rich
The filter in his paper looks like the mel-frequency cepstral coefficients (MFCC) used in speech recognition, multiplied by a weighting (weighted MFCC, perhaps). I found a paper by Jont Allen on cochlear modeling (ASSP Mag., Jan 1985). Maybe Watts is combining the two concepts (hence the weighting).

Additionally, Audience has a patent (7,076,315) titled "Efficient computation of log-frequency-scale digital filter cascade". MFCC is log-frequency-scale analysis, and Watts's paper refers to a proprietary bi-quad IIR filter. Hmmmmm.

Just a thought.

Maurice Givens
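[For concreteness, here is a minimal sketch of the MFCC front end described above: windowed power spectrum, triangular filters spaced on the mel (log-like) scale, log energies, then a DCT-II. The frame size, filter count, and sample rate are illustrative choices only, not anything from Watts's paper.]

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr, fmin=0.0, fmax=None):
    """Triangular filters spaced evenly on the mel scale (log-frequency-like)."""
    fmax = fmax or sr / 2.0
    mel_pts = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        for b in range(lo, c):               # rising slope
            fb[i, b] = (b - lo) / max(c - lo, 1)
        for b in range(c, hi):               # falling slope
            fb[i, b] = (hi - b) / max(hi - c, 1)
    return fb

def mfcc_frame(frame, sr, n_filters=26, n_ceps=13):
    """MFCCs of one frame: power spectrum -> mel energies -> log -> DCT-II."""
    n_fft = len(frame)
    spec = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    mel_e = mel_filterbank(n_filters, n_fft, sr) @ spec
    log_e = np.log(mel_e + 1e-10)            # floor avoids log(0)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ log_e

sr = 8000
t = np.arange(512) / sr
ceps = mfcc_frame(np.sin(2 * np.pi * 440.0 * t), sr)   # 13 coefficients
```

A real recognizer would run this per overlapping frame and append delta features; this shows only the core transform.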
RichD wrote:

> [original article quoted above - snipped]
For whatever reason, I have never been able to get speech-to-text working here well enough with my voice to make it usable: it misuses words or, at times, produces totally incorrect words. But all of them seem to work well with women's voices, from what I've seen. I understand the training cycle you need to perform in such tools to build a profile for your voice.

The latest Dragon speech product does seem to work well; however, the problem is not so much correlating with my voice as that it seems to have issues deciding what is dictation and what is a command.

The technology has come a long way. I can remember the first one I tried, which was for Windows 3.x, and I found it to work amazingly well back then. Even the response speed was good. However, it seems that as hardware speeds up, the software gets bloated proportionally, as they add more features and use newer tools that just put more unwanted bloat in your code. Maybe we'll stop adding layers to these tools one day.

Jamie
On Jan 19, 5:51 pm, RichD <r_delaney2...@yahoo.com> wrote:
> [original article quoted above - snipped]
Couldn't say, but a notch filter won't work if it's wideband noise with spectral overlap. Things that might work: adaptive beamforming, or at a stretch blind source separation (I doubt it, though).

Hardy
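[To illustrate the simplest member of the beamforming family mentioned here, this is a toy delay-and-sum sketch with two microphones. The signal, steering delay, and noise levels are invented for the demo; a real adaptive beamformer would estimate the delay and weights rather than assume them.]

```python
import numpy as np

rng = np.random.default_rng(0)
n, delay = 20000, 7            # record length; source arrival delay at mic 2

speech = np.sin(2 * np.pi * 0.01 * np.arange(n))   # stand-in "speech" tone
noise1 = rng.normal(0.0, 1.0, n)                   # independent noise per mic
noise2 = rng.normal(0.0, 1.0, n)

# Mic 1 hears the source directly; mic 2 hears it `delay` samples later.
mic1 = speech + noise1
mic2 = np.roll(speech, delay) + noise2

# Delay-and-sum: advance mic 2 to re-align the source, then average.
# The aligned speech adds coherently; the uncorrelated noise averages down.
out = 0.5 * (mic1 + np.roll(mic2, -delay))

def snr(x, ref):
    """Signal-to-noise ratio of x against the clean reference."""
    return np.var(ref) / np.var(x - ref)
```

With two mics, coherent averaging roughly halves the noise power (about 3 dB of SNR gain); more mics buy more.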
On Jan 20, 2:35 am, Phil Hobbs
<pcdhSpamMeSensel...@electrooptical.net> wrote:
> RichD wrote:
> > [original article snipped]
>
> A friend of mine, Professor Dana Anderson of the University of Colorado,
> Boulder, made a statistics-based digital filter that could separate
> different kinds of music mixed together, as well as music from noise.
> The demo was really striking--you mix together, say, jazz and classical
> music from two MP3 players, feed it through the gizmo, and after (iirc)
> about 10 seconds of learning, classical comes out of one speaker and
> jazz out of the other.  Magic stuff--published in IEEE Acoustics around
> 2006, I think.
>
> Cheers
>
> Phil Hobbs
> --
> Dr Philip C D Hobbs
> Principal Consultant
> ElectroOptical Innovations LLC
> Optics, Electro-optics, Photonics, Analog Electronics
>
> 160 North State Road #203
> Briarcliff Manor NY 10510
> 845-480-2058
>
> hobbs at electrooptical dot net
> http://electrooptical.net
Yes, but how are the two sources mixed in the first place? Acoustically, or by pure computer addition? Important, because one of those is convolutive mixing.

hardy
On Jan 20, 7:05 am, c...@kcwc.com (Curt Welch) wrote:
> "Jesse F. Hughes" <je...@phiwumbda.org> wrote: > > > Phil Hobbs <pcdhSpamMeSensel...@electrooptical.net> writes: > > > > A friend of mine, Professor Dana Anderson of the University of > > > Colorado, Boulder, made a statistics-based digital filter that could > > > separate different kinds of music mixed together, as well as music from > > > noise. The demo was really striking--you mix together, say jazz and > > > classical music from two MP3 players, feed it through the gizmo, and > > > after (iirc) about 10 seconds of learning, classical comes out of one > > > speaker and jazz out of the other. &#4294967295; Magic stuff--published in IEEE > > > Acoustics around 2006, I think. > > > That sounds really impressive, if it works as well as you describe. > > Here's a great little web demo of ICA - Independent Component Analyses. &#4294967295;It > can separate sources mixed together when recorded in different > "microphones" (I assume the demo is just a mathematical mixing and not done > by recording). > > http://research.ics.tkk.fi/ica/cocktail/cocktail_en.cgi > > This approach makes the assumption that the source signals are linearly > mixed together at different levels in each microphone recording (due to the > different distances each source is away from the microphone) but can > separate as many different sources as you have microphones. > > More info: > > http://en.wikipedia.org/wiki/Independent_component_analysis > > I would guess the telephone technology is using something similar since > they added a second microphone. > > The only statistical requirement for this to work is that the sources must > have a non-Gaussian distribution. > > BTW, this stuff is WAY past "notch filters" in complexity and power and > performance. 
> --
> Curt Welch                                http://CurtWelch.Com/
> c...@kcwc.com                             http://NewsReader.Com/
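[The cocktail-party demo linked above can be reproduced in miniature. Below is a sketch of FastICA (deflation form, tanh nonlinearity) separating two synthetic non-Gaussian sources from an instantaneous two-microphone mix; the mixing matrix and source waveforms are invented for the example, and as noted downthread this only covers constant-matrix mixing, not convolutive mixing.]

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
t = np.arange(n)

# Two non-Gaussian sources: a square wave and a sawtooth.
s1 = np.sign(np.sin(2 * np.pi * t / 120.0))
s2 = (t % 97) / 97.0 * 2.0 - 1.0
S = np.vstack([s1, s2])

A = np.array([[1.0, 0.6], [0.4, 1.0]])   # unknown instantaneous mixing
X = A @ S                                 # two "microphone" recordings

# Center and whiten so the mixed channels are uncorrelated, unit variance.
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(Xc))
Z = E @ np.diag(d ** -0.5) @ E.T @ Xc

# FastICA fixed-point iteration, one component at a time (deflation).
W = np.zeros((2, 2))
for i in range(2):
    w = rng.normal(size=2)
    w /= np.linalg.norm(w)
    for _ in range(200):
        wx = w @ Z
        g, gp = np.tanh(wx), 1.0 - np.tanh(wx) ** 2
        w_new = (Z * g).mean(axis=1) - gp.mean() * w
        for j in range(i):                 # stay orthogonal to earlier ones
            w_new -= (w_new @ W[j]) * W[j]
        w_new /= np.linalg.norm(w_new)
        done = abs(abs(w_new @ w) - 1.0) < 1e-9
        w = w_new
        if done:
            break
    W[i] = w

Y = W @ Z   # recovered sources, up to order and sign
```

Each recovered component should correlate almost perfectly with one of the true sources (sign and order are inherently ambiguous in ICA).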
Again, don't be so impressed! It depends how they are mixed. Simple constant-matrix mixing is quite easy to separate, whereas more realistic convolutive mixing is much harder. In real acoustic environments the mixing polynomial matrix is more often than not non-minimum-phase too, and of a very high order in some environments.

Hardy
HardySpicer wrote:
> [quoted text snipped]
>
> Yes but how are the two sources mixed in the first place? Acoustically
> or pure computer addition? Important because one of those is
> convolutive mixing.
>
> hardy
"Convolutive mixing"? As in use one as a filter for the other? They were summed, just as in every other audio mixer. Cheers Phil Hobbs -- Dr Philip C D Hobbs Principal Consultant ElectroOptical Innovations LLC Optics, Electro-optics, Photonics, Analog Electronics 160 North State Road #203 Briarcliff Manor NY 10510 845-480-2058 hobbs at electrooptical dot net http://electrooptical.net
On 1/19/2012 7:22 PM, HardySpicer wrote:
> On Jan 19, 5:51 pm, RichD <r_delaney2...@yahoo.com> wrote:
>> [original article snipped]
>
> Couldn't say but a notch filter won't work if it's wideband noise with
> spectral overlap.
> Things that might work...adaptive beamforming, at a stretch Blind
> Source Separation (doubt it though).
>
> Hardy
It is far better to use a better sensor (or sensor array, as the case may be) than to clean up the signal after the fact. Microphones are cheap; complex signal processing is power hungry. I'd rather have a good noise-cancelling scheme than a pile of DSP post-processing.

BTW, the worst people to ask for a technical opinion on anything are stock analysts. ;-)

Phone voice quality got crappy when it was deemed that flip phones are not cool. There is simply nothing like having the microphone in the right place. Some cellular cases mimic the old flip phone so that there is a pressure-zone effect from the mouth to the microphone. You can google "sena flip case" to see them. Note that some of the iPhone cases are designed for docking, so they flip at the wrong end of the phone. Most other Sena flip cases flip so that the flap provides a path for the voice. Don't even get me started on Bluetooth headsets where the microphone barely goes past the ear.
On Jan 18, 11:51 pm, RichD <r_delaney2...@yahoo.com> wrote:
> [original article quoted above - snipped]
A cascade of simple bi-quad filters followed by ... implemented in silicon hardware. It has nothing to do with speech recognition (at least at the present moment).

The real question is how much of someone else's IP they have used in building their technology. Audience's patents are available to anyone for viewing:

https://www.google.com/search?tbo=p&tbm=pts&hl=en&q=ininventor:%22Lloyd+Watts%22

Some critical ingredients are clearly missing in those patents, most notably pitch detection. Pitch is a key ingredient in any workable computational auditory scene analysis (CASA) model.

Too bad Audience's chips are about 1/10 inch in size and made in Taiwan ... go figure... Busting IPOs can be a profitable business... Any ideas ???

____________________________________________________________________
"Audience, Inc: 12 years in the making - on the verge of profitability"
"I wish I had an uncle named Paul Allen"
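[As a rough illustration of log-frequency-scale bi-quad analysis, here is a sketch using a parallel bank of band-pass biquads at log-spaced center frequencies. This is only a stand-in: the actual Audience cascade topology is proprietary, and the coefficient formulas below are the well-known RBJ-cookbook constant-0 dB-peak band-pass design, with all parameters chosen for the demo.]

```python
import numpy as np

def bandpass_biquad(fc, sr, Q=4.0):
    """RBJ-cookbook band-pass biquad (constant 0 dB peak gain) coefficients."""
    w0 = 2 * np.pi * fc / sr
    alpha = np.sin(w0) / (2 * Q)
    b = np.array([alpha, 0.0, -alpha])
    a = np.array([1 + alpha, -2 * np.cos(w0), 1 - alpha])
    return b / a[0], a / a[0]

def biquad_filter(b, a, x):
    """Run one biquad section (direct form II transposed)."""
    y = np.zeros_like(x)
    z1 = z2 = 0.0
    for i, xn in enumerate(x):
        yn = b[0] * xn + z1
        z1 = b[1] * xn - a[1] * yn + z2
        z2 = b[2] * xn - a[2] * yn
        y[i] = yn
    return y

sr = 16000
centers = np.geomspace(100.0, 6000.0, 24)    # log-spaced channel centers
t = np.arange(4096) / sr
x = np.sin(2 * np.pi * 1000.0 * t)           # 1 kHz probe tone

energies = [np.sum(biquad_filter(*bandpass_biquad(fc, sr), x) ** 2)
            for fc in centers]
best = centers[int(np.argmax(energies))]     # channel that lights up
```

Feeding a pure tone through the bank, the channel whose center is nearest the tone frequency collects the most energy, which is the basic "place" analysis a cochlea-style front end provides.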
On Jan 20, 5:13 pm, Phil Hobbs
<pcdhSpamMeSensel...@electrooptical.net> wrote:
> [quoted text snipped]
>
> "Convolutive mixing"?  As in use one as a filter for the other?  They
> were summed, just as in every other audio mixer.
>
> Phil Hobbs
Ordinary summing is quite easy to separate; convolutive mixing is not. The early work on BSS was with constant-matrix mixing, the more recent work with convolutive mixing.

Let me explain. Suppose we have two speech signals s1(k) and s2(k). Define a vector S(k) = [s1(k) s2(k)]' (' is transpose), and a vector of outputs Y(k) = [y1(k) y2(k)]'. Then for ordinary mixing we measure

  Y(k) = A S(k)

where A is an unknown constant matrix. For convolutive mixing we have

  Y(k) = A(z^-1) S(k)

where A(z^-1) is a polynomial matrix of some sort, possibly of high order in z^-1 (z^-1 being the z-transform unit-delay operator). The polynomial matrix could well be non-minimum-phase, i.e. the roots of det[A(z^-1)] may lie outside the unit circle.

Hardy
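[The distinction above can be shown numerically: a constant matrix inverse undoes Y(k) = A S(k) exactly, while even the best constant matrix leaves a large residual when the mixing is convolutive. The mixing matrix and 3-tap cross-path filter below are invented for the demo.]

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
S = rng.uniform(-1.0, 1.0, (2, n))        # two independent sources

# Instantaneous mixing Y(k) = A S(k): a constant matrix inverse undoes it.
A = np.array([[1.0, 0.5], [0.3, 1.0]])
Y = A @ S
S_hat = np.linalg.inv(A) @ Y              # exact recovery

# Convolutive mixing Y(k) = A(z^-1) S(k): each cross path is a filter.
h = np.array([1.0, 0.7, 0.3])             # 3-tap cross-path impulse response
Yc = np.vstack([S[0] + np.convolve(S[1], h)[:n],
                S[1] + np.convolve(S[0], h)[:n]])

# Best CONSTANT unmixing matrix in the least-squares sense: M = S Yc' (Yc Yc')^-1.
M = S @ Yc.T @ np.linalg.inv(Yc @ Yc.T)
resid = np.linalg.norm(M @ Yc - S) / np.linalg.norm(S)
# resid stays large: the delayed taps of h need an unmixing FILTER to cancel,
# not a scalar gain -- which is why convolutive BSS is the hard problem.
```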
"Curt Welch"  wrote in message news:20120119130524.248$HD@newsreader.com...
> >"Jesse F. Hughes" <jesse@phiwumbda.org> wrote: >> Phil Hobbs <pcdhSpamMeSenseless@electrooptical.net> writes: >> >> > A friend of mine, Professor Dana Anderson of the University of >> > Colorado, Boulder, made a statistics-based digital filter that could >> > separate different kinds of music mixed together, as well as music from >> > noise. The demo was really striking--you mix together, say jazz and >> > classical music from two MP3 players, feed it through the gizmo, and >> > after (iirc) about 10 seconds of learning, classical comes out of one >> > speaker and jazz out of the other. Magic stuff--published in IEEE >> > Acoustics around 2006, I think. >> >> That sounds really impressive, if it works as well as you describe. > >Here's a great little web demo of ICA - Independent Component Analyses. It >can separate sources mixed together when recorded in different >"microphones" (I assume the demo is just a mathematical mixing and not done >by recording). > >http://research.ics.tkk.fi/ica/cocktail/cocktail_en.cgi > >This approach makes the assumption that the source signals are linearly >mixed together at different levels in each microphone recording (due to the >different distances each source is away from the microphone) but can >separate as many different sources as you have microphones. > >More info: > >http://en.wikipedia.org/wiki/Independent_component_analysis > >I would guess the telephone technology is using something similar since >they added a second microphone. > >The only statistical requirement for this to work is that the sources must >have a non-Gaussian distribution. > >BTW, this stuff is WAY past "notch filters" in complexity and power and >performance. >
This doesn't seem very impressive. Essentially one has a linear combination of the sounds

  B = Sum_k( a_k * A_k )

where A_k is the original audio and a_k is how much it contributes to a mic. Given n mics we have n such equations:

  B_i = Sum_k( a_(k,i) * A_k )

All it takes is simple linear algebra to recover the original A_k's. The a_(k,i)'s could easily be estimated, since they are in direct proportion to the mic placement. My guess why the demo sounds so good is that they use the exact coefficients used to create the mixed signals in the first place. I would bet the real-world scenario would be much worse.

--