Given the state of development in speech enhancement - here is a list of sites that I could find that have before and after samples:

http://labrosa.ee.columbia.edu/speechsep/examples.html
http://www.ee.columbia.edu/~marios/ctflp/ctflp.html
http://www.uni-oldenburg.de/medi/demo/demo_separation.html
http://www.uni-oldenburg.de/medi/demos/AnemullerKollmeier_2004/index.html
http://www.cnl.salk.edu/~tewon/Blind/blind_audio.html
http://www.uni-oldenburg.de/medi/demo/demo_asm.html
http://www.cslu.ogi.edu/nsel/demos/index.html
http://www.cslu.ogi.edu/nsel/demos/hybrid.html
http://www.dspalgorithms.com/products/nr.html
http://www.dspalgorithms.com/products/canec.html

and an enhancer for speech disabilities:

http://www.speechenhancer.com/hearforyourself.htm

I was wondering why speech enhancement products haven't appeared in low-cost consumer headphones and mobile phones (mobile phone speech output remains tinny - although that could be related to the speaker limitations and ambient sound variability).

Is it because the better algorithms are computationally very expensive (thus requiring bigger DSPs and thus greater cost)?

Is it the sort of thing that could be implemented on chips of the PIC 18F variety, that is, ultra cheap and with a low component count? How about a $10 chip with up to 512 Kbytes of built-in RAM?

Or is the complexity so far out that it is unlikely to appear in mobile phones very soon? Some of the results from the two-microphone samples above are quite impressive.

I am not an expert in DSP chips, but can understand tech/math answers.
computational load of current speech enhancement algorithms
Started by ●September 1, 2007
Reply by ●September 1, 2007
On 2007-09-01 11:34:41 -0300, "dspspeech" <dspspeech@yahoo.com> said:

> Given the state of development in speech enhancement - here is a list of
> sites that I could find that have before and after samples:
>
> http://labrosa.ee.columbia.edu/speechsep/examples.html
> http://www.ee.columbia.edu/~marios/ctflp/ctflp.html
> http://www.uni-oldenburg.de/medi/demo/demo_separation.html
> http://www.uni-oldenburg.de/medi/demos/AnemullerKollmeier_2004/index.html
> http://www.cnl.salk.edu/~tewon/Blind/blind_audio.html
> http://www.uni-oldenburg.de/medi/demo/demo_asm.html
> http://www.cslu.ogi.edu/nsel/demos/index.html
> http://www.cslu.ogi.edu/nsel/demos/hybrid.html
> http://www.dspalgorithms.com/products/nr.html
> http://www.dspalgorithms.com/products/canec.html
> and
> an enhancer for speech disabilities:
> http://www.speechenhancer.com/hearforyourself.htm
>
>
> I was wondering why speech enhancement products haven't appeared in low
> cost consumer headphones and mobile phones (mobile phone speech output
> remains tinny - although that could be related to the speaker limitations
> and ambient sound variability).
>
> Is it because the better algorithms are computationally very expensive
> (thus requiring bigger DSPs and thus greater cost) ?

My Panasonic cordless telephone offers both speech enhancement and speech slowing. It is a consumer product but it cost more than the old Princess corded phones that used to go for $8 or so.

> Is it the sort of thing that could be implemented on chips of the PIC 18F
> variety ? That is ultra cheap and low component count. How about a $10
> chip with up to 512Kbytes of built-in RAM ?
>
> Or is the complexity so far out that it is unlikely to appear in mobile
> phones very soon ? Some of the results from the two-microphone samples
> above is quite impressive.
>
>
> I am not an expert in DSP chips, but can understand tech/math answers.
Reply by ●September 1, 2007
dspspeech wrote:

> Given the state of development in speech enhancement - here is a list of
> sites that I could find that have before and after samples:
>
> http://labrosa.ee.columbia.edu/speechsep/examples.html
> http://www.ee.columbia.edu/~marios/ctflp/ctflp.html
> http://www.uni-oldenburg.de/medi/demo/demo_separation.html
> http://www.uni-oldenburg.de/medi/demos/AnemullerKollmeier_2004/index.html
> http://www.cnl.salk.edu/~tewon/Blind/blind_audio.html
> http://www.uni-oldenburg.de/medi/demo/demo_asm.html
> http://www.cslu.ogi.edu/nsel/demos/index.html
> http://www.cslu.ogi.edu/nsel/demos/hybrid.html
> http://www.dspalgorithms.com/products/nr.html
> http://www.dspalgorithms.com/products/canec.html
> and
> an enhancer for speech disabilities:
> http://www.speechenhancer.com/hearforyourself.htm
>
>
> I was wondering why speech enhancement products haven't appeared in low
> cost consumer headphones and mobile phones (mobile phone speech output
> remains tinny - although that could be related to the speaker limitations
> and ambient sound variability).
>
> Is it because the better algorithms are computationally very expensive
> (thus requiring bigger DSPs and thus greater cost) ?
>
>
> Is it the sort of thing that could be implemented on chips of the PIC 18F
> variety ? That is ultra cheap and low component count. How about a $10
> chip with up to 512Kbytes of built-in RAM ?
>
> Or is the complexity so far out that it is unlikely to appear in mobile
> phones very soon ? Some of the results from the two-microphone samples
> above is quite impressive.
>
>
> I am not an expert in DSP chips, but can understand tech/math answers.

The problem is that speech recognition works well only if you have an SNR of 20 dB or more. In any noisy environment the hit rate goes down. That's why there is a big effort on topics like acoustic beamformers to try and improve SNR.

Hardy
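To put rough numbers on the SNR point above, here is a minimal sketch of the idealised beamformer arithmetic. It assumes the channels are already time-aligned, the speech adds coherently, and the noise is uncorrelated between microphones; real rooms and real arrays will do worse. The function names and the 20 dB default target are illustrative only.

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from two average powers."""
    return 10.0 * math.log10(signal_power / noise_power)

def ideal_array_gain_db(num_mics):
    """SNR improvement from averaging num_mics time-aligned channels.

    Assumes coherent speech and noise that is uncorrelated between
    microphones, so the noise power drops by a factor of num_mics.
    """
    return 10.0 * math.log10(num_mics)

def mics_needed(current_snr_db, target_snr_db=20.0):
    """Smallest microphone count reaching target_snr_db under the ideal model."""
    deficit_db = target_snr_db - current_snr_db
    if deficit_db <= 0:
        return 1
    return math.ceil(10.0 ** (deficit_db / 10.0))
```

Under this model each doubling of the microphone count buys only about 3 dB, so closing a gap of 6 dB already takes four microphones.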
Reply by ●September 1, 2007
On Sep 1, 10:34 am, "dspspeech" <dspspe...@yahoo.com> wrote:

> Given the state of development in speech enhancement -

what do speech enhancement algorithms do? change a boring and tedious speech into an interesting and dynamic one?

r b-j
Reply by ●September 1, 2007
"robert bristow-johnson" <rbj@audioimagination.com> wrote in message news:1188691905.058469.288800@19g2000hsx.googlegroups.com...

> On Sep 1, 10:34 am, "dspspeech" <dspspe...@yahoo.com>
> wrote:
>> Given the state of development in speech enhancement -
>
> what do speech enhancement algorithms do? change a boring
> and tedious
> speech into an interesting and dynamic one?
>

They turn an inebriated Ted Kennedy into a sober Barney Frank. :-)
Reply by ●September 1, 2007
"John E. Hadstate" <jh113355@hotmail.com> writes:

> "robert bristow-johnson" <rbj@audioimagination.com> wrote in
> message
> news:1188691905.058469.288800@19g2000hsx.googlegroups.com...
>> On Sep 1, 10:34 am, "dspspeech" <dspspe...@yahoo.com>
>> wrote:
>>> Given the state of development in speech enhancement -
>>
>> what do speech enhancement algorithms do? change a boring
>> and tedious
>> speech into an interesting and dynamic one?
>>
>
> They turn an inebriated Ted Kennedy into a sober Barney
> Frank. :-)

Perhaps toe-tapping masking algorithms would be useful in the future as well...

--
%  Randy Yates                  % "With time with what you've learned,
%% Fuquay-Varina, NC            %  they'll kiss the ground you walk
%%% 919-577-9882                %  upon."
%%%% <yates@ieee.org>           % '21st Century Man', *Time*, ELO
http://home.earthlink.net/~yatescr
Reply by ●September 1, 2007
dspspeech wrote:

> Given the state of development in speech enhancement - here is a list of
> sites that I could find that have before and after samples:
>
> http://labrosa.ee.columbia.edu/speechsep/examples.html
> http://www.ee.columbia.edu/~marios/ctflp/ctflp.html
> http://www.uni-oldenburg.de/medi/demo/demo_separation.html
> http://www.uni-oldenburg.de/medi/demos/AnemullerKollmeier_2004/index.html
> http://www.cnl.salk.edu/~tewon/Blind/blind_audio.html
> http://www.uni-oldenburg.de/medi/demo/demo_asm.html
> http://www.cslu.ogi.edu/nsel/demos/index.html
> http://www.cslu.ogi.edu/nsel/demos/hybrid.html
> http://www.dspalgorithms.com/products/nr.html
> http://www.dspalgorithms.com/products/canec.html
> and
> an enhancer for speech disabilities:
> http://www.speechenhancer.com/hearforyourself.htm

Don't the marketing samples sound good? :-)

> I was wondering why speech enhancement products haven't appeared in low
> cost consumer headphones and mobile phones (mobile phone speech output
> remains tinny - although that could be related to the speaker limitations
> and ambient sound variability).

Cell phones do use some forms of enhancement. However, when you get beyond the marketing samples you find most aggressive noise reduction schemes come at the price of much poorer voice quality.

> Is it because the better algorithms are computationally very expensive
> (thus requiring bigger DSPs and thus greater cost) ?

I doubt that is a huge consideration these days, especially for high end phones.

> Is it the sort of thing that could be implemented on chips of the PIC 18F
> variety ? That is ultra cheap and low component count. How about a $10
> chip with up to 512Kbytes of built-in RAM ?

An entire cell phone costs <$10 in silicon, so a $10 chip would be pretty exotic. :-)

> Or is the complexity so far out that it is unlikely to appear in mobile
> phones very soon ? Some of the results from the two-microphone samples
> above is quite impressive.

Multiple microphones, well spaced, change the picture completely, but what consumer products could make effective use of multiple microphones?

> I am not an expert in DSP chips, but can understand tech/math answers.

Try some of the denoising demos for denoising a wave file, rather than a more complex setup. I haven't found one where the after is preferable to the before. Sure, the after has less noise, but the voice is so badly affected I'd rather live with the noise. Speech recognition systems have rather different listening criteria from me, and some of those love the results of the denoisers. :-)

Regards,
Steve
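The trade-off between noise removal and voice quality can be seen in the gain rule of power spectral subtraction, one classic family of single-channel denoisers. This is only a sketch of that general technique, not the method behind any of the demos linked above; the parameter names `oversubtraction` and `floor` and their default values are assumptions chosen for illustration.

```python
import math

def spectral_subtraction_gain(noisy_power, noise_power,
                              oversubtraction=1.0, floor=0.1):
    """Amplitude gain applied to one frequency bin of one frame.

    noisy_power is the measured power in the bin (speech plus noise) and
    noise_power is the running noise estimate for that bin. The gain is
    never allowed to drop below floor.
    """
    if noisy_power <= 0.0:
        return floor
    remaining = 1.0 - oversubtraction * noise_power / noisy_power
    return max(math.sqrt(max(remaining, 0.0)), floor)
```

For a bin where speech and noise have equal power (`noisy_power=2.0`, `noise_power=1.0`) the default setting scales the bin by about 0.71. Raising `oversubtraction` to 2.0 drives the same bin down to the floor, so the speech energy in that bin is removed along with the noise.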
Reply by ●September 1, 2007
Thanks for the replies.

Regarding the dramatic "the state of speech enhancement", I guess I should have called it a survey of existing techniques/cost.

The problem I have is that I have a preliminary algorithm in development which could be used for that purpose. I was trying to see what the universe of applications would be (as motivation) and where computational complexity stands in this field, so that I have some understanding of just how complex an algorithm can be and still be usable in cheap consumer hardware.

From my survey (list provided in the original post), the multi-microphone ones perform significantly better. My effort is for monophonic at the moment (although I see the multi-microphone approach will probably win out for cell phones etc. for its considerable advantages).

In looking at competitor algorithms, I was trying to see what type of DSP chips would currently be used for some of the better speech enhancements, thus giving some sense of the cost ballpark. Is it above $5, say (requiring more than the simple PIC chips and that type of stuff), or do they currently require significantly more hardware, and is that why they are currently not that commonly implemented (in mobile phones, for instance)?
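One way to get a first feel for the cost ballpark is a back-of-envelope operation count for a generic frame-based, FFT-domain single-channel enhancer. The sketch below is an estimate under stated assumptions, not a measurement of any product: it uses the textbook radix-2 figure of (n/2)*log2(n) butterflies at roughly 10 real operations each, and `ops_per_bin` is a hypothetical allowance for noise tracking and gain computation.

```python
import math

def fft_real_ops(n):
    """Rough real-operation count for one radix-2 complex FFT of size n.

    n must be a power of two. Counts (n/2)*log2(n) butterflies at about
    10 real operations each (4 multiplies, 6 adds).
    """
    return 5 * n * int(math.log2(n))

def enhancer_ops_per_second(sample_rate=8000, frame_len=256, hop=128,
                            ops_per_bin=20):
    """Estimated real operations per second for the whole enhancer.

    Each frame costs one forward FFT, one inverse FFT, and ops_per_bin
    operations in each of the frame_len/2 + 1 non-redundant bins.
    Windowing and overlap-add are not counted.
    """
    frames_per_second = sample_rate / hop
    bins = frame_len // 2 + 1
    per_frame = 2 * fft_real_ops(frame_len) + ops_per_bin * bins
    return frames_per_second * per_frame
```

With the defaults (8 kHz telephone-band audio, 256-point frames, 50% overlap) this comes to about 1.4 million real operations per second. Whether a given chip can sustain that depends heavily on word length and multiplier support, since a processor without a wide hardware multiply-accumulate spends several instructions on each of those operations.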
Reply by ●September 1, 2007
>Don't the marketing samples sound good? :-)

>An entire cell phone costs <$10 in silicon, so a $10 chip would be
>pretty exotic. :-)

I am assuming that on a cell phone there is already dedicated hardware (some standard and some specific to, say, Nokia for some aspect of speech cleaning). So maybe the question should be "that would be implementable by Nokia in hardware" without denting their hardware real estate too much. But my question on the use of PIC chips is appropriate in that it gives an idea of the complexity in silicon required etc.

>Multiple microphones, well spaced, change the picture completely, but
>what consumer products could make effective use of multiple microphones?

I am not asking specifically about multiple microphones (because I am thinking about monophonic right now); however, cell phones would be a perfect match.

>more complex setup. I haven't found one where the after is preferable to
>the before. Sure, the after has less noise, but the voice is so badly
>affected I'd rather live with the noise. Speech recognition systems have
>rather different listening criteria from me, and some of those love the
>results of the denoisers. :-)

Yes, this is my experience too (at least with the list in the first post). Most denoisers seem to muffle the speech as well, when the original is better filtered by the human ear. The multi-microphone ones, however, were much better.

On a multi-microphone, two-person speaking setup (a few of the links in the original post), while the results were good, I wondered if they were actually doing pitch or vocal tract personality (i.e. speaker recognition) tracking, or whether it was purely (and simply) volume based - i.e. the dominant speaker is separated (or vice versa, the weaker speaker is separated).

The dominant-speaker approach, while simple, would still work for mobile phone applications (since the main speaker is usually the closest), and even for speakerphone use by many people talking at the same time, "filtering by closest/loudest speaker" IS a canonical approach from a user interface/human expectation standpoint (it fits nicely with how humans structure their communal speech - loudest is usually considered most urgent and given most attention).
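The purely volume-based rule described above can be sketched in a few lines: for each frame, compare the energy seen at every microphone and keep the loudest. This only illustrates the "filter by closest/loudest" idea from the post; it is an assumption, not a description of what the linked separation demos actually do, and the function names are hypothetical.

```python
def frame_energy(samples):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in samples)

def pick_dominant(frames_by_mic):
    """Index of the microphone with the highest energy in this frame.

    frames_by_mic holds one equal-length list of samples per microphone.
    On a tie the lowest index wins.
    """
    energies = [frame_energy(frame) for frame in frames_by_mic]
    return max(range(len(energies)), key=energies.__getitem__)
```

The cost is one multiply-accumulate per sample per microphone, which is negligible next to any transform-based enhancer. A practical version would need smoothing over time so the selection does not flip between microphones from frame to frame.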
Reply by ●September 2, 2007
Hi,

a side note: one could put a noise reduction algorithm into an existing integrated circuit, for example together with the voice CODEC. But forget about any hardware solution that requires an additional IC, even if it's a 1 cent component.

Have a look inside the iPhone (last picture is most interesting):
http://www.macnn.com/articles/07/06/29/first.iphone.disassembly/

or the N95, which supports more radio bands and is more complex in general:
http://www.flickr.com/photos/devilsrejection/sets/72157594527138974/

The space inside those boxes is VERY expensive for the manufacturers.

-mn






