Reply by Jon Harris April 12, 2006
"Richard Owlett" <rowlett@atlascomm.net> wrote in message 
news:123dj14mgtn4d44@corp.supernews.com...
> Jerry Avins wrote: >> rickman wrote: >> >>> BTW, I know you want to do it "your way", but sampling at 44 kHz will >>> not give you any extra useful information that you won't get at 8 kHz. >>> The phone company is not dumb and they realized a long time ago that >>> the range of frequencies required to carry intelligible voice is less >>> than 4 kHz. The higher frequencies do not add much to the picture, but >>> will require a lot more power to analyze. Have you found any content >>> in the higher frequencies that others have not? >> >> >> I find that intelligibility of speech -- even screechy soprano speech -- >> is hardly impaired by my hearing loss. I can't hear the top two notes on a >> piano. Now that's a brick-wall filter! > > I look for wider than telco land line bandwidth for 2 sets of reasons: > > A1. I've read articles saying telco speech is "intelligible" but it is > easier if higher frequencies available. The distinguishing features > of consonants are in the higher frequencies. *HUMANS* automatically > resolve ambiguities based on context (several layers of?).
This is easy to verify by having someone say a long set of random letters over the phone while you try to write them down. You might be surprised how poor the accuracy is, especially in discriminating pairs such as s/f, v/z, p/t, etc. This problem is probably also part of the reason the military uses the "alpha, bravo, charlie..." alphabet for spelling things out. So while the human brain does a very good job of understanding normal speech from context clues over a limited-frequency channel, additional high frequencies make it much easier on the brain (and hence less tiring for long conversations). There is certainly a matter of diminishing returns, but in my experience, even extending from the normal phone high frequency response to 4-5 kHz makes a substantial difference.
Reply by Richard Owlett April 7, 2006
Jerry Avins wrote:
> rickman wrote: > > ... > >> BTW, I know you want to do it "your way", but sampling at 44 kHz will >> not give you any extra useful information that you won't get at 8 kHz. >> The phone company is not dumb and they realized a long time ago that >> the range of frequencies required to carry intelligible voice is less >> than 4 kHz. The higher frequencies do not add much to the picture, but >> will require a lot more power to analyze. Have you found any content >> in the higher frequencies that others have not? > > > I find that intelligibility of speech -- even screechy soprano speech -- > is hardly impaired by my hearing loss. I can't hear the top two notes on > a piano. Now that's a brick-wall filter!
/ begin chuckle
I'll see your "normal" hearing loss and raise you my *abnormal* loss
/ end chuckle

I look for wider than telco land line bandwidth for 2 sets of reasons:

A1. I've read articles saying telco speech is "intelligible" but it is easier if higher frequencies are available. The distinguishing features of consonants are in the higher frequencies. *HUMANS* automatically resolve ambiguities based on context (several layers of?).

A2. If I read correctly, speech recognition tends to use _AT MOST_ the first three formants (~150 -> ~1300 Hz) for vowels and presumably some similar range for consonants.

B. I have a "good" ear and a *BAD* ear ;) "Bad ear" degraded due to scars on ear drum and nerve damage related to childhood ear infections.

B1. If enough points are recorded, the response of my "bad ear" resembles a "picket fence" [Guess what a lifer sergeant said when I took an *ENLISTMENT* physical with a bunch of _DRAFTEES_ during Viet Nam ;]

B2.
> > Sampling at 44.1 will make recursive filters easier and transversal > filters harder. Given Richard's view that he needs linear phase, 44.1 is > asking for trouble. In his place, though I would plan to sample at > around 12 KHz or, if it's enough simpler, 22.05. You can't tell that the > high end isn't useful unless you have it. > > Jerry
Reply by Jerry Avins March 28, 2006
rickman wrote:

   ...

> BTW, I know you want to do it "your way", but sampling at 44 kHz will > not give you any extra useful information that you won't get at 8 kHz. > The phone company is not dumb and they realized a long time ago that > the range of frequencies required to carry intelligible voice is less > than 4 kHz. The higher frequencies do not add much to the picture, but > will require a lot more power to analyze. Have you found any content > in the higher frequencies that others have not?
I find that intelligibility of speech -- even screechy soprano speech -- is hardly impaired by my hearing loss. I can't hear the top two notes on a piano. Now that's a brick-wall filter!

Sampling at 44.1 kHz will make recursive filters easier and transversal filters harder. Given Richard's view that he needs linear phase, 44.1 kHz is asking for trouble. In his place, though, I would plan to sample at around 12 kHz or, if it's enough simpler, 22.05 kHz. You can't tell that the high end isn't useful unless you have it.

Jerry
--
Engineering is the art of making what you want from things you can get.
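The remark that a higher sampling rate makes transversal (FIR) filters harder can be illustrated with a rough length estimate. The sketch below assumes a commonly quoted rule of thumb, usually attributed to fred harris, that the tap count is roughly (sample rate / transition width) x (stopband attenuation in dB / 22). The 100 Hz transition width and 60 dB attenuation are illustrative values, not figures from this thread, and `estimate_fir_taps` is a hypothetical helper name.

```python
import math


def estimate_fir_taps(fs_hz, transition_hz, atten_db):
    """Rule-of-thumb FIR length: N ~ (fs / transition width) * (atten_dB / 22).

    This is only an order-of-magnitude estimate, not a design procedure.
    """
    return math.ceil(fs_hz / transition_hz * atten_db / 22.0)


# Assumed, illustrative filter spec: 100 Hz transition band, 60 dB stopband.
for fs in (8000.0, 12000.0, 22050.0, 44100.0):
    taps = estimate_fir_taps(fs, transition_hz=100.0, atten_db=60.0)
    print(f"fs = {fs:7.0f} Hz -> roughly {taps} taps")
```

For a fixed filter shape in Hz, the estimated length grows in proportion to the sampling rate, which is one way to read "44.1 is asking for trouble" when linear phase is required.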
Reply by rickman March 28, 2006
Richard Owlett wrote:
> I often drive experts nuts. > I ask a question they -- *THROUGH EXPERIENCE* -- 'recognize' as ill > defined. They try to *FORCE* me into asking what they consider a "well > defined" question. > > _BUT_ I *did not* want to ask their "well defined" question. > > I WANTED TO ASK *my* QUESTION ;) > In this case, due to possibility of parallelisms, FPGA's came to mind. > So my question is much more about > "How much of this would fit on what size FPGA?" > rather than > "Would a sane person use a DSP or FPGA for this application?"
Ok, I get it, you are not trying to actually solve a problem in the best manner given the available tools. You want to explore the domain of tools available for solving the problem. Then knock yourself out!

It will take considerable resources to do this sort of algorithm on an FPGA if you are going to do it all in parallel. With speeds of hundreds of MHz, both DSPs and FPGAs are capable of processing audio signals by multiplexing a single processing element. But if you want to build lots of little processing units and clock them all slowly, that will work as well.

It is hard to give you reasonable answers to your questions until you frame them in a way that can be answered. Regardless of whether I think you are using the best string for the job or not, I can't answer the question, "how long is a piece of string?"... or in other words, "how large an FPGA do I need?" You need to define the processing first.

BTW, I know you want to do it "your way", but sampling at 44 kHz will not give you any extra useful information that you won't get at 8 kHz. The phone company is not dumb and they realized a long time ago that the range of frequencies required to carry intelligible voice is less than 4 kHz. The higher frequencies do not add much to the picture, but will require a lot more power to analyze. Have you found any content in the higher frequencies that others have not?
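The trade-off between one fast multiplexed processing element and many slow parallel ones can be put into rough numbers. The sketch below assumes each unit performs one multiply-accumulate (MAC) per clock cycle. The workload of 18 filters (the upper end of the 5-8 plus 2-10 filters proposed earlier in the thread) at 500 taps each is a hypothetical figure chosen only for illustration, and `required_clock_hz` is a made-up helper name.

```python
def required_clock_hz(macs_per_sample, fs_hz, parallel_units=1):
    """Clock rate each unit needs, assuming one MAC per unit per cycle."""
    return macs_per_sample * fs_hz / parallel_units


# Hypothetical workload: 18 filters x 500 taps = 9000 MACs per input sample.
macs = 18 * 500
for fs in (8000, 44100):
    single = required_clock_hz(macs, fs)
    spread = required_clock_hz(macs, fs, parallel_units=18)
    print(f"fs = {fs} Hz: one shared unit needs {single / 1e6:.1f} MHz, "
          f"18 parallel units need {spread / 1e6:.2f} MHz each")
```

Under these assumptions both arrangements are within reach of "hundreds of MHz" hardware at 8 kHz, while the single shared unit gets noticeably tighter at 44.1 kHz.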
Reply by Ron N. March 27, 2006
Richard Owlett wrote:
> While looking at "the problem", I started wondering if a massively > parallel solution would be worth exploring.
Please note that parallelism and FPGA implementation are two orthogonal concepts. Current large FPGA's do allow one to implement highly parallel algorithms on certain classes of problems. But there are products other than those marketed as FPGA's which also offer various forms of parallelism (forms of VLIW, SIMD, multi-core, multi-thread, and configurable arithmetic, to name just a few).

IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M
Reply by Richard Owlett March 27, 2006
rickman wrote:
> Richard Owlett wrote: > >>I have a interest in speech recognition. I haven't purchased anything >>because users *and* VARs have told be I probably wouldn't be satisfied. >>[My desires would be better met by a 'discrete speech' rather than the >>current orientation to 'continuous speech -- but that discussion goes >>far OT] >> >>That said, I read in comp.speech.users and comp.speech.research of what >>my gut says are excessively tight constraints on the acoustic >>environment [especially normal office noise and careful mike placement]. >> >>I envision an external signal conditioning module containing >>5-8 band pass linear phase with 5 < Q < 20 >>2-10 band pass linear phase with 20 < Q < 200 >> [all above outputs in time sync with a latency of < .1 second] >>{ How I combine this to get off chip is up in the air.] >> >>The above is based on some almost *TOTALLY UNTESTED* ideas of what I >>really want to accomplish. I was thinking the parallel nature of what I >>envisioned lent it to a FPGA approach. > > > Given that what you really want, but did not state, is an evrironment > for exploring your ideas, I would recommend that you focus on your > algorithms by writing software on a PC.
I'm already doing almost that. I'm using Scilab to explore various pieces of "the problem" [N.B. not "the solution" ;]

While looking at "the problem", I started wondering if a massively parallel solution would be worth exploring.

/start side bar
I think in different paths than most modern engineers [won't say what family thinks ;]
Jerry's old enough to pick up on inferences of following
1. My father
   a. helped build and *LEGALLY* operate a spark gap transmitter
   b. pursued a BS(ME) as it had more EE than a BS(EE) at that time
   c. published article(s?) on "magnetic amplifiers"
2. I
   a. remember chopper stabilized op amps
   b. have used op amps which required >= 1 inch of 19" rack [ how many youngsters even know what a 19" rack is ]
   c. read of/on/about *ANALOG COMPUTERS* while still used ;)
/temporary end side bar :)

I suspect that the key to my thought pattern is 2.c. YEPP want 'digital' implementation of *ANALOG COMPUTER*
How many analog simulation blocks can I stuff in a FPGA?

Jerry, you program/use FORTH, do you get an idea of where I wish to attempt and possibly to go?
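As a rough idea of what one "analog simulation block" looks like in digital form, the sketch below wires two discrete-time integrators into a feedback loop, the way an analog computer would be patched to solve x'' = -w^2 x. It is only a software illustration under stated assumptions: the 1 kHz frequency and 44.1 kHz step rate are arbitrary, `simulate_resonator` is a hypothetical name, and the update order (velocity first, then position using the new velocity) is a deliberate choice that keeps the loop from blowing up the way a plain forward-Euler version would.

```python
import math


def simulate_resonator(freq_hz, fs_hz, n_steps):
    """Two integrators in a loop solving x'' = -w^2 * x.

    Returns the list of x samples. Uses a semi-implicit Euler update,
    which stays bounded as long as w * dt is well below 2.
    """
    w = 2.0 * math.pi * freq_hz
    dt = 1.0 / fs_hz
    x, v = 1.0, 0.0  # initial conditions: unit displacement, zero velocity
    out = []
    for _ in range(n_steps):
        v += -w * w * x * dt  # first integrator: acceleration -> velocity
        x += v * dt           # second integrator: velocity -> position
        out.append(x)
    return out


# One second of a 1 kHz oscillation stepped at 44.1 kHz (illustrative values).
samples = simulate_resonator(freq_hz=1000.0, fs_hz=44100.0, n_steps=44100)
print(f"peak amplitude after 1 s: {max(abs(s) for s in samples):.4f}")
```

Each integrator here is one multiply and one add per step, so the resource question for an FPGA largely becomes how many multiplier-adder pairs are available and how fast they can be clocked.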
> The decision of whether to use > an FPGA or a DSP is an implementation decision. Given that your > signals are audio frequency, even lower that the full range of human > hearing most likely, I would say you can do anything you can think of > on a DSP. There are low power, fixed point devices that can process > 100's of millions instructions per second. There are larger, more > power hungry units that can process up to a billion or more useful > instructions per second. So it is unlikely that you can create an > algorithm that requires more processing than this from an 8 kHz sampled > audio stream.
Initial sampling will be at 44 kHz or greater. My "gut" says that this is part of problem with modern speech recognition systems -- they "ignore"/"throw away" *too* much data. The output of my 'black box' would be "sampled" as speech recognizer _PRESUMES_.
> > That said, there is also the development effort to consider. Typically > before you even begin working with DSPs the algorithm is fully explored > and tested on a PC in C or other environment of your choice such as > Matlab. Once you have a fully functional model, then you can be > concerned with the implementation. This goes double for FPGAs because > they are not nearly as flexible to significant archetecture changes. > Sure, you can change your code (VHDL or verilog), but typically the > code is woven tighter and changes can impact the code a lot more.
Now you come up against another aberrant aspect of how I think. I often drive experts nuts. I ask a question they -- *THROUGH EXPERIENCE* -- 'recognize' as ill defined. They try to *FORCE* me into asking what they consider a "well defined" question. _BUT_ I *did not* want to ask their "well defined" question. I WANTED TO ASK *my* QUESTION ;) In this case, due to possibility of parallelisms, FPGA's came to mind. So my question is much more about "How much of this would fit on what size FPGA?" rather than "Would a sane person use a DSP or FPGA for this application?"
> > The main difference between the two is that the DSP has just one or two > ALUs to do the MAC operation (which is the most often used function in > DSP). It runs at a high speed and, just like in a PC, it switches > between functions to do things serially, but appear to be in parallel. > The FPGA is capable of actually doing things in parallel. With very > high speed sample rates this can be important since there is not time > to switch a DSP between different tasks. But at audio rates there is > normally tons of time for the DSP to do many different processing on > the data before the next sample or batch of samples come in. > > So in summary, it would be prudent to simulate your algorithm on a PC > to get the idea fleshed out and working. Then you can decide if you > want to implement on a floating point DSP, a fixed point DSP or an > FPGA. Then you get all the fun of actually doing the real work. >
Reply by Jerry Avins March 25, 2006
rickman wrote:
> Richard Owlett wrote: > >>I have a interest in speech recognition. I haven't purchased anything >>because users *and* VARs have told be I probably wouldn't be satisfied.
...
> Given that what you really want, but did not state, is an evrironment > for exploring your ideas, I would recommend that you focus on your > algorithms by writing software on a PC.
...

That and the rest of what you wrote makes good sense. (I suppose that's why most of us work that way.) I'm glad you wrote it. I was only beginning to formulate it, and you said it better than I would have.

Jerry
--
Engineering is the art of making what you want from things you can get.
Reply by rickman March 25, 2006
Richard Owlett wrote:
> I have a interest in speech recognition. I haven't purchased anything > because users *and* VARs have told be I probably wouldn't be satisfied. > [My desires would be better met by a 'discrete speech' rather than the > current orientation to 'continuous speech -- but that discussion goes > far OT] > > That said, I read in comp.speech.users and comp.speech.research of what > my gut says are excessively tight constraints on the acoustic > environment [especially normal office noise and careful mike placement]. > > I envision an external signal conditioning module containing > 5-8 band pass linear phase with 5 < Q < 20 > 2-10 band pass linear phase with 20 < Q < 200 > [all above outputs in time sync with a latency of < .1 second] > { How I combine this to get off chip is up in the air.] > > The above is based on some almost *TOTALLY UNTESTED* ideas of what I > really want to accomplish. I was thinking the parallel nature of what I > envisioned lent it to a FPGA approach.
Given that what you really want, but did not state, is an environment for exploring your ideas, I would recommend that you focus on your algorithms by writing software on a PC. The decision of whether to use an FPGA or a DSP is an implementation decision. Given that your signals are audio frequency, most likely even lower than the full range of human hearing, I would say you can do anything you can think of on a DSP. There are low-power, fixed-point devices that can process hundreds of millions of instructions per second. There are larger, more power-hungry units that can process up to a billion or more useful instructions per second. So it is unlikely that you can create an algorithm that requires more processing than this from an 8 kHz sampled audio stream.

That said, there is also the development effort to consider. Typically, before you even begin working with DSPs, the algorithm is fully explored and tested on a PC in C or another environment of your choice, such as Matlab. Once you have a fully functional model, then you can be concerned with the implementation. This goes double for FPGAs because they are not nearly as flexible when it comes to significant architecture changes. Sure, you can change your code (VHDL or Verilog), but typically the code is woven tighter and changes can impact the code a lot more.

The main difference between the two is that the DSP has just one or two ALUs to do the MAC operation (which is the most often used function in DSP). It runs at a high speed and, just like a PC, it switches between functions to do things serially while appearing to do them in parallel. The FPGA is capable of actually doing things in parallel. With very high sample rates this can be important, since there is no time to switch a DSP between different tasks. But at audio rates there is normally plenty of time for the DSP to do many different kinds of processing on the data before the next sample or batch of samples comes in.

So in summary, it would be prudent to simulate your algorithm on a PC to get the idea fleshed out and working. Then you can decide if you want to implement on a floating point DSP, a fixed point DSP or an FPGA. Then you get all the fun of actually doing the real work.
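The claim that a DSP has plenty of time per sample at audio rates is simple arithmetic: instructions available per sample equal the instruction rate divided by the sample rate. The sketch below uses 200 million and 1 billion instructions per second as assumed stand-ins for the "hundreds of millions" and "a billion or more" mentioned above; they are not figures for any particular device, and `instructions_per_sample` is a hypothetical helper name.

```python
def instructions_per_sample(instr_per_second, fs_hz):
    """Instruction budget available between consecutive input samples."""
    return instr_per_second / fs_hz


# Assumed, illustrative processor speeds (not specific parts).
processors = (("fixed-point DSP, 200 MIPS", 200e6),
              ("high-end DSP, 1000 MIPS", 1e9))

for label, ips in processors:
    for fs in (8000.0, 44100.0):
        budget = instructions_per_sample(ips, fs)
        print(f"{label} at fs = {fs:.0f} Hz: about {budget:,.0f} instructions per sample")
```

Even at 44.1 kHz the budget under these assumptions is in the thousands of instructions per sample, which is the sense in which the choice of DSP versus FPGA can wait until the algorithm itself is pinned down.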
Reply by Jerry Avins March 24, 2006
Richard Owlett wrote:
> Jerry Avins wrote: > >> Richard Owlett wrote: >> >>> Andor wrote: >>> >>>> Jerry Avins wrote: >>>> >>>>> Richard Owlett wrote: >>>>> >>>>> ... >>>>> >>>>> >>>>>> I envision an external signal conditioning module containing >>>>>> 5-8 band pass linear phase with 5 < Q < 20 >>>>>> 2-10 band pass linear phase with 20 < Q < 200 >>>>> >>>>> >>>>> >>>>> >>>>> I'm left wondering how "Q" and "linear phase" are implemented >>>>> together. >>>> >>>> >>>> >>>> >>>> >>>> For FIR (and some IIR) systems, magnitude and phase response can be >>>> specified independently. I don't see any problems in implementing those >>>> specs. >>>> >>> >>> My recollection of a definition of Q was simply ratio of (center >>> frequency) to (bandwidth). I'm implicitly leaving flatness in the >>> passband loosely (if at all) constrained. >>> >>> Also I thought that FIR filters had intrinsically constant group >>> delay -- which is what's needed. Should I've not said "linear phase", >>> thought they were implicitly equivalent? >> >> >> >> I suspected that you meant bandwidth. Q has other implications that I >> imagine you don't mean. >> >> Jerry > > > What other implications does Q have? > Q was a natural fit as, at least for the Q < 20 an series R-L-C , would > have a magnitude response appropriate to my ideas. [Would it have > constant group delay though?]
Your RLC with a Q of 20 won't have a flat top. A good IF passband may have the same 3-dB points as a tank with a Q of 20, but it will be a lot flatter in the passband and have much better skirt selectivity. The tank will have leading phase above midband and lagging below. The IF filter will better approximate a constant group delay.

Jerry
--
Engineering is the art of making what you want from things you can get.
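The question of whether a simple resonant circuit has constant group delay can be checked numerically. The sketch below assumes the textbook second-order band-pass response of a series R-L-C with the output taken across R, H(jw) = 1 / (1 + jQ(w/w0 - w0/w)); that circuit arrangement is an assumption for illustration, as are the 1 kHz center frequency and the helper name `rlc_group_delay`.

```python
import math


def rlc_group_delay(f_hz, f0_hz, q):
    """Group delay in seconds of H(jw) = 1 / (1 + jQ(w/w0 - w0/w)).

    The phase is phi(w) = -atan(Q * (w/w0 - w0/w)); group delay is -d(phi)/dw.
    """
    w = 2.0 * math.pi * f_hz
    w0 = 2.0 * math.pi * f0_hz
    detune = q * (w / w0 - w0 / w)
    return q * (1.0 / w0 + w0 / (w * w)) / (1.0 + detune * detune)


f0, q = 1000.0, 20.0          # assumed center frequency; Q from the discussion
half_bw = f0 / (2.0 * q)      # approximate offset of each -3 dB point from f0
for f in (f0 - half_bw, f0, f0 + half_bw):
    print(f"{f:7.1f} Hz: group delay {rlc_group_delay(f, f0, q) * 1e3:.3f} ms")
```

Under these assumptions the delay at the band edges comes out at roughly half the delay at the center, so the single resonator is far from constant group delay across its own passband.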
Reply by Richard Owlett March 24, 2006
Jerry Avins wrote:

> Richard Owlett wrote: > >> Andor wrote: >> >>> Jerry Avins wrote: >>> >>>> Richard Owlett wrote: >>>> >>>> ... >>>> >>>> >>>>> I envision an external signal conditioning module containing >>>>> 5-8 band pass linear phase with 5 < Q < 20 >>>>> 2-10 band pass linear phase with 20 < Q < 200 >>>> >>>> >>>> >>>> I'm left wondering how "Q" and "linear phase" are implemented together. >>> >>> >>> >>> >>> For FIR (and some IIR) systems, magnitude and phase response can be >>> specified independently. I don't see any problems in implementing those >>> specs. >>> >> >> My recollection of a definition of Q was simply ratio of (center >> frequency) to (bandwidth). I'm implicitly leaving flatness in the >> passband loosely (if at all) constrained. >> >> Also I thought that FIR filters had intrinsically constant group delay >> -- which is what's needed. Should I've not said "linear phase", >> thought they were implicitly equivalent? > > > I suspected that you meant bandwidth. Q has other implications that I > imagine you don't mean. > > Jerry
What other implications does Q have?
Q was a natural fit since, at least for Q < 20, a series R-L-C would have a magnitude response appropriate to my ideas. [Would it have constant group delay though?]
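Taking Q in the sense used above, center frequency divided by bandwidth, the proposed Q ranges translate directly into bandwidths, and the bandwidth in turn hints at the delay of a linear-phase FIR filter. The sketch below assumes a crude time-bandwidth estimate: the impulse response lasts about 1/bandwidth seconds and a symmetric FIR delays the signal by half its length. The center frequencies are hypothetical, since none were given in the thread, and both function names are made up for illustration.

```python
def bandwidth_hz(center_hz, q):
    """Bandwidth implied by Q = center frequency / bandwidth."""
    return center_hz / q


def rough_fir_delay_s(bw_hz):
    """Order-of-magnitude delay of a linear-phase FIR band-pass.

    Assumes the impulse response lasts about 1/bandwidth seconds and that
    the delay is half of that. This is an estimate, not a design result.
    """
    return 0.5 / bw_hz


# Hypothetical (center frequency, Q) pairs spanning the proposed Q ranges.
for center, q in ((300.0, 5.0), (1000.0, 20.0), (500.0, 200.0), (3000.0, 200.0)):
    bw = bandwidth_hz(center, q)
    delay = rough_fir_delay_s(bw)
    verdict = "within" if delay < 0.1 else "exceeds"
    print(f"f0 = {center:6.0f} Hz, Q = {q:5.0f}: bandwidth {bw:6.1f} Hz, "
          f"delay ~{delay * 1e3:6.1f} ms ({verdict} the 0.1 s target)")
```

If this estimate is anywhere near right, the high-Q filters at low center frequencies may be hard to reconcile with a latency under 0.1 second, whatever hardware is used.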