"Richard Owlett" <rowlett@atlascomm.net> wrote in message
news:123dj14mgtn4d44@corp.supernews.com...
> Jerry Avins wrote:
>> rickman wrote:
>>
>>> BTW, I know you want to do it "your way", but sampling at 44 kHz will
>>> not give you any extra useful information that you won't get at 8 kHz.
>>> The phone company is not dumb and they realized a long time ago that
>>> the range of frequencies required to carry intelligible voice is less
>>> than 4 kHz. The higher frequencies do not add much to the picture, but
>>> will require a lot more power to analyze. Have you found any content
>>> in the higher frequencies that others have not?
>>
>>
>> I find that intelligibility of speech -- even screechy soprano speech --
>> is hardly impaired by my hearing loss. I can't hear the top two notes on a
>> piano. Now that's a brick-wall filter!
>
> I look for wider than telco land line bandwidth for 2 sets of reasons:
>
> A1. I've read articles saying telco speech is "intelligible" but it is
> easier if higher frequencies available. The distinguishing features
> of consonants are in the higher frequencies. *HUMANS* automatically
> resolve ambiguities based on context (several layers of?).
This is easy to verify by having someone say a long set of random letters over
the phone while you try to write them down. You might be surprised how poor
the accuracy is, especially in discriminating pairs such as s/f, v/z, p/t, etc.
This problem is probably also part of the reason the military uses the "alpha,
bravo, charlie..." alphabet for spelling out things. So while the human brain
does a very good job at understanding normal speech from context clues over a
limited frequency channel, additional high frequencies make it much easier on
the brain (and hence less tiring for long conversations). There is certainly a
matter of diminishing returns, but in my experience, even extending from the
normal phone high frequency response to 4-5 kHz makes a substantial difference.
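The point about consonant energy can be checked numerically. Below is a rough NumPy sketch: synthetic band-limited noise stands in for a real /s/ recording (an assumption, not measured speech data), and 3.4 kHz is used as the classic landline passband edge.

```python
import numpy as np

fs = 44100                       # CD-style capture rate
n = 1 << 14
rng = np.random.default_rng(0)
noise = rng.standard_normal(n)

# Crude synthetic /s/: keep only the 4-8 kHz band via an FFT mask.
f = np.fft.rfftfreq(n, 1 / fs)
S = np.fft.rfft(noise)
s_burst = np.fft.irfft(S * ((f >= 4000) & (f <= 8000)))

def band_energy(x, cutoff):
    # Spectral energy at or below `cutoff` Hz.
    X = np.fft.rfft(x)
    return float(np.sum(np.abs(X[f <= cutoff]) ** 2))

total = band_energy(s_burst, fs / 2)
telco = band_energy(s_burst, 3400)   # landline passband edge
print(f"fraction surviving the phone channel: {telco / total:.3f}")
```

With the synthetic burst sitting entirely above 4 kHz, essentially none of it survives the 3.4 kHz channel. A real fricative keeps some low-frequency energy, but the bulk lies where the telco band ends, which is exactly why s/f and v/z collapse on the phone.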
Reply by Richard Owlett ● April 7, 2006
Jerry Avins wrote:
> rickman wrote:
>
> ...
>
>> BTW, I know you want to do it "your way", but sampling at 44 kHz will
>> not give you any extra useful information that you won't get at 8 kHz.
>> The phone company is not dumb and they realized a long time ago that
>> the range of frequencies required to carry intelligible voice is less
>> than 4 kHz. The higher frequencies do not add much to the picture, but
>> will require a lot more power to analyze. Have you found any content
>> in the higher frequencies that others have not?
>
>
> I find that intelligibility of speech -- even screechy soprano speech --
> is hardly impaired by my hearing loss. I can't hear the top two notes on
> a piano. Now that's a brick-wall filter!
/ begin chuckle
I'll see your "normal" hearing loss and raise you my *abnormal* loss
/ end chuckle
I look for wider than telco land line bandwidth for 2 sets of reasons:
A1. I've read articles saying telco speech is "intelligible" but it is
easier if higher frequencies available. The distinguishing features
of consonants are in the higher frequencies. *HUMANS* automatically
resolve ambiguities based on context (several layers of?).
A2. If I read correctly, speech recognition tends to use _AT MOST_ the
first three formants (~150 -> ~1300 Hz) for vowels and presumably
some similar range for consonants.
B. I have a "good" ear and a *BAD* ear ;)
"Bad ear" degraded due to scars on ear drum and nerve damage related
to childhood ear infections.
B1. If enough points are recorded, the response of my "bad ear"
resembles a "picket fence" [Guess what a lifer sergeant said when I
took an *ENLISTMENT* physical with a bunch of _DRAFTEES_ during
Viet Nam ;]
B2.
>
> Sampling at 44.1 will make recursive filters easier and transversal
> filters harder. Given Richard's view that he needs linear phase, 44.1 is
> asking for trouble. In his place, though, I would plan to sample at
> around 12 kHz or, if it's enough simpler, 22.05. You can't tell that the
> high end isn't useful unless you have it.
>
> Jerry
Reply by Jerry Avins ● March 28, 2006
rickman wrote:
...
> BTW, I know you want to do it "your way", but sampling at 44 kHz will
> not give you any extra useful information that you won't get at 8 kHz.
> The phone company is not dumb and they realized a long time ago that
> the range of frequencies required to carry intelligible voice is less
> than 4 kHz. The higher frequencies do not add much to the picture, but
> will require a lot more power to analyze. Have you found any content
> in the higher frequencies that others have not?
I find that intelligibility of speech -- even screechy soprano speech --
is hardly impaired by my hearing loss. I can't hear the top two notes on
a piano. Now that's a brick-wall filter!
Sampling at 44.1 will make recursive filters easier and transversal
filters harder. Given Richard's view that he needs linear phase, 44.1 is
asking for trouble. In his place, though, I would plan to sample at
around 12 kHz or, if it's enough simpler, 22.05. You can't tell that the
high end isn't useful unless you have it.
Jerry
--
Engineering is the art of making what you want from things you can get.
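Jerry's 22.05 option is just a factor-of-two decimation of a 44.1 kHz capture, which keeps the arithmetic simple. A minimal NumPy sketch follows; the tap count and the 0.45-of-Nyquist cutoff are illustrative choices, not a recommendation.

```python
import numpy as np

def lowpass_fir(num_taps, cutoff, fs):
    # Linear-phase windowed-sinc low-pass (Hamming window).
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = (2 * cutoff / fs) * np.sinc(2 * cutoff / fs * n)
    return h * np.hamming(num_taps)

def decimate_by_2(x, fs):
    # Anti-alias a bit below the new Nyquist, then keep every 2nd sample.
    h = lowpass_fir(101, 0.45 * (fs / 2), fs)
    y = np.convolve(x, h, mode="same")
    return y[::2]

fs = 44100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)   # a 1 kHz tone, well inside the new band
x2 = decimate_by_2(x, fs)
print(len(x2))                      # one second at 22.05 kHz
```

Because the anti-alias filter here is a symmetric FIR, the decimated signal keeps the linear phase Richard asked for; an IIR anti-alias filter would be cheaper but would give up that property.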
Reply by rickman ● March 28, 2006
Richard Owlett wrote:
> I often drive experts nuts.
> I ask a question they -- *THROUGH EXPERIENCE* -- 'recognize' as ill
> defined. They try to *FORCE* me into asking what they consider a "well
> defined" question.
>
> _BUT_ I *did not* want to ask their "well defined" question.
>
> I WANTED TO ASK *my* QUESTION ;)
> In this case, due to the possibility of parallelism, FPGA's came to mind.
> So my question is much more about
> "How much of this would fit on what size FPGA?"
> rather than
> "Would a sane person use a DSP or FPGA for this application?"
Ok, I get it, you are not trying to actually solve a problem in the
best manner given the available tools. You want to explore the domain
of tools available for solving the problem. Then knock yourself out!
It will take considerable resources to do this sort of algorithm on an
FPGA if you are going to do it all in parallel. With speeds of
hundreds of MHz, both DSPs and FPGAs are capable of processing audio
signals by multiplexing a single processing element. But if you want
to build lots of little processing units and clock them all slowly,
that will work as well.
It is hard to give you reasonable answers to your questions until
you frame them in a way that can be answered. Regardless of whether I
think you are using the best string for the job or not, I can't answer
the question, "how long is a piece of string"... or in other words,
"how large of an FPGA do I need"? You need to define the processing
first.
BTW, I know you want to do it "your way", but sampling at 44 kHz will
not give you any extra useful information that you won't get at 8 kHz.
The phone company is not dumb and they realized a long time ago that
the range of frequencies required to carry intelligible voice is less
than 4 kHz. The higher frequencies do not add much to the picture, but
will require a lot more power to analyze. Have you found any content
in the higher frequencies that others have not?
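For what it's worth, the filter-bank spec Richard posted earlier does allow a back-of-envelope resource estimate, which is the kind of "define the processing first" number rickman is asking for. Every figure below is an illustrative assumption, not a measurement.

```python
# Back-of-envelope MAC budget for the filter bank sketched earlier:
# 5-8 low-Q plus 2-10 high-Q band-pass filters on a 44.1 kHz stream.
n_filters = 18          # worst case: 8 low-Q + 10 high-Q filters
taps = 511              # assumed long enough for the narrow high-Q bands
fs = 44100

macs_per_sec = n_filters * taps * fs
print(f"{macs_per_sec / 1e6:.0f} M MAC/s")
```

That works out to roughly 400 million multiply-accumulates per second: within reach of a single fast DSP running the filters serially, or a handful of FPGA multipliers time-shared across the bank. The budget swings by an order of magnitude with the tap count, which is why the processing has to be pinned down before sizing a part.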
Reply by Ron N. ● March 27, 2006
Richard Owlett wrote:
> While looking at "the problem", I started wondering if a massively
> parallel solution would be worth exploring.
Please note that parallelism and FPGA implementation are two
orthogonal concepts. Current large FPGA's do allow one to
implement highly parallel algorithms on certain classes
of problems. But there are products other than those marketed
as FPGA's which also offer various forms of parallelism (forms of
VLIW, SIMD, multi-core, multi-thread, and configurable arithmetic,
to name just a few).
IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M
Reply by Richard Owlett ● March 27, 2006
rickman wrote:
> Richard Owlett wrote:
>
>>I have an interest in speech recognition. I haven't purchased anything
>>because users *and* VARs have told me I probably wouldn't be satisfied.
>>[My desires would be better met by a 'discrete speech' rather than the
>>current orientation to 'continuous speech -- but that discussion goes
>>far OT]
>>
>>That said, I read in comp.speech.users and comp.speech.research of what
>>my gut says are excessively tight constraints on the acoustic
>>environment [especially normal office noise and careful mike placement].
>>
>>I envision an external signal conditioning module containing
>>5-8 band pass linear phase with 5 < Q < 20
>>2-10 band pass linear phase with 20 < Q < 200
>> [all above outputs in time sync with a latency of < 0.1 second]
>>[How I combine this to get off chip is up in the air.]
>>
>>The above is based on some almost *TOTALLY UNTESTED* ideas of what I
>>really want to accomplish. I was thinking the parallel nature of what I
>>envisioned lent it to a FPGA approach.
>
>
> Given that what you really want, but did not state, is an environment
> for exploring your ideas, I would recommend that you focus on your
> algorithms by writing software on a PC.
I'm already doing almost that. I'm using Scilab to explore various
pieces of "the problem" [N.B. not "the solution" ;]
While looking at "the problem", I started wondering if a massively
parallel solution would be worth exploring.
/start side bar
I think in different paths than most modern engineers
[won't say what family thinks ;]
Jerry's old enough to pick up on inferences of following
1. My father
a. helped build and *LEGALLY* operate a spark gap transmitter
b. pursued a BS(ME) as it had more EE than a BS(EE) at that time
c. published article(s?) on "magnetic amplifiers"
2. I
a. remember chopper stabilized op amps
b. have used op amps which required >= 1 inch of 19" rack
[ how many youngsters even know what a 19" rack is ]
c. read of/on/about *ANALOG COMPUTERS* while still used ;)
/temporary end side bar :)
I suspect that the key to my thought pattern is 2.c.
YEPP
want 'digital' implementation of *ANALOG COMPUTER*
How many analog simulation blocks can I stuff in an FPGA?
Jerry, you program/use FORTH; do you get an idea of what I wish to
attempt, and where I might possibly go?
> The decision of whether to use
> an FPGA or a DSP is an implementation decision. Given that your
> signals are audio frequency, even lower than the full range of human
> hearing most likely, I would say you can do anything you can think of
> on a DSP. There are low power, fixed point devices that can process
> hundreds of millions of instructions per second. There are larger, more
> power hungry units that can process up to a billion or more useful
> instructions per second. So it is unlikely that you can create an
> algorithm that requires more processing than this from an 8 kHz sampled
> audio stream.
Initial sampling will be at 44 kHz or greater.
My "gut" says that this is part of the problem with modern speech
recognition systems -- they "ignore"/"throw away" *too* much data.
The output of my 'black box' would be "sampled" as the speech recognizer
_PRESUMES_.
>
> That said, there is also the development effort to consider. Typically
> before you even begin working with DSPs the algorithm is fully explored
> and tested on a PC in C or other environment of your choice such as
> Matlab. Once you have a fully functional model, then you can be
> concerned with the implementation. This goes double for FPGAs because
> they are not nearly as flexible to significant architecture changes.
> Sure, you can change your code (VHDL or verilog), but typically the
> code is woven tighter and changes can impact the code a lot more.
Now you come up against another aberrant aspect of how I think.
I often drive experts nuts.
I ask a question they -- *THROUGH EXPERIENCE* -- 'recognize' as ill
defined. They try to *FORCE* me into asking what they consider a "well
defined" question.
_BUT_ I *did not* want to ask their "well defined" question.
I WANTED TO ASK *my* QUESTION ;)
In this case, due to the possibility of parallelism, FPGA's came to mind.
So my question is much more about
"How much of this would fit on what size FPGA?"
rather than
"Would a sane person use a DSP or FPGA for this application?"
>
> The main difference between the two is that the DSP has just one or two
> ALUs to do the MAC operation (which is the most often used function in
> DSP). It runs at a high speed and, just like in a PC, it switches
> between functions, doing things serially so that they appear to be in parallel.
> The FPGA is capable of actually doing things in parallel. With very
> high speed sample rates this can be important since there is not time
> to switch a DSP between different tasks. But at audio rates there is
> normally tons of time for the DSP to do many kinds of processing on
> the data before the next sample or batch of samples come in.
>
> So in summary, it would be prudent to simulate your algorithm on a PC
> to get the idea fleshed out and working. Then you can decide if you
> want to implement on a floating point DSP, a fixed point DSP or an
> FPGA. Then you get all the fun of actually doing the real work.
>
Reply by Jerry Avins ● March 25, 2006
rickman wrote:
> Richard Owlett wrote:
>
>>I have an interest in speech recognition. I haven't purchased anything
>>because users *and* VARs have told me I probably wouldn't be satisfied.
...
> Given that what you really want, but did not state, is an environment
> for exploring your ideas, I would recommend that you focus on your
> algorithms by writing software on a PC.
...
That and the rest of what you wrote makes good sense. (I suppose that's
why most of us work that way.) I'm glad you wrote it. I was only
beginning to formulate it, and you said it better than I would have.
Jerry
--
Engineering is the art of making what you want from things you can get.
Reply by rickman ● March 25, 2006
Richard Owlett wrote:
> I have an interest in speech recognition. I haven't purchased anything
> because users *and* VARs have told me I probably wouldn't be satisfied.
> [My desires would be better met by a 'discrete speech' rather than the
> current orientation to 'continuous speech -- but that discussion goes
> far OT]
>
> That said, I read in comp.speech.users and comp.speech.research of what
> my gut says are excessively tight constraints on the acoustic
> environment [especially normal office noise and careful mike placement].
>
> I envision an external signal conditioning module containing
> 5-8 band pass linear phase with 5 < Q < 20
> 2-10 band pass linear phase with 20 < Q < 200
> [all above outputs in time sync with a latency of < 0.1 second]
> [How I combine this to get off chip is up in the air.]
>
> The above is based on some almost *TOTALLY UNTESTED* ideas of what I
> really want to accomplish. I was thinking the parallel nature of what I
> envisioned lent it to a FPGA approach.
Given that what you really want, but did not state, is an environment
for exploring your ideas, I would recommend that you focus on your
algorithms by writing software on a PC. The decision of whether to use
an FPGA or a DSP is an implementation decision. Given that your
signals are audio frequency, even lower than the full range of human
hearing most likely, I would say you can do anything you can think of
on a DSP. There are low power, fixed point devices that can process
hundreds of millions of instructions per second. There are larger, more
power hungry units that can process up to a billion or more useful
instructions per second. So it is unlikely that you can create an
algorithm that requires more processing than this from an 8 kHz sampled
audio stream.
That said, there is also the development effort to consider. Typically
before you even begin working with DSPs the algorithm is fully explored
and tested on a PC in C or other environment of your choice such as
Matlab. Once you have a fully functional model, then you can be
concerned with the implementation. This goes double for FPGAs because
they are not nearly as flexible to significant architecture changes.
Sure, you can change your code (VHDL or verilog), but typically the
code is woven tighter and changes can impact the code a lot more.
The main difference between the two is that the DSP has just one or two
ALUs to do the MAC operation (which is the most often used function in
DSP). It runs at a high speed and, just like in a PC, it switches
between functions, doing things serially so that they appear to be in parallel.
The FPGA is capable of actually doing things in parallel. With very
high speed sample rates this can be important since there is not time
to switch a DSP between different tasks. But at audio rates there is
normally tons of time for the DSP to do many kinds of processing on
the data before the next sample or batch of samples come in.
So in summary, it would be prudent to simulate your algorithm on a PC
to get the idea fleshed out and working. Then you can decide if you
want to implement on a floating point DSP, a fixed point DSP or an
FPGA. Then you get all the fun of actually doing the real work.
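The MAC operation mentioned above is just the inner loop of an FIR filter: a DSP's ALU grinds through it serially, while an FPGA can unroll it across taps. A plain-Python sketch of that loop (a toy 3-tap filter, purely for illustration):

```python
def fir_mac(x, h):
    # Direct-form FIR: one multiply-accumulate (MAC) per tap per output
    # sample -- the operation a DSP's datapath is built around.
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(h)):
            if n - k >= 0:
                acc += h[k] * x[n - k]   # the MAC
        y.append(acc)
    return y

h = [0.25, 0.5, 0.25]      # 3-tap smoother
x = [1.0, 0.0, 0.0, 0.0]   # unit impulse
print(fir_mac(x, h))       # [0.25, 0.5, 0.25, 0.0] -- the taps themselves
```

At an 8 kHz sample rate even a 100-tap filter costs only 800 k MACs per second, which supports the point that audio-rate work leaves a modern DSP mostly idle.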
Reply by Jerry Avins ● March 24, 2006
Richard Owlett wrote:
> Jerry Avins wrote:
>
>> Richard Owlett wrote:
>>
>>> Andor wrote:
>>>
>>>> Jerry Avins wrote:
>>>>
>>>>> Richard Owlett wrote:
>>>>>
>>>>> ...
>>>>>
>>>>>
>>>>>> I envision an external signal conditioning module containing
>>>>>> 5-8 band pass linear phase with 5 < Q < 20
>>>>>> 2-10 band pass linear phase with 20 < Q < 200
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I'm left wondering how "Q" and "linear phase" are implemented
>>>>> together.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> For FIR (and some IIR) systems, magnitude and phase response can be
>>>> specified independently. I don't see any problems in implementing those
>>>> specs.
>>>>
>>>
>>> My recollection of a definition of Q was simply ratio of (center
>>> frequency) to (bandwidth). I'm implicitly leaving flatness in the
>>> passband loosely (if at all) constrained.
>>>
>>> Also I thought that FIR filters had intrinsically constant group
>>> delay -- which is what's needed. Should I not have said "linear phase"?
>>> I thought they were implicitly equivalent.
>>
>>
>>
>> I suspected that you meant bandwidth. Q has other implications that I
>> imagine you don't mean.
>>
>> Jerry
>
>
> What other implications does Q have?
> Q was a natural fit as, at least for Q < 20, a series R-L-C would
> have a magnitude response appropriate to my ideas. [Would it have
> constant group delay though?]
Your RLC with a Q of 20 won't have a flat top. A good IF passband may
have the same 3-dB points as a tank with a Q of 20, but it will be a
lot flatter in the passband and have much better skirt selectivity. The
tank will have lagging phase above midband and leading below. The IF
filter will better approximate a constant group delay.
Jerry
--
Engineering is the art of making what you want from things you can get.
Reply by Richard Owlett ● March 24, 2006
Jerry Avins wrote:
> Richard Owlett wrote:
>
>> Andor wrote:
>>
>>> Jerry Avins wrote:
>>>
>>>> Richard Owlett wrote:
>>>>
>>>> ...
>>>>
>>>>
>>>>> I envision an external signal conditioning module containing
>>>>> 5-8 band pass linear phase with 5 < Q < 20
>>>>> 2-10 band pass linear phase with 20 < Q < 200
>>>>
>>>>
>>>>
>>>> I'm left wondering how "Q" and "linear phase" are implemented together.
>>>
>>>
>>>
>>>
>>> For FIR (and some IIR) systems, magnitude and phase response can be
>>> specified independently. I don't see any problems in implementing those
>>> specs.
>>>
>>
>> My recollection of a definition of Q was simply ratio of (center
>> frequency) to (bandwidth). I'm implicitly leaving flatness in the
>> passband loosely (if at all) constrained.
>>
>> Also I thought that FIR filters had intrinsically constant group delay
>> -- which is what's needed. Should I not have said "linear phase"?
>> I thought they were implicitly equivalent.
>
>
> I suspected that you meant bandwidth. Q has other implications that I
> imagine you don't mean.
>
> Jerry
What other implications does Q have?
Q was a natural fit as, at least for Q < 20, a series R-L-C would
have a magnitude response appropriate to my ideas. [Would it have
constant group delay though?]
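Andor's earlier point, that an FIR filter can pair an arbitrary Q (center frequency over bandwidth) with exactly linear phase, is easy to demonstrate: make the taps symmetric and the group delay is constant no matter how narrow the band. A NumPy sketch follows; the tap count and the difference-of-lowpasses construction are illustrative choices, not the only way.

```python
import numpy as np

def fir_bandpass(f0, q, fs, num_taps=511):
    # Linear-phase FIR band-pass with Q = f0 / bandwidth, built as the
    # difference of two windowed-sinc low-pass responses.
    bw = f0 / q
    lo, hi = f0 - bw / 2, f0 + bw / 2
    n = np.arange(num_taps) - (num_taps - 1) / 2
    def lp(fc):
        return (2 * fc / fs) * np.sinc(2 * fc / fs * n)
    return (lp(hi) - lp(lo)) * np.blackman(num_taps)

h = fir_bandpass(f0=1000, q=20, fs=44100)
# Symmetric taps => exactly linear phase: a constant group delay of
# (num_taps - 1) / 2 = 255 samples, independent of Q.
print(bool(np.allclose(h, h[::-1])))   # True
```

This is what the analog tank can't do: the RLC's phase rotates through resonance, whereas the symmetric FIR trades a fixed bulk delay (255 samples, about 6 ms at 44.1 kHz, comfortably inside Richard's 0.1-second budget) for a group delay that is the same at every frequency.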