
As "Nyquist" is to "sample rate" "????" is to "sample period/duration/width/?" ?

Started by Richard Owlett September 19, 2004
I'm interested in speech signals as input to speech recognition software.

I get the impression that minimum acceptable sample rates begin at 8 kHz
(or above). I assume this is based on which formants are considered
"significant". I have somewhat arbitrarily chosen 44.1 kHz. The data I
have available is a studio-quality CD.

From another thread, I assume that some characteristic time of a
phoneme is somewhere between 0.01 and 0.1 seconds (+- xx %).

Assuming whatever analysis I do is based on windows of width mm seconds
taken every nn seconds (nn presumed < mm), what are appropriate values
from a DSP point of view?
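
For concreteness, the "window of width mm taken every nn seconds"
framing looks roughly like this in code; the 25 ms window and 10 ms hop
below are placeholder values for illustration, not values recommended
anywhere in this thread:

import numpy as np

fs = 44100           # sample rate of the CD data, Hz
mm = 0.025           # window width in seconds (placeholder value)
nn = 0.010           # hop between window starts in seconds (placeholder)

win_len = int(round(mm * fs))   # samples per window (~1102)
hop = int(round(nn * fs))       # samples between window starts (441)

x = np.random.randn(fs)         # stand-in for one second of speech

# slice the signal into overlapping windows of width mm every nn seconds
frames = [x[i:i + win_len]
          for i in range(0, len(x) - win_len + 1, hop)]
print(len(frames), "windows of", win_len, "samples each")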

[ For perspective, see my previous thread titled 'Low freq "analog" of
Nyquist? (possibly naive question)'. I'm hoping I've learned enough to
phrase my question better. ]

My ultimate goal is to reduce the dependence of speech recognition
accuracy on "good mikes" and a "good acoustic environment". Primarily
the latter.

[ for those of you old enough, "this ram keeps butting the dam" ]
Richard Owlett wrote:

> [original post quoted in full - snipped]
As "Nyquist" is to "sample rate", "frequency resolution" is to "sample set duration". -- Tim Wescott Wescott Design Services http://www.wescottdesign.com
On Sun, 19 Sep 2004 14:53:26 -0500, Richard Owlett
<rowlett@atlascomm.net> wrote:

>[original post quoted in full - snipped]
Hi,

I'm responding to the Subject text; that is, I'm responding to the
words: "Nyquist" is to "sample rate". Please know that I have no clue
whatsoever as to the meaning of that single word "Nyquist". However, I
do have a rough notion of the meaning of the two words "sample rate".

I'm not in the audio business, but here's what I've heard. In
telephones, the microphone signal is filtered so its frequency
bandwidth is just less than 4 kHz. Then that analog signal is digitized
at a sample rate of 8 kHz, which satisfies the "Nyquist Criterion".

Prepare for rant: I don't think people should use the phrase "Nyquist
frequency". That phrase means different things to different people, and
this leads to confusion. I think we should use the phrase "sample rate"
when we mean the "sample rate", and we should use the phrase "half the
sample rate" when we mean "half the sample rate". Simple!!

Back to sampling human speech: As it turns out, for good fidelity a
human voice signal should have a wider bandwidth than 4 kHz. But to
reduce the cost of telephone systems (so they can process as many
simultaneous speech signals as possible), early telephone designers
realized that you could limit a human speech signal to a bandwidth as
low as (roughly) 4 kHz and people (their brains) could still understand
the speech signal.

Audio fanatics know that human hearing goes up to (roughly) 18-20 kHz,
so they want their "high-fidelity" audio systems to cover that full
frequency range. Well, if you have an analog signal whose bandwidth is
20 kHz, then your A/D sample rate must be greater than twice that
frequency (Nyquist Criterion, again), which leads to the "studio
quality" sample rate of 44.1 kHz.

Sorry I can't be of more help. I wouldn't know a "formant", or a
"phoneme", if I found one dead in my lunchbox.

[-Rick-]
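
The arithmetic behind both numbers is just the sampling criterion:
sample rate > 2 x bandwidth. A trivial sketch (the 3.4 kHz figure is
the conventional telephone band edge, an assumption rather than
something stated above):

# Nyquist criterion: the sample rate must exceed twice the signal bandwidth
def min_sample_rate(bandwidth_hz):
    return 2.0 * bandwidth_hz

print(min_sample_rate(3400.0))   # 6.8 kHz needed; telephony uses 8 kHz
print(min_sample_rate(20000.0))  # 40 kHz needed; CD audio uses 44.1 kHz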
"Rick Lyons" <r.lyons@_BOGUS_ieee.org> wrote in message
news:414ed4b6.394648093@news.sf.sbcglobal.net...
> On Sun, 19 Sep 2004 14:53:26 -0500, Richard Owlett
> <rowlett@atlascomm.net> wrote:
>
> Back to sampling human speech: As it turns out,
> for good fidelity a human voice signal should
> have a wider bandwidth than 4 kHz. But to reduce the
> cost of telephone systems (so they can process as many
> simultaneous speech signals as possible) early
> telephone designers realized that you could limit a
> human speech signal to a bandwidth as low as
> (roughly) 4 kHz and people (their brains) could
> still understand the speech signal.
Right. On the phone, it is generally quite easy to understand normal
conversational speech even with the limited frequency response.
However, if someone tries to read a string of random letters, it is
quite a bit more difficult to understand them on the other end. Losing
those high frequencies makes consonants difficult to differentiate.

The brain normally does a good job of compensating for the loss of high
frequencies by using context clues. But since very few context clues
exist within a string of random letters, it becomes difficult to
understand.

So saying that a 4 kHz bandwidth is adequate for speech is a bit
misleading. Consonant sounds have some frequency content up to close to
20 kHz, though there is limited benefit to increasing to anything more
than 10 kHz IMO.
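
One way to hear the effect Jon describes is to low-pass filter the same
recording at different cutoffs and compare how well the consonants
survive. A rough scipy sketch; the file name and cutoff values are
placeholders, and it assumes a recording sampled at 44.1 kHz or similar:

import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

fs, x = wavfile.read("speech.wav")   # placeholder file name
x = x.astype(np.float64)

for cutoff in (4000.0, 10000.0):     # telephone-like vs. wider bandwidth
    # 8th-order Butterworth low-pass, applied forward and backward
    sos = butter(8, cutoff, btype="low", fs=fs, output="sos")
    y = sosfiltfilt(sos, x, axis=0)
    wavfile.write(f"speech_lp{int(cutoff)}.wav", fs, y.astype(np.int16))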
"Richard Owlett" schrieb
> As "Nyquist" is to "sample rate" > "????" is to "sample period/duration/width/?" ?
As "sample rate" is "1 / (sample period)" "sample period" is to "1/Nyquist" This may be answer to your question, but not of much help. I think you are mixing up two domains here: the one of strict mathematics and signal processing and the other one - much fuzzier - about the human perception of hearing and the generation of speech. While human hearing is obviously based on the same mathematics and physics of acoustics, there are many tricks that evolution has come up with. You might want to check the "Scientist's and Engineer's Guide to Digital Signal Processing": http://www.analog.com/processors/resources/technicalLibrary/manuals/ training/materials/pdf/dsp_book_frontmat.pdf especially chapter 22, "Audio Processing". HTH Martin
On Mon, 20 Sep 2004 10:30:42 -0700, "Jon Harris"
<goldentully@hotmail.com> wrote:

>"Rick Lyons" <r.lyons@_BOGUS_ieee.org> wrote in message >news:414ed4b6.394648093@news.sf.sbcglobal.net... >> On Sun, 19 Sep 2004 14:53:26 -0500, Richard Owlett >> <rowlett@atlascomm.net> wrote: >> >> Back to sampling human speech: As it turns out, >> for good fidelity a human voice signal should >> have a wider bandwidth than 4 kHz. But to reduce the >> cost of telephone systems (so they can process as many >> simultaneous speech signals as possible) early >> telephone designers realized that you could limit a >> human speech signal to a bandwidth as low as >> (roughly) 4 kHz and people (their brains) could >> still understand the speech signal. > >Right. On the phone, it is generally quite easy to understand normal >conversation speech even with the limited frequency response. However, if >someone tries to read a string of random letters, it is quite a bit more >difficult to understand them on the other end. Losing those high frequencies >makes consonants difficult to differentiate. The brain normally does a good job >of compensating for the loss of high frequencies by using context clues. But >since very few context clues exist with a string of random letters, it becomes >difficult to understand. > >So saying that a 4 kHz bandwidth is adequate for speech is a bit misleading. >Consonant sounds have some frequency content up to close to 20kHz, though there >is limited benefit to increasing to anything more than 10kHz IMO.
Yes yes. You're right! I hadn't thought about the consonants.

That's why, over the phone to say "FFT", we'd say "foxtrot" "foxtrot"
"tango".

[-Rick-]
Rick Lyons wrote:
> That's why, over the phone to say "FFT", > we'd say "foxtrot" "foxtrot" "tango". >
When I was at Raytheon, we had an operator/receptionist who made up her
own phonetic alphabet. She used it to announce license plate numbers
when a driver forgot to turn off the headlights.

She generally made up her phonetic alphabet on the spot as needed. My
favorite was "F as in Fun. L as in Love. And N as in... NEVER!" She
sounded a lot like Aretha Franklin in the Blues Brothers.

One day she paged a license plate by saying "Y as in You." That threw
everyone for a loop, because we all heard it as "Y as in U."

She inspired my coworkers and me to formulate a phonetic alphabet whose
purpose was to obfuscate rather than clarify. We favored the names of
letters, homophones that start with different letters (gnu, knew, new),
names that didn't add information (T as in tea), or words that sound
like they start with a different letter than they really do.

A as in aye
B as in bdellium
C as in cue
D as in Djibouti
E as in eye
F as in Fun (a nod to our operator)
G as in gnu
H as in hour
I as in inn
J as in jalapeno
K as in knew
L as in llama
M as in mnemonic
N as in new
O as in offal
P as in pea
Q as in quay
R as in ... never found a good one for R
S as in sea
T as in tea
U as in ... oops, forgot that one
V as in vee
W as in why
Y as in you
Z as in zee (or zed)

--
Jim Thomas            Principal Applications Engineer      Bittware, Inc
jthomas@bittware.com  http://www.bittware.com     (603) 226-0404 x536
Nothing is ever so bad that it can't get worse. - Calvin
Cute.   I've done similar things, and I like that you overloaded the
"new" and "eye" sounds, which completely defeats the purpose of a
phonetic alphabet.  ;)

Overloading similar sounds works, too, like B = boy and T = toy.   A
low SNR connection creates ambiguities.   So I used to work on rhyming
phonetic alphabets that were similarly useless.

I think you cheated on V and Z, though.


On Tue, 21 Sep 2004 09:32:36 -0400, Jim Thomas <jthomas@bittware.com>
wrote:

>[quoted text snipped]
Eric Jacobsen
Minister of Algorithms, Intel Corp.
My opinions may not be Intel's opinions.
http://www.ericjacobsen.org
One time, I overheard someone spelling something over the phone saying
"C as in cat, M as in mat, and B as in bat". I got a good chuckle out
of that, as did they when I explained how the phonetics chosen didn't
really help much!  :-)

"Eric Jacobsen" <eric.jacobsen@ieee.org> wrote in message
news:415043c5.502019890@news.west.cox.net...
> [quoted text snipped]
"Rick Lyons" <r.lyons@_BOGUS_ieee.org> wrote in message
news:414ffd9c.470654359@news.sf.sbcglobal.net...
> [quoted text snipped]
Exactly! The military phonetic alphabet is designed to minimize ambiguity with a poor quality communication link (unlike the fun ones we've been posting here).