
G.729 with different sampling rates?

Started by Jack October 4, 2004
I'm trying to understand G.729. The only compression algorithm I've
coded before is ADPCM, and it wasn't keyed to a certain sampling rate
- that is, it would be just as happy with 8100 samples per second or
7900 samples per second as it would have been with 8000, just the
quality would be slightly higher or lower.

With G.729, all the documentation refers to a sampling rate of 8 kHz.
Is it really the only rate that makes sense? That is, if my real
sampling rate is slightly higher or lower (but constant, and same for
the encoder and the decoder) can't I just feed the samples to the
algorithm a little faster or a little slower? Is there really
something special about 8000 samples / sec?
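
The rate-agnostic behaviour Jack describes is easy to see in code. Below is a toy ADPCM encoder (not any particular standard, just the generic predict/quantize/adapt loop, in Python): note that no sample rate appears anywhere in it.

def adpcm_encode(samples, bits=4):
    """Toy ADPCM encoder: predict each sample as the previous
    reconstructed one, quantize the prediction error, and adapt the
    step size per sample. There is no sample-rate parameter anywhere;
    the adaptation is driven only by the signal itself."""
    step = 16.0          # current quantizer step size
    predicted = 0.0      # decoder-side reconstruction of the last sample
    levels = 2 ** (bits - 1)
    codes = []
    for s in samples:
        err = s - predicted
        code = int(max(-levels, min(levels - 1, round(err / step))))
        predicted += code * step    # mirror the decoder's reconstruction
        # expand the step after large errors, shrink it after small ones
        step = step * 1.5 if abs(code) >= levels // 2 else step * 0.9
        step = max(1.0, min(4096.0, step))
        codes.append(code)
    return codes

A matching decoder just repeats the predicted/step updates from the codes, so running the loop at 7900 or 8100 samples per second changes nothing but the analog bandwidth each sample represents.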

Jack wrote:

> I'm trying to understand G.729. The only compression algorithm I've
> coded before is ADPCM, and it wasn't keyed to a certain sampling rate
> - that is, it would be just as happy with 8100 samples per second or
> 7900 samples per second as it would have been with 8000, just the
> quality would be slightly higher or lower.
>
> With G.729, all the documentation refers to a sampling rate of 8 kHz.
> Is it really the only rate that makes sense? That is, if my real
> sampling rate is slightly higher or lower (but constant, and same for
> the encoder and the decoder) can't I just feed the samples to the
> algorithm a little faster or a little slower? Is there really
> something special about 8000 samples / sec?
It's what's used in digital telephony. That's what the standard covers.

Jerry
--
... they proceeded on the sound principle that the magnitude of a lie
always contains a certain factor of credibility, ... and that therefore
... they more easily fall victim to a big lie than to a little one ...
A. H.
> Is there really something special about 8000 samples / sec?
Some dolt in Bell Labs during the 1920s decreed that voice transmission
requires a frequency response from 250 Hz to only 3000 Hz. Even though
Harry Nyquist rounded it up to 4000 Hz to be on the safe side (and
because we all like round numbers) around 1938, we're all still stuck
saying things like "S as in Sam" and "F as in Frank" over modern
telephones because apparently nobody actually bothered to check the
frequency spectrum of actual speech.

So, sure, "special," as in, "special education."

Sincerely,
James Salsman
--
www.readsay.com - maker of the ReadSay PROnounce English literacy system
currently $359; soon $499 because of distributor margin requirements
http://www.readsay.com/PROnounce.html
James Salsman wrote:

>> Is there really something special about 8000 samples / sec?
>
> Some dolt in Bell Labs during the 1920s decreed that voice transmission
> requires a frequency response from 250 Hz to only 3000 Hz. Even though
> Harry Nyquist rounded it up to 4000 Hz to be on the safe side (and
> because we all like round numbers) around 1938, we're all still stuck
> saying things like "S as in Sam" and "F as in Frank" over modern
> telephones because apparently nobody actually bothered to check the
> frequency spectrum of actual speech.
>
> So, sure, "special," as in, "special education."
I don't think that is entirely fair. For most of the life of the
telephone network, using twice the bandwidth would have incurred
significant additional cost. Considering how few words in a typical
conversation cause the problem you describe, I think the compromise
they chose was none too bad.

What was dumber was the half-hearted effort to improve things in the
early days of ISDN. The addition of a 7 kHz bandwidth audio mode was
handled so poorly it never caught on at all.

With modern speech compression, wider bandwidth need have little impact
on the bit rate. However, most of the newer codecs, like G.729, still
only provide for an 8 kHz sampled audio world. The latest 3GPP codecs
do, however, provide wideband modes, so maybe phone speech clarity will
improve in the next few years.

Regards,
Steve
>>>>> "Jack" == Jack <jack8051@lightspawn.removethisbit.org> writes:
Jack> I'm trying to understand G.729. The only compression algorithm I've
Jack> coded before is ADPCM, and it wasn't keyed to a certain sampling rate
Jack> - that is, it would be just as happy with 8100 samples per second or
Jack> 7900 samples per second as it would have been with 8000, just the
Jack> quality would be slightly higher or lower.

Jack> With G.729, all the documentation refers to a sampling rate of 8 kHz.
Jack> Is it really the only rate that makes sense? That is, if my real
Jack> sampling rate is slightly higher or lower (but constant, and same for
Jack> the encoder and the decoder) can't I just feed the samples to the
Jack> algorithm a little faster or a little slower? Is there really
Jack> something special about 8000 samples / sec?

You could probably vary the sample rate some and it would sound OK. But
G.729 tries to model the speech signal, so if things happen faster or
slower than expected, the model may no longer be as accurate. For
example, if you sampled at 16000 samples per second, the pitch period
would now be twice the number of samples. This might confuse G.729. I
don't know the fine details of G.729, so I might be wrong.

Ray
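
Ray's point can be made concrete with a little arithmetic. G.729's long-term (pitch) predictor searches lags over a fixed window of samples, roughly 20 to 143 at 8 kHz; treating that range as the stated assumption, a sketch in Python:

def pitch_lag_samples(f0_hz, fs_hz):
    """Pitch period expressed in samples at a given sample rate."""
    return fs_hz / f0_hz

LAG_MIN, LAG_MAX = 20, 143   # approximate G.729 adaptive-codebook range

for fs in (8000, 8200, 16000):
    lag = pitch_lag_samples(100.0, fs)   # a typical ~100 Hz male pitch
    print(f"fs={fs} Hz: lag={lag:.0f} samples, "
          f"searchable: {LAG_MIN <= lag <= LAG_MAX}")

# fs=8000 Hz: lag=80 samples, searchable: True
# fs=8200 Hz: lag=82 samples, searchable: True
# fs=16000 Hz: lag=160 samples, searchable: False

So a few percent of clock skew only nudges the lag within the search window, while doubling the rate pushes typical pitches outside it entirely, consistent with Ray's caution.
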
> It's what's used in digital telephony. That's what the standard covers.
>
> Jerry
I realize that if I change the sampling rate I won't strictly be
adhering to the standard any more. But when I write both the encoder
and the decoder, it doesn't seem like such an issue (at least in my
case). If I increase the rate from 8 kHz (to, say, 8.1 or 8.2) will it
sound at least as good as the standard? Or is the algorithm somehow
"optimized" for that sampling rate so that it actually sounds worse at
a slightly higher rate?
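
For concreteness, here is what silently changes when a codec designed around 8 kHz is fed a slightly different rate: every quantity the standard defines in samples rescales in time, and since G.729 emits a fixed 80 bits per 80-sample frame, the bit rate scales too. A small sketch of the arithmetic in Python:

FRAME_SAMPLES = 80    # one G.729 frame
BITS_PER_FRAME = 80   # 80 bits per frame -> 8 kbit/s at 8 kHz

for fs in (8000, 8100, 8200):
    frame_ms = 1000.0 * FRAME_SAMPLES / fs
    kbps = BITS_PER_FRAME * fs / FRAME_SAMPLES / 1000.0
    print(f"fs={fs} Hz: frame={frame_ms:.2f} ms, bit rate={kbps:.2f} kbit/s")

# fs=8000 Hz: frame=10.00 ms, bit rate=8.00 kbit/s
# fs=8100 Hz: frame=9.88 ms, bit rate=8.10 kbit/s
# fs=8200 Hz: frame=9.76 ms, bit rate=8.20 kbit/s

A 1-2% change is small, so the codec's internal models are only slightly mismatched; whether that is audible is the open question in this thread.
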
Jack wrote:
>> It's what's used in digital telephony. That's what the standard covers.
>>
>> Jerry
>
> I realize that if I change the sampling rate I won't strictly be
> adhering to the standard any more. But when I write both the encoder
> and the decoder, it doesn't seem like such an issue (at least in my
> case). If I increase the rate from 8 kHz (to, say, 8.1 or 8.2) will it
> sound at least as good as the standard? Or is the algorithm somehow
> "optimized" for that sampling rate so that it actually sounds worse at
> a slightly higher rate?
I don't really know, but it doesn't seem likely.

Jerry
"Steve Underwood" <steveu@dis.org> wrote in message
news:cjtmsk$cp3$1@home.itg.ti.com...
> James Salsman wrote:
>
>>> Is there really something special about 8000 samples / sec?
>>
>> Some dolt in Bell Labs during the 1920s decreed that voice transmission
>> requires a frequency response from 250 Hz to only 3000 Hz. Even though
>> Harry Nyquist rounded it up to 4000 Hz to be on the safe side (and
>> because we all like round numbers) around 1938, we're all still stuck
>> saying things like "S as in Sam" and "F as in Frank" over modern
>> telephones because apparently nobody actually bothered to check the
>> frequency spectrum of actual speech.
>
> I don't think that is entirely fair. For most of the life of the
> telephone network, using twice the bandwidth would have incurred
> significant additional cost. Considering how few words in a typical
> conversation cause the problem you describe, I think the compromise
> they chose was none too bad.
From a recent thread:

On the phone, it is generally quite easy to understand normal
conversational speech even with the limited frequency response.
However, if someone tries to read a string of random letters, it is
quite a bit more difficult to understand them on the other end. Losing
those high frequencies makes consonants difficult to differentiate. The
brain normally does a good job of compensating for the loss of high
frequencies by using context clues. But since very few context clues
exist with a string of random letters, it becomes difficult to
understand.

So the phone is generally quite adequate for its primary intended
application--communicating normal conversational speech. However, it is
certainly not a perfect medium and doesn't do as well in other
applications.

I don't think the original researchers were so "dumb" as to not realize
that speech had higher frequency components. Given the technology
limits of the time, they chose the sample rate that allowed for
"intelligible" speech (not perfect speech) at a reasonable cost. In
other words, the criterion for choosing the frequency response was
"what is the minimum frequency response that is still intelligible in
normal speech" vs. "what is the minimum frequency response for full
fidelity speech". That's what engineering is all about--trade-offs!
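
One way to hear this effect for yourself is to band-limit a recording to the classic telephone passband (roughly 300-3400 Hz) and listen to a string of random letters through it. A sketch using scipy; "speech.wav" is a placeholder filename and the input is assumed to be mono 16-bit PCM:

import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

fs, x = wavfile.read("speech.wav")   # hypothetical mono input file
x = x.astype(np.float64)

# 4th-order Butterworth band-pass approximating the telephone channel
sos = butter(4, [300.0, 3400.0], btype="bandpass", fs=fs, output="sos")
y = sosfiltfilt(sos, x)

wavfile.write("speech_telephone.wav", fs, y.astype(np.int16))

Played back, fricatives like /s/ and /f/ become much harder to tell apart, since much of the energy that distinguishes them lies above the passband; that is exactly the confusion described above.
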
Jon Harris wrote:

   ...

> I don't think the original researchers were so "dumb" as to not realize
> that speech had higher frequency components. Given the technology
> limits of the time, they chose the sample rate that allowed for
> "intelligible" speech (not perfect speech) at a reasonable cost. In
> other words, the criterion for choosing the frequency response was
> "what is the minimum frequency response that is still intelligible in
> normal speech" vs. "what is the minimum frequency response for full
> fidelity speech". That's what engineering is all about--trade-offs!
Originally, there was no sample rate involved. Hybrids had to be
terminated with dummy lines that closely matched the real line
impedance over the bandwidth of intended use. Ear pieces and carbon
microphones had to cover the band. In all respects, bandwidth cost
money. This is also easy to see with analog frequency-division
multiplexing. The actual guaranteed analog high frequency was 3600 Hz,
if I remember correctly, but actual response was usually better
starting around 1950. The 8 kHz sample rate was adequate to preserve
the quality of the analog service.

Jerry
"Jerry Avins" <jya@ieee.org> wrote in message
news:cjuqqq$jkq$1@bob.news.rcn.net...
> Jon Harris wrote:
>
>> I don't think the original researchers were so "dumb" as to not realize
>> that speech had higher frequency components. Given the technology
>> limits of the time, they chose the sample rate that allowed for
>> "intelligible" speech (not perfect speech) at a reasonable cost. In
>> other words, the criterion for choosing the frequency response was
>> "what is the minimum frequency response that is still intelligible in
>> normal speech" vs. "what is the minimum frequency response for full
>> fidelity speech". That's what engineering is all about--trade-offs!
>
> Originally, there was no sample rate involved. Hybrids had to be
> terminated with dummy lines that closely matched the real line
> impedance over the bandwidth of intended use. Ear pieces and carbon
> microphones had to cover the band. In all respects, bandwidth cost
> money. This is also easy to see with analog frequency-division
> multiplexing. The actual guaranteed analog high frequency was 3600 Hz,
> if I remember correctly, but actual response was usually better
> starting around 1950. The 8 kHz sample rate was adequate to preserve
> the quality of the analog service.
Thanks for the historical clarifications, Jerry.