G.729 with different sampling rates?

Started by Jack October 4, 2004
I'm trying to understand G.729. The only compression algorithm I've
coded before is ADPCM, and it wasn't keyed to a certain sampling rate
- that is, it would be just as happy with 8100 samples per second or
7900 samples per second as it would have been with 8000, just the
quality would be slightly higher or lower.

With G.729, all the documentation refers to a sampling rate of 8 kHz.
Is it really the only rate that makes sense? That is, if my real
sampling rate is slightly higher or lower (but constant, and same for
the encoder and the decoder) can't I just feed the samples to the
algorithm a little faster or a little slower? Is there really
something special about 8000 samples / sec?

Jack wrote:

> I'm trying to understand G.729. The only compression algorithm I've
> coded before is ADPCM, and it wasn't keyed to a certain sampling rate
> - that is, it would be just as happy with 8100 samples per second or
> 7900 samples per second as it would have been with 8000, just the
> quality would be slightly higher or lower.
>
> With G.729, all the documentation refers to a sampling rate of 8 kHz.
> Is it really the only rate that makes sense? That is, if my real
> sampling rate is slightly higher or lower (but constant, and same for
> the encoder and the decoder) can't I just feed the samples to the
> algorithm a little faster or a little slower? Is there really
> something special about 8000 samples / sec?
It's what's used in digital telephony. That's what the standard covers.

Jerry
--
... they proceeded on the sound principle that the magnitude of a lie always contains a certain factor of credibility, ... and that therefor ... they more easily fall victim to a big lie than to a little one ... A. H.
> Is there really something special about 8000 samples / sec?
Some dolt in Bell Labs during the 1920s decreed that voice transmission requires a frequency response from 250 Hz to only 3000 Hz. Even though Harry Nyquist rounded it up to 4000 Hz to be on the safe side (and because we all like round numbers) around 1938, we're all still stuck saying things like "S as in Sam" and "F as in Frank" over modern telephones because apparently nobody actually bothered to check the frequency spectrum of actual speech.

So, sure, "special," as in, "special education."

Sincerely,
James Salsman
--
www.readsay.com - maker of the ReadSay PROnounce English literacy system currently $359; soon $499 because of distributor margin requirements http://www.readsay.com/PROnounce.html
James Salsman wrote:

>> Is there really something special about 8000 samples / sec?
>
> Some dolt in Bell Labs during the 1920s decreed that voice transmission
> requires a frequency response from 250 Hz to only 3000 Hz. Even though
> Harry Nyquist rounded it up to 4000 Hz to be on the safe side (and
> because we all like round numbers) around 1938, we're all still stuck
> saying things like "S as in Sam" and "F as in Frank" over modern
> telephones because apparently nobody actually bothered to check the
> frequency spectrum of actual speech.
>
> So, sure, "special," as in, "special education."
I don't think that is entirely fair. For most of the life of the telephone network, using twice the bandwidth would have incurred significant additional cost. Considering how few words in a typical conversation cause the problem you describe, I think the compromise they chose was none too bad.

What was dumber was the half-hearted effort to improve things in the early days of ISDN. The addition of a 7.1 kHz bandwidth audio mode was handled so poorly it never caught on at all.

With modern speech compression, wider bandwidth need have little impact on the bit rate. However, most of the newer codecs, like G.729, still only provide for an 8 kHz sampled audio world. The latest 3GPP codecs do, however, provide wideband modes, so maybe phone speech clarity will improve in the next few years.

Regards,
Steve
>>>>> "Jack" == Jack <jack8051@lightspawn.removethisbit.org> writes:
Jack> I'm trying to understand G.729. The only compression algorithm I've
Jack> coded before is ADPCM, and it wasn't keyed to a certain sampling rate
Jack> - that is, it would be just as happy with 8100 samples per second or
Jack> 7900 samples per second as it would have been with 8000, just the
Jack> quality would be slightly higher or lower.

Jack> With G.729, all the documentation refers to a sampling rate of 8 kHz.
Jack> Is it really the only rate that makes sense? That is, if my real
Jack> sampling rate is slightly higher or lower (but constant, and same for
Jack> the encoder and the decoder) can't I just feed the samples to the
Jack> algorithm a little faster or a little slower? Is there really
Jack> something special about 8000 samples / sec?

You could, probably, vary the sample rate some and it would sound OK. But G.729 tries to model the speech signal, so if things happen faster or slower than expected, the model may no longer be as accurate. For example, if you sampled at 16000 samples per second, a given pitch period would span twice as many samples. This might confuse G.729.

I don't know the fine details of G.729, so I might be wrong.

Ray
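To put rough numbers on the pitch-period point, here is a minimal sketch, not taken from the G.729 reference code. It assumes the pitch-lag search range of about 20 to 143 samples described in the G.729 recommendation (worth verifying against the spec), and the function names are made up for illustration. It shows which pitch frequencies that fixed lag range can represent as the sampling rate changes.

```python
# Assumed pitch-lag search range of G.729, in samples (check the spec).
MIN_LAG = 20
MAX_LAG = 143


def pitch_range_hz(sample_rate_hz):
    """Return (lowest, highest) pitch frequency the lag range can represent."""
    return sample_rate_hz / MAX_LAG, sample_rate_hz / MIN_LAG


def lag_in_samples(pitch_hz, sample_rate_hz):
    """Pitch period of a voice, expressed in samples at the given rate."""
    return round(sample_rate_hz / pitch_hz)


def lag_is_representable(pitch_hz, sample_rate_hz):
    """True if the pitch period falls inside the assumed lag search range."""
    return MIN_LAG <= lag_in_samples(pitch_hz, sample_rate_hz) <= MAX_LAG


if __name__ == "__main__":
    for fs in (8000, 8100, 16000):
        low, high = pitch_range_hz(fs)
        ok = lag_is_representable(100.0, fs)
        print(f"{fs} samples/s: pitch range {low:.1f} to {high:.1f} Hz, "
              f"100 Hz voice representable: {ok}")
```

Under these assumptions, a 100 Hz voice has a lag of 80 samples at 8000 samples/s and 81 at 8100, both inside the range, but 160 at 16000, which is outside it. A small change in rate only nudges the representable pitch range, while doubling the rate pushes low-pitched voices out of it.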
> It's what's used in digital telephony. That's what the standard covers.
>
> Jerry

I realize that if I change the sampling rate I won't strictly be adhering to the standard any more. But when I write both the encoder and the decoder, it doesn't seem like such an issue (at least in my case). If I increase the rate from 8 kHz (to, say, 8.1 or 8.2) will it sound at least as good as the standard? Or is the algorithm somehow "optimized" for that sampling rate so that it actually sounds worse at a slightly higher rate?
Jack wrote:
>> It's what's used in digital telephony. That's what the standard covers.
>>
>> Jerry
>
> I realize that if I change the sampling rate I won't strictly be
> adhering to the standard any more. But when I write both the encoder
> and the decoder, it doesn't seem like such an issue (at least in my
> case). If I increase the rate from 8 kHz (to, say, 8.1 or 8.2) will it
> sound at least as good as the standard? Or is the algorithm somehow
> "optimized" for that sampling rate so that it actually sounds worse at
> a slightly higher rate?

I don't really know, but it doesn't seem likely.

Jerry
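For the 8.1 or 8.2 kHz case, one consequence that can be stated without listening tests is that everything the codec counts in samples stretches or shrinks in time. The following is a minimal sketch, assuming the standard's 80-sample frame and 80 bits per frame (10 ms and 8 kbit/s at 8000 samples/s); the helper names are hypothetical. It says nothing about perceived quality, only about timing, bit rate and Nyquist bandwidth.

```python
FRAME_SAMPLES = 80  # samples per G.729 frame (10 ms at 8000 samples/s)
FRAME_BITS = 80     # encoded bits per frame (8 kbit/s at 8000 samples/s)


def frame_duration_ms(sample_rate_hz):
    """How long one 80-sample frame lasts at the given sampling rate."""
    return 1000.0 * FRAME_SAMPLES / sample_rate_hz


def bit_rate_bps(sample_rate_hz):
    """Bit rate when frames are produced at the given sampling rate."""
    return FRAME_BITS * sample_rate_hz / FRAME_SAMPLES


def audio_bandwidth_hz(sample_rate_hz):
    """Nyquist limit: the highest frequency the sampled signal can carry."""
    return sample_rate_hz / 2.0


if __name__ == "__main__":
    for fs in (7900, 8000, 8100, 8200):
        print(f"{fs} samples/s: frame {frame_duration_ms(fs):.3f} ms, "
              f"{bit_rate_bps(fs):.0f} bit/s, "
              f"bandwidth {audio_bandwidth_hz(fs):.0f} Hz")
```

So running the same code at 8100 samples/s would give slightly shorter frames, a proportionally higher bit rate and about 50 Hz more audio bandwidth. Whether the codec's fixed tables and filters still behave well at that rate is a separate question that this arithmetic does not answer.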
"Steve Underwood" <steveu@dis.org> wrote in message
news:cjtmsk$cp3$1@home.itg.ti.com...
> James Salsman wrote:
>
> >> Is there really something special about 8000 samples / sec?
> >
> > Some dolt in Bell Labs during the 1920s decreed that voice transmission
> > requires a frequency response from 250 Hz to only 3000 Hz. Even though
> > Harry Nyquist rounded it up to 4000 Hz to be on the safe side (and
> > because we all like round numbers) around 1938, we're all still stuck
> > saying things like "S as in Sam" and "F as in Frank" over modern
> > telephones because apparently nobody actually bothered to check the
> > frequency spectrum of actual speech.
>
> I don't think that is entirely fair. For most of the life of the
> telephone network, using twice the bandwidth would have incurred
> significant additional cost. Considering how few words in a typical
> conversation cause the problem you describe, I think the compromise they
> chose was none too bad.

From a recent thread:

On the phone, it is generally quite easy to understand normal conversational speech even with the limited frequency response. However, if someone tries to read a string of random letters, it is quite a bit more difficult to understand them on the other end. Losing those high frequencies makes consonants difficult to differentiate. The brain normally does a good job of compensating for the loss of high frequencies by using context clues. But since very few context clues exist with a string of random letters, it becomes difficult to understand.

So the phone is generally quite adequate for its primary intended application--communicating normal conversational speech. However, it is certainly not a perfect medium and doesn't do as well in other applications.

I don't think the original researchers were so "dumb" as to not realize that speech had higher frequency components. Given the technology limits of the time, they chose the sample rate that allowed for "intelligible" speech (not perfect speech) at a reasonable cost. In other words, the criterion for choosing the frequency response was "what is the minimum frequency response that is still intelligible in normal speech" vs. "what is the minimum frequency response for full fidelity speech". That's what engineering is all about--trade-offs!
Jon Harris wrote:

   ...

> I don't think the original researchers were so "dumb" as to not realize that
> speech had higher frequency components. Given the technology limits of the
> time, they chose the sample rate that allowed for "intelligible" speech (not
> perfect speech) at a reasonable cost. In other words, the criterion for choosing
> the frequency response was "what is the minimum frequency response that is still
> intelligible in normal speech" vs. "what is the minimum frequency response for
> full fidelity speech". That's what engineering is all about--trade-offs!

Originally, there was no sample rate involved. Hybrids had to be terminated with dummy lines that closely matched the real line impedance over the bandwidth of intended use. Earpieces and carbon microphones had to cover the band. In all respects, bandwidth cost money. This is also easy to see with analog frequency-division multiplexing.

The actual guaranteed analog high frequency was 3600 Hz, if I remember correctly, but actual response was usually better starting around 1950. The 8 kHz sample rate was adequate to preserve the quality of the analog service.

Jerry
"Jerry Avins" <jya@ieee.org> wrote in message
news:cjuqqq$jkq$1@bob.news.rcn.net...
> Jon Harris wrote:
>
> > I don't think the original researchers were so "dumb" as to not realize that
> > speech had higher frequency components. Given the technology limits of the
> > time, they chose the sample rate that allowed for "intelligible" speech (not
> > perfect speech) at a reasonable cost. In other words, the criterion for
> > choosing the frequency response was "what is the minimum frequency response
> > that is still intelligible in normal speech" vs. "what is the minimum
> > frequency response for full fidelity speech". That's what engineering is all
> > about--trade-offs!
>
> Originally, there was no sample rate involved. Hybrids had to be
> terminated with dummy lines that closely matched the real line impedance
> over the bandwidth of intended use. Earpieces and carbon microphones
> had to cover the band. In all respects, bandwidth cost money. This is
> also easy to see with analog frequency-division multiplexing. The actual
> guaranteed analog high frequency was 3600 Hz, if I remember correctly,
> but actual response was usually better starting around 1950. The 8 kHz
> sample rate was adequate to preserve the quality of the analog service.

Thanks for the historical clarifications, Jerry.