Reply by Piyush Kaul October 7, 2004
Can't think of a reason why sampling at a different rate won't work with
the algorithm. The only thing I can think of that might be adversely
affected is the VQ part. The codebook is trained with speech at 10 ms
per frame (at 8 kHz), so putting in a 5 ms frame (sampled at 16 kHz)
might make a mess of the codebook.
All this is just a guess; I've never really tried anything like this.
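The frame-size mismatch is easy to see with a little arithmetic. The sketch below is purely illustrative (the 80-sample, 10 ms frame length at 8 kHz is from the G.729 spec; the helper function is not reference code):

```python
# G.729 consumes fixed 80-sample frames, which equal 10 ms of
# speech only when the input is sampled at 8000 Hz.

FRAME_SAMPLES = 80  # G.729 frame length, in samples

def frame_duration_ms(sample_rate_hz, frame_samples=FRAME_SAMPLES):
    """Duration in milliseconds of one codec frame at a given rate."""
    return 1000.0 * frame_samples / sample_rate_hz

print(frame_duration_ms(8000))   # 10.0 -- what the VQ codebook was trained on
print(frame_duration_ms(16000))  # 5.0  -- same 80 samples, half the speech
```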


Regards
Piyush

Jack <jack8051@lightspawn.removethisbit.org> wrote in message news:<ahj5m0hf7b6at2vh2apskntn3bg0so1cni@4ax.com>...
>> It's what's used in digital telephony. That's what the standard covers.
>>
>> Jerry
>
> I realize that if I change the sampling rate I won't strictly be
> adhering to the standard any more. But when I write both the encoder
> and the decoder, it doesn't seem like such an issue (at least in my
> case). If I increase the rate from 8 kHz (to, say, 8.1 or 8.2) will it
> sound at least as good as the standard? Or is the algorithm somehow
> "optimized" for that sampling rate so that it actually sounds worse at
> a slightly higher rate?
Reply by Steve Underwood October 5, 2004
Jerry Avins wrote:

> Jon Harris wrote:
>
>    ...
>
>> I don't think the original researchers were so "dumb" as to not realize
>> that speech had higher frequency components. Given the technology limits
>> of the time, they chose the sample rate that allowed for "intelligible"
>> speech (not perfect speech) at a reasonable cost. In other words, the
>> criteria for choosing the frequency response was "what is the minimum
>> frequency response that is still intelligible in normal speech" vs.
>> "what is the minimum frequency response for full fidelity speech".
>> That's what engineering is all about--trade-offs!
>
> Originally, there was no sample rate involved. Hybrids had to be
> terminated with dummy lines that closely matched the real line impedance
> over the bandwidth of intended use. Ear pieces and carbon microphones
> had to cover the band. In all respects, bandwidth cost money. This is
> also easy to see with analog frequency-division multiplexing. The actual
> guaranteed analog high frequency was 3600 Hz, if I remember correctly,
> but actual response was usually better, starting around 1950. The 8 kHz
> sample rate was adequate to preserve the quality of the analog service.
>
> Jerry
There was no sample rate, but early on there were FDM stacks. That demanded the same choices about permitted bandwidth, and that is where the choices we live with today were set in (somewhat flaky) concrete. On simple local loop analogue lines, saying the bandwidth is 3600 Hz is more a quality-of-service issue than a hard engineering one. In 99% of cases the bandwidth there is pretty much arbitrary.

Regards,
Steve
Reply by Phil Frisbie, Jr. October 5, 2004
Jack wrote:

>> It's what's used in digital telephony. That's what the standard covers.
>>
>> Jerry
>
> I realize that if I change the sampling rate I won't strictly be
> adhering to the standard any more. But when I write both the encoder
> and the decoder, it doesn't seem like such an issue (at least in my
> case). If I increase the rate from 8 kHz (to, say, 8.1 or 8.2) will it
> sound at least as good as the standard? Or is the algorithm somehow
> "optimized" for that sampling rate so that it actually sounds worse at
> a slightly higher rate?
If you vary the sampling rate slightly, everything should be fine, but what sound card samples at 8200 Hz? Perhaps if you explain WHY you want to change the sample rate there is another way to do it. However, if you doubled the sampling rate, the tone would change greatly for some people, due to the filters in the encoder.

--
Phil Frisbie, Jr.
Hawk Software
http://www.hawksoft.com
Reply by Jon Harris October 5, 2004
"Jerry Avins" <jya@ieee.org> wrote in message
news:cjuqqq$jkq$1@bob.news.rcn.net...
> Jon Harris wrote:
>
>> I don't think the original researchers were so "dumb" as to not realize
>> that speech had higher frequency components. Given the technology limits
>> of the time, they chose the sample rate that allowed for "intelligible"
>> speech (not perfect speech) at a reasonable cost. In other words, the
>> criteria for choosing the frequency response was "what is the minimum
>> frequency response that is still intelligible in normal speech" vs.
>> "what is the minimum frequency response for full fidelity speech".
>> That's what engineering is all about--trade-offs!
>
> Originally, there was no sample rate involved. Hybrids had to be
> terminated with dummy lines that closely matched the real line impedance
> over the bandwidth of intended use. Ear pieces and carbon microphones
> had to cover the band. In all respects, bandwidth cost money. This is
> also easy to see with analog frequency-division multiplexing. The actual
> guaranteed analog high frequency was 3600 Hz, if I remember correctly,
> but actual response was usually better, starting around 1950. The 8 kHz
> sample rate was adequate to preserve the quality of the analog service.
Thanks for the historical clarifications, Jerry.
Reply by Jerry Avins October 5, 2004
Jon Harris wrote:

   ...

> I don't think the original researchers were so "dumb" as to not realize
> that speech had higher frequency components. Given the technology limits
> of the time, they chose the sample rate that allowed for "intelligible"
> speech (not perfect speech) at a reasonable cost. In other words, the
> criteria for choosing the frequency response was "what is the minimum
> frequency response that is still intelligible in normal speech" vs.
> "what is the minimum frequency response for full fidelity speech".
> That's what engineering is all about--trade-offs!
Originally, there was no sample rate involved. Hybrids had to be terminated with dummy lines that closely matched the real line impedance over the bandwidth of intended use. Ear pieces and carbon microphones had to cover the band. In all respects, bandwidth cost money. This is also easy to see with analog frequency-division multiplexing. The actual guaranteed analog high frequency was 3600 Hz, if I remember correctly, but actual response was usually better, starting around 1950. The 8 kHz sample rate was adequate to preserve the quality of the analog service.

Jerry

--
... they proceeded on the sound principle that the magnitude of a lie
always contains a certain factor of credibility, ... and that therefore
... they more easily fall victim to a big lie than to a little one ...
A. H.
Reply by Jon Harris October 5, 2004
"Steve Underwood" <steveu@dis.org> wrote in message
news:cjtmsk$cp3$1@home.itg.ti.com...
> James Salsman wrote:
>
>>> Is there really something special about 8000 samples / sec?
>>
>> Some dolt in Bell Labs during the 1920s decreed that voice transmission
>> requires a frequency response from 250 Hz to only 3000 Hz. Even though
>> Harry Nyquist rounded it up to 4000 Hz to be on the safe side (and
>> because we all like round numbers) around 1938, we're all still stuck
>> saying things like "S as in Sam" and "F as in Frank" over modern
>> telephones because apparently nobody actually bothered to check the
>> frequency spectrum of actual speech.
>
> I don't think that is entirely fair. For most of the life of the
> telephone network, using twice the bandwidth would have incurred
> significant additional cost. Considering how few words in a typical
> conversation cause the problem you describe, I think the compromise they
> chose was none too bad.
From a recent thread:

On the phone, it is generally quite easy to understand normal conversational speech even with the limited frequency response. However, if someone tries to read a string of random letters, it is quite a bit more difficult to understand them on the other end. Losing those high frequencies makes consonants difficult to differentiate. The brain normally does a good job of compensating for the loss of high frequencies by using context clues, but since very few context clues exist in a string of random letters, it becomes difficult to understand.

So the phone is generally quite adequate for its primary intended application--communicating normal conversational speech. However, it is certainly not a perfect medium, and it doesn't do as well in other applications.

I don't think the original researchers were so "dumb" as to not realize that speech had higher frequency components. Given the technology limits of the time, they chose the sample rate that allowed for "intelligible" speech (not perfect speech) at a reasonable cost. In other words, the criteria for choosing the frequency response was "what is the minimum frequency response that is still intelligible in normal speech" vs. "what is the minimum frequency response for full fidelity speech". That's what engineering is all about--trade-offs!
Reply by Jerry Avins October 5, 2004
Jack wrote:
>> It's what's used in digital telephony. That's what the standard covers.
>>
>> Jerry
>
> I realize that if I change the sampling rate I won't strictly be
> adhering to the standard any more. But when I write both the encoder
> and the decoder, it doesn't seem like such an issue (at least in my
> case). If I increase the rate from 8 kHz (to, say, 8.1 or 8.2) will it
> sound at least as good as the standard? Or is the algorithm somehow
> "optimized" for that sampling rate so that it actually sounds worse at
> a slightly higher rate?
I don't really know, but it doesn't seem likely.

Jerry
Reply by Jack October 5, 2004
> It's what's used in digital telephony. That's what the standard covers.
>
> Jerry
I realize that if I change the sampling rate I won't strictly be adhering to the standard any more. But when I write both the encoder and the decoder, it doesn't seem like such an issue (at least in my case). If I increase the rate from 8 kHz (to, say, 8.1 or 8.2) will it sound at least as good as the standard? Or is the algorithm somehow "optimized" for that sampling rate, so that it actually sounds worse at a slightly higher rate?
Reply by Raymond Toy October 5, 2004
>>>>> "Jack" == Jack <jack8051@lightspawn.removethisbit.org> writes:
Jack> I'm trying to understand G.729. The only compression algorithm I've
Jack> coded before is ADPCM, and it wasn't keyed to a certain sampling rate
Jack> - that is, it would be just as happy with 8100 samples per second or
Jack> 7900 samples per second as it would have been with 8000, just the
Jack> quality would be slightly higher or lower.

Jack> With G.729, all the documentation refers to a sampling rate of 8 kHz.
Jack> Is it really the only rate that makes sense? That is, if my real
Jack> sampling rate is slightly higher or lower (but constant, and same for
Jack> the encoder and the decoder) can't I just feed the samples to the
Jack> algorithm a little faster or a little slower? Is there really
Jack> something special about 8000 samples / sec?

You could probably vary the sample rate some and it would sound OK. But G.729 tries to model the speech signal, so if things happen faster or slower than expected, the model may no longer be as accurate. For example, if you sampled at 16000 samples per second, the pitch period would now be twice the number of samples. This might confuse G.729. I don't know the fine details of G.729, so I might be wrong.

Ray
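Ray's pitch-period point can be illustrated numerically. This is a hedged sketch (the roughly 20-143 sample pitch-lag search range is my reading of the G.729 adaptive-codebook search; the helper function is hypothetical, not codec code):

```python
# A talker's pitch period, measured in samples, scales with the
# sampling rate. A codec that searches for pitch lags assuming
# 8 kHz input will see out-of-range lags when fed 16 kHz audio.

def pitch_period_samples(pitch_hz, sample_rate_hz):
    """Pitch period of a voiced sound, expressed in samples."""
    return sample_rate_hz / pitch_hz

print(pitch_period_samples(100, 8000))   # 80.0  -- inside a ~20-143 lag range
print(pitch_period_samples(100, 16000))  # 160.0 -- outside it, so the pitch
                                         # predictor can no longer lock on
```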
Reply by Steve Underwood October 5, 2004
James Salsman wrote:

>> Is there really something special about 8000 samples / sec?
>
> Some dolt in Bell Labs during the 1920s decreed that voice transmission
> requires a frequency response from 250 Hz to only 3000 Hz. Even though
> Harry Nyquist rounded it up to 4000 Hz to be on the safe side (and
> because we all like round numbers) around 1938, we're all still stuck
> saying things like "S as in Sam" and "F as in Frank" over modern
> telephones because apparently nobody actually bothered to check the
> frequency spectrum of actual speech.
>
> So, sure, "special," as in, "special education."
I don't think that is entirely fair. For most of the life of the telephone network, using twice the bandwidth would have incurred significant additional cost. Considering how few words in a typical conversation cause the problem you describe, I think the compromise they chose was none too bad.

What was dumber was the half-hearted effort to improve things in the early days of ISDN. The addition of a 7.1 kHz bandwidth audio mode was handled so poorly it never caught on at all. With modern speech compression, wider bandwidth need have little impact on the bit rate. However, most of the newer codecs, like G.729, still only provide for an 8 kHz sampled audio world. The latest 3GPP codecs do, however, provide wideband modes, so maybe phone speech clarity will improve in the next few years.

Regards,
Steve