I can't think of a reason why sampling at a different rate wouldn't work with
the algorithm. The only thing I can think of that might be adversely
affected is the VQ part. The codebook is trained on speech at 10 ms
per frame (at 8 kHz), so feeding it a 5 ms frame (sampled at 16 kHz)
might make a mess of the codebook.
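As a rough sanity check, assuming G.729's usual 80-sample frame (10 ms at
the nominal 8 kHz), the frame-duration mismatch is easy to see:

# Frame duration for a fixed frame length in samples, at various rates.
FRAME_SAMPLES = 80  # one G.729 frame: 80 samples = 10 ms at 8 kHz

for rate_hz in (8000, 8200, 16000):
    frame_ms = 1000.0 * FRAME_SAMPLES / rate_hz
    print(f"{rate_hz} Hz -> {frame_ms:.2f} ms per frame")

# 8000 Hz -> 10.00 ms, 16000 Hz -> 5.00 ms: those 5 ms frames are the
# mismatch that could throw off a codebook trained on 10 ms frames.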
All this is just a guess; I've never really tried anything like this.
Regards
Piyush
Jack <jack8051@lightspawn.removethisbit.org> wrote in message news:<ahj5m0hf7b6at2vh2apskntn3bg0so1cni@4ax.com>...
> >It's what's used in digital telephony. That's what the standard covers.
> >
> >Jerry
>
> I realize that if I change the sampling rate I won't strictly be
> adhering to the standard any more. But when I write both the encoder
> and the decoder, it doesn't seem like such an issue (at least in my
> case). If I increase the rate from 8 kHz (to, say, 8.1 or 8.2) will it
> sound at least as good as the standard? Or is the algorithm somehow
> "optimized" for that sampling rate so that it actually sounds worse at
> a slightly higher rate?
Reply by Steve Underwood●October 5, 2004
Jerry Avins wrote:
> Jon Harris wrote:
>
> ...
>
>> I don't think the original researchers were so "dumb" as to not
>> realize that
>> speech had higher frequency components. Given the technology limits
>> of the
>> time, they chose the sample rate that allowed for "intelligible"
>> speech (not
>> perfect speech) at a reasonable cost. In other words, the criteria
>> for choosing
>> the frequency response was "what is the minimum frequency response
>> that is still
>> intelligible in normal speech" vs. "what is the minimum frequency
>> response for
>> full fidelity speech". That's what engineering is all
>> about--trade-offs!
>
>
> Originally, there was no sample rate involved. Hybrids had to be
> terminated with dummy lines that closely matched the real line impedance
> over the bandwidth of intended use. Ear pieces and carbon microphones
> had to cover the band. In all respects, bandwidth cost money. This is
> also easy to see with analog frequency-division multiplexing. The actual
> guaranteed analog high frequency was 3600 Hz, if I remember correctly,
> but actual response was usually better starting around 1950. The 8 kHz
> sample rate was adequate to preserve the quality of the analog service.
>
> Jerry
There was no sample rate, but early on there were FDM stacks. That
demanded the same choices about permitted bandwidth, and that is where
the choices we live with today were set in (somewhat flaky) concrete. On
simple local loop analogue lines, saying the bandwidth is 3600 Hz is more
a quality-of-service issue than a hard engineering one. In 99% of
cases the bandwidth there is pretty much arbitrary.
Regards,
Steve
Reply by Phil Frisbie, Jr.●October 5, 2004
Jack wrote:
>>It's what's used in digital telephony. That's what the standard covers.
>>
>>Jerry
>
>
> I realize that if I change the sampling rate I won't strictly be
> adhering to the standard any more. But when I write both the encoder
> and the decoder, it doesn't seem like such an issue (at least in my
> case). If I increase the rate from 8 kHz (to, say, 8.1 or 8.2) will it
> sound at least as good as the standard? Or is the algorithm somehow
> "optimized" for that sampling rate so that it actually sounds worse at
> a slightly higher rate?
If you vary the sampling rate slightly, everything should be fine, but what sound
card samples at 8200 Hz? Perhaps if you explain WHY you want to change the
sample rate, there is another way to do it.
However, if you doubled the sampling rate, the tone would change greatly for some
people due to the filters in the encoder.
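A sketch of why those fixed filters matter: a digital filter's cutoff is
set as a fraction of the sample rate, so feeding it a different rate moves
the effective analog cutoff. The 140 Hz figure below is the input
high-pass given in the G.729 spec; the rest is illustrative:

# A digital filter's cutoff is fixed in normalized frequency
# (cycles per sample), so the analog cutoff it realizes scales
# with whatever rate you actually feed it.
DESIGN_RATE_HZ = 8000.0
HPF_CUTOFF_HZ = 140.0   # G.729's preprocessing high-pass (per the spec)
normalized = HPF_CUTOFF_HZ / DESIGN_RATE_HZ

for actual_rate_hz in (8000.0, 8200.0, 16000.0):
    effective_hz = normalized * actual_rate_hz
    print(f"fed {actual_rate_hz:.0f} Hz audio, the filter cuts at {effective_hz:.1f} Hz")

# At 8200 Hz the shift is negligible (~143.5 Hz); at 16 kHz every
# fixed filter lands an octave too high, hence the change in tone.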
--
Phil Frisbie, Jr.
Hawk Software
http://www.hawksoft.com
Reply by Jon Harris●October 5, 2004
"Jerry Avins" <jya@ieee.org> wrote in message
news:cjuqqq$jkq$1@bob.news.rcn.net...
> Jon Harris wrote:
>
> > I don't think the original researchers were so "dumb" as to not realize that
> > speech had higher frequency components. Given the technology limits of the
> > time, they chose the sample rate that allowed for "intelligible" speech (not
> > perfect speech) at a reasonable cost. In other words, the criteria for choosing
> > the frequency response was "what is the minimum frequency response that is still
> > intelligible in normal speech" vs. "what is the minimum frequency response for
> > full fidelity speech". That's what engineering is all about--trade-offs!
>
> Originally, there was no sample rate involved. Hybrids had to be
> terminated with dummy lines that closely matched the real line impedance
> over the bandwidth of intended use. Ear pieces and carbon microphones
> had to cover the band. In all respects, bandwidth cost money. This is
> also easy to see with analog frequency-division multiplexing. The actual
> guaranteed analog high frequency was 3600 Hz, if I remember correctly,
> but actual response was usually better starting around 1950. The 8 kHz
> sample rate was adequate to preserve the quality of the analog service.
Thanks for the historical clarifications, Jerry.
Reply by Jerry Avins●October 5, 2004
Jon Harris wrote:
...
> I don't think the original researchers were so "dumb" as to not realize that
> speech had higher frequency components. Given the technology limits of the
> time, they chose the sample rate that allowed for "intelligible" speech (not
> perfect speech) at a reasonable cost. In other words, the criteria for choosing
> the frequency response was "what is the minimum frequency response that is still
> intelligible in normal speech" vs. "what is the minimum frequency response for
> full fidelity speech". That's what engineering is all about--trade-offs!
Originally, there was no sample rate involved. Hybrids had to be
terminated with dummy lines that closely matched the real line impedance
over the bandwidth of intended use. Ear pieces and carbon microphones
had to cover the band. In all respects, bandwidth cost money. This is
also easy to see with analog frequency-division multiplexing. The actual
guaranteed analog high frequency was 3600 Hz, if I remember correctly,
but actual response was usually better starting around 1950. The 8 kHz
sample rate was adequate to preserve the quality of the analog service.
Jerry
--
... they proceeded on the sound principle that the magnitude of a lie
always contains a certain factor of credibility, ... and that therefore
... they more easily fall victim to a big lie than to a little one ...
A. H.
Reply by Jon Harris●October 5, 2004
"Steve Underwood" <steveu@dis.org> wrote in message
news:cjtmsk$cp3$1@home.itg.ti.com...
> James Salsman wrote:
>
> >> Is there really something special about 8000 samples / sec?
> >
> >
> > Some dolt in Bell Labs during the 1920s decreed that voice transmission
> > requires a frequency response from 250 Hz to only 3000 Hz. Even though
> > Harry Nyquist rounded it up to 4000 Hz to be on the safe side (and
> > because we all like round numbers) around 1938, we're all still stuck
> > saying things like "S as in Sam" and "F as in Frank" over modern
> > telephones because apparently nobody actually bothered to check the
> > frequency spectrum of actual speech.
>
> I don't think that is entirely fair. For most of the life of the
> telephone network, using twice the bandwidth would have incurred
> significant additional cost. Considering how few words in a typical
> conversation cause the problem you describe, I think the compromise they
> chose was none too bad.
From a recent thread:
On the phone, it is generally quite easy to understand normal
conversational speech even with the limited frequency response. However, if
someone tries to read a string of random letters, it is quite a bit more
difficult to understand them on the other end. Losing those high frequencies
makes consonants difficult to differentiate. The brain normally does a good job
of compensating for the loss of high frequencies by using context clues. But
since very few context clues exist with a string of random letters, it becomes
difficult to understand.
So the phone is generally quite adequate for its primary intended
application--communicating normal conversational speech. However, it is
certainly not a perfect medium and doesn't do as well in other applications.
I don't think the original researchers were so "dumb" as to not realize that
speech had higher frequency components. Given the technology limits of the
time, they chose the sample rate that allowed for "intelligible" speech (not
perfect speech) at a reasonable cost. In other words, the criteria for choosing
the frequency response was "what is the minimum frequency response that is still
intelligible in normal speech" vs. "what is the minimum frequency response for
full fidelity speech". That's what engineering is all about--trade-offs!
Reply by Jerry Avins●October 5, 2004
Jack wrote:
>>It's what's used in digital telephony. That's what the standard covers.
>>
>>Jerry
>
>
> I realize that if I change the sampling rate I won't strictly be
> adhering to the standard any more. But when I write both the encoder
> and the decoder, it doesn't seem like such an issue (at least in my
> case). If I increase the rate from 8 kHz (to, say, 8.1 or 8.2) will it
> sound at least as good as the standard? Or is the algorithm somehow
> "optimized" for that sampling rate so that it actually sounds worse at
> a slightly higher rate?
I don't really know, but it doesn't seem likely.
Jerry
--
... they proceeded on the sound principle that the magnitude of a lie
always contains a certain factor of credibility, ... and that therefore
... they more easily fall victim to a big lie than to a little one ...
A. H.
Reply by Jack●October 5, 2004
>It's what's used in digital telephony. That's what the standard covers.
>
>Jerry
I realize that if I change the sampling rate I won't strictly be
adhering to the standard any more. But when I write both the encoder
and the decoder, it doesn't seem like such an issue (at least in my
case). If I increase the rate from 8 kHz (to, say, 8.1 or 8.2) will it
sound at least as good as the standard? Or is the algorithm somehow
"optimized" for that sampling rate so that it actually sounds worse at
a slightly higher rate?
Reply by Raymond Toy●October 5, 2004
>>>>> "Jack" == Jack <jack8051@lightspawn.removethisbit.org> writes:
Jack> I'm trying to understand G.729. The only compression algorithm I've
Jack> coded before is ADPCM, and it wasn't keyed to a certain sampling rate
Jack> - that is, it would be just as happy with 8100 samples per second or
Jack> 7900 samples per second as it would have been with 8000, just the
Jack> quality would be slightly higher or lower.
Jack> With G.729, all the documentation refers to a sampling rate of 8 KHz.
Jack> Is it really the only rate that makes sense? That is, if my real
Jack> sampling rate is slightly higher or lower (but constant, and same for
Jack> the encoder and the decoder) can't I just feed the samples to the
Jack> algorithm a little faster or a little slower? Is there really
Jack> something special about 8000 samples / sec?
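A side note on the quoted ADPCM comparison: ADPCM runs sample by sample,
with no constants tied to absolute time, which is why a small rate change
only nudges its quality. A toy differential coder in that spirit (not any
particular ADPCM standard):

def toy_adpcm_encode(samples):
    """Toy differential coder: predict, quantize the error, adapt the step.
    Note that nothing here refers to absolute time or a sample rate."""
    pred, step, codes = 0, 16, []
    for x in samples:
        code = max(-8, min(7, (x - pred) // step))  # 4-bit quantized error
        codes.append(code)
        pred += code * step            # the decoder applies the same update
        step = max(1, step * 2 if abs(code) >= 6 else step * 9 // 10)
    return codes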
You could probably vary the sample rate some and it would sound OK.
But G.729 tries to model the speech signal, so if things happen faster
or slower than expected, the model may no longer be as accurate. For
example, if you sampled at 16000 samples per second, the pitch period
would now be twice the number of samples. This might confuse G.729.
I don't know the fine details of G.729, so I might be wrong.
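To put rough numbers on the pitch example (the 20-143 sample lag window
below is the range usually cited for G.729's pitch search; treat it as an
assumption here):

# Pitch period in samples = sample_rate / fundamental_frequency.
# G.729's pitch search covers lags of roughly 20..143 samples,
# i.e. fundamentals of about 56..400 Hz at the nominal 8 kHz.
def pitch_lag_samples(sample_rate_hz, f0_hz):
    return sample_rate_hz / f0_hz

for rate_hz in (8000, 16000):
    lag = pitch_lag_samples(rate_hz, 100.0)  # a typical low male voice
    print(f"{rate_hz} Hz: 100 Hz pitch -> {lag:.0f}-sample lag, "
          f"searchable: {20 <= lag <= 143}")

# At 16 kHz the 160-sample lag falls outside the search window, so the
# long-term predictor would lock onto the wrong period.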
Ray
Reply by Steve Underwood●October 5, 2004
James Salsman wrote:
>> Is there really something special about 8000 samples / sec?
>
>
> Some dolt in Bell Labs during the 1920s decreed that voice transmission
> requires a frequency response from 250 Hz to only 3000 Hz. Even though
> Harry Nyquist rounded it up to 4000 Hz to be on the safe side (and
> because we all like round numbers) around 1938, we're all still stuck
> saying things like "S as in Sam" and "F as in Frank" over modern
> telephones because apparently nobody actually bothered to check the
> frequency spectrum of actual speech.
>
> So, sure, "special," as in, "special education."
I don't think that is entirely fair. For most of the life of the
telephone network, using twice the bandwidth would have incurred
significant additional cost. Considering how few words in a typical
conversation cause the problem you describe, I think the compromise they
chose was none too bad.
What was dumber was the half-hearted effort to improve things in the
early days of ISDN. The addition of a 7.1 kHz bandwidth audio mode was
handled so poorly it never caught on at all.
With modern speech compression, wider bandwidth need have little impact
on the bit rate. However, most of the newer codecs, like G.729, still
only provide for an 8 kHz sampled audio world. The latest 3GPP codecs do,
however, provide wideband modes, so maybe phone speech clarity will
improve in the next few years.
Regards,
Steve