DSPRelated.com
Forums

How to adjust timing of speech frame

Started by dbga...@gmail.com July 17, 2009
Hi,

Does anyone here know a method to adjust the timing of 22.5 ms speech frames to 20 ms?
I am working with different vocoders, including AMBE (Advanced Multiband Excitation) as well as CELP/VSELP. The generated vocoder frames are originally 22.5 ms, but the vocoder DSP chip takes in 20 ms.

Basically, do techniques exist for this, such as time compression, post-filtering/processing, duplicating speech frames, inserting silence frames, etc.?

Any help, thoughts, opinions, intuition will be GREATLY appreciated!

Thanks,

DBG
DB-

> Does anyone here know a method to adjust the timing of 22.5 ms speech frames
> to 20 ms? I am working with different vocoders, including AMBE
> (Advanced Multiband Excitation) as well as CELP/VSELP. The generated
> vocoder frames are originally 22.5 ms, but the vocoder DSP chip
> takes in 20 ms.

I can't figure out your data flow from this explanation, and some of your details
don't seem to make sense, such as the CELP frame size (which is 30 msec, not 22.5), so I
can only guess at what you're doing. But in a general case, you can build and
maintain different buffer sizes in parallel; it's just a matter of keeping track of
pointers and sample counters. Something like this:
                 Collect samples,
            +--> build 20 msec   ----> AMBE
            |    frames
  Input ----+
            |    Collect samples,
            +--> build 30 msec   ----> CELP
            |    frames
            |
            |    Collect samples,
            '--> build 22.5 msec ----> MELPe
                 frames

In the general case, there's nothing that prevents you from running different
vocoders in parallel... so with that information, you should be able to figure out
your specific situation.
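In code, the bookkeeping Jeff describes -- one input stream feeding several frame sizes in parallel, with each branch tracking its own sample count -- might look like the following Python sketch (mine, not from the thread; the FrameBuffer name and callback scheme are made up for illustration, assuming 8 kHz input):

```python
SAMPLE_RATE = 8000  # Hz, typical narrowband speech

class FrameBuffer:
    """Accumulates input samples and emits fixed-size frames for one vocoder."""
    def __init__(self, frame_ms, on_frame):
        self.frame_len = int(SAMPLE_RATE * frame_ms / 1000)
        self.buf = []
        self.on_frame = on_frame  # callback, e.g. the vocoder's encode routine

    def push(self, samples):
        self.buf.extend(samples)
        # Emit as many complete frames as the buffer now holds.
        while len(self.buf) >= self.frame_len:
            frame = self.buf[:self.frame_len]
            self.buf = self.buf[self.frame_len:]
            self.on_frame(frame)

# Three buffer sizes running in parallel off the same input stream.
frames = {"AMBE": [], "CELP": [], "MELPe": []}
buffers = [
    FrameBuffer(20.0, frames["AMBE"].append),   # 160 samples/frame
    FrameBuffer(30.0, frames["CELP"].append),   # 240 samples/frame
    FrameBuffer(22.5, frames["MELPe"].append),  # 180 samples/frame
]

# Feed 720 ms of audio in arbitrary chunk sizes (e.g. 48-sample DMA blocks).
chunk = [0] * 48
for _ in range(120):          # 120 * 48 = 5760 samples = 720 ms
    for b in buffers:
        b.push(chunk)
```

Each buffer just tracks how many samples it has accumulated; a DSP implementation would do the same thing with circular buffers and sample counters instead of Python lists.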

> Basically, do techniques exist for this, such as time
> compression, post-filtering/processing, duplicating speech frames,
> inserting silence frames, etc.?

These algorithms have nothing to do with buffer size variations.

-Jeff
Don-
> Thank you for the response. I apologize for not being clear.
>
> Here's the scenario (primary objective):
>
> 1. I have captured 4 AMBE+2 speech frames, each containing 99 bits, at
> 22.5 ms per frame, 4400 bps.

First, AMBE+2's documented frame size is 20 msec, with the encoder producing 88-bit
packets... why are you getting 99 bits? I can find many AMBE+2 references to "88
bits", including DVSI's website, but not to 99 bits.

Second, when you say "speech frames containing 99 bits" you are mixing terminology
and creating a confusing problem description. A speech frame contains speech
samples, for example a 20 msec frame of speech sampled at 8 kHz would contain 160
samples. A compressed voice packet contains bits, produced by the encoder half of
the vocoder.
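The distinction can be put in numbers -- a quick sketch (my arithmetic, using the figures Jeff cites: 8 kHz sampling, 20 msec frames, 88-bit packets):

```python
# A speech frame holds raw samples going INTO the encoder.
sample_rate_hz = 8000
frame_ms = 20
samples_per_frame = sample_rate_hz * frame_ms // 1000  # 160 samples

# A compressed voice packet holds bits coming OUT of the encoder.
bits_per_packet = 88
bitrate_bps = bits_per_packet * 1000 // frame_ms       # 4400 bps

print(samples_per_frame, bitrate_bps)  # prints: 160 4400
```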
> 2. I am using an AMBE+2 vocoder chip that takes a data structure that looks
> exactly the same. However, the real difference is that it was designed to
> take samples in 20 ms speech frames.
>
> Is there a way to compensate for this? A software or hardware solution?

When you say "data structure", are you saying you are trying to take compressed
voice packet output from vocoder A's encoder, and feed that to vocoder B's decoder?
If that's the case, and A and B are the same, there should be no issue. If they are
actually different, then you are "transcoding", which is a different story.

-Jeff

Hi Jeff,

Thank you for the response. I apologize for not being clear.

Here's the scenario (primary objective):

1. I have captured 4 AMBE+2 speech frames, each containing 99 bits, at
22.5 ms per frame, 4400 bps.

2. I am using an AMBE+2 vocoder chip that takes a data structure that looks
exactly the same. However, the real difference is that it was designed to
take samples in 20 ms speech frames.

Is there a way to compensate for this? A software or hardware solution?

Thanks in advance! :)

Sincerely,

DBG
Hi Jeff,

Thank you for explaining. The AMBE+2 data I was examining is really unusual
-- I've never seen anything like it before. It's from an iDEN network. It
contained 180 samples (8 kHz, 22.5 ms) with 4 subframes of 99 bits each.

I am trying to see if I can transcode it somehow to PCM.

Just curious if it's possible.

Sincerely,

DBG

Don-

> Thank you for explaining. The AMBE+2 data I was examining is really unusual
> -- I've never seen anything like it before. It's from an iDEN network. It
> contained 180 samples (8 kHz, 22.5 ms) with 4 subframes of 99 bits each.

Ok... in that case I might guess that you're looking at something proprietary that
DVSI did for Motorola several years ago, when iDEN + Nextel were hot and PTT involved
proprietary methods and equipment. Today, PoC implementations are typically IMS
compliant (some even clientless) and standard cellular networks and equipment are
used (partly explaining why Nextel is where they're at now, but that's another
discussion). Standard GSM and CDMA networks mean standard cell codecs (GSM-AMR,
EVRC, etc., not AMBE).

> I am trying to see if I can transcode it somehow to PCM.
>
> Just curious if it's possible.

Well, those extra 11 bits are a big deal. At that point you're in the "packet
domain" so unless you know the exact meaning and interpretation of each and every
bit, then you don't have a way to interpolate, discard, transpose, etc. those bits
into the 88-bit format needed by your AMBE+2 chip. Somehow you have to get your
hands on something that can decode the 22.5 msec version of AMBE+2 -- but my guess is
that's not easy for many reasons, including terms of Mot-DVSI contracts. I did see
some references to 22.5 msec AMBE+2, mentioned in patent applications. You can
Google it and hopefully find out more.
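As a back-of-the-envelope check (my arithmetic, not from the thread), the two formats actually carry the same bit rate -- which is exactly why the difference must lie in how bits are allocated inside each packet, not in the rate itself:

```python
# Bit rates of the two AMBE+2 framings, in exact integer arithmetic.
iden_bps = 99 * 10000 // 225     # 99 bits / 22.5 ms -> 4400 bps
std_bps = 88 * 1000 // 20        # 88 bits / 20 ms   -> 4400 bps

# The 22.5 ms framing also matches the 180-sample capture at 8 kHz.
samples_per_frame = 8000 * 225 // 10000  # 180 samples

print(iden_bps, std_bps, samples_per_frame)  # prints: 4400 4400 180
```

Same 4400 bps either way, so nothing can be recovered by rate conversion alone; the 11 extra bits per frame change the packet layout itself.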

If you want to talk more about this offline, feel free to give me a call. Mention
"AMBE" or "iDEN" and they'll put you through.

-Jeff
Hi Jeff,

Thank you very much for your input!
Looks like I still need a lot of time to dig into the "packet domain" before
I can really make this work.

By the way, I checked out your website and you have AWESOME DSP tools,
especially for vocoders! I am more of a hardware person, but your products
are definitely interesting and very useful for me in the near future.

Sincerely,

DBG
