Technical discussions related to Speech Coding (all itu and other vocoders, ACELP, CELP, AMR, etc)

  


How to adjust timing of speech frame - dbga...@gmail.com - Jul 17 12:03:07 2009

Hi,

Anyone here know a method to adjust timing of 22.5ms speech frames to 20ms.
I am working with different vocoders including AMBE (Advanced Multiband
Excitation) versus CELP/VSELP. The generated vocoder frames are originally
22.5ms, but the vocoder DSP chip takes in 20ms.

Basically, are there techniques that exist such as time compression,
post-filtering/processing, duplicating speech frames, silence frames, etc. ?

Any help, thoughts, opinions, intuition will be GREATLY appreciated!

Thanks,

DBG




(You need to be a member of speechcoding -- send a blank email to speechcoding-subscribe@yahoogroups.com )

Re: How to adjust timing of speech frame - Jeff Brower - Jul 17 13:16:32 2009

DB-

> Anyone here know a method to adjust timing of 22.5ms speech frames
> to 20ms.  I am working with different vocoders including AMBE
> (Advanced Multiband Excitation) versus CELP/VSELP. The generated
> vocoder frames are originally 22.5ms, but the vocoder DSP chip
> takes in 20ms.

I can't figure out your data flow from this explanation, and some of your
details don't seem to make sense, such as CELP frame size (which is 30 msec,
not 22.5), so I can only guess at what you're doing.  But in a general case,
you can build and maintain different buffer sizes in parallel; it's just a
matter of keeping track of pointers and sample counters.  Something like this:

               Collect samples,
          +--> build 20 msec   ----> AMBE
          |    frames
  Input --+
          |    Collect samples,
          +--> build 30 msec   ----> CELP
          |    frames
          |
          |    Collect samples,
          '--> build 22.5 msec ----> MELPe
               frames

In the general case, there's nothing that prevents you from running different
vocoders in parallel... so with that information, you should be able to figure
out your specific situation.
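
The parallel-buffer idea in the diagram can be sketched roughly as follows. This is only an illustration, not code from the thread: the 8 kHz sample rate, the `Framer` class, the `fan_out` helper, and the 10 msec input chunk size are all assumptions, and the vocoder calls themselves are left out (the sketch only counts the frames each branch would hand to its encoder).

```python
SAMPLE_RATE_HZ = 8000  # assumed narrowband telephony rate


class Framer:
    """Collects samples and emits fixed-duration frames (hypothetical helper)."""

    def __init__(self, frame_ms):
        # Samples per frame, e.g. 22.5 ms * 8 samples/ms = 180
        self.frame_len = int(round(frame_ms * SAMPLE_RATE_HZ / 1000))
        self.pending = []

    def push(self, samples):
        """Append samples and return a list of any frames now complete."""
        self.pending.extend(samples)
        frames = []
        while len(self.pending) >= self.frame_len:
            frames.append(self.pending[:self.frame_len])
            del self.pending[:self.frame_len]
        return frames


def fan_out(chunks, frame_sizes_ms):
    """Feed the same input to one Framer per frame size; count frames built."""
    framers = {ms: Framer(ms) for ms in frame_sizes_ms}
    counts = {ms: 0 for ms in frame_sizes_ms}
    for chunk in chunks:
        for ms, framer in framers.items():
            # A real system would pass each completed frame to its encoder here.
            counts[ms] += len(framer.push(chunk))
    return counts


# One second of (silent) input delivered in 80-sample, 10 msec chunks
chunks = [[0] * 80 for _ in range(100)]
counts = fan_out(chunks, [20, 22.5, 30])
print(counts)
```

Each branch keeps its own pending buffer, so the three frame sizes never have to line up with each other or with the size of the incoming chunks.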

> Basically, are there techniques that exist such as time
> compression, post-filtering/processing, duplicating speech frames,
> silence frames, etc. ?

These algorithms have nothing to do with buffer size variations.

-Jeff


Re: How to adjust timing of speech frame - Jeff Brower - Jul 17 14:18:54 2009

Don-
> Thank you for the response.  I apologize for not being clear.
>
> Here's the scenario (primary objective):
>
> 1. I have captured AMBE+2 4 x speech frames each containing 99-bits per
> frame at 22.5ms, 4400bps.

First, AMBE+2 documented frame size is 20 msec, with the encoder producing
88-bit packets... why are you getting 99 bits?  I can find many AMBE+2
references to "88 bits", including DVSI's website, but not to 99 bits.

Second, when you say "speech frames containing 99 bits" you are mixing
terminology and creating a confusing problem description.  A speech frame
contains speech samples, for example a 20 msec frame of speech sampled at 8 kHz
would contain 160 samples.  A compressed voice packet contains bits, produced by
the encoder half of the vocoder.
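
The arithmetic behind these figures can be checked in a few lines. This is just a worked example under the 8 kHz sampling assumption stated above; the function names are illustrative. It also shows that 88 bits per 20 msec and 99 bits per 22.5 msec both work out to the same 4400 bps rate mentioned in the quoted message.

```python
SAMPLE_RATE_HZ = 8000  # sampling rate assumed in the discussion


def samples_per_frame(frame_ms):
    """Number of speech samples in one frame of the given duration."""
    return frame_ms * SAMPLE_RATE_HZ / 1000


def bit_rate_bps(bits_per_packet, frame_ms):
    """Bit rate when one packet of the given size is produced per frame."""
    return bits_per_packet * 1000 / frame_ms


print(samples_per_frame(20))    # 160.0
print(samples_per_frame(22.5))  # 180.0
print(bit_rate_bps(88, 20))     # 4400.0
print(bit_rate_bps(99, 22.5))   # 4400.0
```
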
> 2. I  am using AMBE+2 vocoder chip that takes the data structure that looks
> exactly the same.
> However, the real difference is that it was designed to take samples in
> 20ms speech frames.
>
> Is there a way to compensate for this?  Software or Hardware solution?

When you say "data structure", are you saying you are trying to take the
compressed voice packet output from vocoder A's encoder and feed that to
vocoder B's decoder?  If that's the case, and A and B are the same, there should
be no issue.  If they are actually different, then you are "transcoding", which
is a different story.

-Jeff


Re: How to adjust timing of speech frame - Don Gabriel - Jul 17 15:54:36 2009

Hi Jeff,

Thank you for the response.  I apologize for not being clear.

Here's the scenario (primary objective):

1. I have captured AMBE+2 4 x speech frames each containing 99-bits per
frame at 22.5ms, 4400bps.

2. I  am using AMBE+2 vocoder chip that takes the data structure that looks
exactly the same.
However, the real difference is that it was designed to take samples in 20ms
speech frames.

Is there a way to compensate for this?  Software or Hardware solution?

Thanks in advance! :)

Sincerely,

DBG

Re: How to adjust timing of speech frame - Don Gabriel - Jul 17 15:55:07 2009

Hi Jeff,

Thank you for explaining.  The AMBE+2 data I was examining is really unusual
-- never seen it before like this.  It's from an iDEN network. It contained
180 samples (8KHz, 22.5ms)  with 4 subframes of 99-bits each.

I am trying to see if I can transcode it somehow to PCM.

Just curious if it's possible.

Sincerely,

DBG


Re: How to adjust timing of speech frame - Jeff Brower - Jul 17 16:51:19 2009

Don-

> Thank you for explaining.  The AMBE+2 data I was examining is really unusual
> -- never seen it before like this.  It's from an iDEN network. It contained
> 180 samples (8KHz, 22.5ms)  with 4 subframes of 99-bits each.

Ok... in that case I might guess that you're looking at something proprietary
that DVSI did for Motorola several years ago, when iDEN + Nextel were hot and
PTT involved proprietary methods and equipment.  Today, PoC implementations are
typically IMS compliant (some even clientless) and standard cellular networks
and equipment are used (partly explaining why Nextel is where they're at now,
but that's another discussion).  Standard GSM and CDMA networks mean standard
cell codecs (GSM-AMR, EVRC, etc.), not AMBE.

> I am trying to see if I can transcode it somehow to PCM.
> 
> Just curious if it's possible.

Well, those extra 11 bits are a big deal.  At that point you're in the "packet
domain" so unless you know the exact meaning and interpretation of each and
every bit, then you don't have a way to interpolate, discard, transpose, etc.
those bits into the 88-bit format needed by your AMBE+2 chip.  Somehow you have
to get your hands on something that can decode the 22.5 msec version of AMBE+2
-- but my guess is that's not easy for many reasons, including terms of Mot-DVSI
contracts.  I did see some references to 22.5 msec AMBE+2, mentioned in patent
applications.  You can Google it and hopefully find out more.
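
To illustrate the distinction: once speech has been decoded to PCM, changing the frame length is only a re-buffering exercise, whereas nothing comparable can be done to the packet bits without the format specification. A minimal sketch, assuming some decoder for the 22.5 msec format (not available here) has already produced 180-sample PCM frames; `reframe` and the test data are illustrative names, not part of any vocoder API.

```python
def reframe(frames_in, out_len=160):
    """Re-slice a sequence of decoded PCM frames into frames of out_len samples."""
    pending = []
    for frame in frames_in:
        pending.extend(frame)
        # Emit as many complete output frames as the buffer currently holds
        while len(pending) >= out_len:
            yield pending[:out_len]
            del pending[:out_len]


# Eight 180-sample frames (8 x 22.5 msec = 180 msec of audio), filled with
# distinct values so the sample order can be checked after re-framing.
decoded = [list(range(i * 180, (i + 1) * 180)) for i in range(8)]
out = list(reframe(decoded))

print(len(out), "frames of", len(out[0]), "samples")
```

Eight 22.5 msec frames hold 1440 samples, which is exactly nine 20 msec frames, so the two frame boundaries realign every 180 msec and no samples are dropped or invented.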

If you want to talk more about this offline, feel free to give me a call.
Mention "AMBE" or "iDEN" and they'll put you through.

-Jeff

Re: How to adjust timing of speech frame - Don Gabriel - Jul 20 9:58:01 2009

Hi Jeff,

Thank you very much for your input!
Looks like I still need a lot of time to see into the "packet domain" before
I could really make it work.

By the way, I checked out your website and you have AWESOME DSP tools,
especially for vocoders!   I am more of a hardware person, but your products
are definitely interesting and very useful for me in the near future.

Sincerely,

DBG
