Determining sound intensity from G.729 parameters

Started by furiousgame November 24, 2002
Using only the G.729 parameters (codebook values, codebook indices,
pitch periods, and pitch and codebook gains) or the line spectral
pairs, how can the intensity of a sound be determined?




Phillip-

> Using only the G.729 parameters (codebook values, codebook indices,
> pitch periods, and pitch and codebook gains) or the line spectral
> pairs, how can the intensity of a sound be determined?

If you had an uncompressed waveform, how would you measure "intensity"?

Jeff Brower
DSP sw/hw engineer
Signalogic
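
For an uncompressed waveform, one common answer to the question above is the RMS level of each frame, expressed in dB relative to full scale. A minimal sketch, assuming 16-bit PCM samples (80 samples per frame would match the 10 ms G.729 frame at 8 kHz); the function name is made up for illustration:

```python
import math

FULL_SCALE = 32768.0  # magnitude of the most negative 16-bit sample


def frame_rms_dbfs(samples):
    """Return the RMS level of one frame of 16-bit PCM in dB re full scale."""
    if not samples:
        raise ValueError("empty frame")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence has no finite dB value
    return 10.0 * math.log10(mean_square / (FULL_SCALE ** 2))
```

A constant half-scale frame comes out at about -6 dB, and the value rises toward 0 dB as the signal approaches full scale.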


Hi, I'm developing a speech synthesizer on a TI C5402. I tried
to compute my own LPC coefficients with Matlab, but the result
with LPC 10 sounds quite bad. Does anybody know of existing
coefficients for voiced sounds, so I could try a filter on my
DSP? Cheers

=====

Michel memeteau
www.geocities.com/chelmitch

ICQ : 138498950
www.freechelmi.fr.fm
0033 (0)621339362 0033(0)238770942
3 rue reginald 45000 orleans



Phillip-

> I would probably just look at the amplitude of the wave. When I say
> "intensity," I'm referring to how loud the sound is. I'm trying to
> distinguish between foreground and background noise.

To some extent, decoding only the gain might be helpful. But without at least
some general estimate of the spectrum, it would be hard to make any sort of
reliable energy or power estimates.

If you're trying to distinguish presence of speech from background noise only,
you might look at voiced/unvoiced flags in the parameters. If you examine these
from a statistical perspective, and combine with gain parameters, you might be
able to say reliably "there's some speech here" or not.

Jeff Brower
DSP sw/hw engineer
Signalogic
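
Reading the gain fields out of a packet does not require running the decoder, only knowing the bit allocation. The sketch below assumes a 10-byte G.729 frame packed most-significant-bit first in the parameter order of Table 8 of the Recommendation (the packing that RFC 3551 describes for RTP). `unpack_g729_frame` is an illustrative name, and the GA/GB values it returns are codebook indices, not gains in linear units; turning them into actual gains still needs the gain codebook tables and the decoder's gain predictor state.

```python
# (name, width in bits) in transmission order, per G.729 Table 8
G729_FIELDS = [
    ("L0", 1), ("L1", 7), ("L2", 5), ("L3", 5),  # LSP quantizer indices
    ("P1", 8), ("P0", 1),                        # subframe 1 pitch delay, parity
    ("C1", 13), ("S1", 4),                       # subframe 1 fixed codebook, signs
    ("GA1", 3), ("GB1", 4),                      # subframe 1 gain indices
    ("P2", 5),                                   # subframe 2 relative pitch delay
    ("C2", 13), ("S2", 4),                       # subframe 2 fixed codebook, signs
    ("GA2", 3), ("GB2", 4),                      # subframe 2 gain indices
]


def unpack_g729_frame(frame):
    """Split one packed 80-bit G.729 frame into its parameter indices."""
    if len(frame) != 10:
        raise ValueError("expected a 10-byte G.729 frame")
    bits = int.from_bytes(frame, "big")
    remaining = 80
    fields = {}
    for name, width in G729_FIELDS:
        remaining -= width  # bit position of this field's least significant bit
        fields[name] = (bits >> remaining) & ((1 << width) - 1)
    return fields
```

If the stream is stored in another layout (for example the one-word-per-bit serial format used by the ITU test vectors), the field widths still apply but the unpacking step would differ.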




Charles-

Phillip's question was about how to avoid full decode and still get some useful
information. If I understand his intent correctly, then VAD/DTX, zero crossing
rate, etc. would require decode processing on which he does not want to spend
MIPS in his application.

Jeff Brower
DSP sw/hw engineer
Signalogic

wrote:
>
> Phillip et al:
>
> I'd read G.729 Annex B and also the information available from various
> vendors on the web about their VAD/DTX schemes.
>
> The classic criteria for noise/speech differentiation are zero crossing rate,
> spectral characteristics and amplitude. Decisions also take time into account;
> it's better to say "voice" quicker than "noise" to avoid clipping the front
> edge of the voice.
>
> I'm sure you'll find enough material on the web to fill in the details. If you
> haven't registered with ITU for three free documents, you might want to do
> that.
>
> Chuck
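
Two of the classic criteria named above, zero crossing rate and amplitude, are simple to compute once PCM samples are available. A rough sketch, assuming one frame of decoded 16-bit samples; the function names are illustrative, and this is not the Annex B algorithm, which combines further spectral features with smoothing:

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ (0.0 to 1.0)."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def mean_abs_amplitude(samples):
    """Average absolute sample value, a cheap stand-in for frame energy."""
    if not samples:
        return 0.0
    return sum(abs(s) for s in samples) / len(samples)
```

Both need decoded samples, so they only help when a full decode is affordable.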




Jeff:

When you mention voiced/unvoiced flags, I do not know to what you refer.
Where might I find these in the data?

Also, is there any way which you can think of to determine whether a sound
is getting louder/softer or remaining fairly constant? This may not be very
helpful for distinguishing between fore- and background noise, but may be
useful to me, otherwise.
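
One way to approach the louder/softer question is to fit a slope over a short window of per-frame level values. A sketch under stated assumptions: `levels` is any per-frame loudness proxy (for example a gain value expressed in dB), and the `threshold` of 0.5 units per frame is a placeholder to tune, not a figure from the standard:

```python
def level_trend(levels, threshold=0.5):
    """Classify a window of per-frame levels as 'rising', 'falling' or 'steady'.

    Uses the least-squares slope of level against frame index.
    """
    n = len(levels)
    if n < 2:
        return "steady"
    mean_x = (n - 1) / 2.0
    mean_y = sum(levels) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(levels))
    den = sum((i - mean_x) ** 2 for i in range(n))
    slope = num / den
    if slope > threshold:
        return "rising"
    if slope < -threshold:
        return "falling"
    return "steady"
```

A longer window gives a steadier answer at the cost of more delay.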

Thank you for your responses, by the way.

Phillip




Charles-

In Phillip's original question it seemed that he wanted to avoid decode and just
look at packet bits, and I assumed that would be because of MIPS reasons, or
complexity reasons -- for example, maybe to make a quick estimate using FPGA
logic rather than a uP. If he can decode, then yes I would agree that VAD,
zero-crossing and other low-cost MIPS methods could be used to obtain an
accurate indicator of speech presence and, if so, "intensity".

Jeff Brower
DSP sw/hw engineer
Signalogic

chutton12000 wrote:
>
> Jeff:
>
> I looked back over the posts to the group and I don't see any mention
> of the criteria you mentioned below.
>
> If Phillip wants VAD (as he posted), then without further information
> from him about how much MOS degradation he'll tolerate, I'll assume
> he needs a standard VAD with zero crossing, spectral and gain
> information.
>
> A lot of that is free when you decode. Let's say he has enough MIPS
> (maybe 15, more or less) to do full G.729 decoding of continuous
> speech. He would almost certainly have the MIPS for VAD, as only the
> MIPS for simple zero crossings plus the final weighting of factors
> are added. A few MIPS would be saved once VAD is declared, right? So
> in the end, I do not think VAD should be an issue for MIPS.
>
> Would you agree?
>
> Chuck





Gentlemen:

To resolve the conflict: yes, I am trying to conserve MIPS. Were I
just dealing with one stream of sound at a time, the number of cycles
required for a full decode would be tolerable. However, the program
which I am researching this for will be multi-threaded and may deal
with hundreds of sound streams at any one time. Considering the
hardware available to me, performing a full decode on each of these
sound streams does not seem realistic (even less so when you take into
account that the ITU decoder is the only one available to me right now).

Jeff, I did look into the gain values like you suggested and found
that neither gain ever exceeds 94 during silence. I've been keeping a
short buffer of these values (5-10 frames is usually sufficient) to
determine whether any sound is present. This puts me 50-100ms behind
real-time and doesn't help me distinguish between foreground and
background noise, but at least it's something. I've mostly been
looking at LSP values to get some idea of the spectrum at which I'm
looking. They tend to bunch up (that is, the values of adjacent LSPs
get closer to each other) with monotonal sounds or sounds with few
harmonics. Again, this isn't a great solution, but it's the best I
have at the moment.
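
The approach described above can be sketched roughly as follows. The silence ceiling of 94 and the 5-10 frame buffer come from the description; the class and function names, the rule of flagging activity when any buffered gain exceeds the ceiling, and the representation of LSPs as an ascending list of values are assumptions made for illustration.

```python
from collections import deque


class GainActivityDetector:
    """Flags sound as present while any recent frame's gain exceeds a ceiling."""

    def __init__(self, silence_ceiling=94, window=8):
        self.silence_ceiling = silence_ceiling  # largest gain seen during silence
        self.history = deque(maxlen=window)     # keeps only the last `window` frames

    def update(self, pitch_gain, codebook_gain):
        """Add one frame's gains; return True if sound seems present."""
        self.history.append(max(pitch_gain, codebook_gain))
        return any(g > self.silence_ceiling for g in self.history)


def min_lsp_gap(lsps):
    """Smallest spacing between adjacent LSPs in an ascending list.

    Closely spaced LSPs generally indicate a sharp spectral peak.
    """
    if len(lsps) < 2:
        raise ValueError("need at least two LSP values")
    return min(b - a for a, b in zip(lsps, lsps[1:]))
```

With 10 ms frames, a window of 8 keeps the decision about 80 ms behind real time, inside the 50-100 ms range mentioned above.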

Also, in a previous message, you mentioned something about "voiced
flags." I do not know what these are and was hoping you could give me
more information about them.

And I notice that you always put the word "intensity" in quotes. I
realize that this is not the correct word to use when referring to the
property which I described (i.e., loudness) and, so as to avoid future
confusion, I was wondering if you could tell me which term I should
use. (I should tell you that what little programming background I
have is in an astrophysics research group. I have very little
experience with DSP, which should help explain why I must consult this
group for help and why I've failed to term "loudness" correctly. =])

Thanks for your help. Both of you.

Phillip


Phil-

"multithreaded ... hundreds of sound streams at any one time" -- sounds like a
PC is your hardware. PC as voice processor is sort of circa 2000-2001 thinking.
Ok for development, but bad for product.

Good work on the gain value usage. Sounds promising.

Regarding the voice/unvoice issue, one of our engineers reminds me that voicing
detection bits are not allocated in G.729x packets; the excitation is always the
addition of both adaptive and fixed codebook contributions. So you have to look
into the algorithm enough to derive "voiced/unvoiced" meaning from codebook
values.

As for sound "intensity", it means different things to different people under
different circumstances, so I was just trying to stay away from any sort of hard
definition.

Jeff Brower
DSP sw/hw engineer
Signalogic
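
Since there is no voicing bit, one heuristic for the voiced/unvoiced point above is to treat the adaptive-codebook (pitch) gain as a voicing indicator: strongly periodic, voiced speech tends to produce pitch gains near 1, while unvoiced sounds and noise tend to produce low ones. A sketch, assuming the pitch gain has already been dequantized to a linear value per subframe; the function name, the 0.6 threshold and the majority rule are placeholders, not values from G.729:

```python
def looks_voiced(pitch_gains, threshold=0.6):
    """Return True if more than half of the subframes have a strong pitch gain."""
    if not pitch_gains:
        return False
    strong = sum(1 for g in pitch_gains if g > threshold)
    return strong * 2 > len(pitch_gains)
```

Feeding it the subframe gains from several consecutive frames gives a steadier decision than judging a single frame.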
