speechcoding | Determining sound intensity from G.729 parameters

Using only the G.729 parameters (codebook values, codebook indices, pitch periods, and pitch and codebook gains) or the line spectral pairs, how can the intensity of a sound be determined?

Reply by Jeff Brower ● November 25, 2002

Phillip-

If you had an uncompressed waveform, how would you measure "intensity"?

Jeff Brower
DSP sw/hw engineer
Signalogic

Reply by michel memeteau ● November 25, 2002

Hi,

I'm developing a speech synthesizer on a TI C5402. I tried to compute my own LPC coefficients with Matlab, but the result on LPC-10 sounds quite bad. Does anybody know of existing coefficients for voiced sounds so that I could try a filter on my DSP?

Cheers,

Michel memeteau
www.geocities.com/chelmitch
ICQ: 138498950
www.freechelmi.fr.fm
0033 (0)621339362
0033(0)238770942
3 rue reginald 45000 orleans

Reply by Jeff Brower ● November 25, 2002

Phillip-

> I would probably just look at the amplitude of the wave. When I say
> "intensity," I'm referring to how loud the sound is. I'm trying to
> distinguish between foreground and background noise.

To some extent, decoding only the gain might be helpful. But without at least some general estimate of the spectrum, it would be hard to make any sort of reliable energy or power estimate.

If you're trying to distinguish presence of speech from background noise only, you might look at voiced/unvoiced flags in the parameters. If you examine these from a statistical perspective, and combine them with the gain parameters, you might be able to say reliably "there's some speech here" or not.

Jeff Brower
DSP sw/hw engineer
Signalogic

Reply by Jeff Brower ● November 25, 2002

Charles-

Phillip's question was about how to avoid full decode and still get some useful information. If I understand his intent correctly, then VAD/DTX, zero crossing rate, etc. would require decode processing on which he does not want to spend MIPS in his application.

Jeff Brower
DSP sw/hw engineer
Signalogic

CharlesH3@a... wrote:
> Phillip et al:
>
> I'd read G.729 Annex B and also the information available from various vendors on the web about their VAD/DTX schemes.
>
> The classic criteria for noise/speech differentiation are zero crossing rate, spectral characteristics and amplitude. Decisions also take time into account; it's better to say "voice" quicker than "noise" to avoid clipping the front edge of the voice.
>
> I'm sure you'll find enough material on the web to fill in the details. If you haven't registered with ITU for three free documents, you might want to do that.
>
> Chuck
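Two of the classic criteria Chuck lists, amplitude and zero crossing rate, can be sketched on decoded PCM as follows. This is only an illustration of the measurements themselves, not the G.729 Annex B VAD; the function name `frame_features` and the choice of a dB scale are assumptions made for the example.

```python
import math


def frame_features(samples):
    """Return (energy_db, zero_crossing_rate) for one frame of PCM samples.

    energy_db is the mean-square level in dB (relative to a sample value of 1).
    zero_crossing_rate is the fraction of adjacent sample pairs whose signs differ.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("empty frame")

    mean_square = sum(s * s for s in samples) / n
    # Small floor avoids log10(0) on an all-zero frame.
    energy_db = 10.0 * math.log10(mean_square + 1e-12)

    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    zcr = crossings / (n - 1) if n > 1 else 0.0
    return energy_db, zcr


if __name__ == "__main__":
    # One 10 ms frame at 8 kHz: a 100 Hz tone versus a low-level alternating signal.
    tone = [1000.0 * math.sin(2.0 * math.pi * 100.0 * k / 8000.0) for k in range(80)]
    hiss = [50.0 if k % 2 == 0 else -50.0 for k in range(80)]
    print(frame_features(tone))
    print(frame_features(hiss))
```

Both measurements need the decoded waveform, which is exactly the cost Jeff points out above.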

Reply by Phillip Morrison ● November 26, 2002

Jeff:

When you mention voiced/unvoiced flags, I do not know what you are referring to. Where might I find these in the data?

Also, can you think of any way to determine whether a sound is getting louder, getting softer, or remaining fairly constant? This may not be very helpful for distinguishing between foreground and background noise, but it may be useful to me otherwise.

Thank you for your responses, by the way.

Phillip
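One possible way to approach the louder/softer question without a full decode is to fit a straight line to the last few gain values on a dB scale. The sketch below assumes the gains are positive, linear-domain numbers (one per frame or subframe); the function name `loudness_trend` and the 1 dB tolerance are hypothetical choices, and gain alone ignores the spectral shaping of the synthesis filter, so treat the result as a rough indicator only.

```python
import math


def loudness_trend(gains, tolerance_db=1.0):
    """Classify a short run of gain values as 'rising', 'falling' or 'steady'.

    A least-squares line is fitted to the gains expressed in dB. The fitted
    change across the whole run is compared with tolerance_db.
    """
    n = len(gains)
    if n < 2:
        return "steady"

    # Convert to dB; the floor keeps log10 defined for zero gains.
    levels = [20.0 * math.log10(max(g, 1e-6)) for g in gains]

    mean_x = (n - 1) / 2.0
    mean_y = sum(levels) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(levels))
    den = sum((i - mean_x) ** 2 for i in range(n))
    slope = num / den              # dB per step
    change = slope * (n - 1)       # fitted change over the run

    if change > tolerance_db:
        return "rising"
    if change < -tolerance_db:
        return "falling"
    return "steady"


if __name__ == "__main__":
    print(loudness_trend([10, 20, 40, 80, 160]))
    print(loudness_trend([160, 80, 40, 20, 10]))
    print(loudness_trend([50, 51, 50, 49, 50]))
```

If the values read from the bitstream are already logarithmic, the dB conversion should be skipped.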

Reply by Jeff Brower ● November 27, 2002

Charles-

In Phillip's original question it seemed that he wanted to avoid decode and just look at packet bits, and I assumed that would be because of MIPS reasons, or complexity reasons -- for example, maybe to make a quick estimate using FPGA logic rather than a uP. If he can decode, then yes, I would agree that VAD, zero-crossing and other low-MIPS methods could be used to obtain an accurate indicator of speech presence and, if so, "intensity".

Jeff Brower
DSP sw/hw engineer
Signalogic

chutton12000 wrote:
> Jeff:
>
> I looked back over the posts to the group and I don't see any mention of the criteria you mentioned below.
>
> If Phillip wants VAD (as he posted), then without further information from him about how much MOS degradation he'll tolerate, I'll assume he needs a standard VAD with zero crossing, spectral and gain information.
>
> A lot of that is free when you decode. Let's say he has enough MIPS (maybe 15, more or less) to do full G.729 decoding of continuous speech. He would almost certainly have the MIPS for VAD, as only the MIPS for simple zero crossings plus the final weighting of factors are added. A few MIPS would be saved once VAD is declared, right? So in the end, I do not think VAD should be an issue for MIPS.
>
> Would you agree?
>
> Chuck

Reply by Phil ● November 29, 2002

Gentlemen:

To resolve the conflict: yes, I am trying to conserve MIPS. Were I just dealing with one stream of sound at a time, the number of cycles required for a full decode would be tolerable. However, the program for which I am researching this will be multi-threaded and may deal with hundreds of sound streams at any one time. Considering the hardware available to me, performing a full decode on each of these sound streams does not seem realistic (even less so when you take into account that the ITU decoder is the only one available to me right now).

Jeff, I did look into the gain values as you suggested and found that neither gain ever exceeds 94 during silence. I've been keeping a short buffer of these values (5-10 frames is usually sufficient) to determine whether any sound is present. This puts me 50-100 ms behind real time and doesn't help me distinguish between foreground and background noise, but at least it's something.

I've mostly been looking at LSP values to get some idea of the spectrum I'm looking at. They tend to bunch up (that is, the values of adjacent LSPs get closer to each other) with monotonal sounds or sounds with few harmonics. Again, this isn't a great solution, but it's the best I have at the moment.

Also, in a previous message, you mentioned something about "voiced flags." I do not know what these are and was hoping you could give me more information about them.

And I notice that you always put the word "intensity" in quotes. I realize that this is not the correct word to use when referring to the property I described (i.e., loudness) and, to avoid future confusion, I was wondering if you could tell me which term I should use. (I should tell you that what little programming background I have is in an astrophysics research group. I have very little experience with DSP, which should help explain why I must consult this group for help and why I've failed to term "loudness" correctly. =])

Thanks for your help. Both of you.

Phillip
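The two heuristics Phil describes could be sketched roughly like this. The threshold of 94 and the 5-frame window come from his post; everything else is an assumption made for illustration, including the class and function names, the rule that any frame in the window above the threshold counts as sound, and the idea that the gains arrive as plain numbers in whatever scale the parser produces.

```python
from collections import deque

SILENCE_GAIN_LIMIT = 94   # Phil's empirical ceiling for either gain during silence
WINDOW_FRAMES = 5         # 5 frames x 10 ms per G.729 frame = 50 ms of look-back


class GainActivityDetector:
    """Flags sound when any recent frame has a gain above the silence limit."""

    def __init__(self, limit=SILENCE_GAIN_LIMIT, window=WINDOW_FRAMES):
        self.limit = limit
        self.history = deque(maxlen=window)  # oldest entries drop off automatically

    def update(self, pitch_gain, codebook_gain):
        """Record one frame's gains and return True if sound seems present."""
        self.history.append(max(pitch_gain, codebook_gain))
        return any(g > self.limit for g in self.history)


def min_lsp_gap(lsps):
    """Smallest spacing between adjacent LSP values (assumed sorted ascending).

    A small gap means two LSPs have bunched up, which Phil observed for
    tonal sounds or sounds with few harmonics.
    """
    if len(lsps) < 2:
        raise ValueError("need at least two LSP values")
    return min(b - a for a, b in zip(lsps, lsps[1:]))


if __name__ == "__main__":
    detector = GainActivityDetector(window=3)
    for gains in [(10, 20), (50, 120), (10, 10), (10, 10), (10, 10)]:
        print(gains, detector.update(*gains))
    print(min_lsp_gap([0.1, 0.2, 0.25, 0.6]))
```

Keeping the window as a fixed-length deque makes the 50-100 ms lag explicit: the detector releases only after the last loud frame has aged out of the buffer.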

Reply by Jeff Brower ● December 1, 2002

Phil-

"multithreaded ... hundreds of sound streams at any one time" -- sounds like a PC is your hardware. PC as voice processor is sort of circa 2000-2001 thinking. OK for development, but bad for product.

Good work on the gain value usage. Sounds promising.

Regarding the voiced/unvoiced issue, one of our engineers reminds me that voicing detection bits are not allocated in G.729x packets; the excitation is always the sum of the adaptive and fixed codebook contributions. So you have to look into the algorithm enough to derive "voiced/unvoiced" meaning from codebook values.

As for sound "intensity", it means different things to different people under different circumstances, so I was just trying to stay away from any sort of hard definition.

Jeff Brower
DSP sw/hw engineer
Signalogic
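For reference, the excitation Jeff describes is formed in each 5 ms subframe (40 samples) as a gain-scaled sum of the two codebook vectors. The symbols below follow common CELP notation and are labels chosen here, not something taken from the thread:

```latex
u(n) = \hat{g}_p \, v(n) + \hat{g}_c \, c(n), \qquad n = 0, \dots, 39
```

Here $v(n)$ is the adaptive (pitch) codebook vector, $c(n)$ is the fixed codebook vector, and $\hat{g}_p$, $\hat{g}_c$ are their decoded gains. As a loose heuristic rather than a rule, a subframe in which the $\hat{g}_p \, v(n)$ term dominates is more likely to be voiced, which is one way to read "voiced/unvoiced" meaning out of the parameters.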
