Phil-

"Multithreaded ... hundreds of sound streams at any one time": that sounds like your hardware is a PC. Using a PC as the voice processor is circa 2000-2001 thinking: OK for development, but bad for a product.

Good work on using the gain values. Sounds promising.

Regarding the voiced/unvoiced issue, one of our engineers reminds me that no voicing-decision bits are allocated in G.729x packets; the excitation is always the sum of the adaptive and fixed codebook contributions. So you have to look into the algorithm far enough to derive a "voiced/unvoiced" meaning from the codebook values.

As for sound "intensity", it means different things to different people under different circumstances, so I was just trying to stay away from any sort of hard definition.

Jeff Brower
DSP sw/hw engineer
Signalogic

Phil wrote:
>
> Gentlemen:
>
> To resolve the conflict: yes, I am trying to conserve MIPS. Were I just dealing with one stream of sound at a time, the number of cycles required for a full decode would be tolerable. However, the program I am researching this for will be multi-threaded and may deal with hundreds of sound streams at any one time. Considering the hardware available to me, performing a full decode on each of these sound streams does not seem realistic (even less so when you take into account that the ITU decoder is the only one available to me right now).
>
> Jeff, I did look into the gain values as you suggested and found that neither gain ever exceeds 94 during silence. I've been keeping a short buffer of these values (5-10 frames is usually sufficient) to determine whether any sound is present. This puts me 50-100 ms behind real time and doesn't help me distinguish between foreground and background noise, but at least it's something. I've mostly been looking at LSP values to get some idea of the spectrum I'm looking at.
> They tend to bunch up (that is, the values of adjacent LSPs get closer to each other) with monotonal sounds or sounds with few harmonics. Again, this isn't a great solution, but it's the best I have at the moment.
>
> Also, in a previous message, you mentioned something about "voiced flags." I do not know what these are and was hoping you could give me more information about them.
>
> And I notice that you always put the word "intensity" in quotes. I realize that this is not the correct word for the property I described (i.e., loudness), and, to avoid future confusion, I was wondering if you could tell me which term I should use. (I should tell you that what little programming background I have comes from an astrophysics research group. I have very little experience with DSP, which should help explain why I must consult this group for help and why I failed to name "loudness" correctly. =])
>
> Thanks for your help. Both of you.
>
> Phillip
>
> > Charles-
> >
> > In Phillip's original question it seemed that he wanted to avoid decoding and just look at packet bits, and I assumed that was for MIPS or complexity reasons, for example to make a quick estimate using FPGA logic rather than a uP. If he can decode, then yes, I would agree that VAD, zero crossing and other low-MIPS methods could be used to obtain an accurate indicator of speech presence and, if so, "intensity".
> >
> > Jeff Brower
> > DSP sw/hw engineer
> > Signalogic
> >
> > chutton12000 wrote:
> > >
> > > Jeff:
> > >
> > > I looked back over the posts to the group and I don't see any mention of the criteria you mentioned below.
> > >
> > > If Phillip wants VAD (as he posted), then without further information from him about how much MOS degradation he'll tolerate, I'll assume he needs a standard VAD with zero crossing, spectral and gain information.
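Chuck's "standard VAD" criteria can be illustrated on decoded PCM. The following is a minimal sketch, not the G.729 Annex B VAD: it assumes 8 kHz samples scaled to [-1, 1] and 10 ms (80-sample) frames, the thresholds are hypothetical placeholders, and it uses only frame energy and zero-crossing rate (no spectral features, no decision smoothing).

```python
import math
from typing import List, Sequence

FRAME_LEN = 80  # 10 ms at 8 kHz, the G.729 frame size


def frame_energy_db(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame in dB relative to full scale (1.0)."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_sq + 1e-12)  # small floor avoids log10(0)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def simple_vad(samples: Sequence[float],
               energy_thresh_db: float = -40.0,
               zcr_max: float = 0.5) -> List[bool]:
    """Per-frame speech decision: loud enough AND not noise-like.

    Both thresholds are hypothetical placeholders, not values from G.729 Annex B.
    """
    decisions = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        is_speech = (frame_energy_db(frame) > energy_thresh_db
                     and zero_crossing_rate(frame) < zcr_max)
        decisions.append(is_speech)
    return decisions
```

A low-frequency tone passes both tests, digital silence fails the energy test, and a signal that flips sign every sample fails the zero-crossing test. Once the samples are decoded, these checks cost only a few operations per sample.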
> > >
> > > A lot of that is free when you decode. Let's say he has enough MIPS (maybe 15, more or less) to do full G.729 decoding of continuous speech. He would almost certainly have the MIPS for VAD, as the only additions are simple zero-crossing counts plus the final weighting of factors. A few MIPS would be saved once VAD is declared, right? So in the end, I do not think VAD should be an issue for MIPS.
> > >
> > > Would you agree?
> > >
> > > Chuck
> > >
> > > --- In speechcoding@y..., Jeff Brower <jbrower@s...> wrote:
> > > > Charles-
> > > >
> > > > Phillip's question was about how to avoid a full decode and still get some useful information. If I understand his intent correctly, then VAD/DTX, zero crossing rate, etc. would require decode processing on which he does not want to spend MIPS in his application.
> > > >
> > > > Jeff Brower
> > > > DSP sw/hw engineer
> > > > Signalogic
> > > >
> > > > CharlesH3@a... wrote:
> > > > >
> > > > > Phillip et al:
> > > > >
> > > > > I'd suggest reading G.729 Annex B, and also the information available from various vendors on the web about their VAD/DTX schemes.
> > > > >
> > > > > The classic criteria for noise/speech differentiation are zero crossing rate, spectral characteristics and amplitude. Decisions also take time into account; it's better to say "voice" quickly than "noise", to avoid clipping the front edge of the voice.
> > > > >
> > > > > I'm sure you'll find enough material on the web to fill in the details. If you haven't registered with ITU for three free documents, you might want to do that.
> > > > >
> > > > > Chuck
> > > > >
> > > > > In a message dated 11/25/2002 9:32:25 AM Eastern Standard Time, jbrower@s... writes:
> > > > >
> > > > > > Phillip-
> > > > > >
> > > > > > > I would probably just look at the amplitude of the wave.
> > > > > > > When I say "intensity," I'm referring to how loud the sound is. I'm trying to distinguish between foreground and background noise.
> > > > > >
> > > > > > To some extent, decoding only the gain might be helpful. But without at least some general estimate of the spectrum, it would be hard to make any sort of reliable energy or power estimates.
> > > > > >
> > > > > > If you're trying to distinguish presence of speech from background noise only, you might look at voiced/unvoiced flags in the parameters. If you examine these from a statistical perspective, and combine them with the gain parameters, you might be able to say reliably "there's some speech here" or not.
> > > > > >
> > > > > > Jeff Brower
> > > > > > DSP sw/hw engineer
> > > > > > Signalogic
> > > > > >
> > > > > > > > Phillip-
> > > > > > > >
> > > > > > > > > Using only the G.729 parameters (codebook values, codebook indices, pitch periods, and pitch and codebook gains) or the line spectral pairs, how can the intensity of a sound be determined?
> > > > > > > >
> > > > > > > > If you had an uncompressed waveform, how would you measure "intensity"?
> > > > > > > >
> > > > > > > > Jeff Brower
> > > > > > > > DSP sw/hw engineer
> > > > > > > > Signalogic
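Phil's parameter-domain heuristic (neither gain exceeding 94 during silence, buffered over 5-10 frames) and his LSP "bunching" observation can be sketched without a full decode. This is a hypothetical illustration: the class and function names are invented, the ceiling of 94 is Phil's empirical value in whatever units his ITU reference decoder reports, and the rule that `min_hits` of the buffered frames must exceed it is an assumption, since Phil does not say how he combines the buffered values. Each G.729 frame is 10 ms, so a 5-frame window spans 50 ms of audio, in line with the 50-100 ms figure above.

```python
from collections import deque
from typing import Sequence

SILENCE_GAIN_CEILING = 94  # Phil's empirical value; units depend on his decoder build
FRAME_MS = 10              # G.729 frame duration


class GainWindowDetector:
    """Hypothetical 'sound present' detector driven only by per-frame gain values."""

    def __init__(self, window_frames: int = 5, min_hits: int = 3,
                 ceiling: int = SILENCE_GAIN_CEILING) -> None:
        if not 1 <= min_hits <= window_frames:
            raise ValueError("min_hits must be between 1 and window_frames")
        self.window_frames = window_frames
        self.min_hits = min_hits
        self.ceiling = ceiling
        # One bool per buffered frame: did either gain exceed the ceiling?
        self.history: deque = deque(maxlen=window_frames)

    @property
    def window_ms(self) -> int:
        """Duration of audio covered by the buffer."""
        return self.window_frames * FRAME_MS

    def push(self, pitch_gain: int, codebook_gain: int) -> bool:
        """Add one frame's gains; return True if enough recent frames look non-silent."""
        self.history.append(max(pitch_gain, codebook_gain) > self.ceiling)
        return sum(self.history) >= self.min_hits


def min_lsp_spacing(lsp: Sequence[float]) -> float:
    """Smallest gap between adjacent LSPs (needs at least two values).

    Small gaps suggest strong, narrow spectral peaks, i.e. the 'bunching'
    Phil sees with monotonal sounds.
    """
    ordered = sorted(lsp)  # works whether the decoder emits ascending or descending LSPs
    return min(b - a for a, b in zip(ordered, ordered[1:]))
```

G.729 uses a 10th-order LPC filter, so `lsp` would normally hold 10 values per frame; tracking `min_lsp_spacing` over time is one way to turn the bunching observation into a number that can be thresholded alongside the gain window.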