Hi All,

I wish to build an AGC system for decoded speech signals in mobile phones. I went through many of Jerry Avins's previous posts on this topic and tried to work in those directions. However, things are not that easy when it comes to controlling the gain of speech signals in real time.

From the previous posts I can summarise the following:

1) If the signal level is higher than the level at which we want to operate, we need to decrease the gain. This decrease has to happen quickly.
2) If the signal level is lower than some threshold, we need to increase the gain. This has to be done gradually.
3) There are some dead zones where no gain control should be applied.

Now my questions are as follows:

1) Which method should be used to analyse the signal level? Is peak detection the right way to estimate the variation in signal energy, or does the average / moving average of the signal energy (with a history of, say, the previous 10 frames / 1600 samples) give the right picture?

2) What are dead zones? Do they mean the pauses in the speech signal where the speaker does not say anything? Should AGC be active during pauses? As far as I can see, AGC should remain active during pauses because that would ensure no discontinuity between the speech segments and the pauses. If we keep AGC on during a speech segment and turn it off as soon as we detect silence, that would create a discontinuity in the overall volume of the speech. So the noise during the pauses should also be corrected accordingly.

3) Shouldn't our decision-making mechanism take the value of the noise floor into account?

Could someone please explain how I should go about this? I am finding it all too tricky. Each time, I have to tune my attack rate, the threshold, or the noise floor level. It works for some speech signals but then sounds arbitrary for others. I am a bit confused and don't know how to go about it. I need your help with this.

Just for your information, I have set the following values for my experiment:

* A FRAME is 20 ms long, consisting of 160 samples.

a) ATTACK_SLOPE -0.0322 dB per frame (frame = 160 samples)
b) HANG_OVER_PERIOD 30 frames (observation time in which you analyse and make a decision)
c) NOISE_LEVEL -95 dB (average noise floor level)
d) TOLERANCE_LEVEL 20 dB (the difference between the noise floor and the peak detection curve; anything less than this should be treated as noise)

Thanks and Regards,
Anoop Deoras
Sasken, India
Email : adeoras [at] gmail [dot] com
Automatic Gain Control
Started by ●March 26, 2005
Reply by ●March 26, 2005
"EC-AKD" <akd_ecc@yahoo.com> wrote in message news:2f592dd3.0503252254.79c9e214@posting.google.com...> Hi All, > > I wish to make a AGC system targeted for decoded speech signals in > mobile phones. I went through many of Jerry Avin's previous posts on > this topic and tried to work in those directions. However things are > not that easy when it comes to real time controlling of gain of the > speech singals. > From the previous posts I can summarise following: > > 1) If the gain of the singal is higher than the average level at > which we want to operate, we need to decrese the gain. This decrease > has to happen quickly. > 2) If the signal gain is lower than some threshold, we need to > increase the gain. This has to be done gradually. > 3) There are some dead zones where no gain control be applied. > > > Now my doubts are as follows: > > 1) Which method should be used to analyse the signal level. Is peak > detection the right method of estimating the signal energy variation, > or the average / moving average (with a history of say prev 10 frames/ > 1600 samples)of the singal energy gives the right picture. > > 2) What are dead zones? Do they mean the pauses in the speech signal > where the speaker does not speak anything? Should AGC be active during > pauses? > As far as I conceive, AGC should remain active during pauses because > that would ensure no discontinuity between the speech segment and the > pauses. > If we keep AGC ON guring some speech segment and make it OFF as soon > as we detect that we have silence, that would create some sort of > discontinuity in the overall volume of the flow of the speech. So the > noise during the pauses hsould also be corrected accordingly. > > 3) Shouldn't our decision making mechanism take into account value of > noise floor. > > Could some one please explain to me how do I go about this. I am > finding this all too tricky. Each time I have to tune my attack rate > or the threshold or the noise floor level. 
It works sometimes for some > speech signal BUT then sounds arbit for some other speech signal. I am > a bit confused and dont know hoe to go about it. I need your help with > this. > > Just for your information I have set following values for my > experiment: > * a FRAME is 20ms long consisting of 160 samples. > > a ) ATTACK_SLOPE -0.0322 dB per frame (frame = 160)samples > b ) HANG_OVER_PERIOD 30 frames (Observation time where u analyze > and make some decision). > c ) NOISE_LEVLE -95 dB. (avg. noise floor level) > d ) TOLERANCE_LEVEL 20 dB (The diff between noise floor and peak > detection curve. Anything less than this should be treated as noise.) > > > Thanks and Regards, > Anoop Deoras > Sasken, India > Email : adeoras [at] gmail [dot] comAssume yin is the input to the AGC (say a speech signal) and yout is the AGC output. loop forever { /* Voltage controlled amplifier is just a multiplier here */ yout=yin*iout /* error */ err=spoint-abs(yout) /* Integrate */ iout1=iout iout=iout1+gain*err } Where in the above 'gain' needs to be found by experiment. Too large and the envelope will be compressed and the loop will go unstable. If the gain is too low then the loop will be slow to repond. Try gain=0.001 for starters though the gain value for stability depends on the size of yin. spoint is a set point and can be taken to be 1.0. yout is the AGCd output. regardsDr Tam
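The loop above can be sketched in C roughly as follows. This is only an illustration under some assumptions: samples are floats scaled so that a set point of 1.0 makes sense, the names `agc_state` and `agc_process` are made up for this sketch, and the clamp that keeps the gain from going negative is an addition that is not in the pseudocode.

```c
#include <assert.h>
#include <stddef.h>

/* State of the feedback AGC: the integrator output doubles as the gain. */
typedef struct {
    float gain;       /* current multiplier ("iout" in the post)       */
    float set_point;  /* target magnitude ("spoint" in the post)       */
    float loop_gain;  /* integrator step ("gain"), found by experiment */
} agc_state;

/* Process n samples: out[i] = in[i] * gain, then update the gain
 * from the error between the set point and the output magnitude. */
void agc_process(agc_state *s, const float *in, float *out, size_t n)
{
    assert(s != NULL && in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        float y = in[i] * s->gain;            /* the "VCA" is a multiplier */
        float mag = (y < 0.0f) ? -y : y;      /* abs(yout)                 */
        float err = s->set_point - mag;       /* error against set point   */
        s->gain += s->loop_gain * err;        /* integrate the error       */
        if (s->gain < 0.0f) {                 /* assumption: never invert  */
            s->gain = 0.0f;
        }
        out[i] = y;
    }
}
```

With a constant input of 0.5, a set point of 1.0 and a loop gain of 0.001, the gain in this sketch settles near 2.0 after a few thousand samples, which matches the remark that the loop speed depends on both the loop gain and the size of yin.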
Reply by ●March 26, 2005
in article 2f592dd3.0503252254.79c9e214@posting.google.com, EC-AKD at akd_ecc@yahoo.com wrote on 03/26/2005 01:54:

> I wish to make a AGC system targeted for decoded speech signals in
> mobile phones. I went through many of Jerry Avin's previous posts on
> this topic and tried to work in those directions. However things are
> not that easy when it comes to real time controlling of gain of the
> speech signals.
> From the previous posts I can summarise following:
>
> 1) If the gain of the signal is higher than the average level at
> which we want to operate, we need to decrease the gain. This decrease
> has to happen quickly.

this means a fast attack time constant ...

> 2) If the signal gain is lower than some threshold, we need to
> increase the gain. This has to be done gradually.

... and a slow release time constant

> 3) There are some dead zones where no gain control be applied.

that gets defined in your compression curve.

> Now my doubts are as follows:
>
> 1) Which method should be used to analyse the signal level. Is peak
> detection the right method of estimating the signal energy variation,
> or the average / moving average (with a history of say prev 10 frames/
> 1600 samples) of the signal energy gives the right picture.

it depends on what you're trying to accomplish with your AGC. if the purpose was to adjust the level so that clipping does not happen, then peak detection might be appropriate. if the purpose is to even out the loud and quiet segments of sound so that a person can hear it better, then some dB loudness detection might be the right thing. that might be a mean square measure (energy) and perhaps with a perceptual weighting filter (maybe "A weighting") applied before that. you can also make it some combination of the two, peak level and perceptual loudness detection.

> 2) What are dead zones? Do they mean the pauses in the speech signal
> where the speaker does not speak anything? Should AGC be active during
> pauses?

besides limiting at the top end, you might want to have a gate at the bottom end so that ambient noise doesn't get transmitted when the speaker isn't talking. maybe not. i dunno what your specs or application is. (radio people used to call this "squelch". audio folks call it "gating".)

> As far as I conceive, AGC should remain active during pauses because
> that would ensure no discontinuity between the speech segment and the
> pauses.

you can have it kick in and out smoothly. in fact, strictly speaking, it is always active, even when the gain is unaffected, as long as it is measuring level and *will* react when your level thresholds are crossed.

> If we keep AGC ON during some speech segment and make it OFF as soon
> as we detect that we have silence, that would create some sort of
> discontinuity in the overall volume of the flow of the speech.

it can (and should) be done smoothly. that is part of the calculus of attack and release time.

> So the
> noise during the pauses should also be corrected accordingly.
>
> 3) Shouldn't our decision making mechanism take into account value of
> noise floor.

if you have the means of measuring it. perhaps you want the user to specify it.

> Could some one please explain to me how do I go about this. I am
> finding this all too tricky. Each time I have to tune my attack rate
> or the threshold or the noise floor level. It works sometimes for some
> speech signal BUT then sounds arbit for some other speech signal. I am
> a bit confused and dont know how to go about it. I need your help with
> this.
>
> Just for your information I have set following values for my
> experiment:
> * a FRAME is 20ms long consisting of 160 samples.
>
> a ) ATTACK_SLOPE -0.0322 dB per frame (frame = 160)samples

i think you should consider these parameters in units that do not depend on your sample rate. something per second. then, knowing the sample rate, map that to something per sample for your implementation.

> b ) HANG_OVER_PERIOD 30 frames (Observation time where u analyze
> and make some decision).
> c ) NOISE_LEVEL -95 dB. (avg. noise floor level)
> d ) TOLERANCE_LEVEL 20 dB (The diff between noise floor and peak
> detection curve. Anything less than this should be treated as noise.)

so how do you want to treat noise? gate it to zero? or maybe to some other reduced gain? or leave it alone?

even if you process "frames" or "segments" or "blocks" or "chunks" or "buffers" (there are lotsa names for it) of audio, your algorithm should be a sample-by-sample algorithm to make sure its behavior is smooth. other than a throughput delay, it shouldn't matter to the sound what blocksize you end up using (increased blocksize *might* decrease computational expense at the cost of increased delay).

are you doing this in MATLAB or C or something like that for simulation?

--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
Reply by ●March 27, 2005
You want to do some googling using terms like "audio compressor" and "dynamic range compression". See some of the products made by dbx, such as the dbx160. Also look up "Orban Optimod".

Here's more; some may be old, but look around there ...

http://www.harmony-central.com/Effects/effects-explained.html
http://www.harmony-central.com/Effects/Articles/Expansion/
http://www.harmony-central.com/Effects/Articles/Compression/
http://www.alesis.com/ (see beginner's guide to compression)

Mark
Reply by ●March 28, 2005
"Mark" <makolber@yahoo.com> wrote in message news:1111897578.535028.149910@f14g2000cwb.googlegroups.com...> you want to do some googling using terms like "audio compressor" > "dynamic range compression" . See some of the products made by dBx > such as the dbx160. also look up "orban optimod" > > heres more, some may be old but look around there ... > http://www.harmony-central.com/Effects/effects-explained.html > http://www.harmony-central.com/Effects/Articles/Expansion/ > http://www.harmony-central.com/Effects/Articles/Compression/ > http://www.alesis.com/ (see beginners guide to compression)I'll add one more: http://www.rane.com/pdf/note141.pdf
Reply by ●March 28, 2005
"EC-AKD" <akd_ecc@yahoo.com> wrote in message news:2f592dd3.0503252254.79c9e214@posting.google.com...> Hi All, > > > 2) What are dead zones? Do they mean the pauses in the speech signal > where the speaker does not speak anything? Should AGC be active during > pauses? > As far as I conceive, AGC should remain active during pauses because > that would ensure no discontinuity between the speech segment and the > pauses. > If we keep AGC ON guring some speech segment and make it OFF as soon > as we detect that we have silence, that would create some sort of > discontinuity in the overall volume of the flow of the speech. So the > noise during the pauses hsould also be corrected accordingly.Sometimes you want to freeze the changes in the AGC, keeping the gain constant. This is sometimes called "hold". This can be used so as not to crank up the volume unduly during "silence" (which in reality, will probably be the noise floor and not digital zeros).
Reply by ●March 28, 2005
One thing to be aware of is "comfort noise". If you detect low-level noise (non-speech), don't set it to zeros but add a little white noise to let the listener know that the line is still active and not disconnected. Morphing from this "comfort noise" level back to speech and vice versa is much less jarring than to zero level.

Sometimes "Noise" is a good thing.

--
Chip Wood

"Jon Harris" <goldentully@hotmail.com> wrote in message
> > 2) What are dead zones? Do they mean the pauses in the speech signal
> > where the speaker does not speak anything? Should AGC be active during
> > pauses?
> > As far as I conceive, AGC should remain active during pauses because
> > that would ensure no discontinuity between the speech segment and the
> > pauses.
> > If we keep AGC ON during some speech segment and make it OFF as soon
> > as we detect that we have silence, that would create some sort of
> > discontinuity in the overall volume of the flow of the speech. So the
> > noise during the pauses should also be corrected accordingly.
>
> Sometimes you want to freeze the changes in the AGC, keeping the gain constant.
> This is sometimes called "hold". This can be used so as not to crank up the
> volume unduly during "silence" (which in reality, will probably be the noise
> floor and not digital zeros).
Reply by ●March 30, 2005
robert bristow-johnson <rbj@audioimagination.com> wrote in message news:<BE6B7E38.59CF%rbj@audioimagination.com>...
> in article 2f592dd3.0503252254.79c9e214@posting.google.com, EC-AKD at
> akd_ecc@yahoo.com wrote on 03/26/2005 01:54:
>
> > Now my doubts are as follows:
> >
> > 1) Which method should be used to analyse the signal level. Is peak
> > detection the right method of estimating the signal energy variation,
> > or the average / moving average (with a history of say prev 10 frames/
> > 1600 samples) of the signal energy gives the right picture.
>
> it depends on what you're trying to accomplish with your AGC. if the
> purpose was to adjust the level so that clipping does not happen, then peak
> detection might be appropriate. if the purpose is to even out the loud and
> quiet segments of sound so that a person can hear it better, then some dB
> loudness detection might be the right thing. that might be a mean square
> measure (energy) and perhaps with a perceptual weighting filter (maybe "A
> weighting") applied before that. you can also make it some combination of
> the two, peak level and perceptual loudness detection.

I calculated the peak level and also the moving average of the signal energy (with a history of the past 1.5 seconds of speech). Taking the average of these two, I can now crudely see the general behaviour of the speech. However, deciding on a threshold value is still tough. Sometimes I feel that I may also need to tune the threshold values depending on the speech I am working on.

I am coding in C and use MATLAB for plotting the curves.

I generated a speech waveform by talking into a microphone from some distance and recording it. After some time I gradually came closer to the microphone so as to increase the overall energy. Now my aim is to boost the part of the signal that was recorded from a distance. This would be the scenario when the user uses a hands-free mobile.

I am assuming that the change from faint speech to loud speech is not drastic and that there is not much variation. For example, in a speech segment of 1 minute, 30 seconds would be faint speech and the remaining 30 seconds would be loud speech. Initially, to get going, I wish to correct the faint 30 seconds of speech by assuming some thresholds (some dB value). Later I may be able to tune the thresholds properly.

As I am doing it all alone, I feel lost when I don't see it going anywhere. Suggestions and help from Usenet would be really appreciated.

Thanks
Anoop
Reply by ●March 31, 2005
in article 2f592dd3.0503300144.7544fb03@posting.google.com, EC-AKD at akd_ecc@yahoo.com wrote on 03/30/2005 04:44:

> As I am doing it all alone, I feel lost when I dont see it going
> anywhere. Suggestions and help from usenet would be really
> appreciated.

can't help much at a distance. i can only suggest, if you have the sound recorded on some kind of wave editor (like Cool Edit), that you select the quiet pieces and manually boost them to what sounds right, then look at your measured loudness and the dB boost for that section and see if you can construct a heuristic curve that maps dB loudness (relative to fullscale) to dB of boost.

rots o' ruk.

--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
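Once a few (loudness, boost) pairs have been collected by ear as suggested above, the heuristic curve could be stored as a small table and interpolated. The C sketch below assumes piecewise-linear interpolation between points sorted by ascending level, with the boost clamped outside the table. The names `curve_point` and `boost_for_level` are hypothetical, and any table values would have to come from your own listening tests.

```c
#include <assert.h>
#include <stddef.h>

/* One measured point: input loudness (dB re full scale) -> boost (dB). */
typedef struct {
    double level_db;
    double boost_db;
} curve_point;

/* Piecewise-linear lookup in a table sorted by ascending level_db.
 * Levels outside the table get the boost of the nearest end point. */
double boost_for_level(const curve_point *pts, size_t n, double level_db)
{
    assert(pts != NULL && n > 0);

    if (level_db <= pts[0].level_db) {
        return pts[0].boost_db;
    }
    if (level_db >= pts[n - 1].level_db) {
        return pts[n - 1].boost_db;
    }
    for (size_t i = 1; i < n; ++i) {
        if (level_db <= pts[i].level_db) {
            double span = pts[i].level_db - pts[i - 1].level_db;
            double t = (level_db - pts[i - 1].level_db) / span;
            return pts[i - 1].boost_db
                 + t * (pts[i].boost_db - pts[i - 1].boost_db);
        }
    }
    return pts[n - 1].boost_db;  /* not reached for a sorted table */
}
```

As a made-up example, a table of (-60, 0), (-40, 12), (-20, 6), (0, 0) leaves very quiet input alone, boosts faint speech the most, and tapers to no boost at full scale; a level of -30 dB then maps to a 9 dB boost.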
Reply by ●April 1, 2005
And of course, most editors like Cool Edit include their own dynamics processor (which = compressor, among other things); this should give a very good idea of the sort of parameters needed. The Cool Edit Pro dynamics processor offers the choice of RMS or Peak envelope detection.

Also, there is a code example (in C) for a hard-knee compressor in the musicdsp archive:

http://www.musicdsp.org/archive.php

Note that in compressors for music, there is a standard technique to delay the direct signal by at least half the size of the window used to extract the envelope, so that an incoming peak (e.g. a drum hit or plosive speech consonant) is wholly reduced, rather than having the compressor react late, so to speak. A typical envelope window may only be 10 ms or so, which is not particularly disturbing in real-time streaming.

Richard Dobson

robert bristow-johnson wrote:
> in article 2f592dd3.0503300144.7544fb03@posting.google.com, EC-AKD at
> akd_ecc@yahoo.com wrote on 03/30/2005 04:44:
>
>> As I am doing it all alone, I feel lost when I dont see it going
>> anywhere. Suggestions and help from usenet would be really
>> appreciated.
>
> can't help much at a distance. i can only suggest, if you have the sound
> recorded on some kind of wave editor (like Cool Edit), that you select the
> quiet pieces and manually boost them to what sounds right, then look at your
> measured loudness and the dB boost for that section and see if you can
> construct a heuristic curve that maps dB loudness (relative to fullscale) to
> dB of boost.
>
> rots o' ruk.
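The look-ahead technique mentioned in this post (delaying the direct signal while the envelope is measured on the undelayed input) can be sketched in C with a small circular buffer. The names `delay_line`, `delay_tick` and `lookahead_apply`, and the 256-sample maximum, are assumptions made for this illustration. At 8 kHz, half of a 10 ms window is 40 samples.

```c
#include <assert.h>
#include <stddef.h>

#define LOOKAHEAD_MAX 256  /* assumed upper bound on the delay, in samples */

typedef struct {
    float buf[LOOKAHEAD_MAX];
    size_t delay;  /* delay in samples, 1..LOOKAHEAD_MAX */
    size_t pos;    /* next slot to read and overwrite    */
} delay_line;

void delay_init(delay_line *d, size_t delay_samples)
{
    assert(d != NULL);
    assert(delay_samples >= 1 && delay_samples <= LOOKAHEAD_MAX);
    for (size_t i = 0; i < LOOKAHEAD_MAX; ++i) {
        d->buf[i] = 0.0f;
    }
    d->delay = delay_samples;
    d->pos = 0;
}

/* Push one sample and return the sample pushed `delay` calls earlier
 * (zeros until the line has filled). */
float delay_tick(delay_line *d, float x)
{
    float y = d->buf[d->pos];
    d->buf[d->pos] = x;
    d->pos = (d->pos + 1) % d->delay;
    return y;
}

/* Apply a gain derived from the current (undelayed) input to the
 * delayed audio, so the gain change lands ahead of the peak. */
float lookahead_apply(delay_line *d, float x, float gain)
{
    return delay_tick(d, x) * gain;
}
```

In use, the level detector sees each sample as it arrives and computes `gain`, while the listener hears the output of `lookahead_apply`, which lags by the chosen delay.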






