Noise variance estimate
Started by ●February 3, 2009

I'm trying to implement a simple denoising module based on wavelet filter banks (DWT) with a VAD. The noise is white for now, but I'm going to use other noise types as well (car, babble, pink, ...).

1) How can one estimate the noise variance in each subband?

2) Would it be better to build the VAD on top of the decomposition, or as a stand-alone module (energy feature)? My main problem is that I'm not sure how to apply the standard frame-by-frame energy calculation (frame = 30 ms, overlap = 15 ms) to the DWT decomposition, since each level contains half as many samples as the previous one.

Thanks for taking the time to read my questions :)
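One way to reconcile the fixed 30 ms framing with the halving subband rates is simply to halve the frame and hop lengths (in coefficients) at each DWT level, so every frame still covers the same time span. A minimal pure-Python sketch, assuming a 16 kHz sample rate and a Haar wavelet (all function names, parameters, and the toy signal are illustrative, not from the thread):

```python
def haar_dwt(x):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    s = 0.5 ** 0.5
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return approx, detail

def frame_energies(coeffs, frame_len, hop):
    """Frame-by-frame energy (sum of squares) over a coefficient sequence."""
    return [sum(c * c for c in coeffs[i:i + frame_len])
            for i in range(0, len(coeffs) - frame_len + 1, hop)]

fs = 16000
frame, hop = 480, 240                 # 30 ms frame, 15 ms hop at 16 kHz
x = [(-1) ** i * (i % 7) / 7.0 for i in range(4096)]  # toy input signal

a = x
for level in range(1, 4):
    a, d = haar_dwt(a)
    # At this level the same 30 ms time span covers frame // 2**level
    # coefficients, so the frame and hop shrink by a factor of 2 per level.
    e = frame_energies(d, frame >> level, hop >> level)
```

Because the orthonormal Haar transform preserves energy (sum of squared approximation plus detail coefficients equals the squared input), per-subband frame energies computed this way remain directly comparable across levels.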
Reply by ●February 3, 2009
On Feb 4, 12:53 am, "DWT" <zwfilte...@yahoo.com> wrote:
> I'm trying to implement a simple denoising module based on wavelet
> filter-banks (DWT) with VAD. Noise is white, but I'm going to use other
> noises as well (car, babble, pink,..)
>
> 1) How can one estimate the noise variance on each subband?
>
> 2) Would it be better to build VAD on the base of decomposition or as
> stand-alone module (energy feature)? The main problem that I'm not sure how
> to use standard frame-by-frame (frame=30ms, overlap=15ms) energy
> calculation for the DWT decomposition as long as each level contains 2
> times less samples than the previous.
>
> Thanks for taking the time reading my questions :)

Sum of the squares divided by the number of points! You can also do this recursively - there are a number of methods.

Hardy
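Both ideas from this reply can be sketched in a few lines: a batch estimate (mean of squares, assuming zero-mean noise) and one of the recursive variants, an exponentially weighted running average. The smoothing constant `alpha` here is my choice, not something specified in the thread:

```python
def block_variance(samples):
    """Batch estimate: sum of the squares divided by the number of points
    (valid as a variance estimate for zero-mean noise)."""
    return sum(s * s for s in samples) / len(samples)

def recursive_variance(samples, alpha=0.01, sigma2=0.0):
    """Recursive (exponentially weighted) estimate:
    sigma2 <- (1 - alpha) * sigma2 + alpha * x**2 for each new sample x."""
    for s in samples:
        sigma2 = (1.0 - alpha) * sigma2 + alpha * s * s
    return sigma2
```

The recursive form needs no buffering and tracks slowly varying noise; smaller `alpha` gives a smoother but slower-adapting estimate.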
Reply by ●February 4, 2009
> Sum of the squares divided by the number of points! You can also do
> this recursively - there are a number of methods.
>
> Hardy

Hardy,

Thanks for the prompt reply. Is it possible to estimate the noise variance in frames that contain speech, or only in the 'non-speech/silence' frames?

Serge
Reply by ●February 4, 2009
On Feb 4, 8:33 pm, "DWT" <zwfilte...@yahoo.com> wrote:
> Hardy,
>
> Thanks for the prompt reply. Is it possible to estimate noise variance in
> the frames with 'speech' data or only in the 'non-speech\silence' frames?
>
> Serge

Normally noise is present on its own only when speech is absent. Therefore you would need a voice activity detector.

Hardy
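The suggestion above (update the noise estimate only in frames the VAD marks as non-speech) can be combined with a simple energy-based VAD into one loop. This is a sketch under my own assumptions - the threshold, the smoothing constant, and the energy-comparison VAD itself are illustrative choices, not Hardy's specification:

```python
def vad_and_noise_update(frames, sigma2, alpha=0.05, threshold=3.0):
    """Label each frame as speech/non-speech by comparing its mean-square
    energy against the running noise variance `sigma2`; update the noise
    estimate only in frames judged to be non-speech."""
    labels = []
    for frame in frames:
        energy = sum(s * s for s in frame) / len(frame)
        is_speech = energy > threshold * sigma2
        if not is_speech:
            # Noise-only frame: refresh the variance estimate recursively.
            sigma2 = (1.0 - alpha) * sigma2 + alpha * energy
        labels.append(is_speech)
    return labels, sigma2
```

In practice `sigma2` would be initialised from an initial stretch assumed to be noise-only (e.g. the first few frames), and the same loop can be run independently on each DWT subband to get per-subband noise variances.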
Reply by ●February 5, 2009
On Feb 4, 8:33 am, "DWT" <zwfilte...@yahoo.com> wrote:
> Hardy,
>
> Thanks for the prompt reply. Is it possible to estimate noise variance in
> the frames with 'speech' data or only in the 'non-speech\silence' frames?
>
> Serge

What you need is a model, not ad hoc procedures.

illywhacker;
Reply by ●February 5, 2009
On 3 Feb, 12:53, "DWT" <zwfilte...@yahoo.com> wrote:
> I'm trying to implement a simple denoising module based on wavelet
> filter-banks (DWT) with VAD. Noise is white, but I'm going to use other
> noises as well (car, babble, pink,..)

You need first of all to contemplate very carefully what you mean by 'noise'. In the context of DSP the term is usually understood as a signal with certain statistical properties, as with AWGN and pink noise. However, in day-to-day language 'noise' also means 'unwanted' or 'uninteresting' sound, which is interpreted subjectively according to context: if your main interest is to talk with John, then Jack's phone call in the corner of the room is 'noise', even if Jack is calling somebody to make a deal on your behalf. If you and Jack were alone in the room, the same phone call of Jack's would be very interesting to you.

So you need to discriminate between the 'signal of interest', which usually has one set of statistical properties; 'noise' in the DSP sense, which has other statistical properties; and 'interfering sources', which have the same kind of statistical properties as the signal of interest but which are of no interest to you.

It's usually not too difficult to get at least an impression of how much AWGN is present. You just have to look around your formulas and find the error terms here and there; these are usually related to noise. They are related to numerical inaccuracy and model mismatch as well, so be careful not to over-interpret such factors, but you can get an overall impression.

The difficult part is how to handle interfering sources, like the 'cars' and 'babble' you mention above. (I assume that by 'babble' you mean people talking in the background.) These sources are very different: it might be possible to separate the sound of a car engine from a speech signal, but you will have problems separating background babble from speech, as the only difference between them is which speaker you are interested in listening to. Similarly, speech ought to be separable from steady-state engine noise, but the sounds from two different engines or motors might not be separable.

Rune