Hello, I have a question regarding the computation of SNR.

The case I am working on is a speech processing algorithm which suppresses background noise. The input signal is a noisy speech sequence defined as x[k] = s[k] + n[k], where s is clean speech and n is quasi-stationary pink noise. I would like to know how well the algorithm performs.

The SNR of the input signal is calculated as:

SNR = 10*log10(var(s)/var(n))

where var() stands for variance.

To obtain an estimate of how well the algorithm performs, I also need to calculate the SNR of the output. I do this by setting x[k] = n[k]. In that way I obtain the processed background noise Yn[k]. I then set x[k] = s[k] + n[k] and obtain the estimated clean speech Ys[k]. Then I calculate:

SNR = 10*log10(var(Ys)/var(Yn))

Any comments on this approach?

Cheers...
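The input-SNR definition above is easy to check numerically. A minimal sketch (not from the thread): synthetic zero-mean stand-ins are used for s[k] and n[k], with arbitrarily chosen lengths and amplitudes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic zero-mean stand-ins for the signals in the post (the lengths
# and amplitudes are arbitrary assumptions):
s = rng.standard_normal(16000)        # "clean speech" s[k]
n = 0.5 * rng.standard_normal(16000)  # background noise n[k]

def snr_db(signal, noise):
    """Input SNR as defined in the post: 10*log10(var(s)/var(n))."""
    return 10.0 * np.log10(np.var(signal) / np.var(noise))

x = s + n  # the noisy input x[k] = s[k] + n[k]
print(f"input SNR: {snr_db(s, n):.1f} dB")  # close to 10*log10(4) ~ 6.0 dB
```

With the noise at half the signal's standard deviation the variance ratio is about 4, so the printed SNR comes out near 6 dB.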

# Question regarding the validity of a SNR calculation

Started by ●September 24, 2006

Posted by ●September 24, 2006

John wrote:
> The SNR of the input signal is calculated as:
>
> SNR = 10*log10(var(s)/var(n))
>
> where var() stands for variance.

I have a practical comment. The 10log10 statements will only be true if you are measuring power. If you are measuring voltage (far more common) or even current then you'll need to use 20log10 [x]. (I know that's only really true if the load resistance for each element [the signal source and the signal + noise source] is the same, but in this case it should work perfectly.)

Cheers

PeteS

Posted by ●September 24, 2006

> I have a practical comment. The 10log10 statements will only be true if
> you are measuring power. If you are measuring voltage (far more common)
> or even current then you'll need to use 20log10 [x]

Thanks for replying :-)

Is variance not the same as power? The power spectrum shows the distribution of variance over a set of frequencies. Right?

As far as I remember you use a multiplication factor of 20 when you are measuring a ratio of amplitudes, that is:

SNR = 20 log10 (S_amplitude / N_amplitude)

Correct me if I am mistaken :-)

Posted by ●September 24, 2006

John wrote:
> Is variance not the same as power? The power spectrum shows the distribution
> of variance over a set of frequencies. Right?

If the average value is zero, yes. Otherwise the power is greater than the variance.

--
Mark Borgerding

Posted by ●September 24, 2006

> If the average value is zero, yes. Otherwise the power is greater than
> the variance.

Well, in this case the average value is zero (speech signals). Thanks for replying :-)

Posted by ●September 24, 2006

John wrote:
> As far as I remember you use a multiplication factor of 20 when you are
> measuring a ratio of amplitudes ...
>
> Correct me if I am mistaken :-)

Well, I would usually take 10log10 (SPwr/NPwr) or 20 log10 (S Amplitude/N Amplitude), which is the same as 10log10 (S Amplitude^2/N Amplitude^2), assuming equal resistances in the power system. From the basic identity that log a^2 = 2 log a, of course; more generally, that log a^x = x log a.

Cheers

PeteS
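PeteS's identity can be verified with a quick calculation. The amplitudes here are arbitrary example values, not anything from the thread; the point is that with equal load resistance a power ratio is the square of the amplitude ratio, so the two dB formulas agree:

```python
import math

# Arbitrary example amplitudes (not from the thread); with equal load
# resistance, power is proportional to amplitude squared.
s_amp, n_amp = 2.0, 0.5
s_pwr, n_pwr = s_amp**2, n_amp**2  # powers into a unit resistance

db_from_power = 10.0 * math.log10(s_pwr / n_pwr)
db_from_amplitude = 20.0 * math.log10(s_amp / n_amp)
print(db_from_power, db_from_amplitude)  # both 12.04... dB
```

Both expressions evaluate to about 12.04 dB, as the identity log a^2 = 2 log a requires.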

Posted by ●September 24, 2006

> To obtain an estimate of how well the algorithm performs,
> I also need to calculate the SNR of the output.
>
> I do this by setting x[k]=n[k]. In that way I obtain the
> processed background noise Yn[k].
>
> I then set x[k]=s[k]+n[k] and obtain the estimated
> clean speech Ys[k].
>
> Any comments on this approach?

I don't have experience in this sort of quantitative evaluation of noise reduction algorithms, but here's my take on this. With this calculation method, it seems to me an algorithm could get an arbitrarily high "SNR" just by detecting the presence of speech and boosting overall output volume when speech is present.

I suggest that you find the distance between Ys[k] and s[k] instead. For example, calculate the rms difference between them. If you have the time to get into it, a more perceptual distance measure might be better than rms. Unfortunately I can't think of any references offhand for perceptual distance measures, but I believe various people have developed code for them to aid in the evaluation of speech compression or noise reduction methods. (One idea might be to apply an A-weighting filter to the signals before computing the rms distance.)

(I am assuming your work is aimed at helping human listeners rather than, e.g., improving computer speech recognition accuracy. Even for the latter, A-weighting is not unreasonable.)

I think the best measurement of all is tests with human listeners (e.g., having listeners rate the quality of the processed and unprocessed output to obtain Mean Opinion Scores, or testing intelligibility if improving intelligibility is what you are after). But that may be time-consuming or expensive.

By the way, I have some noise reduction code linked at http://www.icsi.berkeley.edu/Speech/papers/gelbart-ms/pointers/ which you might find interesting for comparison.

Good luck,
David
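The rms-difference distance suggested here is straightforward to compute. A minimal sketch with synthetic signals; the model of Ys as the clean signal plus a small residual error is an assumption made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical signals, purely for illustration: s is the clean reference,
# Ys is a processed estimate modelled as s plus a small residual error.
s = rng.standard_normal(8000)
Ys = s + 0.1 * rng.standard_normal(8000)

def rms_distance(a, b):
    """Root-mean-square difference between two equal-length signals."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

print(f"RMS distance: {rms_distance(s, Ys):.3f}")  # ~0.1 by construction
```

A perceptually weighted variant would filter both signals (e.g. with an A-weighting filter, as suggested above) before taking the same rms difference.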

Posted by ●September 24, 2006

this-email-address-is-invalid wrote:
> I suggest that you find the distance between Ys[k] and s[k] instead.
> For example, calculate the rms difference between them. If you have
> the time to get into it, a more perceptual distance measure might be
> better than rms.

For distance calculation, Bryan Pellom's Objective Speech Quality Assessment toolkit at http://cslr.colorado.edu/rspl/rspl_software.html might be of interest.

Posted by ●September 24, 2006

Thanks for all the links :-) I appreciate it.

The reason why I asked about how SNR should be calculated is that it seems "wrong" to me to calculate the error signal as the difference between the original clean speech signal and the estimated speech signal. While it _is_ a true error signal, it doesn't make any sense to calculate SNR using this kind of error signal as a reference in this context. Why?

First of all, the SNR of the input is calculated as:

10log10(var(s)/var(n))

If we assume that the algorithm A operates on the input signal in a close-to-linear way and the input signal is defined as x=s+n, then A(x)=A(s)+A(n). The SNR of the output should then - in the name of consistency - be calculated as

10log10(var(A(s))/var(A(n)))

A(n) is obtained by sending the noise component for a given SNR input signal through the algorithm; that is, setting x=n for a given SNR. I haven't thought in detail about the validity of this approach, and this is why I am posting the question. But intuitively it seems like the right approach.

To verify the "linear" properties of the algorithm I tried to set x=s+n for a given SNR and saved the output O1 of the algorithm. I then set x=n for the same SNR and saved that output O2. I then played O1-O2 and it definitely sounds better than O1, so I guess that implies that O2 _is_ the remaining noise component after processing.

I don't know if I make any sense, so I hope some experts out there can correct me if my approach is not valid. Thank you.
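The O1-O2 linearity check described above can be sketched numerically. This is not from the thread: the "algorithm" here is a hypothetical moving-average filter, chosen only because it is exactly linear, so O1 - O2 recovers A(s) to machine precision; a real suppressor is only approximately linear and the residual would be larger:

```python
import numpy as np

rng = np.random.default_rng(2)
s = rng.standard_normal(4000)        # stand-in "speech"
n = 0.5 * rng.standard_normal(4000)  # stand-in noise

# Stand-in "algorithm": an 8-tap moving average. A real noise suppressor
# is only close to linear; for this exactly linear A the check is exact.
def A(x):
    h = np.ones(8) / 8.0
    return np.convolve(x, h, mode="same")

O1 = A(s + n)  # output for the noisy input
O2 = A(n)      # output for the noise alone
# If A is linear, O1 - O2 equals A(s), the processed speech component.
residual = np.max(np.abs((O1 - O2) - A(s)))
print(f"max |(O1 - O2) - A(s)| = {residual:.2e}")
```

For a nonlinear suppressor the interesting question is how large this residual gets relative to A(s); listening to O1-O2, as the post describes, is the informal version of that comparison.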

Posted by ●September 24, 2006

John wrote:
> The reason why I asked about how SNR should be calculated is that
> it seems "wrong" to me to calculate the error signal as the difference
> between the original clean speech signal and the estimated speech
> signal.

I was suggesting to use a distance measure between the original clean speech signal and the estimated speech signal as the quality measure instead of using SNR, not as a step in SNR calculation. Sorry if that wasn't clear.

Posted by ●September 25, 2006

> How would you do the variance? Sum of squared samples?
> If you say: standard deviation, which is SQRT(variance),
> we have done the same thing.

Slowly I am getting at it too: you square the RMS "voltage" to get it to a "power", because you want the S/N expressed as power 10log(), not voltage 20log(). And that way the variance creeps in.

MfG JRD

Posted by ●September 25, 2006

> It is valid only for the memoryless distortion ...

You are right, that's why I used the uLAW-coder/decoder (no filters in it assumed) as an example. The "Denoiser" would look like that:

speech --+----------------+------- "signal"
         |                |
noise --add--Denoiser--subtract--- "noise"

One could assume speech and noise are fed in with 8 kHz linear PCM. But the delay in the "Denoiser" would make that setup unusable.

MfG JRD

Posted by ●September 25, 2006

Rafael Deliano wrote:
> Simple example: i want to measure
> the distortion of a uLAW-PCM-Coder.
[...]

The biggest problem is that the difference between the input and the output is generally not an adequate measure of the distortion of the audio signal. It is valid only for memoryless distortion at small levels of -30 dB or less. A weighted difference in the spectral domain should be used.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

Posted by ●September 25, 2006

Simple example: I want to measure the distortion of a uLAW-PCM-Coder. I can do that digitally as a simulation on a sample-by-sample basis with a sine generator:

sine --+----------------------------------+------- "signal"
       |                                  |
       +--ulaw-coder--ulaw-decoder-----subtract--- "noise"

The ulaw-decoder outputs the distorted signal: signal plus noise. To get the RMS value of "signal" and "noise" I square and sum all samples and do a SQRT at the end.

How would you do the variance? Sum of squared samples? If you say: standard deviation, which is SQRT(variance), we have done the same thing.

MfG JRD
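The simulation above can be sketched end-to-end. Hedged assumptions, not a real G.711 implementation: the continuous mu-law curve with mu = 255 plus uniform 8-bit quantisation of the companded value stands in for a real uLAW codec, and 997 Hz is used instead of exactly 1 kHz so the quantisation error is not phase-locked to the sine at an 8 kHz sample rate:

```python
import numpy as np

MU = 255.0

def ulaw_encode(x):
    """Continuous mu-law compression of x in [-1, 1]."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def ulaw_decode(y):
    """Inverse mu-law expansion."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

# Sine test signal as in the post's setup; 997 Hz at 8 kHz sampling so the
# quantisation error is not locked to a few repeating sample phases.
k = np.arange(8000)
sine = 0.5 * np.sin(2 * np.pi * 997.0 * k / 8000.0)

# 8-bit quantisation of the companded value is what creates the "noise".
quantised = np.round(ulaw_encode(sine) * 127.0) / 127.0
decoded = ulaw_decode(quantised)
noise = decoded - sine  # the "subtract" branch of the diagram

snr = 10.0 * np.log10(np.sum(sine**2) / np.sum(noise**2))
print(f"mu-law SNR: {snr:.1f} dB")
```

The result lands in the high-30s of dB, in the neighbourhood expected for 8-bit mu-law companding, and is roughly level-independent over a wide range of sine amplitudes.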

Posted by ●September 25, 2006

stereo said the following on 25/09/2006 13:00:
> > > I think just calculating the variance doesn't work anymore for such a signal.
> >
> > I disagree. It depends on the context.
>
> I don't get this. If my noise is not gaussian-like, the variance of the
> signal will not be equal to the signal's power anymore, will it? Imagine a
> (nearly) rectangular signal, the histogram will just have two
> spikes... mean and variance are not applicable here in the way you do it
> above.

Variance is defined as the second central moment; assuming zero mean this is:

sigma^2 = int x^2 f(x) dx

integrated across all x. However, this is also the definition of power. So variance and power are always equal in the zero-mean case, I think.

--
Oli
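The variance-equals-power point is easy to verify on sampled data; a generic numeric check, not from the thread:

```python
import numpy as np

rng = np.random.default_rng(3)

# Zero-mean case: variance (second central moment) equals power (mean square).
x = rng.standard_normal(10000)
x -= x.mean()  # force the sample mean to (essentially) zero
power = float(np.mean(x**2))
variance = float(np.var(x))
print(power, variance)  # identical

# Nonzero mean: power = variance + mean^2, so power exceeds the variance.
y = x + 3.0
print(float(np.mean(y**2)), float(np.var(y)))  # differ by mean^2 = 9
```

Note that this holds regardless of the distribution: the decomposition power = variance + mean^2 does not require Gaussianity, only a finite second moment, so a rectangular (two-spike) signal is covered too.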

Posted by ●September 25, 2006

Hi again,

> > If you calculate it by the
> > difference between s and s_hat you should obtain the noise suppression
> > plus any distortion of the speech.
>
> How do you deduce that?

No math here, just intuitively: s_hat is the estimated clean speech (and exactly what you output to a listener). As the output of the algorithm it is built from the distorted version of the speech (since the speech is not estimated error-free) and from the remaining noise, i.e. the part of the noise that is not removed by the algorithm. Therefore the difference between s and s_hat should include distortion and noise. I'm not postulating this as an expert or so, it is just what comes to mind when analysing the problem. Corrections are very welcome.

> > Another thing I just thought about: How about noise which is not
> > gaussian, e.g. impulsive noise or deterministic interferences?
>
> My algorithm is designed for quasi-stationary noise sources. A deterministic
> source like a sine would be suppressed no problem. An impulsive noise
> source would probably be harder to suppress unless it has some cyclic
> nature to it.
>
> > I think just calculating the variance doesn't work anymore for such a signal.
>
> I disagree. It depends on the context.

I don't get this. If my noise is not gaussian-like, the variance of the signal will not be equal to the signal's power anymore, will it? Imagine a (nearly) rectangular signal: the histogram will just have two spikes... mean and variance are not applicable here in the way you do it above.

Regards
stereo

Posted by ●September 25, 2006

> Imagine that your algorithm does not only change the noise but also the
> speech signal. I think, with your calculation you get the amount of
> noise suppression by the algorithm.

The algorithm changes both the unknown noise component n and the unknown speech component s in the known noisy input speech sequence x. And yes, the amount of noise suppression is _exactly_ what I want to measure, since I want to measure the performance of my _noise suppression_ algorithm.

> If you calculate it by the
> difference between s and s_hat you should obtain the noise suppression
> plus any distortion of the speech.

How do you deduce that?

> Your approach on the other hand does
> only get the noise suppression, doesn't it *?*

Yes.

> Another thing I just thought about: How about noise which is not
> gaussian, e.g. impulsive noise or deterministic interferences?

My algorithm is designed for quasi-stationary noise sources. A deterministic source like a sine would be suppressed no problem. An impulsive noise source would probably be harder to suppress unless it has some cyclic nature to it.

> I think
> just calculating the variance doesn't work anymore for such a signal.

I disagree. It depends on the context.

I am really grateful that you took time out to post a reply. It is very inspiring to discuss ideas and problems in this forum.

Cheers...

Posted by ●September 25, 2006

Hi John,

interesting question. I'm with you in your way of evaluating the suppression of noise by an algorithm.

> However, in the papers I have read the authors
> always calculate the ratio as s/e where e is defined as e=s-s_hat, where s is
> the original clean speech and s_hat is the estimated clean speech. In my mind,
> it makes no sense to first calculate the SNR of the input signal as s/n and
> then postulate that the "noise" component in the output signal is the
> difference between the original speech signal and the estimated speech.

Imagine that your algorithm does not only change the noise but also the speech signal. I think, with your calculation you get the amount of noise suppression by the algorithm. If you calculate it by the difference between s and s_hat you should obtain the noise suppression plus any distortion of the speech. Your approach on the other hand does only get the noise suppression, doesn't it *?*

Another thing I just thought about: How about noise which is not gaussian, e.g. impulsive noise or deterministic interferences? I think just calculating the variance doesn't work anymore for such a signal.

Regards
stereo

Posted by ●September 25, 2006

"Rafael Deliano" <Rafael_Deliano@t-online.de> wrote in message news:45178F7A.E09422A@t-online.de...

> > could get an arbitrarily high "SNR"
>
> To put it more bluntly: there is nothing wrong with defining
> a new figure of merit. But giving it the name of an established
> one would be confusing.

I don't quite understand what you mean by a "new figure of merit". SNR is defined as 10 multiplied by the 10-base logarithm of the ratio between signal power and noise power. However, in the papers I have read the authors always calculate the ratio as s/e, where e is defined as e = s - s_hat, with s the original clean speech and s_hat the estimated clean speech. In my mind, it makes no sense to first calculate the SNR of the input signal as s/n and then postulate that the "noise" component in the output signal is the difference between the original speech signal and the estimated speech.

An algorithm, A, which operates more or less in a linear way on an input signal s+n will output A(s)+A(n). Hence, the SNR of the output should be measured as 10log10(var(A(s))/var(A(n))) and not 10log10(var(A(s))/var(s-s_hat)). In the latter case, the SNR ratio is not even comparable with the SNR of the input.
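The two output-SNR definitions being debated can be compared on synthetic data. A hedged sketch: the "algorithm" A is a hypothetical linear lowpass (a real suppressor is nonlinear), and s and n are white-noise stand-ins, so the numbers only illustrate how the two definitions can diverge:

```python
import numpy as np

rng = np.random.default_rng(4)
s = rng.standard_normal(20000)        # white-noise stand-in for "speech"
n = 0.8 * rng.standard_normal(20000)  # white background noise

# Hypothetical linear "algorithm": a moving-average lowpass, used only so
# that A(s + n) = A(s) + A(n) holds exactly.
def A(x):
    h = np.ones(16) / 16.0
    return np.convolve(x, h, mode="same")

def db(power_ratio):
    return 10.0 * np.log10(power_ratio)

snr_in = db(np.var(s) / np.var(n))
snr_out = db(np.var(A(s)) / np.var(A(n)))    # John's definition
s_hat = A(s + n)
snr_alt = db(np.var(s) / np.var(s - s_hat))  # the papers' s/e definition
print(f"in: {snr_in:.2f} dB  out: {snr_out:.2f} dB  s/(s-s_hat): {snr_alt:.2f} dB")
```

Here the s/e number comes out much lower than John's output SNR, because s - s_hat also charges the algorithm for its distortion of the speech component; that is exactly the distinction discussed earlier in the thread.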

Posted by ●September 25, 2006

> could get an arbitrarily high "SNR"

To put it more bluntly: there is nothing wrong with defining a new figure of merit. But giving it the name of an established one would be confusing.

> Mean Opinion Scores

Low-distortion systems like PCM codecs use S/N plots to judge performance. For simple waveform coders like CVSD, S/N already failed: slope overload would give poor S/N, but listeners wouldn't object. This was already known in the 60ies when work on vocoders started and a need for measuring their quality was emerging. All the more complex coders since then are in the end judged by MOS. Which, done properly, is tedious, expensive, and not too reliable either. And obviously impractical for daily work. No cheap and simple way to judge how noise & distortion affect the quality of speech has emerged in all these years.

> > where var() stands for variance.

The usual "signal" for testing S/N in a phone channel would be a 1 kHz sine. A filter limiting the noise, like C-message weighting, would be used. This S/N calculation is based on power.

Very simple (but murky) statistical models of speech have been used in the past. For defining the PCM codecs, the long-time-average pdf of speech was used, which is supposed to be a gamma distribution (similar: Laplacian). But the short-term pdf is Gaussian. In the 60ies some people have used white-noise generators with form filters that give the shape of the long-time-average spectrum of speech. For some applications that may be a more useful representation of speech than a sine.

MfG JRD
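The "white noise through a form filter" idea can be sketched very crudely. Assumptions: a first-order lowpass with an arbitrarily chosen pole stands in for a real speech-spectrum shaping filter, which would instead be fitted to measured long-time-average speech spectra:

```python
import numpy as np

rng = np.random.default_rng(5)
white = rng.standard_normal(16000)

def one_pole_lowpass(x, a=0.9):
    """First-order recursive lowpass; the pole a=0.9 is an arbitrary choice."""
    y = np.empty_like(x)
    acc = 0.0
    for i, v in enumerate(x):
        acc = (1.0 - a) * v + a * acc
        y[i] = acc
    return y

shaped = one_pole_lowpass(white)

# The shaped noise should carry far less high-frequency energy than the
# flat-spectrum input: compare power in the top half of the one-sided spectrum.
def hf_fraction(x):
    X = np.abs(np.fft.rfft(x)) ** 2
    return float(X[len(X) // 2:].sum() / X.sum())

print(hf_fraction(white), hf_fraction(shaped))
```

The white input puts roughly half its power in the top half of the band, while the shaped noise concentrates its power at low frequencies, which is the downward spectral tilt the form filter is meant to mimic.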