DSPRelated.com
Forums

Question regarding the validity of a SNR calculation

Started by John September 24, 2006
Hello,

I have a question regarding the computation of SNR.

The case I am working on is a speech processing algorithm which
suppresses background noise. The input signal is a noisy speech
sequence defined as x[k]=s[k]+n[k] where s is clean speech and
n is quasi-stationary pink noise.

I would like to know how well the algorithm performs.

The SNR of the input signal is calculated as:

SNR=10*log10(var(s)/var(n))

where var() stands for variance.
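
As a minimal sketch of the input-SNR formula above: the signals here are stand-ins (a sine wave for the speech and white Gaussian noise rather than pink noise), and the helper name `snr_db` is hypothetical, chosen only for illustration.

```python
import math
import random
from statistics import pvariance

def snr_db(signal, noise):
    """SNR in dB, computed as 10*log10(var(signal)/var(noise))."""
    return 10.0 * math.log10(pvariance(signal) / pvariance(noise))

random.seed(0)
# Stand-in signals (assumptions, not real speech or pink noise).
s = [math.sin(2 * math.pi * 0.01 * k) for k in range(10000)]
n = [random.gauss(0.0, 0.1) for _ in range(10000)]

print("input SNR (dB):", round(snr_db(s, n), 1))
```

Because the variance removes the mean, this only matches a power ratio when both signals are zero-mean.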

To obtain an estimate of how well the algorithm performs,
I also need to calculate the SNR of the output.

I do this by setting x[k]=n[k]. In that way I obtain the
processed background noise Yn[k].

I then set x[k]=s[k]+n[k] and obtain the estimated
clean speech Ys[k].

Then I calculate:

SNR=10*log10(var(Ys)/var(Yn))


Any comments to this approach?


Cheers...


John wrote:
> Hello,
>
> I have a question regarding the computation of SNR.
>
> The case I am working on is a speech processing algorithm which
> suppresses background noise. The input signal is a noisy speech
> sequence defined as x[k]=s[k]+n[k] where s is clean speech and
> n is quasi-stationary pink noise.
>
> I would like to know how well the algorithm performs.
>
> The SNR of the input signal is calculated as:
>
> SNR=10*log10(var(s)/var(n))
>
> where var() stands for variance.
>
> To obtain an estimate of how well the algorithm performs,
> I also need to calculate the SNR of the output.
>
> I do this by setting x[k]=n[k]. In that way I obtain the
> processed background noise Yn[k].
>
> I then set x[k]=s[k]+n[k] and obtain the estimated
> clean speech Ys[k].
>
> Then I calculate:
>
> SNR=10*log10(var(Ys)/var(Yn))
>
> Any comments to this approach?
>
> Cheers...

I have a practical comment. The 10log10 statements will only be true if
you are measuring power. If you are measuring voltage (far more common)
or even current, then you'll need to use 20log10 [x].

(I know that's only really true if the load resistance for each element
[the signal source and the signal + noise source] is the same, but in
this case it should work perfectly.)

Cheers

PeteS

> I have a practical comment. The 10log10 statements will only be true if
> you are measuring power. If you are measuring voltage (far more common)
> or even current then you'll need to use 20log10 [x]
>

Thanks for replying :-)

Is variance not the same as power? The power spectrum shows the distribution
of variance over a set of frequencies. Right?

As far as I remember you use a multiplication factor of 20 when you are
measuring the ratio of squared amplitude, that is:

SNR= 20 log10 (S_amplitude^2 / N_amplitude^2)

Correct me if I am mistaken :-)

John wrote:
>> I have a practical comment. The 10log10 statements will only be true if
>> you are measuring power. If you are measuring voltage (far more common)
>> or even current then you'll need to use 20log10 [x]
>>
>
> Thanks for replying :-)
>
> Is variance not the same as power? The power spectrum shows the distribution
> of variance over a set of frequencies. Right?

If the average value is zero, yes. Otherwise the power is greater than
the variance.

-- Mark Borgerding

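The relationship behind this remark is that mean power equals variance plus the squared mean. A small sketch, with a hypothetical helper `mean_power` and made-up sample values:

```python
from statistics import fmean, pvariance

def mean_power(x):
    """Mean of the squared samples (average power of the sequence)."""
    return fmean([v * v for v in x])

zero_mean = [1.0, -1.0, 1.0, -1.0]
with_offset = [v + 0.5 for v in zero_mean]  # same shape plus a DC offset

for name, x in (("zero mean", zero_mean), ("with offset", with_offset)):
    print(name, "power:", mean_power(x), "variance:", pvariance(x))
```

For the zero-mean sequence the two numbers agree; for the offset sequence the power exceeds the variance by the squared mean.
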
> If the average value is zero, yes. Otherwise the power is greater than
> the variance.

Well, in this case the average value is zero (speech signals). Thanks for replying :-)
John wrote:
> > I have a practical comment. The 10log10 statements will only be true if
> > you are measuring power. If you are measuring voltage (far more common)
> > or even current then you'll need to use 20log10 [x]
> >
>
> Thanks for replying :-)
>
> Is variance not the same as power? The power spectrum shows the distribution
> of variance over a set of frequencies. Right?
>
> As far as I remember you use a multiplication factor of 20 when you are
> measuring the ratio of squared amplitude, that is :
>
> SNR= 20 log10 (S_amplitude^2 / N_amplitude^2)
>
> Correct me if I am mistaken :-)

Well, I would usually take

10log10 (SPwr/NPwr)

or

20 log10 (S Amplitude/N Amplitude)

which is the same as

10log10 (S Amplitude^2/N Amplitude^2)

assuming equal resistances in the power system. This follows from the
basic identity that log a^2 = 2 log a, of course; more generally,
log a^x = x log a.

Cheers

PeteS

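The equivalence can be checked numerically. In this sketch the amplitudes are arbitrary example values and the function names are hypothetical:

```python
import math

def db_from_power(p_sig, p_noise):
    """dB from a power (or variance) ratio: factor 10."""
    return 10.0 * math.log10(p_sig / p_noise)

def db_from_amplitude(a_sig, a_noise):
    """dB from an amplitude (e.g. RMS voltage) ratio: factor 20."""
    return 20.0 * math.log10(a_sig / a_noise)

a_s, a_n = 2.0, 0.5  # arbitrary example amplitudes

same_a = db_from_amplitude(a_s, a_n)
same_b = db_from_power(a_s ** 2, a_n ** 2)
doubled = db_from_amplitude(a_s ** 2, a_n ** 2)  # factor 20 on squared amplitudes

print(same_a, same_b, doubled)
```

The first two values agree, while applying the factor 20 to a ratio of squared amplitudes doubles the dB figure.
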
> To obtain an estimate of how well the algorithm performs,
> I also need to calculate the SNR of the output.
>
> I do this by setting x[k]=n[k]. In that way I obtain the
> processed background noise Yn[k].
>
> I then set x[k]=s[k]+n[k] and obtain the estimated
> clean speech Ys[k].
>
> Then I calculate:
>
> SNR=10*log10(var(Ys)/var(Yn))
>
> Any comments to this approach?

I don't have experience in this sort of quantitative evaluation of noise
reduction algorithms, but here's my take on this.

With this calculation method, it seems to me an algorithm could get an
arbitrarily high "SNR" just by detecting the presence of speech and
boosting overall output volume when speech is present.

I suggest that you find the distance between Ys[k] and s[k] instead.
For example, calculate the rms difference between them. If you have
the time to get into it, a more perceptual distance measure might be
better than rms. Unfortunately I can't think of any references offhand
for perceptual distance measures but I believe various people have
developed code for them to aid in the evaluation of speech compression
or noise reduction methods. (One idea might be to apply an A-weighting
filter to the signals before computing the rms distance.)

(I am assuming your work is aimed at helping human listeners rather than,
e.g., improving computer speech recognition accuracy. Even for the latter,
A-weighting is not unreasonable.)

I think the best measurement of all is tests with human listeners (e.g.,
having listeners rate the quality of the processed and unprocessed output
to obtain Mean Opinion Scores, or testing intelligibility if improving
intelligibility is what you are after). But that may be time-consuming
or expensive.

By the way I have some noise reduction code linked at
http://www.icsi.berkeley.edu/Speech/papers/gelbart-ms/pointers/
which you might find interesting for comparison.

Good luck,

David

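The volume-boost objection and the suggested rms distance can both be sketched with toy signals. Everything here is an assumption made for illustration: two sinusoids stand in for speech and noise, and the "algorithm" is a hypothetical one that passes noise-only input unchanged and simply multiplies the input by 10 when speech is present.

```python
import math
from statistics import pvariance

def var_ratio_db(a, b):
    """10*log10(var(a)/var(b)), the SNR measure from the original post."""
    return 10.0 * math.log10(pvariance(a) / pvariance(b))

def rms_difference(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))

N = 1000
s = [math.sin(2 * math.pi * 0.01 * k) for k in range(N)]        # stand-in speech
n = [0.1 * math.sin(2 * math.pi * 0.13 * k) for k in range(N)]  # stand-in noise
x = [a + b for a, b in zip(s, n)]

# Hypothetical "algorithm" that removes no noise at all:
yn = list(n)                 # output for x = n (unchanged)
ys = [10.0 * v for v in x]   # output for x = s + n (boosted by 10)

print("input  var-ratio SNR (dB):", round(var_ratio_db(s, n), 2))
print("output var-ratio SNR (dB):", round(var_ratio_db(ys, yn), 2))
print("rms distance, input  vs s:", round(rms_difference(x, s), 4))
print("rms distance, output vs s:", round(rms_difference(ys, s), 4))
```

In this toy case the variance-ratio "SNR" improves although no noise was removed, while the rms distance to the clean signal gets worse. A fair rms comparison would presumably also need the output level matched to that of s.
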
this-email-address-is-invalid wrote:

> I suggest that you find the distance between Ys[k] and s[k] instead.
> For example, calculate the rms difference between them. If you have
> the time to get into it, a more perceptual distance measure might be
> better than rms. Unfortunately I can't think of any references offhand
> for perceptual distance measures but I believe various people have
> developed code for them to aid in the evaluation of speech compression
> or noise reduction methods. (One idea might be to apply an A-weighting
> filter to the signals before computing the rms distance.)
>

For distance calculation, Bryan Pellom's Objective Speech Quality Assessment toolkit at http://cslr.colorado.edu/rspl/rspl_software.html might be of interest.
Thanks for all the links :-)

I appreciate it..

The reason why I asked about how SNR should be calculated is that
it seems "wrong" to me to calculate the error signal as the difference
between the original clean speech signal and the estimated speech
signal. While it _is_ a true error signal, it doesn't make any sense
to calculate SNR using this kind of error signal as a reference in
this context. Why?

First of all, the SNR of the input is calculated as:

10log10(var(s)/var(n))

If we assume that the algorithm A operates on the input signal in
a close-to-linear way and the input signal is defined as x=s+n, then

A(x)=A(s)+A(n)

The SNR of the output should then - in the name of consistency - be
calculated as

10log10(var(A(s))/var(A(n)))

A(n) is obtained by sending the noise component of the input signal,
scaled for the given input SNR, through the algorithm; that is, by
setting x=n.

I haven't thought in detail about the validity of this approach, which
is why I am posting the question. But intuitively it seems like the right
approach.

To verify the "linear" properties of the algorithm I tried to set x=s+n
for a given SNR and saved the output O1 of the algorithm.

I then set x=n for the same SNR and saved that output O2.

I then played O1-O2 and it definitely sounds better than O1 so I guess
that implies that O2 _is_ the remaining noise component after processing.
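
Since the clean speech s is available, the assumption A(x)=A(s)+A(n) could also be checked numerically rather than by listening. The sketch below is only an illustration: the two "algorithms" (a moving-average filter and a hard noise gate) are hypothetical stand-ins for the real suppressor, and the signals are toy sinusoids.

```python
import math

def moving_average(x, taps=4):
    """Linear example: causal moving-average FIR filter (zero-padded start)."""
    return [sum(x[max(0, k - taps + 1): k + 1]) / taps for k in range(len(x))]

def noise_gate(x, threshold=0.3):
    """Nonlinear example: zero every sample whose magnitude is below threshold."""
    return [v if abs(v) >= threshold else 0.0 for v in x]

def additivity_error_db(algo, s, n):
    """Energy of A(s+n) - (A(s) + A(n)) relative to the energy of A(s+n), in dB."""
    x = [a + b for a, b in zip(s, n)]
    yx, ys, yn = algo(x), algo(s), algo(n)
    err = sum((p - q - r) ** 2 for p, q, r in zip(yx, ys, yn))
    ref = sum(p * p for p in yx)
    # The tiny constant avoids log10(0) when the error is exactly zero.
    return 10.0 * math.log10((err + 1e-300) / ref)

N = 1000
s = [math.sin(2 * math.pi * 0.01 * k) for k in range(N)]        # stand-in speech
n = [0.1 * math.sin(2 * math.pi * 0.13 * k) for k in range(N)]  # stand-in noise

print("moving average:", additivity_error_db(moving_average, s, n))
print("noise gate:    ", additivity_error_db(noise_gate, s, n))
```

A strongly negative figure suggests the decomposition holds for that signal pair. A figure near the level of the noise itself, as the gate should give here, suggests A(n) is not the noise that remains in A(s+n).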

I don't know if I make any sense, so I hope some experts out there can
correct me if my approach is not valid.

Thank you.







John wrote:

> The reason why I asked about how SNR should be calculated is that
> it seems "wrong" to me to calculate the error signal as the difference
> between the original clean speech signal and the estimated speech
> signal. While it _is_ a true error signal, it doesn't make any sense
> to calculate SNR using this kind of error signal as a reference in
> this context.

I was suggesting to use a distance measure between the original clean speech signal and the estimated speech signal as the quality measure instead of using SNR, not as a step in SNR calculation. Sorry if that wasn't clear.