Question regarding the validity of a SNR calculation

Started by September 24, 2006
```Hello,

I have a question regarding the computation of SNR.

The case I am working on is a speech processing algorithm which
suppresses background noise. The input signal is a noisy speech
sequence defined as x[k]=s[k]+n[k] where s is clean speech and
n is quasi-stationary pink noise.

I would like to know how well the algorithm performs.

The SNR of the input signal is calculated as:

SNR=10*log10(var(s)/var(n))

where var() stands for variance.

To obtain an estimate of how well the algorithm performs,
I also need to calculate the SNR of the output.

I do this by setting x[k]=n[k]. In that way I obtain the
processed background noise Yn[k].

I then set x[k]=s[k]+n[k] and obtain the estimated
clean speech Ys[k].

Then I calculate:

SNR=10*log10(var(Ys)/var(Yn))

Cheers...

```
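The two-run procedure described in the post above can be sketched as follows. This is only an illustration under stated assumptions: `moving_average` is a hypothetical stand-in for the noise suppressor (not the poster's algorithm), the "speech" is a slow sine, and the noise is white Gaussian rather than pink.

```python
import math
import random
import statistics


def snr_db(signal, noise):
    """10*log10(var(signal)/var(noise)), the definition used in the post."""
    return 10.0 * math.log10(
        statistics.pvariance(signal) / statistics.pvariance(noise)
    )


def moving_average(x, width=5):
    """Hypothetical stand-in for the suppressor (NOT the poster's algorithm)."""
    return [sum(x[i:i + width]) / width for i in range(len(x) - width + 1)]


rng = random.Random(0)
N = 4000
s = [math.sin(2 * math.pi * 0.01 * k) for k in range(N)]  # toy "speech"
n = [rng.gauss(0.0, 0.5) for _ in range(N)]               # toy noise (white, not pink)
x = [sk + nk for sk, nk in zip(s, n)]

snr_in = snr_db(s, n)

Yn = moving_average(n)  # run 1: x[k] = n[k]
Ys = moving_average(x)  # run 2: x[k] = s[k] + n[k]
snr_out = snr_db(Ys, Yn)

print(f"input SNR  = {snr_in:.1f} dB")
print(f"output SNR = {snr_out:.1f} dB")
```

Note that Ys still contains whatever noise survives the processing, so var(Ys) is not purely speech power in this scheme.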
```John wrote:
> Hello,
>
> I have a question regarding the computation of SNR.
>
> The case I am working on is a speech processing algorithm which
> suppresses background noise. The input signal is a noisy speech
> sequence defined as x[k]=s[k]+n[k] where s is clean speech and
> n is quasi-stationary pink noise.
>
> I would like to know how well the algorithm performs.
>
> The SNR of the input signal is calculated as:
>
> SNR=10*log10(var(s)/var(n))
>
> where var() stands for variance.
>
> To obtain an estimate of how well the algorithm performs,
> I also need to calculate the SNR of the output.
>
> I do this by setting x[k]=n[k]. In that way I obtain the
> processed background noise Yn[k].
>
> I then set x[k]=s[k]+n[k] and obtain the estimated
> clean speech Ys[k].
>
> Then I calculate:
>
> SNR=10*log10(var(Ys)/var(Yn))
>
>
> Any comments to this approach?
>
>
> Cheers...

I have a practical comment. The 10log10 statements will only be true if
you are measuring power. If you are measuring voltage (far more common)
or even current then you'll need to use 20log10 [x]

(I know that only really true if the load resistance for each element
[the signal source and the signal + noise source] is the same, but in
this case it should work perfectly.)

Cheers

PeteS

```
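The 10log10 versus 20log10 point can be checked numerically. A minimal sketch, assuming toy zero-mean Gaussian sequences in place of real speech and noise: variance is a power-like quantity and takes the factor 10, standard deviation is an amplitude-like quantity and takes the factor 20, and both give the same number of dB.

```python
import math
import random
import statistics

rng = random.Random(1)
s = [rng.gauss(0.0, 2.0) for _ in range(1000)]  # toy zero-mean "signal"
n = [rng.gauss(0.0, 0.5) for _ in range(1000)]  # toy zero-mean "noise"

# Variance is power-like, so it takes the factor 10.
snr_from_variance = 10.0 * math.log10(
    statistics.pvariance(s) / statistics.pvariance(n)
)

# Standard deviation is amplitude-like, so it takes the factor 20.
snr_from_std = 20.0 * math.log10(statistics.pstdev(s) / statistics.pstdev(n))

print(f"{snr_from_variance:.3f} dB from variances")
print(f"{snr_from_std:.3f} dB from standard deviations")
```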
```> I have a practical comment. The 10log10 statements will only be true if
> you are measuring power. If you are measuring voltage (far more common)
> or even current then you'll need to use 20log10 [x]
>

Is variance not the same as power? The power spectrum shows the distribution
of variance over a set of frequencies. Right?

As far as I remember you use a multiplication factor of 20 when you are
measuring the ratio of squared amplitude, that is :

SNR= 20 log10 (S_amplitude^2 / N_amplitude^2)

Correct me if I am mistaken :-)

```
```John wrote:
>> I have a practical comment. The 10log10 statements will only be true if
>> you are measuring power. If you are measuring voltage (far more common)
>> or even current then you'll need to use 20log10 [x]
>>
>
>
> Is variance not the same as power? The power spectrum shows the distribution
> of variance over a set of frequencies. Right?

If the average value is zero, yes.  Otherwise the power is greater than
the variance.

--
Mark Borgerding

```
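The statement above amounts to: mean power = variance + mean^2. A small sketch with made-up sample values (the list `x` is purely illustrative):

```python
import statistics


def mean_power(x):
    """Average of the squared samples."""
    return sum(v * v for v in x) / len(x)


x = [1.0, 3.0, 2.0, 4.0, 0.0]  # illustrative samples with a non-zero mean
mu = statistics.mean(x)
var = statistics.pvariance(x)
power = mean_power(x)

# Removing the mean makes power and variance coincide.
centered = [v - mu for v in x]
power_centered = mean_power(centered)

print(mu, var, power, power_centered)
```

With a non-zero mean the power exceeds the variance by exactly the squared mean; after centring, the two agree.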
```> If the average value is zero, yes.  Otherwise the power is greater than
> the variance.

Well, in this case the average value is zero (speech signals).

```
```John wrote:
> > I have a practical comment. The 10log10 statements will only be true if
> > you are measuring power. If you are measuring voltage (far more common)
> > or even current then you'll need to use 20log10 [x]
> >
>
>
> Is variance not the same as power? The power spectrum shows the distribution
> of variance over a set of frequencies. Right?
>
> As far as I remember you use a multiplication factor of 20 when you are
> measuring the ratio of squared amplitude, that is :
>
> SNR= 20 log10 (S_amplitude^2 / N_amplitude^2)
>
> Correct me if I am mistaken :-)

Well, I would usually take 10log10 (SPwr/NPwr) or 20 log10 (S
Amplitude/N Amplitude) which is the same as 10log10 (S Amplitude^2/N
Amplitude^2) assuming equal resistances in the power system

From the basic identity that log a^2 = 2 log a, of course;  more
generally that log a^x = x log a.

Cheers

PeteS

```
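The "equal resistances" caveat can be written out. As a sketch, with V_s and V_n the RMS voltages and R_s and R_n the load resistances (symbols introduced here for illustration, assuming purely resistive loads):

```latex
\begin{align*}
\mathrm{SNR}_{\mathrm{dB}}
  &= 10\log_{10}\frac{P_s}{P_n}
   = 10\log_{10}\frac{V_s^2/R_s}{V_n^2/R_n} \\
  &= 20\log_{10}\frac{V_s}{V_n} + 10\log_{10}\frac{R_n}{R_s}.
\end{align*}
```

The second term vanishes when R_s = R_n. Applying the factor 20 to a ratio of squared amplitudes would instead give 40 log10(V_s/V_n), which is twice the intended dB value.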
```> To obtain an estimate of how well the algorithm performs,
> I also need to calculate the SNR of the output.
>
> I do this by setting x[k]=n[k]. In that way I obtain the
> processed background noise Yn[k].
>
> I then set x[k]=s[k]+n[k] and obtain the estimated
> clean speech Ys[k].
>
> Then I calculate:
>
> SNR=10*log10(var(Ys)/var(Yn))
>
> Any comments to this approach?

I don't have experience in this sort of quantitative evaluation of
noise reduction algorithms, but here's my take on this.

With this calculation method, it seems to me an algorithm could get an
arbitrarily high "SNR" just by detecting the presence of speech and
boosting overall output volume when speech is present.

I suggest that you find the distance between Ys[k] and s[k] instead.
For example, calculate the rms difference between them.  If you have
the time to get into it, a more perceptual distance measure might be
better than rms.  Unfortunately I can't think of any references offhand
for perceptual distance measures but I believe various people have
developed code for them to aid in the evaluation of speech compression
or noise reduction methods.  (One idea might be to apply an A-weighting
filter to the signals before computing the rms distance.)

(I am assuming your work is aimed at helping human listeners rather
than, e.g., improving computer speech recognition accuracy.  Even for
the latter, A-weighting is not unreasonable.)

I think the best measurement of all is tests with human listeners
(e.g., having listeners rate the quality of the processed and
unprocessed output to obtain Mean Opinion Scores, or testing
intelligibility if improving intelligibility is what you are after).
But that may be time-consuming or expensive.

By the way I have some noise reduction code linked at
http://www.icsi.berkeley.edu/Speech/papers/gelbart-ms/pointers/
which you might find interesting for comparison.

Good luck,
David

```
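The concern about boosting the output when speech is present, and the suggested RMS distance, can both be illustrated with a toy case. This is a sketch under assumptions: `cheating_algorithm` is hypothetical and removes no noise at all, and its `speech_present` flag stands in for a speech detector.

```python
import math
import random
import statistics


def snr_db(signal, noise):
    return 10.0 * math.log10(
        statistics.pvariance(signal) / statistics.pvariance(noise)
    )


def rms_distance(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


def cheating_algorithm(x, speech_present, gain=10.0):
    """Removes no noise; only turns the volume up when speech is flagged."""
    return [gain * v for v in x] if speech_present else list(x)


rng = random.Random(2)
N = 4000
s = [math.sin(2 * math.pi * 0.01 * k) for k in range(N)]
n = [rng.gauss(0.0, 0.5) for _ in range(N)]
x = [sk + nk for sk, nk in zip(s, n)]

Ys = cheating_algorithm(x, speech_present=True)
Yn = cheating_algorithm(n, speech_present=False)

snr_in = snr_db(s, n)
snr_out = snr_db(Ys, Yn)               # inflated by the gain alone
dist_unprocessed = rms_distance(x, s)  # raw noisy input versus clean speech
dist_processed = rms_distance(Ys, s)   # "enhanced" output versus clean speech

print(f"SNR in/out: {snr_in:.1f} / {snr_out:.1f} dB")
print(f"RMS distance before/after: {dist_unprocessed:.2f} / {dist_processed:.2f}")
```

The variance-ratio figure improves by more than 20 dB although nothing was suppressed, while the RMS distance to the clean speech gets worse. A plain RMS distance is itself sensitive to overall gain and delay, so in practice the signals would need to be level-matched and time-aligned first.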
```this-email-address-is-invalid wrote:

> I suggest that you find the distance between Ys[k] and s[k] instead.
> For example, calculate the rms difference between them.  If you have
> the time to get into it, a more perceptual distance measure might be
> better than rms.  Unfortunately I can't think of any references offhand
> for perceptual distance measures but I believe various people have
> developed code for them to aid in the evaluation of speech compression
> or noise reduction methods.  (One idea might be to apply an A-weighting
> filter to the signals before computing the rms distance.)
>

For distance calculation, Bryan Pellom's Objective Speech Quality
might be of interest.

```
```Thanks for all the links :-)

I appreciate it..

The reason why I asked about how SNR should be calculated is that
it seems "wrong" to me to calculate the error signal as the difference
between the original clean speech signal and the estimated speech
signal. While it _is_ a true error signal, it doesn't make any sense
to calculate SNR using this kind of error signal as a reference in
this context. Why?

First of all, the SNR of the input is calculated as:

10log10(var(s)/var(n))

If we assume that the algorithm A operates on the input signal in
a close-to-linear way and the input signal is defined as x=s+n, then

A(x)=A(s)+A(n)

The SNR of the output should then - in the name of consistency - be
calculated as

10log10(var(A(s))/var(A(n)))

A(n) is obtained by sending the noise component for a given SNR input
signal through the algorithm; that is setting x=n for a given SNR.

is why I am posting the question. But intuitively it seems like the right
approach.

To verify the "linear" properties of the algorithm I tried to set x=s+n
for a given SNR and saved the output O1 of the algorithm.

I then set x=n for the same SNR and saved that output O2.

I then played O1-O2 and it definitely sounds better than O1 so I guess
that implies that O2 _is_ the remaining noise component after processing.

I don't know if I make any sense, so I hope some experts out there can
correct me if my approach is not valid.

Thank you.

```
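The additivity assumption A(s+n) = A(s) + A(n) from the post above can be tested directly. A minimal sketch, assuming two hypothetical stand-ins for the algorithm: a moving average (linear) and a simple noise gate (nonlinear). Neither is the poster's suppressor.

```python
import math
import random


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def additivity_error(algorithm, s, n):
    """Relative size of A(s+n) - A(s) - A(n); zero for a truly linear A."""
    x = [sk + nk for sk, nk in zip(s, n)]
    out_x, out_s, out_n = algorithm(x), algorithm(s), algorithm(n)
    residual = [a - b - c for a, b, c in zip(out_x, out_s, out_n)]
    return rms(residual) / rms(out_x)


def moving_average(x, width=5):
    """Linear, time-invariant example."""
    return [sum(x[i:i + width]) / width for i in range(len(x) - width + 1)]


def noise_gate(x, threshold=0.5):
    """Nonlinear example: mutes samples whose magnitude is below the threshold."""
    return [v if abs(v) > threshold else 0.0 for v in x]


rng = random.Random(3)
N = 4000
s = [math.sin(2 * math.pi * 0.01 * k) for k in range(N)]
n = [rng.gauss(0.0, 0.3) for _ in range(N)]

err_linear = additivity_error(moving_average, s, n)
err_gate = additivity_error(noise_gate, s, n)

print(f"moving average: {err_linear:.2e}")
print(f"noise gate:     {err_gate:.2f}")
```

A small relative error supports treating the output for x=n as the residual noise; a large one means the algorithm treats noise differently when speech is present, which is likely for any suppressor that adapts to its input.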
```John wrote:

> The reason why I asked about how SNR should be calculated is that
> it seems "wrong" to me to calculate the error signal as the difference
> between the original clean speech signal and the estimated speech
> signal. While it _is_ a true error signal, it doesn't make any sense
> to calculate SNR using this kind of error signal as a reference in
> this context.

I was suggesting to use a distance measure between the original clean
speech signal and the estimated speech signal as the quality measure
instead of using SNR, not as a step in SNR calculation.  Sorry if that
wasn't clear.

```
```> How would you do the variance ? Sum of squared samples ?
> If you say: standard deviation, which is SQRT(variance)
> we have done the same thing.
Slowly I am getting it too: you square the RMS "voltage"
to get a "power", because you want the S/N expressed
as power, 10log(), not as voltage, 20log().
And that is how the variance creeps in.

MfG  JRD
```
```> It is valid only for the memoryless distortion ...
You are right, that's why I used the uLAW coder/decoder (no filters
assumed in it) as an example.
The "Denoiser" would look like that:

speech --+----------------+------- "signal"
|                |

One could assume speech and noise are fed in with 8kHz linear
PCM. But the delay in the "Denoiser" would make that
setup unusable.

MfG  JRD
```
```
Rafael Deliano wrote:

> Simple example: i want to measure
> the distortion of a uLAW-PCM-Coder.

[...]

The biggest problem is that the difference between the input and the
output is generally not an adequate measure of the distortion of an
audio signal. It is valid only for the memoryless distortion at
small levels of -30 dB or less.
The weighted difference in the spectral domain should be used.

DSP and Mixed Signal Design Consultant

http://www.abvolt.com
```
```Simple example: i want to measure
the distortion of a uLAW-PCM-Coder. I can do that
digital as a simulation on a sample by sample basis with a
sine generator:

sine --+----------------------------------+------- "signal"
|                                  |
+--ulaw-coder--ulaw-decoder-----subtract--- "noise"

The ulaw-decoder outputs the distorted signal: signal plus noise.

To get the RMS-value of "signal" and "noise" i square and
sum all samples and do a SQRT at the end.
How would you do the variance ? Sum of squared samples ?
If you say: standard deviation, which is SQRT(variance)
we have done the same thing.

MfG  JRD
```
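The measurement setup in the post above can be simulated in a few lines. This is a sketch under assumptions: it uses the continuous mu-law formula with mu = 255 and a uniform quantiser in the compressed domain, not the bit-exact segmented G.711 codec, and the 1020 Hz tone at 8 kHz sampling is an assumed choice so that the samples do not repeat every 8.

```python
import math

MU = 255.0


def ulaw_compress(x):
    """Continuous mu-law compression of x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def ulaw_expand(y):
    """Inverse of ulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def ulaw_codec(x, levels=127):
    """Compress, quantise uniformly to roughly 8 bits, then expand."""
    code = round(ulaw_compress(x) * levels)  # integer in [-127, 127]
    return ulaw_expand(code / levels)


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


fs = 8000.0
f0 = 1020.0  # assumed test tone
signal = [0.9 * math.sin(2 * math.pi * f0 * k / fs) for k in range(8000)]
decoded = [ulaw_codec(v) for v in signal]          # "signal plus noise"
noise = [d - v for d, v in zip(decoded, signal)]   # output of the subtract branch

snr_db = 20.0 * math.log10(rms(signal) / rms(noise))
print(f"S/N = {snr_db:.1f} dB")
```

For zero-mean sequences the squared RMS value is the variance, so 10*log10 of the variance ratio would give the same figure as the 20*log10 of the RMS ratio used here.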
```stereo said the following on 25/09/2006 13:00:

>>> I think just calculating the variance doesn't work anymore for such a
>>> signal.
>> I disagree. It depends on the context.
>
> I don't get this. If my noise is not gaussian-like, the variance of the
> signal will no longer be equal to the signal's power, will it? Imagine a
> (nearly) rectangular signal, the histogram will just have two
> spikes...mean and variance are not applicable here in the way you do it
> above.

Variance is defined as the second central moment; assuming zero mean
this is:

sigma^2 = int x^2 f(x) dx

integrated across all x.

However, this is also the definition of power.  So variance and power
are always equal in the zero-mean case, I think.

--
Oli
```
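The rectangular-signal case raised in the quoted text can be worked through explicitly. As a sketch, assume a signal that takes the values +A and -A with equal probability (A is a symbol introduced here), so that its histogram has exactly two spikes:

```latex
\begin{align*}
f(x) &= \tfrac12\,\delta(x - A) + \tfrac12\,\delta(x + A), \\
\mu &= \int x\, f(x)\,dx = \tfrac12 A - \tfrac12 A = 0, \\
\sigma^2 &= \int (x-\mu)^2 f(x)\,dx = \tfrac12 A^2 + \tfrac12 A^2 = A^2 .
\end{align*}
```

The mean power of a signal that is always at +A or -A is also A^2, so variance and power agree here without any Gaussian assumption.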
```Hi again,

> >If you calculate it by the
> > difference between s and s_hat you should obtain the noise suppression
> > plus any distortion of the speech.
>
> How do you deduce that?

No math here, just intuitively: s_hat is the estimated clean speech
(and exactly what you output to a listener). As the output of the
algorithm it is built from the distorted version of the speech (since
the speech is not estimated error-free) and from the remaining noise,
i.e. the part of the noise that is not removed by the algorithm.
Therefore the difference between s and s_hat should include distortion
and noise.

I'm not postulating this as an expert or so, it is just what comes into
mind when analysing the problem. Corrections are very welcome.

> > Another thing I just thought about: How about noise which is not
> > gaussian, e.g. impulsive noise or deterministic interferences?
> My algorithm is designed for quasi-stationary noise sources. A deterministic
> source like a sine would be suppressed no problem. An impulsive noise
> source would probably be harder to suppress unless it has some cyclic
> nature to it.
> >I think just calculating the variance doesn't work anymore for such a
> >signal.
> I disagree. It depends on the context.

I don't get this. If my noise is not gaussian-like, the variance of the
signal will no longer be equal to the signal's power, will it? Imagine a
(nearly) rectangular signal, the histogram will just have two
spikes...mean and variance are not applicable here in the way you do it
above.

Regards

stereo

```
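The intuitive argument above can be put in symbols. As a sketch, assume the algorithm A is approximately additive, as proposed earlier in the thread:

```latex
\begin{align*}
\hat{s} &= A(s + n) \approx A(s) + A(n), \\
e = s - \hat{s} &\approx \underbrace{\bigl(s - A(s)\bigr)}_{\text{speech distortion}}
                 \;-\; \underbrace{A(n)}_{\text{residual noise}} .
\end{align*}
```

If the two terms are additionally assumed to be uncorrelated, then var(e) is approximately var(s - A(s)) + var(A(n)), so an SNR based on e penalises both speech distortion and residual noise.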
```> Imagine that your algorithm does not only change the noise but also the
> speech signal. I think, with your calculation you get the amount of
> noise suppression by the algorithm.

The algorithm changes both the unknown noise component n and the unknown
speech component s in the known noisy input speech sequence x. And yes,
the amount of noise suppression is _exactly_ what I want to measure, since
I want to measure the performance of my _noise suppression_ algorithm.

>If you calculate it by the
> difference between s and s_hat you should obtain the noise suppression
> plus any distortion of the speech.

How do you deduce that?

>Your approach on the other hand does
> only get the noise suppression, doesn't it *?*

Yes.

> Another thing I just thought about: How about noise which is not
> gaussian, e.g. impulsive noise or deterministic interferences?

My algorithm is designed for quasi-stationary noise sources. A deterministic
source like a sine would be suppressed no problem. An impulsive noise
source would probably be harder to suppress unless it has some cyclic
nature to it.

>I think
> just calculating the variance doesn't work anymore for such a signal.

I disagree. It depends on the context.

I am really grateful that you took time out to post a reply. It is very
inspiring to discuss ideas and problems in this forum.

Cheers...

```
```Hi John,

interesting question. I'm with you in your way of evaluating the
suppression of noise by an algorithm.

> However, in the papers I have read the authors
> always calculate the ratio as s/e where e is defined as e=s-s_hat where s is
> the original clean speech and s_hat is the estimated clean speech. In my
> mind, it makes no sense to first calculate the SNR of the input signal
> as s/n and then postulate that the "noise" component in the output signal
> is the difference between the original speech signal and the estimated
> speech.

Imagine that your algorithm does not only change the noise but also the
speech signal. I think, with your calculation you get the amount of
noise suppression by the algorithm. If you calculate it by the
difference between s and s_hat you should obtain the noise suppression
plus any distortion of the speech. Your approach on the other hand does
only get the noise suppression, doesn't it *?*

Another thing I just thought about: How about noise which is not
gaussian, e.g. impulsive noise or deterministic interferences? I think
just calculating the variance doesn't work anymore for such a signal.

Regards

stereo

```
```"Rafael Deliano" <Rafael_Deliano@t-online.de> wrote in message

news:45178F7A.E09422A@t-online.de...
>> could get an arbitrarily high "SNR"
> To put it more bluntly: there is nothing wrong about defining
> a new figure of merit. But giving it the name of an established
> one would be confusing.

I don't quite understand what you mean by a "new figure of merit".
SNR is defined as 10 times the base-10 logarithm of the ratio
between signal power and noise power. However, in the papers I have
read the authors always calculate the ratio as s/e, where e is defined
as e=s-s_hat, s is the original clean speech and s_hat is the estimated
clean speech. In my mind, it makes no sense to first calculate the SNR
of the input signal as s/n and then postulate that the "noise" component
in the output signal is the difference between the original speech signal
and the estimated speech. An algorithm, A, which operates more or less in
a linear way on an input signal s+n will output A(s)+A(n). Hence, the SNR
of the output should be measured as 10log10(var(A(s))/var(A(n))) and not
10log10(var(A(s))/var(s-s_hat)).
In the latter case, the SNR is not even comparable with the SNR of the
input.

```
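The two output measures discussed above can be computed side by side on a toy case. This is a sketch under assumptions: `smooth` is a hypothetical, deliberately heavy-handed stand-in for the algorithm, the noise is white rather than pink, and the error-based figure is taken as 10log10(var(s)/var(s-s_hat)), following the s/e ratio attributed to the papers.

```python
import math
import random
import statistics


def db_ratio(a, b):
    return 10.0 * math.log10(statistics.pvariance(a) / statistics.pvariance(b))


def smooth(x, width=25):
    """Moving average over `width` samples (valid part only)."""
    return [sum(x[i:i + width]) / width for i in range(len(x) - width + 1)]


rng = random.Random(4)
N, width = 4000, 25
half = width // 2
s = [math.sin(2 * math.pi * 0.02 * k) for k in range(N)]
n = [rng.gauss(0.0, 0.5) for _ in range(N)]
x = [sk + nk for sk, nk in zip(s, n)]

A_s, A_n, s_hat = smooth(s), smooth(n), smooth(x)
s_aligned = s[half:N - half]                   # clean speech at the window centres
e = [a - b for a, b in zip(s_aligned, s_hat)]  # e = s - s_hat

snr_in = db_ratio(s, n)
snr_separate = db_ratio(A_s, A_n)   # 10log10(var(A(s))/var(A(n)))
snr_error = db_ratio(s_aligned, e)  # 10log10(var(s)/var(s - s_hat))

print(f"input:               {snr_in:.1f} dB")
print(f"var(A(s))/var(A(n)): {snr_separate:.1f} dB")
print(f"var(s)/var(e):       {snr_error:.1f} dB")
```

In this toy case the separate-run figure is the more flattering one, because the smoothing also attenuates the "speech" and only the error-based figure counts that attenuation as error.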
```> could get an arbitrarily high "SNR"
To put it more bluntly: there is nothing wrong about defining
a new figure of merit. But giving it the name of an established
one would be confusing.

> Mean Opinion Scores
Low distortion systems like PCM-codecs use S/N-plots to
judge performance.
For simple waveform coders like CVSD, S/N already failed:
slope overload would give poor S/N, but listeners wouldn't
object. This was already known in the 1960s, when work on
vocoders started and a need for measuring their quality
was emerging.
All the more complex coders since then are in the end judged
by MOS, which done properly is tedious, expensive, and not too
reliable either. And obviously impractical for daily work.
No cheap and simple way to judge how noise & distortion
affect the quality of speech has emerged in all these years.

>> where var() stands for variance.
The usual "signal" for testing S/N in a phone channel
would be a 1 kHz sine. A filter limiting the noise, like C-message,
would be used. This S/N calculation is based on power.

Very simple (but murky) statistical models of speech have been
used in the past. For defining the PCM codecs, the long-term average pdf
of speech is supposed to be a gamma distribution
(similar: Laplacian), while the short-term pdf is Gaussian.
In the 1960s some people used white-noise generators with
shaping filters that give the shape of the long-term average spectrum
of speech. For some applications that may be a more useful
representation of speech than a sine.

MfG  JRD
```