Hello everyone, I have an important question.

I am estimating the noise floor as the minimum of the spectrogram, and I
get the situations shown below. In each plot, the red vector holds the
maximum values of the spectrogram and the blue vector the minimum values.
I calculate the "SNR" as the ratio of the variance of the maximum vector
to the variance of the minimum vector.
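
A minimal Python sketch of this ratio, for reference. It assumes the
spectrogram is stored as a list of frames of per-bin power values, and
that the maximum and minimum are taken per frequency bin over time; the
post does not say which axis is used, so that is an assumption. The
function names and the tiny test spectrogram are hypothetical.

```python
from statistics import pvariance

def max_min_vectors(spectrogram):
    """Return (max_vec, min_vec): per-bin maximum and minimum over all frames.

    spectrogram is a list of frames; each frame is a list of power values,
    one per frequency bin (assumed layout).
    """
    bins = list(zip(*spectrogram))       # transpose: one tuple per frequency bin
    max_vec = [max(b) for b in bins]     # the "red" vector
    min_vec = [min(b) for b in bins]     # the "blue" vector
    return max_vec, min_vec

def variance_ratio(spectrogram):
    """variance(maximum) / variance(minimum), the "SNR" used in this thread."""
    max_vec, min_vec = max_min_vectors(spectrogram)
    # Raises ZeroDivisionError if the minimum vector is perfectly flat.
    return pvariance(max_vec) / pvariance(min_vec)

# Hypothetical 3-frame, 4-bin power spectrogram
spec = [
    [4.0, 9.0, 1.0, 0.5],
    [2.0, 1.0, 3.0, 0.1],
    [6.0, 0.5, 0.2, 0.3],
]
max_vec, min_vec = max_min_vectors(spec)
ratio = variance_ratio(spec)
```

Note that this quantity compares how much the two envelopes vary across
frequency, not the power of the signal against the power of the noise, so
it is not an SNR in the usual sense.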

EXAMPLES:

1 - Clean file, solo keyboard:
https://s32.postimg.org/xjk940z8z/image.jpg
In this case the minimum vector has very low variance. Calculating
variance(maximum) / variance(minimum) I get a high enough value, 478,
which is ok.

2 - Clean file, distorted guitar, drums and vocals:
https://s31.postimg.org/t0vdsfox7/image.jpg
In this case, the maximum vector has very high variance. Calculating
variance(maximum) / variance(minimum) gives 272, which is not good: I
would like a value around 350 or above.

3 - Dirty file, a lot of background noise, acoustic guitar and vocals:
https://s32.postimg.org/5c6asgoz9/D10.jpg
In this case, the maximum vector has very high variance. Calculating
variance(maximum) / variance(minimum) I get a low enough value, 254,
which is ok, since there is a lot of background noise.

What I'm trying to do is find a way to discriminate case 2 from case 3.
I want a meaningful parameter that, when the variance is high, tells me
whether I have a CLEAN percussive file or a dirty file (percussive or
not), so that I can "filter" out the percussive parts.

I tried calculating the skewness and the mode of the distribution of the
minimum vector, because I saw that the vector is asymmetrical when
percussive parts are present, but this does not work in every case.

I hope someone can help me, thanks in advance.
---------------------------------------
Posted through http://www.DSPRelated.com

>It's very simple. Assume the noise and the signal are un-correlated so
>their powers always add. It is likely that in any given frequency band,
>there will be one or more sections of the music track that have no energy
>in that band, therefore the minimum power over the length of the song for
>a given band will represent the background noise. But this is not
>guaranteed. Someone might like listening to a solid hour of white noise
>in which case the method will fail.
>
>Bob

Ok, I think I understand. Now I have a question. Looking at spectrograms,
I have seen that the signal is strongly concentrated in a broad band from
0 Hz to 5000 Hz, while from 5000 Hz to 20000 Hz the energy tends to zero.
What do you think of estimating the signal by taking the maximum values in
the band where the energy is concentrated, and estimating the noise only
at high frequencies?

Also, my SNR values depend on the estimates I make of the signal and the
noise. If the distance between the mean of all the maxima and the mean of
all the minima of the FFT is large, then I have a good estimate of my
signal, otherwise not. This generally makes my SNR estimate fail. Do you
think there's something I can do about it?
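
A rough Python sketch of the band-split idea, to make the proposal
concrete. It assumes a one-sided power spectrum with bin k centred at
k*fs/n_fft, and it assumes the noise is flat across frequency, so that
the per-bin level measured above the split also applies below it. Real
hiss is often not flat, and cymbals or sibilants put signal energy above
5000 Hz, so treat this as an illustration only. All names are
hypothetical.

```python
def band_bins(f_lo, f_hi, fs, n_fft):
    """Indices of one-sided FFT bins whose centre frequency is in [f_lo, f_hi)."""
    return [k for k in range(n_fft // 2 + 1) if f_lo <= k * fs / n_fft < f_hi]

def band_split_snr(power_spectrum, fs, n_fft, split_hz=5000.0):
    """Signal-to-noise power ratio, taking the noise level from the high band.

    Assumption: noise is flat, so the mean per-bin power above split_hz is
    also the noise level inside the low band and can be subtracted there.
    """
    low = band_bins(0.0, split_hz, fs, n_fft)
    high = band_bins(split_hz, fs / 2 + 1, fs, n_fft)   # includes the Nyquist bin
    low_level = sum(power_spectrum[k] for k in low) / len(low)
    noise_level = sum(power_spectrum[k] for k in high) / len(high)
    return (low_level - noise_level) / noise_level

# Hypothetical spectrum: fs = 48 kHz, 48-point FFT, so bins are 1000 Hz apart.
fs, n_fft = 48000, 48
spectrum = [5.0] * 5 + [1.0] * 20     # bins 0-4 below 5 kHz, bins 5-24 above
snr = band_split_snr(spectrum, fs, n_fft)
```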

Thank you in advance.
---------------------------------------

It's very simple. Assume the noise and the signal are uncorrelated, so their powers always add. It is likely that in any given frequency band, there will be one or more sections of the music track that have no energy in that band, therefore the minimum power over the length of the song for a given band will represent the background noise. But this is not guaranteed. Someone might like listening to a solid hour of white noise, in which case the method will fail.

Bob
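
A minimal Python sketch of the minimum-power idea described above, under
the same assumption that signal and noise powers add. The data layout (a
list of frames of per-band power) and the function names are
hypothetical, and the way the total is turned into an SNR is one possible
choice, not something stated in the post.

```python
import math

def noise_floor_per_band(power_frames):
    """Minimum power seen in each band over the whole track."""
    return [min(band) for band in zip(*power_frames)]

def snr_db(power_frames):
    """Rough SNR in dB, or None when no signal power is left over."""
    noise_power = sum(noise_floor_per_band(power_frames))
    # Mean total power per frame, summed over all bands.
    total_power = sum(sum(frame) for frame in power_frames) / len(power_frames)
    signal_power = total_power - noise_power    # uncorrelated, so powers add
    if signal_power <= 0 or noise_power <= 0:
        return None                             # e.g. an hour of white noise
    return 10.0 * math.log10(signal_power / noise_power)

# Hypothetical track: noise power 0.25 in both bands, signal present in
# band 0 during frame 0 and in band 1 during frame 1.
track = [[1.25, 0.25], [0.25, 4.25], [0.25, 0.25]]
floor = noise_floor_per_band(track)
estimate = snr_db(track)

# The failure case mentioned above: every frame looks the same.
white = [[0.25, 0.25]] * 3
failed = snr_db(white)
```

Because the minimum of a noisy power estimate is biased low, the floor
found this way tends to underestimate the true noise power unless the
per-band power is smoothed first.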

>I would recommend temporal averaging. Update the power estimate for each
>bin according to
>
>av_power(bin,k) = alpha*av_power(bin,k-1) + (1-alpha)*power(bin,k)
>
>Where alpha is some number close to 1.
>
>Or if you have enough memory to store past frames, just average the bin
>powers over the last 4 frames or so.
>
>Bob

Hi Bob, I decided to keep this algorithm. Can you tell me what the
scientific explanation of this method is? Or is it just your experience?
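
For reference, the recursion quoted above is an exponentially weighted
moving average, i.e. a first-order recursive low-pass filter applied to
each bin over time. A frame that is m steps old gets weight
(1-alpha)*alpha^m, so the effective memory is roughly 1/(1-alpha) frames,
and for frame-to-frame fluctuations that are uncorrelated the variance of
the estimate drops by a factor of (1-alpha)/(1+alpha). A minimal Python
sketch follows; the function name, the data layout and the choice to
initialise with the first frame are assumptions.

```python
def smooth_power(power_frames, alpha=0.9):
    """Apply av[k] = alpha*av[k-1] + (1-alpha)*p[k] to each bin independently.

    power_frames is a list of frames, each a list of per-bin power values.
    Returns the smoothed frames in the same layout.
    """
    av = list(power_frames[0])        # assumption: start from the first frame
    smoothed = [list(av)]
    for frame in power_frames[1:]:
        av = [alpha * a + (1.0 - alpha) * p for a, p in zip(av, frame)]
        smoothed.append(list(av))
    return smoothed

# Step response for a single bin: the estimate approaches the new level
# gradually, which is what suppresses short bursts.
step = smooth_power([[0.0], [1.0], [1.0], [1.0]], alpha=0.5)
```

With alpha = 0.5 the estimate closes half of the remaining gap on every
frame; with alpha close to 1 it reacts more slowly and averages over more
frames.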
---------------------------------------

>This adaptive acoustic channel echo canceler I've written up might help:
>
>http://www.allaboutcircuits.com/technical-articles/an-introduction-to-adaptive-echo-cancellers/
>
>basically your noise would be 
>
>reg1*(wts');
>
>You might be able to tweak this to detect noise, as is it will help
>characterize your acoustic channel. 

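% LMS algorithm of echo canceller.
% Assumes r_t (reference/training signal) and mic_in (microphone samples)
% are vectors already in the workspace.
ntaps    = 50;
mu       = .07;                              % step size
trainlen = min(length(r_t), length(mic_in)); % assumption: train on all samples
reg1 = zeros(1,ntaps);                       % delay line of reference samples
wts  = zeros(1,ntaps);                       % adaptive filter weights
y    = zeros(1,trainlen);                    % error output, preallocated
for n = 1:trainlen
  reg1 = [r_t(n) reg1(1:ntaps-1)];           % shift in the newest reference sample
  err  = mic_in(n) - reg1*(wts');            % mic sample minus the filter estimate
  y(n) = err;
  wts  = wts + mu*(reg1*(err'));             % LMS weight update
end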

Your samples are mic_in. This might not actually work for your case: as
you can see, I'm shifting r_t(n) into reg1, where r_t(n) is the noise I'm
training it with. Nevertheless, perhaps this will help you in some way.
---------------------------------------

>This adaptive acoustic channel echo canceler I've written up might help:
>
>http://www.allaboutcircuits.com/technical-articles/an-introduction-to-adaptive-echo-cancellers/
>
>basically your noise would be 
>
>reg1*(wts');
>
>You might be able to tweak this to detect noise, as is it will help
>characterize your acoustic channel. 

Thank you, I'm having a look at it but it's a new topic for me. 

Which variable should be my samples vector?
---------------------------------------

This adaptive acoustic channel echo canceler I've written up might help:

http://www.allaboutcircuits.com/technical-articles/an-introduction-to-adaptive-echo-cancellers/

basically your noise would be 

reg1*(wts');

You might be able to tweak this to detect noise, as is it will help
characterize your acoustic channel. 



---------------------------------------

>The SNR numbers you quote for synthetic waveforms correspond to around
>400 dB of SNR. Therefore you should not be concerned about the difference
>between say, 380 dB and 420 dB, because both are far above what you will
>get in a music file (maybe 80 dB if you are very lucky).
>I think you are expecting too much precision in your results. With real
>music files you could come up with 10 different ways to estimate the SNR
>and they will all give different numbers. You have to accept that the
>best you can do is a rough estimate.
>
>Bob

In my case what I get for real music is more or less like this:

SNR= around 10^22, good music and no background noise

SNR= around 10^17, I often can't classify when it's like that but many
times I have an intermediate level of background noise 

SNR= around 10^17, bad quality and high level of hiss

Then there are many false negatives and a few false positives.
Assuming, as you say, that I have to settle for approximate results, is
there a way to improve the algorithm so that it produces fewer false
negatives? I want to be able to recognize a good track (with no background
noise) that my algorithm currently classifies as dirty. I hope there is
something I can do.

---------------------------------------

The SNR numbers you quote for synthetic waveforms correspond to around 400 dB of SNR. Therefore you should not be concerned about the difference between say, 380 dB and 420 dB, because both are far above what you will get in a music file (maybe 80 dB if you are very lucky).
I think you are expecting too much precision in your results. With real music files you could come up with 10 different ways to estimate the SNR and they will all give different numbers. You have to accept that the best you can do is a rough estimate.

Bob
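
For reference, a small Python sketch of the ratio-to-decibel conversion.
A figure of about 400 dB is what the 20*log10 (amplitude-ratio)
convention gives for a ratio near 10^20; under the 10*log10 (power-ratio)
convention the same ratio is about 200 dB. Which convention was intended
above is an assumption, and the function name is hypothetical.

```python
import math

def ratio_to_db(ratio, kind="power"):
    """Convert a ratio to decibels.

    kind="power"     uses 10*log10(ratio)
    kind="amplitude" uses 20*log10(ratio)
    """
    factor = 10.0 if kind == "power" else 20.0
    return factor * math.log10(ratio)

as_amplitude = ratio_to_db(1e20, kind="amplitude")
as_power = ratio_to_db(1e20, kind="power")
```

Either way, the point made above still holds: both numbers are far beyond
anything a real recording can reach.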

>I would recommend temporal averaging. Update the power estimate for each
>bin according to
>
>av_power(bin,k) = alpha*av_power(bin,k-1) + (1-alpha)*power(bin,k)
>
>Where alpha is some number close to 1.
>
>Or if you have enough memory to store past frames, just average the bin
>powers over the last 4 frames or so.
>
>Bob

Hi Bob, I did this but the results are more or less the same.

I ran several tests to check my algorithm. Taking a random song or a sine
wave and adding noise at various amplitudes, the algorithm works: my SNR
increases when there is less noise. (To calculate the SNR I take the
vector that approximates the signal and the vector that approximates the
noise, compute the variance of each, and take the ratio.)

Testing instead a sine wave, a triangular wave, a square wave and a
sawtooth, all "clean" with no background noise, I get these SNR values:

SNR (sin) = 10^24
SNR (tri) = 10^23
SNR (sqr) = 10^20
SNR (saw) = 10^18

That is, the values are all different. It seems that my algorithm
considers the sawtooth wave the dirtiest signal, and I don't want this!


My problem is that I want to quantify the true noise in the signal, and
there are too many variables to consider. I don't know if this approach
will work for all types of tracks.
Overall the outputs are quite reasonable values, but they do not measure
the true background noise of the tracks.

Could an analysis in the time domain, using the same logic of the
maximum/minimum value per frame, give me a better discrimination between
noise and signal?

And is the maximum/minimum method a correct way to estimate the signal and
the noise? I saw that my signal is concentrated roughly between 0 and
5000 Hz. Would it be better to restrict my search to this band?

Thank you in advance.
---------------------------------------