comp.dsp | spectral peak-estimation by cross-correlation

Hello comp.dsp readers,

I am working on a plugin that involves estimation of spectral
peaks (frequency and magnitude - basically the whole thing is 
phase-vocoder based).
I first tried to use parabolic interpolation (by taking the
maximum of the parabola to calculate the frequency and amplitude
of sinusoids I want to track).
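
A minimal sketch of the parabolic interpolation step described above, assuming it is applied to the log-magnitude (dB) values of the three bins around a local maximum. The function name `parabolic_peak` and the synthetic data are illustrative only and are not taken from the plugin.

```python
import numpy as np

def parabolic_peak(mag_db, k):
    """Fit a parabola through bins k-1, k, k+1 and return (peak_bin, peak_db)."""
    a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
    offset = 0.5 * (a - c) / (a - 2.0 * b + c)   # fractional-bin offset of the vertex
    return k + offset, b - 0.25 * (a - c) * offset

# Synthetic check: samples of an exact parabola with its vertex at bin 10.3.
bins = np.arange(20)
mag_db = 5.0 - (bins - 10.3) ** 2
k = int(np.argmax(mag_db))
peak_bin, peak_db = parabolic_peak(mag_db, k)
```

Only three samples enter the fit, so anything that distorts those three bins (such as leakage from a neighbouring partial) shifts the estimate directly.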

In most cases that works sufficiently well, but when sinusoids
are too close together (but still a few bins apart),
the sidelobes overlap too much and the whole thing gives pretty
bad results.

So I am wondering now if it could be improved by cross-correlation
of the spectra with the spectrum of the window (blackman) and
afterwards an estimation of the peaks in the cross-correlation.
I have heard that this approach would be more robust.
But I have trouble really understanding the cross-correlation of
two complex signals (the spectra).  Or would it make more sense just
to use the magnitude spectra?
How are the absolute values of the cross-correlation (which I would
use to estimate the peak) affected by the angles
of the bins?

X is the spectrum of the analysed signal (windowed)
W is the spectrum of the window.

compute the cross-correlation:
A[n] = sum(k = -N...N,  X[n] * W[n+k])

Then I want to use peaks in A[n] to calculate sinusoidal parameters.
The main question is if it makes sense to calculate the
cross-correlation of the complex-numbered spectra or if
this has no advantage over using the magnitude spectra?
(Or do you guys think there is no advantage over using parabolic
interpolation anyway?)

Did anybody here try such a thing?
If my explanation is too unclear, please ask.

greetings,
Bjoern Anton Erlach

Reply by banton ● March 8, 2008

>X is the spectrum of the analysed signal (windowed)
>W is the spectrum of the window.
>
>compute the cross-correlation:
>A[n] = sum(k = -N...N,  X[n] * W[n+k])

Sorry, that should be
A[n] = sum(k = -N...N,  X[k] * W[k+n])

and I guess I should use the complex conjugate of X,
although it is not clear to me why.
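
A small sketch of why the conjugate matters, under stated assumptions: `w` is a random complex stand-in for the window spectrum, and `x` contains a copy of it at offset `d` with an arbitrary complex gain. With the conjugate, every product at the matching lag equals |w|^2 times a common phase, so the terms add coherently; without it, the phases of the individual products differ and partly cancel. The helper `xcorr_conj` is hypothetical and simply evaluates the corrected sum directly.

```python
import numpy as np

def xcorr_conj(x, w, lags):
    """A[n] = sum_k conj(x[k]) * w[k+n], with w taken as zero outside its range."""
    out = np.zeros(len(lags), dtype=complex)
    for i, n in enumerate(lags):
        for k in range(len(x)):
            if 0 <= k + n < len(w):
                out[i] += np.conj(x[k]) * w[k + n]
    return out

rng = np.random.default_rng(0)
w = rng.standard_normal(8) + 1j * rng.standard_normal(8)   # stand-in template
d = 5                                                      # offset of the copy in x
gain = 0.7 * np.exp(1j * 1.1)                              # arbitrary amplitude and phase
x = np.zeros(24, dtype=complex)
x[d:d + len(w)] = gain * w

lags = np.arange(-(len(x) - 1), len(w))
a = xcorr_conj(x, w, lags)
peak_index = int(np.argmax(np.abs(a)))
peak_lag = int(lags[peak_index])          # with this index convention the peak sits at -d
peak_value = a[peak_index]                # equals conj(gain) * sum(|w|^2)

# Same lag without the conjugate: the products no longer share one phase.
no_conj = np.sum(x[d:d + len(w)] * w)
```

The magnitude of the peak does not depend on the phase of `gain`, which suggests that a constant phase offset of the sinusoid does not by itself disturb the magnitude of the correlation peak.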

Reply by dbd ● March 9, 2008

You may get better answers if you can answer some of these questions:

What inputs do you have?
   Time domain?
   Frequency domain?
   Complex coefficients?
   How many?
What do you know about the inputs?
   SNR region: low, high?
   Type of background: noise, tones?

These are some of the parameters that suggest what might work with the
information you have.

Dale B. Dalrymple
http://dbdimages.com

Reply by banton ● March 9, 2008

>You may get better answers if you can answer some of these questions:
>
>What inputs do you have?

Musical instruments (single-channel recordings)

>   Time Domain?
>   Frequency domain
>   Complex coefficients?
>   How many?

The input comes from a microphone, and I start by taking STFT
frames in which I want to detect peaks to track some of the harmonics.
As I said, I want to cross-correlate spectra.
One is the STFT frame
(frame size: 4096 windowed samples + 4096 samples of zero padding)
and the other is the spectrum of the window
with the same size and zero padding.
By finding the maxima in the cross-correlation I hope to
identify peaks better in cases where the simpler method of
interpolating the spectral peaks with parabolic interpolation fails.
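
For concreteness, the two spectra described above could be computed roughly as follows. This is only a sketch: the sample rate `FS` and the 1 kHz test tone standing in for a microphone frame are assumptions, while the frame size, zero padding and Blackman window follow the description.

```python
import numpy as np

FRAME = 4096          # windowed samples per frame
NFFT = 2 * FRAME      # frame plus 4096 samples of zero padding
FS = 44100.0          # assumed sample rate

window = np.blackman(FRAME)
n = np.arange(FRAME)
frame = np.cos(2.0 * np.pi * 1000.0 * n / FS)   # stand-in for one microphone frame

X = np.fft.rfft(frame * window, n=NFFT)   # spectrum of the windowed, zero-padded frame
W = np.fft.rfft(window, n=NFFT)           # spectrum of the window, same size and padding

peak_bin = int(np.argmax(np.abs(X)))      # coarse peak location before any refinement
peak_hz = peak_bin * FS / NFFT
```

Passing `n=NFFT` to `np.fft.rfft` appends the zeros, so both spectra are sampled on the same frequency grid and can be compared bin by bin.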

>What do you know about the inputs?
>   SNR region: low, high?
>   Type of background: noise, tones?

Background noise will not be a big problem. 
The problem arises when tracked sinusoids are too close, so that
the sidelobes of the peaks interfere with their neighbours.

Reply by Rune Allnor ● March 9, 2008

Others have already asked for a number of clarifications.
It might be useful to specify exactly what you are trying to achieve:

1) Do you want to assert the presence of one peak?
2) Do you want to locate the frequency with top magnitude
   of one peak with high precision?
3) Do you want to identify two close peaks as separate?

Each question requires its own approach to
reach a solution, and the solutions to questions 2) and 3)
are based on the assumption that question 1) does not pose
a problem at all.

The usual reason for using window functions
is to answer question 1), since the net effect of windows
on (cross) spectra is to reduce the sensitivity to spurious
noise.

Once one starts playing with frequency estimation (q. 2)) and
signal separation (q. 3)), cross correlations and cross
spectra are used, but very seldom with non-rectangular window
functions [*], as these tend to obfuscate the analysis and
results.

Rune

[*] If you are unfamiliar with window functions you might
    interpret the statement as if a rectangular window
    may be used. Don't worry, you already use it. I write
    as I do to avoid certain boring discussions on semantics
    between comp.dsp regulars.

Reply by banton ● March 9, 2008

>Others have already asked for a number of clarifications.
>It might be useful to specify exactly what you try to achieve:
>
>1) Do you want to assert the presence of one peak?

No

>2) Do you want to locate the frequency with top magnitude
>   of one peak with high precision?

Almost.  I want to locate multiple peaks in a given frequency
region. 

>3) Do you want to identify two close peaks as separate?

Yes.  But I am not talking about peaks that are so close
that they just cause beating.  I mean peaks that can easily
be identified by eye if you look at plots of the spectrum.
But it is still the interference between them that causes trouble.

Since it seems to be quite unclear what I am trying to do, I
can give some references that describe the basis of sinusoidal
models as they are used in audio processing
(like the phase vocoder and its derivatives):

http://www.panix.com/~jens/pvoc-dolson.par

http://ccrma.stanford.edu/~jos/sasp/

thanks,
Bjoern

Reply by banton ● March 9, 2008

For further clarification in my own words:

I want to analyse sounds of musical instruments.
The result of the analysis is a representation of the sound
in terms of a number of sinusoids (usually the harmonics) with
(rather slowly) varying amplitudes and frequencies.
One step is the identification of the frequencies and amplitudes,
which of course almost never fall on the exact frequencies of
the bins of the STFT.
It is exactly for this step that I am trying to find out whether
I can improve results by cross-correlation with the spectrum of
the window used.
The window is not rectangular, since I am more or less sure that
a rectangular window would make things worse for this purpose.
The problems are
caused by leakage and interference in bins of neighbouring peaks.

Reply by Ron N. ● March 9, 2008

On Mar 9, 6:54 am, "banton" <bant...@web.de> wrote:
> >2) Do you want to locate the frequency with top magnitude
> >   of one peak with high precision?
>
> Almost.  I want to locate multiple peaks in a given frequency
> region.
>
> >3) Do you want to identify two close peaks as separate?
>
> Yes.  But I am not talking about peaks that are so close
> that they just cause beating.  I mean peaks that can easily
> be identified by eye if you look at plots of the spectrum.
> But it is still the interference between them that causes trouble.

Simple magnitude peak finding methods with FFT results will
likely encounter problems if there are less than some small
number, say 2 or 3, of beats per frame.  Cross-correlation
with the window transform, compared to parabolic
interpolation, may improve estimation for those frequency
offsets where the shape of the transform of your window is
very different from parabolic.  Cross-correlation using
the complex FFT results instead of the magnitudes may help
for certain combinations of phase relationships between the
two frequencies and the window center.

If you use a non-rectangular window, it may reduce side
lobe interference of more distant peaks, but also increase
the span over which the main lobes of nearby peaks can
interfere with each other.
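
The trade-off in the last paragraph can be checked numerically. The sketch below is illustrative only: the helper `lobe_stats`, the window length of 64 and the padding factor are assumptions chosen for the demonstration. It measures the main-lobe half width and the highest sidelobe of a rectangular and a Blackman window.

```python
import numpy as np

def lobe_stats(window, pad=64):
    """Return (main-lobe half width in bins, peak sidelobe level in dB)."""
    n = len(window)
    mag = np.abs(np.fft.rfft(window, n=pad * n))
    mag /= mag[0]
    # Walk down the main lobe from DC until the magnitude starts rising again.
    i = 0
    while mag[i + 1] < mag[i]:
        i += 1
    half_width_bins = i / pad
    sidelobe_db = 20.0 * np.log10(mag[i:].max())
    return half_width_bins, sidelobe_db

N = 64
rect_width, rect_sll = lobe_stats(np.ones(N))
black_width, black_sll = lobe_stats(np.blackman(N))
```

In this setup the Blackman window shows far lower sidelobes than the rectangular one, at the cost of a main lobe roughly three times as wide, which is the span over which nearby peaks can interfere.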

IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M
 http://www.nicholson.com/rhn/dsp.html

Reply by banton ● March 9, 2008

>On Mar 9, 6:54 am, "banton" <bant...@web.de> wrote:
>> >3) Do you want to identify two close peaks as separate?
>>
>> Yes.  But I am not talking about peaks that are so close
>> that they just cause beating.  I mean peaks that can easily
>> be identified by eye if you look at plots of the spectrum.
>> But it is still the interference between them that causes trouble.
>
>Simple magnitude peak finding methods with FFT results will
>likely encounter problems if there are less than some small
>number, say 2 or 3, of beats per frame.  

You say: "simple magnitude peak finding methods" - and I wonder
what other methods I could try.
Any suggestions/references?

>Cross-correlation
>with the window transform, compared to parabolic
>interpolation, may improve estimation for those frequency
>offsets where the shape of the transform of your window is
>very different from parabolic.  Cross-correlation using
>the complex FFT results instead of the magnitudes may help
>for certain combinations of phase relationships between the
>two frequencies and the window center.

I played around with the cross-correlation idea and didn't get
any improvement.  Sometimes I got better estimates, sometimes worse.
So for now I went back to the parabolic interpolation.
Of course what you say about the shape of the transform of the
window makes sense.  But I was hoping that the cross-correlation
would bring an advantage because it would use a little bit more
of the information in the spectrum around the peak, since
the parabolic interpolation just looks at 3 samples.
In other words, I hoped that the information from the sidelobes
surrounding the peak could somehow be turned into something useful.

An idea that came to mind is to use FFTs and peak detection
just to get a rough estimate of where to look for partials and
then try some kind of "adaptive heterodyning" with bandpass
filters that start with center frequencies at the peak locations.
Maybe this way I could get more details about the sinusoids I want
to track.
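
One possible reading of the heterodyning idea, as a rough sketch rather than a worked-out design: shift the band around the coarse estimate down to 0 Hz, low-pass it, and read the residual frequency from the average phase advance. The sample rate, the synthetic 1003 Hz partial and the moving-average filter are all assumptions made for the example.

```python
import numpy as np

FS = 8000.0        # assumed sample rate
F_TRUE = 1003.0    # synthetic partial to be tracked
F_CENTER = 1000.0  # coarse estimate, e.g. from an FFT peak

t = np.arange(int(FS)) / FS                      # one second of signal
x = np.cos(2.0 * np.pi * F_TRUE * t)

# Shift the band around F_CENTER down to 0 Hz.
mixed = x * np.exp(-2j * np.pi * F_CENTER * t)

# Crude low-pass: a 50 ms moving average suppresses the image near -(F_TRUE + F_CENTER).
taps = 400
baseband = np.convolve(mixed, np.ones(taps) / taps, mode="valid")

# The average phase advance per sample gives the residual frequency offset.
step = np.angle(np.sum(baseband[1:] * np.conj(baseband[:-1])))
f_est = F_CENTER + step * FS / (2.0 * np.pi)
```

Making it "adaptive" would presumably mean feeding `f_est` back as the next `F_CENTER`; with two close partials the low-pass bandwidth would have to be narrow enough to reject the neighbour, which costs time resolution.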

So if anybody here has some experience or ideas to share about
the implementation of phase vocoders or sinusoidal models,
I am happy to hear about it.

>If you use a non-rectangular window, it may reduce side
>lobe interference of more distant peaks, but also increase
>the span over which the main lobes of nearby peaks can
>interfere with each other.

Yes, that is clear to me.

Thanks for the replies,
Bjoern

Reply by dbd ● March 9, 2008

On Mar 8, 10:46 pm, "banton" <bant...@web.de> wrote:

> ...
> Background noise will not be a big problem.
> The problem arises when tracked sinusoids are too close, so that
> the sidelobes of the peaks interfere with their neighbours.
> ...

One interesting technique applicable in high-SNR situations for
estimating two poorly separated tones involves applying a Gaussian
window to the data before the FFT and using the off-peak bins on the
side away from the interfering signal to estimate the frequency and
amplitude.
Take a look at:

http://www.edn.com/archives/1994/030394/graph/05df1fg1.htm

Figure 1 there (click to enlarge) shows an example of
the performance capability on the data set fred harris used to discuss
frequency resolution. C code is provided. This is from a back issue of
EDN magazine.

The Gaussian window was chosen to:
1) widen mainlobe response
2) increase sidelobe rolloff
3) simplify calculation of frequency and magnitude from the ratio of
the amplitudes of two non-peak bins.

Items 1) and 2) could be provided by many good windows and serve to
improve signal to interferer ratio in the bins used for calculation.
SNR must be high enough that noise is not the limitation.

Item 3) is a characteristic of the Gaussian window and is described in
the article.
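
The following is not the code from the EDN article, only a sketch of the property behind item 3). The magnitude spectrum of a Gaussian window is itself approximately Gaussian, so its logarithm is a parabola in frequency, and the log ratio of any two bins on the main lobe is linear in the peak position. The window width `SIGMA`, the synthetic complex tone and the helper `gaussian_two_bin` are assumptions made for the example.

```python
import numpy as np

N = 1024
SIGMA = N / 12.0                       # assumed width; tails are negligible at the frame edges
n = np.arange(N)
window = np.exp(-0.5 * ((n - (N - 1) / 2.0) / SIGMA) ** 2)

p_true = 100.3                         # synthetic tone position in bins
x = np.exp(2j * np.pi * p_true * n / N)
log_mag = np.log(np.abs(np.fft.fft(x * window)))

def gaussian_two_bin(log_mag, k1, k2, sigma, nfft):
    """Peak position (in bins) from the log-magnitude ratio of two bins."""
    beta = (2.0 * np.pi * sigma / nfft) ** 2
    return 0.5 * (k1 + k2) + (log_mag[k1] - log_mag[k2]) / (beta * (k1 - k2))

# Use two bins on the upper side only, as if an interferer sat below the peak.
p_est = gaussian_two_bin(log_mag, 101, 102, SIGMA, N)
```

Neither of the two bins needs to be the peak bin, so the pair can be chosen on whichever side is less contaminated by the interfering tone.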

Dale B. Dalrymple
