DSPRelated.com

spectral peak-estimation by cross-correlation

Started by banton March 8, 2008
Hello comp.dsp readers,

I am working on a plugin that involves estimation of spectral
peaks (frequency and magnitude - basically the whole thing is 
phase-vocoder based).
I first tried to use parabolic interpolation (by taking the
maximum of the parabola to calculate the frequency and amplitude
of sinusoids I want to track).

In most cases that works sufficiently well, but when sinusoids
are involved that are too close together (but still a few bins apart),
the sidelobes overlap too much and the whole thing gives pretty
bad results.

So I am wondering now if it could be improved by cross-correlation
of the spectra with the spectrum of the window (blackman) and
afterwards an estimation of the peaks in the cross-correlation.
I have heard that this approach would be more robust.
But I have trouble really understanding the cross-correlation of
2 complex signals (the spectra).  Or would it make more sense just
to use the magnitude spectra?
How are the absolute values of the cross-correlation (which I would
use to estimate the peak) affected by the angles
of the bins?

X is the spectrum of the analysed signal (windowed)
W is the spectrum of the window.

compute the cross-correlation:
A[n] = sum(k = -N...N,  X[n] * W[n+k])

Then I want to use peaks in A[n] to calculate sinusoidal parameters.
The main question is if it makes sense to calculate the
cross-correlation of the complex-numbered spectra or if
this has no advantage over using the magnitude spectra?
(Or do you guys think there is no advantage over using parabolic
interpolation anyway?)

Did anybody here try such a thing?
If my explanation is too unclear, please ask.

greetings,
Bjoern Anton Erlach
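
For readers who want to reproduce the baseline, here is a minimal sketch of the parabolic interpolation described in the post, applied to dB magnitudes. The sample rate, test frequency, amplitude, and the helper name `parabolic_peak` are assumptions made for illustration; only the Blackman window and the idea of fitting a parabola through the peak come from the post.

```python
import numpy as np

def parabolic_peak(mag_db, k):
    """Fit a parabola through the dB magnitudes at bins k-1, k, k+1.

    Returns (fractional bin of the vertex, dB magnitude at the vertex).
    """
    a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
    p = 0.5 * (a - c) / (a - 2.0 * b + c)      # vertex offset from bin k, in bins
    return k + p, b - 0.25 * (a - c) * p

# Assumed test setup: one sinusoid, Blackman window, 2x zero padding.
fs, n_win, n_fft = 44100.0, 4096, 8192
f0, amp = 1000.3, 0.5
t = np.arange(n_win) / fs
w = np.blackman(n_win)
spectrum = np.fft.rfft(amp * np.cos(2 * np.pi * f0 * t) * w, n_fft)
mag_db = 20.0 * np.log10(np.abs(spectrum) + 1e-12)

k = int(np.argmax(mag_db))                     # coarse peak: largest bin
pos, peak_db = parabolic_peak(mag_db, k)
f_est = pos * fs / n_fft                       # fractional bin -> Hz
a_est = 2.0 * 10.0 ** (peak_db / 20.0) / w.sum()   # undo window gain (real sinusoid)
```

With a single clean sinusoid this estimate is typically very close. The trouble described in the post starts when a neighbouring partial leaks into the three bins used for the fit.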





>compute the cross-correlation:
>A[n] = sum(k = -N...N, X[n] * W[n+k])
Sorry, that should be

A[n] = sum(k = -N...N, X[k] * W[k+n])

and I guess I should use the complex conjugate of X, although it is not clear to me why.
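
On the conjugate: a small sketch may help. With one factor conjugated, the phases of the two sequences cancel term by term at the lag where they line up, so all terms add coherently. Without it, the phases add instead of cancelling and the sum is generally smaller and phase dependent. Which factor carries the conjugate is a matter of convention; `np.correlate` conjugates its second argument. The random template and the values of `shift` and `gain` below are made up for illustration.

```python
import numpy as np

def xcorr(x, w):
    """A[n] = sum_k x[k + n] * conj(w[k]) for n = 0 .. len(x) - len(w)."""
    return np.correlate(x, w, mode="valid")    # np.correlate conjugates w

rng = np.random.default_rng(0)
w = rng.standard_normal(16) + 1j * rng.standard_normal(16)   # stand-in template
shift, gain = 20, 0.7 * np.exp(1j * 1.1)                     # assumed values
x = np.zeros(64, dtype=complex)
x[shift:shift + 16] = gain * w                 # scaled, rotated, shifted copy of w

a_conj = xcorr(x, w)                           # proper complex cross-correlation
a_plain = xcorr(x, np.conj(w))                 # conj(conj(w)) = w: no conjugation

peak_lag = int(np.argmax(np.abs(a_conj)))      # lag where the copy lines up
peak_phase = np.angle(a_conj[peak_lag])        # recovers the phase of gain
energy = np.sum(np.abs(w) ** 2)
```

At the matching lag the magnitude of `a_conj` is `abs(gain) * energy` regardless of the phase of `gain`, and the phase of the result is the phase of `gain`. That is one way to see how the bin angles enter: a common phase rotation of X changes only the angle of A, not its absolute value.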
You may get better answers if you can answer some of these questions:

What inputs do you have?
  Time domain?
  Frequency domain?
  Complex coefficients?
  How many?
What do you know about the inputs?
  SNR region: low, high?
  Type of background: noise, tones?

These are some of the parameters that suggest what might work with the information you have.

Dale B. Dalrymple
http://dbdimages.com
>You may get better answers if you can answer some of these questions:
>
>What inputs do you have?
Musical Instruments (1 channel recordings)
> Time Domain?
> Frequency domain
> Complex coefficients?
> How many?
The input comes from a microphone. I start by taking STFT frames, in which I want to detect peaks in order to track some of the harmonics.

As I said, I want to cross-correlate two spectra. One is the STFT frame (frame size: 4096 windowed samples + 4096 samples of zero padding); the other is the spectrum of the window, with the same size and zero padding. By finding the maxima in the cross-correlation I hope to identify peaks better in cases where the simpler method of interpolating the spectral peaks with parabolic interpolation fails.
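
A sketch of the setup just described, under assumptions: the sample rate and the two-partial test signal are invented, and the correlation is taken circularly over all 8192 bins with the window spectrum conjugated. Under exactly those assumptions the result is identical, up to a factor of the FFT size, to the spectrum of the frame with the window applied a second time, because correlation in the frequency domain corresponds to multiplication by the conjugate in the time domain. A truncated template or a magnitude-only correlation would not satisfy this identity.

```python
import numpy as np

fs, n_win, n_fft = 44100.0, 4096, 8192     # fs is an assumption; sizes are from the post
t = np.arange(n_win) / fs
# Stand-in signal with two partials (made-up frequencies and amplitudes).
s = np.cos(2 * np.pi * 440.0 * t) + 0.5 * np.cos(2 * np.pi * 660.0 * t + 0.3)
w = np.blackman(n_win)

X = np.fft.fft(s * w, n_fft)               # STFT frame, zero-padded
W = np.fft.fft(w, n_fft)                   # window spectrum, same padding

# Circular cross-correlation A[m] = sum_k X[(k + m) % n_fft] * conj(W[k]),
# computed with FFTs instead of the O(N^2) direct sum.
A = np.fft.ifft(np.fft.fft(X) * np.conj(np.fft.fft(W)))

# One lag of the direct sum, as a spot check of the FFT shortcut.
m = 82
A_direct = np.sum(np.roll(X, -m) * np.conj(W))

# The same values, obtained by applying the window twice in the time domain.
A_twice = n_fft * np.fft.fft(s * w * w, n_fft)
scale = np.max(np.abs(A_twice))
```

Seen this way, the complex cross-correlation over all bins trades the Blackman window for its square, which has a wider main lobe and lower sidelobes, rather than extracting extra information from the frame.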
>What do you know about the inputs?
> SNR region: low, high?
> Type of background: noise, tones?
Background noise will not be a big problem. The problem arises when tracked sinusoids are too close, so that the sidelobes of the peaks interfere with their neighbours.
Others have already asked for a number of clarifications. It might be useful to specify exactly what you are trying to achieve:

1) Do you want to assert the presence of one peak?
2) Do you want to locate the frequency with top magnitude of one peak with high precision?
3) Do you want to identify two close peaks as separate?

Each question requires an individual approach to reach a solution, and the solutions to questions 2) and 3) are based on the assumption that question 1) does not pose a problem at all.

The usual application when window functions are used is to answer question 1), since the net effect of windows on (cross) spectra is to reduce the sensitivity to spurious noise.

Once one starts playing with frequency estimation (q. 2)) and signal separation (q. 3)), cross-correlations and cross spectra are used, but very seldom with non-rectangular window functions [*], as these tend to obfuscate the analysis and results.

Rune

[*] If you are unfamiliar with window functions you might interpret the statement as if a rectangular window may be used. Don't worry, you already use it. I write as I do to avoid certain boring discussions on semantics between comp.dsp regulars.
>1) Do you want to assert the presence of one peak?
No
>2) Do you want to locate the frequency with top magnitude
> of one peak with high precision?
Almost. I want to locate multiple peaks in a given frequency region.
>3) Do you want to identify two close peaks as separate?
Yes. But I am not talking about peaks which are so close that they just cause beating. I mean peaks which can easily be identified by eye if you look at plots of the spectrum. It is still the interference between them that causes trouble.

Since it seems to be quite unclear what I am trying to do, I can give some references that describe the basis of sinusoidal models as they are used in audio processing (like the phase vocoder and its derivatives):

http://www.panix.com/~jens/pvoc-dolson.par

http://ccrma.stanford.edu/~jos/sasp/

thanks,
Bjoern
For further clarification, in my own words: I want to analyse sounds of musical instruments. The result of the analysis is a representation of the sound in terms of a number of sinusoids (usually the harmonics) with (rather slowly) varying amplitudes and frequencies.

One step is the identification of the frequencies and amplitudes, which of course almost never fall on the exact frequencies of the STFT bins. It is exactly for this step that I am trying to find out whether I can improve results by cross-correlation with the spectrum of the window used. That window is not rectangular, since I am more or less sure that a rectangular window would make things worse for this purpose. The problems are caused by leakage and interference in the bins of neighbouring peaks.
On Mar 9, 6:54 am, "banton" <bant...@web.de> wrote:
> >2) Do you want to locate the frequency with top magnitude
> > of one peak with high precision?
>
> Almost. I want to locate multiple peaks in a given frequency
> region.
>
> >3) Do you want to identify two close peaks as separate?
>
> Yes. But I am not talking about Peaks which are so close
> that they just cause beating. I mean Peaks which can easily
> be identified by eye, if you look at plots of the spectrum.
> But it is still the interference between them which causes trouble.
Simple magnitude peak finding methods with FFT results will likely encounter problems if there are less than some small number, say 2 or 3, of beats per frame.

Cross-correlation with the window transform, compared to parabolic interpolation, may improve estimation for those frequency offsets where the shape of the transform of your window is very different from parabolic. Cross-correlation using the complex FFT results instead of the magnitudes may help for certain combinations of phase relationships between the two frequencies and the window center.

If you use a non-rectangular window, it may reduce side lobe interference of more distant peaks, but also increase the span over which the main lobes of nearby peaks can interfere with each other.

IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M
http://www.nicholson.com/rhn/dsp.html
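
The window trade-off mentioned in the last paragraph can be put into numbers with a short sketch. It measures, for a rectangular and a Blackman window, where the main lobe ends (first null, in units of un-padded FFT bins) and how high the largest sidelobe is. The window length of 1024, the padding factor, and the helper name `lobe_stats` are arbitrary choices for illustration.

```python
import numpy as np

def lobe_stats(w, pad=64):
    """Return (first null in un-padded bins, peak sidelobe level in dB)."""
    n = len(w)
    mag = np.abs(np.fft.rfft(w, pad * n))          # finely sampled window transform
    mag_db = 20.0 * np.log10(mag / mag[0] + 1e-300)
    i = 1
    while mag_db[i + 1] < mag_db[i]:               # walk down the main lobe
        i += 1                                     # stops at the first local minimum
    return i / pad, mag_db[i:].max()

n = 1024
null_rect, sl_rect = lobe_stats(np.ones(n))
null_blackman, sl_blackman = lobe_stats(np.blackman(n))
print(f"rectangular: first null {null_rect:.2f} bins, sidelobes {sl_rect:.1f} dB")
print(f"Blackman:    first null {null_blackman:.2f} bins, sidelobes {sl_blackman:.1f} dB")
```

The Blackman main lobe is about three times as wide as the rectangular one, so partials closer than roughly six un-padded bins have overlapping main lobes, while its sidelobes are far lower than those of the rectangular window.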
>Simple magnitude peak finding methods with FFT results will
>likely encounter problems if there are less than some small
>number, say 2 or 3, of beats per frame.
You say: "simple magnitude peak finding methods" - and I wonder what other methods I could try. Any suggestions/references?
>Cross-correlation
>with the window transform, compared to parabolic
>interpolation, may improve estimation for those frequency
>offsets where the shape of the transform of your window is
>very different from parabolic. Cross-correlation using
>the complex FFT results instead of the magnitudes may help
>for certain combinations of phase relationships between the
>two frequencies and the window center.
I played around with the cross-correlation idea and didn't get any improvement. Sometimes I got better estimates, sometimes worse, so for now I went back to parabolic interpolation.

Of course what you say about the shape of the transform of the window makes sense. But I was hoping that the cross-correlation would bring an advantage because it uses a little more of the information in the spectrum around the peak, whereas parabolic interpolation looks at just 3 samples. In other words, I hoped that the information from the sidelobes surrounding the peak could somehow be turned into something useful.

An idea that came to mind is to use FFTs and peak detection just to get a rough estimate of where to look for partials, and then to use some kind of "adaptive heterodyning" with bandpass filters whose center frequencies start at the peak locations. Maybe this way I could get more detail about the sinusoids I want to track.

So if anybody here has some experience or ideas to share about the implementation of phase vocoders or sinusoidal models, I am happy to hear about it.
>If you use a non-rectangular window, it may reduce side
>lobe interference of more distant peaks, but also increase
>the span over which the main lobes of nearby peaks can
>interfere with each other.
Yes, that is clear to me.

Thanks for the replies,
Bjoern
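
One possible reading of the heterodyning idea, as a rough sketch only: shift the rough FFT estimate down to 0 Hz with a complex exponential, low-pass filter so that the neighbouring partials are removed, then read the frequency from the phase increment and the amplitude from the magnitude. The sample rate, filter length, bandwidth, and test signal are all assumptions, and the "adaptive" part (re-centring `fc` on the running estimate) is left out.

```python
import numpy as np

def heterodyne_track(x, fs, fc, bw=20.0, ntaps=2047):
    """Track one partial near fc (Hz). Returns (frequency, amplitude) per sample."""
    n = np.arange(len(x))
    base = x * np.exp(-2j * np.pi * fc * n / fs)       # move fc down to 0 Hz
    # Windowed-sinc low-pass with cutoff bw (Hz); an assumed, simple design.
    m = np.arange(ntaps) - (ntaps - 1) / 2.0
    h = np.sinc(2.0 * bw / fs * m) * np.blackman(ntaps)
    h /= h.sum()                                       # unity gain at 0 Hz
    y = np.convolve(base, h, mode="valid")             # isolated complex partial
    dphi = np.angle(y[1:] * np.conj(y[:-1]))           # phase step per sample
    freq = fc + dphi * fs / (2.0 * np.pi)
    amp = 2.0 * np.abs(y)                              # real sinusoid: two halves
    return freq, amp

# Assumed test: partial at 443.2 Hz with a neighbour at 520 Hz, rough guess 440 Hz.
fs = 8000.0
t = np.arange(8000) / fs
x = 0.8 * np.cos(2 * np.pi * 443.2 * t) + 0.3 * np.cos(2 * np.pi * 520.0 * t)
freq, amp = heterodyne_track(x, fs, fc=440.0)
f_mean, a_mean = float(np.mean(freq)), float(np.mean(amp))
```

The low-pass bandwidth plays the role that the window main lobe plays in the FFT approach: it must be narrow enough to reject the neighbour and wide enough to pass the frequency variation of the partial being tracked.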
On Mar 8, 10:46 pm, "banton" <bant...@web.de> wrote:

> ...
> Background noise will not be a big problem.
> The problem arises when tracked sinusoids are too close, so that
> the sidelobes of the peaks interfere with their neighbours.
> ...
One interesting technique applicable in high SNR situations for estimating two poorly separated tones involves use of a Gaussian window on data in an FFT and using the bins off peak, on the side away from the interfering signal, to estimate the frequency and amplitude. Take a look at:

http://www.edn.com/archives/1994/030394/graph/05df1fg1.htm

There is a figure 1 you can click on to enlarge to see an example of the performance capability on the data set fred harris used to discuss frequency resolution. C code is provided. This is from a back issue of EDN magazine.

The Gaussian window was chosen to:
1) widen mainlobe response
2) increase sidelobe rolloff
3) simplify calculation of frequency and magnitude from the ratio of the amplitudes of two non-peak bins.

Items 1) and 2) could be provided by many good windows and serve to improve signal to interferer ratio in the bins used for calculation. SNR must be high enough that noise is not the limitation. Item 3) is a characteristic of the Gaussian window and is described in the article.

Dale B. Dalrymple
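
To illustrate item 3), here is a sketch of the two-bin idea. It is not the article's C code, just the property it relies on: the transform of a Gaussian window is again Gaussian, so the log magnitude around a tone is a parabola whose curvature is known from the window width, and the log ratio of two adjacent bins fixes the tone's position. The window width, test frequencies, amplitudes, and the function name `gaussian_two_bin` are assumptions.

```python
import numpy as np

def gaussian_two_bin(x, sigma, k):
    """Estimate one sinusoid from bins k and k+1 of a Gaussian-windowed FFT.

    Returns (frequency in FFT bins, amplitude of the real sinusoid).
    """
    n = len(x)
    m = np.arange(n) - (n - 1) / 2.0
    w = np.exp(-0.5 * (m / sigma) ** 2)            # Gaussian window, std sigma samples
    mag = np.abs(np.fft.rfft(x * w))
    delta = 2.0 * np.pi / n                        # bin spacing, rad/sample
    c = (sigma * delta) ** 2                       # curvature of ln|X| per bin^2
    d = 0.5 + np.log(mag[k + 1] / mag[k]) / c      # tone offset from bin k, in bins
    amp = 2.0 * mag[k] / w.sum() * np.exp(0.5 * c * d * d)
    return k + d, amp

n = 1024
sigma = n / 10.0                                   # assumed: window edges at +-5 sigma
t = np.arange(n)
f_true, a_true = 100.3, 0.7                        # in bins; assumed test values
x = a_true * np.cos(2 * np.pi * f_true * t / n + 0.4)
x += 0.7 * np.cos(2 * np.pi * 108.0 * t / n)       # interferer above the target
f_est, a_est = gaussian_two_bin(x, sigma, k=99)    # bins 99 and 100: below the target
```

Choosing `k` on the side away from the interferer keeps its Gaussian skirt out of the two bins used. With a closer interferer the skirt is no longer negligible and the estimate degrades, which is why the technique is said to need high SNR and a good signal to interferer ratio in those bins.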