DSPRelated.com
Forums

Harmonic Product Spectrum

Started by glowkeeper April 23, 2008
This is heavily related to this post:
http://www.dsprelated.com/showmessage/89637/1.php, but since that's a
little old now, I thought I'd start a new thread ;)

I am trying to use the Harmonic Product Spectrum method to guess the
fundamental frequency from an input to a microphone. It's working 50% of
the time, but the other 50% it's picking out harmonics other than the
fundamental. I wonder if someone could help explain as to why, and perhaps
suggest where I might be going wrong or could improve the algorithm
outlined below.

I'm using fftw to give me a fourier transform full of reals from the
microphone input. From that I calculate a power spectrum by taking the
absolute value of each bin of the fft.

I then 2, 3 and 4x downsample the power spectrum into different arrays by
taking the 2nd, 3rd and 4th output of the power spectrum respectively.

Next, I multiply each bin of the power, 2x downsampled, 3x downsampled and
4x downsampled spectrums to produce a Harmonic Product Spectrum. 

Finally, I search for the bin of the greatest magnitude and calculate it's
frequency by doing bin * ( sampleRate / fftsize ). 

fyi, the sample rate of my mic' is 11khz, and I'm using an fft size of 2k.
Therefore each bin of my fft represents ~5Hz. 

Any help and suggestions greatly appreciated!

Steve





On Apr 23, 6:17 am, "glowkeeper" <glowkee...@yahoo.co.uk> wrote:
> This is heavily related to this post:http://www.dsprelated.com/showmessage/89637/1.php, but since that's a > little old now, I thought I'd start a new thread ;) > > I am trying to use the Harmonic Product Spectrum method to guess the > fundamental frequency from an input to a microphone. It's working 50% of > the time, but the other 50% it's picking out harmonics other than the > fundamental. I wonder if someone could help explain as to why, and perhaps > suggest where I might be going wrong or could improve the algorithm > outlined below.
Have you examined the spectrum of those fft frames where you think your method is picking out harmonics other than the fundamental?
> I'm using fftw to give me a fourier transform full of reals from the > microphone input. From that I calculate a power spectrum by taking the > absolute value of each bin of the fft.
How are you taking the absolute value of each complex fft bin?
> I then 2, 3 and 4x downsample the power spectrum into different arrays by > taking the 2nd, 3rd and 4th output of the power spectrum respectively. > > Next, I multiply each bin of the power, 2x downsampled, 3x downsampled and > 4x downsampled spectrums to produce a Harmonic Product Spectrum. > > Finally, I search for the bin of the greatest magnitude and calculate it's > frequency by doing bin * ( sampleRate / fftsize ). > > fyi, the sample rate of my mic' is 11khz, and I'm using an fft size of 2k. > Therefore each bin of my fft represents ~5Hz.
If you downsample your fft, shouldn't that change the bin width? Or are you throwing away spectral energy? That loss of information could certainly effect the results. IMHO. YMMV. -- rhn A.T nicholson d.0.t C-o-M http://www.nicholson.com/rhn/dsp.html
>On Apr 23, 6:17 am, "glowkeeper" <glowkee...@yahoo.co.uk> wrote: >> This is heavily related to this
post:http://www.dsprelated.com/showmessage/89637/1.php, but since that's a
>> little old now, I thought I'd start a new thread ;) >> >> I am trying to use the Harmonic Product Spectrum method to guess the >> fundamental frequency from an input to a microphone. It's working 50%
of
>> the time, but the other 50% it's picking out harmonics other than the >> fundamental. I wonder if someone could help explain as to why, and
perhaps
>> suggest where I might be going wrong or could improve the algorithm >> outlined below. > >Have you examined the spectrum of those fft frames where >you think your method is picking out harmonics other than >the fundamental?
Not exactly. You're right, I will examine the data a little closer.
>> I'm using fftw to give me a fourier transform full of reals from the >> microphone input. From that I calculate a power spectrum by taking the >> absolute value of each bin of the fft. > >How are you taking the absolute value of each complex fft >bin?
The app' is c++, and I'm using the standard c api call: fabs( fft[binNumber ] ); I suppose I could also do something like taking the square root of the square, but that seems expensive (although granted, I don't know what fabs is doing under the hood).
>> I then 2, 3 and 4x downsample the power spectrum into different arrays
by
>> taking the 2nd, 3rd and 4th output of the power spectrum respectively. >> >> Next, I multiply each bin of the power, 2x downsampled, 3x downsampled
and
>> 4x downsampled spectrums to produce a Harmonic Product Spectrum. >> >> Finally, I search for the bin of the greatest magnitude and calculate
it's
>> frequency by doing bin * ( sampleRate / fftsize ). >> >> fyi, the sample rate of my mic' is 11khz, and I'm using an fft size of
2k.
>> Therefore each bin of my fft represents ~5Hz. > >If you downsample your fft, shouldn't that change the bin >width? Or are you throwing away spectral energy? That >loss of information could certainly effect the results.
Ah, now this was something I wasn't sure of to be honest. Maybe? I output my product to an array the same size as the fft, and therefore I thought the bin width was the same. But I think you're suggestingt that my 2x bin width is ~5Hz/2, 3x ~5Hz/3, 4x ~5Hz/4, and therefore when I find the bin of greatest magnitude in the harmonic product spectrum, I need to factor this into my equation and not just bin * ( sampleRate / fftsize ), right? Thanks for your help.
On Apr 24, 3:33 am, "glowkeeper" <glowkee...@yahoo.co.uk> wrote:
> The app' is c++, and I'm using the standard c api call: > > fabs( fft[binNumber ] ); > > I suppose I could also do something like taking the square root of the > square, but that seems expensive (although granted, I don't know what fabs > is doing under the hood).
fabs is the absolute value of a real number. You have to use sqrt(re*re + im*im) instead, or hypot(re, im) on Unix (which is more accurate), or abs(std::complex<double>(re, im)) in C++, or .... where re and im are the real and imaginary parts, respectively. Regards, Steven G. Johnson
>I suppose I could also do something like taking the square root of the >square, but that seems expensive (although granted, I don't know what
fabs
>is doing under the hood).
First of all, your Fourier coefficients should be complex, even if the input data are real. From FFTW's website, the planning function is: fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out, unsigned flags); where *in is a pointer to your real input array, and *out is a pointer to your complex output array. Next, fabs() performs an absolute value of a real data point. For power, you actually want the absolute value squared. I.e., given a complex coefficient of (a+jb), the power is (a+jb)(a-jb), or abs(a+jb)^2. Since you're using FFTW, the output array is defined as type fftw_complex (or fftwf_complex for single-precision), unless you've included the complex.h header from the C99 math library. If so, then it will be of type double/float complex at the output. I've never done it this way but maybe Steven Johnson can chime in on how it works better (I prefer their convention since the complex math doesn't work as well on the processors I've used). Either way, the appropriate function at that point would be cabs(x)*cabs(x) (well, y = cabs(x), then y2 = y*y, if you want to out-think the compiler), which is the complex magnitude squared. Mark
The planning function I'm using is fftwf_plan_r2r_1d, and I use it this
way:

fftwf_plan_r2r_1d( 2048, m_pFourierIn, m_pFourierOut, FFTW_R2HC,
FFTW_BACKWARD );

But nonetheless, I'm still getting the power algorithm  wrong it
seems....

>>I suppose I could also do something like taking the square root of the >>square, but that seems expensive (although granted, I don't know what >fabs >>is doing under the hood). > >First of all, your Fourier coefficients should be complex, even if the >input data are real. From FFTW's website, the planning function is: > >fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out, > unsigned flags); > >where *in is a pointer to your real input array, and *out is a pointer
to
>your complex output array. > >Next, fabs() performs an absolute value of a real data point. For
power,
>you actually want the absolute value squared. I.e., given a complex >coefficient of (a+jb), the power is (a+jb)(a-jb), or abs(a+jb)^2. Since >you're using FFTW, the output array is defined as type fftw_complex (or >fftwf_complex for single-precision), unless you've included the
complex.h
>header from the C99 math library. If so, then it will be of type >double/float complex at the output. I've never done it this way but
maybe
>Steven Johnson can chime in on how it works better (I prefer their >convention since the complex math doesn't work as well on the processors >I've used). Either way, the appropriate function at that point would be >cabs(x)*cabs(x) (well, y = cabs(x), then y2 = y*y, if you want to
out-think
>the compiler), which is the complex magnitude squared. > >Mark >
Actually, I should say that I'm using 

fftwf_plan_r2r_1d( 2048, m_pFourierIn, m_pFourierOut, FFTW_R2HC,
FFTW_BACKWARD );

and then when I come to do the power computations I work on the first 1024
entries in the returned transform (ignoring the imaginary parts). Is this
incorrect?

>The planning function I'm using is fftwf_plan_r2r_1d, and I use it this >way: > >fftwf_plan_r2r_1d( 2048, m_pFourierIn, m_pFourierOut, FFTW_R2HC, >FFTW_BACKWARD ); > >But nonetheless, I'm still getting the power algorithm wrong it >seems.... > >>>I suppose I could also do something like taking the square root of the >>>square, but that seems expensive (although granted, I don't know what >>fabs >>>is doing under the hood). >> >>First of all, your Fourier coefficients should be complex, even if the >>input data are real. From FFTW's website, the planning function is: >> >>fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out, >> unsigned flags); >> >>where *in is a pointer to your real input array, and *out is a pointer >to >>your complex output array. >> >>Next, fabs() performs an absolute value of a real data point. For >power, >>you actually want the absolute value squared. I.e., given a complex >>coefficient of (a+jb), the power is (a+jb)(a-jb), or abs(a+jb)^2.
Since
>>you're using FFTW, the output array is defined as type fftw_complex (or >>fftwf_complex for single-precision), unless you've included the >complex.h >>header from the C99 math library. If so, then it will be of type >>double/float complex at the output. I've never done it this way but >maybe >>Steven Johnson can chime in on how it works better (I prefer their >>convention since the complex math doesn't work as well on the
processors
>>I've used). Either way, the appropriate function at that point would
be
>>cabs(x)*cabs(x) (well, y = cabs(x), then y2 = y*y, if you want to >out-think >>the compiler), which is the complex magnitude squared. >> >>Mark >> >
>Actually, I should say that I'm using > >fftwf_plan_r2r_1d( 2048, m_pFourierIn, m_pFourierOut, FFTW_R2HC, >FFTW_BACKWARD );
Ah, gotcha. OK, the way I read the documentation on this is that the complex data still exists, but in the second half of the transform output, which you would need to incorporate into your absolute value function, i.e. you still get a complex output, but the data are ordered all of the inphase first, then quadrature last. I'll defer to SJ for a better explanation since I've never used the R2HC option. Mark
On Apr 24, 8:38 am, "glowkeeper" <glowkee...@yahoo.co.uk> wrote:
> Actually, I should say that I'm using > > fftwf_plan_r2r_1d( 2048, m_pFourierIn, m_pFourierOut, FFTW_R2HC, > FFTW_BACKWARD ); > > and then when I come to do the power computations I work on the first 1024 > entries in the returned transform (ignoring the imaginary parts). Is this > incorrect?
Yes. If you ignore the "imaginary" parts, then a phase shift of 90 degrees could cause the fundamental or any harmonic to disappear from your composite "spectrum". If your fft frames are not phase locked to the signal generator, then this could easily cause the random type of results you may be seeing.
On Apr 23, 11:33 pm, "glowkeeper" <glowkee...@yahoo.co.uk> wrote:
> >If you downsample your fft, shouldn't that change the bin > >width? Or are you throwing away spectral energy? That > >loss of information could certainly effect the results. > > Ah, now this was something I wasn't sure of to be honest. Maybe? I output > my product to an array the same size as the fft, and therefore I thought > the bin width was the same. > > But I think you're suggestingt that my 2x bin width is ~5Hz/2, 3x ~5Hz/3, > 4x ~5Hz/4, ...
Actually, the opposite. When you downsample a spectrum, each bin should end up representing a wider frequency band, e.g. 5 Hz * 2, etc., or you will be throwing away energy. IMHO. YMMV.