Fundamental Frequency Estimation from Sinusoidal Peaks

Sinusoidal peak measurement was discussed in Chapter 4. Given a set of sinusoidal peak frequencies $ f_i$, $ i=1,\ldots,N_f$, it is usually straightforward to form a fundamental frequency estimate ``$ F_0$''. This task is also called pitch detection, where the perceived ``pitch'' of the audio signal is assumed to coincide well enough with its fundamental frequency. We assume here that the signal is periodic, so that all of its sinusoidal components are harmonics of the fundamental frequency $ F_0$. (For inharmonic sounds, the perceived pitch, if any, can be complex to predict [54].)

An approximate maximum-likelihood $ F_0$-detection algorithm consists of the following steps:

  1. Find the peak of the histogram of the peak-frequency-differences in order to find the most common harmonic spacing. This is the nominal pitch estimate.
  2. Refine the nominal pitch estimate using linear regression. Linear regression simply fits a straight line through the data to give a least-squares fit.
  3. The slope of the fitted line gives the pitch estimate.

A Matlab listing for $ F_0$ estimation along these lines appears in §G.6.
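The three steps above can also be sketched in Python. This is a minimal illustration, not the listing of §G.6. The function name `estimate_f0`, the histogram bin width `bin_width` (in Hz), the use of adjacent-peak differences only, and the choice to fit a line with a free intercept are all assumptions made for this example.

```python
from collections import Counter

def estimate_f0(peak_freqs, bin_width=5.0):
    """Estimate F0 in Hz from a list of sinusoidal peak frequencies in Hz.

    Hypothetical helper: bin_width is an assumed histogram resolution.
    """
    f = sorted(peak_freqs)
    if len(f) < 2:
        raise ValueError("need at least two peaks")

    # Step 1: histogram the differences between adjacent peaks and take the
    # fullest bin as the most common harmonic spacing (nominal estimate).
    diffs = [b - a for a, b in zip(f, f[1:])]
    bins = Counter(round(d / bin_width) for d in diffs)
    best = max(bins, key=bins.get)
    in_best = [d for d in diffs if round(d / bin_width) == best]
    nominal = sum(in_best) / len(in_best)
    if nominal <= 0:
        raise ValueError("peaks are too close together for this bin width")

    # Step 2: label each peak with its nearest harmonic number, then fit
    # f = slope * k + intercept by ordinary least squares.
    pairs = [(round(fi / nominal), fi) for fi in f]
    pairs = [(k, fi) for k, fi in pairs if k >= 1]
    n = len(pairs)
    k_mean = sum(k for k, _ in pairs) / n
    f_mean = sum(fi for _, fi in pairs) / n
    sxx = sum((k - k_mean) ** 2 for k, _ in pairs)
    sxy = sum((k - k_mean) * (fi - f_mean) for k, fi in pairs)
    if sxx == 0:
        return nominal  # only one distinct harmonic number; nothing to fit

    # Step 3: the slope of the fitted line is the refined estimate.
    return sxy / sxx

# Example: peaks near harmonics of 220 Hz, with the 4th harmonic missing.
peaks = [220.3, 439.8, 660.4, 1100.1, 1319.7]
f0 = estimate_f0(peaks)
```

The missing harmonic produces one difference near $ 2F_0$, but it is outvoted in the histogram, and the regression then uses all five peaks with harmonic numbers 1, 2, 3, 5, and 6.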

Useful Preprocessing

In many cases, results are improved through the use of preprocessing of the spectrum prior to peak finding. Examples include the following:

  • Pre-emphasis: Equalize the spectrum so as to flatten it.

  • Masking: Small peaks close to much larger peaks are often masked by the auditory system. Therefore, it is good practice to reject all peaks below an inaudibility threshold which is the maximum of the threshold of hearing (versus frequency) and the masking pattern generated by the largest peaks. Since it is simple to extract peaks in descending magnitude order, each removed peak can be replaced by its masking pattern, which elevates the assumed inaudibility threshold.
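The masking-based rejection described above can be sketched as follows. Everything numerical here is a placeholder: the constant `floor_db` stands in for a frequency-dependent threshold of hearing, and the triangular masking pattern (a drop of `mask_drop_db` at the masker, falling off at `slope_db_per_hz`) is a hypothetical shape chosen only to keep the example short. The sketch also assumes that only retained peaks contribute masking patterns.

```python
def audible_peaks(peaks, floor_db=0.0, mask_drop_db=20.0, slope_db_per_hz=0.05):
    """Keep the peaks that exceed an assumed inaudibility threshold.

    peaks: list of (freq_hz, level_db) pairs.
    All default parameter values are illustrative assumptions.
    """
    kept = []
    # Visit peaks in descending magnitude order, as suggested in the text.
    for freq, level in sorted(peaks, key=lambda p: p[1], reverse=True):
        # Threshold = max of the hearing floor and every masking pattern
        # generated so far by the larger, retained peaks.
        threshold = floor_db
        for masker_freq, masker_level in kept:
            mask = masker_level - mask_drop_db - slope_db_per_hz * abs(freq - masker_freq)
            threshold = max(threshold, mask)
        if level > threshold:
            kept.append((freq, level))
    return sorted(kept)  # back to ascending frequency order

# Example: the 35 dB peak at 450 Hz sits under the mask of the 60 dB peak
# at 440 Hz, and the -5 dB peak at 3000 Hz is below the assumed floor.
result = audible_peaks([(440.0, 60.0), (450.0, 35.0), (880.0, 50.0), (3000.0, -5.0)])
```

A realistic implementation would replace both placeholders with a psychoacoustic model.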

Getting Closer to Maximum Likelihood

In applications for which the fundamental frequency $ F_0$ of a periodic signal must be measured very accurately, the estimate obtained by the above algorithm can be refined using a gradient search which matches a so-called ``harmonic comb'' to the magnitude spectrum of an interpolated FFT $ X(\omega)$:

\begin{eqnarray*}
{\hat f}_0 &\triangleq& \arg\max_{{\hat f}_0} \sum_{k=1}^K \log\left[\left\vert X(k{\hat f}_0)\right\vert+\epsilon\right]\\
&=& \arg\max_{{\hat f}_0} \prod_{k=1}^K \left[\left\vert X(k{\hat f}_0)\right\vert+\epsilon\right] \qquad (10.1)
\end{eqnarray*}


where

\begin{eqnarray*}
K &=& \mbox{number of peaks,}\\
k &=& \mbox{harmonic number, and}\\
\epsilon &=& \mbox{... of the spectral magnitude ``noise floor'' level.}
\end{eqnarray*}

The purpose of $ \epsilon>0$ is to insure against multiplying the whole expression by zero due to a missing partial (e.g., due to a comb-filtering null). If $ \epsilon=0$ in (10.1), it is advisable to omit indices $ k$ for which $ k{\hat f}_0$ is too close to a spectral null, since even one spectral null can push the product of peak amplitudes to a very small value. At the same time, the product should be penalized in some way to reflect the fact that it has fewer terms ($ \epsilon>0$ is one way to accomplish this).
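The harmonic-comb objective (10.1) can be illustrated with a short Python sketch. Two substitutions are made for simplicity and should be read as assumptions: the spectrum is evaluated directly from its defining sum at each comb frequency instead of from an interpolated FFT, and a plain grid search over a small range around the nominal estimate stands in for the gradient search. The names `spectrum_mag`, `comb_score`, and `refine_f0`, and the defaults for `eps`, `span`, and `steps`, are hypothetical.

```python
import cmath
import math

def hann(n_samples):
    """Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def spectrum_mag(frame, freq_hz, fs):
    """|X| of the (already windowed) frame at an arbitrary frequency in Hz."""
    step = -2j * math.pi * freq_hz / fs
    return abs(sum(x * cmath.exp(step * n) for n, x in enumerate(frame)))

def comb_score(frame, f0, fs, num_harmonics, eps):
    """Log form of the objective: sum over k of log(|X(k f0)| + eps)."""
    return sum(math.log(spectrum_mag(frame, k * f0, fs) + eps)
               for k in range(1, num_harmonics + 1))

def refine_f0(frame, fs, f0_nominal, num_harmonics=5, eps=1e-3,
              span=0.03, steps=61):
    """Grid search within +/- span (fractional) of the nominal estimate."""
    grid = [f0_nominal * (1 - span + 2 * span * i / (steps - 1))
            for i in range(steps)]
    return max(grid, key=lambda f0: comb_score(frame, f0, fs, num_harmonics, eps))

# Example: a windowed frame with five harmonics of 221 Hz, refined from a
# nominal estimate of 218 Hz. eps is arbitrary here because the test
# signal is noise-free.
fs, n_samples, true_f0 = 8000.0, 512, 221.0
window = hann(n_samples)
frame = [window[n] * sum(math.sin(2 * math.pi * k * true_f0 * n / fs) / k
                         for k in range(1, 6))
         for n in range(n_samples)]
f0_hat = refine_f0(frame, fs, f0_nominal=218.0)
```

Because the logarithm is monotonic, maximizing this sum selects the same $ {\hat f}_0$ as maximizing the product form. The resolution of the result is limited by the grid spacing, which a gradient or parabolic refinement step could improve.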

As a practical matter, it is important to inspect the magnitude spectrum of each data frame manually to ensure that a robust row of peaks is being matched by the harmonic comb. For example, it is typical to look at a display of the frame magnitude spectrum overlaid with vertical lines at the optimized harmonic-comb frequencies. This provides an effective picture of the $ F_0$ estimate in which typical problems (such as octave errors) are readily seen.

References on $ F_0$ Estimation

An often-cited book on classical methods for pitch detection, particularly for voice, is that by Hess [101]. The harmonic-comb method can be considered an approximate maximum-likelihood estimator for fundamental frequency, and more accurate maximum-likelihood methods have been worked out [64,276,217,218]. More recently, Klapuri has been developing some promising methods for multiple pitch estimation [120,119,117]. Another promising approach to multiple pitch estimation was presented by Smaragdis at Acoustics'08 in Paris (abstract only); the full paper is in review at the time of this writing.
