Fundamental Frequency Estimation from Spectral Peaks
Spectral peak measurement was discussed in Chapter 5. Given a set of measured peak frequencies, it is usually straightforward to form a fundamental frequency estimate ``$F_0$''. This task is also called pitch detection, where the perceived ``pitch'' of the audio signal is assumed to coincide well enough with its fundamental frequency. We assume here that the signal is periodic, so that all of its sinusoidal components are harmonics of a fundamental component having frequency $F_0$. (For inharmonic sounds, the perceived pitch, if any, can be complex to predict.)
An approximate maximum-likelihood $F_0$-detection algorithm consists of the following steps:
- Find the peak of the histogram of the peak-frequency differences in order to find the most common harmonic spacing. This is the nominal $F_0$ estimate. The matlab hist function can be used to form a histogram from the measured peak spacings.
- Refine the nominal $F_0$ estimate using linear regression. Linear regression simply fits a straight line through the data in the least-squares sense. In matlab, the function polyfit(x,y,1) can be used, e.g., p = polyfit([0,1],[1,1.5],1) returns p = [0.5,1], where p(1) is the slope and p(2) is the offset.
- The slope p(1) of the fitted line gives the refined $F_0$ estimate.
Optionally, the following refinements can be made:
- Pre-emphasis: Equalize the spectrum so as to flatten it. For example, low-order linear prediction is often used for this purpose (the ``flattened'' spectrum is that of the prediction error). In voice coding, first-order linear prediction is typically used.
- Masking: Small peaks close to much larger peaks are often masked in the auditory system. Therefore, it is good practice to reject all peaks below an inaudibility threshold, which is the maximum of the threshold of hearing (versus frequency) and the masking pattern generated by the largest peaks. Since it is simple to extract peaks in descending magnitude order, each removed peak can be replaced by its masking pattern, which elevates the assumed inaudibility threshold.
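The histogram and regression steps listed above can be sketched in a few lines. The following is a minimal illustration in Python rather than matlab, using only the standard library, and it is not a listing from the text. The function name estimate_f0, the histogram bin width, the outlier tolerance, and the rule that assigns each peak its nearest harmonic number before the line fit are all assumptions made for the example. The hand-written least-squares fit plays the role of polyfit(x,y,1), and the Counter plays the role of hist.

```python
from collections import Counter


def estimate_f0(peak_freqs, bin_width=5.0, tol=0.2):
    """Estimate F0 in Hz from measured peak frequencies in Hz.

    Hypothetical helper: bin_width (Hz) and tol (fraction of the nominal
    F0) are illustrative parameters, not values taken from the text.
    """
    freqs = sorted(peak_freqs)
    if len(freqs) < 2:
        raise ValueError("need at least two peak frequencies")

    # Step 1: histogram the spacings between adjacent peaks. The fullest bin
    # holds the most common harmonic spacing; its mean is the nominal F0.
    spacings = [hi - lo for lo, hi in zip(freqs, freqs[1:])]
    counts = Counter(round(s / bin_width) for s in spacings)
    top_bin = counts.most_common(1)[0][0]
    in_bin = [s for s in spacings if round(s / bin_width) == top_bin]
    nominal = sum(in_bin) / len(in_bin)
    if nominal <= 0.0:
        raise ValueError("peak spacings must be positive")

    # Step 2: label each peak with its nearest harmonic number k (an
    # assumption), dropping peaks farther than tol * nominal from k * nominal.
    pairs = []
    for f in freqs:
        k = round(f / nominal)
        if k >= 1 and abs(f - k * nominal) <= tol * nominal:
            pairs.append((k, f))
    if len(pairs) < 2:
        return nominal

    # Least-squares line f = slope * k + offset, as polyfit(k, f, 1) would give.
    n = len(pairs)
    k_mean = sum(k for k, _ in pairs) / n
    f_mean = sum(f for _, f in pairs) / n
    sxx = sum((k - k_mean) ** 2 for k, _ in pairs)
    if sxx == 0.0:
        return nominal
    sxy = sum((k - k_mean) * (f - f_mean) for k, f in pairs)

    # Step 3: the slope of the fitted line is the refined F0 estimate.
    return sxy / sxx
```

For example, estimate_f0([220.3, 439.8, 660.4, 879.9, 1100.2]) gives a value close to 220 Hz. Because the fit uses harmonic numbers rather than adjacent differences, a missing harmonic (say, peaks at harmonics 1, 2, 3, 5, 6) does not bias the slope.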
If the fundamental frequency must be measured very accurately in a periodic signal, the estimate obtained by the above algorithm can be refined using a gradient search that matches a so-called harmonic comb (a set of frequencies at integer multiples of a candidate fundamental) to the magnitude spectrum of an interpolated FFT.
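As a rough illustration of harmonic-comb refinement, the sketch below scores a candidate fundamental by summing the log magnitude of the spectrum at its first few integer multiples, then searches near the coarse estimate for the best score. Several choices here are assumptions and not the method as stated in the text: the sum-of-log-magnitudes score, the Hann window, the direct evaluation of the DTFT at each comb frequency (standing in for an interpolated FFT), and the golden-section search (standing in for a gradient search). The names dtft_mag and refine_f0 and all parameter defaults are hypothetical.

```python
import cmath
import math


def dtft_mag(xw, fs, freq):
    """Magnitude of the DTFT of the windowed frame xw at freq (Hz)."""
    w = 2.0 * math.pi * freq / fs
    return abs(sum(v * cmath.exp(-1j * w * i) for i, v in enumerate(xw)))


def refine_f0(x, fs, f0_coarse, n_harmonics=5, span_hz=2.0, tol_hz=1e-3,
              eps=1e-9):
    """Refine f0_coarse (Hz) by maximizing a harmonic-comb score.

    Assumes the true F0 lies within span_hz of f0_coarse and that span_hz
    is small enough for every comb tooth to stay inside its main lobe.
    """
    n = len(x)
    # Hann window to keep sidelobes of neighboring harmonics small.
    xw = [v * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
          for i, v in enumerate(x)]

    def score(f0):
        # Comb score: sum of log magnitudes at k * f0, k = 1..n_harmonics.
        return sum(math.log(dtft_mag(xw, fs, k * f0) + eps)
                   for k in range(1, n_harmonics + 1))

    # Golden-section search for the maximum of score on [lo, hi].
    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = f0_coarse - span_hz, f0_coarse + span_hz
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    sa, sb = score(a), score(b)
    while hi - lo > tol_hz:
        if sa < sb:
            # The maximum lies in [a, hi]; reuse b as the new lower probe.
            lo, a, sa = a, b, sb
            b = lo + g * (hi - lo)
            sb = score(b)
        else:
            # The maximum lies in [lo, b]; reuse a as the new upper probe.
            hi, b, sb = b, a, sa
            a = hi - g * (hi - lo)
            sa = score(a)
    return 0.5 * (lo + hi)
```

The search interval matters: with frame length N and sampling rate fs, the highest comb tooth moves n_harmonics times as fast as the candidate fundamental, so span_hz should be kept well below 2 fs / (N n_harmonics) for a Hann window. Otherwise the score is no longer unimodal on the interval and the search can settle on a sidelobe.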
An often-cited book on pitch detection, particularly for voice, is that by Hess. The harmonic-comb method can be considered an approximate maximum-likelihood estimator for fundamental frequency, and more accurate maximum-likelihood methods have been worked out [65,297,230,231]. Another highly regarded method for $F_0$ estimation is the YIN algorithm. For automatic transcription of polyphonic music, Klapuri has developed methods for multiple-$F_0$ estimation [189,127,126,124]. Finally, a rich source of methods may be found in the conference proceedings for the field of Music Information Retrieval (MIR). Of course, don't forget to try a Web search for ``F0 estimation'' and the like.
STFT Summary and Conclusions