Autocorrelation and the case of the missing fundamental
[UPDATED January 25, 2016: One of the examples was broken, also the IPython notebook links now point to nbviewer, where you can hear the examples.]
For sounds with simple harmonic structure, the pitch we perceive is usually the fundamental frequency, even if it is not dominant. For example, here's the spectrum of a half-second recording of a saxophone.
The first three peaks are at 464, 928, and 1392 Hz. The pitch we perceive is the fundamental, 464 Hz, which is close to B♭4. If you don't believe me, you can listen to it, and the following examples, in this IPython notebook.
To understand why we perceive pitch this way, it helps to look at the autocorrelation function (ACF). If you are not familiar with the ACF, you might want to start with Chapter 5 of Think DSP.
Here is the ACF for this segment:
The highest peak is at lag 95, which corresponds to a frequency of 464 Hz. At least in this example, the pitch we perceive corresponds to the highest correlation in the ACF rather than the highest-amplitude component of the spectrum.
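The conversion from lag to frequency can be sketched in a few lines of Python. This is a minimal illustration on a synthetic tone, not the saxophone recording. The 44,100 Hz sampling rate is an assumption (it is consistent with lag 95 mapping to about 464 Hz), and the harmonic amplitudes, the search range, and the helper names `autocorr` and `lag_to_freq` are made up for the example.

```python
import numpy as np

FRAMERATE = 44100  # assumed sampling rate (samples per second)

def autocorr(signal):
    """Autocorrelation at non-negative lags, normalized so that acf[0] == 1."""
    full = np.correlate(signal, signal, mode="full")
    acf = full[len(signal) - 1:]
    return acf / acf[0]

def lag_to_freq(lag, framerate=FRAMERATE):
    """Convert a lag in samples to the frequency with that period, in Hz."""
    return framerate / lag

# Synthetic stand-in for the recording: a 464 Hz fundamental that is
# weaker than its second and third harmonics (amplitudes are made up).
t = np.arange(int(0.1 * FRAMERATE)) / FRAMERATE
tone = sum(a * np.sin(2 * np.pi * f * t)
           for f, a in [(464, 0.3), (928, 1.0), (1392, 0.6)])

acf = autocorr(tone)

# Look for the highest peak away from lag 0 (search range chosen by hand).
min_lag, max_lag = 20, 150
peak = min_lag + int(np.argmax(acf[min_lag:max_lag]))
print(peak, lag_to_freq(peak))
```

Even though the 928 Hz component has the largest amplitude in this synthetic tone, the ACF peak should land at or next to lag 95, because that is the shortest lag at which all three components line up with themselves.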
Surprisingly, the perceived pitch doesn't change if we remove the fundamental completely. Here's what the spectrum looks like if we use a high-pass filter to clobber the fundamental.
The perceived pitch is still 464 Hz, even though there is no power at that frequency. This phenomenon is called the "missing fundamental". Again, you can hear it in the IPython notebook.
To understand why we hear a frequency that's not in the signal, it helps to look at the ACF again:
Removing the fundamental has little effect on the ACF. The third peak, which corresponds to 464 Hz, is still the highest, and that's the pitch we perceive.
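A rough way to reproduce this on synthetic data is to build a tone from a fundamental and a few harmonics, zero out everything below a cutoff, and compare the ACF peak before and after. This is only a sketch under stated assumptions: the 44,100 Hz sampling rate, the amplitudes, the 600 Hz cutoff, and the brick-wall FFT filter are all choices made for illustration and are not taken from the notebook.

```python
import numpy as np

FRAMERATE = 44100  # assumed sampling rate (samples per second)

def harmonic_tone(freqs_amps, duration=0.1, framerate=FRAMERATE):
    """Sum of sine waves given as (frequency in Hz, amplitude) pairs."""
    t = np.arange(int(duration * framerate)) / framerate
    return sum(a * np.sin(2 * np.pi * f * t) for f, a in freqs_amps)

def high_pass(signal, cutoff, framerate=FRAMERATE):
    """Brick-wall high-pass: zero every FFT bin below `cutoff` Hz."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / framerate)
    spectrum[freqs < cutoff] = 0
    return np.fft.irfft(spectrum, n=len(signal))

def peak_lag(signal, min_lag=20, max_lag=150):
    """Lag of the highest ACF value in [min_lag, max_lag)."""
    acf = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    return min_lag + int(np.argmax(acf[min_lag:max_lag]))

# Made-up amplitudes; only the frequencies follow the text.
tone = harmonic_tone([(464, 1.0), (928, 0.8), (1392, 0.6), (1856, 0.4)])
filtered = high_pass(tone, cutoff=600)

# Largest spectral magnitude left below the cutoff (should be ~0).
freqs = np.fft.rfftfreq(len(filtered), d=1 / FRAMERATE)
residual = np.abs(np.fft.rfft(filtered))[freqs < 600].max()

print(peak_lag(tone), peak_lag(filtered), residual)
```

The filtered signal has essentially nothing left near 464 Hz, yet the remaining harmonics still repeat every 95 samples or so, which is why the ACF peak stays where it was.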
There are two other peaks that are almost as high, corresponding to 1297 Hz and 722 Hz. So why don't we perceive either of those pitches, instead of 464 Hz? The reason is that the higher components in the spectrum are harmonics of 464 Hz and they are not harmonics of 722 or 1297 Hz. So our ear interprets these harmonics as evidence that the "right" fundamental is at 464 Hz.
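The "harmonics of 464 Hz, but not of 722 or 1297 Hz" argument can be checked with a little arithmetic. The snippet below is a toy scoring function, not a model of how the ear works: `harmonic_misfit` is a hypothetical helper that measures how far each remaining spectral peak is from the nearest integer multiple of a candidate fundamental. The partials 928 and 1392 Hz are the second and third peaks reported above.

```python
def harmonic_misfit(candidate, partials):
    """Average distance from each partial to the nearest integer multiple
    of `candidate`, measured in units of `candidate` (0 = perfect fit)."""
    total = 0.0
    for partial in partials:
        ratio = partial / candidate
        nearest = max(1, round(ratio))  # nearest harmonic number, at least 1
        total += abs(ratio - nearest)
    return total / len(partials)

partials = [928, 1392]  # peaks that remain after the fundamental is removed

for candidate in [464, 722, 1297]:
    print(candidate, round(harmonic_misfit(candidate, partials), 3))
```

Only 464 Hz gives a misfit of zero; for the other two candidates the partials fall well away from any integer multiple.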
If we get rid of the high harmonics, the effect goes away. Here's the spectrum with harmonics above 1200 Hz removed.
At this point there's pretty much only one component left, so it sounds like a sine wave at 928 Hz. And if we look at the ACF one more time:
The highest peak corresponds to 938 Hz (not exactly 928 Hz, but within the resolution of the ACF).
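The resolution limit comes from the fact that lags are whole numbers of samples. Assuming a 44,100 Hz sampling rate (an assumption; the rate is not stated here), the period of 928 Hz falls between two integer lags, so the ACF can only report one of the two neighboring frequencies:

```python
FRAMERATE = 44100  # assumed sampling rate (samples per second)

# Period of a 928 Hz sine in samples; not a whole number.
exact_lag = FRAMERATE / 928

# The two integer lags that bracket the exact period.
lower, upper = int(exact_lag), int(exact_lag) + 1

for lag in (lower, upper):
    print(lag, FRAMERATE / lag)
```

Under that assumption the neighboring lags are 47 and 48, which map to roughly 938 Hz and 919 Hz, so 938 Hz is as close to 928 Hz as an integer lag can get from above.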
In summary, these experiments suggest that pitch perception is not based entirely on spectral analysis, but is also informed by something like autocorrelation. According to the Wikipedia page on the missing fundamental:
It is now widely accepted that the brain processes the information present in the overtones to calculate the fundamental frequency. The precise way in which it does so is still a matter of debate, but the processing seems to be based on an autocorrelation involving the timing of neural impulses in the auditory nerve. However, it has long been noted that any neural mechanisms which may accomplish a delay (a necessary operation of a true autocorrelation) have not been found. At least one model shows a temporal delay to be unnecessary to produce an autocorrelation model of pitch perception, appealing to phase shifts between cochlear filters; however, earlier work has shown that certain sounds with a prominent peak in their autocorrelation function do not elicit a corresponding pitch percept, and that certain sounds without a peak in their autocorrelation function nevertheless elicit a pitch. Autocorrelation can thus be considered, at best, an incomplete model.
It sounds like there are opportunities for more research in this area.
Again, this 'missing fundamental' topic is interesting. For decades I thought I knew how telephones worked. But a few years ago a signal processing guy pointed out to me that in our U.S. telephone system all audio below roughly 350 Hz is filtered out of our telephone audio signal before transmission to a destination phone. Now I'm guessing that the fundamental spectral components of my speech signal are all below 350 Hz and are filtered out by the phone company. But at the destination phone the listener's ear/brain combination, somehow, "replaces" or "regenerates" the missing fundamental components making my speech sound normal. Interesting, huh?
If you have a fundamental and some successive harmonics, removing the fundamental does not change the period of the waveform. The period still is equal to the period of the fundamental whether the fundamental is present or not. It seems that the brain is recognizing periodicity, and people are interpreting that as pitch.
Take a step back and consider: if you removed the fundamental from a signal that contained successive harmonics, would you expect the resulting signal to have no pitch, or would you expect it to have a different pitch?
I sent this yesterday but it didn't show up. Here is an abbreviated version.
If you have a fundamental and some (>=2) successive harmonics, you perceive the pitch as the period of the waveform. If you remove the fundamental, the period will remain unchanged. It seems that people interpret the period as the pitch.
Taking a step back, if the perceived pitch does not remain unchanged in the case described above, what pitch would you expect to hear? Or would you expect the remaining waveform to have no identifiable pitch?
Let me know if that's not working. And just to confirm, we're talking about this link: http://nbviewer.jupyter.org/github/AllenDowney/ThinkDSP/blob/master/code/saxophone.ipynb ?