Autocorrelation and the case of the missing fundamental

Allen Downey, January 21, 2016 (10 comments)

[UPDATED January 25, 2016: One of the examples was broken; also, the IPython notebook links now point to nbviewer, where you can hear the examples.]

For sounds with simple harmonic structure, the pitch we perceive is usually the fundamental frequency, even if it is not dominant.  For example, here's the spectrum of a half-second recording of a saxophone.

The first three peaks are at 464, 928, and 1392 Hz.  The pitch we perceive is the fundamental, 464 Hz, which is close to B♭4.  If you don't believe me, you can listen to it, and the following examples, in this IPython notebook.
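Both claims in the previous paragraph can be checked with a little arithmetic. This is a minimal sketch, assuming standard tuning (A4 = 440 Hz) and twelve-tone equal temperament; `nearest_note` is a hypothetical helper, not code from the notebook. It shows that the three peaks are exact integer multiples of 464 Hz, and that 464 Hz lands about 8 cents below B♭4 (about 466.2 Hz).

```python
import math

# Note names using flats, indexed by MIDI note number modulo 12 (0 = C).
NOTE_NAMES = ["C", "D♭", "D", "E♭", "E", "F", "G♭", "G", "A♭", "A", "B♭", "B"]


def nearest_note(freq, a4=440.0):
    """Return (name, cents) for the equal-tempered note closest to freq.

    Assumes twelve-tone equal temperament with A4 = a4 Hz (MIDI note 69).
    """
    midi = 69 + 12 * math.log2(freq / a4)  # fractional MIDI note number
    nearest = round(midi)
    name = NOTE_NAMES[nearest % 12] + str(nearest // 12 - 1)
    cents = 100 * (midi - nearest)  # negative means flat of the named note
    return name, cents


peaks = [464, 928, 1392]  # the first three spectral peaks, in Hz
print([p / peaks[0] for p in peaks])  # [1.0, 2.0, 3.0]

name, cents = nearest_note(peaks[0])
print(name, round(cents))  # B♭4 -8
```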

To understand why we perceive pitch this way, it helps to look at the autocorrelation function (ACF).  If you are not familiar with the ACF, you might want to start with Chapter 5 of Think DSP.

Here is the ACF for this segment:

The highest peak is at lag 95, which corresponds to a frequency of 464 Hz. At least in this example, the pitch we perceive corresponds to the highest correlation in the ACF rather than the highest-amplitude component of the spectrum.
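A lag in the ACF converts to a frequency by dividing the sample rate by the lag. The post doesn't state the sample rate, so the sketch below assumes 44,100 samples per second, which is consistent with lag 95 mapping to about 464 Hz (44100 / 95 ≈ 464.2). Since the recording isn't reproduced here, it uses a synthetic stand-in: four harmonics of 464 Hz with made-up amplitudes, chosen so the fundamental is not the strongest component. This is a from-scratch sketch, not the notebook's code; the names `harmonic_tone`, `acf` and `estimate_pitch`, and the 300 to 2000 Hz search range, are hypothetical choices.

```python
import math

FRAMERATE = 44100  # assumed sample rate (samples per second)


def harmonic_tone(f0, amps, duration=0.5, framerate=FRAMERATE):
    """Sum of cosines at f0, 2*f0, 3*f0, ... with the given amplitudes."""
    n = int(duration * framerate)
    return [
        sum(a * math.cos(2 * math.pi * (k + 1) * f0 * i / framerate)
            for k, a in enumerate(amps))
        for i in range(n)
    ]


def acf(x, lag):
    """Unnormalized autocorrelation of x at a single lag (in samples)."""
    return sum(a * b for a, b in zip(x, x[lag:]))


def estimate_pitch(x, fmin=300, fmax=2000, framerate=FRAMERATE):
    """Return (lag, frequency) of the highest ACF value among the lags
    that correspond to pitches between fmin and fmax."""
    min_lag = int(framerate / fmax)  # short lags correspond to high pitches
    max_lag = int(framerate / fmin)  # long lags correspond to low pitches
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: acf(x, lag))
    return best_lag, framerate / best_lag


# Synthetic stand-in for the saxophone: the 928 Hz harmonic is strongest.
tone = harmonic_tone(464, [0.5, 1.0, 0.6, 0.4])
lag, freq = estimate_pitch(tone)
print(lag, round(freq, 1))  # 95 464.2
```

Restricting the search range keeps the estimator away from lag 0, where every signal correlates perfectly with itself, and from multiples of the true period such as lag 190, which score almost as high.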

Surprisingly, the perceived pitch doesn't change if we remove the fundamental completely. Here's what the spectrum looks like if we use a high-pass filter to clobber the fundamental.

The perceived pitch is still 464 Hz, even though there is no power at that frequency. This phenomenon is called the "missing fundamental". Again, you can hear it in the IPython notebook.
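One way to confirm that there is no power at 464 Hz is to measure the amplitude of a single DFT bin before and after the fundamental is gone. This sketch again uses a synthetic tone (assumed 44,100 Hz sample rate, made-up amplitudes) and, instead of applying a real high-pass filter, simply leaves the 464 Hz component out, which is what an ideal high-pass filter with a cutoff between 464 and 928 Hz would do. Half a second at 44,100 Hz is 22,050 samples, so every harmonic of 464 Hz completes a whole number of cycles and falls exactly on a DFT bin. `make_tone` and `amplitude_at` are hypothetical helpers.

```python
import math

FRAMERATE = 44100  # assumed sample rate (samples per second)
DURATION = 0.5     # seconds; 22,050 samples, a whole number of cycles of 464 Hz


def make_tone(components, duration=DURATION, framerate=FRAMERATE):
    """Sum of cosines; components is a list of (frequency_hz, amplitude)."""
    n = int(duration * framerate)
    return [sum(a * math.cos(2 * math.pi * f * i / framerate) for f, a in components)
            for i in range(n)]


def amplitude_at(x, freq, framerate=FRAMERATE):
    """Amplitude of the component at freq, computed from a single DFT bin.

    Exact only when freq * len(x) / framerate is a whole number (freq is on a bin).
    """
    re = sum(v * math.cos(2 * math.pi * freq * i / framerate) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * i / framerate) for i, v in enumerate(x))
    return 2 * math.hypot(re, im) / len(x)


harmonics = [(928, 1.0), (1392, 0.6), (1856, 0.4)]
full = make_tone([(464, 0.5)] + harmonics)
no_fund = make_tone(harmonics)  # stands in for an ideal high-pass filter

for name, x in (("full", full), ("no fundamental", no_fund)):
    print(name, [round(amplitude_at(x, f), 3) for f in (464, 928)])
# full [0.5, 1.0]
# no fundamental [0.0, 1.0]
```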

To understand why we hear a frequency that's not in the signal, it helps to look at the ACF again:

Removing the fundamental has little effect on the ACF.  The third peak, which corresponds to 464 Hz, is still the highest, and that's the pitch we perceive.
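The same effect shows up with a synthetic tone. This sketch (assumed 44,100 Hz sample rate, made-up amplitudes, hypothetical helper names) builds one tone with components at 464, 928, 1392 and 1856 Hz and another with the 464 Hz component left out. It then picks the highest ACF value over the lags corresponding to 300 to 2000 Hz. Both tones land on lag 95, because removing the fundamental does not change the period of the waveform.

```python
import math

FRAMERATE = 44100  # assumed sample rate (samples per second)


def make_tone(components, duration=0.5, framerate=FRAMERATE):
    """Sum of cosines; components is a list of (frequency_hz, amplitude)."""
    n = int(duration * framerate)
    return [sum(a * math.cos(2 * math.pi * f * i / framerate) for f, a in components)
            for i in range(n)]


def best_lag(x, fmin=300, fmax=2000, framerate=FRAMERATE):
    """Lag (in samples) with the highest unnormalized ACF value in the
    range corresponding to pitches between fmin and fmax."""
    lags = range(int(framerate / fmax), int(framerate / fmin) + 1)
    return max(lags, key=lambda lag: sum(a * b for a, b in zip(x, x[lag:])))


harmonics = [(928, 1.0), (1392, 0.6), (1856, 0.4)]
full = make_tone([(464, 0.5)] + harmonics)
missing = make_tone(harmonics)  # no energy at 464 Hz at all

for name, x in (("with fundamental", full), ("without fundamental", missing)):
    lag = best_lag(x)
    print(name, lag, round(FRAMERATE / lag, 1))
# with fundamental 95 464.2
# without fundamental 95 464.2
```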

There are two other peaks that are almost as high, corresponding to 1297 Hz and 722 Hz. So why don't we perceive either of those pitches instead of 464 Hz? The reason is that the higher components in the spectrum are harmonics of 464 Hz, but not of 722 or 1297 Hz. So our ear interprets these harmonics as evidence that the "right" fundamental is at 464 Hz.
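The harmonic argument can be made concrete with a toy check. For each candidate pitch, it counts how many of the remaining spectral peaks (928 and 1392 Hz) lie within 5% of the candidate frequency of a whole-number multiple of it. This is a simplified harmonic-matching sketch for illustration, not a model of what the auditory system actually does; the function name and the 5% tolerance are arbitrary assumptions.

```python
def harmonic_matches(f0, components, tol=0.05):
    """Count components within tol * f0 of some multiple n * f0 (n >= 1)."""
    count = 0
    for f in components:
        ratio = f / f0
        n = max(1, round(ratio))  # nearest harmonic number, at least 1
        if abs(ratio - n) < tol:
            count += 1
    return count


observed = [928, 1392]       # peaks left after removing the fundamental
for f0 in (464, 722, 1297):  # the three ACF candidates from the text
    print(f0, harmonic_matches(f0, observed))
# 464 2
# 722 0
# 1297 0
```

Only 464 Hz explains both peaks: for example, 928 / 722 ≈ 1.29 and 1392 / 1297 ≈ 1.07 are far from whole numbers.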

If we get rid of the high harmonics, the effect goes away.  Here's the spectrum with harmonics above 1200 Hz removed.

At this point there's pretty much only one component left, so it sounds like a sine wave at 928 Hz. And if we look at the ACF one more time:

The highest peak now corresponds to 938 Hz (not exactly 928 Hz, but within the resolution of the ACF).
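The ACF is evaluated only at whole-sample lags, so its frequency resolution is coarse at short lags. Assuming the same 44,100 Hz sample rate, a 928 Hz tone has a period of about 47.52 samples, which falls between lags 47 and 48; the ACF can only report one of the two neighboring frequencies. `lag_to_freq` is a hypothetical helper.

```python
FRAMERATE = 44100  # assumed sample rate (samples per second)


def lag_to_freq(lag, framerate=FRAMERATE):
    """Frequency (Hz) whose period is `lag` samples."""
    return framerate / lag


period = FRAMERATE / 928  # period of a 928 Hz tone, in samples
print(round(period, 2))  # 47.52
for lag in (47, 48):
    print(lag, round(lag_to_freq(lag), 2))
# 47 938.3
# 48 918.75
```

At these lags neighboring values are about 20 Hz apart, so 938 Hz and 928 Hz cannot be told apart. Longer lags (lower pitches) give finer steps: around lag 95 the step is under 5 Hz.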

In summary, these experiments suggest that pitch perception is not based entirely on spectral analysis, but is also informed by something like autocorrelation.  According to the Wikipedia page on the missing fundamental:

It is now widely accepted that the brain processes the information present in the overtones to calculate the fundamental frequency. The precise way in which it does so is still a matter of debate, but the processing seems to be based on an autocorrelation involving the timing of neural impulses in the auditory nerve.[5] However, it has long been noted that any neural mechanisms which may accomplish a delay (a necessary operation of a true autocorrelation) have not been found.[3] At least one model shows a temporal delay to be unnecessary to produce an autocorrelation model of pitch perception, appealing to phase shifts between cochlear filters;[6] however, earlier work has shown that certain sounds with a prominent peak in their autocorrelation function do not elicit a corresponding pitch percept,[7][8] and that certain sounds without a peak in their autocorrelation function nevertheless elicit a pitch.[9][10] Autocorrelation can thus be considered, at best, an incomplete model.

It sounds like there are opportunities for more research in this area.

Comment by Rick Lyons, January 30, 2016
Hi Allen. Out[14] still sounds like a beep to me. And Out[20] sounds much like a pure audio tone. But no matter.

Again, this 'missing fundamental' topic is interesting. For decades I thought I knew how telephones worked. But a few years ago a signal processing guy pointed out to me that in our U.S. telephone system all audio below roughly 350 Hz is filtered out of our telephone audio signal before transmission to a destination phone. Now I'm guessing that the fundamental spectral components of my speech signal are all below 350 Hz and are filtered out by the phone company. But at the destination phone the listener's ear/brain combination, somehow, "replaces" or "regenerates" the missing fundamental components making my speech sound normal. Interesting, huh?
Comment by DirkDoesDSP, January 31, 2016
Hi Rick,

If you have a fundamental and some successive harmonics, removing the fundamental does not change the period of the waveform. The period still is equal to the period of the fundamental whether the fundamental is present or not. It seems that the brain is recognizing periodicity, and people are interpreting that as pitch.

Take a step back and consider: if you removed the fundamental from a signal that contained successive harmonics, would you expect the resulting signal to have no pitch, or would you expect it to have a different pitch?

Comment by AllenDowney, January 30, 2016
Interesting. It's possible that you don't hear the missing fundamental -- apparently some people don't, especially musicians.
Comment by DirkDoesDSP, February 1, 2016
Hi Rick,

I sent this yesterday but it didn't show up. Here is an abbreviated version.

If you have a fundamental and some (>=2) successive harmonics, the pitch you perceive corresponds to the period of the waveform. If you remove the fundamental, the period remains unchanged. It seems that people interpret the period as the pitch.

Taking a step back, if the perceived pitch did not remain unchanged in the case described above, what pitch would you expect to hear? Or would you expect the remaining waveform to have no identifiable pitch?

Comment by bone, January 21, 2016
Very interesting! I have never thought about this.
Comment by Rick Lyons, January 27, 2016
This is an interesting topic. I tried to listen to the audio that had the fundamental saxophone tone removed, IPython signal 'Out[14]', but all I heard was a beep. Did I do something wrong?
Comment by AllenDowney, January 27, 2016
To me, Out[14] still sounds like a saxophone, and still has a perceived pitch around 464 Hz. When you get to Out[20] I expect it to sound like a beep at 928 Hz.

Let me know if that's not working. And just to confirm, we're talking about this link: http://nbviewer.jupyter.org/github/AllenDowney/ThinkDSP/blob/master/code/saxophone.ipynb ?
Comment by CarpetofStars, February 24, 2016
This discussion is extremely interesting as I am doing research in Artificial Intelligence in this area. It alerts me to some things that I missed, specifically, the missing fundamental. Hmmm.
Comment by CarpetofStars, February 24, 2016
I'm new here. How do I award you a beer?
Comment by stephaneb, February 24, 2016
The beer reward program is new too! Allen will have to enter his PayPal email address into his account for the beer button to appear just below the title of the blog.
