This blog's title is inspired by the Peabody Award-winning Radiolab episode "60 Words". Radiolab is well known for its insightful science stories and amazing sound design. Today's post is about decoding Radiolab's theme music (actually, just the small "Mmm Newewe" part of it, hereafter called the Radiolab sound). I have been taking an online course on Audio Signal Processing where we are taught how to analyze sounds and represent them in a compact form. Most of the topics should be quite easy to follow for readers of this blog. I strongly recommend the course, as well as Radiolab.
One very interesting way to analyze audio is to use what is called a Harmonic Plus Stochastic Model. The idea is to treat the audio as a sum of harmonics of a fundamental frequency. Of course, we need to figure out the fundamental frequency f0 first. The first step is to split the input signal into several frames and compute a Fourier transform of each frame. Next, we find the f0 of each frame using a technique called the Two-Way Mismatch algorithm. This simple but surprisingly effective algorithm finds the peaks in the spectrum and selects the fundamental frequency whose harmonics most closely match those peaks.
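To make that pipeline concrete, here is a toy sketch (my own simplification, not the sms-tools code from the course): frame the signal, pick spectral peaks, and score candidate f0 values by how closely their predicted harmonics land on the measured peaks. The real Two-Way Mismatch algorithm scores the mismatch in both directions (predicted-to-measured and measured-to-predicted) with frequency- and amplitude-dependent weights; this version keeps only one direction, and all sizes and thresholds are arbitrary choices for illustration.

```python
import numpy as np

def frame_signal(x, frame_size=2048, hop=512):
    """Split a signal into overlapping frames (the usual first step before the DFT)."""
    n_frames = 1 + (len(x) - frame_size) // hop
    return np.stack([x[i * hop : i * hop + frame_size] for i in range(n_frames)])

def estimate_f0(frame, fs, f0_min=100.0, f0_max=500.0, n_harmonics=5):
    """Toy harmonic-matching f0 estimator in the spirit of Two-Way Mismatch:
    score each candidate f0 by how close its predicted harmonics land to
    measured spectral peaks, and return the best-scoring candidate."""
    window = np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    # crude peak picking: local maxima above a relative threshold
    peaks = [i for i in range(1, len(spectrum) - 1)
             if spectrum[i] > spectrum[i - 1]
             and spectrum[i] > spectrum[i + 1]
             and spectrum[i] > 0.05 * spectrum.max()]
    peak_freqs = freqs[peaks]
    best_f0, best_err = None, np.inf
    for f0 in np.arange(f0_min, f0_max, 1.0):
        harmonics = f0 * np.arange(1, n_harmonics + 1)
        # mismatch: distance from each predicted harmonic to the nearest peak
        err = sum(np.min(np.abs(peak_freqs - h)) for h in harmonics)
        if err < best_err:
            best_f0, best_err = f0, err
    return best_f0

# sanity check on a synthetic harmonic tone at 220 Hz
fs = 44100
t = np.arange(fs) / fs
x = sum((1.0 / k) * np.sin(2 * np.pi * 220 * k * t) for k in range(1, 6))
frames = frame_signal(x)
f0 = estimate_f0(frames[0], fs)
print(f0)  # should land near 220
```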
The stochastic part is used to represent anything that cannot be captured by the harmonic analysis (things like nasal sounds, breathing sounds, etc.). To calculate it, we take the output of the harmonic analysis and subtract its spectrum from the spectrum of the original signal. We end up with the residual:
ResidualSpectrum = OriginalSpectrum - HarmonicSpectrum
Now, this residual need not be represented completely; that would not help with our compactness goal. We only need part of the magnitude spectrum, meaning we can apply some kind of low-pass filtering to reduce the representation size. As for the phase spectrum, more often than not the residual's phase looks like noise, so all we need at synthesis time is random phase drawn between 0 and 2π. Neat!
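A minimal sketch of that stochastic approximation (my own toy version, not the course's sms-tools implementation; the `stocf` decimation factor is a made-up name): store only a decimated magnitude envelope of the residual frame, then resynthesize by re-interpolating the envelope and attaching random phase.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_model(residual_frame, stocf=0.2):
    """Approximate a residual frame by a decimated magnitude envelope,
    then resynthesize it with random phase in [0, 2*pi)."""
    N = len(residual_frame)
    mag = np.abs(np.fft.rfft(residual_frame))
    # "low-pass" the magnitude spectrum by decimating it --
    # this small envelope is the compact representation we actually store
    n_coefs = max(2, int(stocf * len(mag)))
    envelope = np.interp(np.linspace(0, len(mag) - 1, n_coefs),
                         np.arange(len(mag)), mag)
    # resynthesis: interpolate the envelope back to full size, random phase
    mag_approx = np.interp(np.arange(len(mag)),
                           np.linspace(0, len(mag) - 1, n_coefs), envelope)
    phase = rng.uniform(0, 2 * np.pi, len(mag))
    synth = np.fft.irfft(mag_approx * np.exp(1j * phase), N)
    return envelope, synth

# stand-in residual: white noise for illustration
frame = rng.standard_normal(512)
env, synth = stochastic_model(frame)
print(len(env), len(synth))  # 51 512 -- compact envelope vs. full frame
```

With 512-sample frames, a `stocf` of 0.2 shrinks the 257-bin magnitude spectrum to a 51-point envelope, and the phase costs nothing to store at all.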
On to more fun stuff now. I fired up the analysis tool we have been using for the course. It turns out we can make a qualitatively very accurate representation of the Radiolab sound with just 20 harmonics. The first plot shows our Radiolab sound in the time domain. The second plot shows its spectrogram with the 20 harmonics overlaid, and the third plot shows our synthesis of the Radiolab sound. Even without the stochastic part, we get a good-quality Mmm Newewe sound.
Cue theme music (pun intended, obviously). What do we need to represent the Radiolab sound? Just 60 numbers: 20 for the frequencies, 20 for the magnitudes and 20 for the phases. OK, I cheated a bit; we need 60 numbers for each frame of the signal. Looks like the folks at WNYC like complicated stuff. They have modulated the harmonics with a signal whose frequency is the same for all harmonics, but whose amplitude swings are larger for the higher-order harmonics. At first glance this modulating signal looks like a single-frequency sinusoid, but it is really more of a chirp. I tried to find this chirp manually and came up with the following Python approximation for the fundamental frequency. Here is the synthesized version of the Radiolab sound.
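The "60 numbers per frame" idea is just additive synthesis, which can be sketched like this (the frequency, magnitude and phase values below are made up for illustration; they are not the actual analysis data for the Radiolab sound):

```python
import numpy as np

def synth_frame(freqs, mags, phases, fs=44100, n=512):
    """Additive synthesis of one frame from 20 (frequency, magnitude,
    phase) triples -- 60 numbers per frame."""
    t = np.arange(n) / fs
    frame = np.zeros(n)
    for f, m, p in zip(freqs, mags, phases):
        frame += m * np.cos(2 * np.pi * f * t + p)
    return frame

f0 = 200.0                     # illustrative fundamental in Hz
freqs = f0 * np.arange(1, 21)  # 20 harmonics of f0
mags = 1.0 / np.arange(1, 21)  # made-up decaying magnitudes
phases = np.zeros(20)          # made-up phases
frame = synth_frame(freqs, mags, phases)
print(frame.shape)  # (512,)
```

In a full resynthesis you would generate one such frame per analysis frame (with the 60 numbers varying frame to frame) and overlap-add the results.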
import numpy as np
from scipy.signal import chirp

# 278 is the number of frames in the signal (each frame is 512 samples long)
# 44100 Hz is the sampling frequency of the signal
xaxis = np.arange(278) * 512 / 44100.0
fitted = 200 + 37.5 * chirp(xaxis, f0=1.8/3.0, t1=xaxis[-1],
                            f1=2.8/3.0, phi=-180, method='linear')
I guess one other way of looking at the sinusoidal chirp is to think of it as frequency modulation. I am sure there are better ways of doing this analysis, and what better forum to get those ideas from than DSPRelated.com.
A second (possibly dumb) question, Mahadevan: I took a quick look at the Two-Way Mismatch algorithm PDF file. The terms "partial", "partials", and "partial frequency" are used throughout that paper. Can you tell me what a "partial" and a "partial frequency" are? Thanks.
Rookie mistake on the links :). Fixed them now. I also have a synthesized version of the sound just above the Python code.
I am going to borrow a lot from the Audio Signal Processing course to explain a partial. If you take a pure sinusoid, its magnitude spectrum will have just two peaks, with a shape determined by the window function being used. On the other hand, if you take a real-life signal, much like the one we are analyzing here, there will be several components of the sound that can each be represented like a sinusoid, along with components that are not strictly sinusoidal (I like to think of the latter as peaks whose shape doesn't match the window function's magnitude spectrum). Each of the sinusoid-like components is called a partial. Another way to think of a partial is to imagine it as a sinusoid whose frequency varies slowly with time; that frequency is the partial frequency. Here is the video link where I learned this concept: https://class.coursera.org/audio-002/lecture/83 You can start around 4:30 for the definitions.