
Spectrogram of Speech

Figure 8.10: Classic spectrogram of a speech sample.

An example spectrogram for recorded speech data is shown in Fig. 8.10. It was generated using the Matlab code displayed in Fig. 8.11. The function spectrogram, listed in §I.5, computes the spectrogram as a sequence of FFTs of windowed data segments and plots the result using imagesc.
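As a rough illustration of "a sequence of FFTs of windowed data segments," the following Matlab/Octave sketch builds a spectrogram matrix by hand. It is not the spectrogram function of §I.5. The synthetic 440 Hz test tone, the 8 kHz sampling rate, the hop size of M/8 samples, and the 60 dB display range are all assumptions made for the example.

```matlab
% Minimal STFT sketch: one FFT per windowed, zero-padded frame.
fs = 8000;                          % assumed sampling rate (Hz)
t  = (0:fs-1)'/fs;                  % one second of sample times
x  = cos(2*pi*440*t);               % synthetic tone standing in for speech

M = round(0.02*fs);                 % 20 ms window length in samples
N = 2^nextpow2(4*M);                % FFT size (zero padding for interpolation)
R = round(M/8);                     % hop size between frames (assumed)
w = 0.54 - 0.46*cos(2*pi*(0:M-1)'/(M-1));  % Hamming window, column vector

nframes = 1 + floor((length(x)-M)/R);
X = zeros(N/2+1, nframes);          % keep nonnegative frequencies only
for m = 1:nframes
  seg = x((m-1)*R + (1:M)) .* w;    % windowed data segment
  Xm  = fft(seg, N);                % fft zero-pads seg to length N
  X(:,m) = Xm(1:N/2+1);
end

SdB = 20*log10(abs(X) + eps);       % magnitude in dB
SdB = max(SdB, max(SdB(:)) - 60);   % clip to a 60 dB range (assumed)
f   = (0:N/2)'*fs/N;                % bin frequencies in Hz

[~, kpeak] = max(mean(abs(X), 2));  % strongest bin, averaged over frames
fpeak = f(kpeak);                   % should lie within one bin of 440 Hz
```

Each column of X is the spectrum of one frame, so an image of SdB with time along the horizontal axis and f along the vertical axis gives a picture of the same kind as Fig. 8.10.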

Figure 8.11: Matlab for computing a speech spectrogram.

 
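% audioread replaces wavread, which newer Matlab releases no longer provide
[y,fs] = audioread('SpeechSample.wav');
soundsc(y,fs); % Let's hear it
% for classic look (dark = high energy), invert the gray colormap:
colormap('gray'); map = colormap; imap = flipud(map);
M = round(0.02*fs);  % 20 ms window is typical
N = 2^nextpow2(4*M); % zero padding for interpolation
w = 0.54 - 0.46 * cos(2*pi*(0:M-1)/(M-1)); % w = hamming(M);
colormap(imap); % Octave wants it here
% spectrogram is the function listed in Section I.5; Matlab's built-in
% function of the same name takes its arguments in a different order
spectrogram(y,N,fs,w,-M/8,1,60);
colormap(imap); % Matlab wants it here
title('Hi - This is <you-know-who> ');
ylim([0,(fs/2)/1000]); % don't plot neg. frequencies (axis in kHz)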

In this example, the Hamming window length was chosen to be 20 ms, as is typical in speech analysis. This is short enough that any single 20 ms frame will typically contain data from only one phoneme, yet long enough to include at least two periods of the fundamental frequency during voiced speech, assuming the lowest voiced pitch to be around 100 Hz (a period of 10 ms).
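The window arithmetic can be checked with a few lines of Matlab/Octave. This is only a sketch: the 16 kHz sampling rate is an assumption (the rate of SpeechSample.wav is not stated), and the variable names f0, periods, binspacing, and zpfactor are introduced here for illustration.

```matlab
fs = 16000;              % assumed sampling rate (Hz); substitute your file's rate
f0 = 100;                % lowest voiced pitch assumed in the text (Hz)

M = round(0.02*fs);      % 20 ms window: 320 samples
N = 2^nextpow2(4*M);     % next power of 2 at or above 4*M = 1280: 2048
periods    = f0*M/fs;    % pitch periods spanned by the window: 2
binspacing = fs/N;       % spacing of FFT bins in Hz: 7.8125
zpfactor   = N/M;        % zero-padding (interpolation) factor: 6.4
```

Because N is rounded up to a power of two, the zero-padding factor is at least 4 and generally somewhat larger, as it is here.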

More generally, for speech and the singing voice (and any periodic tone), the STFT analysis parameters are chosen to trade off among the following conflicting criteria:

  1. The harmonics should be resolved.
  2. Pitch and formant variations should be closely followed.

Resolving the harmonics favors a longer window, while following rapid variations favors a shorter one.

The formants in speech are the resonances in the vocal tract. They appear as dark groups of harmonics in Fig. 8.10. The first two formants largely determine the "vowel" in voiced speech. In telephone speech, nominally between 200 and 3200 Hz, only three or four formants are usually present in the band.
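The conflict between the two criteria can be made concrete with a small Matlab/Octave sketch. It assumes a 100 Hz fundamental (the lowest voiced pitch mentioned earlier) and a hypothetical set of candidate window durations. The nominal frequency resolution is taken as 1/T, which is correct only up to a window-dependent constant factor.

```matlab
f0 = 100;                         % assumed fundamental frequency (Hz)
T  = [0.005 0.010 0.020 0.040];   % candidate window durations (s), hypothetical
df = 1 ./ T;                      % nominal frequency resolution (Hz), up to a
                                  % window-dependent constant factor
periods = f0 * T;                 % pitch periods spanned by each window

for k = 1:length(T)
  fprintf('%4.0f ms window: about %3.0f Hz resolution, %.1f pitch periods\n', ...
          1000*T(k), df(k), periods(k));
end
```

Doubling the window duration halves the nominal resolution bandwidth, which helps separate harmonics spaced f0 apart, but it also doubles the time span over which pitch and formant changes are averaged.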



written by Julius Orion Smith III
Julius Smith's background is in electrical engineering (BS Rice 1975, PhD Stanford 1983). He is presently Professor of Music and Associate Professor (by courtesy) of Electrical Engineering at Stanford's Center for Computer Research in Music and Acoustics (CCRMA), teaching courses and pursuing research related to signal processing applied to music and audio systems. See http://ccrma.stanford.edu/~jos/ for details.

