Spectrogram of Speech
An example spectrogram for recorded speech data is shown in Fig. 8.10. It was generated using the Matlab code displayed in Fig. 8.11. The function spectrogram is listed in §I.5. It computes the spectrogram as a sequence of FFTs of windowed data segments and plots the result using imagesc.
    [y,fs,bits] = wavread('SpeechSample.wav');
    soundsc(y,fs); % Let's hear it
    % for classic look:
    colormap('gray'); map = colormap; imap = flipud(map);
    M = round(0.02*fs);  % 20 ms window is typical
    N = 2^nextpow2(4*M); % zero padding for interpolation
    w = 0.54 - 0.46 * cos(2*pi*[0:M-1]/(M-1)); % w = hamming(M);
    colormap(imap); % Octave wants it here
    spectrogram(y,N,fs,w,-M/8,1,60);
    colormap(imap); % Matlab wants it here
    title('Hi - This is <you-know-who> ');
    ylim([0,(fs/2)/1000]); % don't plot neg. frequencies
In this example, the Hamming window length was chosen to be 20 ms, as is typical in speech analysis. This is short enough that any single 20 ms frame will typically contain data from only one phoneme, yet long enough that it will include at least two periods of the fundamental frequency during voiced speech, assuming the lowest voiced pitch to be around 100 Hz.
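The arithmetic behind these choices can be checked with a short sketch. The sampling rate fs = 8000 Hz below is an assumption made only for illustration (the text does not state the rate of SpeechSample.wav); the expressions for M and N are the ones used in the listing above.

```matlab
fs    = 8000;            % ASSUMED sampling rate in Hz (not given in the text)
f0min = 100;             % lowest voiced pitch assumed in the text, in Hz

M = round(0.02*fs);      % 20 ms window length in samples
periods = M*f0min/fs;    % pitch periods spanned by the window at f0min
N = 2^nextpow2(4*M);     % zero-padded FFT size, as in the listing
binHz = fs/N;            % spacing of the interpolated FFT bins in Hz

fprintf('M = %d samples, %g periods at %d Hz\n', M, periods, f0min);
fprintf('N = %d, bin spacing = %g Hz\n', N, binHz);
```

Under this assumed rate, the window spans exactly two periods of a 100 Hz fundamental, which is the lower bound mentioned above. Zero padding to N makes the displayed bins denser but does not narrow the window's main lobe, so it interpolates the spectrum without improving resolution.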
More generally, for speech and the singing voice (and any periodic tone), the STFT analysis parameters are chosen to trade off among the following conflicting criteria:
- The harmonics should be resolved.
- Pitch and formant variations should be closely followed.
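The tension between these two criteria can be made concrete with a rough sketch. It assumes a sampling rate of 8000 Hz and a set of example pitches (both hypothetical), and it uses the common rule of thumb that the Hamming window's main lobe is about four bins wide, i.e., roughly 4*fs/M Hz from null to null. Harmonics are counted as resolved here only when this width does not exceed the harmonic spacing f0, which is a fairly strict criterion; looser ones are also used in practice.

```matlab
fs    = 8000;            % ASSUMED sampling rate in Hz (hypothetical)
f0    = [100 200 400];   % example pitches in Hz (hypothetical)
winMs = [10 20 40];      % candidate window durations in ms

mainLobeHz = zeros(size(winMs));
resolved   = false(numel(winMs), numel(f0));
for k = 1:numel(winMs)
  M = round(winMs(k)*fs/1000);        % window length in samples
  mainLobeHz(k) = 4*fs/M;             % approx. null-to-null main-lobe width
  resolved(k,:) = mainLobeHz(k) <= f0; % harmonics spaced f0 apart resolved?
  fprintf('%2d ms: main lobe ~%g Hz, resolves f0 = [%s] Hz\n', ...
          winMs(k), mainLobeHz(k), num2str(f0(resolved(k,:))));
end
```

Longer windows resolve the harmonics of lower pitches, but each frame then averages over a longer stretch of time, so rapid pitch and formant changes are smeared. Shorter windows follow those changes more closely at the cost of merging adjacent harmonics.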
Hann-Windowed Complex Sinusoid