Formant Filtering Example

In speech synthesis [27,39], digital filters are often used to simulate formant filtering by the vocal tract. It is well known [23] that the different vowel sounds of speech can be simulated by passing a ``buzz source'' through only two or three formant filters. As a result, speech is fully intelligible through the telephone bandwidth (nominally only 200-3200 Hz).

A formant is a resonance in the voice spectrum. A single formant may thus be modeled using one biquad (second-order filter section). For example, in the vowel $ [a]$ as in ``father,'' the first three formant center frequencies have been measured near 700, 1220, and 2600 Hz, with half-power bandwidths of 130, 70, and 160 Hz [40].
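
As a minimal sketch of the one-biquad model, the following computes a two-pole denominator for a single resonance from its center frequency and bandwidth. The pole-radius mapping R = exp(-pi*BW/fs) is the one used in the listing later in this section; the variable names, the unit numerator, and the 1 Hz evaluation grid are assumptions made for illustration.

```matlab
% Sketch: one formant as a two-pole resonator (names are illustrative).
fs = 8192;            % sampling rate (Hz)
Fc = 700;             % formant center frequency (Hz)
BW = 130;             % half-power bandwidth (Hz)

R     = exp(-pi*BW/fs);   % pole radius from bandwidth
theta = 2*pi*Fc/fs;       % pole angle from center frequency

% Denominator of 1/((1 - p z^-1)(1 - conj(p) z^-1)), p = R*exp(1j*theta):
A1 = [1, -2*R*cos(theta), R^2];
B1 = 1;                   % unit numerator (gain left unspecified)

% Evaluate the amplitude response on a 1 Hz grid and locate the peak:
f  = 0:1:fs/2;
z1 = exp(-1j*2*pi*f/fs);              % z^-1 on the unit circle
H1 = B1 ./ (A1(1) + A1(2)*z1 + A1(3)*z1.^2);
[~, ipk] = max(abs(H1));
fpeak = f(ipk);                       % close to Fc for a narrow resonance
```

The peak lands slightly below the nominal center frequency because the conjugate pole also contributes to the response; the offset shrinks as the bandwidth narrows.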

In principle, the formant filter sections are in series, as can be found by deriving the transfer function of an acoustic tube [48]. The vocal-tract transfer function is therefore an all-pole filter (provided that the nasal tract is closed off or negligible). As a result, there is no need to specify gains for the formant resonators: center frequency and bandwidth are sufficient to specify each formant, leaving only an overall scale factor unspecified in a cascade (series) formant filter bank.
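
To illustrate why a cascade needs only center frequencies and bandwidths, the sketch below multiplies three two-pole denominators with conv to obtain the all-pole denominator. The formant values are those quoted above for $ [a]$; the sampling rate and the choice of unit gain at dc for the scale factor are assumptions chosen for illustration.

```matlab
% Sketch: series (cascade) formant filter as a product of two-pole sections.
F  = [700, 1220, 2600];   % formant center frequencies (Hz)
BW = [130,   70,  160];   % formant bandwidths (Hz)
fs = 8192;                % sampling rate (Hz), assumed

R     = exp(-pi*BW/fs);   % pole radii
theta = 2*pi*F/fs;        % pole angles

A = 1;                    % running denominator polynomial in z^-1
for k = 1:length(F)
    Ak = [1, -2*R(k)*cos(theta(k)), R(k)^2];  % one two-pole section
    A  = conv(A, Ak);     % series connection multiplies denominators
end

% The only remaining free parameter is an overall scale factor.
% One possible choice (an assumption here) is unit gain at dc:
% A(z) evaluated at z = 1 is the sum of its coefficients.
g = sum(A);
% Transfer function of the cascade: H(z) = g / A(z)
```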

Numerically, however, it makes more sense to implement disjoint resonances in parallel rather than in series. This is because when one formant filter is resonating, the others will be attenuating, so that to achieve a particular peak gain at resonance, the resonating filter must overcome all of the combined attenuations in addition to applying its own gain. In fixed-point arithmetic, this can result in large quantization-noise gains, especially for the last resonator in the chain. As a result of these considerations, our example will implement the formant sections in parallel. This means we must find the appropriate biquad numerators so that, when the sections are added together, the overall transfer-function numerator is a constant. This will be accomplished using the partial fraction expansion (§6.8).
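
The partial fraction expansion yields complex one-pole terms, and each complex-conjugate pair can be combined into one real biquad. As a worked step, let $ r$ and $ p$ denote the residue and pole of one term (symbols introduced here for illustration), and assume its partner has the conjugate residue and pole:

```latex
\frac{r}{1 - p\,z^{-1}} + \frac{\overline{r}}{1 - \overline{p}\,z^{-1}}
  = \frac{2\,\mathrm{Re}\{r\} - 2\,\mathrm{Re}\{r\,\overline{p}\}\,z^{-1}}
         {1 - 2\,\mathrm{Re}\{p\}\,z^{-1} + |p|^{2}\,z^{-2}}
```

Each parallel section therefore has a first-order numerator whose coefficients depend on the residue, and hence on all of the poles, while the sum of the sections reproduces the constant numerator of the all-pole model.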

The matlab below illustrates the construction of a parallel formant filter bank for simulating the vowel $ [a]$. For completeness, it is used to filter a bandlimited impulse train, in order to synthesize the vowel sound.

F =  [700, 1220, 2600]; % Formant frequencies (Hz)
BW = [130,  70,  160];  % Formant bandwidths (Hz)
fs = 8192;              % Sampling rate (Hz)

nsecs = length(F);
R = exp(-pi*BW/fs);     % Pole radii
theta = 2*pi*F/fs;      % Pole angles
poles = R .* exp(j*theta); % Complex poles
B = 1;  A = real(poly([poles,conj(poles)]));
% freqz(B,A); % View frequency response:

% Convert to parallel complex one-poles (PFE):
[r,p,f] = residuez(B,A);
As = zeros(nsecs,3);
Bs = zeros(nsecs,3);
% complex-conjugate pairs are adjacent in r and p:
for i=1:2:2*nsecs
    k = 1+(i-1)/2;
    Bs(k,:) = [r(i)+r(i+1),  -(r(i)*p(i+1)+r(i+1)*p(i)), 0];
    As(k,:) = [1, -(p(i)+p(i+1)), p(i)*p(i+1)];
end
sos = [Bs,As]; % standard second-order-section form
iperr = norm(imag(sos))/norm(sos); % make sure sos is ~real
disp(sprintf('||imag(sos)||/||sos|| = %g',iperr)); % 1.6e-16
sos = real(sos) % and make it exactly real

% Reconstruct original numerator and denominator as a check:
[Bh,Ah] = psos2tf(sos); % parallel sos to transfer function
% psos2tf appears in the matlab-utilities appendix
disp(sprintf('||A-Ah|| = %g',norm(A-Ah))); % 5.77423e-15
% Bh has trailing epsilons, so we'll zero-pad B:
disp(sprintf('||B-Bh|| = %g',...
             norm([B,zeros(1,length(Bh)-length(B))] - Bh)));
% 1.25116e-15

% Plot overlay and sum of all three
% resonator amplitude responses:
nfft=512;
H = zeros(nsecs+1,nfft);
for i=1:nsecs
  % nfft points on [0,fs/2), with the frequency axis f in Hz:
  [Hiw,f] = freqz(Bs(i,:),As(i,:),nfft,fs);
  H(1+i,:) = Hiw(:).';
end
H(1,:) = sum(H(2:nsecs+1,:));
ttl = 'Amplitude Response';
xlab = 'Frequency (Hz)';
ylab = 'Magnitude (dB)';
sym = '';
lgnd = {'sum','sec 1','sec 2', 'sec 3'};
figure(1); clf;
% myplot is a plotting utility, not a built-in function:
myplot(f,20*log10(abs(H)),sym,ttl,xlab,ylab,1,lgnd);
disp('PAUSING'); pause;
saveplot('../eps/lpcexovl.eps');

% Now synthesize the vowel [a]:
nsamps = 256;
f0 = 200; % Pitch in Hz
w0T = 2*pi*f0/fs; % radians per sample

nharm = floor((fs/2)/f0); % number of harmonics
sig = zeros(1,nsamps);
n = 0:(nsamps-1);
% Synthesize bandlimited impulse train
for i=1:nharm,
    sig = sig + cos(i*w0T*n);
end;
sig = sig/max(sig);
speech = filter(1,A,sig);
soundsc([sig,speech]); % hear buzz, then 'ah'
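
The listing above synthesizes the vowel with the series denominator A. As a cross-check that the parallel bank is equivalent, the following self-contained sketch builds the three parallel sections, filters a test signal through each, and sums the outputs. The closed-form residue expression (valid for distinct poles and a unit numerator) is used here only to avoid depending on residuez; it is a substitution made for this illustration, not part of the original listing, and the impulse test input is likewise an assumption.

```matlab
% Sketch: parallel formant bank versus series all-pole filter.
F  = [700, 1220, 2600];  BW = [130, 70, 160];  fs = 8192;
p  = exp(-pi*BW/fs) .* exp(1j*2*pi*F/fs);  % upper-half-plane poles
pa = [p, conj(p)];                         % all six poles
A  = real(poly(pa));                       % series denominator

nsecs = length(F);
Bs = zeros(nsecs,2);  As = zeros(nsecs,3);
for k = 1:nsecs
    others = pa([1:k-1, k+1:end]);         % every pole except p(k)
    r = 1 / prod(1 - others/p(k));         % residue of 1/A(z) at p(k)
    % Combine the term at p(k) with its conjugate partner:
    Bs(k,:) = [2*real(r), -2*real(r*conj(p(k)))];
    As(k,:) = [1, -2*real(p(k)), abs(p(k))^2];
end

% Test input: unit impulse (any signal would do)
x = [1, zeros(1,255)];
yser = filter(1, A, x);                    % series (direct-form) output
ypar = zeros(size(x));
for k = 1:nsecs
    ypar = ypar + filter(Bs(k,:), As(k,:), x);  % sum of section outputs
end
relerr = norm(yser - ypar) / norm(yser);   % should be near round-off
```

In floating point the two outputs should agree to within round-off, so the choice between series and parallel forms matters mainly for fixed-point implementations, as discussed earlier in this section.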

Notes:

  • The sampling rate was chosen to be $ f_s = 8192$ Hz because that is the default Matlab sampling rate, and because that is a typical value used for ``telephone quality'' speech synthesis.
  • The psos2tf utility is listed in §J.7.
  • The overlay of the section amplitude responses is shown in Fig. 9.6.
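
Since psos2tf is not a built-in function, a minimal stand-in is sketched below for readers who want to run the reconstruction check without the appendix. It is not the listing from §J.7; the example section values and all variable names are assumptions. It places the parallel sections over a common denominator and sums the resulting numerators.

```matlab
% Sketch: convert parallel second-order sections to one transfer function.
% Each row of sos is [b0 b1 b2 a0 a1 a2] (numerator, then denominator).
sos = [ 1.0, 0.5, 0,  1, -1.2, 0.72;    % arbitrary example values
       -0.5, 0.2, 0,  1,  0.6, 0.25];
nsecs = size(sos,1);
Bh = 0;  Ah = 1;                 % running sum Bh/Ah, starting from 0/1
for k = 1:nsecs
    Bk = sos(k,1:3);  Ak = sos(k,4:6);
    % Bh/Ah + Bk/Ak = (Bh*Ak + Bk*Ah) / (Ah*Ak), as polynomials in z^-1
    Bh = conv(Bh, Ak) + conv(Bk, Ah);
    Ah = conv(Ah, Ak);
end
```

Because every section here is stored with three numerator coefficients, Bh generally ends in zeros or round-off-sized values, which is consistent with the zero-padding of B in the listing above.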

Figure 9.6: Overlay of section amplitude responses and their sum.
