Sign in

username:

password:



Not a member?

Search Online Books



Search tips

Free Online Books

Sponsor

NEW! TMS320C6474: 3x the performance. 1/3 the cost. Three 1 GHz cores on 1 chip.

Chapters

Chapter Contents:

Search Physical Audio Signal Processing

  

Book Index | Global Index


Would you like to be notified by email when Julius Orion Smith III publishes a new entry into his blog?

  

Formant Synthesis Models

A formant synthesizer is a source-filter model in which the source models the glottal pulse train and the filter models the formant resonances of the vocal tract. Constrained linear prediction can be used to estimate the parameters of formant synthesis models, but more generally, formant peak parameters may be estimated directly from the short-time spectrum (e.g., [264]). The filter in a formant synthesizer is typically implemented using cascade or parallel second-order filter sections, one per formant. Most modern rule-based text-to-speech systems descended from software based on this type of synthesis model [264,265,266].

Another type of formant-synthesis method, developed specifically for singing-voice synthesis is called the FOF method [392]. It can be considered an extension of the VOSIM voice synthesis algorithm [222]. In the FOF method, the formant filters are implemented in the time domain as parallel second-order sections; thus, the vocal-tract impulse response is modeled a sum of three or so exponentially decaying sinusoids. Instead of driving this filter with a glottal pulse wave, a simple impulse is used, thereby greatly reducing computational cost. A convolution of an impulse response with an impulse train is simply a periodic superposition of the impulse response. In the VOSIM algorithm, the impulse response was trimmed to one period in length, thereby avoiding overlap and further reducing computations.

The FOF method also tapers the beginning of the impulse-response using a rising half-cycle of a sinusoid. This qualitatively reduces the ``buzziness'' of the sound, and compensates for having replaced the glottal pulse with an impulse. In practice, however, the synthetic signal is matched to the desired signal in the frequency domain, and the details of the onset taper are adjusted to optimize audio quality more generally, including to broaden the formant resonances.

One of the difficulties of formant synthesis methods is that formant parameter estimation is not always easy