Text-to-Speech Synthesis
Text-to-Speech Synthesis provides a complete, end-to-end account of the process of generating speech by computer. Giving an in-depth explanation of all aspects of current speech synthesis technology, it assumes no specialized prior knowledge. Introductory chapters on linguistics, phonetics, signal processing and speech signals lay the foundation, with subsequent material explaining how this knowledge is put to use in building practical systems that generate speech. Alongside coverage of the very latest techniques such as unit selection, hidden Markov model synthesis, and statistical text analysis, it also explains more traditional techniques such as formant synthesis and synthesis by rule. Weaving together the various strands of this multidisciplinary field, the book is designed for graduate students in electrical engineering, computer science, and linguistics. It is also an ideal reference for practitioners in the fields of human communication interaction and telephony.
Why Read This Book
You will get a single, coherent, end-to-end account of how text is turned into intelligible, natural-sounding speech, from linguistics and phonetics through DSP and modern statistical synthesis. The book balances practical system-building guidance with clear explanations of unit-selection, HMM/statistical approaches, prosody, and classic methods so you can both understand and implement real TTS systems.
Who Will Benefit
Engineers, graduate students, and researchers who want to design or evaluate TTS systems — from implementation-focused developers to speech scientists seeking a comprehensive synthesis reference.
Level: Intermediate. Prerequisites: undergraduate-level math (calculus, linear algebra) and basic signal-processing concepts; familiarity with programming (e.g., scripting or C/C++) is helpful. No prior specialist knowledge of linguistics or TTS is required.
Key Takeaways
- Understand the linguistic and phonetic foundations required to convert text to speech.
- Describe and implement classic synthesis methods (formant and rule-based) as well as concatenative/unit-selection systems.
- Apply DSP techniques for speech analysis and waveform manipulation including spectral analysis, filtering, and vocoding.
- Build and evaluate HMM-based and statistical acoustic models, and the waveform-generation stages they drive, for modern TTS.
- Design and tune prosody, intonation, and duration models to improve naturalness and intelligibility.
- Assess and choose corpora, tools, and evaluation methods for system development and research.
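The concatenative/unit-selection takeaway above rests on one core idea: pick one candidate unit per target position so that the summed target costs plus join costs are minimal, via dynamic programming over a candidate lattice. The sketch below is a minimal illustration of that search; the cost functions, pitch-valued "units," and all numbers are toy assumptions for clarity, not the book's own formulation.

```python
# Illustrative unit-selection search: choose one candidate unit per target
# position, minimizing summed target costs plus join costs between
# consecutive units (dynamic programming over a small candidate lattice).

def select_units(candidates, target_cost, join_cost):
    """candidates: list of lists; candidates[i] are the options for position i."""
    # best[i][j] = minimal cost of any path ending in candidates[i][j]
    best = [[target_cost(0, u) for u in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for i in range(1, len(candidates)):
        row, ptr = [], []
        for u in candidates[i]:
            costs = [best[i - 1][k] + join_cost(prev, u)
                     for k, prev in enumerate(candidates[i - 1])]
            k = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k] + target_cost(i, u))
            ptr.append(k)
        best.append(row)
        back.append(ptr)
    # Trace back the cheapest path through the lattice.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for i in range(len(candidates) - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)]

# Toy example: units are pitch values; the target cost prefers a falling
# contour, and the join cost penalizes pitch jumps between adjacent units.
targets = [120, 115, 110]
cands = [[118, 130], [112, 125], [108, 121]]
chosen = select_units(
    cands,
    target_cost=lambda i, u: abs(u - targets[i]),
    join_cost=lambda a, b: 0.5 * abs(a - b),
)
print(chosen)  # [118, 112, 108]
```

Real systems score candidates on spectral, prosodic, and contextual features rather than a single pitch value, and search much larger lattices, but the two-cost structure is the same.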
Topics Covered
- 1. Introduction: scope and history of speech synthesis
- 2. Speech production and phonetics
- 3. Acoustic description of speech and signal-processing basics
- 4. Speech analysis techniques: spectrograms, LPC, FFT
- 5. Front-end text processing: tokenization, normalization, lexical lookup
- 6. Prosody: intonation, stress, and duration modeling
- 7. Formant and rule-based synthesis
- 8. Concatenative synthesis and unit selection
- 9. Statistical and HMM-based synthesis
- 10. Vocoders, waveform generation and waveform modification
- 11. Corpora, evaluation, and perceptual testing
- 12. Practical system design, toolkits and implementation issues
- 13. Future directions and research challenges
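As a taste of the speech-analysis material in topic 4, the sketch below computes linear-prediction (LPC) coefficients with the autocorrelation method and the Levinson-Durbin recursion, then recovers the coefficients of a toy AR(2) filter from its own impulse response. It is a minimal pure-Python illustration under stated assumptions (the filter coefficients 0.9 and -0.2 and the 200-sample window are invented for the example), not the book's implementation.

```python
# LPC analysis sketch: autocorrelation method + Levinson-Durbin recursion.
# Convention: the predictor is x[n] ~= sum_j a[j] * x[n-j], j = 1..order.

def autocorr(x, order):
    """Short-term autocorrelation r[0..order] of a windowed signal."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for LPC coefficients a[1..order]."""
    a = [0.0] * (order + 1)
    e = r[0]                      # prediction-error energy
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / e               # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        e *= (1.0 - k * k)
    return a[1:], e

# Toy signal: impulse response of x[n] = 0.9*x[n-1] - 0.2*x[n-2].
x = [0.0] * 200
x[0] = 1.0
for n in range(1, len(x)):
    x[n] = 0.9 * x[n - 1] - (0.2 * x[n - 2] if n >= 2 else 0.0)

coeffs, err = levinson_durbin(autocorr(x, 2), 2)
print(coeffs)  # approximately [0.9, -0.2]
```

In a TTS front end or vocoder, the same analysis runs on short overlapping windows of real speech (typically order 10-20 at 8-16 kHz) to model the vocal-tract filter.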
How It Compares
More focused on end-to-end synthesis than Jurafsky & Martin's broader NLP textbook coverage, and a complement to Rabiner & Juang's work on recognition; Taylor goes deeper into practical TTS techniques such as unit selection and HMM-based synthesis.