Text-to-Speech Synthesis

Taylor, Paul 2009

Text-to-Speech Synthesis provides a complete, end-to-end account of the process of generating speech by computer. Giving an in-depth explanation of all aspects of current speech synthesis technology, it assumes no specialized prior knowledge. Introductory chapters on linguistics, phonetics, signal processing and speech signals lay the foundation, with subsequent material explaining how this knowledge is put to use in building practical systems that generate speech. The book covers the very latest techniques, such as unit selection, hidden Markov model synthesis, and statistical text analysis, and also explains more traditional techniques such as formant synthesis and synthesis by rule. Weaving together the various strands of this multidisciplinary field, the book is designed for graduate students in electrical engineering, computer science, and linguistics. It is also an ideal reference for practitioners in the fields of human communication interaction and telephony.


Why Read This Book

You will get a single, coherent, end-to-end account of how text is turned into intelligible, natural-sounding speech, from linguistics and phonetics through DSP and modern statistical synthesis. The book balances practical system-building guidance with clear explanations of unit-selection, HMM/statistical approaches, prosody, and classic methods so you can both understand and implement real TTS systems.

Who Will Benefit

Engineers, graduate students, and researchers who want to design or evaluate TTS systems — from implementation-focused developers to speech scientists seeking a comprehensive synthesis reference.

Level: Intermediate

Prerequisites: Undergraduate-level math (calculus, linear algebra) and basic signal-processing concepts. Familiarity with programming (e.g., scripting or C/C++) is helpful; no prior specialist knowledge of linguistics or TTS is required.

Key Takeaways

  • Understand the linguistic and phonetic foundations required to convert text to speech.
  • Describe and implement classic synthesis methods (formant and rule-based) as well as concatenative/unit-selection systems.
  • Apply DSP techniques for speech analysis and waveform manipulation including spectral analysis, filtering, and vocoding.
  • Build and evaluate HMM/statistical waveform and acoustic models for modern TTS.
  • Design and tune prosody, intonation, and duration models to improve naturalness and intelligibility.
  • Assess and choose corpora, tools, and evaluation methods for system development and research.

Topics Covered

  1. Introduction: scope and history of speech synthesis
  2. Speech production and phonetics
  3. Acoustic description of speech and signal-processing basics
  4. Speech analysis techniques: spectrograms, LPC, FFT
  5. Front-end text processing: tokenization, normalization, lexical lookup
  6. Prosody: intonation, stress, and duration modeling
  7. Formant and rule-based synthesis
  8. Concatenative synthesis and unit selection
  9. Statistical and HMM-based synthesis
  10. Vocoders, waveform generation and waveform modification
  11. Corpora, evaluation, and perceptual testing
  12. Practical system design, toolkits and implementation issues
  13. Future directions and research challenges

Languages, Platforms & Tools

C, C++, MATLAB, Python, Festival, HTS (HMM-based TTS), MBROLA, STRAIGHT, Praat

How It Compares

More focused on end-to-end synthesis than Jurafsky & Martin's broader NLP textbook, and complementary to Rabiner & Juang's work on recognition. Taylor goes deeper into practical TTS techniques such as unit selection and HMM-based synthesis.
