Spoken Language Processing: A Guide to Theory, Algorithm, and System Development
This will be the definitive book on spoken language systems, written by the people at Microsoft Research who have developed the voice-activated technologies that will be embedded in Windows 2000 and other key Microsoft products of the future. It is not a Microsoft book, however; it is a book on the science and linguistics of this technology and on how to use it in developing and building hardware and software products.
Why Read This Book
You should read this book if you want a thorough, practical grounding in how state-of-the-art ASR systems were designed and built around HMM/statistical methods: you will get both the signal-processing and the statistical-modeling perspectives needed to implement and evaluate real spoken-language systems. It balances algorithmic depth (feature extraction, acoustic models, language models, discriminative training) with system-level concerns (corpora, evaluation, toolkits) so you can move from theory to a working system.
Who Will Benefit
Engineers and researchers with a background in DSP or statistical methods who are building or researching automatic speech recognition or spoken-language systems.
Level: Intermediate — Prerequisites: Basic DSP (sampling, FFT, filter banks), linear algebra and probability/statistics (Markov models, maximum likelihood), and some programming experience (MATLAB/C or similar).
Key Takeaways
- Extract robust speech features (e.g., filter-bank, MFCC, delta features) suitable for ASR pipelines.
- Formulate and train HMM-based acoustic models and understand parameter estimation (MLE/EM) and state tying.
- Build and apply n-gram language models and integrate them with acoustic decoding (Viterbi/beam search).
- Apply discriminative training and speaker-adaptation techniques (e.g., MMI, MPE, MLLR) to improve recognition accuracy.
- Design and evaluate complete speech-recognition systems using standard corpora and NIST-style metrics.
- Diagnose noise-robustness issues and apply front-end and model-based strategies to improve performance in adverse conditions.
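The HMM decoding mentioned in the takeaways can be illustrated with a minimal Viterbi sketch. It assumes a toy discrete-observation HMM: the "Rainy"/"Sunny" states, the observation symbols, and all probabilities are hypothetical textbook values, not anything from the book. A real recognizer would score acoustic frames with continuous densities (for example Gaussian mixtures) and prune hypotheses with a beam. Scores are kept as log probabilities so long utterances do not underflow.

```python
import math
from typing import Dict, List, Sequence, Tuple

LogTable = Dict[str, Dict[str, float]]


def viterbi(
    obs: Sequence[str],
    states: Sequence[str],
    log_start: Dict[str, float],
    log_trans: LogTable,
    log_emit: LogTable,
) -> Tuple[List[str], float]:
    """Return the most likely state path and its log probability."""
    # delta[s] = best log score of any partial path ending in state s
    delta = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backptrs: List[Dict[str, str]] = []
    for o in obs[1:]:
        new_delta: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for s in states:
            # Best predecessor of s: argmax over p of delta[p] + log a(p -> s)
            best_prev = max(states, key=lambda p: delta[p] + log_trans[p][s])
            new_delta[s] = delta[best_prev] + log_trans[best_prev][s] + log_emit[s][o]
            ptr[s] = best_prev
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best-scoring final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


# Hypothetical toy model (classic "weather" example), not an ASR model
lg = math.log
states = ["Rainy", "Sunny"]
log_start = {"Rainy": lg(0.6), "Sunny": lg(0.4)}
log_trans = {
    "Rainy": {"Rainy": lg(0.7), "Sunny": lg(0.3)},
    "Sunny": {"Rainy": lg(0.4), "Sunny": lg(0.6)},
}
log_emit = {
    "Rainy": {"walk": lg(0.1), "shop": lg(0.4), "clean": lg(0.5)},
    "Sunny": {"walk": lg(0.6), "shop": lg(0.3), "clean": lg(0.1)},
}
path, score = viterbi(["walk", "shop", "clean"], states, log_start, log_trans, log_emit)
print(path, math.exp(score))  # path -> ['Sunny', 'Rainy', 'Rainy']
```

In an ASR decoder the same recursion runs over HMM states of phone or word models, with back-pointers typically stored per word boundary rather than per frame to save memory.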
Topics Covered
- Introduction and overview of spoken-language systems
- Speech production, acoustics, and signal representations
- Feature extraction: filter banks, MFCCs, LPC, and normalization
- Statistical pattern recognition foundations for speech
- Hidden Markov models: structure, decoding and Viterbi search
- Parameter estimation: Baum-Welch/EM and practical training issues
- Acoustic modeling: context-dependent models and state tying
- Language modeling: n-grams, smoothing, and class-based methods
- Discriminative training and advanced optimization (MMI, MPE)
- Speaker and environment adaptation (MLLR, MAP, compensation techniques)
- Decoding architectures, search algorithms, and real-time considerations
- Corpora, evaluation methods, and system integration
- Practical system development, toolkits, and deployment
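As a small illustration of n-gram estimation with smoothing, the sketch below trains a bigram model with add-one (Laplace) smoothing. Add-one is the simplest scheme and is used here only for brevity; it is not meant to represent the back-off and discounting methods that practical systems rely on. The function name `train_bigram_add_one` and the two-sentence corpus are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set, Tuple


def train_bigram_add_one(
    sentences: Iterable[List[str]],
) -> Tuple[Callable[[str, str], float], Set[str]]:
    """Train a bigram model with add-one (Laplace) smoothing."""
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    vocab: Set[str] = set()  # tokens that can be predicted (includes </s>)
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        vocab.update(tokens[1:])
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    v = len(vocab)

    def prob(prev: str, word: str) -> float:
        # P(word | prev) = (c(prev, word) + 1) / (c(prev) + V)
        return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + v)

    return prob, vocab


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
prob, vocab = train_bigram_add_one(corpus)
print(prob("the", "cat"))  # (1 + 1) / (2 + 5) = 2/7
print(prob("the", "sat"))  # unseen bigram still gets (0 + 1) / 7
```

Because V counts every token that can follow a history, the smoothed probabilities for any seen history sum to one over the vocabulary.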
How It Compares
Covers much the same ASR algorithmic ground as Rabiner & Juang's 'Fundamentals of Speech Recognition' but is more system- and implementation-oriented; it complements Jurafsky & Martin by focusing on signal processing and acoustic modeling rather than NLP and linguistics.