Automatic Speech Recognition: A Deep Learning Approach (Signals and Communication Technology)
This book provides a comprehensive overview of recent advances in automatic speech recognition, with a focus on deep learning models, including deep neural networks and many of their variants. It is the first automatic speech recognition book dedicated to the deep learning approach. In addition to a rigorous mathematical treatment of the subject, the book presents insights into, and the theoretical foundations of, a series of highly successful deep learning models.
Why Read This Book
You will learn how deep learning transformed automatic speech recognition and how to apply DNNs, DBNs, CNNs, and RNNs to build practical ASR systems. The book blends rigorous mathematical treatment with engineering insights so you can both understand model theory and implement state-of-the-art acoustic models and training pipelines.
Who Will Benefit
Engineers and graduate students with some background in signal processing or machine learning who want to design or research modern ASR systems using deep learning methods.
Level: Advanced
Prerequisites: Linear algebra, probability and statistics, basic signal processing (discrete-time signals, spectral analysis), familiarity with machine learning fundamentals and programming (MATLAB or Python).
Key Takeaways
- Implement DNN-based acoustic models and integrate them with HMM/decoder frameworks
- Extract and preprocess speech features (MFCC, PLP, filterbanks, normalization) suitable for neural models
- Apply DBNs, CNNs, and RNN/LSTM architectures to frame-level and sequence modeling tasks
- Train models with sequence-discriminative criteria (MMI, MPE, sMBR) and understand CTC-style approaches
- Design decoding pipelines including WFST-based decoding and language model integration
- Improve robustness with adaptation, speaker normalization, noise compensation, and enhancement techniques
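To give a flavor of the CTC-style approaches mentioned in the takeaways, here is a minimal sketch of the label-collapsing step used when reading out a CTC path: merge consecutive repeated labels, then drop the blank symbol. The function name `ctc_collapse` and the blank symbol `"_"` are assumptions made for this illustration and are not taken from the book.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol chosen for this sketch


def ctc_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame label path into an output label sequence.

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove the blank symbol.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


# One label per acoustic frame; the blank separates the two "l" runs
# so that the repeated letter survives the merge step.
path = ["_", "h", "h", "_", "e", "l", "l", "_", "l", "o", "o"]
print("".join(ctc_collapse(path)))  # hello
```

The blank between the two runs of "l" is what lets a genuinely repeated label be kept; without it, the merge step would reduce the two runs to a single "l".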
Topics Covered
- Introduction: Overview of ASR and the Deep Learning Revolution
- Speech Signals, Feature Extraction, and Preprocessing
- Classical Acoustic Modeling: GMM-HMM Foundations
- Neural Network Fundamentals: Feedforward Nets, RBMs, and DBNs
- Deep Neural Networks for Acoustic Modeling
- Convolutional Neural Networks for Speech
- Recurrent Neural Networks and LSTMs for Sequence Modeling
- Sequence Training and Discriminative Criteria (MMI, MPE, sMBR, CTC)
- Decoding, WFSTs, and Language Model Integration
- Adaptation, Speaker Normalization, and Robustness
- Speech Enhancement and Noise-Robust ASR Techniques
- Large-Vocabulary Continuous Speech Recognition (LVCSR) Systems
- Practical System Design, Toolkits, and Research Directions
How It Compares
Compared with Rabiner & Juang's Fundamentals of Speech Recognition (classic HMM/GMM focus), this book emphasizes deep learning-based acoustic modeling and modern sequence training; it complements general deep learning texts (e.g., Goodfellow et al.) by applying DL specifically to ASR.