Automatic Speech Recognition: A Deep Learning Approach (Signals and Communication Technology)

Dong Yu and Li Deng, 2014

This book provides a comprehensive overview of recent advances in automatic speech recognition, with a focus on deep learning models, including deep neural networks and many of their variants. It is the first automatic speech recognition book dedicated to the deep learning approach. In addition to a rigorous mathematical treatment of the subject, the book presents insights into, and the theoretical foundations of, a series of highly successful deep learning models.


Why Read This Book

You will learn how deep learning transformed automatic speech recognition and how to apply DNNs, DBNs, CNNs, and RNNs to build practical ASR systems. The book blends rigorous mathematical treatment with engineering insights so you can both understand model theory and implement state-of-the-art acoustic models and training pipelines.

Who Will Benefit

Engineers and graduate students with some background in signal processing or machine learning who want to design or research modern ASR systems using deep learning methods.

Level: Advanced — Prerequisites: Linear algebra, probability and statistics, basic signal processing (discrete-time signals, spectral analysis), familiarity with machine learning fundamentals and programming (MATLAB or Python).

Key Takeaways

  • Implement DNN-based acoustic models and integrate them with HMM/decoder frameworks
  • Extract and preprocess speech features (MFCC, PLP, filterbanks, normalization) suitable for neural models
  • Apply DBNs, CNNs, and RNN/LSTM architectures to frame-level and sequence modeling tasks
  • Train models with sequence-discriminative criteria (MMI, MPE, sMBR) and understand CTC-style approaches
  • Design decoding pipelines including WFST-based decoding and language model integration
  • Improve robustness with adaptation, speaker normalization, noise compensation, and enhancement techniques

Topics Covered

  1. Introduction: Overview of ASR and the Deep Learning Revolution
  2. Speech Signals, Feature Extraction, and Preprocessing
  3. Classical Acoustic Modeling: GMM-HMM Foundations
  4. Neural Network Fundamentals: Feedforward Nets, RBMs, and DBNs
  5. Deep Neural Networks for Acoustic Modeling
  6. Convolutional Neural Networks for Speech
  7. Recurrent Neural Networks and LSTMs for Sequence Modeling
  8. Sequence Training and Discriminative Criteria (MMI, MPE, sMBR, CTC)
  9. Decoding, WFSTs, and Language Model Integration
  10. Adaptation, Speaker Normalization, and Robustness
  11. Speech Enhancement and Noise-Robust ASR Techniques
  12. Large-Vocabulary Continuous Speech Recognition (LVCSR) Systems
  13. Practical System Design, Toolkits, and Research Directions

Languages, Platforms & Tools

Python, MATLAB, C/C++, Kaldi, HTK, Theano, TensorFlow, MATLAB Signal Processing Toolbox, CUDA/cuDNN (GPU acceleration)

How It Compares

Compared with Rabiner & Juang's Fundamentals of Speech Recognition (classic HMM/GMM focus), this book emphasizes deep learning-based acoustic modeling and modern sequence training; it complements general deep learning texts (e.g., Goodfellow et al.) by applying DL specifically to ASR.
