
Speech and Audio Signal Processing: Processing and Perception of Speech and Music

Ben Gold, Nelson Morgan, Dan Ellis (2011)

When Speech and Audio Signal Processing was published in 1999, it stood out from its competition in its breadth of coverage and its accessible, intuition-based style. The book was aimed at individual students and engineers excited about the broad span of audio processing and curious to understand the available techniques. Since then, with the advent of the iPod in 2001, the field of digital audio and music has exploded, leading to a much greater interest in the technical aspects of audio processing.

This Second Edition updates and revises the original book, augmenting it with new material describing both the enabling technologies of digital music distribution (most significantly the MP3) and a range of exciting new research areas in automatic music content processing (such as automatic transcription and music similarity) that have emerged in the past five years, driven by the digital music revolution.

New chapter topics include:

  • Psychoacoustic Audio Coding, describing MP3 and related audio coding schemes based on psychoacoustic masking of quantization noise.
  • Music Transcription, including automatically deriving notes, beats, and chords from music signals.
  • Music Information Retrieval, primarily focusing on audio-based genre classification, artist/style identification, and similarity estimation.
  • Audio Source Separation, including multi-microphone beamforming, blind source separation, and the perception-inspired techniques usually referred to as Computational Auditory Scene Analysis (CASA).


Why Read This Book

You should read this book if you want a unified, engineer-friendly introduction to both the signal processing techniques and perceptual ideas behind speech and music systems. It balances intuition, practical algorithms (e.g., STFT, LPC, perceptual coding), and real-world examples so you can move quickly from theory to application.

Who Will Benefit

Upper-level undergraduates, graduate students, and practicing engineers working on audio, speech recognition, coding, or music-information tasks who need both DSP fundamentals and perceptual context.

Level: Intermediate

Prerequisites: Undergraduate-level signals and systems (DTFT/DFT/FFT), basic calculus and linear algebra, and basic probability. Familiarity with MATLAB or a similar tool is helpful but not required.

Key Takeaways

  • Perform time-frequency analysis using STFT and spectrograms for speech and music signals.
  • Extract common speech features (e.g., MFCCs, spectral envelopes) and understand their perceptual motivation.
  • Implement linear predictive coding (LPC) for analysis/synthesis and apply it to speech modeling.
  • Apply perceptual/audio coding principles (masking, psychoacoustics) to understand formats like MP3.
  • Estimate pitch and harmonic structure for voiced sounds and musical notes.
  • Design basic noise-reduction and enhancement algorithms using spectral and statistical methods.
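
As a rough illustration of the first takeaway, here is a minimal short-time Fourier transform sketch in Python with NumPy. It is not taken from the book; the function name `stft_magnitude`, the Hann window, the frame length of 256 samples, the hop of 128 samples, and the 1 kHz test tone are all assumptions chosen for the example.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude STFT with shape (num_frames, frame_len // 2 + 1).

    Hypothetical helper for illustration only: frames that would run
    past the end of the signal are simply dropped (no padding).
    """
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    # Real-input FFT of every frame gives one spectrogram row per frame.
    return np.abs(np.fft.rfft(frames, axis=1))

# Assumed test signal: one second of a 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = stft_magnitude(tone)
# Average over time, then find the strongest frequency bin.
peak_bin = int(np.argmax(spec.mean(axis=0)))
peak_hz = peak_bin * sample_rate / 256
print(spec.shape, peak_bin, peak_hz)
```

With these settings the bin spacing is 8000 / 256 = 31.25 Hz, so the 1 kHz tone falls exactly on bin 32. Plotting `spec` in decibels against time and frequency would give the familiar spectrogram view.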

Topics Covered

  1. Introduction: goals, overview of speech and music signals
  2. Acoustics and Perception: hearing, critical bands, loudness
  3. Discrete-Time Signal Analysis: Fourier review, FFT, spectral estimation
  4. Short-Time Analysis: STFT, spectrograms, windows
  5. Filter Banks and Multirate Methods
  6. Speech Production and Source-Filter Models
  7. Linear Predictive Coding and Analysis-by-Synthesis
  8. Pitch, Harmonics, and Voiced/Unvoiced Processing
  9. Perceptual Audio Coding and Masking (MP3 and perceptual models)
  10. Feature Extraction for Recognition and Music Analysis (MFCCs, chroma)
  11. Noise Reduction and Enhancement; Statistical Methods
  12. Applications: speech coding, synthesis, music information retrieval
  13. Appendices: useful transforms, MATLAB examples and practical tips

Languages, Platforms & Tools

  • MATLAB
  • Python
  • FFT and signal-processing toolkits (MATLAB/Python libraries)
  • Audio codec standards (MP3/MPEG), discussed conceptually

How It Compares

Broader and more perceptually oriented than Rabiner & Juang's Fundamentals of Speech Recognition (which focuses on statistical recognition); more speech- and perception-focused than Zölzer's Digital Audio Signal Processing, which emphasizes effects and implementation details.
