Human and Machine Hearing: Extracting Meaning from Sound
Human and Machine Hearing is the first book to comprehensively describe how human hearing works and how to build machines to analyze sounds in the same way that people do. Drawing on over thirty-five years of experience in analyzing hearing and building systems, Richard F. Lyon explains how we can now build machines with close-to-human abilities in speech, music, and other sound-understanding domains. He explains human hearing in terms of engineering concepts, and describes how to incorporate those concepts into machines for a wide range of modern applications. The details of this approach are presented at an accessible level, to bring a diverse range of readers, from neuroscience to engineering, to a common technical understanding. The description of hearing as signal-processing algorithms is supported by corresponding open-source code, for which the book serves as motivating documentation.
Why Read This Book
You will learn how human hearing can be expressed as concrete signal-processing algorithms and how to turn those insights into practical machine implementations for speech, music, and general sound understanding. The book blends physiology, perceptual phenomena, and engineering, giving you intuitions, models (such as cochlear filterbanks and auditory nerve encoding), and algorithmic recipes you can apply to DSP, recognition, and audio-analysis tasks.
Who Will Benefit
Engineers, researchers, and advanced students in audio/speech processing, DSP, and computational neuroscience who want to design biologically inspired algorithms for sound analysis and understanding.
Level: Intermediate. Prerequisites: undergraduate-level signal processing (linear systems, convolution, the Fourier transform), basic calculus and probability, and familiarity with discrete-time DSP concepts (filters, FFT).
Key Takeaways
- Implement biologically motivated cochlear filterbanks and gammatone-style filters for robust time–frequency analysis (see the sketch after this list).
- Design auditory-inspired front ends for speech and music processing that improve feature robustness and perceptual relevance.
- Apply temporal and spectral analysis methods (e.g., auditory spectrograms, onset detection, pitch cues) to real audio tasks.
- Build statistical and adaptive processing stages that mimic auditory nerve encoding and early neural processing for improved classification and detection.
- Translate perceptual and physiological concepts (masking, compression, nonlinear transduction) into algorithmic components for practical systems.
- Evaluate and tune auditory models for applications in speech recognition, audio source separation, and acoustic scene analysis.
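To make the first takeaway concrete, here is a minimal gammatone filterbank sketch in Python/NumPy. It is an illustrative FIR approximation of 4th-order gammatone filters spaced on the ERB scale, not the book's (or its open-source companion's) actual implementation; the function names (erb_space, gammatone_ir, gammatone_filterbank) are invented for this example.

```python
import numpy as np

def erb_bandwidth(fc):
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990) in Hz."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def erb_space(f_low, f_high, n_channels):
    """Center frequencies spaced uniformly on the ERB-rate scale."""
    erb_rate = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    inverse = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    return inverse(np.linspace(erb_rate(f_low), erb_rate(f_high), n_channels))

def gammatone_ir(fc, fs, duration=0.05, order=4, b=1.019):
    """FIR approximation of a gammatone impulse response at center frequency fc."""
    t = np.arange(0.0, duration, 1.0 / fs)
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb_bandwidth(fc) * t)
    ir = envelope * np.cos(2.0 * np.pi * fc * t)
    return ir / np.sqrt(np.sum(ir ** 2))  # unit-energy normalization

def gammatone_filterbank(x, fs, f_low=80.0, f_high=8000.0, n_channels=32):
    """Return an (n_channels, len(x)) array of band-filtered signals."""
    centers = erb_space(f_low, f_high, n_channels)
    return np.stack([np.convolve(x, gammatone_ir(fc, fs), mode="same")
                     for fc in centers])

# Example: decompose one second of a 440 Hz tone sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
bands = gammatone_filterbank(np.sin(2.0 * np.pi * 440.0 * t), fs)
```

Each row of bands is one cochlear-like channel; half-wave rectifying and compressing these outputs is a common next step toward an auditory-spectrogram representation.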
Topics Covered
- Preface and overview: why model hearing for machines
- Anatomy and electrophysiology of the ear: cochlea to auditory nerve
- The cochlea as a signal processor: mechanics, tuning, and nonlinearities
- Filterbanks and gammatone models: implementing auditory frequency analysis
- Temporal processing and neural encoding: envelopes, fine structure, and spike timing
- Spectral analysis and time–frequency representations for auditory tasks
- Pitch, timbre, and perceptual cues: extracting musical and voice features (see the sketch after this list)
- Adaptive filtering and short‑term plasticity in auditory models
- Statistical signal processing for hearing: noise, masking, and inference
- Auditory scene analysis: segregation, grouping, and source identification
- Applications: speech recognition, music analysis, hearing aids, and radar/communications parallels
- Practical implementation notes, sample code, and evaluation methods
- Appendices: mathematical background, useful transforms, and further reading
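As a small taste of the temporal pitch cues covered in the pitch and temporal-processing chapters, here is a generic autocorrelation pitch estimator in Python/NumPy. This is a sketch of the general lag-domain technique, not the book's specific pitch model; the function name autocorrelation_pitch and its parameters are invented for this example.

```python
import numpy as np

def autocorrelation_pitch(x, fs, f_min=60.0, f_max=500.0):
    """Estimate the fundamental frequency from the autocorrelation peak
    within the lag range corresponding to [f_min, f_max]."""
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags only
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    best_lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return fs / best_lag

# Example: a 220 Hz tone should yield an estimate close to 220 Hz.
fs = 16000
t = np.arange(int(0.1 * fs)) / fs
print(autocorrelation_pitch(np.sin(2.0 * np.pi * 220.0 * t), fs))
```

In an auditory model the same idea is typically applied per cochlear channel and summed across channels (a summary autocorrelogram) rather than to the raw waveform.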
Languages, Platforms & Tools
- Open-source code accompanying the book: Lyon's CARFAC cochlear model, with MATLAB and C++ reference implementations (github.com/google/carfac).
- The algorithms themselves are language-agnostic; any environment with basic DSP support is enough to follow along.
How It Compares
Compared with Gold & Morgan’s "Speech and Audio Signal Processing" (practical DSP for speech), Lyon emphasizes biologically grounded auditory models and how to implement them; compared with Bregman’s "Auditory Scene Analysis," Lyon offers engineer-oriented, implementable algorithms rather than primarily psychoacoustic theory.