DSPRelated.com

Human and Machine Hearing: Extracting Meaning from Sound

Lyon, Richard F. 2017

Human and Machine Hearing is the first book to comprehensively describe how human hearing works and how to build machines to analyze sounds in the same way that people do. Drawing on over thirty-five years of experience in analyzing hearing and building systems, Richard F. Lyon explains how we can now build machines with close-to-human abilities in speech, music, and other sound-understanding domains. He explains human hearing in terms of engineering concepts, and describes how to incorporate those concepts into machines for a wide range of modern applications. The details of this approach are presented at an accessible level, to bring a diverse range of readers, from neuroscience to engineering, to a common technical understanding. The description of hearing as signal-processing algorithms is supported by corresponding open-source code, for which the book serves as motivating documentation.


Why Read This Book

You will learn how human hearing can be expressed as concrete signal-processing algorithms, and how to turn those insights into practical machine implementations for speech, music, and general sound understanding. The book blends physiology, perceptual phenomena, and engineering, giving you intuitions, models (such as cochlear filterbanks and auditory-nerve encoding), and algorithmic recipes you can apply to DSP, recognition, and audio-analysis tasks.

Who Will Benefit

Engineers, researchers, and advanced students in audio/speech processing, DSP, and computational neuroscience who want to design biologically inspired algorithms for sound analysis and understanding.

Level: Intermediate. Prerequisites: undergraduate-level signal processing (linear systems, convolution, Fourier transform), basic calculus and probability, and familiarity with discrete-time DSP concepts (filters, FFT).

Key Takeaways

  • Implement biologically motivated cochlear filterbanks and gammatone-style filters for robust time–frequency analysis.
  • Design auditory-inspired front ends for speech and music processing that improve feature robustness and perceptual relevance.
  • Apply temporal and spectral analysis methods (e.g., auditory spectrograms, onset detection, pitch cues) to real audio tasks.
  • Build statistical and adaptive processing stages that mimic auditory nerve encoding and early neural processing for improved classification and detection.
  • Translate perceptual and physiological concepts (masking, compression, nonlinear transduction) into algorithmic components for practical systems.
  • Evaluate and tune auditory models for applications in speech recognition, audio source separation, and acoustic scene analysis.
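The book's own open-source code is the definitive companion for these techniques. Purely as an illustrative sketch of the gammatone-style filters mentioned above (not code from the book), here is a minimal NumPy implementation; the `gammatone_ir` helper name is made up here, while the fourth-order/b ≈ 1.019 form and the Glasberg–Moore ERB formula are standard in the auditory-modeling literature:

```python
import numpy as np

def erb(fc):
    """Glasberg & Moore equivalent rectangular bandwidth at fc (Hz)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.05, order=4, b=1.019):
    """Impulse response of a gammatone filter centered at fc (Hz).

    g(t) = t^(n-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t),
    with the conventional order n = 4 and bandwidth scale b = 1.019.
    """
    t = np.arange(int(duration * fs)) / fs
    env = t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t)
    ir = env * np.cos(2 * np.pi * fc * t)
    return ir / np.max(np.abs(ir))  # peak-normalize for easy comparison
```

A bank of such filters at ERB-spaced center frequencies approximates the cochlea's frequency analysis; convolving a signal with each impulse response (e.g. via `np.convolve`) yields one channel of the filterbank output.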

Topics Covered

  1. Preface and overview: why model hearing for machines
  2. Anatomy and electrophysiology of the ear: cochlea to auditory nerve
  3. The cochlea as a signal processor: mechanics, tuning, and nonlinearities
  4. Filterbanks and gammatone models: implementing auditory frequency analysis
  5. Temporal processing and neural encoding: envelopes, fine structure, and spike timing
  6. Spectral analysis and time–frequency representations for auditory tasks
  7. Pitch, timbre, and perceptual cues: extracting musical and voice features
  8. Adaptive filtering and short‑term plasticity in auditory models
  9. Statistical signal processing for hearing: noise, masking, and inference
  10. Auditory scene analysis: segregation, grouping, and source identification
  11. Applications: speech recognition, music analysis, hearing aids, and radar/communications parallels
  12. Practical implementation notes, sample code, and evaluation methods
  13. Appendices: mathematical background, useful transforms, and further reading
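To make topics 4 through 6 concrete, a crude "auditory spectrogram" (cochleagram) front end can be assembled from a gammatone filterbank, half-wave rectification, frame averaging, and cube-root compression. The sketch below is an assumption-laden illustration, not the book's implementation; every function name and parameter choice here is invented for the example:

```python
import numpy as np

def erb(fc):
    # Glasberg & Moore equivalent rectangular bandwidth (Hz)
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def erb_space(low, high, n):
    """n center frequencies equally spaced on the ERB-rate scale."""
    rate = lambda f: 21.4 * np.log10(0.00437 * f + 1.0)
    inv = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    return inv(np.linspace(rate(low), rate(high), n))

def gammatone_ir(fc, fs, duration=0.05, order=4, b=1.019):
    t = np.arange(int(duration * fs)) / fs
    return (t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t)
            * np.cos(2 * np.pi * fc * t))

def auditory_spectrogram(x, fs, n_channels=32, frame=0.01):
    """Return an (n_channels, n_frames) compressed-envelope cochleagram."""
    fcs = erb_space(80.0, 0.9 * fs / 2, n_channels)
    hop = int(frame * fs)
    n_frames = len(x) // hop
    out = np.empty((n_channels, n_frames))
    for i, fc in enumerate(fcs):
        y = np.convolve(x, gammatone_ir(fc, fs), mode="same")
        env = np.maximum(y, 0.0)                     # half-wave rectify
        frames = env[: n_frames * hop].reshape(n_frames, hop)
        out[i] = frames.mean(axis=1) ** (1.0 / 3.0)  # cube-root compression
    return out
```

Feeding in a pure tone produces a ridge in the channel whose center frequency lies nearest the tone, which is the qualitative behavior the book's far more refined cochlear models share.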

Languages, Platforms & Tools

Languages: MATLAB, Python (NumPy/SciPy), C/C++, pseudocode examples
Platforms: general-purpose CPUs, embedded DSPs and microcontrollers, GPUs for acceleration
Tools: MATLAB Signal Processing Toolbox, Librosa, FFTW, NumPy/SciPy, example gammatone/auditory toolboxes, Praat (for analysis)

How It Compares

Compared with Gold & Morgan’s "Speech and Audio Signal Processing" (practical DSP for speech), Lyon emphasizes biologically grounded auditory models and how to implement them; compared with Bregman’s "Auditory Scene Analysis," Lyon provides more engineer‑oriented, implementable algorithms rather than primarily psychoacoustic theory.