In this chapter, we looked at a variety of time-frequency displays appropriate for audio signals. All were implemented in terms of the short-time Fourier transform (STFT). The classical spectrogram was reviewed, and its performance on a speech sample was illustrated. A loudness spectrogram based on a model of time-varying loudness perception [86] was discussed. In this model, the STFT (or a multi-resolution STFT) is smoothed and non-uniformly resampled in frequency to approximate an auditory filter bank, whose power output is taken to be the excitation pattern. A compressive nonlinearity is then applied to the excitation pattern to produce the specific loudness, which we took as our loudness spectrogram. The specific loudness can optionally be smoothed with respect to time to form a short- or long-term loudness spectrogram. Summing over frequency yields the corresponding loudness functions versus time. Finally, ideal non-uniform spectral resampling was discussed; such a general-purpose tool is useful for converting an FFT to an auditory filter bank, and for creating other non-uniform filter banks as resamplings of an FFT output.
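The processing chain summarized above can be sketched in a few lines of NumPy. This is only a rough illustration under stated assumptions, not the loudness model of [86]: the band edges use the Glasberg-Moore ERB-rate formula, the excitation pattern is formed by simply summing STFT power bins within each band (a crude stand-in for smoothing plus ideal non-uniform resampling), the compressive nonlinearity is a plain power law with an assumed exponent of 0.3, and outer- and middle-ear filtering is omitted. The function names (`stft_power`, `erb_band_edges`, `loudness_spectrogram`) and all parameter defaults are hypothetical.

```python
import numpy as np

def stft_power(x, n_fft=512, hop=128):
    """Power spectrogram |STFT|^2 with a Hann window; shape (n_frames, n_fft//2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def erb_band_edges(fs, n_bands):
    """Band edges in Hz, equally spaced on an ERB-rate scale (assumed formula)."""
    hz_to_erb = lambda f: 21.4 * np.log10(1.0 + 0.00437 * f)
    erb_to_hz = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    return erb_to_hz(np.linspace(0.0, hz_to_erb(fs / 2.0), n_bands + 1))

def loudness_spectrogram(x, fs, n_fft=512, hop=128, n_bands=24, exponent=0.3):
    """Return (specific_loudness, loudness_vs_time) for a mono signal x."""
    power = stft_power(x, n_fft, hop)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    edges = erb_band_edges(fs, n_bands)
    # Assign every FFT bin to one auditory band (non-uniform resampling in frequency).
    band_index = np.clip(np.searchsorted(edges, freqs, side="right") - 1, 0, n_bands - 1)
    # Excitation pattern: power collected in each band, per frame.
    excitation = np.zeros((power.shape[0], n_bands))
    for b in range(n_bands):
        excitation[:, b] = power[:, band_index == b].sum(axis=1)
    # Compressive nonlinearity (assumed power law) gives the specific loudness.
    specific = excitation ** exponent
    # Summing over frequency gives a loudness function versus time.
    loudness = specific.sum(axis=1)
    return specific, loudness
```

For a one-second 1 kHz sinusoid sampled at 16 kHz, `loudness_spectrogram(x, 16000)` returns a specific-loudness array with one row per frame and one column per band, with most of the energy in the band containing 1 kHz. Temporal smoothing of `specific` along the frame axis, which would yield short- or long-term versions, is left out of this sketch.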
