So glad I found this website! Lots of super helpful resources. This article inspired my question: Zero-Padding for Interpolating Spectral Peaks | Spectral Audio Signal Processing (dsprelated.com)
I was curious about how oversampling improves the peak amplitude resolution for audio applications - especially since we aren't adding any NEW data to the signal, just zero-filling.
For audio, it's important to know what the "True Peak" value is when preparing the audio for release, to ensure that we don't have inter-sample peaks that exceed 0 dBFS after D/A conversion.
I understand that oversampling essentially zero-fills the data, which increases the number of bins we have in our FFT, but how does this improve the ability to determine the peak amplitude? Is there just a function that looks at the max value of each bin or something to report the max peak amplitude? So essentially, the more bins we have, the better "chance" we will resolve the "true max peak"?
Sorry if this is a really naive question. I'm not a programmer, but I'm trying to understand more about signal processing for audio applications. Thanks so much in advance!
While it is true that oversampling doesn't add any more data, what it does provide is extra spaces for the smooth interpolation of the spectrum to better estimate the height of the peaks, especially if one of those peaks is a pure tone not quite at the centre of its particular spectral band. See the other answers for extra details.
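To make that concrete, here's a small sketch (sample rate, FFT size, and tone frequency are just made-up test values): a windowed sine that falls between FFT bins under-reads its peak, while zero-padding the FFT samples the spectrum finely enough to land near the true top of the lobe:

```python
import numpy as np

fs = 48000            # sample rate (hypothetical)
n = 1024              # FFT size
f0 = 100.7 * fs / n   # tone deliberately 0.7 bins off a bin centre

x = np.sin(2 * np.pi * f0 * np.arange(n) / fs)
win = np.hanning(n)
scale = 2.0 / win.sum()  # amplitude-correct scaling for a windowed sine

# peak read straight off the 1024-point FFT vs. an 8x zero-padded FFT
peak_plain = np.max(np.abs(np.fft.rfft(x * win))) * scale
peak_padded = np.max(np.abs(np.fft.rfft(x * win, 8 * n))) * scale

print(peak_plain < peak_padded)  # padded estimate is closer to the true 1.0
```

The zero-padding hasn't added information; it just evaluates the same underlying (continuous) spectrum on a denser grid, so the maximum of the grid lands closer to the true lobe peak.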
The article talks about spectral interpolation - getting a better estimate of the maximum amplitude of a frequency in the spectrum. What you are describing is inter-sample maximum values. The algorithm for ISL (inter-sample level) measurement is well described in an ITU document.
Oh this article is great! So if I understand the article (pg 18, annex II), the zero-filling just adds the extra points to effectively oversample to 192 kHz, and the additional data is removed using a low-pass filter, like many others mentioned on this post.
The low-pass filter removes this extra data, which manifests as additional spectral "images", to satisfy the sampling theorem. This leaves us with several additional "sampling intervals" that we didn't have before. Then, to find the "true peak height", a "peak-sample algorithm" is run, which is mentioned in the article you linked.
Is this "peak-sample algorithm" just a line of code that looks at the absolute value of the sample data after the low pass filter? That is the part I'm having a hard time understanding. If that is the case, this entire process essentially just increases the number of samples available to increase our probability of landing close to the "true peak", right?
Thanks again for the comment and link!
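For what it's worth, yes - after the oversampling filter the "peak-sample algorithm" really can be just max(|x|) over the denser sample grid. A minimal sketch (scipy's resample_poly stands in for the BS.1770 interpolation filter here, which is my assumption, and the 12 kHz tone is a contrived worst case whose true peaks all fall between samples):

```python
import numpy as np
from scipy.signal import resample_poly

fs = 48000
t = np.arange(480) / fs
# 12 kHz tone phased so every sample lands at |sin| = 0.707, never at the peak
x = 0.99 * np.sin(2 * np.pi * 12000 * t + np.pi / 4)

sample_peak = np.max(np.abs(x))    # what a naive sample-peak meter sees (~0.70)
y = resample_poly(x, 4, 1)         # 4x oversample: zero-stuff + low-pass filter
true_peak = np.max(np.abs(y))      # the "peak-sample" step on the denser grid (~0.99)

print(sample_peak < true_peak)
```

So yes: the more interpolated samples you have, the closer the densest one lands to the continuous-time maximum - the oversampling does the hard work, and the peak search afterwards is trivial.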
Hi - oversampling is:
- inserting zeros in between samples
- low-pass filtering of the result.
The zero-insertion does not help with your problem, but the low-pass filtering does. In non-signal-processing terms, the low-pass filter interpolates between the samples with a curve that is only allowed to have a small slope (a slow rate of change). In some cases - e.g. when the sample values rise and then suddenly drop - the low-pass will cause the interpolation curve to overshoot and create higher values than the sample values.
In practice - for many purposes and for most audio signals - this should be negligible. The low-pass has a very high cut-off frequency (almost Nyquist - half the sampling rate), and for very low frequencies it has very little effect - low-passing with the real filter becomes very similar to linear interpolation. For higher frequencies the energy is usually lower, so even though the low-pass will cause overshoot, the values will usually stay safely below clipping.
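A sketch of exactly those two steps, with a deliberately nasty test signal (a full-scale square-ish pattern, my own made-up example) where the interpolated curve overshoots the sample values:

```python
import numpy as np
from scipy.signal import firwin, lfilter

L = 4  # oversampling factor
# full-scale square-ish signal: samples rise, hold, then suddenly drop
x = np.tile([1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0], 20)

# step 1: insert L-1 zeros between the samples
up = np.zeros(len(x) * L)
up[::L] = x

# step 2: low-pass at the original Nyquist (gain L restores the level)
h = L * firwin(129, 1.0 / L)
y = lfilter(h, 1.0, up)

print(np.max(np.abs(x)), round(float(np.max(np.abs(y))), 2))  # 1.0 vs ~1.3
```

None of the samples ever exceeds 1.0, yet the interpolated signal does - that is exactly the inter-sample peak a true-peak meter is trying to catch.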
It might be a problem if you are mastering a dense recording to -0.1 dBFS - then your compressed snare drum might cause clipping that you do not see in the sample values.
Raising the sample rate by inserting time-domain zeros causes spectral replicates in the frequency domain. The low-pass filter suppresses the replicates, leaving the original spectrum at a higher sample rate, which is seen as band-limited interpolation between the original input time samples. The inserted zeros can't contribute to the spectrum or the filter output (zero times a coefficient is zero, and adding zero to the accumulator doesn't change its value). The polyphase filter partition discards the zeros and simply moves the original input samples to different coefficient sets (I call it the Gatling gun model); see the chapter on interpolators in the textbook Multirate Signal Processing.
A better option is to convert the real input samples to the analytic signal with a Hilbert-transform filter - a simple half-band filter spectrally shifted to the quarter sample rate. The envelope of the complex data is always the correct amplitude, without the need to increase the sample rate. See Hilbert-transform filters in AGC systems in the same textbook.
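Sketching that idea with scipy's FFT-based hilbert (a stand-in for the half-band filter implementation the book describes, so the filtering details differ): the analytic-signal envelope reads the correct amplitude straight from the original-rate samples:

```python
import numpy as np
from scipy.signal import hilbert

fs = 48000
t = np.arange(480) / fs
# 12 kHz tone phased so no sample ever lands on the true 0.99 peak
x = 0.99 * np.sin(2 * np.pi * 12000 * t + np.pi / 4)

sample_peak = np.max(np.abs(x))              # ~0.70, misleading
envelope_peak = np.max(np.abs(hilbert(x)))   # ~0.99, no oversampling needed

print(round(float(sample_peak), 2), round(float(envelope_peak), 2))
```

The envelope |x + jH{x}| is flat at the tone's amplitude regardless of where the samples fall in the cycle, which is why this works without raising the sample rate.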
Here is a picture:
blue: signal, each sample repeated 4 times for plotting purposes
red: same signal upsampled by 4.
The second peak reveals a new maximum due to interpolation. Basics of DSP.
Very interesting question. You sample a bandwidth limited signal with respect to Nyquist and you have two samples, x(k) and x(k+1). What is a lower boundary for the reconstructed analog signal x(t) in this interval?
I do not have a clue.
I 'believe' that for perfect reconstruction we need to interpolate from minus infinity to plus infinity (i.e. from as far back as the Big Bang out to the edge of the universe). Obviously a cosmic nonsense - that would mean an infinitely long filter and an infinite number of digital samples. But for all practical purposes I will follow my test setup for a given application.
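Right - and in practice a truncated, windowed interpolation kernel gets extremely close. A pure-Python sketch (the tone frequency, window size, and tolerance are all arbitrary choices of mine):

```python
import math

fs = 48000.0
f0 = 3000.0

def sample(n):
    """The sampled band-limited signal: a plain 3 kHz sine of amplitude 0.5."""
    return 0.5 * math.sin(2 * math.pi * f0 * n / fs)

def sinc_interp(t, half_width=64):
    """Estimate x(t) (t in sample units) from a finite, Hann-windowed sinc sum."""
    k0 = int(math.floor(t))
    acc = 0.0
    for k in range(k0 - half_width + 1, k0 + half_width + 1):
        u = t - k
        s = 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)
        w = 0.5 * (1.0 + math.cos(math.pi * u / half_width))  # Hann taper
        acc += sample(k) * s * w
    return acc

t = 10.5  # halfway between two samples
exact = 0.5 * math.sin(2 * math.pi * f0 * t / fs)
print(abs(sinc_interp(t) - exact) < 1e-3)  # close, with only 128 taps
```

So the "infinite" requirement only bites for mathematically perfect reconstruction; a window of a hundred or so samples is plenty for any realistic accuracy target.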