TSM Examples

To illustrate some fundamental points, let's look at some TSM waveforms for a test signal consisting of two constant-amplitude sinusoids near 400 Hz whose frequencies are separated by 10 Hz (creating an amplitude envelope with 10 beats/sec). We will perform a $ 2\times $ time expansion ($ 2\times $ ``slow-down'') using three algorithms: a phase-continued STFT (phase vocoder), a relative-phase-preserving (``phase-locked'') STFT, and SOLA-FS (a time-domain method).
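As a minimal sketch of such a test signal, the following generates two unit-amplitude sinusoids and checks the sum-to-product identity that explains the beating. The exact frequencies (395 and 405 Hz), the 8 kHz sampling rate, the one-second duration, and the function name `beat_test_signal` are assumptions chosen for illustration; the text only says the sinusoids are near 400 Hz and 10 Hz apart.

```python
import numpy as np

def beat_test_signal(f1=395.0, f2=405.0, fs=8000.0, dur=1.0):
    """Sum of two unit-amplitude sinusoids (f1, f2, fs, dur are assumed values)."""
    t = np.arange(int(round(dur * fs))) / fs
    x = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)
    # Sum-to-product identity:
    #   cos(a) + cos(b) = 2 cos((b - a)/2) cos((a + b)/2)
    # so x is a carrier at the mean frequency times a slow envelope.
    envelope = 2 * np.cos(np.pi * (f2 - f1) * t)   # 5 Hz cosine
    carrier = np.cos(np.pi * (f1 + f2) * t)        # 400 Hz cosine
    return t, x, envelope, carrier

t, x, envelope, carrier = beat_test_signal()
# The envelope magnitude |2 cos(2 pi 5 t)| peaks twice per 5 Hz cycle,
# which gives f2 - f1 = 10 beats per second.
beats_per_second = 405.0 - 395.0
```

The envelope is a 5 Hz cosine, but its magnitude peaks twice per cycle, which is why a 10 Hz separation yields 10 beats per second.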

The vocoder STFT frame size is set to 45 ms and the analysis hop size is 1/4 frame, or 11.25 ms, so the analysis frame rate is approximately 89 Hz.
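The arithmetic behind these numbers can be checked in a few lines. The sampling rate below is an assumption (the text does not state one) and is used only to convert the durations into sample counts.

```python
# Frame parameters quoted in the text
frame_ms = 45.0                      # STFT frame size in ms
hop_ms = frame_ms / 4                # analysis hop: 1/4 frame
frame_rate_hz = 1000.0 / hop_ms      # analysis frames per second

# Assumed sampling rate (not given in the text)
fs = 8000
frame_len = round(frame_ms * 1e-3 * fs)   # frame length in samples
hop_len = round(hop_ms * 1e-3 * fs)       # hop length in samples

print(f"hop = {hop_ms} ms, frame rate = {frame_rate_hz:.1f} Hz")
print(f"at fs = {fs} Hz: frame = {frame_len} samples, hop = {hop_len} samples")
```

The frame rate works out to 1/(11.25 ms), or about 88.9 Hz, which the text rounds to 89 Hz.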

The results are shown in Figures 10.18 through 10.23.

Phase-Continued STFT TSM

Figure 10.18 shows the phase-continued-frames case, in which relative phase is not preserved across FFT bins. As a result, the amplitude envelope is not preserved in the time domain within each frame. Figure 10.19 shows the spectrum for the same case, revealing significant distortion products at multiples of the frame rate (about 89 Hz here). These arise from the intra-frame amplitude-envelope distortion, which then transitions ungracefully to the next frame. Note that modulation sidebands at multiples of the frame rate are common in nonlinearly processed STFTs.
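To make the idea of phase continuation concrete, here is a minimal sketch of a basic phase vocoder that advances each bin's phase independently from frame to frame. It is a generic textbook-style formulation, not necessarily the code used to produce the figures. The function names (`stft`, `istft`, `phase_continued_stretch`), the Hann window, the 8 kHz sampling rate, and the 395/405 Hz test frequencies are all assumptions; the frame and hop lengths correspond to 45 ms and 11.25 ms at that assumed rate.

```python
import numpy as np

def stft(x, n_fft, hop):
    """Hann-windowed STFT; one row per frame, frames start at multiples of hop."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    return np.array([np.fft.rfft(win * x[i * hop:i * hop + n_fft])
                     for i in range(n_frames)])

def istft(frames, n_fft, hop):
    """Windowed overlap-add resynthesis with a constant gain correction."""
    win = np.hanning(n_fft)
    y = np.zeros(n_fft + hop * (len(frames) - 1))
    for i, spec in enumerate(frames):
        y[i * hop:i * hop + n_fft] += win * np.fft.irfft(spec, n_fft)
    # Steady-state sum of overlapped squared windows (about 1.5 for hop = n_fft/4)
    return y / (np.sum(win ** 2) / hop)

def phase_continued_stretch(x, stretch=2.0, n_fft=360, hop=90):
    """Time-stretch by continuing each bin's phase independently."""
    X = stft(x, n_fft, hop)
    n_bins = X.shape[1]
    # Fractional analysis-frame positions read by successive output frames
    positions = np.arange(0, len(X) - 1, 1.0 / stretch)
    # Nominal phase advance per hop for each bin center frequency
    omega = 2 * np.pi * hop * np.arange(n_bins) / n_fft
    phase = np.angle(X[0])
    Y = np.empty((len(positions), n_bins), dtype=complex)
    for k, p in enumerate(positions):
        i = min(int(p), len(X) - 2)
        frac = p - i
        # Interpolate magnitude between neighboring analysis frames
        mag = (1 - frac) * np.abs(X[i]) + frac * np.abs(X[i + 1])
        Y[k] = mag * np.exp(1j * phase)
        # Measured phase advance, as a deviation from nominal, wrapped to [-pi, pi]
        dphi = np.angle(X[i + 1]) - np.angle(X[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        # Accumulate per bin; relative phase across bins is not enforced
        phase = phase + omega + dphi
    return istft(Y, n_fft, hop)

fs = 8000.0
t = np.arange(int(fs)) / fs
x = np.cos(2 * np.pi * 395 * t) + np.cos(2 * np.pi * 405 * t)
y = phase_continued_stretch(x)   # roughly twice as long as x
```

Because each bin accumulates its own phase, nothing in this sketch ties the phases of neighboring bins together, which is the property the paragraph above identifies as the source of the envelope distortion.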

Figure 10.18: Phase-continued vocoder waveforms at 2X expansion.
\includegraphics[width=\twidth]{eps/pv-ellis-wave}

Figure 10.19: Phase-continued vocoder spectra at 2X expansion.
\includegraphics[width=\twidth]{eps/pv-ellis-spec}


Relative-Phase-Preserving STFT TSM

Figure 10.20 shows the relative-phase-preserving (sometimes called ``phase-locked'') vocoder case, in which relative phase is preserved across FFT bins. As a result, the amplitude envelope is preserved very well within each frame, and the segues from one frame to the next look much better at the envelope level. However, the individual FFT bin frequencies are now phase-modulated from frame to frame. Both plots show the same number of beats per second, while the overall duration is doubled in the second plot, as desired. Figure 10.21 shows the corresponding spectrum: instead of distortion modulation on the scale of the frame rate, the spectral distortion looks more broadband, consistent with phase discontinuities across the entire spectrum from one frame to the next.

Figure 10.20: Phase-locked vocoder waveforms at 2X time expansion.
\includegraphics[width=\twidth]{eps/pv-salsman-wave}

Figure 10.21: Phase-locked vocoder spectra at 2X time expansion.
\includegraphics[width=\twidth]{eps/pv-salsman-spec}


SOLA-FS TSM

Finally, Figures 10.22 and 10.23 show the time- and frequency-domain plots for the SOLA-FS algorithm (a time-domain method). SOLA-type algorithms perform slow-down by repeating frames locally. (In this case, each frame could be repeated once to accomplish the $ 2\times $ slow-down.) They maximize cross-correlation at the ``loop-back'' points in order to minimize discontinuity distortion, but such distortion is always present, though typically attenuated by a cross-fade on the loop-back. We can see twice as many ``carrier cycles'' under each beat, meaning that the beats were slowed along with the signal, so the beat frequency (amplitude envelope) was not preserved; however, neither was it severely distorted in this case. SOLA algorithms tend to work well on speech, but can ``stutter'' when attack transients happen to be repeated. SOLA algorithms should therefore be adjusted to avoid repeating a transient frame; similarly, they should avoid discarding a transient frame when speeding up.
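The cross-correlation search and cross-fade described above can be sketched as follows. This is a simplified overlap-add illustration with fixed output spacing, not the SOLA-FS implementation used for the figures. The function name `sola_stretch`, its parameter names and values, the 8 kHz sampling rate, and the 395/405 Hz test frequencies are assumptions made for the example.

```python
import numpy as np

def sola_stretch(x, stretch=2.0, frame=360, overlap=120, search=80):
    """Simplified SOLA-style time stretch (hypothetical parameters)."""
    hop_out = frame - overlap            # new output samples per frame
    hop_in = hop_out / stretch           # nominal input advance per frame
    fade = np.linspace(0.0, 1.0, overlap, endpoint=False)
    y = np.array(x[:frame], dtype=float)
    k = 1
    while True:
        center = int(round(k * hop_in))
        lo = max(0, center - search)
        hi = min(len(x) - frame, center + search)
        if lo > hi:
            break                        # ran out of input
        tail = y[-overlap:].copy()
        # Pick the input offset whose start best matches the output tail
        best, best_score = lo, -np.inf
        for s in range(lo, hi + 1):
            cand = x[s:s + overlap]
            score = np.dot(tail, cand) / (
                np.linalg.norm(tail) * np.linalg.norm(cand) + 1e-12)
            if score > best_score:
                best, best_score = s, score
        seg = x[best:best + frame]
        # Cross-fade over the overlap region, then append the remainder
        y[-overlap:] = (1.0 - fade) * tail + fade * seg[:overlap]
        y = np.concatenate([y, seg[overlap:]])
        k += 1
    return y

fs = 8000.0
t = np.arange(int(fs)) / fs
x = np.cos(2 * np.pi * 395 * t) + np.cos(2 * np.pi * 405 * t)
y = sola_stretch(x)   # roughly twice as long as x
```

Since the input advances only half as fast as the output, most input material is visited twice, which corresponds to the local frame repetition described above. A practical version would also need the transient handling mentioned in the text, which this sketch omits.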

Figure 10.22: SOLA-FS waveforms at 2X expansion.
\includegraphics[width=\twidth]{eps/pv-solafs-wave}

Figure 10.23: SOLA-FS spectra at 2X expansion.
\includegraphics[width=\twidth]{eps/pv-solafs-spec}

