Audio speed changer without changing the pitch

Started by Dirichlet 7 years ago, 3 replies, latest reply 6 years ago, 292 views


   There is a lot of audio speed-changing software on the web. These programs seem to be able to change the audio speed without changing the pitch of a human voice (at least not discernibly when I play at twice the original speed).

   How does that work? I thought changing the playback speed would cause the spectrum to shrink or spread, thereby changing the pitch.

   Thanks for your help.

  Chuan Huang

Reply by Y(J)S, December 22, 2017

These methods don't change the sampling rate, and thus don't change the pitch. 
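
For contrast, here is a minimal sketch of the naive approach the question has in mind: keeping every other sample and playing the result at the same sampling rate, which is equivalent to playing the original at twice the rate. The test tone, the sampling rate, and the helper `dominant_freq` are assumptions chosen only for illustration.

```python
import numpy as np

def dominant_freq(signal, sr):
    """Return the frequency (Hz) of the strongest FFT bin."""
    spectrum = np.abs(np.fft.rfft(signal))
    return np.argmax(spectrum) * sr / len(signal)

sr = 16000                              # assumed sampling rate in Hz
t = np.arange(sr) / sr                  # one second of sample times
tone = np.sin(2 * np.pi * 220 * t)      # 220 Hz test tone

# Naive 2x speed-up: drop every other sample, play back at the same rate.
fast = tone[::2]

f_original = dominant_freq(tone, sr)
f_fast = dominant_freq(fast, sr)
print(f_original, f_fast)               # roughly 220 Hz versus 440 Hz
```

The duration halves, but the pitch goes up by an octave, which is exactly the spectral spreading described in the question.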

One family of methods exploits the stationarity of speech over short time periods. For example, if you want to reduce the duration by 25 percent (that is, play about 1.33 times faster), and in a given phoneme there are 8 repetitions of a wave pattern, then carefully removing two of them will do what you want. Of course, you need to mind the pitch period. In practice, the most common methods (e.g., SOLA) perform overlap and add rather than trying to "edit" waveforms. Once again, assuming that you want to reduce the duration by 25 percent, you cut the waveform at some point and then overlap the next section with the previous one by 25% (with proper weighting to keep the energy constant). Here too, you need to mind the pitch period to overlap correctly.
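
The overlap-and-add idea can be sketched roughly as follows. This is a simplified SOLA-style illustration, not a reference implementation: the function name `sola_stretch`, the frame, overlap and search sizes, and the linear crossfade used as the "weighting" are all assumptions. It also assumes the input is at least one frame long and that `frame >= 2 * overlap`. The cross-correlation search is what "minds the pitch period": each new segment is shifted so that it lines up with the tail of what has already been written.

```python
import numpy as np

def sola_stretch(x, speed, frame=1024, overlap=256, search=128):
    """Time-scale x by `speed` (>1 is faster) without resampling."""
    x = np.asarray(x, dtype=float)
    hop_out = frame - overlap                 # output advance per segment
    hop_in = int(round(hop_out * speed))      # input advance per segment
    fade_in = np.linspace(0.0, 1.0, overlap, endpoint=False)
    fade_out = 1.0 - fade_in                  # linear crossfade weights

    chunks = [x[:hop_out]]                    # first segment, minus its tail
    tail = x[hop_out:frame]                   # region to be overlapped next
    pos = hop_in
    while pos + search + frame <= len(x):
        # Find the offset whose start best matches the current tail.
        best_k, best_score = 0, -np.inf
        for k in range(search):
            cand = x[pos + k : pos + k + overlap]
            score = np.dot(tail, cand) / (np.linalg.norm(cand) + 1e-12)
            if score > best_score:
                best_k, best_score = k, score
        seg = x[pos + best_k : pos + best_k + frame]
        chunks.append(tail * fade_out + seg[:overlap] * fade_in)  # overlap-add
        chunks.append(seg[overlap:hop_out])   # untouched middle of the segment
        tail = seg[hop_out:]
        pos += hop_in
    chunks.append(tail)
    return np.concatenate(chunks)

def dominant_freq(signal, sr):
    """Return the frequency (Hz) of the strongest FFT bin."""
    spectrum = np.abs(np.fft.rfft(signal))
    return np.argmax(spectrum) * sr / len(signal)

sr = 16000
x = np.sin(2 * np.pi * 220 * np.arange(2 * sr) / sr)   # 2 s test tone
y = sola_stretch(x, speed=2.0)
print(len(x), len(y), dominant_freq(x, sr), dominant_freq(y, sr))
```

On this test tone the output is about half as long while the dominant frequency stays near 220 Hz. Real speech would need the search range to cover at least one pitch period of the lowest voice you expect.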

Another method relies on extracting pitch and formant information, and replaying the speech keeping the pitch constant. For example, you can encode the speech using a phase vocoder or some LPC method or sinusoidal modeling, and then regenerate the speech.
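
As a rough illustration of the analysis and resynthesis route, below is a bare-bones phase vocoder sketch. It is only one of the options mentioned above (no LPC or sinusoidal modeling), and the function name `phase_vocoder_stretch`, the FFT size, the hop size, and the Hann window are assumptions. The idea is to read the short-time spectra at a different rate while advancing each bin's phase by the amount measured in the original, so the frequencies stay where they were.

```python
import numpy as np

def phase_vocoder_stretch(x, speed, n_fft=1024, hop=256):
    """Time-scale x by `speed` (>1 is faster), keeping frequencies fixed."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    stft = np.array([np.fft.rfft(window * x[i * hop : i * hop + n_fft])
                     for i in range(n_frames)])

    # Phase advance per hop expected at each bin's center frequency.
    omega = 2 * np.pi * hop * np.arange(n_fft // 2 + 1) / n_fft
    steps = np.arange(0, n_frames - 1, speed)   # fractional frame positions
    phase = np.angle(stft[0])
    out = np.zeros(len(steps) * hop + n_fft)

    for j, t in enumerate(steps):
        i = int(t)
        frac = t - i
        # Interpolate magnitudes between neighboring analysis frames.
        mag = (1 - frac) * np.abs(stft[i]) + frac * np.abs(stft[i + 1])
        frame = np.fft.irfft(mag * np.exp(1j * phase), n_fft)
        out[j * hop : j * hop + n_fft] += window * frame
        # Measured phase advance, wrapped to [-pi, pi] around the expected one.
        dphi = np.angle(stft[i + 1]) - np.angle(stft[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase = phase + omega + dphi

    # Constant gain correction for the overlapping squared windows.
    return out * hop / np.sum(window ** 2)

def dominant_freq(signal, sr):
    """Return the frequency (Hz) of the strongest FFT bin."""
    spectrum = np.abs(np.fft.rfft(signal))
    return np.argmax(spectrum) * sr / len(signal)

sr = 16000
x = np.sin(2 * np.pi * 220 * np.arange(2 * sr) / sr)   # 2 s test tone
y = phase_vocoder_stretch(x, speed=2.0)
print(len(x), len(y), dominant_freq(y, sr))
```

A plain phase vocoder like this tends to sound "phasey" on speech and smears transients, which is one reason the time-domain overlap-add methods are popular for voice.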


Reply by DaniloDara, December 22, 2017

SOLA is a method that works well. It is based on identifying so-called epochs, which are more or less the lowest-frequency repeating time periods that carry high energy.
SOLA works by inserting epochs (which makes the tempo slower) or removing epochs (which makes the tempo faster).
On top of this, a PID-like controller is needed to decide when to add or remove an epoch, based on what is going on in the channel(s).
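
The insertion and removal of epochs can be illustrated with a toy sketch. It assumes a perfectly periodic signal whose period is already known and is a whole number of samples, so no epoch detection, crossfading, or controller is involved; the function name `edit_epochs` and the `counts` pattern are hypothetical and only serve the illustration.

```python
import numpy as np

def edit_epochs(x, period, counts):
    """Emit each period of x as many times as the cyclic pattern `counts` says.

    0 removes a period, 1 keeps it, 2 duplicates it.
    """
    n = len(x) // period
    periods = np.asarray(x[: n * period]).reshape(n, period)
    reps = np.resize(counts, n)               # repeat the pattern cyclically
    return np.repeat(periods, reps, axis=0).ravel()

def dominant_freq(signal, sr):
    """Return the frequency (Hz) of the strongest FFT bin."""
    spectrum = np.abs(np.fft.rfft(signal))
    return np.argmax(spectrum) * sr / len(signal)

sr = 16000
period = 80                                   # 200 Hz at 16 kHz
tone = np.sin(2 * np.pi * np.arange(sr) / period)   # 1 s, 200 periods

faster = edit_epochs(tone, period, [1, 1, 1, 0])    # drop every 4th period
slower = edit_epochs(tone, period, [1, 1, 1, 2])    # repeat every 4th period

print(len(faster) / sr, len(slower) / sr)     # durations in seconds
print(dominant_freq(faster, sr), dominant_freq(slower, sr))
```

The durations change while the dominant frequency stays at about 200 Hz. With real speech the period drifts from one epoch to the next, which is why practical implementations locate the epochs first and overlap-add at the joins instead of cutting hard.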

Of course, all channels must be edited at the same time, which may not be easy, so the PID controller largely determines the quality of the result.

SOLA usually runs in the frequency domain, but I have also made some good implementations of SOLA in the time domain.

Reply by laurentlefaucheur, December 22, 2017

Hi, you should look at the description of PSOLA (pitch-synchronous overlap-and-add): https://en.wikipedia.org/wiki/PSOLA