
Fast (real-time) time stretch code

Started by R May 29, 2008
I am looking for code for slowing down music without altering pitch.

If anyone is interested, here are some programs that are designed for
doing this for music transcription. A couple examples here with free
trials:

Amazing Slow Downer:  http://www.ronimusic.com/ 
Transcribe!: http://www.seventhstring.com/
Slow Gold: http://www.worldwidewoodshed.com/products.htm

I'm looking to do something similar (for a non-commercial app) but
with different features.

I think that some software (Transcribe! above) does this via Fourier
transform. Is that the best way to achieve this without reentrant
glitches?
first of all, usually for a "real-time" process we mean that the same
amount of time going in is what comes out. time-scaled or resampled
(and pitch-shifted) sounds have a different number of samples going in
than coming out (with the same time assumed between samples).  if such
were running real-time, that means it could be started and running
real-time for an indefinitely long period of time.  in that case, you
would hit the end of a buffer on one side or the other.
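A rough way to see the buffer problem described above: with a live source delivering fs samples per second and a slow-down that only consumes speed × fs of them per second, the difference accumulates without bound. The helper below is a hypothetical illustration of that arithmetic, not code from the thread; the name and the speed convention (0.5 = half speed) are assumptions for this sketch.

```c
#include <assert.h>

/* Hypothetical helper: unconsumed input samples that pile up after `seconds`
 * of slowing down a live input by `speed` (0 < speed <= 1, 0.5 = half speed)
 * that arrives at `fs` samples per second. The output plays fs samples/s but
 * only eats speed * fs input samples/s, so the difference accumulates. */
double backlog_samples(double fs, double speed, double seconds)
{
    assert(fs > 0.0 && speed > 0.0 && speed <= 1.0 && seconds >= 0.0);
    double arrived  = fs * seconds;          /* delivered by the live source    */
    double consumed = speed * fs * seconds;  /* used up by the time-stretcher   */
    return arrived - consumed;               /* grows linearly, without bound   */
}
```

For example, backlog_samples(44100.0, 0.5, 10.0) is 220500.0: ten seconds of half-speed processing of a live 44.1 kHz input leaves 220,500 samples waiting, and the figure keeps rising. Reading from disk, as R does later in the thread, avoids this because the input side is simply fetched more slowly.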

On May 29, 5:01 am, R <R...@nospam.com> wrote:
> I am looking for code for slowing down music without altering pitch.
as you're slowing it down, are your samples coming from disk (or whatever source) repeating over some regions?
> If anyone is interested, here are some programs that are designed for
> doing this for music transcription. A couple examples here with free
> trials:
>
> Amazing Slow Downer: http://www.ronimusic.com/
> Transcribe!: http://www.seventhstring.com/
> Slow Gold: http://www.worldwidewoodshed.com/products.htm
>
> I'm looking to do something similar (for a non-commercial app) but
> with different features.
>
> I think that some software (Transcribe! above) does this via Fourier
> transform. Is that the best way to achieve this without reentrant
> glitches?
it's about the only way to do it for broadbanded, full-mix audio. if the source was a monophonic instrument that plays single notes at a time, then time-scaling can be accomplished by staying in the time domain and making judicious use of splicing. but if your audio has all sorts of harmonically unrelated frequency components, doing this in the time domain might result in a splice where not all harmonic components are spliced in phase. when some splice has some frequency component that is 180 degrees out of phase when spliced, that's gonna sound pretty bad.

r b-j
On Thu, 29 May 2008 14:35:49 -0700 (PDT), robert bristow-johnson
<rbj@audioimagination.com> wrote:

>first of all, usually for a "real-time" process we mean that the same
>amount of time going in is what comes out.
I meant response time. I could probably pre-crunch an entire MP3 file, but I was hoping to process designated segments on the fly.
>On May 29, 5:01 am, R <R...@nospam.com> wrote:
>> I am looking for code for slowing down music without altering pitch.
>
>as you're slowing it down, are your samples coming from disk (or
>whatever source) repeating over some regions?
Yes, samples coming from disk. Yes, an option to keep replaying the segment, if that's what you meant. Usually it would entail slow down rather than speed up.
>> If anyone is interested, here are some programs that are designed for
>> doing this for music transcription. A couple examples here with free
>> trials:
>>
>> Amazing Slow Downer: http://www.ronimusic.com/
>> Transcribe!: http://www.seventhstring.com/
>> Slow Gold: http://www.worldwidewoodshed.com/products.htm
>>
>> I'm looking to do something similar (for a non-commercial app) but
>> with different features.
>>
>> I think that some software (Transcribe! above) does this via Fourier
>> transform. Is that the best way to achieve this without reentrant
>> glitches?
>
>it's about the only way to do it for broadbanded, full-mix audio. if
>the source was a monophonic instrument that plays single notes at a
>time, then time-scaling can be accomplished by staying in the time
>domain and making judicious use of splicing.
Yeah, I understand the splice problem. I've seen some older hardware "Harmonizers" that try to make intelligent decisions on splice points, but that's tough with more than one or two simultaneous pitches. And as you point out, overtones won't necessarily align even with monophonic sources. But even the old hardware was not that bad sometimes. Guitarists and keyboard players used those for playing root+5th or even full chords at times. You've probably heard the result on recordings. Still, if it can be done via FFT without having to crunch overnight, that would be preferable. The sound doesn't have to be hifi, but best that it's free of distracting pulsing or harsh artifacts. So...any code available for doing this? Pref in C/C++ so I could get it running on a Windows machine.
On May 29, 11:48 pm, R <R...@nospam.com> wrote:
> On Thu, 29 May 2008 14:35:49 -0700 (PDT), robert bristow-johnson
> <r...@audioimagination.com> wrote:
>
> >first of all, usually for a "real-time" process we mean that the same
> >amount of time going in is what comes out.
>
> I meant response time.
i think you meant the computation is efficient enough that you can play back the audio file either faster or slower from the disk, and the algorithm output doesn't fall behind by more than a known and bounded delay.
> >as you're slowing it down, are your samples coming from disk (or
> >whatever source) repeating over some regions?
>
> Yes, samples coming from disk. Yes, an option to keep replaying the
> segment, if that's what you meant.
yeah, if you're slowing it down, you would have to repeat segments in some manner. and if you were speeding it up, you would be omitting some segments. (discounting any cross-fading in the splices.)
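One common way to organize the repeating and omitting is an overlap-add grain scheduler: output grains are written at a fixed synthesis hop, while the input read position advances by a smaller or larger analysis hop. The functions below are a hypothetical sketch of that bookkeeping only (no windowing or splicing), with stretch > 1 meaning slower playback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grain scheduler for an overlap-add time stretcher.
 * Output grains are written every `synth_hop` samples; to stretch by
 * `stretch` (> 1 slows down, < 1 speeds up), grain k is read from the
 * input starting at k * synth_hop / stretch. */
double grain_read_position(size_t k, double synth_hop, double stretch)
{
    assert(synth_hop > 0.0 && stretch > 0.0);
    return (double)k * synth_hop / stretch;
}

/* Average number of grains that contain a given input sample. Compared with
 * the output overlap grain_len / synth_hop: larger means input material is
 * reused more often than it is played (slow-down), smaller means less often
 * (speed-up); below 1.0 some input samples are never read at all. */
double input_coverage(double grain_len, double synth_hop, double stretch)
{
    assert(grain_len > 0.0 && synth_hop > 0.0 && stretch > 0.0);
    double analysis_hop = synth_hop / stretch;
    return grain_len / analysis_hop;
}
```

With 1024-sample grains and a 512-sample synthesis hop, the output overlap is 2. Slowing to half speed gives an input coverage of 4, so each input sample lands in twice as many grains as the output needs (repeated material); at stretch = 0.25 the coverage drops to 0.5, meaning half of the input is never read.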
> >> I think that some software (Transcribe! above) does this via Fourier
> >> transform. Is that the best way to achieve this without reentrant
> >> glitches?
>
> >it's about the only way to do it for broadbanded, full-mix audio. if
> >the source was a monophonic instrument that plays single notes at a
> >time, then time-scaling can be accomplished by staying in the time
> >domain and making judicious use of splicing.
>
> Yeah, I understand the splice problem. I've seen some older hardware
> "Harmonizers" that try to make intelligent decisions on splice points,
> but that's tough with more than one or two simultaneous pitches.
it depends on the relationship between pitches. playing a heavy power chord (fifth and major third) should not sound so bad.
> And as you point out, overtones won't necessarily align even with
> monophonic sources.
did i say that?? (i have to check.) for *harmonic* monophonic sources, you should nearly always be able to find a splice length that makes all of the harmonic overtones happy. if they slightly detune at the weaker very high harmonics, those splices won't be particularly noticeable.
> But even the old hardware was not that bad
> sometimes. Guitarists and keyboard players used those for playing
> root+5th or even full chords at times. You've probably heard the
> result on recordings.
yeah. actually i was in on the pitch-shifting algs on one of the Eventide Harmonizer models. and my point above is even more true (that some polyphonic input to a time-domain pitch shifter can sound very good, depending on what the notes are) for chords that are just fifths and no third. those pitch-shift fine. easy. (think of the tonic and its fifth as being the 2nd and 3rd harmonics of a common fundamental that doesn't necessarily have any energy at the fundamental. then it's a periodic function. sorta.)
> Still, if it can be done via FFT without having to crunch overnight,
> that would be preferable. The sound doesn't have to be hifi, but best
> that it's free of distracting pulsing or harsh artifacts.
there are a bunch of products. SoundToys (or Wave Mechanics) SPEED, Serato Pitch 'n Time.
> So...any code available for doing this? Pref in C/C++ so I could get
> it running on a Windows machine.
it's not too hard to write a simple phase vocoder. i'm not gonna send you any code nor tell you any tricks that make it sound better than what you might get from a published alg (like Laroche or Puckette). that's not too hard. do you have your file and sound I/O worked out?

r b-j
On Thu, 29 May 2008 22:02:29 -0700 (PDT), robert bristow-johnson
<rbj@audioimagination.com> wrote:

>On May 29, 11:48 pm, R <R...@nospam.com> wrote:
>> On Thu, 29 May 2008 14:35:49 -0700 (PDT), robert bristow-johnson
>> <r...@audioimagination.com> wrote:
>>
>> >first of all, usually for a "real-time" process we mean that the same
>> >amount of time going in is what comes out.
>>
>> I meant response time.
>
>i think you meant the computation is efficient enough that if you can
>play back the audio file either faster or slower from the disk, and
>the algorithm output doesn't fall behind more than a known and bounded
>delay.
Ahem...
>> >as you're slowing it down, are your samples coming from disk (or
>> >whatever source) repeating over some regions?
>>
>> Yes, samples coming from disk. Yes, an option to keep replaying the
>> segment, if that's what you meant.
>
>yeah, if you're slowing it down, you would have to repeat segments in
>some manner. and if you were speeding it up, you would be omitting
>some segments. (discounting any cross-fading in the splices.)
OK--that's obvious. I thought maybe you were suggesting that caching a preprocessed version of the audio file would be more efficient if it were to be played multiple times. Which is a good thought, because it probably would be looped.
>> And as you point out, overtones won't necessarily align even with
>> monophonic sources.
>
>did i say that?? (i have to check.)
Oh, maybe you didn't. I had tried to find info via Google archives of this group. Maybe that was from one of those posts.
>> But even the old hardware was not that bad
>> sometimes. Guitarists and keyboard players used those for playing
>> root+5th or even full chords at times. You've probably heard the
>> result on recordings.
>
>yeah. actually i was in on the pitch-shifting algs on one of the
>Eventide Harmonizer models.
No kidding. That's what I had in mind when referring to splice algs, etc. You probably worked on the newer versions, so you would have had access to some serious DSP power. The older ones used bit-slice processors. Some had options for a primitive secondary channel that assisted in finding splice points.
>> Still, if it can be done via FFT without having to crunch overnight,
>> that would be preferable. The sound doesn't have to be hifi, but best
>> that it's free of distracting pulsing or harsh artifacts.
>
>there are a bunch of products. SoundToys (or Wave Mechanics) SPEED,
>Serato Pitch 'n Time.
I wasn't looking for a pre-written program, but I've been wondering whether I'd be further ahead learning how to host a VST plugin.
>> So...any code available for doing this? Pref in C/C++ so I could get
>> it running on a Windows machine.
>
>it's not too hard to write a simple phase vocoder. i'm not gonna send
>you any code nor tell you any tricks that make it sound better than
>what you might get from a published alg (like Laroche or Puckette).
>that's not too hard. do you have your file and sound I/O worked out?
No problem writing file or sound IO. Or the UI for that matter. I've done enough of that. I was just looking for a start on the time stretch code. The idea is mostly for a quick rehearsal/transcription tool, so I didn't want to get too deep into piles of DSP books. I'll look for the Laroche and Puckette algorithms (thanks for the lead). Maybe if I'm lucky, someone has posted some working C or C++ code.
On May 30, 2:42 am, R <R...@nospam.com> wrote:
> On Thu, 29 May 2008 22:02:29 -0700 (PDT), robert bristow-johnson
> <r...@audioimagination.com> wrote:
> >On May 29, 11:48 pm, R <R...@nospam.com> wrote:
> >> On Thu, 29 May 2008 14:35:49 -0700 (PDT), robert bristow-johnson
> >> <r...@audioimagination.com> wrote:
> >> >first of all, usually for a "real-time" process we mean that the same
> >> >amount of time going in is what comes out.
>
> >> I meant response time.
>
> >i think you meant the computation is efficient enough that if you can
> >play back the audio file either faster or slower from the disk, and
> >the algorithm output doesn't fall behind more than a known and bounded
> >delay.
>
> Ahem...
not sure what you mean here.
> >> >as you're slowing it down, are your samples coming from disk (or
> >> >whatever source) repeating over some regions?
>
> >> Yes, samples coming from disk. Yes, an option to keep replaying the
> >> segment, if that's what you meant.
>
> >yeah, if you're slowing it down, you would have to repeat segments in
> >some manner. and if you were speeding it up, you would be omitting
> >some segments. (discounting any cross-fading in the splices.)
>
> OK--that's obvious. I thought maybe you were suggesting that caching a
> preprocessed version of the audio file would be more efficient if it
> were to be played multiple times.
no, i meant that whatever the process is, at least in audio DSP, it can handle processing the input to the output without falling farther and farther behind. that's all i mean when i think of "real-time". for other disciplines there are additional requirements, but not for audio DSP. oddly, even though i have zero contribution to the comp.dsp FAQ, by some luck i got to contribute to the comp.realtime FAQ, to the point of contributing to the definition:

http://www.faqs.org/faqs/realtime-computing/faq/

"In a real-time DSP process, the analyzed (input) and/or generated (output) samples (whether they are grouped together in large segments or processed individually) can be processed (or generated) continuously in the time it takes to input and/or output the same set of samples independent of the processing delay.

"Consider an audio DSP example: if a process requires 2.01 seconds to analyze or process 2.00 seconds of sound, it is not real-time. If it takes 1.99 seconds, it is (or can be made into) a real-time DSP process.

"A common life example I like to make is standing in a line (or queue) waiting for the checkout in a grocery store. If the line asymptotically grows longer and longer without bound, the checkout process is not real-time. If the length of the line is bounded, customers are being 'processed' and outputted as rapidly, on average, as they are being inputted and that process *is* real-time. The grocer might go out of business or must at least lose business if he/she cannot make his/her checkout process real-time (so it's fundamentally important that this process be real-time)."
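The grocery-line definition can be simulated directly. Assuming one worker that processes fixed-size blocks in arrival order, the sketch below reports how long after a block has fully arrived its output is finished; the function name and the single-worker model are illustrative assumptions.

```c
#include <assert.h>

/* Lag (seconds) between the moment the last block has fully arrived and the
 * moment its processing finishes. One worker handles blocks in order; a
 * block can start only once it has arrived and the worker is free. */
double output_lag(double block_sec, double proc_sec, int blocks)
{
    assert(block_sec > 0.0 && proc_sec >= 0.0 && blocks >= 1);
    double free_at = 0.0, lag = 0.0;
    for (int b = 1; b <= blocks; ++b) {
        double arrived = b * block_sec;                        /* block b buffered */
        double start = arrived > free_at ? arrived : free_at;  /* wait for worker  */
        free_at = start + proc_sec;
        lag = free_at - arrived;
    }
    return lag;
}
```

With 2.00-second blocks, a 1.99-second processing time keeps the lag fixed at 1.99 s however long it runs, while 2.01 s adds 0.01 s per block, so after 1000 blocks the output is more than 10 s behind and still falling further back.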
> >> But even the old hardware was not that bad
> >> sometimes. Guitarists and keyboard players used those for playing
> >> root+5th or even full chords at times. You've probably heard the
> >> result on recordings.
>
> >yeah. actually i was in on the pitch-shifting algs on one of the
> >Eventide Harmonizer models.
>
> No kidding. That's what I had in mind when referring to splice algs,
> etc. You probably worked on the newer versions, so you would have had
> access to some serious DSP power. The older ones used bit-slice
> processors. Some had options for a primitive secondary channel that
> assisted in finding splice points.
i worked on the DSP4000, which had some later spins. they told me that some of my algs survived into the later models. i did not work on the classic H3000 nor the SP2016 or similar. i thought that the bit-slice (AMD2900 series) was only in the SP2016. the H3000 used that old crappy 16-bit TI DSP (3 of 'em).
> >> Still, if it can be done via FFT without having to crunch overnight,
> >> that would be preferable. The sound doesn't have to be hifi, but best
> >> that it's free of distracting pulsing or harsh artifacts.
>
> >there are a bunch of products. SoundToys (or Wave Mechanics) SPEED,
> >Serato Pitch 'n Time.
>
> I wasn't looking for a pre-written program, but I've been wondering
> whether I'd be further ahead learning how to host a VST plugin.
maybe. i've never done VST, but i think that i shoulda learnt how to.
> >> So...any code available for doing this? Pref in C/C++ so I could get
> >> it running on a Windows machine.
>
> >it's not too hard to write a simple phase vocoder. i'm not gonna send
> >you any code nor tell you any tricks that make it sound better than
> >what you might get from a published alg (like Laroche or Puckette).
> >that's not too hard. do you have your file and sound I/O worked out?
>
> No problem writing file or sound IO. Or the UI for that matter. I've
> done enough of that. I was just looking for a start on the time
> stretch code. The idea is mostly for a quick rehearsal/transcription
> tool, so I didn't want to get too deep into piles of DSP books. I'll
> look for the Laroche and Puckette algorithms (thanks for the lead).
> Maybe if I'm lucky, someone has posted some working C or C++ code.
check out the music-dsp archive. maybe there.

R, send me a decent email address. i'll see if i can find some old program that might run on Matlab or Octave for you. it's from a paper i did in 2001 so it's just proof of concept, slow, and not ready for prime-time in any product. if you can translate that to C (maybe get a decent FFT routine in C, perhaps FFTW), you'll have something to start with.

r b-j