DSPRelated.com
Forums

Time Scaling using Phase Vocoder

Started by Rob Vermeulen May 24, 2004
Hello,

I am pretty new to the topic of DSP.  However, I do have some practical
experience with FFT and windowing.
Now what I would like is an explanation of how to implement a phase
vocoder to create a (real-time) time compress/expand algorithm.
Of course I've been wandering around Google for quite some time and I did see
some ready-made implementations for the Windows platform. But this does
not do the trick.
Every time I use somebody else's lib I keep getting serious problems.
That is to be expected when you don't understand a byte of code in that lib.
Using commercial ActiveX components that do what I want was not satisfactory
(licensing mumbo jumbo blah blah), therefore I want to write my own stuff.

Is there anyone who can explain step by step how to build such an algorithm,
without throwing lots of formulae? Or is there a website that can make it
clear to me?
Please note that I do have some (second-hand) experience with FFT.

Thanks in advance!

With kind regards,

Rob Vermeulen
Arbor AudioCommunications B.V.
The Netherlands



Did you try my online tutorials at

  http://www.dspdimension.com/html/timepitch.html
  http://www.dspdimension.com/html/pshiftstft.html

yet? That should get you started. The main page is http://www.dspdimension.com

Regards,
--smb
Hello Stephan,

Yes, your page was one of the first I found.

I translated your "DFT à Pied" article into Dutch in order to explain to my
fellow students how the FFT works in theory. Good article!

The information about pvoc got me as far as I am now. I know there's
something like Phase Vocoding, I have a vague idea of how it works but I
still haven't seen the light ;)

I understand the FFT and windowing part. I also know that no modification of
the magnitudes is necessary.  But it stops at the point where I want to
modify the phase part of the analysed fragment.
I read a thesis by Florian Hammer on this subject but I do not have the
mathematical background to completely understand the formulae. Also, my
English is not good enough to understand phrases like "heterodyned
phase increment".

Anyway, I am very focussed on understanding the theory behind Phase Vocoders
but it is the practical code implementation of the pvoc that has a bigger
priority. So if someone could explain to me step by step how to implement a
time shifter using a Phase Vocoder, maybe it'll bring me closer to
understanding the theory.

Don't get me wrong, I do not want somebody to give me complete source code
of it. I definitely want to _know_ how it works.

With kind regards,

Rob Vermeulen

"Stephan M. Bernsee" <stephan.bernsee@web.de> wrote in message
news:38ab652c.0405242309.341a63b7@posting.google.com...
> Did you try my online tutorials at
>
>   http://www.dspdimension.com/html/timepitch.html
>   http://www.dspdimension.com/html/pshiftstft.html
>
> yet? That should get you started. The main page is
> http://www.dspdimension.com
>
> Regards,
> --smb
I must have overlooked some of your explanations when I first read the web
pages you pointed out, but now I've read your explanation about pitch
shifting. This does explain a lot to me, especially the part about obtaining
the true frequencies for each bin.
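
For readers following along, the "true frequency" step mentioned above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the tutorial's own code; the function name and parameters are made up for this example, and the phases are assumed to come from two successive FFT frames taken `hop` samples apart.

```python
import math

def true_frequencies(prev_phase, curr_phase, fft_size, hop, sample_rate):
    """Estimate the true frequency (Hz) in each bin from two successive frames."""
    freqs = []
    for k, (p0, p1) in enumerate(zip(prev_phase, curr_phase)):
        # Phase advance expected if the partial sat exactly on the bin centre.
        expected = 2.0 * math.pi * k * hop / fft_size
        # Deviation from that expectation, wrapped into [-pi, pi).
        dev = (p1 - p0) - expected
        dev = (dev + math.pi) % (2.0 * math.pi) - math.pi
        # Turn the deviation into an offset (Hz) from the bin centre frequency.
        bin_freq = k * sample_rate / fft_size
        freqs.append(bin_freq + dev * sample_rate / (2.0 * math.pi * hop))
    return freqs
```

The estimate is only unambiguous while the partial lies within sample_rate / (2 * hop) Hz of the bin centre, which is one reason a fairly large frame overlap is normally used.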

Here comes my real question... what next? :-)

I have got my magnitudes and true frequencies and now I want to time stretch
it. That shouldn't be too difficult anymore... Can you give me a hint on how
to achieve this?
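
One common way to go from magnitudes and true frequencies to a time stretch (not confirmed anywhere in this thread, so treat it as a sketch) is to overlap-add the output frames at a different hop size than the one used for analysis and to rebuild each bin's phase so that it advances consistently over that new hop. The stretch factor is then synthesis hop / analysis hop. The names below are hypothetical, and starting every bin at phase 0 is an assumption made for brevity.

```python
import cmath
import math

TWO_PI = 2.0 * math.pi

def synthesis_phases(true_freqs_per_frame, synthesis_hop, sample_rate):
    """Return one list of output phases per frame, given per-bin true frequencies (Hz)."""
    num_bins = len(true_freqs_per_frame[0])
    phase = [0.0] * num_bins  # assumption: every bin starts at phase 0
    frames = []
    for i, freqs in enumerate(true_freqs_per_frame):
        if i > 0:
            for k, f in enumerate(freqs):
                # Advance by the phase this frequency covers in one synthesis hop.
                step = TWO_PI * f * synthesis_hop / sample_rate
                phase[k] = (phase[k] + step) % TWO_PI
        frames.append(list(phase))
    return frames

def to_spectrum(magnitudes, phases):
    """Recombine the unchanged magnitudes with the new phases into complex bins."""
    return [cmath.rect(m, p) for m, p in zip(magnitudes, phases)]
```

Each spectrum built this way would then go through the inverse FFT, be windowed, and be overlap-added at `synthesis_hop`. With an analysis hop of 256 and a synthesis hop of 512, for example, the output would be about twice as long as the input.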

Best regards,

Rob
Hi Rob,

I'm a bit short of time right now, so I can't really explain it to you
in full, but I'm pretty sure the subject line will catch Richard
Dobson's attention. He's a long time phase vocoder guru (as are some
others here, like r b-j...;-) ), maybe he can get into more detail.

The code on dspdimension works quite well btw. Airy André has recently
made a time/pitch manipulation plug in out of it that uses the pitch
shifting procedure in combination with a sample rate conversion to
produce time stretching, so you might want to try this.
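
The arithmetic behind combining a pitch shift with a sample rate conversion is easy to get backwards, so here is a small worked sketch. It assumes `stretch` means output duration divided by input duration; the function name and the returned keys are illustrative only and are not taken from the plug-in mentioned above.

```python
import math

def stretch_via_pitch_shift(stretch):
    """Plan a constant-pitch time stretch as a pitch shift followed by resampling."""
    if stretch <= 0:
        raise ValueError("stretch must be positive")
    # Step 1: pitch-shift by `stretch`. Duration is unchanged, pitch is scaled.
    pitch_factor = stretch
    # Step 2: resample to `stretch` times as many samples and play them at the
    # original rate. Duration scales by `stretch`, pitch by 1 / stretch.
    length_ratio = stretch
    return {
        "pitch_factor": pitch_factor,
        "semitones": 12.0 * math.log2(pitch_factor),
        "length_ratio": length_ratio,
        "net_pitch": pitch_factor / length_ratio,  # back to 1.0
        "net_duration": length_ratio,
    }
```

For a stretch of 2.0 this means shifting up by one octave (12 semitones) and then resampling to twice the number of samples, which leaves the pitch where it started and doubles the duration.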

Regards,
--smb
Hello smb,

> but I'm pretty sure the subject line will catch Richard
> Dobson's attention. He's a long time phase vocoder guru (as are some
> others here, like r b-j...;-) ), maybe he can get into more detail.

Nope, haven't seen any responses yet.

> The code on dspdimension works quite well btw. Airy André has recently
> made a time/pitch manipulation plug in out of it that uses the pitch
> shifting procedure in combination with a sample rate conversion to
> produce time stretching, so you might want to try this.

Thanks for the hint! This actually works quite well. There are only two
drawbacks:

- The (i)FFT algorithm in the pitch shifter isn't lightning fast :-)
- My sample rate converter produces horrible quality.

To convert the sample rate, I linearly interpolate one buffer into another,
with the factor oldSR/newSR being equal to OldBufferSize/NewBufferSize. This
seems to work well for a 50% and a 200% conversion, but all other percentages
sound like an old cassette player in a huge bathroom ;-) (a metallic, echoey
sound).

Can you (or anyone else) give a hint on how to improve the quality?

Best regards,

Rob
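
One plausible contributor to the metallic sound described above (a guess, since the thread does not confirm it) is that interpolating each buffer in isolation restarts the read position at every buffer boundary, which produces small discontinuities at the buffer rate. The sketch below is a linear-interpolation resampler that carries its fractional position and the last input sample over from one buffer to the next. The class name and the `step` parameter (input samples consumed per output sample, i.e. oldSR/newSR) are hypothetical.

```python
import math

class LinearResampler:
    """Streaming linear-interpolation resampler that keeps state between buffers."""

    def __init__(self, step):
        if step <= 0:
            raise ValueError("step must be positive")
        self.step = step   # input samples advanced per output sample
        self.pos = 0.0     # read position relative to the start of the next buffer
        self.prev = 0.0    # last sample of the previous buffer (index -1)

    def process(self, buf):
        out = []
        n = len(buf)
        # Interpolate between index i and i + 1; i == -1 means the saved sample.
        while self.pos < n - 1:
            i = math.floor(self.pos)
            frac = self.pos - i
            a = self.prev if i < 0 else buf[i]
            b = buf[i + 1]
            out.append(a + (b - a) * frac)
            self.pos += self.step
        if n:
            self.prev = buf[-1]
            self.pos -= n  # re-express the position relative to the next buffer
        return out
```

The number of output samples per call varies slightly from buffer to buffer, so the caller has to cope with variable-length output. Linear interpolation is also a weak anti-aliasing and anti-imaging filter, so a band-limited interpolator (for example a windowed-sinc or polyphase design) would normally be expected to sound cleaner at arbitrary ratios.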