Reply by Richard Dobson, December 14, 2006
robert bristow-johnson wrote:
> Patrick wrote:
>
>> 1)I have to window the data
>> 2) FFT
>> 3)I have to find the true frequencies and magnitudes of each bin
>
> no, you have to determine the accurate frequencies of each sinusoidal
> component. a single sinusoidal component, when windowed, will occupy
> several FFT bins.
That would be true of the more elaborate peak-tracking vocoders
(McAulay/Quatieri, SNDAN, CLAM etc), but not of the "naive" pitch
shifters used by Stephan's and my code. Those are deliberately meant to
be simple demos of the process, and actually work pretty well, not least
for the arbitrarily complex sounds we like to subject to pvoc
transformations. They are by no means confined to single pitched tones
where identifying peaks is a relative no-brainer.

In a nutshell, classic pvoc does, for each overlapped block:

  in --> window --> FFT --> mag/phase --> mag/freq
  (retaining running phase each frame, so we can update the derived
  frequency contents of each bin, each frame)
  -- process the frame ad lib, e.g. pitch shift --
  mag/freq --> mag/phase --> IFFT --> window

The pitch shift is applied in the processing stage, simply by scaling
the frequency values of each bin and moving them to the required new bin
position as necessary. Yes, it is far from ideal (there are many papers
around discussing refinements of the method, to deal with the inevitable
phasing errors, which do indeed identify peaks), but surprisingly, this
"naive" method does work, though progressively worse of course for
large-interval shifts.

Most of the sophisticated peak-tracking vocoders rely on offline
processing, so they can scan the whole data multiple times if necessary;
whereas these naive shifters are used for real-time streaming effects
where we certainly cannot look ahead, and want to have to remember as
little of past data as possible.

I should also add that composers/programmers are notorious for "hacking"
pvoc frames in ways that would no doubt horrify dsp engineers, but which
almost always produce musically interesting results.

Richard Dobson
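The pipeline above maps almost line-for-line onto code. Below is a minimal NumPy sketch of such a naive shifter, written for this thread as an illustration; it is not the CARL, Wishart or dspdimension source, and the function name, FFT size and hop are arbitrary choices:

```python
import numpy as np

def naive_pitch_shift(x, shift, n_fft=1024, hop=256, sr=44100):
    """Naive pvoc pitch shifter: per frame, convert mag/phase to
    mag/freq, move each bin (scaling its frequency), resynthesise."""
    win = np.hanning(n_fft)
    bins = np.arange(n_fft // 2 + 1)
    expect = 2 * np.pi * hop * bins / n_fft   # phase advance of exact bin freqs
    last_phase = np.zeros(n_fft // 2 + 1)     # analysis running phase
    sum_phase = np.zeros(n_fft // 2 + 1)      # synthesis running phase
    out = np.zeros(len(x) + n_fft)
    for pos in range(0, len(x) - n_fft, hop):
        spec = np.fft.rfft(win * x[pos:pos + n_fft])
        mag, phase = np.abs(spec), np.angle(spec)
        # mag/phase --> mag/freq: true frequency from the phase difference
        dp = phase - last_phase - expect
        last_phase = phase
        dp -= 2 * np.pi * np.round(dp / (2 * np.pi))      # wrap to +/- pi
        true_freq = (bins + dp * n_fft / (2 * np.pi * hop)) * sr / n_fft
        # the pitch shift: move bins, scaling their frequencies
        new_mag = np.zeros_like(mag)
        new_freq = np.zeros_like(true_freq)
        dest = np.round(bins * shift).astype(int)
        ok = dest <= n_fft // 2                # drop bins past Nyquist
        new_mag[dest[ok]] += mag[ok]
        new_freq[dest[ok]] = true_freq[ok] * shift
        # mag/freq --> mag/phase: rebuild a running phase per bin
        dev = new_freq * n_fft / sr - bins     # deviation from bin centre, in bins
        sum_phase += expect + 2 * np.pi * hop * dev / n_fft
        out[pos:pos + n_fft] += win * np.fft.irfft(new_mag * np.exp(1j * sum_phase))
    return out[:len(x)] / 1.5                  # rough Hann overlap-add gain fix
```

Shifting a 440 Hz sine by 1.5 with this sketch puts the dominant output energy near 660 Hz, phasiness and all, which is all the naive method promises.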
Reply by robert bristow-johnson, December 14, 2006
Patrick wrote:

> 1)I have to window the data
> 2) FFT
> 3)I have to find the true frequencies and magnitudes of each bin
no, you have to determine the accurate frequencies of each sinusoidal component. a single sinusoidal component, when windowed, will occupy several FFT bins.
> 4)do some processing
whatever that means.
> 5)find the phases back from the true frequencies
i think what you want to do is adjust the phases of each sinusoidal component in the current frame so that when overlap-added to the corresponding sinusoidal component of the previous frame they are phase aligned and no spurious null occurs when overlap-adding.
> 6)IFFT
> 7)overlap-add
>
> now this is a phase vocoder right?
close.
> and now to do pitch scaling, it all happens in step 4, this is where
> I move the data from one bin to another (without changing the values).
you have to move all of the adjacent bins of a particular sinusoidal component from the location where they originally are to where they would be if the component happened to be at the frequency you want it to be. do not stretch out or shrink this group of adjacent bins. just move them.
> No interpolation is needed
some interpolation is needed unless you are accepting some frequency quantization error. if it turned out that you were sliding
> and I just ignore bins that falls beyond 1/2 the FFT size
i guess that's better than aliasing them. that's only a problem for upshifting. but, if you're upshifting, you'll have some gaps to fill and in downshifting, you have the opposite problem: overlapping in the frequency domain.

r b-j
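The "accurate frequency" r b-j describes falls out of the frame-to-frame phase difference. A minimal sketch of that calculation (a hypothetical helper written for this thread, not taken from any of the packages mentioned):

```python
import numpy as np

def bin_true_freq(phase, prev_phase, k, n_fft, hop, sr):
    """Accurate frequency seen by bin k, from the phase change between
    two analysis frames spaced `hop` samples apart."""
    expected = 2 * np.pi * hop * k / n_fft          # advance if exactly on the bin
    d = phase - prev_phase - expected               # measured deviation
    d -= 2 * np.pi * np.round(d / (2 * np.pi))      # wrap to +/- pi
    return (k + d * n_fft / (2 * np.pi * hop)) * sr / n_fft
```

For a 450 Hz sine analysed with a 1024-point FFT at 44.1 kHz, bin 10 (centre ~431 Hz) reports roughly 450 Hz, not its own centre frequency; that is the point of step 3.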
Reply by Patrick, December 13, 2006
OK,
I looked at the code for pvptrans and it does phase correction. I'm a
little confused since you said all I had to do is to copy the phase and
magnitude from one bin to another.

If I look at an algorithm like this one:
http://www.s3.kth.se/signal/edu/projekt/students/01/blue/finalreport/finalreport.html
They multiply the true frequency by the scale factor when moving to
another bin. But if you listen to the sound samples provided, it
doesn't sound very good, it's got a strong chorus effect.
What is the difference between this algorithm and the one from pvptrans?

Maybe I just don't understand the whole process. Correct me if I'm
wrong:
1)I have to window the data
2) FFT
3)I have to find the true frequencies and magnitudes of each bin
4)do some processing
5)find the phases back from the true frequencies
6)IFFT
7)overlap-add

now this is a phase vocoder right?  and now to do pitch scaling, it all
happens in step 4, this is where I move the data from one bin to
another (without changing the values). No interpolation is needed and I
just ignore bins that fall beyond 1/2 the FFT size
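Step 4 as described, taken in isolation, is just a relocation loop. A sketch under those assumptions (illustrative names; it presumes the magnitudes and per-bin true frequencies from step 3 are already in hand):

```python
import numpy as np

def shift_bins(mag, freq, alpha):
    """Step 4 in isolation: move bin k to round(k*alpha), scale its
    true frequency, drop anything past Nyquist."""
    n = len(mag)                        # fft_size//2 + 1 bins
    new_mag, new_freq = np.zeros(n), np.zeros(n)
    for k in range(n):
        j = int(round(k * alpha))
        if j < n:                       # ignore bins beyond fft_size/2
            new_mag[j] += mag[k]        # collisions just sum (downshift)
            new_freq[j] = freq[k] * alpha
    return new_mag, new_freq
```

Note the two side effects of the rounding: for alpha > 1 some destination bins are never written (gaps), and for alpha < 1 several source bins land on one destination (overlaps), which is the frequency-quantization issue raised elsewhere in this thread.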

Thank you,
Patrick Dumais


Richard Dobson wrote:
> [quoted reply snipped; the full text appears further down the page]
Reply by Richard Dobson, December 13, 2006
Patrick wrote:
> Thank you,
>
> the pvptrans is exactly what I'm looking for. Does anyone have more
> information on this (theory so that I understand more what I'm doing
> instead of just copying the code)
>
> Patrick Dumais
Well, the theory of the whole phase vocoder is rather more than can be
outlined here, but there is shedloads about it on the net (just google
on "phase vocoder"), and the dspdimension pages themselves are as good a
start as any. Look also for papers by Mark Dolson (author of the CARL
pvoc I base my own work on) and Jean Laroche. Miller Puckette (of PD
fame) has also written widely on pvoc.

As for the pitch-shift algorithm itself there is not much to it.
Basically, moving data from the source bin(s) to the destination bin(s),
with no attempt at peak-detection or tracking. It is probably best to
try to follow the code through and work it out. The dspdimension example
is the simplest, because it uses separate arrays for input and output.
The version I use (by Trevor Wishart) overwrites the input arrays (a big
reason for the faster processing) so leaps through an extra reasonably
predictable hoop which I will leave you to discover, as I am about to go
out!

Richard Dobson
Reply by Jerry Wolf, December 13, 2006
Patrick wrote:
> I am trying to implement an algorithm that will pitch-scale an
> incoming sound in real-time.
On the same subject, see the current thread "TTS Pitch/Rate" in comp.speech.research.
Reply by Patrick, December 13, 2006
Thank you,

the pvptrans is exactly what I'm looking for. Does anyone have more
information on this (theory so that I understand more what I'm doing
instead of just copying the code)

Patrick Dumais

Richard Dobson wrote:
> [quoted reply snipped; the full text appears further down the page]
Reply by Richard Dobson, December 13, 2006
Patrick wrote:
> So what I want to do is not "pitch scaling" then? to be sure: I don't
> want to alter time at all, I want to have the same effect as if
> someone was breathing Helium when talking into the microphone (it
> doesn't make him speak faster).
>
> So the technique you are talking about is a Harmonizer? I will look
> it up, but can you give me more information on this?
As we (computer musicians) understand it, you do indeed want pitch
scaling, as that is what we understand by pitch transposition without
changing duration; and it is precisely what Stephan Sprenger offers as
an example on his dspdimension pages. See also my demo real-time VST
plugin here (based on CARL pvoc):

http://people.bath.ac.uk/masrwd/pvplugs.html

Time scaling would similarly be changing duration without changing
pitch.

You will find more extensive pvoc tools in Csound, some by me (not least
the streaming pvoc framework itself with the "fsig" datatype), many more
by Vitor Lazzarini, including a pitch shifter based (IIRC) on the
dspdimension example.

The term "harmoniser" is typically reserved for where one or more pitch
shifts are mixed with the source.

Pitch shifting in this form is a sort-of solved problem except for the
usual prevailing issues:

  latency (increases with FFT size)
  Time/Freq tradeoff
  phase smearing on transients (drums etc), related to T/F tradeoff
  CPU cost

Given all that, you may well feel the emphasis should be on "sort-of"
rather than on "solved".

Richard Dobson
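The latency and time/frequency numbers behind that list are simple arithmetic; a tiny illustrative helper (names my own):

```python
def pvoc_tradeoffs(n_fft, sr=44100.0):
    """The two numbers behind the latency / time-frequency tradeoff."""
    bin_hz = sr / n_fft                  # frequency resolution per bin
    frame_ms = 1000.0 * n_fft / sr       # minimum latency of one analysis frame
    return bin_hz, frame_ms
```

Doubling the FFT size halves the bin width and doubles the frame latency: at 44.1 kHz, 1024 points gives ~43 Hz bins at ~23 ms per frame, while 4096 points gives ~11 Hz bins but ~93 ms, before any overlap or processing delay is counted.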
Reply by Patrick, December 13, 2006
So what I want to do is not "pitch scaling" then? to be sure: I don't
want to alter time at all, I want to have the same effect as if someone
was breathing Helium when talking into the microphone (it doesn't make
him speak faster).

So the technique you are talking about is a Harmonizer? I will look it
up, but can you give me more information on this?

Thank you,
Patrick Dumais


jeff227 wrote:
> [quoted reply snipped; the full text appears further down the page]
Reply by jeff227, December 13, 2006
Patrick;

I have done the same thing you are looking for with a simple rotating
buffer using different advance rates for the read and write pointers. It's
actually a chorus effect taken to the extreme.  It's a DSP implementation
of the original "tape loop" pitch shifters.

The problem with that approach is how to conceal the discontinuity where
one pointer overtakes the other.  The side effect is a glitch or "warble"
in the sound.  There are several splice techniques that do a fairly good
job of minimizing this depending on the audio program and amount of
shift.

The advantage of this approach is that it is very simple and the harmonics
remain proportional to the fundamental (the "chipmunk" sound).

BTW, I have used some FFT-based harmonizers that also had a warble or
strange phasing sound in the shifted audio.  It just isn't possible (yet!)
to remove or insert time in real time.  There will always be some kind of
side effect.
Reply by Patrick, December 12, 2006
Hi,

I am trying to implement an algorithm that will pitch-scale an
incoming sound in real-time.
If I try the algorithm that Stephan M. Bernsee did, or follow
indications from other websites, this introduces a chorus-like effect
in the sound. Almost everywhere they talk about avoiding the "Alvin and
the Chipmunks" effect, but that's exactly what I want to do.

Now if I take the same algorithm and don't multiply the true frequency
(in the step between analysis and synthesis) by the scale factor, that
is: I just move the magnitudes to other bins and copy its true
frequency without doing anything on it (so basically I'm just copying
the phase without adjusting it), it does exactly what I'm looking for
but there is a kind of amplitude modulation effect done on the signal.
The pitch is altered exactly like I want it to (it sounds as if I was
breathing Helium when I'm talking) and there is no chorusing at all.

It makes me think, is the phase vocoder really what I want to use? Is
it supposed to introduce such chorusing? And what about not adjusting
the phase? How do I fix the amplitude modulation?

note that, for a factor of 2, I am moving the magnitude and phase of
bin X to bin X*2.  so there is nothing between X*2 and (X+1)*2. Should
I use interpolation? And what if X*2 > FFTLength/2 ?

Thank you, any help would be appreciated,
Patrick Dumais