# pitch scaling and chorus

Started by Patrick Dumais, December 12, 2006
```Hi,

I am trying to implement an algorithm that will pitch-scale an
incoming sound in real-time.
If I try the algorithm that Stephan M. Bernsee did, or follow
indications from other websites, this introduces a chorus-like effect
in the sound. Almost everywhere they talk about avoiding the "Alvin and
the Chipmunks" effect, but that's exactly what I want to do.

Now if I take the same algorithm and don't multiply the true frequency
(in the step between analysis and synthesis) by the scale factor, that
is: I just move the magnitudes to other bins and copy their true
frequencies without doing anything to them (so basically I'm just
copying the phase without adjusting it), it does exactly what I'm
looking for, but there is a kind of amplitude modulation effect on the
signal. The pitch is altered exactly as I want it to be (it sounds as
if I were breathing helium when I'm talking) and there is no chorusing
at all.

It makes me wonder: is the phase vocoder really what I want to use? Is
it the phase that's causing this? How do I fix the amplitude modulation?

Note that, for a factor of 2, I am moving the magnitude and phase of
bin X to bin X*2, so there is nothing between X*2 and (X+1)*2. Should
I use interpolation? And what if X*2 > FFTLength/2?

Thank you, any help would be appreciated,
Patrick Dumais

```
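The bin-moving step Patrick describes (bin X to bin X*factor, nothing in between, destinations past N/2 discarded) can be sketched in Python; the function name and the (magnitude, phase) list layout here are illustrative assumptions, not code from any of the posts:

```python
def shift_bins(frame, factor):
    # 'frame' holds one (magnitude, phase) pair per bin, 0 .. N/2.
    # Each pair is moved to bin round(k * factor) with its values
    # unchanged; destinations past N/2 are simply dropped, and bins
    # that receive nothing stay empty (the gaps Patrick mentions).
    out = [(0.0, 0.0)] * len(frame)
    for k, (mag, ph) in enumerate(frame):
        j = round(k * factor)
        if j < len(frame):
            out[j] = (mag, ph)
    return out

# toy spectrum: a single component sitting in bin 3
frame = [(0.0, 0.0)] * 9
frame[3] = (1.0, 0.5)
shifted = shift_bins(frame, 2.0)
# the component moves to bin 6; bins 4, 5 and 7 are left empty
```

For factor < 1 several source bins can collide on one destination; this sketch simply lets the last one win.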
```Patrick;

I have done the same thing you are looking for with a simple rotating
buffer using different advance rates for the read and write pointers. It's
actually a chorus effect taken to the extreme.  It's a DSP implementation
of the original "tape loop" pitch shifters.

The problem with that approach is how to conceal the discontinuity where
one pointer overtakes the other.  The side effect is a glitch or "warble"
in the sound.  There are several splice techniques that do a fairly good
job of minimizing this depending on the audio program and amount of
shift.

The advantage of this approach is that it is very simple, and the
harmonics remain proportional to the fundamental (the "chipmunk" sound).

BTW, I have used some FFT-based harmonizers that also had a warble or
strange phasing sound in the shifted audio.  It just isn't possible (yet!)
to remove or insert time in real time.  There will always be some kind of
side effect.
```
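jeff227's rotating-buffer scheme can be sketched as a Python toy; the two-tap triangular crossfade used here to mask the splice is one common choice, and the function name and buffer length are assumptions of mine, not details from the post:

```python
def tape_loop_shift(signal, factor, buf_len=512):
    # Write pointer advances one sample per input sample; the read
    # pointer advances by 'factor', so it periodically overtakes (or is
    # overtaken by) the write pointer. Two read taps half a loop apart
    # are crossfaded so that a tap is silent just as it crosses the
    # write pointer, where the splice glitch ("warble") would occur.
    buf = [0.0] * buf_len
    out = []
    read = 0.0
    for n, x in enumerate(signal):
        write = n % buf_len
        buf[write] = x
        i1 = int(read) % buf_len            # first read tap
        i2 = (i1 + buf_len // 2) % buf_len  # second tap, half a loop away
        d = (write - i1) % buf_len          # tap 1's distance to the write pointer
        w1 = min(d, buf_len - d) / (buf_len / 2)  # triangular crossfade weight
        out.append(w1 * buf[i1] + (1.0 - w1) * buf[i2])
        read += factor
    return out
```

With factor = 2.0 the read pointer laps the write pointer repeatedly and the output comes out an octave up, at the cost of the periodic splices the post describes.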
```So what I want to do is not "pitch scaling" then? To be sure: I don't
want to alter time at all; I want to have the same effect as if someone
were breathing helium while talking into the microphone (it doesn't
make him speak faster).

So the technique you are talking about is a Harmonizer? I will look it
up, but can you give me more information on this?
Thank you,
Patrick Dumais

jeff227 wrote:
> I have done the same thing you are looking for with a simple rotating
> buffer using different advance rates for the read and write pointers.
> [...]

```
```Patrick wrote:
> So what I want to do is not "pitch scaling" then? to be sure: I don't
> want to alter time at all, I want to have the same effect as if someone
> was breathing Helium when talking into the microphone (it doesn't make
> him speak faster).
>
> So the technique you are talking about is a Harmonizer? I will look it
> up, but can you give me more information on this?
>

As we (computer musicians) understand it, you do indeed want pitch
scaling, as that is what we understand by pitch transposition without
changing duration; and it is precisely what Stephan Sprenger offers as
an example on his dspdimension pages. See also my demo real-time VST
plugin here (based on CARL pvoc):

http://people.bath.ac.uk/masrwd/pvplugs.html

Time scaling would similarly be changing duration without changing pitch.

You will find more extensive pvoc tools in Csound, some by me (not least
the streaming pvoc framework itself with the "fsig" datatype), many more
by Victor Lazzarini, including a pitch shifter based (IIRC) on the
dspdimension example.

The term "harmoniser" is typically reserved for where one or more pitch
shifts are mixed with the source.

Pitch shifting in this form is a sort-of solved problem except for the
usual prevailing issues:

- latency (increases with FFT size)
- phase smearing on transients (drums etc.), related to the T/F tradeoff
- CPU cost

Given all that, you may well feel the emphasis should be on "sort-of"
rather than on "solved".

Richard Dobson
```
```Thank you,

pvptrans is exactly what I'm looking for. Does anyone have more
information on this (theory, so that I understand more of what I'm
doing instead of just copying the code)?

Patrick Dumais

Richard Dobson wrote:
> Patrick wrote:
> > So what I want to do is not "pitch scaling" then? to be sure: I don't
> > want to alter time at all, I want to have the same effect as if someone
> > was breathing Helium when talking into the microphone (it doesn't make
> > him speak faster).
> >
> > So the technique you are talking about is a Harmonizer? I will look it
> > up, but can you give me more information on this?
> >
>
> As we (computer musicians) understand it, you do indeed want pitch
> scaling, as that is what we understand by pitch transposition without
> changing duration. [...]
>
> Richard Dobson

```
```Patrick wrote:
> I am trying to implement an algorithm that will pitch-scale an
> incoming sound in real-time.

On the same subject, see the current thread "TTS Pitch/Rate" in
comp.speech.research.

```
```Patrick wrote:
> Thank you,
>
> the pvptrans is exactly what I'm looking for. Does anyone have more
> information on this (theory so that I understand more what I'm doing
> instead of just copying the code)
>
> Patrick Dumais
>

Well, the theory of the whole phase vocoder is rather more than can be
covered here (search the web on "phase vocoder"), and the dspdimension
pages themselves are as good a start as any. Look also for papers by
Mark Dolson (author of the CARL pvoc I base my own work on) and Jean
Laroche. Miller Puckette (of PD fame) has also written widely on pvoc.

As for the pitch-shift algorithm itself, there is not much to it:
basically, moving data from the source bin(s) to the destination
bin(s), with no attempt at peak detection or tracking. It is probably
best to follow the code through and work it out. The dspdimension
example is the simplest, because it uses separate arrays for input and
output. The version I use (by Trevor Wishart) overwrites the input
arrays (a big reason for the faster processing), so it leaps through an
extra, reasonably predictable hoop which I will leave you to discover,
as I am about to go out!

Richard Dobson
```
```OK,
I looked at the code for pvptrans and it does phase correction. I'm a
little confused, since you said all I had to do was copy the phase and
magnitude from one bin to another.

If I look at an algorithm like this one:
http://www.s3.kth.se/signal/edu/projekt/students/01/blue/finalreport/finalreport.html
they multiply the true frequency by the scale factor when moving to
another bin. But if you listen to the sound samples provided, it
doesn't sound very good; it has a strong chorus effect.
What is the difference between that algorithm and the one from
pvptrans?

Maybe I just don't understand the whole process. Correct me if I'm
wrong:
1) I have to window the data
2) FFT
3) I have to find the true frequencies and magnitudes of each bin
4) do some processing
5) find the phases back from the true frequencies
6) IFFT

Now this is a phase vocoder, right? And to do pitch scaling, it all
happens in step 4: this is where I move the data from one bin to
another (without changing the values). No interpolation is needed, and
I just ignore bins that fall beyond 1/2 the FFT size.

Thank you,
Patrick Dumais

Richard Dobson wrote:
> As for the pitch-shift algorithm itself there is not much to it.
> Basically, moving data from the source bin(s) to the destination
> bin(s), with no attempt at peak-detection or tracking. [...]
>
> Richard Dobson

```
```Patrick wrote:

> 1)I have to window the data
> 2) FFT
> 3)I have to find the true frequencies and magnitudes of each bin

no, you have to determine the accurate frequencies of each sinusoidal
component.  a single sinusoidal component, when windowed, will occupy
several FFT bins.

> 4)do some processing

whatever that means.

> 5)find the phases back from the true frequencies

i think what you want to do is adjust the phases of each sinusoidal
component in the current frame so that, when overlap-added to the
corresponding sinusoidal component of the previous frame, they are
phase aligned and no spurious null occurs when overlap-adding.

> 6)IFFT
>
> now this is a phase vocoder right?

close.

>  and now to do pitch scaling, it all
> happens in step 4, this is where I move the data from one bin to
> another (without changing the values).

you have to move all of the adjacent bins of a particular sinusoidal
component from the location where they originally are to where they
would be if the component happened to be at the frequency you want it
to be.  do not stretch out or shrink this group of adjacent bins.  just
move them.

> No interpolation is needed

some interpolation is needed unless you are accepting some frequency
quantization error.  if it turned out that you were sliding the bins
by an exact integer offset, no interpolation would be needed.

> and I just ignore bins that falls beyond 1/2 the FFT size

i guess that's better than aliasing them.  that's only a problem for
upshifting.

but, if you're upshifting, you'll have some gaps to fill and in
downshifting, you have the opposite problem: overlapping in the
frequency domain.

r b-j

```
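robert's interpolation point can be illustrated with a toy Python fragment (the naming is mine, and magnitudes only are shown for brevity): when k * factor lands between two bins, split the value between the neighbours instead of rounding:

```python
def shift_bins_interp(mags, factor):
    # Move bin k's magnitude to the fractional position k * factor,
    # splitting it linearly between the two nearest destination bins
    # rather than rounding (which would quantize the shift).
    n = len(mags)
    out = [0.0] * n
    for k, m in enumerate(mags):
        pos = k * factor
        j = int(pos)
        frac = pos - j
        if j < n:
            out[j] += m * (1.0 - frac)
        if j + 1 < n:
            out[j + 1] += m * frac
    return out

# a unit magnitude in bin 2, shifted by factor 1.25, lands at position 2.5
mags = [0.0] * 8
mags[2] = 1.0
shifted = shift_bins_interp(mags, 1.25)
# half the magnitude goes to bin 2, half to bin 3
```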
```robert bristow-johnson wrote:
> Patrick wrote:
>
> > 1) I have to window the data
> > 2) FFT
> > 3) I have to find the true frequencies and magnitudes of each bin
>
> no, you have to determine the accurate frequencies of each sinusoidal
> component.  a single sinusoidal component, when windowed, will occupy
> several FFT bins.
>

That would be true of the more elaborate peak-tracking vocoders
(McAulay/Quatieri, SNDAN, CLAM etc.), but not of the "naive" pitch
shifters used by Stephan's code and mine. Those are deliberately meant
to be simple demos of the process, and actually work pretty well, not
least for the arbitrarily complex sounds we like to subject to pvoc
transformations. They are by no means confined to single pitched tones,
where identifying peaks is a relative no-brainer.

In a nutshell, classic pvoc does:

for each overlapped block:
  in --> window --> FFT --> mag/phase --> mag/freq
  (retaining running phase each frame, so we can update the derived
  frequency contents of each bin, each frame)

  -- process the frame ad lib, e.g. pitch shift

  mag/freq --> mag/phase --> IFFT --> window

The pitch shift is applied in the processing stage, simply by scaling
the frequency values of each bin and moving them to the required new
bin position as necessary. Yes, it is far from ideal (there are many
papers around discussing refinements of the method, to deal with the
inevitable phasing errors, and those refinements do indeed identify
peaks), but surprisingly this "naive" method does work, though of
course progressively worse for large-interval shifts.

Most of the sophisticated peak-tracking vocoders rely on offline
processing, so they can scan the whole data multiple times if
necessary; whereas these naive shifters are used for real-time
streaming effects, where we certainly cannot look ahead and want to
remember as little past data as possible.

I should also add that composers/programmers are notorious for "hacking"
pvoc frames in ways that would no doubt horrify dsp engineers, but which
almost always produce musically interesting results.

Richard Dobson
```
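The mag/phase --> mag/freq step in Richard's outline is the phase vocoder's defining trick. A Python sketch of the standard per-bin frequency estimate follows; the names and argument layout are my own, not Richard's or Stephan's code:

```python
import cmath
import math

def true_bin_freqs(prev_phases, spectrum, hop, fft_size, sr):
    # For each bin, compare the measured phase advance since the last
    # frame with the advance an exact bin-centre frequency would give;
    # the wrapped difference refines the bin's frequency estimate.
    mags, freqs, phases = [], [], []
    two_pi = 2.0 * math.pi
    for k, c in enumerate(spectrum):
        mag, ph = abs(c), cmath.phase(c)
        expected = two_pi * k * hop / fft_size   # bin-centre phase advance per hop
        dev = ph - prev_phases[k] - expected
        dev -= two_pi * round(dev / two_pi)      # wrap deviation to [-pi, pi]
        freq = (k + dev * fft_size / (two_pi * hop)) * sr / fft_size
        mags.append(mag)
        freqs.append(freq)
        phases.append(ph)                        # running phase for the next frame
    return mags, freqs, phases
```

Synthesis runs the same arithmetic in reverse: accumulate each bin's (possibly scaled) frequency back into a running phase before the IFFT.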