DSPRelated.com
Forums

PitchShift using STFT

Started by Himanshu August 11, 2005
Hi All!

Greetings!

I was implementing pitch shifting using the STFT (the algorithm that Mr.
Bernsee discusses at his site "dspdimension"). It's working absolutely
fine, but when I take the semitone setting to 12, which yields a pitch
factor of 2.0 (one octave up), the output sounds as if some "vibrato" has
been added to it. It's not that clean. If you pitch shift the same file
using Audacity at a semitone value of 12, it's much cleaner and the
difference is remarkable.
I am not sure why this is so.

I am using the FFTW library for the FFT.

Any clue to hunt it down?

Thanks and regards
--Himanshu

in article 1123755895.306005.267110@g47g2000cwa.googlegroups.com, Himanshu
at hs.chauhan@gmail.com wrote on 08/11/2005 07:01:

> I was implementing pitch shifting using the STFT (the algorithm that Mr.
> Bernsee discusses at his site "dspdimension"). It's working absolutely
> fine, but when I take the semitone setting to 12, which yields a pitch
> factor of 2.0 (one octave up), the output sounds as if some "vibrato" has
> been added to it. It's not that clean. If you pitch shift the same file
> using Audacity at a semitone value of 12, it's much cleaner and the
> difference is remarkable.
> I am not sure why this is so.
i haven't looked through Stephan's code, but this sounds to me like a phase
issue between frames. what are you doing to 1. identify the different
frequency components and 2. glue the different frequency components
together across the frame boundaries? these are the two most difficult
operations, IMO, of the frequency domain method.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
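To make those two operations concrete, here is a minimal numpy sketch (not Stephan's code; the function name, hop and FFT conventions are assumptions) of the analysis half: estimating each bin's true frequency from the phase advance between consecutive STFT frames.

```python
import numpy as np

def true_frequencies(phase_prev, phase_curr, hop, fft_size, fs):
    """Estimate each bin's true frequency (Hz) from the phase advance
    between two STFT frames spaced `hop` samples apart.
    Standard phase-vocoder analysis step; illustrative sketch only."""
    bins = np.arange(fft_size // 2 + 1)
    # phase that a sinusoid centred exactly on each bin would advance per hop
    expected = 2 * np.pi * bins * hop / fft_size
    deviation = phase_curr - phase_prev - expected
    # wrap the deviation into [-pi, pi)
    deviation = np.mod(deviation + np.pi, 2 * np.pi) - np.pi
    return (bins + deviation * fft_size / (2 * np.pi * hop)) * fs / fft_size
```

A 440 Hz sine analysed this way reads back as 440 Hz at the peak bin even though 440 Hz falls between bin centres; it is this refined frequency, not the raw bin frequency, that has to be carried coherently across frame boundaries.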
Hi All,
Stephan's demo program is basically a phase vocoder, and this artefact is to
be expected when using a phase vocoder - shifting down an octave works fine,
BTW :-)

Actually if you go back to DSP Dimension now you will find that Stephan has
released a free (but not open source) library called Dirac that uses a far
more sophisticated multiresolution approach. I haven't tried it yet but it
seems to have rave reviews - and Stephan certainly appears to be a bit of a
guru in this field.

I believe that Dirac is based on wavelets, but he hasn't released many
technical details - I don't blame him, mind :-)

Mind you, given that he's using wavelets, I'd love to know how he is dealing
with aliasing. I was tinkering a while back with McAulay-Quatieri sinusoidal
modelling on top of wavelets and couldn't get the aliasing nailed. My
current foray into pitch shifting uses the idea proposed by Scott Levine and
Tony Verma: Laplacian pyramid filters with an antialiasing kernel instead of
wavelets - similar idea, different filter. The Laplacian pyramid isn't
critically sampled, but the extra CPU cost is minimal (most of it is in the
resynthesis anyway).

I suspect Audacity uses PSOLA (Pitch Synchronous Overlap-Add), which is a
time domain technique that works well for speech and monophonic sources but
isn't so great for polyphonic sources. However, more sophisticated methods
like MQ analysis give much greater flexibility in the sort of effects that
you can have. For instance, I'm currently experimenting with sound source
separation to enable such things as splitting notes between the left and
right channels and doing "interesting" panning effects.

Cheers,
Fraser.

robert bristow-johnson wrote in message ...
> i haven't looked through Stephan's code, but this sounds to me like a phase
> issue between frames. what are you doing to 1. identify the different
> frequency components and 2. glue the different frequency components
> together across the frame boundaries? these are the two most difficult
> operations, IMO, of the frequency domain method.
Hi!

I am a newbie in DSP. I was trying to understand what the algorithm is
doing. I think I understood everything but one thing: how does overlapping
the frames help?

Regards
--Himanshu

in article HyhLe.3597$2C5.637@newsfe1-win.ntli.net, FA at fa@v.net wrote on
08/13/2005 03:45:

> Stephan's demo program is basically a phase vocoder and this artefact is to
> be expected using a phase vocoder
can be expected in *some* phase vocoders. if the hop size is small enough,
and if each sinusoidal peak is processed carefully (lined up well with the
same sinusoid in the previous frame), sustained tones should come out fine.
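As a sketch of what "lined up well" means in practice, here is a toy phase-vocoder time stretcher (my own illustration, not from any of the programs discussed): each bin's synthesis phase is advanced by its measured, wrapped phase deviation rescaled to the new hop, which keeps a sustained tone coherent across frame boundaries.

```python
import numpy as np

def stretch(x, r, N=1024, Ha=128, fs=8000):
    """Time-stretch x by factor r with a bare-bones phase vocoder.
    A small analysis hop plus per-bin phase propagation keeps a
    sustained tone clean across frames. Illustrative sketch only."""
    Hs = int(round(r * Ha))                        # synthesis hop
    w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)  # periodic Hann
    k = np.arange(N // 2 + 1)
    expected = 2 * np.pi * k * Ha / N              # nominal phase advance/hop
    starts = range(0, len(x) - N, Ha)
    out = np.zeros(len(starts) * Hs + N)
    ph_prev, ph_syn = None, None
    for j, i in enumerate(starts):
        X = np.fft.rfft(w * x[i:i + N])
        mag, ph = np.abs(X), np.angle(X)
        if ph_prev is None:
            ph_syn = ph.copy()                     # first frame: copy phases
        else:
            d = ph - ph_prev - expected
            d = np.mod(d + np.pi, 2 * np.pi) - np.pi    # wrap to [-pi, pi)
            ph_syn = ph_syn + (expected + d) * Hs / Ha  # rescale phase advance
        ph_prev = ph
        y = np.fft.irfft(mag * np.exp(1j * ph_syn))
        out[j * Hs : j * Hs + N] += w * y          # windowed overlap-add
    return out
```

Stretching a 440 Hz sine by 2x this way yields a tone still centred on 440 Hz; with a large hop, or with the deviation left unwrapped, the bins drift apart in phase and the "vibrato"-like warble the original post describes appears.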
> Actually if you go back to DSP Dimension now you will find that Stephan has
> released a free (but not open source) library called Dirac that uses a far
> more sophisticated multiresolution approach. I haven't tried it yet but it
> seems to have rave reviews - and Stephan certainly appears to be a bit of a
> guru in this field.
there are others, some of whom you name below.
> I believe that Dirac is based on wavelets but he hasn't released many
> technical details - don't blame him :-)
>
> Mind you given that he's using wavelets I'd love to know how he is dealing
> with aliasing - I was tinkering a while back with McAulay-Quatieri sinusoidal
> modelling with wavelets and couldn't get the aliasing nailed -
i'm curious what you mean. the same kinda aliasing that happens when
something is undersampled? a single sinusoidal component might end up in
several different wavelet components, but if they are all frequency scaled
by the same factor, they should still add up to the same (scaled) sinusoid.
> my current
> foray into pitch shifting is using the idea proposed by Scott Levine and
> Tony Verma using Laplacian pyramid filters with an antialiasing kernel
> instead of wavelets - similar idea, different filter - the Laplacian pyramid
> isn't critically sampled but the extra CPU cost is minimal (most of it is in
> the resynthesis anyway)
>
> I suspect Audacity uses PSOLA - Pitch Synchronous Overlap Add which is a
> time domain technique that works well for speech and monophonic sources but
> isn't so great for polyphonic sources - however more sophisticated things
> like MQ analysis give much greater flexibility in the sort of effects that
> you can have, for instance I'm currently experimenting with sound source
> separation to enable such things as splitting notes between the left and
> right channels and doing "interesting" panning effects.
many ways to skin the cat, but no absolutely perfect way that works for all
types of signals.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
in article 1123945552.178102.157900@g47g2000cwa.googlegroups.com, Himanshu
at hs.chauhan@gmail.com wrote on 08/13/2005 11:05:

> I am a newbie in DSP. I was trying to understand what it was doing. I
> think I understood everything but one. How does an overlapping of
> frames help?
suppressing clicks.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
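One way to see the click suppression (my own illustration, assuming numpy): windows that satisfy the constant-overlap-add property sum to a flat level in the fully overlapped region, so consecutive frames cross-fade into each other instead of butting together with amplitude steps.

```python
import numpy as np

# Periodic Hann windows hopped by N/4 overlap-add to an exact constant,
# so windowed frames cross-fade smoothly -- no amplitude step, no click.
N, H = 1024, 256
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)   # periodic Hann
out = np.zeros(8 * H + N)
for i in range(9):                  # nine overlapping windows
    out[i * H : i * H + N] += w
steady = out[N : 2 * N]             # region where 4 windows always overlap
print(steady.min(), steady.max())   # both ~2.0: constant overlap-add
```

With a rectangular window and no overlap, any processing that changes a frame's endpoints produces a step at every frame boundary, and a step is heard as a click.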
robert bristow-johnson <rbj@audioimagination.com> writes:

> suppressing clicks.

For some reason, I read this as "suppressing chicks"... which seemed a
little OT. Clearly, I was mistaken.

Ciao,

Peter K.
in article uy87563w1.fsf@remove.ieee.org, Peter K. at
p.kootsookos@remove.ieee.org wrote on 08/13/2005 15:19:

> robert bristow-johnson <rbj@audioimagination.com> writes:
>
>> suppressing clicks.
>
> For some reason, I read this as "suppressing chicks"... which seemed a
> little OT. Clearly, I was mistaken.
no, you got it right. we gotta keep them damn clicks down, because otherwise
they grow up to be hens and peck at the likes of us.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
How are the clicks suppressed? Is it because of the windowing being
done? I am still not able to correlate the two. Help!

Regards
--Himanshu