DSPRelated.com
Forums

PitchShift using STFT

Started by Himanshu August 11, 2005
Hi All!

Greetings!

I was implementing pitch shifting using the STFT (the algorithm that Mr.
Bernsee discusses at his site "dspdimension"). It's working absolutely
fine, but when I take the semitone setting to 12, which yields a pitch
factor of 2.0 (one octave up), the output sounds as if some "vibrato" has
been added to it. It's not that clean. If you pitch shift the same file
using Audacity at a semitone value of 12, it's much cleaner and the
difference is remarkable.
I am not sure why this is so.

I am using the FFTW library for the FFT.

Any clue to hunt it down?

Thanks and regards
--Himanshu

in article 1123755895.306005.267110@g47g2000cwa.googlegroups.com, Himanshu
at hs.chauhan@gmail.com wrote on 08/11/2005 07:01:

> I was implementing pitch shifting using the STFT (the algorithm that Mr.
> Bernsee discusses at his site "dspdimension"). It's working absolutely
> fine, but when I take the semitone setting to 12, which yields a pitch
> factor of 2.0 (one octave up), the output sounds as if some "vibrato" has
> been added to it. It's not that clean. If you pitch shift the same file
> using Audacity at a semitone value of 12, it's much cleaner and the
> difference is remarkable.
> I am not sure why this is so.
i haven't looked through Stephan's code, but this sounds to me like a phase
issue between frames. what are you doing to 1. identify the different
frequency components and 2. glue the different frequency components
together across the frame boundaries? these are the two most difficult
operations, IMO, of the frequency domain method.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
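To make those two operations concrete, here is a minimal numpy sketch (not Stephan's code; the function name, hop and FFT conventions are assumptions) of the analysis half: estimating each bin's true frequency from the phase advance between consecutive STFT frames.

```python
import numpy as np

def true_frequencies(phase_prev, phase_curr, hop, fft_size, fs):
    """Estimate each bin's true frequency (Hz) from the phase advance
    between two STFT frames spaced `hop` samples apart.
    Standard phase-vocoder analysis step; illustrative sketch only."""
    bins = np.arange(fft_size // 2 + 1)
    # phase that a sinusoid centred exactly on each bin would advance per hop
    expected = 2 * np.pi * bins * hop / fft_size
    deviation = phase_curr - phase_prev - expected
    # wrap the deviation into [-pi, pi)
    deviation = np.mod(deviation + np.pi, 2 * np.pi) - np.pi
    return (bins + deviation * fft_size / (2 * np.pi * hop)) * fs / fft_size
```

A 440 Hz sine analysed this way reads back as 440 Hz at the peak bin even though 440 Hz falls between bin centres; it is this refined frequency, not the raw bin frequency, that has to be carried coherently across frame boundaries.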
Hi All,
Stephan's demo program is basically a phase vocoder, and this artefact is to
be expected when using a phase vocoder - shifting down an octave works fine,
BTW :-)

Actually if you go back to DSP Dimension now you will find that Stephan has
released a free (but not open source) library called Dirac that uses a far
more sophisticated multiresolution approach. I haven't tried it yet but it
seems to have rave reviews - and Stephan certainly appears to be a bit of a
guru in this field.

I believe that Dirac is based on wavelets, but he hasn't released many
technical details - I don't blame him, mind :-)

Mind you, given that he's using wavelets, I'd love to know how he is dealing
with aliasing. I was tinkering a while back with McAulay-Quatieri sinusoidal
modelling on top of wavelets and couldn't get the aliasing nailed. My
current foray into pitch shifting uses the idea proposed by Scott Levine and
Tony Verma: Laplacian pyramid filters with an antialiasing kernel instead of
wavelets - similar idea, different filter. The Laplacian pyramid isn't
critically sampled, but the extra CPU cost is minimal (most of it is in the
resynthesis anyway).

I suspect Audacity uses PSOLA (Pitch Synchronous Overlap-Add), which is a
time domain technique that works well for speech and monophonic sources but
isn't so great for polyphonic sources. However, more sophisticated methods
like MQ analysis give much greater flexibility in the sort of effects that
you can have. For instance, I'm currently experimenting with sound source
separation to enable such things as splitting notes between the left and
right channels and doing "interesting" panning effects.

Cheers,
Fraser.

robert bristow-johnson wrote in message ...
> i haven't looked through Stephan's code, but this sounds to me like a phase
> issue between frames. what are you doing to 1. identify the different
> frequency components and 2. glue the different frequency components
> together across the frame boundaries? these are the two most difficult
> operations, IMO, of the frequency domain method.
Hi!

I am a newbie in DSP. I was trying to understand what the algorithm is
doing. I think I understood everything but one thing: how does overlapping
the frames help?

Regards
--Himanshu

in article HyhLe.3597$2C5.637@newsfe1-win.ntli.net, FA at fa@v.net wrote on
08/13/2005 03:45:

> Stephan's demo program is basically a phase vocoder and this artefact is to
> be expected using a phase vocoder
can be expected in *some* phase vocoders. if the hop size is small enough,
and if each sinusoidal peak is processed carefully (lined up well with the
same sinusoid in the previous frame), sustained tones should come out fine.
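As a sketch of what "lined up well" means in practice, here is a toy phase-vocoder time stretcher (my own illustration, not from any of the programs discussed): each bin's synthesis phase is advanced by its measured, wrapped phase deviation rescaled to the new hop, which keeps a sustained tone coherent across frame boundaries.

```python
import numpy as np

def stretch(x, r, N=1024, Ha=128, fs=8000):
    """Time-stretch x by factor r with a bare-bones phase vocoder.
    A small analysis hop plus per-bin phase propagation keeps a
    sustained tone clean across frames. Illustrative sketch only."""
    Hs = int(round(r * Ha))                        # synthesis hop
    w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)  # periodic Hann
    k = np.arange(N // 2 + 1)
    expected = 2 * np.pi * k * Ha / N              # nominal phase advance/hop
    starts = range(0, len(x) - N, Ha)
    out = np.zeros(len(starts) * Hs + N)
    ph_prev, ph_syn = None, None
    for j, i in enumerate(starts):
        X = np.fft.rfft(w * x[i:i + N])
        mag, ph = np.abs(X), np.angle(X)
        if ph_prev is None:
            ph_syn = ph.copy()                     # first frame: copy phases
        else:
            d = ph - ph_prev - expected
            d = np.mod(d + np.pi, 2 * np.pi) - np.pi    # wrap to [-pi, pi)
            ph_syn = ph_syn + (expected + d) * Hs / Ha  # rescale phase advance
        ph_prev = ph
        y = np.fft.irfft(mag * np.exp(1j * ph_syn))
        out[j * Hs : j * Hs + N] += w * y          # windowed overlap-add
    return out
```

Stretching a 440 Hz sine by 2x this way yields a tone still centred on 440 Hz; with a large hop, or with the deviation left unwrapped, the bins drift apart in phase and the "vibrato"-like warble the original post describes appears.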
> Actually if you go back to DSP Dimension now you will find that Stephan has
> released a free (but not open source) library called Dirac that uses a far
> more sophisticated multiresolution approach. I haven't tried it yet but it
> seems to have rave reviews - and Stephan certainly appears to be a bit of a
> guru in this field.
there are others, some of whom you name below.
> I believe that Dirac is based on wavelets but he hasn't released many
> technical details - don't blame him :-)
>
> Mind you given that he's using wavelets I'd love to know how he is dealing
> with aliasing - I was tinkering a while back with McAulay-Quatieri sinusoidal
> modelling with wavelets and couldn't get the aliasing nailed -
i'm curious what you mean. the same kinda aliasing that happens when
something is undersampled? a single sinusoidal component might end up in
several different wavelet components, but if they are all frequency scaled
by the same factor, they should still add up to the same (scaled) sinusoid.
> my current
> foray into pitch shifting is using the idea proposed by Scott Levine and
> Tony Verma using Laplacian pyramid filters with an antialiasing kernel
> instead of wavelets - similar idea, different filter - the Laplacian pyramid
> isn't critically sampled but the extra CPU cost is minimal (most of it is in
> the resynthesis anyway)
>
> I suspect Audacity uses PSOLA - Pitch Synchronous Overlap Add which is a
> time domain technique that works well for speech and monophonic sources but
> isn't so great for polyphonic sources - however more sophisticated things
> like MQ analysis give much greater flexibility in the sort of effects that
> you can have, for instance I'm currently experimenting with sound source
> separation to enable such things as splitting notes between the left and
> right channels and doing "interesting" panning effects.
many ways to skin the cat, but no absolutely perfect way that works for all
types of signals.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
in article 1123945552.178102.157900@g47g2000cwa.googlegroups.com, Himanshu
at hs.chauhan@gmail.com wrote on 08/13/2005 11:05:

> I am a newbie in DSP. I was trying to understand what it was doing. I
> think I understood everything but one. How does an overlapping of
> frames help?
suppressing clicks.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
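One way to see the click suppression (my own illustration, assuming numpy): windows that satisfy the constant-overlap-add property sum to a flat level in the fully overlapped region, so consecutive frames cross-fade into each other instead of butting together with amplitude steps.

```python
import numpy as np

# Periodic Hann windows hopped by N/4 overlap-add to an exact constant,
# so windowed frames cross-fade smoothly -- no amplitude step, no click.
N, H = 1024, 256
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)   # periodic Hann
out = np.zeros(8 * H + N)
for i in range(9):                  # nine overlapping windows
    out[i * H : i * H + N] += w
steady = out[N : 2 * N]             # region where 4 windows always overlap
print(steady.min(), steady.max())   # both ~2.0: constant overlap-add
```

With a rectangular window and no overlap, any processing that changes a frame's endpoints produces a step at every frame boundary, and a step is heard as a click.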
robert bristow-johnson <rbj@audioimagination.com> writes:

> suppressing clicks.

For some reason, I read this as "suppressing chicks"... which seemed a
little OT. Clearly, I was mistaken.

Ciao,

Peter K.
in article uy87563w1.fsf@remove.ieee.org, Peter K. at
p.kootsookos@remove.ieee.org wrote on 08/13/2005 15:19:

> robert bristow-johnson <rbj@audioimagination.com> writes:
>
>> suppressing clicks.
>
> For some reason, I read this as "suppressing chicks"... which seemed a
> little OT. Clearly, I was mistaken.
no, you got it right. we gotta keep them damn clicks down, because otherwise
they grow up to be hens and peck at the likes of us.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
How are the clicks suppressed? Is it because of the windowing being
done? I am still not able to correlate the two. Help!

Regards
--Himanshu