DSPRelated.com
Forums

PSOLA -- changes in duration

Started by Himanshu August 21, 2005
Hi all!

Greetings!

I have been working on pitch shifting for quite some time. I used Mr.
Bernsee's code for that purpose, modified it to use FFTW, and tweaked it a
little to bring its response time down, not by much, but by 7 seconds. But
I am still not able to use it for real-time processing. The code
spends 98% of its time performing forward and inverse transforms to and
from the frequency domain. So I thought about switching to PSOLA.

But before I put any effort into it: can PSOLA be used for
real-time processing? Moreover, I read the following on the Rice website:

"...the smaller signals are modified by either repeating or leaving out
speech segments, depending on whether the pitch of the target speaker
is higher or lower than the pitch of the source speaker..."

If that's the case, the duration of the signal will change. Is there any
other algorithm that can change the pitch in real time without changing
the duration of the signal?

Thanks and regards
--Himanshu

What sort of machine are you running this on? The Bernsee code is essentially a 
phase vocoder, and I have had a full pvoc (full of calls to sqrt and atan2 etc!) 
with pitch shifting running comfortably in real time on a Pentium 4 for several 
years now. This includes an implementation in Csound ("streaming pvoc opcodes"), 
but I have also published demo VST plugins with sources, hence able to use FFTW, 
and on a Pentium 4 2.2GHz, typical CPU load for a mono stream is under 10%. The 
exact load depends on FFT size and frame overlap: For FFT=1024 and frame overlap 
= 25%, cpu load is around 6%.

So unless you are using a really slow CPU I have to suppose that there is 
something untoward with your implementation. I have an unmodified build of the 
dspdimension code on the same machine and while it is unsurprisingly slower
than a fully optimised fftw-based pvoc, it is still clearly real-time capable
at about 35% CPU, including cost of file i/o. (using VC++ v6, so not even using 
the advanced f/p facilities). Of course, on a sub-GHz machine, it would start to 
hit 100% CPU load.

My demo VST code is available here:

http://www.bath.ac.uk/~masrwd/pvplugs.html

(But ignore the Macintosh versions; they are obsolete (pre OS X) and I will be 
removing them shortly.)


Richard Dobson

Himanshu wrote:
> Hi all!
>
> Greetings!
>
> I was working on pitching shifting for quite sometime. I used Mr.
> Bernsee's code for that purpose, modified it to use fftw and tweaked a
> little to bring its response time down, not much but by 7 seconds. But
> I am still not able to use it for real-time processing. The listing
> spends 98% of its time performing in-out transforms to frequency
> domain. So I thought about switching to PSOLA.
...
I tweaked this on a Pentium 4 (3 GHz, running Mandriva Linux 2005) and
am using it as an Audio Unit on a G4 (400 MHz). The speed is okay, and
response time is drastically reduced with my optimized code and the
use of FFTW. On my Mac its performance is not that great!
Moreover, I don't want to rely on FFTW because I need to port this to a
Blackfin 533. That's why I asked whether there is any other time-domain
technique like PSOLA that could help.

Btw, on a 5 min audio file (stereo, 44100 Hz, 32-bit) it takes around
1.3 minutes with a shift of 4 semitones, 2048 bins and an overlap factor
of 4. Audacity does the same job, on a signal of the same duration, in
less than 15 seconds! I wonder what it is using?

Thanks and regards
--Himanshu
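
The figures quoted in this post can be turned into a rough throughput estimate. The following is only a sketch: it assumes that "2048 Bins" means an FFT frame length of 2048 samples and that an overlap factor of 4 means a hop of one quarter frame. The helper name `realtime_factor` is made up for illustration, and the post does not say which machine the 1.3 minute figure was measured on.

```python
# Figures taken from the post above; names are illustrative only.
SAMPLE_RATE = 44100      # Hz
FFT_SIZE = 2048          # assumption: "2048 Bins" read as the FFT frame length
OVERLAP_FACTOR = 4       # assumption: hop = FFT_SIZE / OVERLAP_FACTOR
SEMITONES = 4
CHANNELS = 2

def realtime_factor(processing_seconds, audio_seconds):
    """Processing time per second of audio; below 1.0 is faster than playback."""
    return processing_seconds / audio_seconds

hop = FFT_SIZE // OVERLAP_FACTOR                    # samples between frame starts
frames_per_second = CHANNELS * SAMPLE_RATE / hop    # FFT/IFFT pairs per second
pitch_ratio = 2 ** (SEMITONES / 12)                 # equal-tempered shift ratio

rtf_posted = realtime_factor(1.3 * 60, 5 * 60)      # 78 s for 300 s of audio
rtf_audacity = realtime_factor(15, 5 * 60)          # upper bound, "less than 15 s"

print(hop, round(frames_per_second, 1), round(pitch_ratio, 4))  # 512 172.3 1.2599
print(round(rtf_posted, 2), round(rtf_audacity, 2))             # 0.26 0.05
```

A factor below 1 only says that offline throughput on the measuring machine exceeds playback speed. It says nothing about latency, or about how the same code behaves on a slower target such as the 400 MHz G4 or a Blackfin.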

in article 1124599010.620102.192500@g49g2000cwa.googlegroups.com, Himanshu
at hs.chauhan@gmail.com wrote on 08/21/2005 00:36:

> I was working on pitching shifting for quite sometime. I used Mr.
> Bernsee's code for that purpose, modified it to use fftw and tweaked a
> little to bring its response time down, not much but by 7 seconds. But
> I am still not able to use it for real-time processing. The listing
> spends 98% of its time performing in-out transforms to frequency
> domain. So I thought about switching to PSOLA.

what sort of content are you pitch shifting? a single voice or monophonic
instrument? or some full bandwidth mixed orchestral music?

there is PSOLA, WSOLA, all sorts of SOLA and OLA, and they can work pretty
good for monophonic harmonic tones.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Hi!

I am working on full bandwidth mixed orchestral music.

> there is PSOLA, WSOLA, all sorts of SOLA and OLA, and they can work
> pretty good for monophonic harmonic tones.

Won't they work on orchestral music?

Regards
--Himanshu

in article 1124725947.857469.44760@g44g2000cwa.googlegroups.com, Himanshu at
hs.chauhan@gmail.com wrote on 08/22/2005 11:52:

> I am working on full bandwidth mixed orchestral music.
>
>> there is PSOLA, WSOLA, all sorts of SOLA and OLA, and they can work pretty
>> good for monophonic harmonic tones.
>
> Won't they work on orchestral music?

it will have glitches and, if the amount of shifting is large, many glitches
per second.

all of these synchronous overlap-add methods work by examining the audio for
similarities in the waveform (this would be a pitch-detector for
quasi-periodic sounds) and then splicing in (for up-shifting) extra cycles or
periods, or splicing out (for down-shifting) unwanted cycles or periods, of
the quasi-periodic waveform. if there *is* no matching similarity (which is
the case if the waveform is not periodic in any way), it will look for the
best match it can find, but the splice won't be seamless.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

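
The similarity search described in this reply can be sketched as a lag search using normalized cross-correlation. This is a minimal illustration, not the code of any particular PSOLA or WSOLA implementation. The function name `best_splice_lag`, the window length and the lag range are all assumptions chosen for the demo.

```python
import math
import random

def best_splice_lag(signal, start, window, min_lag, max_lag):
    """Return (lag, score) for the segment that best matches the one at `start`.

    score is the normalized cross-correlation: close to 1.0 means the two
    segments line up almost perfectly, close to 0.0 means no similarity.
    """
    ref = signal[start:start + window]
    ref_energy = sum(x * x for x in ref)
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        cand = signal[start + lag:start + lag + window]
        cand_energy = sum(x * x for x in cand)
        if ref_energy == 0.0 or cand_energy == 0.0:
            continue  # silent segment, correlation undefined
        dot = sum(a * b for a, b in zip(ref, cand))
        score = dot / math.sqrt(ref_energy * cand_energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score

RATE = 44100
N = 4096
# A 100 Hz tone has a period of exactly 441 samples at 44100 Hz.
tone = [math.sin(2 * math.pi * 100 * i / RATE) for i in range(N)]
rng = random.Random(0)  # fixed seed so the demo is repeatable
noise = [rng.uniform(-1.0, 1.0) for _ in range(N)]

tone_lag, tone_score = best_splice_lag(tone, 0, 1024, 200, 600)
noise_lag, noise_score = best_splice_lag(noise, 0, 1024, 200, 600)
print("tone: ", tone_lag, round(tone_score, 3))
print("noise:", noise_lag, round(noise_score, 3))
```

For the periodic tone the search lands on the period and the score is essentially 1, so a splice there is seamless. For the noise the "best" lag still exists but its score stays low, which corresponds to the glitchy splice described above.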
Himanshu wrote:
> ... can PSOLA be used in
> real-time processing? Moreover, I read the following on rice website:
>
> "...the smaller signals are modified by either repeating or leaving out
> speech segments, depending on whether the pitch of the target speaker
> is higher or lower than the pitch of the source speaker..."
>
> If thats the case, the duration of the signal will change. Is there any
> other algorithm that can change the pitch in realtime without changing
> duration of the signal?

Changing the pitch without changing the duration, and changing the duration
without changing the pitch, are related processes. If you can do one, you can
often get the other just by resampling, or by playing back the time/frequency
modified waveform at a different sample rate than its original.

For instance, take 1 second of a 100 Hz sine wave sampled at 44100 Hz, copy
10 cycles of the sine wave and insert them at a convenient zero crossing, and
the intermediate result is a 100 Hz tone with a duration of 1.1 seconds.
Output this result at 48510 samples per second and playback will take
1 second, same as the original, but sound like a 110 Hz tone. More commonly,
one would resample the 48510 Hz signal at 44100 Hz, giving the same final
result after the antialias filter.

Try the same with the sum of multiple close but relatively prime frequencies
and you will see one difficulty with using PSOLA and similar algorithms with
full polyphonic sound sources.

And of course, if the time or frequency shift is large, then you will need to
make sure the bandpass and artifact frequencies of the initial shifting
algorithm end up in the right place after the final resampling.

IMHO. YMMV.

--
rhn A.T nicholson d.O.t C-o-M
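
The 100 Hz example from this reply can be checked numerically. The sketch below makes some simplifying assumptions. Linear interpolation stands in for a proper resampler with an antialias filter. The frequency is estimated from the spacing of positive-going zero crossings. The names `resample_linear` and `estimate_freq` are invented for the demo.

```python
import math

RATE = 44100             # original sample rate in Hz
FREQ = 100               # tone frequency in Hz
PERIOD = RATE // FREQ    # 441 samples per cycle, an exact integer here

def resample_linear(x, n_out):
    """Stretch or squeeze x to n_out samples by linear interpolation.

    Stand-in only: a real resampler would apply an antialias filter.
    """
    step = (len(x) - 1) / (n_out - 1)
    out = []
    for k in range(n_out):
        pos = k * step
        i = min(int(pos), len(x) - 2)
        frac = pos - i
        out.append(x[i] * (1.0 - frac) + x[i + 1] * frac)
    return out

def estimate_freq(x, rate):
    """Estimate frequency from the spacing of positive-going zero crossings."""
    idx = [i for i in range(1, len(x)) if x[i - 1] < 0.0 <= x[i]]
    return (len(idx) - 1) * rate / (idx[-1] - idx[0])

# One second of a 100 Hz sine.
original = [math.sin(2 * math.pi * FREQ * i / RATE) for i in range(RATE)]

# Time-stretch: insert a copy of 10 cycles at a zero crossing
# (any multiple of PERIOD keeps the phase continuous).
cut = 10 * PERIOD
stretched = original[:cut] + original[:10 * PERIOD] + original[cut:]

# Option 1: play the 48510 samples back at 48510 samples per second.
freq_fast_playback = estimate_freq(stretched, len(stretched))

# Option 2: resample back to 44100 samples and play at the original rate.
shifted = resample_linear(stretched, RATE)
freq_resampled = estimate_freq(shifted, RATE)

print(len(stretched), len(stretched) / RATE)
print(round(estimate_freq(original, RATE), 2))
print(round(freq_fast_playback, 2), round(freq_resampled, 2))
```

Both routes should report a tone close to 110 Hz lasting one second. Replacing the single sine with a sum of several unrelated frequencies removes the common zero crossing that made the insertion seamless here, which is the difficulty with polyphonic material mentioned above.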