Hi all! Greetings!

I have been working on pitch shifting for quite some time. I used Mr. Bernsee's code for that purpose, modified it to use FFTW and tweaked it a little to bring its processing time down, though only by about 7 seconds. But I am still not able to use it for real-time processing: the code spends 98% of its time performing the forward and inverse transforms to and from the frequency domain. So I thought about switching to PSOLA. But before I put any effort into it: can PSOLA be used for real-time processing?

Moreover, I read the following on the Rice website: "...the smaller signals are modified by either repeating or leaving out speech segments, depending on whether the pitch of the target speaker is higher or lower than the pitch of the source speaker..." If that's the case, the duration of the signal will change. Is there any other algorithm that can change the pitch in real time without changing the duration of the signal?

Thanks and regards
--Himanshu
PSOLA -- changes in duration
Started by Himanshu, August 21, 2005
Reply by Richard Dobson, August 21, 2005
What sort of machine are you running this on? The Bernsee code is essentially a phase vocoder, and I have had a full pvoc (full of calls to sqrt and atan2 etc!) with pitch shifting running comfortably in real time on a Pentium 4 for several years now. This includes an implementation in Csound (the "streaming pvoc opcodes"), but I have also published demo VST plugins with sources, which can therefore use FFTW. On a Pentium 4 at 2.2 GHz, the typical CPU load for a mono stream is under 10%. The exact load depends on FFT size and frame overlap: for FFT = 1024 and a frame overlap of 25%, CPU load is around 6%. So unless you are using a really slow CPU, I have to suppose that there is something untoward with your implementation.

I have an unmodified build of the dspdimension code on the same machine, and while it is unsurprisingly slower than a fully optimised FFTW-based pvoc, it is still clearly real-time capable at about 35% CPU, including the cost of file I/O (using VC++ v6, so not even using the advanced floating-point facilities). Of course, on a sub-GHz machine it would start to hit 100% CPU load.

My demo VST code is available here: http://www.bath.ac.uk/~masrwd/pvplugs.html (But ignore the Macintosh versions; they are obsolete (pre OS X) and I will be removing them shortly.)

Richard Dobson
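The step that makes a phase vocoder "full of calls to sqrt and atan2" is the per-bin frequency estimate taken from the phase advance between overlapping frames. Below is a minimal, stdlib-only Python sketch of that step. It assumes a periodic Hann window, an FFT size of 1024 and an overlap factor of 4 (hop of 256 samples). The function names are hypothetical, and a naive single-bin DFT stands in for the FFTW call so the example stays self-contained.

```python
import cmath
import math


def hann(size):
    """Periodic Hann window of `size` samples."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / size) for n in range(size)]


def dft_bin(frame, k):
    """Single DFT bin X[k] (naive O(N) sum; an FFT computes every bin at once)."""
    size = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * n / size) for n, x in enumerate(frame))


def bin_frequency(signal, start, k, fft_size, hop, fs):
    """Estimate the true frequency (Hz) of the partial in bin k from the phase
    advance between two windowed frames that start `hop` samples apart."""
    win = hann(fft_size)
    frame0 = [w * s for w, s in zip(win, signal[start:start + fft_size])]
    frame1 = [w * s for w, s in zip(win, signal[start + hop:start + hop + fft_size])]
    # cmath.phase is the atan2 call; the magnitude (sqrt) is not needed here.
    dphi = cmath.phase(dft_bin(frame1, k)) - cmath.phase(dft_bin(frame0, k))
    expected = 2 * math.pi * k * hop / fft_size               # advance of the bin centre
    deviation = math.remainder(dphi - expected, 2 * math.pi)  # wrap into [-pi, pi]
    true_bin = k + deviation * fft_size / (2 * math.pi * hop)
    return true_bin * fs / fft_size


FS, N, HOP = 44100, 1024, 256   # overlap factor 4
sig = [math.sin(2 * math.pi * 1000.0 * n / FS) for n in range(N + HOP)]
k = round(1000.0 * N / FS)      # nearest bin; its centre is about 990.5 Hz
estimate = bin_frequency(sig, 0, k, N, HOP, FS)
print(f"bin {k}: {estimate:.2f} Hz")
```

The bin centre is almost 10 Hz off, but the phase advance recovers the actual 1000 Hz. A pitch shifter in this style then scales such estimates by the shift ratio and moves them to new bins before resynthesis, which is where the per-frame forward and inverse transforms come from.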
Reply by Himanshu, August 21, 2005
I tweaked this on a Pentium 4 (3 GHz, running Mandriva 2005 Linux) and am using it as an Audio Unit on a G4 (400 MHz). The speed is okay, and the processing time is drastically reduced with my optimized code together with FFTW. On my Mac, though, its performance is not that great! Moreover, I don't want to rely on FFTW, because I need to port this to a Blackfin 533. That's why I asked whether there is any other time-domain technique, like PSOLA, that could help.

By the way, on a 5-minute audio file (stereo, 44.1 kHz, 32-bit) it takes around 1.3 minutes for a shift of 4 semitones with 2048 bins and an overlap factor of 4. Audacity does the same in less than 15 seconds, with the same signal duration! I wonder what it's using?

Thanks and regards
--Himanshu
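Taking the figures in this post at face value, a little arithmetic relates the run time to a real-time budget. This Python sketch assumes that "2048 bins" means a 2048-sample FFT frame, that an overlap factor of 4 means a hop of 2048 / 4 = 512 samples, and that the two channels are processed one after the other; the function name `throughput` is hypothetical.

```python
import math


def throughput(audio_s, processing_s, fs, channels, fft_size, overlap):
    """Average per-frame cost versus the real-time budget of an STFT pitch shifter.

    Assumed model (not measured): hop = fft_size // overlap, and each channel
    needs one analysis frame (forward + inverse FFT) per hop."""
    hop = fft_size // overlap
    frames = math.ceil(audio_s * fs / hop) * channels
    return {
        "hop": hop,
        "frames": frames,
        "real_time_factor": processing_s / audio_s,           # < 1 means faster than playback
        "ms_per_frame": 1000.0 * processing_s / frames,       # average cost actually spent
        "budget_ms_per_frame": 1000.0 * hop / fs / channels,  # time available per frame
    }


# Figures from the post: 5 min of stereo at 44.1 kHz, 2048-point FFT, overlap 4, 1.3 min run.
mine = throughput(300.0, 78.0, 44100, 2, 2048, 4)
audacity_rtf = 15.0 / 300.0   # "less than 15 seconds" for the same file
for key, value in mine.items():
    print(f"{key}: {value:.4g}" if isinstance(value, float) else f"{key}: {value}")
print(f"audacity real_time_factor: < {audacity_rtf:.2f}")
```

On those assumptions the 1.3-minute run is about 0.26 times real time, roughly 1.5 ms per frame against a budget of about 5.8 ms, while the Audacity figure corresponds to under 0.05 times real time.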
Reply by r b-j, August 22, 2005
What sort of content are you pitch shifting? A single voice or monophonic instrument? Or some full-bandwidth mixed orchestral music? There is PSOLA, WSOLA, all sorts of SOLA and OLA, and they can work pretty good for monophonic harmonic tones.

-- 
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
Reply by Himanshu, August 22, 2005
Hi! I am working on full-bandwidth mixed orchestral music.

> there is PSOLA, WSOLA, all sorts of SOLA and OLA, and they can work
> pretty good for monophonic harmonic tones.

Won't they work on orchestral music?

Regards
--Himanshu
Reply by r b-j, August 22, 2005
Himanshu wrote:
> Won't they work on orchestral music?

It will have glitches, and if the amount of shifting is large, many glitches per second. All of these synchronous overlap-add methods work by examining the audio for similarities in the waveform (for quasi-periodic sounds this amounts to a pitch detector) and then splicing in (for up-shifting) extra cycles or periods, or splicing out (for down-shifting) unwanted cycles or periods of the quasi-periodic waveform. If there *is* no matching similarity (which is the case if the waveform is not periodic in any way), it will look for the best match it can find, but the splice won't be seamless.

-- 
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
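The procedure described here (find the best waveform self-similarity, then splice in a whole period) can be sketched in a few lines of Python. This is a bare-bones illustration under assumptions made only for this example: normalised autocorrelation as the similarity measure, a linear crossfade at the join, and hypothetical function names. It is not a complete PSOLA or WSOLA implementation.

```python
import math


def best_period(x, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the highest normalised autocorrelation,
    i.e. the best waveform self-similarity (a crude pitch-period detector)."""
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        a, b = x[:-lag], x[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b)) or 1.0
        if num / den > best_score:
            best_lag, best_score = lag, num / den
    return best_lag, best_score


def splice_in_period(x, at, period, fade):
    """Lengthen x by one period: replay x[at:at + period] once more, with a
    linear crossfade of `fade` samples where the repeat rejoins the signal.
    Requires at + period + fade <= len(x)."""
    out = x[:at + period]
    for i in range(fade):
        w = (i + 1) / (fade + 1)  # weight of the repeated material
        out.append((1 - w) * x[at + period + i] + w * x[at + i])
    out.extend(x[at + fade:])
    return out


FS = 8000
x = [math.sin(2 * math.pi * 200.0 * n / FS) for n in range(800)]  # period = 40 samples
period, score = best_period(x, 20, 60)
y = splice_in_period(x, 100, period, 16)
print(period, round(score, 3), len(x), len(y))
```

For a pure tone the best score is 1 and the splice is seamless: `y` is simply a longer copy of the same sine. For a waveform that is not periodic, `best_period` still returns its best lag, but the score stays below 1, the two sides of the crossfade no longer agree, and that mismatch is the glitch.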
Reply by rhn, August 22, 2005
Changing the pitch without changing the duration, and changing the duration without changing the pitch, are related processes. If you can do one, you can often get the other just by resampling, i.e. by playing back the time- or frequency-modified waveform at a different sample rate than its original.

For instance, take 1 second of a 100 Hz sine wave sampled at 44100 Hz, copy 10 cycles of the sine wave and insert them at a convenient zero crossing: the intermediate result is a 100 Hz tone with a duration of 1.1 seconds. Output this result at 48510 samples per second and playback will take 1 second, the same as the original, but will sound like a 110 Hz tone. More commonly, one would resample the 48510 samples/s signal to 44100 samples/s, giving the same final result after the anti-alias filter.

Try the same with the sum of multiple close but relatively prime frequencies and you will see one difficulty with using PSOLA and similar algorithms on full polyphonic sound sources. And of course, if the time or frequency shift is large, then you will need to make sure the passband and artifact frequencies of the initial shifting algorithm end up in the right place after the final resampling.

IMHO. YMMV.
-- 
rhn A.T nicholson d.O.t C-o-M
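The worked example above can be checked numerically. This stdlib-only Python sketch (all function names are hypothetical) duplicates 10 cycles at the zero crossing at 0.5 s, then squeezes the 48510-sample result back to 44100 samples. Linear interpolation stands in for a proper band-limited resampler; that is adequate for a 100 Hz tone, but real material would need the anti-alias filter mentioned above. The last lines repeat the same splice on a 100 Hz + 113 Hz mixture: 10 cycles of 100 Hz are 11.3 cycles of 113 Hz, so the splice no longer lands on a matching point.

```python
import math

FS = 44100


def tone(freqs, n_samples, fs=FS):
    """Sum of unit-amplitude sines at the given frequencies."""
    return [sum(math.sin(2 * math.pi * f * n / fs) for f in freqs) for n in range(n_samples)]


def duplicate_segment(x, start, length):
    """Insert a copy of x[start:start + length] right after itself (plain splice, no crossfade)."""
    return x[:start + length] + x[start:]


def seam_jump(x, start, length):
    """Size of the step at the splice point that duplicate_segment creates."""
    return abs(x[start] - x[start + length - 1])


def resample_linear(x, n_out):
    """Stretch or squeeze x to n_out samples by linear interpolation (no anti-alias filter)."""
    step = len(x) / n_out
    out = []
    for j in range(n_out):
        t = j * step
        i = int(t)
        frac = t - i
        nxt = x[min(i + 1, len(x) - 1)]
        out.append((1 - frac) * x[i] + frac * nxt)
    return out


def rising_crossings(x):
    """Count negative-to-non-negative transitions (roughly one per cycle)."""
    return sum(1 for a, b in zip(x, x[1:]) if a < 0 <= b)


cycle = FS // 100                                         # 441 samples per cycle of 100 Hz
sine = tone([100.0], FS)                                  # 1 s of 100 Hz
longer = duplicate_segment(sine, 50 * cycle, 10 * cycle)  # 48510 samples: 1.1 s of 100 Hz
shifted = resample_linear(longer, FS)                     # back to 1 s: about 110 Hz
print(len(longer), rising_crossings(longer), rising_crossings(shifted))

# Same splice on two close, relatively prime tones.
mix = tone([100.0, 113.0], FS)
print(seam_jump(sine, 50 * cycle, 10 * cycle), seam_jump(mix, 50 * cycle, 10 * cycle))
```

The stretched signal keeps 100 Hz over 1.1 s, and the resampled one shows about 110 cycles in 1 s. For the single sine the step at the seam is no larger than an ordinary sample-to-sample step, while for the mixture it is close to the full amplitude of the 113 Hz component: an audible click.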