Hi all,

Been trying to sort out the details of phase-vocoder-based audio time stretching. I basically understand the business of FFTs, and of resynthesizing with the windowed segments spaced farther apart (having adjusted appropriately for phase, of course).

What I don't understand is this: when you change the overlap of the windows to stretch the sound, why don't you get amplitude modulation (or do you?), since there should be 'dips' in the amplitude of the resultant signal? And what do you do if you want to stretch so much that the windows don't overlap anymore?

Am I misunderstanding this, and you need to interpolate between the phases and magnitudes of the input frames?

Thanks,

Dave
Help with PV Time-stretch
Started by ●July 3, 2006
Reply by ●July 3, 2006
>What I don't understand is this: When you change the overlap of the
>windows to stretch the sound, why don't you get amplitude modulation (or
>do you?) since there should be 'dips' in the amplitude of the resultant
>signal, and what do you do if you want to stretch so much that the windows
>don't overlap anymore?
>
>Am I misunderstanding this and you need to interpolate between the phases
>and magnitudes of the input frames?

Hi,

You cannot just change the amount of overlap. There are different implementations and strategies which fit into the category "phase vocoder". It is even possible to do the resynthesis without framewise overlap-add, but instead by calculating breakpoint envelopes for the amplitudes and frequencies of sine waves produced by an oscillator bank.

If you want to do overlap-add you can still use breakpoint envelopes (e.g. built by linear interpolation between the frames). You can go from phase information to frequency information (for every bin), and you can calculate the phases for your resynthesis frames from that. So the overlap can stay the same as in the analysis, but the frames are constructed with bins which follow the shape of your frequency and magnitude envelopes, slowed down or sped up.

1. Window your buffer (Hann should be fine)
2. Rearrange your buffer before the analysis so that the
   middle point is at the beginning (otherwise your phase
   information will be very bad)
3. Maybe zero-pad (but the last sample of your buffer is now
   in the middle, so you have to stuff your zeros in the middle)
4. FFT
5. To convert from phase information to frequency information,
   check the phase differences between successive frames,
   and/or try some sort of peak finding to find one sinusoid
   for a peak which leaks into multiple bins of your FFT.

gr.
Anton
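The analysis steps above (minus the optional zero-padding) can be sketched in a few lines of numpy; the function name and structure here are illustrative, not from any particular implementation:

```python
import numpy as np

def analyze_frame(frame):
    """One phase-vocoder analysis frame, following steps 1-4 above.

    `frame` is a 1-D array of N samples; zero-padding (step 3) is
    omitted here, but it would go in the middle of `rotated`.
    """
    N = len(frame)
    # 1. Window the buffer (Hann)
    windowed = frame * np.hanning(N)
    # 2. Rotate the buffer so the window's midpoint lands at index 0
    #    ("zero-phase windowing")
    rotated = np.roll(windowed, -(N // 2))
    # 4. FFT; keep magnitude and phase for every bin
    spectrum = np.fft.rfft(rotated)
    return np.abs(spectrum), np.angle(spectrum)
```

Feeding it a sinusoid sitting on bin k should put the magnitude peak at bin k, with the phases ready for the frame-to-frame differencing of step 5.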
Reply by ●July 3, 2006
>[earlier discussion snipped]

I once experimented with phase vocoding, and maybe the best way to get started is by looking at working implementations.

In Matlab:
http://labrosa.ee.columbia.edu/matlab/pvoc/

In C:
http://quitte.de/dsp/pvoc.html
http://www.cs.bath.ac.uk/~jpff/NOS-DREAM/researchdev/pvocex/pvocex.html

Another approach (not a standard phase vocoder, but related):
http://www.cerlsoundgroup.org/Loris/

A paper by Miller Puckette:
Puckette, M. 1995. "Phase-locked Vocoder." Proceedings, IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics.
http://www.iem.at/projekte/dsp/hammer/hammer.pdf

http://www.iua.upf.es/mtg/publications/OrganizedSound.5.3.pdf

good luck,
Anton
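The phase-difference-to-frequency conversion described above is the standard phase-vocoder frequency estimate; a numpy sketch (function and parameter names are mine):

```python
import numpy as np

def bin_frequencies(phase_prev, phase_curr, hop, n_fft, fs):
    """Estimate the frequency (in Hz) carried by each FFT bin from the
    phase advance between two analysis frames `hop` samples apart."""
    k = np.arange(len(phase_prev))                  # bin indices
    # Phase advance a sinusoid exactly at each bin centre would show
    expected = 2.0 * np.pi * k * hop / n_fft
    # Measured deviation from that, wrapped into (-pi, pi]
    deviation = phase_curr - phase_prev - expected
    deviation -= 2.0 * np.pi * np.round(deviation / (2.0 * np.pi))
    # Bin centre plus the deviation, converted to Hz
    return (k + deviation * n_fft / (2.0 * np.pi * hop)) * fs / n_fft

# A 1003 Hz sinusoid at fs = 8000 Hz falls between the bin centres of a
# 1024-point FFT; the phase difference recovers it anyway.
fs, n_fft, hop = 8000, 1024, 256
t = np.arange(n_fft + hop)
x = np.cos(2 * np.pi * 1003.0 * t / fs)
p1 = np.angle(np.fft.rfft(x[:n_fft]))
p2 = np.angle(np.fft.rfft(x[hop:hop + n_fft]))
freqs = bin_frequencies(p1, p2, hop, n_fft, fs)
print(freqs[128])   # close to 1003.0 (the bin centre alone would say 1000.0)
```

This per-bin estimate is what the peak-finding refinement builds on: the bins under one peak all report (roughly) the same frequency.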
Reply by ●July 4, 2006
dave-compdsp wrote:
> What I don't understand is this: When you change the overlap of the
> windows to stretch the sound, why don't you get amplitude modulation (or
> do you?) since there should be 'dips' in the amplitude of the resultant
> signal, and what do you do if you want to stretch so much that the windows
> don't overlap anymore?

when you time-scale audio, you are changing the adjacent window (or frame) spacing.  the spacing *after* this change should be such that the windows overlap complementarily (one window going down while the next one comes up by the same amount).  that means that for reconstruction there should be 50% overlap: exactly two windowed frames of sound overlapping, with one window going up while the other goes down.

if you're time-stretching, then, in the input data, the windows are spaced closer than they will be upon reconstruction.  that means more than 50% overlap: you have more than two windows overlapping at a time.

if you're compressing, then, in the input data, the windows are spaced further apart than they will be upon reconstruction.  that means less than 50% overlap: you have two or no windows overlapping at a time.

r b-j
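The complementary-overlap point above is easy to check numerically: Hann windows spaced at 50% overlap sum to a constant, so at the reconstruction spacing there are no amplitude dips. A small numpy sketch:

```python
import numpy as np

N = 512
hop = N // 2                     # 50% overlap at reconstruction
win = np.hanning(N + 1)[:-1]     # "periodic" Hann: 0.5 - 0.5*cos(2*pi*n/N)

# Overlap-add bare windows at the reconstruction spacing
total = np.zeros(4 * N)
for start in range(0, len(total) - N + 1, hop):
    total[start:start + N] += win

# Away from the edges (where fewer windows reach) the sum is flat
middle = total[N:-N]
print(middle.min(), middle.max())   # both 1.0, up to float rounding
```

Note the "periodic" Hann (denominator N rather than N-1): with that variant the 50%-overlap sum is exactly constant, which is why it is the usual choice for overlap-add.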
Reply by ●July 4, 2006
Thanks Anton, for your reply. A question:

>>1. Window your buffer (hanning should be fine)
>>2. Rearrange your buffer before the analysis so that the
>>   middle point is at the beginning (otherwise your phase
>>   information will be very bad)

I'm not sure I understand this. Is there a short answer to why it's necessary?

>[implementation links snipped]

Thanks for the tips. I'll look at these.

Dave
Reply by ●July 4, 2006
Ah, thanks! That should have been obvious.

Dave

>when you time-scale audio, you are changing the adjacent window (or
>frame) spacing.  the spacing *after* this change should be such that
>the windows overlap complementarily (one window going down while the
>next one comes up by the same amount).  that means that for
>reconstruction there should be 50% overlap.
>[...]
>
>r b-j
Reply by ●July 5, 2006
>Thanks Anton, for your reply. A question:
>
>>>1. Window your buffer (hanning should be fine)
>>>2. Rearrange your buffer before the analysis so that the
>>>   middle point is at the beginning (otherwise your phase
>>>   information will be very bad)
>
>I'm not sure I understand this. Is there a short answer to why it's
>necessary?

You need to have the midpoint of the window at the time-origin. The phase information from the FFT corresponds to the phase of your components at the time-origin.

I *think* the effect on the phase has to do with the fact that in the frequency domain the windowing is a convolution with

[0.5, -0.25, 0, 0, ..., -0.25] for the non-rearranged von Hann window
[0.5,  0.25, 0, 0, ...,  0.25] for the rearranged one

and as long as there are just positive real numbers in the convolution kernel, this should not have any effect on the phase. But... ooph, I am totally unsure about this explanation. I hope somebody will correct me here or give a better explanation.

You can read about it here:
http://ccrma.stanford.edu/~jos/parshl/Filling_FFT_Input_Buffer.html
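The two kernels above can be verified directly: the DFT of a periodic Hann window has values 0.5 and -0.25 at bins 0 and +/-1, and circularly rotating the window by half its length multiplies bin k by (-1)**k, flipping the minus signs away. A numpy check (N = 8 is arbitrary, chosen small for display):

```python
import numpy as np

N = 8
n = np.arange(N)
win = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)  # periodic von Hann window

# As-is: bins 0 and +/-1 hold [0.5, -0.25] -- the kernel with minus signs
W = np.fft.fft(win) / N
print(np.round(W.real, 3))

# Midpoint moved to index 0: a circular shift by N/2 multiplies bin k
# by (-1)**k, so the kernel becomes the all-positive [0.25, 0.5, 0.25]
# and the window's spectrum is real and non-negative (zero phase)
W_rot = np.fft.fft(np.roll(win, -(N // 2))) / N
print(np.round(W_rot.real, 3))
```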
Reply by ●July 6, 2006
"dave-compdsp" <dsp@oink.co.uk> writes:
>>>1. Window your buffer (hanning should be fine)
>>>2. Rearrange your buffer before the analysis so that the
>>>   middle point is at the beginning (otherwise your phase
>>>   information will be very bad)
>
> I'm not sure I understand this. Is there a short answer to why it's
> necessary?

To bring the center of the windowed data to the time-origin. This is called zero-phase windowing.

It always made intuitive sense to me, because the FFT phase information corresponds to the phase at the beginning of the buffer (the time-origin), and the window has its strongest suppression at this point (down to 0 in the case of a von Hann window). And for a Hann window the cosine is at phase pi at this point (because it is -0.5*cos(..)). But now that you ask, I realize that this is not an explanation.

Some thoughts about it: multiplication with the window in the time domain is a convolution in the frequency domain. So looking at the window for both cases gives

for the non-rearranged case:
  time domain:              0.5 - 0.5 * cos(2*pi*n/(N-1))
  freq. domain convolution: [-0.25, 0.5, -0.25]

for the rearranged case:
  time domain:              0.5 + 0.5 * cos(2*pi*n/(N-1))
  freq. domain convolution: [ 0.25, 0.5,  0.25]

The only difference is the minus signs. I really hope somebody will enlighten me with a real explanation. What is the effect on the phase when you apply a Hann window before taking the FFT, and how does rearranging the windowed data, so that the midpoint of the window is at the time-origin, improve the result?

Some resources for the topic:

F. J. Harris, "On the use of windows for harmonic analysis with the discrete Fourier transform," Proceedings of the IEEE, vol. 66, no. 1, pp. 51-83, 1978. (I am not an IEEE member, so I do not have access to this paper.)

Here are some pages which talk about how to do zero-phase windowing, but there is no in-depth explanation why:
http://ccrma.stanford.edu/~jos/parshl/Filling_FFT_Input_Buffer.html
http://www.iua.upf.es/~xserra/articles/msm/computation.html
http://www.dsprelated.com/showmessage/41851/1.php
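One concrete way to see what the sign flip does to measured phases: for a sinusoid sitting exactly on a bin, the Hann kernel's negative taps shift the phase of the two bins flanking the peak by pi, so the reported phase jumps across the peak; rotating the windowed buffer by half a frame removes those sign flips and all three bins agree. A numpy sketch (the bin number and phase value are arbitrary; the bin is chosen even so the rotation does not itself shift the sinusoid's phase):

```python
import numpy as np

N = 64
k0, phi = 10, 0.7                 # sinusoid exactly on bin k0, phase phi
n = np.arange(N)
x = np.cos(2 * np.pi * k0 * n / N + phi)
win = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)   # periodic Hann

plain = np.angle(np.fft.fft(win * x))
zero_phase = np.angle(np.fft.fft(np.roll(win * x, -(N // 2))))

# Without rearranging, the bins flanking the peak are off by pi;
# with the rotation, all three bins around the peak report phi.
print(plain[k0 - 1:k0 + 2])
print(zero_phase[k0 - 1:k0 + 2])
```

For frequencies between bin centres the peak energy straddles several bins, so those pi jumps land right where the phase-vocoder phase measurements are taken, which is presumably what "your phase information will be very bad" referred to.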
Reply by ●July 6, 2006
Anton wrote:
...
> for the non-rearranged case:
...
>   freq. domain convolution: [-0.25, 0.5, -0.25]

A high-pass binomial (approximation to Gaussian) filter of order 2.

> for the rearranged case:
...
>   freq. domain convolution: [ 0.25, 0.5,  0.25]

A low-pass binomial filter of order 2. What's the significance? Beats me! Dilip? Clay? Anybody?

Jerry
--
Engineering is the art of making what you want from things you can get.
Reply by ●July 7, 2006
Anton <bantone@casema.nl> writes:
> To bring the center of the windowed data into the time-origin.
> This is called zero-phase windowing.

Hi,

I have to admit I was just following advice that a teacher gave to me, and I saw that other people are using this kind of technique (PARSHL, SMS tools, etc.). I thought it made sense, but I cannot give any reason why this is an advantage. So I take back my statement: "otherwise your phase information will be very bad."

The transform of the rearranged window contains just positive real numbers. Why is that good? I don't know. I have to find my old sources from when I was playing around with phase vocoding.

What is the issue with zero-phase windowing?

gr.
Anton






