Hi All,

Has anyone else played around with "zero phase" windows for STFT?

The idea is that instead of the "conventional" way of applying the window, where the zeroth window coefficient is applied to the zeroth input sample in the FFT buffer and all of the zero padding (if any) goes at the end, with a zero phase window you apply the centre coefficient of the window to the zeroth input sample and index the window circularly, so that any zero padding sits in the centre of the FFT buffer.

The reason I ask is that I've noticed a few interesting side effects of this approach. The most interesting is that if you work out the "real" frequency at each FFT bin by phase differentiation (pretty much the phase vocoder approach), you see "clusters" of bins that all register the same "real" frequency.

This has two advantages:

The first is that if you want to carry out peak picking to identify partials, you get much less frequency jitter than with a conventional window in the case where poor resolution causes the peak to move up and down a bin or two between successive frames.

The second advantage is really cool - you can identify partials that aren't actually FFT peaks, e.g. partials obscured by spectral smearing. What I've done is write a loop which looks for frequency clusters, and if it finds one it then looks for a frequency cluster around the first and second harmonic. The reason for the harmonic test is that this approach isn't completely infallible. However, with a few heuristics like the harmonic test it has given significantly higher perceived frequency resolution for low pitches in my sinusoidal modelling application than I would otherwise have been able to get with a given window size - which has obvious time resolution benefits.

Anyway - I just wanted to see if anyone else has tried this. I've seen the zero phase window used in a couple of places, but I've not seen anyone else use the "frequency clustering" property.

Interested in comments.

TTFN
Fraser.
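The buffer layout described in the post can be sketched as follows. This is only an illustration under stated assumptions: NumPy, an odd-length symmetric window (so that there is an exact centre coefficient), and the hypothetical names `zero_phase_frame`, `centre` and `n_fft`, none of which come from the original post.

```python
import numpy as np

def zero_phase_frame(x, centre, window, n_fft):
    """Window the samples around x[centre] and lay them out 'zero phase' style.

    The centre tap lands in buffer[0], the right half follows it, the left
    half wraps around to the end, and any zero padding sits in the middle.
    """
    m = len(window)
    assert m % 2 == 1 and n_fft >= m  # assumption: odd window length
    half = m // 2
    seg = x[centre - half : centre + half + 1] * window
    buf = np.zeros(n_fft)
    buf[: half + 1] = seg[half:]       # centre tap and right half at the start
    buf[n_fft - half :] = seg[:half]   # left half wrapped to the end
    return buf

# Example: a cosine that is symmetric about the frame centre.
n = np.arange(256)
x = np.cos(2.0 * np.pi * 0.07 * (n - 128))
buf = zero_phase_frame(x, 128, np.hanning(65), 128)
spectrum = np.fft.fft(buf)
```

With a symmetric window and a signal that is symmetric about the frame centre, the buffer is circularly even, so the imaginary part of `spectrum` is zero to within rounding; that is where the name "zero phase" comes from.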
"zero phase" FFT windows
Started by ●September 11, 2005
Reply by ●September 11, 2005
FA wrote:

> Hi All,
> Has anyone else played around with "zero phase" windows for STFT?
>
> The idea here is that instead of the "conventional" way of applying the
> window where the zero'th window coefficient is applied to the zero'th input
> sample in the FFT buffer etc. and all of the zero padding (if any) is
> applied at the end, with a zero phase window you apply the centre
> coefficient of the window to the zero'th input sample and circular index the
> window applying any zero padding in the centre of the FFT buffer.

OK... the center window coefficient of an M length window is applied to x[0]. You use the wrap-around effect to map the "leading edge" of the window to the end of the data buffer... so far so good.

What part of the data window is mapped to the end? x[-M/2]..x[-1]?

> The reason that I ask is because I've noticed a few interesting side effects
> of this approach - the most interesting of which is that if you work out the
> "real" frequency at each FFT bin by phase differentiation (pretty much using
> the phase vocoder approach) what you will see is that there are "clusters"
> of bins that all register the same "real" frequency.

I don't understand. What do you mean by "real frequency"? How is this approach different from examining the magnitude of the spectrum?

> This has two advantages:
>
> The first is if you want to carry out peak picking to identify partials then
> you will get much less frequency jitter than a conventional window in the
> case where poor resolution causes subsequent frames to move up and down a
> bin or two.

How do "conventional windows" shift the frequency? It is well known that windowing broadens the frequency peaks, but do they shift?

> The second advantage is really cool - basically you can identify partials
> that aren't actually FFT peaks - e.g. partials obscured by spectral
> smearing. What I've done is to write a loop which looks for frequency
> clusters and if it finds one it then looks for a frequency cluster around
> the first and second harmonic - the reason for the harmonic test is that
> this approach isn't completely infallible, however with a few heuristics
> like the harmonic test it has managed to give significantly higher perceived
> frequency resolution for low pitches when applied to my sinusoidal modelling
> application than I would otherwise have been able to get with a given window
> size - which has obvious time resolution benefits.

I don't understand what you mean. What is a "partial"? What is a "frequency cluster"?

> Anyway - I just wanted to see if anyone else has tried this, I've seen the
> zero phase window used in a couple of places, but I've not seen anyone else
> use the "frequency clustering" property.

I don't understand much of what you do here. As far as I can tell, doing a time shift in the DFT as you do here would only affect the phase term of the spectrum, not the magnitudes. So basically, I don't see any reason why anything interesting should happen here.

If, on the other hand, you have been examining the real parts of the spectrum, I would not be surprised if you see some effects.

Could you provide some more details about what you do, and how?

Rune
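The point about a time shift affecting only the phase can be checked numerically. The sketch below assumes NumPy and arbitrary illustrative sizes (`m`, `n_fft`). It builds a conventionally laid-out buffer, circularly rotates it into the zero phase layout, and compares the two spectra using the DFT shift theorem.

```python
import numpy as np

m, n_fft = 65, 256          # illustrative sizes, not from the thread
half = m // 2
rng = np.random.default_rng(0)
seg = rng.standard_normal(m) * np.hanning(m)   # any windowed frame will do

conventional = np.zeros(n_fft)
conventional[:m] = seg                          # zero padding at the end
zero_phase = np.roll(conventional, -half)       # centre tap moved to index 0

X_conv = np.fft.rfft(conventional)
X_zp = np.fft.rfft(zero_phase)

# Shift theorem: rotating the buffer left by `half` samples multiplies
# bin k by exp(+j*2*pi*k*half/n_fft); the magnitudes are untouched.
k = np.arange(len(X_conv))
predicted = X_conv * np.exp(2j * np.pi * k * half / n_fft)

print(np.allclose(np.abs(X_conv), np.abs(X_zp)))
print(np.allclose(X_zp, predicted))
```

The extra phase term depends only on the bin index and the window length, so it is identical in every frame and drops out of a frame-to-frame phase difference.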
Reply by ●September 11, 2005
FA wrote:

> Hi All,
> Has anyone else played around with "zero phase" windows for STFT?
>
> The idea here is that instead of the "conventional" way of applying the
> window where the zero'th window coefficient is applied to the zero'th input
> sample in the FFT buffer etc. and all of the zero padding (if any) is
> applied at the end, with a zero phase window you apply the centre
> coefficient of the window to the zero'th input sample and circular index the
> window applying any zero padding in the centre of the FFT buffer.
>
> The reason that I ask is because I've noticed a few interesting side effects
> of this approach - the most interesting of which is that if you work out the
> "real" frequency at each FFT bin by phase differentiation (pretty much using
> the phase vocoder approach) what you will see is that there are "clusters"
> of bins that all register the same "real" frequency.
> ...

I am a bit puzzled by this observation; I have found this clustering happens in all the phase vocoders I have studied (e.g. the CARL one written by Mark Dolson, which I have based all my own work on), and I have assumed that it is a natural emergent feature of the delta-phase calculation between frames (in pvoc, to determine amplitude+frequency for each bin). So I am intrigued that there could be a phase vocoder where this doesn't happen - it seems like a contradiction in terms. None of the pvocs that I am aware of does zero-padding (maybe they should!), but some do "double-windowing" (an option in CARL pvoc, the default in F.R.Moore pvoc).

The clustering of bins around a peak has been described by several authors in the context of reducing phase vocoder transient smearing. A paper by Miller Puckette (of PD fame) on the "phase-locked vocoder" was followed by a paper by Dolson and Jean Laroche offering a more advanced method; both seem to exploit this clustering effect (though Dolson and Laroche describe their calculations in terms of phase tracking rather than frequency bunching; whether this is a meaningful distinction is another matter).

So you may find it useful to study the Dolson/Laroche paper:

"Improved Phase Vocoder Time-Scale Modification of Audio",
IEEE Transactions on Speech and Audio Processing, Vol. 7, No. 3, 1999.

If you Google on that title you will find lots of other useful related material, as that paper has been very widely cited.

A modern source for Dolson's phase vocoder is the Csound sources (look for the streaming pvoc opcodes), or my own updated versions at:

http://dream.cs.bath.ac.uk/researchdev/pvocex/pvocex.html

I would be interested to have your observations on this in the context of your "conventional" description - I thought CARL pvoc was already "conventional"!

Note that the method Dolson and Laroche describe is patented on behalf of Creative Labs/Emu; it is employed for example in the "Audigy" range of soundcards for their time-scaling facilities. I would have to re-read all the literature on partial tracking to be sure that bin clustering per se has not been exploited, but in all the pvocs I know of it is such an obvious phenomenon that it is most unlikely it has escaped attention. The main problem, as you indicate, is that of tracking weak partials (which may be genuine even if not harmonically related to a suspected fundamental). So the research interest is not so much that it happens, but how best and most accurately to exploit the fact that it happens.

Richard Dobson
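The delta-phase calculation mentioned above, and the clustering it produces, can be illustrated with a textbook-style sketch. It assumes NumPy, a Hann window and ordinary (non-zero-phase) framing; `bin_frequencies` and the parameter values are illustrative and are not taken from CARL pvoc or any other implementation named in this thread.

```python
import numpy as np

def bin_frequencies(prev_phase, phase, hop, n_fft, sr):
    """Per-bin frequency estimate (Hz) from the phase advance between two frames."""
    k = np.arange(len(phase))
    expected = 2.0 * np.pi * k * hop / n_fft        # advance of a bin-centred sinusoid
    dev = phase - prev_phase - expected              # deviation from that advance
    dev = (dev + np.pi) % (2.0 * np.pi) - np.pi      # wrap to [-pi, pi)
    return (k + dev * n_fft / (2.0 * np.pi * hop)) * sr / n_fft

sr, n_fft, hop = 44100, 1024, 256
t = np.arange(n_fft + hop) / sr
x = np.sin(2.0 * np.pi * 1000.0 * t)                 # single 1 kHz test tone
w = np.hanning(n_fft)

p0 = np.angle(np.fft.rfft(x[:n_fft] * w))
p1 = np.angle(np.fft.rfft(x[hop : hop + n_fft] * w))
freqs = bin_frequencies(p0, p1, hop, n_fft, sr)

# Bins under the main lobe of the 1 kHz tone (around bin 23).
print(np.round(freqs[21:27], 1))
```

The bins that fall under the main lobe of the tone are all dominated by the same sinusoid, so their phases advance at the same rate and they report nearly the same frequency. Bins further away are limited by the phase wrap, which only resolves deviations of up to sr / (2 * hop) from the bin centre.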
Reply by ●September 14, 2005
Hi Richard,

As you rightly observe, what I'm doing is pretty similar to a phase vocoder. Basically I'm applying the phase vocoder phase differentiation to calculate the "true" frequency at each bin.

My app isn't a true phase vocoder: although I use the phase differentiation that one does in a phase vocoder, I am actually using McAulay-Quatieri Sinusoidal Modelling. That is to say, I do peak detection and pruning (using a psychoacoustic model), but the frequency (in my magnitude/frequency/phase parameterisation of each partial) is obtained using the pvoc phase differentiation approach.

With respect to the zero phase window, my observation was simply that using this type of window instead of the more conventional window gives less frequency jitter when tracking between frames. I suspect that this is less of an issue in a conventional pvoc because all of the bins are considered, whereas in Sinusoidal Modelling only those bins corresponding to true sinusoidal partials are considered for modification and resynthesis.

Apologies - my posting was intended to stimulate discussion, but as you observe I had combined two concepts, the zero phase window and the frequency clustering effect, which probably just confused matters :-(

I'll take a look at some of the references you have mentioned. I'm interested to see how others have made use of this phenomenon - as you say, it's such a noticeable effect when pvoc'ing. It has certainly given noticeable improvements to my pitch shifting application, so I reckon that there is at least some mileage in it.

Regards,
Frase.
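The combination described in this reply (magnitude peak picking, with each peak's frequency taken from the phase difference between frames) might look roughly like the sketch below. It assumes NumPy; `pick_partials` is a hypothetical name, and the simple amplitude floor `floor_db` is only a crude stand-in for the psychoacoustic pruning mentioned in the post, which is not described in enough detail to reproduce.

```python
import numpy as np

def pick_partials(spec0, spec1, hop, n_fft, sr, floor_db=-20.0):
    """Return (magnitude, frequency_hz, phase) for each magnitude peak of spec1."""
    mag = np.abs(spec1)
    k = np.arange(len(spec1))
    # Phase-vocoder style frequency estimate for every bin.
    dev = np.angle(spec1) - np.angle(spec0) - 2.0 * np.pi * k * hop / n_fft
    dev = (dev + np.pi) % (2.0 * np.pi) - np.pi
    freq = (k + dev * n_fft / (2.0 * np.pi * hop)) * sr / n_fft
    # Local maxima above an amplitude floor (assumption: replaces the
    # psychoacoustic pruning used in the real application).
    thresh = mag.max() * 10.0 ** (floor_db / 20.0)
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:]) & (mag[1:-1] > thresh)
    idx = np.flatnonzero(is_peak) + 1
    return [(mag[i], freq[i], np.angle(spec1[i])) for i in idx]

sr, n_fft, hop = 44100, 1024, 256
t = np.arange(n_fft + hop) / sr
x = np.sin(2.0 * np.pi * 440.0 * t) + 0.5 * np.sin(2.0 * np.pi * 1500.0 * t)
w = np.hanning(n_fft)
s0 = np.fft.rfft(x[:n_fft] * w)
s1 = np.fft.rfft(x[hop : hop + n_fft] * w)
partials = pick_partials(s0, s1, hop, n_fft, sr)
```

Lowering `floor_db` far enough lets window sidelobes through as spurious peaks, which is the sidelobe-versus-partial problem that the pruning stage in a sinusoidal model has to deal with.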
Reply by ●September 14, 2005
Howdy Rune,

If you take a look at Richard Dobson's reply in this thread, he's on the right track. When I refer to "true" frequencies I mean the frequencies obtained at each bin using the phase vocoder phase differentiation approach.

"Partial" is quite a common term in Sinusoidal Modelling. It refers simply to spectral peaks, or more accurately to real spectral peaks as opposed to sidelobe peaks; the trick in Sinusoidal Modelling is to extract the partials but not the sidelobes.

The frequency clustering that I refer to is an artefact of the phase vocoder algorithm, and as I said in my original post I've made use of that effect in my application to identify likely partials that aren't actually spectral peaks. (When I said above that partials are spectral peaks, what I was trying to say was that partials are perceptually significant sinusoidal components - impulses in the frequency domain, if you like.)

Regards,
Frase.
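The cluster search and harmonic test are only described in words in this thread, so the following is one possible reading rather than the poster's actual loop. It assumes NumPy and a list of per-bin frequency estimates (for instance from a phase vocoder style calculation). The names `find_clusters` and `has_harmonic_support`, the tolerances, and the interpretation of "first and second harmonic" as 2x and 3x the candidate frequency are all assumptions.

```python
import numpy as np

def find_clusters(freqs, tol_hz=2.0, min_bins=3):
    """Mean frequency of each run of >= min_bins adjacent bins agreeing within tol_hz."""
    clusters, start = [], 0
    for i in range(1, len(freqs) + 1):
        # Close the current run at the end of the array or when neighbours disagree.
        if i == len(freqs) or abs(freqs[i] - freqs[i - 1]) > tol_hz:
            if i - start >= min_bins:
                clusters.append(float(np.mean(freqs[start:i])))
            start = i
    return clusters

def has_harmonic_support(f0, clusters, rel_tol=0.03):
    """True if clusters also exist near 2*f0 and 3*f0 (assumed form of the harmonic test)."""
    return all(
        any(abs(c - h * f0) <= rel_tol * h * f0 for c in clusters)
        for h in (2, 3)
    )

# Synthetic per-bin estimates: three clusters near 100, 200 and 300 Hz.
freqs = np.array([5.0, 60.0, 100.0, 100.5, 99.8, 170.0, 200.2, 200.0,
                  199.9, 260.0, 300.1, 300.0, 299.7, 400.0])
clusters = find_clusters(freqs)
supported = [c for c in clusters if has_harmonic_support(c, clusters)]
```

In this synthetic example only the lowest cluster passes the harmonic test, because the clusters near 200 Hz and 300 Hz have no clusters at their own multiples. As noted earlier in the thread, a test like this would also reject genuine partials that are not harmonically related to a fundamental.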