DSPRelated.com
Forums

"zero phase" FFT windows

Started by FA September 11, 2005
Hi All,
Has anyone else played around with "zero phase" windows for STFT?

The idea here is that instead of the "conventional" way of applying the
window where the zero'th window coefficient is applied to the zero'th input
sample in the FFT buffer etc. and all of the zero padding (if any) is
applied at the end, with a zero phase window you apply the centre
coefficient of the window to the zero'th input sample and circular index the
window applying any zero padding in the centre of the FFT buffer.
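
To make the buffer layout concrete, here is a minimal Python sketch of the packing described above. The function name `zero_phase_buffer` and the restriction to an odd window length (so that the window has a single centre coefficient) are assumptions made for illustration; this is not taken from any particular STFT implementation.

```python
def zero_phase_buffer(frame, window, n_fft):
    """Window `frame` and rotate it so the window centre sits at index 0.

    Assumes len(frame) == len(window), an odd window length, and
    n_fft >= len(window). Any zero padding ends up in the middle of the buffer.
    """
    m = len(window)
    if len(frame) != m or m % 2 == 0 or n_fft < m:
        raise ValueError("need len(frame) == len(window), odd length, n_fft >= length")
    half = m // 2
    buf = [0.0] * n_fft
    for i in range(m):
        # Sample i sits (i - half) samples from the window centre; negative
        # offsets wrap around to the end of the FFT buffer.
        buf[(i - half) % n_fft] = frame[i] * window[i]
    return buf


# Toy example: a 5-point triangular window on a constant frame, 8-point buffer.
print(zero_phase_buffer([1.0] * 5, [1.0, 2.0, 3.0, 2.0, 1.0], 8))
# -> [3.0, 2.0, 1.0, 0.0, 0.0, 0.0, 1.0, 2.0]
```

The centre coefficient (3.0) lands on index 0, the second half of the window follows it, the first half wraps to the end of the buffer, and the three padding zeros sit in the middle.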

The reason that I ask is because I've noticed a few interesting side effects
of this approach - the most interesting of which is that if you work out the
"real" frequency at each FFT bin by phase differentiation (pretty much using
the phase vocoder approach) what you will see is that there are "clusters"
of bins that all register the same "real" frequency.
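
For anyone who wants to see the clustering for themselves, the following is a rough sketch of the usual phase vocoder frequency estimate. Everything specific in it is an assumption chosen for the demo: the helper names, the 64-point frame, the hop of 16 samples, the periodic Hann window, the 10.3 Hz test tone, and the naive DFT that stands in for a real FFT. The buffer here is laid out conventionally; the same `bin_frequencies` function could equally be fed spectra taken from zero phase buffers.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT, adequate for a small demo; use an FFT library in practice."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def wrap_phase(phi):
    """Wrap a phase value into [-pi, pi]."""
    return phi - 2.0 * math.pi * round(phi / (2.0 * math.pi))


def bin_frequencies(spec_prev, spec_curr, hop, n_fft, fs):
    """Per-bin frequency (Hz) from the phase advance between two frames."""
    freqs = []
    for k in range(n_fft // 2 + 1):
        expected = 2.0 * math.pi * k * hop / n_fft  # advance of a bin-centred sinusoid
        measured = cmath.phase(spec_curr[k]) - cmath.phase(spec_prev[k])
        deviation = wrap_phase(measured - expected)
        freqs.append((k + deviation * n_fft / (2.0 * math.pi * hop)) * fs / n_fft)
    return freqs


# Assumed test signal: one sinusoid at 10.3 Hz, fs = 64 Hz, so bins are 1 Hz apart.
n_fft, hop, fs, f0 = 64, 16, 64.0, 10.3
signal = [math.cos(2.0 * math.pi * f0 * t / fs + 0.4) for t in range(n_fft + hop)]
window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n_fft) for i in range(n_fft)]
frame_a = [s * w for s, w in zip(signal[:n_fft], window)]
frame_b = [s * w for s, w in zip(signal[hop:hop + n_fft], window)]
freqs = bin_frequencies(dft(frame_a), dft(frame_b), hop, n_fft, fs)
for k in range(8, 14):
    print(k, round(freqs[k], 3))
```

With these settings bins 9 to 12 should all report approximately 10.3 Hz, which is the kind of cluster meant here. Bins further than n_fft / (2 * hop) = 2 bins from the tone fall outside the unambiguous range of the wrapped phase difference, so their estimates alias to other values.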

This has two advantages:

The first is if you want to carry out peak picking to identify partials then
you will get much less frequency jitter than a conventional window in the
case where poor resolution causes subsequent frames to move up and down a
bin or two.

The second advantage is really cool - basically you can identify partials
that aren't actually FFT peaks - e.g. partials obscured by spectral
smearing. What I've done is to write a loop which looks for frequency
clusters and if it finds one it then looks for a frequency cluster around
the first and second harmonic - the reason for the harmonic test is that
this approach isn't completely infallible, however with a few heuristics
like the harmonic test it has managed to give significantly higher perceived
frequency resolution for low pitches when applied to my sinusoidal modelling
application than I would otherwise have been able to get with a given window
size - which has obvious time resolution benefits.
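
The loop itself is not shown in this post, so the following is only a guess at how such a search might look. The function names, the tolerances, the minimum cluster size, and the reading of "first and second harmonic" as 2x and 3x the cluster frequency are all assumptions, and the input list is made-up data rather than output from a real analysis.

```python
def find_clusters(freqs, tol_hz, min_bins=3):
    """Group runs of adjacent bins whose frequency estimates agree.

    Neighbouring bins are compared pairwise, so a slow drift along a run
    is tolerated. Returns (first_bin, last_bin, mean_frequency) tuples.
    """
    clusters = []
    start = 0
    for k in range(1, len(freqs) + 1):
        run_ends = k == len(freqs) or abs(freqs[k] - freqs[k - 1]) > tol_hz
        if run_ends:
            if k - start >= min_bins:
                run = freqs[start:k]
                clusters.append((start, k - 1, sum(run) / len(run)))
            start = k
    return clusters


def has_harmonic_support(cluster, clusters, rel_tol=0.03, multiples=(2, 3)):
    """Hypothetical harmonic test: is there a cluster near each multiple?"""
    f0 = cluster[2]
    return all(
        any(abs(c[2] - m * f0) <= rel_tol * m * f0 for c in clusters)
        for m in multiples
    )


# Made-up per-bin frequency estimates (Hz) with clusters near 10.3, 20.6, 30.9.
demo_freqs = [0.0, 3.1, 7.9, 10.3, 10.3, 10.31, 10.29, 15.2, 18.8,
              20.6, 20.6, 20.61, 25.0, 28.3, 30.9, 30.9, 30.9, 40.0]
clusters = find_clusters(demo_freqs, tol_hz=0.05)
for c in clusters:
    print(c, has_harmonic_support(c, clusters))
```

On this toy input only the cluster near 10.3 Hz passes the harmonic test, because it is the only one with clusters near both twice and three times its frequency.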

Anyway - I just wanted to see if anyone else has tried this, I've seen the
zero phase window used in a couple of places, but I've not seen anyone else
use the "frequency clustering" property.

Interested in comments.

TTFN
Fraser.


FA wrote:
> Hi All,
> Has anyone else played around with "zero phase" windows for STFT?
>
> The idea here is that instead of the "conventional" way of applying the
> window where the zero'th window coefficient is applied to the zero'th input
> sample in the FFT buffer etc. and all of the zero padding (if any) is
> applied at the end, with a zero phase window you apply the centre
> coefficient of the window to the zero'th input sample and circular index the
> window applying any zero padding in the centre of the FFT buffer.

OK... the center window coefficient of an M length window is applied
to x[0]. You use the wrap-around effect to map the "leading edge" of
the window to the end of the data buffer... so far so good.

What part of the data window is mapped to the end? x[-M/2]..x[-1]?

> The reason that I ask is because I've noticed a few interesting side effects
> of this approach - the most interesting of which is that if you work out the
> "real" frequency at each FFT bin by phase differentiation (pretty much using
> the phase vocoder approach) what you will see is that there are "clusters"
> of bins that all register the same "real" frequency.

I don't understand. What do you mean by "real frequency"? How is
this approach different from examining the magnitude of the spectrum?

> This has two advantages:
>
> The first is if you want to carry out peak picking to identify partials then
> you will get much less frequency jitter than a conventional window in the
> case where poor resolution causes subsequent frames to move up and down a
> bin or two.

How do "conventional windows" shift the frequency? It is well known
that windowing broadens the frequency peaks, but do they shift?

> The second advantage is really cool - basically you can identify partials
> that aren't actually FFT peaks - e.g. partials obscured by spectral
> smearing. What I've done is to write a loop which looks for frequency
> clusters and if it finds one it then looks for a frequency cluster around
> the first and second harmonic - the reason for the harmonic test is that
> this approach isn't completely infallible, however with a few heuristics
> like the harmonic test it has managed to give significantly higher perceived
> frequency resolution for low pitches when applied to my sinusoidal modelling
> application than I would otherwise have been able to get with a given window
> size - which has obvious time resolution benefits.

I don't understand what you mean. What is a "partial"? What is a
"frequency cluster"?

> Anyway - I just wanted to see if anyone else has tried this, I've seen the
> zero phase window used in a couple of places, but I've not seen anyone else
> use the "frequency clustering" property.

I don't understand much of what you do here. As far as I can tell,
doing a time shift in the DFT as you do here would only affect the
phase term of the spectrum, not the magnitudes. So basically, I don't
see any reason why anything interesting should happen here.

If, on the other hand, you have been examining the real parts of the
spectrum, I would not be surprised if you see some effects.

Could you provide some more details about what you do, and how?

Rune

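
The point about the circular shift can be checked numerically. The sketch below compares the conventional and zero phase layouts of the window alone; the sizes (a 9-point symmetric Hann window in a 16-point buffer) and the naive DFT standing in for an FFT are assumptions chosen to keep the example small.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT, adequate for a small demo; use an FFT library in practice."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def rotate_left(x, shift):
    """Circularly shift a list left by `shift` samples."""
    return x[shift:] + x[:shift]


n_fft, half = 16, 4
# Symmetric Hann window of odd length 2 * half + 1, centre coefficient at index `half`.
window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (2 * half))
          for i in range(2 * half + 1)]

conventional = window + [0.0] * (n_fft - len(window))  # padding at the end
zero_phase = rotate_left(conventional, half)           # centre moved to index 0

X_conv = dft(conventional)
X_zp = dft(zero_phase)

max_mag_diff = max(abs(abs(a) - abs(b)) for a, b in zip(X_conv, X_zp))
max_imag_zp = max(abs(b.imag) for b in X_zp)
print("largest magnitude difference:", max_mag_diff)
print("largest imaginary part, zero phase layout:", max_imag_zp)
print("bin 1, conventional layout:", X_conv[1])
```

Both layouts give the same magnitudes to within rounding error, as expected for a circular shift. What changes is the phase: the zero phase layout of a symmetric window has an essentially real spectrum, while the conventional layout carries a linear phase term.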
FA wrote:

> Hi All,
> Has anyone else played around with "zero phase" windows for STFT?
>
> The idea here is that instead of the "conventional" way of applying the
> window where the zero'th window coefficient is applied to the zero'th input
> sample in the FFT buffer etc. and all of the zero padding (if any) is
> applied at the end, with a zero phase window you apply the centre
> coefficient of the window to the zero'th input sample and circular index the
> window applying any zero padding in the centre of the FFT buffer.
>
> The reason that I ask is because I've noticed a few interesting side effects
> of this approach - the most interesting of which is that if you work out the
> "real" frequency at each FFT bin by phase differentiation (pretty much using
> the phase vocoder approach) what you will see is that there are "clusters"
> of bins that all register the same "real" frequency.
> ...

I am a bit puzzled by this observation; I have found this clustering happens in
all the phase vocoders I have studied (e.g. the CARL one written by Mark Dolson,
which I have based all my own work on), and I have assumed that it is a natural
emergent feature from the delta-phase calculation between frames (in pvoc, to
determine amplitude+frequency for each bin). So I am intrigued that there could
be a phase vocoder where this doesn't happen - seems like a contradiction in
terms. None of the pvocs that I am aware of does zero-padding (maybe they
should!), but some do "double-windowing" (option in CARL pvoc, default in
F.R.Moore pvoc).

The clustering of bins around a peak has been described by several authors in
the context of the goal of reducing phase vocoder transient smearing. A paper by
Miller Puckette (of PD fame) on the "phase-locked vocoder" was followed by a
paper by Dolson and Jean Laroche offering a more advanced method; both seem to
exploit this clustering effect (though Dolson and Laroche describe their
calculations in terms of phase tracking rather than frequency bunching; whether
this is a meaningful distinction is another matter).

So you may find it useful to study the Dolson/Laroche paper:

"Improved Phase Vocoder Time-Scale Modification of Audio",
IEEE Transactions on Speech and Audio Processing, Vol 7:3, 1999.

If you Google on that title you will find lots of other useful related material,
as that paper has been very widely cited.

A modern source for Dolson's phase vocoder is the Csound sources (look for the
streaming pvoc opcodes), or my own updated versions at:

http://dream.cs.bath.ac.uk/researchdev/pvocex/pvocex.html

I would be interested to have your observations on this in the context of your
"conventional" description - I thought CARL pvoc was already "conventional"!

Note that the method Dolson and Laroche describe is patented on behalf of
Creative Labs/Emu; it is employed for example in the "Audigy" range of
soundcards for their time-scaling facilities. I would have to re-read all the
literature on partial tracking to be sure that bin clustering per se has not
been exploited, but in all the pvocs I know of it is such an obvious phenomenon
that it is most unlikely it has escaped attention. The main problem as you
indicate is that of tracking weak partials (which may be genuine even if not
harmonically related to a suspected fundamental). So the research interest is
not so much that it happens, but in how best/accurately to exploit the fact that
it happens.

Richard Dobson

Hi Richard,
As you rightly observe, what I'm doing is pretty similar to a phase vocoder.
Basically I'm applying the phase vocoder phase differentiation to calculate
the "true" frequency at each bin. My app isn't a true phase vocoder: although
I use the phase differentiation that one does in a phase vocoder, I am
actually using McAulay-Quatieri Sinusoidal Modelling. That is to say, I do
peak detection and pruning (using a psychoacoustic model), but the frequency
(in my magnitude/frequency/phase parameterisation of each partial) has been
obtained using the pvoc phase differentiation approach.

With respect to the zero phase window, my observation was simply that using
this type of window instead of the more conventional window gives less
frequency jitter when tracking between frames. I suspect that this is less of
an issue in a conventional pvoc because all of the bins are considered,
whereas in Sinusoidal Modelling only those bins corresponding to true
sinusoidal partials are considered for modification and resynthesis.

Apologies - my posting was intended to stimulate discussion, but as you
observe I had probably combined two concepts - that of the zero phase window
and that of the frequency clustering effect - which probably just confused
matters :-(

I'll take a look at some of the references you have mentioned - I'm
interested to see how others have made use of this phenomenon - as you say
it's such a noticeable effect when pvoc'ing.

It has certainly given noticeable improvements to my pitch shifting
application, so I reckon that there is at least some mileage in it.

Regards,
Frase.


Howdy Rune,
If you take a look at Richard Dobson's reply in this thread, he's on the
right track: when I refer to "true" frequencies I mean the frequencies that
have been obtained on each bin by using the phase vocoder phase
differentiation approach.

The term "partial" is quite a common term from Sinusoidal Modelling and it
refers simply to spectral peaks, or more accurately to real spectral peaks as
opposed to sidelobe peaks. The trick in Sinusoidal Modelling is to extract
the partials but not the sidelobes.

The frequency clustering that I refer to is an artefact of the phase vocoder
algorithm and, as I said in my original post, I've made use of that effect in
my application to identify likely partials that aren't actually spectral
peaks. (When I said earlier that partials are spectral peaks, what I was
trying to say was that partials are perceptually significant sinusoidal
components - impulses in the frequency domain if you like.)

Regards,
Frase.

