DSPRelated.com
Forums

scaling fixed point fft

Started by Bob June 24, 2003
Hello,

I have constructed a 256 pt complex fft. My scaling is causing
problems as the outputs from each stage are divided by 4 to avoid
overflow. The input data and the twiddle factor coeffs are 16 bits
wide (Q15). My problem is that when the data arrives at the last two
butterfly stages of the FFT, it is non-existant, due to all the
scaling beforehand. All inputs to these stages are zero. Thus I get
nothing at the output.

How can I work around this?

Many thanks 

Bob
Short answer (other factors not considered): (note:FS=Full Scale)

What you are doing is called "unconditional block scaling".

Try scaling your input components (max of combined real and imaginary parts)
to be < 0.5 FS
If you have 8 stages for a 256-pt FFT change your stage output scaling to
divide by 2.
If you are doing integers multiplies be sure the extra "sign" bit is taken
into account (could be doing an unintentional divide by 2 if not taken into
account, if so could replace previous divide by 2).
Be sure any complex numbers that you are adding to complex products have the
same scaling.

Test on less than full scale input with a single complex frequency ( like
0.25FS, j*0.25FS, -0.25FS,-j*0.25FS, ...) without any further input scaling
to generally validate internal scaling and the rest of the algorithm (this
is not a rigorous test but can point out the presence of problems). When you
evaluate the results, include all scaling so you can compare actual values
in "computed vs expected" rather than general output characteristics.

Test with input scaling in place and a variety of inputs.

Let me know if that helps.  There are some other things to be aware of, but
see if that gets you basically working.

Dirk

Dirk A. Bell
DSP Consultant

"Bob" <stenasc@yahoo.com> wrote in message
news:20540d3a.0306241505.6f28b72b@posting.google.com...
> Hello, > > I have constructed a 256 pt complex fft. My scaling is causing > problems as the outputs from each stage are divided by 4 to avoid > overflow. The input data and the twiddle factor coeffs are 16 bits > wide (Q15). My problem is that when the data arrives at the last two > butterfly stages of the FFT, it is non-existant, due to all the > scaling beforehand. All inputs to these stages are zero. Thus I get > nothing at the output. > > How can I work around this? > > Many thanks > > Bob
Bob wrote:

> Hello, > > I have constructed a 256 pt complex fft. My scaling is causing > problems as the outputs from each stage are divided by 4 to avoid > overflow. The input data and the twiddle factor coeffs are 16 bits > wide (Q15). My problem is that when the data arrives at the last two > butterfly stages of the FFT, it is non-existant, due to all the > scaling beforehand. All inputs to these stages are zero. Thus I get > nothing at the output. > > How can I work around this?
For radix-2 algorithms scaling by a factor of 1/4 on each pass seems a little extreme. Even scaling by a factor of 1/2 on each pass may be too much. Perhaps you're using a radix-4 algorithm? (For radix 2) to preserve the rms signal amplitude you should scale by a factor of 1/sqrt(2) on each pass, ideally. But since that is usually awkward to implement most people would scale by 1/2 on every other pass. Also remember that although this scaling preserves rms amplitude across all bins, the effect of the FFT on sinusoids is to concentrate their energy into a single bin (worst case), so with this scaling the maximum magnitude can still grow by sqrt(256) (4 bits). So for a 256 point FFT using 16 bit fixed point arithmetic you only really have 12 useful bits of dynamic range in the input. If these 12 bits are the bottom 12 bits then I think scaling by 1/2 on every other pass is the right thing to do. If these 12 bits are the top 12 bits then I think scaling by 1/2 after every pass is the right thing to do. (In effect this will introduce an extra shift right of 4 bits). Regards -- Adrian Hey
In article bdb4a9$o20$1$8300dec7@news.demon.co.uk, Adrian Hey at
ahey@NoSpicedHam.iee.org wrote on 06/24/2003 23:06:

> Bob wrote: > >> Hello, >> >> I have constructed a 256 pt complex fft. My scaling is causing >> problems as the outputs from each stage are divided by 4 to avoid >> overflow. The input data and the twiddle factor coeffs are 16 bits >> wide (Q15). My problem is that when the data arrives at the last two >> butterfly stages of the FFT, it is non-existant, due to all the >> scaling beforehand. All inputs to these stages are zero. Thus I get >> nothing at the output. >> >> How can I work around this? > > For radix-2 algorithms scaling by a factor of 1/4 on each pass seems > a little extreme. Even scaling by a factor of 1/2 on each pass may be > too much.
it is possible that a pass of the FFT can scale a state up by a factor of sqrt(2)+1 which is a little more than 2. but that wouldn't happen very often, i think.
> Perhaps you're using a radix-4 algorithm? > > (For radix 2) to preserve the rms signal amplitude you should scale by > a factor of 1/sqrt(2) on each pass, ideally. But since that is usually > awkward to implement most people would scale by 1/2 on every other pass.
i did precisely that for an old DSP56001 FFT (no sticky S bit that you find in the 56002, 563xx and later chips) and it *did* clip at some frequencies when reasonably loud tones was the input. sometimes the clipping happened at an earlier pass and that caused some spurious looking spikes in the spectrum at other frequencies. but for broadbanded, that is a good way to do it unless you have the ability to do block-floating point (requiring the sticky S bit).
> Also remember that although this scaling preserves rms amplitude across > all bins, the effect of the FFT on sinusoids is to concentrate their > energy into a single bin (worst case), so with this scaling the maximum > magnitude can still grow by sqrt(256) (4 bits).
yup.
> So for a 256 point FFT using 16 bit fixed point arithmetic you only really > have 12 useful bits of dynamic range in the input. If these 12 bits are > the bottom 12 bits then I think scaling by 1/2 on every other pass is > the right thing to do.
probably is. it's nice to agree with you, Adrian. r b-j
"Adrian Hey" <ahey@NoSpicedHam.iee.org> wrote in message
news:bdb4a9$o20$1$8300dec7@news.demon.co.uk...
> Bob wrote: > > > Hello, > > > > I have constructed a 256 pt complex fft. My scaling is causing > > problems as the outputs from each stage are divided by 4 to avoid > > overflow. The input data and the twiddle factor coeffs are 16 bits > > wide (Q15). My problem is that when the data arrives at the last two > > butterfly stages of the FFT, it is non-existant, due to all the > > scaling beforehand. All inputs to these stages are zero. Thus I get > > nothing at the output. > > > > How can I work around this? > > For radix-2 algorithms scaling by a factor of 1/4 on each pass seems > a little extreme. Even scaling by a factor of 1/2 on each pass may be > too much. Perhaps you're using a radix-4 algorithm? > > (For radix 2) to preserve the rms signal amplitude you should scale by > a factor of 1/sqrt(2) on each pass, ideally. But since that is usually > awkward to implement most people would scale by 1/2 on every other pass.
Scaling by 1/sqrt(2) would allow overflow, which would require continual checking and saturation on an integer machine (a major complication) as well as distortion in the output. Scaling by 1/2 on every other pass would do the same. Not a good idea.
> > Also remember that although this scaling preserves rms amplitude across > all bins, the effect of the FFT on sinusoids is to concentrate their > energy into a single bin (worst case), so with this scaling the maximum > magnitude can still grow by sqrt(256) (4 bits). > > So for a 256 point FFT using 16 bit fixed point arithmetic you only really > have 12 useful bits of dynamic range in the input. If these 12 bits are > the bottom 12 bits then I think scaling by 1/2 on every other pass is > the right thing to do. If these 12 bits are the top 12 bits then I think > scaling by 1/2 after every pass is the right thing to do. (In effect > this will introduce an extra shift right of 4 bits).
Putting the signal in the bottom 12 bits is not a good idea because of excessive rounding error. Normally want the maximum modulus of the input to be <16 bits to avoid overflow with 1/2 scaling at each stage. Shifting 16 bit real and imag inputs right 1 bit prior to use would guarantee this for arbitrary inputs. A good reference is O&S "Digital Signal Processing" around page 453. Dirk Dirk A. Bell DSP Consultant
"robert bristow-johnson" <rbj@surfglobal.net> wrote in message
news:BB1E92AA.1A4E%rbj@surfglobal.net...
> In article bdb4a9$o20$1$8300dec7@news.demon.co.uk, Adrian Hey at > ahey@NoSpicedHam.iee.org wrote on 06/24/2003 23:06: >
<snipped>
> > it is possible that a pass of the FFT can scale a state up by a factor of > sqrt(2)+1 which is a little more than 2. but that wouldn't happen very > often, i think. >
<snipped> The max modulus gain in a radix-2 stage is 2. I am guessing where you are getting 1+sqrt(2) is that you are looking at a situation like: 1 + (1+ j*1) *(Cos(w)+j*Sin(w)) can fall onto the real axis (purely real) at length 1+sqrt(2), which is true, but the input modulus was as large as sqrt(2)=mag(1+1*j) so the modulus gain was not really 1+sqrt(2). Dirk Dirk A. Bell DSP Consultant
robert bristow-johnson wrote:
 
>> So for a 256 point FFT using 16 bit fixed point arithmetic you only >> really have 12 useful bits of dynamic range in the input. If these 12 >> bits are the bottom 12 bits then I think scaling by 1/2 on every other >> pass is the right thing to do. > > probably is. > > > it's nice to agree with you, Adrian.
Glad you think so. I'm not sure that I agree with either of us now. Thinking about this situation there seem to be other perfectly reasonable (perhaps better) approaches to making best use of the precision available, like doing the first 4 passes unscaled and the last 4 scaled by 1/2. I've never tried that though. Regards -- Adrian Hey
Reading your answers folks....thank you all very much.

Dirk

> Try scaling your input components (max of combined real and imaginary parts) > to be < 0.5 FS > If you have 8 stages for a 256-pt FFT change your stage output scaling to > divide by 2.
...This means dividing the real and imaginary inputs by 4....OK ? Is it possible to scale the twiddle factors by dividing their values also by as well. If I do this what will the effect be ? I would still plan to divide each output by 2 as well. Adrian/Robert It is a radix-2 implementation, but I wanted to make sure I avoided any possibilities of overflow. What do you think of scaling the Twiddle Factors ? Thank you Bob "Dirk Bell" <dirkman@erols.com> wrote in message news:<bdar56$mmp$1@bob.news.rcn.net>...
> Short answer (other factors not considered): (note:FS=Full Scale) > > What you are doing is called "unconditional block scaling". > > Try scaling your input components (max of combined real and imaginary parts) > to be < 0.5 FS > If you have 8 stages for a 256-pt FFT change your stage output scaling to > divide by 2. > If you are doing integers multiplies be sure the extra "sign" bit is taken > into account (could be doing an unintentional divide by 2 if not taken into > account, if so could replace previous divide by 2). > Be sure any complex numbers that you are adding to complex products have the > same scaling. > > Test on less than full scale input with a single complex frequency ( like > 0.25FS, j*0.25FS, -0.25FS,-j*0.25FS, ...) without any further input scaling > to generally validate internal scaling and the rest of the algorithm (this > is not a rigorous test but can point out the presence of problems). When you > evaluate the results, include all scaling so you can compare actual values > in "computed vs expected" rather than general output characteristics. > > Test with input scaling in place and a variety of inputs. > > Let me know if that helps. There are some other things to be aware of, but > see if that gets you basically working. > > Dirk > > Dirk A. Bell > DSP Consultant > > "Bob" <stenasc@yahoo.com> wrote in message > news:20540d3a.0306241505.6f28b72b@posting.google.com... > > Hello, > > > > I have constructed a 256 pt complex fft. My scaling is causing > > problems as the outputs from each stage are divided by 4 to avoid > > overflow. The input data and the twiddle factor coeffs are 16 bits > > wide (Q15). My problem is that when the data arrives at the last two > > butterfly stages of the FFT, it is non-existant, due to all the > > scaling beforehand. All inputs to these stages are zero. Thus I get > > nothing at the output. > > > > How can I work around this? > > > > Many thanks > > > > Bob
Dirk Bell wrote:

> Scaling by 1/sqrt(2) would allow overflow, which would require continual > checking and saturation on an integer machine (a major complication) as > well as distortion in the output. Scaling by 1/2 on every other pass > would do the same. Not a good idea.
Only if the 12 bit signal was in the top end of the 16 bits, not the bottom end (as I suggested).
> Putting the signal in the bottom 12 bits is not a good idea because of > excessive rounding error. Normally want the maximum modulus of the input > to be <16 bits to avoid overflow with 1/2 scaling at each stage. Shifting > 16 bit real and imag inputs right 1 bit prior to use would guarantee this > for arbitrary inputs.
Actually, I don't think rounding error is particularly significant. But if the shift rights (or multiplies by 1/2) can be achieved at acceptable cost I agree it's probably best to minimise rounding errors by doing as you suggest. (Use the top 12 bits and shift right on each pass). Now digging out my copy of OS to try and justify my assertion that rounding error isn't particularly significant using the shift on alternate passes technique, I realise that this isn't a case they analyse. They give an analysis for unscaled FFT such that overflow is avoided (effectively constraining our signal to the bottom 8 bits). They also give an analysis for the divide by 2 on each pass. But they don't cover the divide by 2 on alternate passes method. What a pity :-) Still, I have a hunch (half baked theory) that difference in quantisation noise between the two will only amount to 1.7 dB or so (< 1/3 bits worth). Regards -- Adrian Hey
Bob wrote:


> Adrian/Robert > It is a radix-2 implementation, but I wanted to make sure I avoided > any possibilities of overflow. What do you think of scaling the > Twiddle Factors ?
Not an easy solution, IMHO If T is the twiddle factor (|T|=1), DIT butterflies are like this.. X=A+(B.T) Y=A-(B.T) and DIF butterflies are like this.. X=(A+B) Y=(A-B).T (A and B are complex numbers from the previous pass) Scaling the twiddle factor alone will give you the wrong answer. If your going to do this (scaling the twiddle factors by 1/2 say), your DIT butterflies must become.. X=A.(1/2)+(B.(T/2)) Y=A.(1/2)-(B.(T/2)) Likewise for DIF. How difficult it is to do this efficiently probably depends on the processor you're using and how cleverly you use the registers & instruction set at your disposal. With most processors it's probably easier to post-scale the butterfly results (somehow). But you should take a look at application notes for your processor. FFT's are a common benchmark so it's likely that considerble effort has been made to win brownie points for the fastest FFT possible. Especially for the standard radix-2 algorithms. Regards -- Adrian Hey