# scaling fixed point fft

Started by June 24, 2003
```Hello,

I have constructed a 256 pt complex fft. My scaling is causing
problems as the outputs from each stage are divided by 4 to avoid
overflow. The input data and the twiddle factor coeffs are 16 bits
wide (Q15). My problem is that when the data arrives at the last two
butterfly stages of the FFT, it is non-existant, due to all the
scaling beforehand. All inputs to these stages are zero. Thus I get
nothing at the output.

How can I work around this?

Many thanks

Bob
```
```Short answer (other factors not considered): (note:FS=Full Scale)

What you are doing is called "unconditional block scaling".

Try scaling your input components (max of combined real and imaginary parts)
to be < 0.5 FS
If you have 8 stages for a 256-pt FFT change your stage output scaling to
divide by 2.
If you are doing integers multiplies be sure the extra "sign" bit is taken
into account (could be doing an unintentional divide by 2 if not taken into
account, if so could replace previous divide by 2).
Be sure any complex numbers that you are adding to complex products have the
same scaling.

Test on less than full scale input with a single complex frequency ( like
0.25FS, j*0.25FS, -0.25FS,-j*0.25FS, ...) without any further input scaling
to generally validate internal scaling and the rest of the algorithm (this
is not a rigorous test but can point out the presence of problems). When you
evaluate the results, include all scaling so you can compare actual values
in "computed vs expected" rather than general output characteristics.

Test with input scaling in place and a variety of inputs.

Let me know if that helps.  There are some other things to be aware of, but
see if that gets you basically working.

Dirk

Dirk A. Bell
DSP Consultant

"Bob" <stenasc@yahoo.com> wrote in message
> Hello,
>
> I have constructed a 256 pt complex fft. My scaling is causing
> problems as the outputs from each stage are divided by 4 to avoid
> overflow. The input data and the twiddle factor coeffs are 16 bits
> wide (Q15). My problem is that when the data arrives at the last two
> butterfly stages of the FFT, it is non-existant, due to all the
> scaling beforehand. All inputs to these stages are zero. Thus I get
> nothing at the output.
>
> How can I work around this?
>
> Many thanks
>
> Bob

```
```Bob wrote:

> Hello,
>
> I have constructed a 256 pt complex fft. My scaling is causing
> problems as the outputs from each stage are divided by 4 to avoid
> overflow. The input data and the twiddle factor coeffs are 16 bits
> wide (Q15). My problem is that when the data arrives at the last two
> butterfly stages of the FFT, it is non-existant, due to all the
> scaling beforehand. All inputs to these stages are zero. Thus I get
> nothing at the output.
>
> How can I work around this?

For radix-2 algorithms scaling by a factor of 1/4 on each pass seems
a little extreme. Even scaling by a factor of 1/2 on each pass may be
too much. Perhaps you're using a radix-4 algorithm?

(For radix 2) to preserve the rms signal amplitude you should scale by
a factor of 1/sqrt(2) on each pass, ideally. But since that is usually
awkward to implement most people would scale by 1/2 on every other pass.

Also remember that although this scaling preserves rms amplitude across
all bins, the effect of the FFT on sinusoids is to concentrate their
energy into a single bin (worst case), so with this scaling the maximum
magnitude can still grow by sqrt(256) (4 bits).

So for a 256 point FFT using 16 bit fixed point arithmetic you only really
have 12 useful bits of dynamic range in the input. If these 12 bits are
the bottom 12 bits then I think scaling by 1/2 on every other pass is
the right thing to do. If these 12 bits are the top 12 bits then I think
scaling by 1/2 after every pass is the right thing to do. (In effect
this will introduce an extra shift right of 4 bits).

Regards
--
```
```In article bdb4a9\$o20\$1\$8300dec7@news.demon.co.uk, Adrian Hey at
ahey@NoSpicedHam.iee.org wrote on 06/24/2003 23:06:

> Bob wrote:
>
>> Hello,
>>
>> I have constructed a 256 pt complex fft. My scaling is causing
>> problems as the outputs from each stage are divided by 4 to avoid
>> overflow. The input data and the twiddle factor coeffs are 16 bits
>> wide (Q15). My problem is that when the data arrives at the last two
>> butterfly stages of the FFT, it is non-existant, due to all the
>> scaling beforehand. All inputs to these stages are zero. Thus I get
>> nothing at the output.
>>
>> How can I work around this?
>
> For radix-2 algorithms scaling by a factor of 1/4 on each pass seems
> a little extreme. Even scaling by a factor of 1/2 on each pass may be
> too much.

it is possible that a pass of the FFT can scale a state up by a factor of
sqrt(2)+1 which is a little more than 2.  but that wouldn't happen very
often, i think.

> Perhaps you're using a radix-4 algorithm?
>
> (For radix 2) to preserve the rms signal amplitude you should scale by
> a factor of 1/sqrt(2) on each pass, ideally. But since that is usually
> awkward to implement most people would scale by 1/2 on every other pass.

i did precisely that for an old DSP56001 FFT (no sticky S bit that you find
in the 56002, 563xx and later chips) and it *did* clip at some frequencies
when reasonably loud tones was the input.  sometimes the clipping happened
at an earlier pass and that caused some spurious looking spikes in the
spectrum at other frequencies.  but for broadbanded, that is a good way to
do it unless you have the ability to do block-floating point (requiring the
sticky S bit).

> Also remember that although this scaling preserves rms amplitude across
> all bins, the effect of the FFT on sinusoids is to concentrate their
> energy into a single bin (worst case), so with this scaling the maximum
> magnitude can still grow by sqrt(256) (4 bits).

yup.

> So for a 256 point FFT using 16 bit fixed point arithmetic you only really
> have 12 useful bits of dynamic range in the input. If these 12 bits are
> the bottom 12 bits then I think scaling by 1/2 on every other pass is
> the right thing to do.

probably is.

it's nice to agree with you, Adrian.

r b-j

```
```"Adrian Hey" <ahey@NoSpicedHam.iee.org> wrote in message
news:bdb4a9\$o20\$1\$8300dec7@news.demon.co.uk...
> Bob wrote:
>
> > Hello,
> >
> > I have constructed a 256 pt complex fft. My scaling is causing
> > problems as the outputs from each stage are divided by 4 to avoid
> > overflow. The input data and the twiddle factor coeffs are 16 bits
> > wide (Q15). My problem is that when the data arrives at the last two
> > butterfly stages of the FFT, it is non-existant, due to all the
> > scaling beforehand. All inputs to these stages are zero. Thus I get
> > nothing at the output.
> >
> > How can I work around this?
>
> For radix-2 algorithms scaling by a factor of 1/4 on each pass seems
> a little extreme. Even scaling by a factor of 1/2 on each pass may be
> too much. Perhaps you're using a radix-4 algorithm?
>
> (For radix 2) to preserve the rms signal amplitude you should scale by
> a factor of 1/sqrt(2) on each pass, ideally. But since that is usually
> awkward to implement most people would scale by 1/2 on every other pass.

Scaling by 1/sqrt(2) would allow overflow, which would require continual
checking and saturation on an integer machine (a major complication) as well
as distortion in the output.  Scaling by 1/2 on every other pass would do
the same.  Not a good idea.

>
> Also remember that although this scaling preserves rms amplitude across
> all bins, the effect of the FFT on sinusoids is to concentrate their
> energy into a single bin (worst case), so with this scaling the maximum
> magnitude can still grow by sqrt(256) (4 bits).
>
> So for a 256 point FFT using 16 bit fixed point arithmetic you only really
> have 12 useful bits of dynamic range in the input. If these 12 bits are
> the bottom 12 bits then I think scaling by 1/2 on every other pass is
> the right thing to do. If these 12 bits are the top 12 bits then I think
> scaling by 1/2 after every pass is the right thing to do. (In effect
> this will introduce an extra shift right of 4 bits).

Putting the signal in the bottom 12 bits is not a good idea because of
excessive rounding error. Normally want the maximum modulus of the input to
be <16 bits to avoid overflow with 1/2 scaling at each stage. Shifting 16
bit real and imag inputs right 1 bit prior to use would guarantee this for
arbitrary inputs.

A good reference is O&S "Digital Signal Processing" around page 453.

Dirk

Dirk A. Bell
DSP Consultant

```
```"robert bristow-johnson" <rbj@surfglobal.net> wrote in message
news:BB1E92AA.1A4E%rbj@surfglobal.net...
> In article bdb4a9\$o20\$1\$8300dec7@news.demon.co.uk, Adrian Hey at
> ahey@NoSpicedHam.iee.org wrote on 06/24/2003 23:06:
>
<snipped>

>
> it is possible that a pass of the FFT can scale a state up by a factor of
> sqrt(2)+1 which is a little more than 2.  but that wouldn't happen very
> often, i think.
>

<snipped>

The max modulus gain in a radix-2 stage is 2.

I am guessing where you are getting 1+sqrt(2) is that you are looking at a
situation like:

1 + (1+ j*1) *(Cos(w)+j*Sin(w))  can fall onto the real axis (purely real)
at length 1+sqrt(2), which is true,

but the input modulus was as large as sqrt(2)=mag(1+1*j) so the modulus gain
was not really 1+sqrt(2).

Dirk

Dirk A. Bell
DSP Consultant

```
```robert bristow-johnson wrote:

>> So for a 256 point FFT using 16 bit fixed point arithmetic you only
>> really have 12 useful bits of dynamic range in the input. If these 12
>> bits are the bottom 12 bits then I think scaling by 1/2 on every other
>> pass is the right thing to do.
>
> probably is.
>
>
> it's nice to agree with you, Adrian.

Glad you think so. I'm not sure that I agree with either of us now.
reasonable (perhaps better) approaches to making best use of the
precision available, like doing the first 4 passes unscaled and the
last 4 scaled by 1/2. I've never tried that though.

Regards
--

```
```Reading your answers folks....thank you all very much.

Dirk

> Try scaling your input components (max of combined real and imaginary parts)
> to be < 0.5 FS
> If you have 8 stages for a 256-pt FFT change your stage output scaling to
> divide by 2.

...This means dividing the real and imaginary inputs by 4....OK ?

Is it possible to scale the twiddle factors by dividing their values
also by as well. If I do this what will the effect be ?

I would still plan to divide each output by 2 as well.

It is a radix-2 implementation, but I wanted to make sure I avoided
any possibilities of overflow. What do you think of scaling the
Twiddle Factors ?

Thank you

Bob

"Dirk Bell" <dirkman@erols.com> wrote in message news:<bdar56\$mmp\$1@bob.news.rcn.net>...
> Short answer (other factors not considered): (note:FS=Full Scale)
>
> What you are doing is called "unconditional block scaling".
>
> Try scaling your input components (max of combined real and imaginary parts)
> to be < 0.5 FS
> If you have 8 stages for a 256-pt FFT change your stage output scaling to
> divide by 2.
> If you are doing integers multiplies be sure the extra "sign" bit is taken
> into account (could be doing an unintentional divide by 2 if not taken into
> account, if so could replace previous divide by 2).
> Be sure any complex numbers that you are adding to complex products have the
> same scaling.
>
> Test on less than full scale input with a single complex frequency ( like
> 0.25FS, j*0.25FS, -0.25FS,-j*0.25FS, ...) without any further input scaling
> to generally validate internal scaling and the rest of the algorithm (this
> is not a rigorous test but can point out the presence of problems). When you
> evaluate the results, include all scaling so you can compare actual values
> in "computed vs expected" rather than general output characteristics.
>
> Test with input scaling in place and a variety of inputs.
>
> Let me know if that helps.  There are some other things to be aware of, but
> see if that gets you basically working.
>
> Dirk
>
> Dirk A. Bell
> DSP Consultant
>
> "Bob" <stenasc@yahoo.com> wrote in message
> > Hello,
> >
> > I have constructed a 256 pt complex fft. My scaling is causing
> > problems as the outputs from each stage are divided by 4 to avoid
> > overflow. The input data and the twiddle factor coeffs are 16 bits
> > wide (Q15). My problem is that when the data arrives at the last two
> > butterfly stages of the FFT, it is non-existant, due to all the
> > scaling beforehand. All inputs to these stages are zero. Thus I get
> > nothing at the output.
> >
> > How can I work around this?
> >
> > Many thanks
> >
> > Bob
```
```Dirk Bell wrote:

> Scaling by 1/sqrt(2) would allow overflow, which would require continual
> checking and saturation on an integer machine (a major complication) as
> well as distortion in the output.  Scaling by 1/2 on every other pass
> would do the same.  Not a good idea.

Only if the 12 bit signal was in the top end of the 16 bits, not the
bottom end (as I suggested).

> Putting the signal in the bottom 12 bits is not a good idea because of
> excessive rounding error. Normally want the maximum modulus of the input
> to be <16 bits to avoid overflow with 1/2 scaling at each stage. Shifting
> 16 bit real and imag inputs right 1 bit prior to use would guarantee this
> for arbitrary inputs.

Actually, I don't think rounding error is particularly significant.
But if the shift rights (or multiplies by 1/2) can be achieved at
acceptable cost I agree it's probably best to minimise rounding errors
by doing as you suggest. (Use the top 12 bits and shift right on each pass).

Now digging out my copy of OS to try and justify my assertion that rounding
error isn't particularly significant using the shift on alternate passes
technique, I realise that this isn't a case they analyse.

They give an analysis for unscaled FFT such that overflow is avoided
(effectively constraining our signal to the bottom 8 bits). They also
give an analysis for the divide by 2 on each pass. But they don't
cover the divide by 2 on alternate passes method. What a pity :-)

Still, I have a hunch (half baked theory) that difference in quantisation
noise between the two will only amount to 1.7 dB or so (< 1/3 bits worth).

Regards
--

```
```Bob wrote:

> It is a radix-2 implementation, but I wanted to make sure I avoided
> any possibilities of overflow. What do you think of scaling the
> Twiddle Factors ?

Not an easy solution, IMHO
If T is the twiddle factor (|T|=1), DIT butterflies are like this..
X=A+(B.T)
Y=A-(B.T)
and DIF butterflies are like this..
X=(A+B)
Y=(A-B).T
(A and B are complex numbers from the previous pass)

Scaling the twiddle factor alone will give you the wrong
by 1/2 say), your DIT butterflies must become..
X=A.(1/2)+(B.(T/2))
Y=A.(1/2)-(B.(T/2))
Likewise for DIF.

How difficult it is to do this efficiently probably depends on
the processor you're using and how cleverly you use the registers
& instruction set at your disposal. With most processors it's
probably easier to post-scale the butterfly results (somehow).

But you should take a look at application notes for your processor.
FFT's are a common benchmark so it's likely that considerble effort
has been made to win brownie points for the fastest FFT possible.
Especially for the standard radix-2 algorithms.

Regards
--