
Accuracy of Fixed-Point Frequency-Domain Filtering

Started by Randy Yates March 20, 2007
Hi Folks,

I believe I've read in the past that fixed-point FFTs can be
squirrelly in trying to get the scaling right to prevent overflow and
preserve precision on the output. This makes sense intuitively since a
simple sine wave at a bin center would create a whopping output at
that bin.
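
(A quick numerical check of that intuition: an unscaled N-point DFT of
a full-scale, bin-centered sine puts N/2 at that bin, i.e. log2(N)-1
bits of growth that fixed-point scaling has to absorb somewhere. The
bin number below is arbitrary.)

    #include <stdio.h>
    #include <math.h>

    #define N 1024

    int main(void)
    {
        const double TWO_PI = 6.28318530717958647692;
        int k = 37;                        /* any bin-centered tone */
        double re = 0.0, im = 0.0;
        for (int n = 0; n < N; n++) {
            double xn = sin(TWO_PI * k * n / N);      /* full scale */
            re += xn * cos(-TWO_PI * k * n / N);
            im += xn * sin(-TWO_PI * k * n / N);
        }
        printf("|X[%d]| = %.1f  (N/2 = %d)\n",
               k, sqrt(re * re + im * im), N / 2);    /* 512.0 */
        return 0;
    }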

My question is this: given the sensitivity of fixed-point FFTs to
scaling, and that both a forward and an inverse FFT must be performed
to implement frequency-domain filtering (potentially compounding the
quantization noise), would a fixed-point frequency-domain filter tend
to be noisier than its equivalent direct-convolution counterpart?

If so, what kind of noise would one expect?

Further, wouldn't conversion of the time-domain coefficients to the
frequency domain also be a potential source of error? Especially for
very "tough" filters in which the quantization of the impulse response
causes significant degradation (e.g., in stop-band attenuation)?

Any insights/references/pointers/wild-guesses appreciated.
-- 
%  Randy Yates                  % "Though you ride on the wheels of tomorrow,
%% Fuquay-Varina, NC            %  you still wander the fields of your
%%% 919-577-9882                %  sorrow."
%%%% <yates@ieee.org>           % '21st Century Man', *Time*, ELO
http://home.earthlink.net/~yatescr
On Mar 19, 11:01 pm, Randy Yates <y...@ieee.org> wrote:

> Any insights/references/pointers/wild-guesses appreciated.
i've got some ramblings.
> would a fixed-point frequency-domain filter tend to be noisier than
> its equivalent direct-convolution counterpart?
long ago, ca 1990, i implemented overlap-add and overlap-save (a la
O&S) in a 16-bit 68K (Mac) and they both sounded like canine feces. i
did not do any noise shaping in the quantization, only naive rounding.
i tried it with and without block-floating-point and with 16x16 to
32-bit signed multiplications (round and cast back to 16 bits), and it
still sounded bad. i don't even have a recollection that the
block-floating-point sounded any better, but i think it should have.

from an analysis POV, certainly the direct convolution where you
accumulate your FIR terms into a double-wide accumulator before
rounding (or, better yet, noise-shaped quantization) will have less
noise than OLA or OLS.
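
(A minimal C sketch of that double-wide-accumulator direct
convolution, for comparison. The names and the tap count are
arbitrary, and it assumes the coefficients satisfy sum(|h[k]|) < 1 so
the 32-bit accumulator cannot overflow.)

    /* direct-form FIR in Q15: products accumulate at full 32-bit
       width; one rounding, at the very end */
    #include <stdint.h>

    #define L_TAPS 64                       /* arbitrary tap count */

    int16_t fir_q15(const int16_t h[L_TAPS], const int16_t x[L_TAPS])
    {
        int32_t acc = 0;                    /* double-wide accumulator */
        for (int k = 0; k < L_TAPS; k++)
            acc += (int32_t)h[k] * x[k];    /* 16x16 -> 32-bit products */
        acc += 1 << 14;                     /* round to nearest... */
        int32_t y = acc >> 15;              /* ...back to Q15 */
        if (y >  32767) y =  32767;         /* saturate, just in case */
        if (y < -32768) y = -32768;
        return (int16_t)y;
    }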
> Further, wouldn't conversion of the time-domain coefficients to the
> frequency domain also be a potential source of error? Especially for
> very "tough" filters in which the quantization of the impulse response
> causes significant degradation (e.g., in stop-band attenuation)?
well, what bothers me about this most is that, fundamentally, for OLA
or OLS to work, the frequency-domain coefficients (which are quantized
to these fixed-point numbers) must be the N-point DFT of an FIR of
length L, with N-L zeros padded at the end. that padding of N-L zeros
is necessary to ensure that there are N-L+1 "good" samples to save in
OLS (there will be L-1 "bad" samples in each frame resulting from the
wrapping of the IR in the circular convolution) and, in OLA (where
there are L-1 padded zeros in each frame of data), that there is no
bad wrapping or time-aliasing done in the circular convolution.

okay, that's fine, except the DFT of that idealized length-L FIR is
not exactly what you get when you round your coefficients. so the
*actual* frequency response (defined by N points) is not that of the
ideal length-L FIR, but will correspond to an IR that has some
non-zero values in the N-L samples after the original end of the FIR.
those IR samples should be small, but because they are non-zero, there
will be some error resulting from the wrapping of the IR in circular
convolution. error even in the "good" samples.

another similar error that can crap things up is the quantization of
the FFT "twiddle factors". i've seen some 68K asm code for FFT that
made some of these twiddle factors into a pseudo-floating-point: there
was a "normalized" coefficient and a right-shift amount that went with
it, and cos(w) was represented as 1 - 2*(sin(w/2))^2 with the "1"
incorporated into the butterfly structure. but, at 16 bits, you're
still losing something on *every* pass of the FFT.

i think, for audio use, fast convolution using block-floating-point
FFT might sound okay with 24+ bits (the Lake DSP convolution thingie
used a bunch of 56K processors but i dunno how clean it sounded). i
might be more comfortable if it was 32-bit block-floating-point.

for those who might ask: "block floating point" is fixed point but
with a single exponent that applies to the whole frame of data. when
the FFT is performed, in a particular pass, you can check that every
butterfly result is less than 1/2 full scale in magnitude. if any
butterfly exceeds that, you flag a sticky bit, and then, in the
following pass, the butterflies all have an additional factor of 1/2
applied to their outputs and you increment the common exponent by 1
(since the mantissas have all been reduced by 1/2). for every time the
common exponent is increased, when you're all done and back in the
time domain (or maybe in the final iFFT pass), you shift left by the
exponent value (which then returns that exponent to zero).

if you don't do block-floating-point, i think your fixed-point FFT can
be expected to sound much worse. that's my spin on it.

r b-j
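
(A minimal C sketch of that block-floating-point FFT idea. This is my
own code, not anything from the thread; instead of a per-butterfly
sticky bit it scans the block before each pass and shifts so
everything sits below 1/4 of full scale, which is safe because a
radix-2 butterfly can grow a component by at most a factor of
1 + sqrt(2) ~ 2.414. The twiddles are computed on the fly for brevity;
a real implementation would use a table.)

    #include <stdint.h>
    #include <stdlib.h>
    #include <math.h>

    typedef struct { int16_t re, im; } cpx16;

    static const double TWO_PI = 6.28318530717958647692;

    static int16_t q15(double v) { return (int16_t)lrint(32767.0 * v); }

    /* in-place FFT; returns block exponent e: true spectrum = x * 2^e */
    int bfp_fft(cpx16 *x, int n)            /* n = power of two */
    {
        /* bit-reversal reordering */
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; j & bit; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) { cpx16 t = x[i]; x[i] = x[j]; x[j] = t; }
        }

        int exponent = 0;
        for (int len = 2; len <= n; len <<= 1) {
            /* find the block maximum, then shift so max < 1/4 F.S. */
            int m = 0;
            for (int i = 0; i < n; i++) {
                if (abs(x[i].re) > m) m = abs(x[i].re);
                if (abs(x[i].im) > m) m = abs(x[i].im);
            }
            int shift = (m >= 16384) ? 2 : (m >= 8192) ? 1 : 0;
            exponent += shift;              /* one exponent, whole block */

            for (int i = 0; i < n; i += len) {
                for (int k = 0; k < len / 2; k++) {
                    double ang = -TWO_PI * k / len;
                    int16_t wr = q15(cos(ang)), wi = q15(sin(ang));
                    cpx16 a = x[i + k], b = x[i + k + len / 2];
                    int32_t ar = a.re >> shift, ai = a.im >> shift;
                    int32_t br = b.re >> shift, bi = b.im >> shift;
                    int32_t tr = (br * wr - bi * wi + 16384) >> 15;
                    int32_t ti = (br * wi + bi * wr + 16384) >> 15;
                    x[i + k].re         = (int16_t)(ar + tr);
                    x[i + k].im         = (int16_t)(ai + ti);
                    x[i + k + len/2].re = (int16_t)(ar - tr);
                    x[i + k + len/2].im = (int16_t)(ai - ti);
                }
            }
        }
        return exponent;                    /* undo with a left shift */
    }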
On Mon, 19 Mar 2007 22:25:00 -0700, robert bristow-johnson wrote:

> i think, for audio use, fast convolution using block-floating-point
> FFT might sound okay with 24+ bits (the Lake DSP convolution thingie
> used a bunch of 56K processors but i dunno how clean it sounded). i
> might be more comfortable if it was 32-bit block-floating-point.
I'm biased, but I think that it sounded perfect :-)

With careful scaling and rounding you can limit yourself to something
like half a bit of lost SNR per FFT stage, particularly if you don't
require the result to behave nicely when the input is full-scale sine
waves. That works out to acceptable noise performance for 16-bit audio
data and useful transform sizes with 24-bit words. You obviously start
to lose useful SNR immediately if you're only using 16-bit words.
Probably need deeper words now that audio data is usually deeper than
16 bits. The shape of the noise can be weird, though, as it can be
smeared across the block and across frequency (as can saturation
errors).

Aside: block floating point is actually trickier in the frequency
domain than it looks. Many DSPs have a "half carry" flag, which one
might think would be just the ticket for knowing when to bump an
exponent register and make a shift/no-shift decision. You have to
remember, though, that complex arithmetic (as in FFT butterflies) can
result in rotations that produce results of sqrt(2) in the real or
imaginary parts even though both inputs are within [-1, 1] in real and
imaginary. So what you really need for a block-floating-point FFT is a
1/(2sqrt(2)) flag, which I can't see happening....

Cheers,

--
Andrew
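
(A tiny C demonstration of that rotation growth, with worst-case
values: both parts of the input at the edge of [-1, 1] and a twiddle
of unit magnitude.)

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double br = 1.0, bi = 1.0;          /* both parts in [-1, 1] */
        double wr = sqrt(0.5), wi = -sqrt(0.5);  /* w = e^{-j*pi/4}  */
        double yr = br * wr - bi * wi;      /* Re(w*b) */
        double yi = br * wi + bi * wr;      /* Im(w*b) */
        printf("w*b = %f + j%f\n", yr, yi); /* 1.414214 + j0.000000 */
        /* ...and a butterfly then ADDS a same-sized term from the
           other leg, so worst-case growth per pass is 1 + sqrt(2) */
        return 0;
    }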
On 20 Mar, 04:01, Randy Yates <y...@ieee.org> wrote:

> My question is this: given the sensitivity of fixed-point FFTs to
> scaling, and that both a forward and an inverse FFT must be performed
> to implement frequency-domain filtering (potentially compounding the
> quantization noise), would a fixed-point frequency-domain filter tend
> to be noisier than its equivalent direct-convolution counterpart?
...
> Any insights/references/pointers/
Ouch, don't have any of those...
> wild-guesses appreciated.
Ah, it seems I'm qualified to come up with an opinion.

My instinctive reaction is that the sheer number of FLOPs plays a part
here. You need on the order of N*log(N) FLOPs to filter an N-length
frame, so each sample in the output signal is affected by the noise
accumulated through N*log(N) FLOPs. Compare that to P+Q+2 FLOPs for
the direct-form implementation of a filter with P zeros and Q poles;
there are usually far fewer FLOPs involved per sample in the
direct-form implementation.

It makes sense to me that the rounding noise is less of a problem in
the direct-form convolution than in the FFT/IFFT computation.

Rune
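
(A back-of-envelope version of that comparison in C. The N*log2(N)
per-frame count is Rune's; the half-bit-of-SNR-per-FFT-stage figure is
Andrew's from earlier in the thread; the filter orders are arbitrary
example values.)

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        int P = 64, Q = 0;                 /* example: 64-tap FIR */
        for (int N = 256; N <= 8192; N *= 2) {
            double flops_frame = N * log2((double)N);  /* per frame  */
            double stages = 2.0 * log2((double)N);     /* FFT + IFFT */
            printf("N=%5d: ~%8.0f FLOPs/frame, ~%4.1f bits SNR lost "
                   "in transforms (direct form: %d FLOPs/sample)\n",
                   N, flops_frame, 0.5 * stages, P + Q + 2);
        }
        return 0;
    }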
Andrew Reilly wrote:
> On Mon, 19 Mar 2007 22:25:00 -0700, robert bristow-johnson wrote:
>
> > i think, for audio use, fast convolution using block-floating-point
> > FFT might sound okay with 24+ bits (the Lake DSP convolution thingie
> > used a bunch of 56K processors but i dunno how clean it sounded). i
> > might be more comfortable if it was 32-bit block-floating-point.
>
> I'm biased, but I think that it sounded perfect :-)
>
> With careful scaling and rounding you can limit yourself to something
> like half a bit of lost SNR per FFT stage, particularly if you don't
> require the result to behave nicely when the input is full-scale sine
> waves.
What happens when a full-scale sine wave is applied at the input? This
does not seem unlikely for recorded music (or for distortion
measurements, for example).

BTW: At the 1996 AES in Copenhagen, I was talking to a guy with long
black hair (IIRC) about the Lake Huron convolution box at the Lake
booth. You don't happen to know him, do you?
> That works out to acceptable noise performance for 16-bit audio data
> and useful transform sizes with 24-bit words. You obviously start to
> lose useful SNR immediately if you're only using 16-bit words.
> Probably need deeper words now that audio data is usually deeper than
> 16 bits. The shape of the noise can be weird, though, as it can be
> smeared across the block and across frequency (as can saturation
> errors).
>
> Aside: block floating point is actually trickier in the frequency
> domain than it looks. Many DSPs have a "half carry" flag, which one
> might think would be just the ticket for knowing when to bump an
> exponent register and make a shift/no-shift decision. You have to
> remember, though, that complex arithmetic (as in FFT butterflies) can
> result in rotations that produce results of sqrt(2) in the real or
> imaginary parts even though both inputs are within [-1, 1] in real
> and imaginary. So what you really need for a block-floating-point
> FFT is a 1/(2sqrt(2)) flag, which I can't see happening....
I don't envy the person who has to implement frequency domain
filtering on fixed-point hardware. It's a pain even without having to
worry about all the scaling issues :-).

Regards,
Andor
On Mar 20, 5:06 am, Andrew Reilly <andrew-newsp...@areilly.bpc-users.org> wrote:
> On Mon, 19 Mar 2007 22:25:00 -0700, robert bristow-johnson wrote:
>
> > i think, for audio use, fast convolution using block-floating-point
> > FFT might sound okay with 24+ bits (the Lake DSP convolution thingie
> > used a bunch of 56K processors but i dunno how clean it sounded). i
> > might be more comfortable if it was 32-bit block-floating-point.
>
> I'm biased, but I think that it sounded perfect :-)
well, you can thank Bill Gardner for that! :-\ (actually, i'm talking out of my anus, i don't know first-hand any of the particulars about that, but i have heard one side of it.)
> With careful scaling and rounding you can limit yourself to something
> like half a bit of lost SNR per FFT stage, particularly if you don't
> require the result to behave nicely when the input is full-scale sine
> waves. That works out to acceptable noise performance for 16-bit
> audio data and useful transform sizes with 24-bit words. You
> obviously start to lose useful SNR immediately if you're only using
> 16-bit words. Probably need deeper words now that audio data is
> usually deeper than 16 bits. The shape of the noise can be weird,
> though, as it can be smeared across the block and across frequency
> (as can saturation errors).
did the Lake DSP thingie do block floating point? if i recall, it was
the DSP56002 or later that *did* have the sticky S bit.
> Aside: block floating point is actually trickier in the frequency
> domain than it looks. Many DSPs have a "half carry" flag, which one
> might think would be just the ticket for knowing when to bump an
> exponent register and make a shift/no-shift decision. You have to
> remember, though, that complex arithmetic (as in FFT butterflies) can
> result in rotations that produce results of sqrt(2) in the real or
> imaginary parts even though both inputs are within [-1, 1] in real
> and imaginary. So what you really need for a block-floating-point
> FFT is a 1/(2sqrt(2)) flag, which I can't see happening....
actually, the value i came up with for the 56002 was 0.31066:

http://groups.google.com/group/comp.dsp/msg/e4b9194c1e44a1ef?dmode=source

actually, now that i look at the 5600x user manual, the sticky flag
gets set when the result is greater than 1/4 F.S. (but trips up if the
result is bigger than 3/4 F.S., which i think was a mistake of Mot).
at one time i wondered about that, and later figured it out. it's not
just the sqrt(2) factor (from rotating the square 45 degrees) but also
adding a term of the same range from the other side of the butterfly.
and the result must not exceed 3/4 F.S. in the Mot 56002:

0.31066*(sqrt(2) + 1) = 0.75

so they set the threshold at 0.25, which is safe.

but other than that, BFP should work pretty good, virtually as nicely
as regular FP with the same number of mantissa bits. did the Lake DSP
convolver use BFP?

r b-j
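
(A quick arithmetic check of those 56002 numbers; Motorola's 1/4
threshold sits safely under the 0.31066 bound.)

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        /* largest pre-butterfly magnitude keeping results < 3/4 F.S. */
        double bound = 0.75 / (1.0 + sqrt(2.0));
        printf("bound = %.5f\n", bound);                     /* 0.31066 */
        printf("check = %.2f\n", bound * (sqrt(2.0) + 1.0)); /* 0.75    */
        return 0;
    }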
"robert bristow-johnson" <rbj@audioimagination.com> writes:

> [...]
>
> but other than that, BFP should work pretty good, virtually as nicely
> as regular FP with the same number of mantissa bits. did the Lake DSP
> convolver use BFP?
>
> r b-j
So for 16-bit data/coefficients, you wouldn't use FP-FDF?
-- 
%  Randy Yates                  % "How's life on earth?
%% Fuquay-Varina, NC            %  ... What is it worth?"
%%% 919-577-9882                % 'Mission (A World Record)',
%%%% <yates@ieee.org>           % *A New World Record*, ELO
http://home.earthlink.net/~yatescr
On Mar 20, 1:06 pm, Randy Yates <y...@ieee.org> wrote:
> So for 16-bit data/coefficients, you wouldn't use FP-FDF?
i dunno what that is. what's "FDF"?

r b-j
"robert bristow-johnson" <rbj@audioimagination.com> writes:

> On Mar 20, 1:06 pm, Randy Yates <y...@ieee.org> wrote:
> > So for 16-bit data/coefficients, you wouldn't use FP-FDF?
>
> i dunno what that is. what's "FDF"?
Frequency Domain Filtering?
-- 
%  Randy Yates                  % "I met someone who looks alot like you,
%% Fuquay-Varina, NC            %  she does the things you do,
%%% 919-577-9882                %  but she is an IBM."
%%%% <yates@ieee.org>           % 'Yours Truly, 2095', *Time*, ELO
http://home.earthlink.net/~yatescr
On Mar 20, 1:50 pm, Randy Yates <y...@ieee.org> wrote:
> "robert bristow-johnson" <r...@audioimagination.com> writes: > > On Mar 20, 1:06 pm, Randy Yates <y...@ieee.org> wrote: > > >> So for 16-bit data/coefficients, you wouldn't use FP-FDF? > > > i dunno what that is. what's "FDF"? > > Frequency Domain Filtering?
oh. <blush> ;-)

r b-j