Even when a filter's output is less than one, internal results can exceed it. That's the primary need for wide accumulators.

Jerry
--
Engineering is the art of making what you want from things you can get.
Round off spectrum spikes in 1/2 LSB rounding when performing fixed point filtering
Started by ●March 15, 2011
Reply by ●April 5, 2011
Reply by ●April 5, 2011
On Apr 5, 9:13 pm, Jerry Avins <j...@ieee.org> wrote:
> Even when a filter's output is less than one, internal results can exceed it. That's the primary need for wide accumulators.
>
> Jerry
> --
> Engineering is the art of making what you want from things you can get.

From what I know, DF I has the property that any internal overflow will wrap back to the correct final value, thanks to the properties of two's complement. Do correct me if I'm mistaken.
Reply by ●April 6, 2011
> On Apr 5, 9:13 pm, Jerry Avins <j...@ieee.org> wrote:
>
> > Even when a filter's output is less than one, internal results can exceed it. That's the primary need for wide accumulators.

that's the primary use for guard bits on the left. but by "double wide accumulator" i meant the extra guard bits on the right, which eliminate any quantization error from the multiply and multiply-accumulate instructions.

On Apr 5, 8:46 pm, Zhi Ping Ang <angzhip...@gmail.com> wrote:
>
> From what I know, DF I has a property that any internal overflow will be wrapped back to the correct final value due to properties of 2-complement. Do correct me if I've mistaken.

some of this needs correction.

this is the fact you are referring to: for twos-complement (doesn't matter if it's DF1 or DF2 or some other form), **if** you knew in advance that the final value would not overflow the bounds of your word, and if there were intermediate sums (before the final sum) that *did* exceed the bounds, then as long as the intermediate value *wraps* around rather than saturates, that overflow does not hurt you. but you have to ensure that there will be later additions that bring the value back within bounds. otherwise wrap-around overflow is much worse than the clipping overflow that most DSPs do.

but most DSPs have guard bits on the left so you can have the best of both worlds; if the end result *does* exceed the bounds, the DSP will know that and will (if instructed to) saturate or clip the value, which is distortion but *much* better than wrap-around distortion. and if the end result does not exceed the bounds, the DSP will know that and will return the correct unclipped value.
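The wrap-versus-saturate point above can be checked with a few lines of Python. This is only a sketch under assumptions: 16-bit words with no guard bits, and the helper names `wrap16`, `sat16` and `accumulate` are made up for the illustration rather than taken from any DSP library.

```python
def wrap16(v):
    """Wrap an integer into the signed 16-bit range, as a two's-complement adder does."""
    return ((v + 0x8000) & 0xFFFF) - 0x8000


def sat16(v):
    """Clip an integer to the signed 16-bit range."""
    return max(-0x8000, min(0x7FFF, v))


def accumulate(terms, overflow):
    """Sum the terms, applying the overflow rule after every single addition."""
    acc = 0
    for t in terms:
        acc = overflow(acc + t)
    return acc


# Case 1: the true sum (10000) fits in 16 bits, but a partial sum (50000) does not.
fits = [30000, 20000, -25000, -15000]
print(accumulate(fits, wrap16))   # 10000: wrapping recovers the correct value
print(accumulate(fits, sat16))    # -7233: saturating mid-sum corrupts the result

# Case 2: the true sum (50000) itself does not fit.
too_big = [30000, 20000]
print(accumulate(too_big, wrap16))  # -15536: wrap-around flips the sign
print(accumulate(too_big, sat16))   # 32767: clipping is wrong but far less so
```

Guard bits on the left amount to running the whole sum at a wider width and applying the saturation only once, to the final result, which gives the correct value in the first case and the clipped value in the second.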
now the problem with the DF2 and fixed-point is, *if* your biquad filter has highly resonant poles (that means poles that are getting rather close to the unit circle), then internally the signal will get boosted by many dB at frequencies in the neighborhood of the resonant frequency and then get cast back into single-width words to be used in the latter half of the biquad section where the feedforward coefficients (that are determined by the zeros and the overall gain) scale it. so the danger of the DF2 is that the signal is boosted and clipped by the poles before the zeros (that are likely very close to the poles to kinda "tame them down") can do anything about it. once there is clipping there is nothing that the zeros can do about the clipping, even though they will likely reduce the amplitude of the whole signal along with the clipped part.

with the DF1, the effect of the zeros comes before the poles, so the signal (or the part of it close to the resonant frequency) will get reduced in amplitude by the zeros before the poles boost it. now *if* you had to cast the intermediate word back to a single-word width, then the DF1 would have a corresponding flaw to the DF2, but instead of clipping, it would be the poles boosting the quantization noise added to the weakened (by the zeros) signal.

but with a double-wide accumulator *and* using DF1, there is no quantization of the intermediate signal coming from the feedforward subsection. so you avoid internal clipping (assuming that there is no overall clipping of the biquad filter) *and* you don't have the resonance-boosted quantization noise. also, there is only 1 quantization point per biquad section with DF1 and a double-wide accumulator, whereas the DF2 would have 2 sources of quantization per biquad section.

in addition (as i've illustrated with the little code example), you can perform "fraction saving", a.k.a. "first-order noise shaping with a zero at z=1", which will kill any error or limit-cycling at DC (and steers error noise and/or limit-cycling to the Nyquist frequency where we might be more tolerant of it). and this fraction saving is easy and cheap to do.

r b-j
Reply by ●April 7, 2011
On Apr 6, 11:06 am, robert bristow-johnson <r...@audioimagination.com> wrote:
> > On Apr 5, 9:13 pm, Jerry Avins <j...@ieee.org> wrote:
> >
> > > Even when a filter's output is less than one, internal results can exceed it. That's the primary need for wide accumulators.
>
> that's the primary use for guard bits on the left. but by "double wide accumulator" i meant the extra guard bits on the right which eliminate any quantization error from the multiply and multiply-accumulate instructions.
>
> On Apr 5, 8:46 pm, Zhi Ping Ang <angzhip...@gmail.com> wrote:
> >
> > From what I know, DF I has a property that any internal overflow will be wrapped back to the correct final value due to properties of 2-complement. Do correct me if I've mistaken.
>
> some of this needs correction.
>
> this is the fact from what you are referring: for twos-complement (doesn't matter if it's DF1 or DF2 or some other form), **if** you knew in advance that the final value would not overflow the bounds for your word and if there were intermediate sums (before the final sum) that *did* exceed the bounds, if the intermediate value *wraps* around rather than saturate, then that overflow does not hurt you. but you have to insure that there will be later additions that will bring the value back within bounds. otherwise wrap-around overflow is much worse than the clipping overflow that most DSPs do.

This has been guaranteed. I divide the numerator coefficients by the absolute sum of the impulse response, so that the output y[n] stays within +/- 1 whenever x[n] is within +/- 1, at the expense of my filter gain.
The chance of a +/- 1 output actually occurring is also very remote, as the signs of the input samples have to match the signs of the impulse response.

> but most DSPs have guard bits on the left so you can have the best of both worlds; if the end result *does* exceed the bounds, the DSP will know that and will (if instructed to) saturate or clip the value, which is distortion but *much* better than wrap-around distortion. and if the end result does not exceed the bounds, the DSP will know that and will return the correct unclipped value.
>
> now the problem with the DF2 and fixed-point is, *if* your biquad filter has highly resonant poles (that means poles that are getting rather close to the unit circle), then internally the signal will get boosted by many dB at frequencies in the neighborhood of the resonant frequency and then get cast back into single-width words to be used in the latter half of the biquad section where the feedforward coefficients (that are determined by the zeros and the overall gain) scale it. so the danger of the DF2 is that the signal is boosted and clipped by the poles before the zeros (that are likely very close to the poles to kinda "tame them down") can do anything about it. once there is clipping there is nothing that the zeros can do about the clipping, even though they will likely reduce the amplitude of the whole signal along with the clipped part.
>
> with the DF1, the effect of the zeros come before the poles, so the signal (or the part of it close to the resonant frequency) will get reduced in amplitude by the zeros before the poles boost it.
> now *if* you had to cast the intermediate word back to a single-word width, then the DF1 would have a corresponding flaw to the DF2, but instead of clipping, it would be the poles boosting the quantization noise added to the weakened (by the zeros) signal.

I do not follow the explanation about the effect of the poles and zeros coming before or after one another. Aren't we just multiplying inputs and previous outputs by filter coefficients? I do know that the coefficients determine the locations of the poles and zeros, but I just can't make the logical link.

> but with a double-wide accumulator *and* using DF1, then there is no quantization of the intermediate signal coming from the feedforward subsection. so you avoid internal clipping (assuming that there is no overall clipping of the biquad filter) *and* you don't have the resonance-boosted quantization noise. also, there is only 1 quantization point per biquad section with DF1 and a double-wide accumulator whereas the DF2 would have 2 sources of quantization per biquad section.
>
> in addition (as i've illustrated with the little code example), you can perform "fraction saving" a.k.a. "first-order noise shaping with a zero at z=1" which will kill any error or limit-cycling at DC (and steers error noise and/or limit-cycling to the Nyquist frequency where we might be more tolerant of it). and this fraction saving is easy and cheap to do.
>

I should say I'm pretty impressed with fraction saving. I've changed my filter implementation, and it now exhibits a far more stable magnitude spectrum than my previous designs, which did not use fraction saving. But the spikes still appear in the spectrum. Upon inspection of the impulse response, the output goes like this (an actual example from a filter with 16-bit fractional precision): {0, 3.05e-5, 3.05e-5, 0, -3.05e-5, -3.05e-5, 0, 3.05e-5, 3.05e-5, ...}.
These spikes are much more pronounced for bandpass filters with center frequencies towards fs/4. I generated the filter coefficients using MATLAB. I suspect the cause is the way MATLAB's double-precision floating-point values are converted into fixed-point format. Are there any guidelines on how floating-point values should be rounded to give fixed-point filter coefficients?

> r b-j
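The scaling step described in this post (dividing the numerator by the absolute sum of the impulse response) followed by a conversion to fixed point might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the example bandpass centred on fs/4 (poles at radius 0.9), the Q14 coefficient format, the 4096-sample truncation of the impulse response, and the helper names. Round-to-nearest is used because it is a common convention, not because the thread settles on it.

```python
import math


def impulse_response(b, a, n):
    """First n samples of h[n] for H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2)."""
    x1 = x2 = y1 = y2 = 0.0
    h = []
    for k in range(n):
        xn = 1.0 if k == 0 else 0.0
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        h.append(yn)
    return h


def l1_scale(b, a, n=4096):
    """Divide the numerator by sum(|h[n]|) so that |x| <= 1 guarantees |y| <= 1."""
    norm = sum(abs(v) for v in impulse_response(b, a, n))
    return [bi / norm for bi in b], norm


def to_fixed(c, frac_bits):
    """Round a real coefficient to the nearest integer multiple of 2**-frac_bits."""
    return math.floor(c * (1 << frac_bits) + 0.5)


# Hypothetical bandpass centred on fs/4: zeros at z = +/-1, poles at 0.9*exp(+/-j*pi/2).
b = [1.0, 0.0, -1.0]
a = [0.0, 0.81]          # a1, a2 (a0 = 1 is implied)

b_scaled, norm = l1_scale(b, a)
b_q = [to_fixed(c, 14) for c in b_scaled]
a_q = [to_fixed(c, 14) for c in a]
print(norm, b_q, a_q)
```

Note that the guarantee holds for the unquantized coefficients; after rounding, the pole and zero positions shift slightly, so it may be worth recomputing the impulse response from `b_q` and `a_q` to see what the quantized filter actually does.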






