I have a bank of 8th order digital bandpass filters with center frequencies between 100 Hz to 2.3 kHz at a sampling frequency of 8 kHz. Each of these is implemented by cascading 4 biquad filters in direct form I. Fixed point computation is used, which means each product term is rounded to a fixed word length before being accumulated to give the output value of the filter. Filter coefficients are also stored using the same word length. I experimented using truncation rounding and the usual rounding up/ down. The former incurs a maximum error of 1 LSB whereas the latter incurs at most 1/2 LSB error. Here comes the problem. When I analysed the magnitude spectrum of the filter which uses 1/2 LSB rounding (http://dhost.info/angzhiping/files/round.jpg), it has more spikes than the one which uses truncation rounding (http://dhost.info/ angzhiping/files/truncate.jpg). Although I only illustrate a word length of 18-bits, the spike locations are consistent and grows as the word length precision gradually decreases. I would like to know the origin of these spikes. Thank you!
Round off spectrum spikes in 1/2 LSB rounding when performing fixed point filtering
Started by ●March 15, 2011
Reply by ●March 16, 20112011-03-16
On Mar 15, 9:55�pm, Zhi Ping Ang <angzhip...@gmail.com> wrote:> I have a bank of 8th order digital bandpass filters with center > frequencies between 100 Hz to 2.3 kHz at a sampling frequency of 8 > kHz. Each of these is implemented by cascading 4 biquad filters in > direct form I.not a bad decision. the number of states per 8th-order filter is 10 (8+2), not 16 (2*8) as some might think. the output states of a biquad section can become the input states of the immediately following section.> Fixed point computation is used, which means each > product term is rounded to a fixed word length before being > accumulated to give the output value of the filter.now why would you do that? why not accumulate the product terms in a double-wide accumulator before quantization to a single-width word? when quantizing to a single-width word, there are simple and effective and reasonably cheap noise-shaping tricks that you can use. you might not even have to dither.> Filter coefficients are also stored using the same word length. > > I experimented using truncation rounding and the usual rounding up/ > down. The former incurs a maximum error of 1 LSB whereas the latter > incurs at most 1/2 LSB error. Here comes the problem. When I analysed > the magnitude spectrum of the filter which uses 1/2 LSB rounding > (http://dhost.info/angzhiping/files/round.jpg), it has more spikes > than the one which uses truncation rounding (http://dhost.info/ > angzhiping/files/truncate.jpg). Although I only illustrate a word > length of 18-bits, the spike locations are consistent and grows as the > word length precision gradually decreases. > > I would like to know the origin of these spikes. Thank you!what is your source signal? it might be that the round off is not random enough. can you illustrate with a simple snippet of C code, how you're doing this filter as Direct Form 1? r b-j
Reply by ●March 16, 20112011-03-16
On Mar 16, 2:55�am, Zhi Ping Ang <angzhip...@gmail.com> wrote:> I have a bank of 8th order digital bandpass filters with center > frequencies between 100 Hz to 2.3 kHz at a sampling frequency of 8 > kHz. Each of these is implemented by cascading 4 biquad filters in > direct form I. Fixed point computation is used, which means each > product term is rounded to a fixed word length before being > accumulated to give the output value of the filter. Filter > coefficients are also stored using the same word length. > > I experimented using truncation rounding and the usual rounding up/ > down. The former incurs a maximum error of 1 LSB whereas the latter > incurs at most 1/2 LSB error. Here comes the problem. When I analysed > the magnitude spectrum of the filter which uses 1/2 LSB rounding > (http://dhost.info/angzhiping/files/round.jpg), it has more spikes > than the one which uses truncation rounding (http://dhost.info/ > angzhiping/files/truncate.jpg). Although I only illustrate a word > length of 18-bits, the spike locations are consistent and grows as the > word length precision gradually decreases. > > I would like to know the origin of these spikes. Thank you!How did you analyze these filters? Cy spectrum analyzer? By computations? If the latter, could you please post the coefficients for the biquads? Rune
Reply by ●March 16, 20112011-03-16
Zhi Ping Ang wrote:> I have a bank of 8th order digital bandpass filters with center > frequencies between 100 Hz to 2.3 kHz at a sampling frequency of 8 > kHz. Each of these is implemented by cascading 4 biquad filters in > direct form I. Fixed point computation is used, which means each > product term is rounded to a fixed word length before being > accumulated to give the output value of the filter. Filter > coefficients are also stored using the same word length. > > I experimented using truncation rounding and the usual rounding up/ > down. The former incurs a maximum error of 1 LSB whereas the latter > incurs at most 1/2 LSB error. Here comes the problem. When I analysed > the magnitude spectrum of the filter which uses 1/2 LSB rounding > (http://dhost.info/angzhiping/files/round.jpg), it has more spikes > than the one which uses truncation rounding (http://dhost.info/ > angzhiping/files/truncate.jpg). Although I only illustrate a word > length of 18-bits, the spike locations are consistent and grows as the > word length precision gradually decreases. > > I would like to know the origin of these spikes. Thank you!Perhaps, you are observing the artifacts of the limit cycles in the filters. Using rounding instead of truncation increases the limit cycle behavior. Think of rounding/truncation as if it is a center notch step function; rounding has more "gain". Vladimir Vassilevsky DSP and Mixed Signal Design Consultant http://www.abvolt.com
Reply by ●March 16, 20112011-03-16
On 03/15/2011 09:55 PM, Zhi Ping Ang wrote:> I have a bank of 8th order digital bandpass filters with center > frequencies between 100 Hz to 2.3 kHz at a sampling frequency of 8 > kHz. Each of these is implemented by cascading 4 biquad filters in > direct form I. Fixed point computation is used, which means each > product term is rounded to a fixed word length before being > accumulated to give the output value of the filter. Filter > coefficients are also stored using the same word length. > > I experimented using truncation rounding and the usual rounding up/ > down. The former incurs a maximum error of 1 LSB whereas the latter > incurs at most 1/2 LSB error. Here comes the problem. When I analysed > the magnitude spectrum of the filter which uses 1/2 LSB rounding > (http://dhost.info/angzhiping/files/round.jpg), it has more spikes > than the one which uses truncation rounding (http://dhost.info/ > angzhiping/files/truncate.jpg). Although I only illustrate a word > length of 18-bits, the spike locations are consistent and grows as the > word length precision gradually decreases. > > I would like to know the origin of these spikes. Thank you!Hi Zhi, As Robert said, you shouldn't be rounding before accumulating. I suggest you fix that problem first, then take another look at the responses. -- Randy Yates Digital Signal Labs 919-577-9882 http://www.digitalsignallabs.com yates@digitalsignallabs.com
Reply by ●March 16, 20112011-03-16
On Mar 16, 11:38�am, robert bristow-johnson <r...@audioimagination.com> wrote:> On Mar 15, 9:55�pm, Zhi Ping Ang <angzhip...@gmail.com> wrote: > > > I have a bank of 8th order digital bandpass filters with center > > frequencies between 100 Hz to 2.3 kHz at a sampling frequency of 8 > > kHz. Each of these is implemented by cascading 4 biquad filters in > > direct form I. > > not a bad decision. �the number of states per 8th-order filter is 10 > (8+2), not 16 (2*8) as some might think. �the output states of a > biquad section can become the input states of the immediately > following section.Its more stable and accurate too. Since I'm representing the coefficients in fixed point, the coefficients that are associated with the higher order z terms will require more precision, otherwise the poles will deviate from their true positions. Therefore they are broken down into biquads.> > > Fixed point computation is used, which means each > > product term is rounded to a fixed word length before being > > accumulated to give the output value of the filter. > > now why would you do that? �why not accumulate the product terms in a > double-wide accumulator before quantization to a single-width word? > > when quantizing to a single-width word, there are simple and effective > and reasonably cheap noise-shaping tricks that you can use. �you might > not even have to dither.I'm looking for a low gate count implementation. Evaluating the product in double-width may cause more full adders to be instantiated as the carry-save tree required for single cycle multiplication will be larger. For the case of truncation, I am not sure how much full adders the RTL compiler is able to optimize away if I choose to discard least significant bits. For the case of rounding, the full precision has to be evaluated in order to determine the rounding behavior, hence your suggestion is applicable for this scenario.> > > Filter coefficients are also stored using the same word length. > > > I experimented using truncation rounding and the usual rounding up/ > > down. The former incurs a maximum error of 1 LSB whereas the latter > > incurs at most 1/2 LSB error. Here comes the problem. When I analysed > > the magnitude spectrum of the filter which uses 1/2 LSB rounding > > (http://dhost.info/angzhiping/files/round.jpg), it has more spikes > > than the one which uses truncation rounding (http://dhost.info/ > > angzhiping/files/truncate.jpg). Although I only illustrate a word > > length of 18-bits, the spike locations are consistent and grows as the > > word length precision gradually decreases. > > > I would like to know the origin of these spikes. Thank you! > > what is your source signal? �it might be that the round off is not > random enough.I passed an impulse input and took a 65536-point FFT of the impulse response to obtain the magnitude spectrum. I have yet to try with arbitrary input signals.> > can you illustrate with a simple snippet of C code, how you're doing > this filter as Direct Form 1? >The equation for a single stage is as follows: y[i] = B[0]*x[i] + B[1]*x[i-1] - A[1]*y[i-1] - A[2]*y[i-2]. mul() is fixed point multiplication and sum() is fixed point addition. Each function performs the specified rounding operation before returning the rounded value. acc and pdt are 64-bit integers in 2-complement. The upper 32- bit represents the integer, and the lower 32-bits represents the fractional. y_prev_2 = 0; /* y_prev_2 is an alias for y[i-2] */ y_prev_1 = 0; /* y_prev_1 is an alias for y[i-1] */ x_prev_1 = 0; /* x_prev_1 is an alias for x[i-1] */ for(i = 0; i < size; i++) /* size is the number of elements in x[i], y[i] */ { acc = mul(B[0], x[i]); pdt = mul(B[1], x_prev_1); acc = sum(acc, pdt); pdt = mul(A[1], y_prev_1); pdt = -pdt; acc = sum(acc, pdt); pdt = mul(A[2], y_prev_2); pdt = -pdt; acc = sum(acc, pdt); y[i] = acc; y_prev_2 = y_prev_1; y_prev_1 = acc; x_prev_1 = x[i]; }> r b-j
Reply by ●March 16, 20112011-03-16
What's more, if you are building with gates, you needn't go to full double precision. A few extra bits will prove to be all that are worthwhle. Jerry -- Engineering is the art of making what you want from things you can get.
Reply by ●March 17, 20112011-03-17
On Mar 16, 10:42�pm, Zhi Ping Ang <angzhip...@gmail.com> wrote:> On Mar 16, 11:38�am, robert bristow-johnson > > <r...@audioimagination.com> wrote: > > On Mar 15, 9:55�pm, Zhi Ping Ang <angzhip...@gmail.com> wrote: > > > > ... implemented by cascading 4 biquad filters in direct form I. > > > not a bad decision. ... > > Its more stable and accurate too. Since I'm representing the > coefficients in fixed point, the coefficients that are associated with > the higher order z terms will require more precision, otherwise the > poles will deviate from their true positions. Therefore they are > broken down into biquads.but it's a good decision, if you can make your accumulator wide enough, to implement those biquads as DF1 and not DF2. DF2 is sorta "poles before zeros" and DF1 is "zeros before poles". if the biquad has a decent degree of resonance, the subsection that is an all-pole filter will boost the signal a lot while the subsection that is all- zeros will attenuate the signal by around the same amount. if you put the poles first (as in DF2), the signal could go into saturation (in a fixed-point context) before the zeros have an opportunity to bring the signal back down (too late, damage done). in the DF1, you have the zeros subsection first. that will bring the signal down before the all-pole section brings it back up. now, in your case, then you have a counterpart problem that the DF2 has: the all-zero subsection brings the signal down closer to the noise floor and then the all-pole subsection brings the signal *and* the noise floor up. so the DF1 suffers problems due to the noise floor and the DF2 suffers due to the saturation ceiling. but if you use DF1 *and* a double-wide accumulator, there is no noise floor because you have done no quantization yet. that's why, for fixed-point, DF1 is better with filters that are highly resonant. but you should have *some* bits to the right of the eventual quantization point, even if it's not double width.> > > > Fixed point computation is used, which means each > > > product term is rounded to a fixed word length before being > > > accumulated to give the output value of the filter. > > > now why would you do that? �why not accumulate the product terms in a > > double-wide accumulator before quantization to a single-width word? > > > when quantizing to a single-width word, there are simple and effective > > and reasonably cheap noise-shaping tricks that you can use. �you might > > not even have to dither. > > I'm looking for a low gate count implementation. Evaluating the > product in double-width may cause more full adders to be instantiated > as the carry-save tree required for single cycle multiplication will > be larger.yes, it would. but i think it's worth it. at least consider maybe 8 extra bits to the right of the quantization point if you don't make use of a double-wide accumulator.> For the case of truncation, I am not sure how much full > adders the RTL compiler is able to optimize away if I choose to > discard least significant bits. For the case of rounding, the full > precision has to be evaluated in order to determine the rounding > behavior, hence your suggestion is applicable for this scenario.i would think your multiplier would require a helluva lot more gates than a full width adder.> > what is your source signal? �it might be that the round off is not > > random enough. > > I passed an impulse input and took a 65536-point FFT of the impulse > response to obtain the magnitude spectrum. I have yet to try with > arbitrary input signals. > > > > > can you illustrate with a simple snippet of C code, how you're doing > > this filter as Direct Form 1? > > The equation for a single stage is as follows: y[i] = B[0]*x[i] + > B[1]*x[i-1] - A[1]*y[i-1] - A[2]*y[i-2]. mul() is fixed point > multiplication and sum() is fixed point addition.mul() should return a double-wide result and sum() should add double- wide values. when you add them all up, then quantize once per biquad section.> Each function > performs the specified rounding operation before returning the rounded > value. acc and pdt are 64-bit integers in 2-complement. The upper 32- > bit represents the integer, and the lower 32-bits represents the > fractional. > > y_prev_2 = 0; /* y_prev_2 is an alias for y[i-2] */ > y_prev_1 = 0; /* y_prev_1 is an alias for y[i-1] */ > x_prev_1 = 0; /* x_prev_1 is an alias for x[i-1] */ > > for(i = 0; i < size; i++) /* size is the number of elements in x[i], > y[i] */ > { > � � acc = mul(B[0], x[i]); > � � pdt = mul(B[1], x_prev_1); > � � acc = sum(acc, pdt); > � � pdt = mul(A[1], y_prev_1); > � � pdt = -pdt; > � � acc = sum(acc, pdt); > � � pdt = mul(A[2], y_prev_2); > � � pdt = -pdt; > � � acc = sum(acc, pdt); > > � � y[i] = acc; > � � y_prev_2 = y_prev_1; > � � y_prev_1 = acc; > � � x_prev_1 = x[i]; > > }here's a trick we call "first-order noise shaping with a zero in the noise-to-output transfer function at DC or z=1." Randy Yates has a much simpler and concise name: "fraction saving". _____________________________________________________________ /* let's pretend that sizeof(long) = 2*sizeof(short) */ long mul(short, short); short getInput(void); void putOutput(short); short b0, b1, b2, a1, a2; short x0, x1=0, x2=0, y1=0, y2=0; long acc = 0; /* 2 bits to the left of the binary point */ int shift_count = 8*sizeof(short) - 2; /* * coefficients b0, b1, b2, a1, a2 are defined somehow here. * these coefficients are all scaled by pow(2,shift_count) . * we have defined a1 and a2 so that their signs are negative * in the transfer function. */ a1 -= (1<<shift_count); /* subtract "1" from a1 */ while (1) { x0 = getInput(); acc += mul(b0, x0); acc += mul(b1, x1); acc += mul(b2, x2); acc += mul(a1, y1); acc += mul(a2, y2); x2 = x1; x1 = x0; y2 = y1; y1 = (short)(acc >> shift_count); putOutput(y1); } _____________________________________________________________ now the accumulator is *not* cleared (it has y1 left in it, but we had subtracted 1 from a1 that causes y1 to be subtracted so it's the old "add and subtract the same quantity" trick). but what happened in the low-order word of the accumulator? the bits that were dropped by the truncation are added right back again. we saved the fraction and added that back in the accumulator for the following sample. that will give you infinite S/N at DC (it puts a zero at z=1 in the quantization-noise-to-output transfer function). quite effective and simple. the above is optimal only for single-section biquad filters. for your multi-section biquads, somehow you have to structure it so that y1 and y2 of the current section are used for x1 and x2 of the following section. r b-j
Reply by ●March 18, 20112011-03-18
This may or may not be applicable. What sort of rounding are you using in the case of a tie (i.e. remainder = +/- 0.5)? If a remainder of +0.5 is rounded up AND a remainder of -0.5 is rounded up, you get a DC bias -- this would show up as a spectral line. Unbiased rounding would round +0.5 up and -0.5 down, i.e. 12.5 rounds to 13 and -12.5 rounds to -13.
Reply by ●March 18, 20112011-03-18
On Mar 17, 11:54�pm, "Impoliticus" <swiston@n_o_s_p_a_m.uiuc.edu> wrote:> This may or may not be applicable. �What sort of rounding are you using in > the case of a tie (i.e. remainder = +/- 0.5)?it's always rounding down. not toward zero but toward -inf. it's just dropping the bits. but the bits that are dropped are zero-extended and added in the next sample.> �If a remainder of +0.5 is > rounded up AND a remainder of -0.5 is rounded up, you get a DC bias -- this > would show up as a spectral line. �Unbiased rounding would round +0.5 up > and -0.5 down, i.e. 12.5 rounds to 13 and -12.5 rounds to -13.that's rounding ties toward zero. it's not what "fraction saving" is. fraction saving is a form of "quantization noise shaping". it has increased the quantization error at the Nyquist frequency while decreasing quantization noise at DC. there's another example of its use here: http://www.dspguru.com/dsp/tricks/fixed-point-dc-blocking-filter-with-noise-shaping . (wouldn't that be a drag - if you make this DC blocking filter and it creates its own DC because of rounding-to-nearest?) r b-j






