First, let me qualify this post by stating that I am not a DSP expert - most of my work is implementing the signal processing, not designing the signal processing. I just finished reading Mr. Yates' document "Practical Considerations in Fixed-Point FIR Filter Implementations" (http://www.digitalsignallabs.com/fir.pdf) and there are a few things I don't quite understand:

1) According to the document (accumulator and data bit-width aside), the specified formula for determining the scale factor with the least quantization error for converting from real coefficients to integer coefficients is:

2^(FLOOR(log2((2^(M−1) − 1) / ABS(max coeff value)))) , where M = coefficient bit width

Why force the factor to a power of 2? As I see it, removing this constraint will yield less quantization error while still avoiding overflow of the coefficient bit-width. In that case, the formula for the scale factor would just be ROUND((2^(M−1) − 1) / ABS(max coeff value)).

2) Even if you do remove the power of 2 constraint, this still doesn't guarantee the least quantization error. Consider the following case:

real coefficients = [0.4, 0.7, 0.8, 0.7, 0.4]
integer coefficient width = 5

By using the formula given above (without constraining to a power of 2), I come up with a scale factor of 16.66667. If I apply this to my coefficients I get new integer coefficients of [7, 12, 13, 12, 7]. Without integer quantization, these coefficients would be [6.666, 11.666, 13.333, 11.666, 6.666]. The quantization error is obvious.

However, it's clear that if I had chosen my scale factor to be 10, then I would end up with integer coefficients of [4, 7, 8, 7, 4] and there would be zero quantization error. So this indicates that simply setting the scale factor so that the maximum coefficient uses the maximum range of the coefficient bit-width will not necessarily result in the least quantization error.
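The arithmetic in the example can be checked with a short script (a sketch; the error metric here - sum of squared differences referred back to the real coefficients - is my own choice, not something the document prescribes):

```python
# Sketch: compare coefficient quantization error for the two scale
# factors in the example above. Error is measured as the sum of
# squared differences between round(c*s)/s and the real coefficient c.
coeffs = [0.4, 0.7, 0.8, 0.7, 0.4]

def quant_error(s, coeffs):
    return sum((round(c * s) / s - c) ** 2 for c in coeffs)

print(quant_error(16.66667, coeffs))  # nonzero, roughly 0.002
print(quant_error(10, coeffs))        # 0.0 -- the coefficients are exact
```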
Now my question is this: How do we determine the scale factor that will result in the least amount of quantization error?

On a related note, I've seen that Matlab's tool for generating fixed-point filters almost never generates coefficients that utilize the full range of the coefficient bit-width. Is the tool finding the set of coefficients with the minimal quantization error? If so, how is it doing that?

Thanks for the help.

- Travis
Scaling coefficients (real to integer)
Started by ●March 9, 2007
Reply by ●March 9, 2007
Travis wrote:
> First, let me qualify this post by stating that I am not a DSP expert - most of my work is implementing the signal processing, not designing the signal processing. I just finished reading Mr. Yates' document "Practical Considerations in Fixed-Point FIR Filter Implementations" (http://www.digitalsignallabs.com/fir.pdf) and there's a few things I don't quite understand:
>
> 1) According to the document (accumulator and data bit-width aside), the specified formula for determining the scale factor with the least quantization error for converting from real coefficients to integer coefficients is:
>
> 2^(FLOOR(log2((2^(M−1) − 1) / ABS(max coeff value)))) , where M = coefficient bit width
>
> Why force the factor to a power of 2? As I see it, removing this constraint will yield less quantization error while still avoiding overflow of the coefficient bit-width. In that case, the formula for the scale factor would just be ROUND((2^(M−1) − 1) / ABS(max coeff value)).

Integers are represented with binary digits -- bits. The maximum values that can be represented are powers of two. Making the factor less than 2^n, where n is the number of available bits, can only increase the average quantization error.

> 2) Even if you do remove the power of 2 constraint, then this still doesn't guarantee the least quantization error. Consider the following case:
>
> real coefficients = [0.4, 0.7, 0.8, 0.7, 0.4]
> integer coefficient width = 5
>
> By using the formula given above (without constraining to a power of 2), I come up with a scale factor of 16.66667. If I apply this to my coefficients I get new integer coefficients of [7, 12, 13, 12, 7]. Without integer quantization, these coefficients would be [6.666, 11.666, 13.333, 11.666, 6.666]. The quantization error is obvious.
>
> However, it's clear that if I had chosen my scale factor to be 10, then I would end up with integer coefficients of [4, 7, 8, 7, 4] and there would be zero quantization error. So, this indicates that simply setting the scale factor so that the maximum coefficient uses the maximum range of the coefficient bit-width will not necessarily result in the least quantization error.
>
> Now my question is this: How do we determine the scale factor that will result in the least amount of quantization error?

Go back and rehash the difference between integer and fixed point. The difference is sometimes smudged in discussing the technique, but fixed point is usually meant. Remember: the data needs to be scaled along with the coefficients, and there is no opportunity to tailor the scaling to numbers not yet measured.

> On a related note, I've seen that Matlab's tool for generating fixed point filters almost never generates coefficients that utilize the full range of the coefficient bit-width. Is the tool finding the set of coefficients with the minimal quantization error? If so, how is it doing that?

Matlab? What's Matlab? :-)

Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
Reply by ●March 9, 2007
Hi Travis,

Let me respond below.

"Travis" <travis@spencertechnologies.com> writes:
> First, let me qualify this post by stating that I am not a DSP expert - most of my work is implementing the signal processing, not designing the signal processing. I just finished reading Mr. Yates' document "Practical Considerations in Fixed-Point FIR Filter Implementations" (http://www.digitalsignallabs.com/fir.pdf) and there's a few things I don't quite understand:
>
> 1) According to the document (accumulator and data bit-width aside), the specified formula for determining the scale factor with the least quantization error for converting from real coefficients to integer coefficients is:
>
> 2^(FLOOR(log2((2^(M−1) − 1) / ABS(max coeff value)))) , where M = coefficient bit width

No, that's not what I stated. I stated

b = floor(log_2((2^(M-1) - 1) / max(abs(b_i))))

There is no "2^" in the beginning, and abs(max(x)) ~= max(abs(x)).

> Why force the factor to a power of 2? As I see it, removing this constraint will yield less quantization error while still avoiding overflow of the coefficient bit-width.

I agree. In fact, I've had to do exactly that recently since I needed a really tough, long fixed-point FIR filter.

The problem that accompanies this approach, which may be deemed acceptable in some situations, is that it is then impossible to establish the correct gain by simply shifting the filter result.

> In that case, the formula for the scale factor would just be ROUND((2^(M−1) − 1) / ABS(max coeff value)).

But if you're not going to constrain the scale factor to be exactly representable by a factor of two, then why constrain it to be integer?

Assuming for the moment that you do want the scale factor to be an integer, then replace the round with floor and correct the divisor to be max(abs(coeff value)) and I would agree.

If you use round, you could still overflow. Example: a one-tap FIR with b = 16384 and M = 16. In this example, your equation gives a scale of two, which results in 32768, and that is not representable in a 16-bit two's complement number.

> 2) Even if you do remove the power of 2 constraint, then this still doesn't guarantee the least quantization error. Consider the following case:
>
> real coefficients = [0.4, 0.7, 0.8, 0.7, 0.4]
> integer coefficient width = 5
>
> By using the formula given above (without constraining to a power of 2), I come up with a scale factor of 16.66667. If I apply this to my coefficients I get new integer coefficients of [7, 12, 13, 12, 7]. Without integer quantization, these coefficients would be [6.666, 11.666, 13.333, 11.666, 6.666]. The quantization error is obvious.
>
> However, it's clear that if I had chosen my scale factor to be 10, then I would end up with integer coefficients of [4, 7, 8, 7, 4] and there would be zero quantization error. So, this indicates that simply setting the scale factor so that the maximum coefficient uses the maximum range of the coefficient bit-width will not necessarily result in the least quantization error.
>
> Now my question is this: How do we determine the scale factor that will result in the least amount of quantization error?

By performing a least-squares minimization.

--
% Randy Yates           % "She tells me that she likes me very much,
%% Fuquay-Varina, NC    % but when I try to touch, she makes it
%%% 919-577-9882        % all too clear."
%%%% <yates@ieee.org>   % 'Yours Truly, 2095', *Time*, ELO
http://home.earthlink.net/~yatescr
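Randy's corrected power-of-two rule and his round-vs-floor overflow example can be sketched in a few lines of Python (the function name is mine; everything else follows the formulas quoted above):

```python
import math

# Power-of-two scaling per the corrected formula:
#   b = floor(log2((2^(M-1) - 1) / max(abs(b_i)))),  B_i = round(b_i * 2^b)
def quantize_pow2(coeffs, M):
    b = math.floor(math.log2((2 ** (M - 1) - 1) / max(abs(c) for c in coeffs)))
    return [round(c * 2 ** b) for c in coeffs], b

print(quantize_pow2([0.4, 0.7, 0.8, 0.7, 0.4], 5))  # ([6, 11, 13, 11, 6], 4)

# The overflow example: one tap with value 16384, M = 16.
# ROUND gives a scale of 2 (16384 * 2 = 32768 overflows a 16-bit
# two's complement word); FLOOR gives a safe scale of 1.
print(round((2 ** 15 - 1) / 16384))       # 2
print(math.floor((2 ** 15 - 1) / 16384))  # 1
```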
Reply by ●March 9, 2007
Thanks for the quick responses, I have a couple clarifying questions embedded below.

On Mar 9, 3:39 pm, Randy Yates <y...@ieee.org> wrote:
> No, that's not what I stated. I stated
>
> b = floor(log_2((2^(M-1) - 1) / max(abs(b_i))))
>
> There is no "2^" in the beginning, and abs(max(x)) ~= max(abs(x)).

Yes, I was just substituting 'b' back into "B_i = round(b_i · 2^b)" and calling "2^b" the scale factor.

> > Why force the factor to a power of 2? As I see it, removing this constraint will yield less quantization error while still avoiding overflow of the coefficient bit-width.
>
> I agree. In fact, I've had to do exactly that recently since I needed a really tough, long fixed-point FIR filter.
>
> The problem that accompanies this approach, which may be deemed acceptable in some situations, is that it is then impossible to establish the correct gain by simply shifting the filter result.

So the outlined approach only imposes the 'power of 2' constraint because that leads to simplified gain correction - correct? If the simplified method of gain correction is not needed then a reduced quantization error is achieved by not constraining to a power of 2.

> > In that case, the formula for the scale factor would just be ROUND((2^(M−1) − 1) / ABS(max coeff value)).
>
> But if you're not going to constrain the scale factor to be exactly representable by a factor of two, then why constrain it to be integer?

To achieve minimal quantization error without consideration of gain, as stated above. Is that not a valid goal?

> Assuming for the moment that you do want the scale factor to be an integer, then replace the round with floor and correct the divisor to be max(abs(coeff value)) and I would agree.
>
> If you use round, you could still overflow. Example: a one-tap FIR with b = 16384 and M = 16. In this example, your equation gives a scale of two, which results in 32768, and that is not representable in a 16-bit two's complement number.

Yeah - my bad.

> > 2) Even if you do remove the power of 2 constraint, then this still doesn't guarantee the least quantization error.
> > Now my question is this: How do we determine the scale factor that will result in the least amount of quantization error?
>
> By performing a least-squares minimization.

That's what I feared. Anybody have tips or scripts for doing this?
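For what it's worth, the brute-force version of that search is only a few lines of Python (a sketch; the grid step and the squared-error metric are arbitrary choices, and the function name is mine):

```python
# Brute-force search: evaluate the coefficient quantization error on a
# grid of scale factors up to the largest overflow-safe scale, and
# keep the best. A grid step of 0.001 is an arbitrary choice.
def best_scale(coeffs, M, step=0.001):
    limit = (2 ** (M - 1) - 1) / max(abs(c) for c in coeffs)  # avoid overflow
    best_s, best_err = None, float("inf")
    for k in range(int((limit - 1.0) / step) + 1):
        s = 1.0 + k * step
        err = sum((round(c * s) / s - c) ** 2 for c in coeffs)
        if err < best_err:
            best_s, best_err = s, err
    return best_s, best_err

# The example from earlier in the thread: the search lands on
# scale 10, which reproduces the coefficients exactly.
print(best_scale([0.4, 0.7, 0.8, 0.7, 0.4], 5))
```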
Reply by ●March 9, 2007
Thanks for your input. My responses are below.

On Mar 9, 1:59 pm, Jerry Avins <j...@ieee.org> wrote:
> Integers are represented with binary digits -- bits. The maximum values that can be represented are powers of two. Making the factor less than 2^n, where n is the number of available bits, can only increase the average quantization error.

Didn't the example I gave show this to not be true?

> Go back and rehash the difference between integer and fixed point. The difference is sometimes smudged in discussing the technique, but fixed point is usually meant. Remember: the data needs to be scaled along with the coefficients, and there is no opportunity to tailor the scaling to numbers not yet measured.

To quote the article:

"There are generally two methods of operating on fixed-point data used today - integer and fractional. ... In this article we shall utilize the integer method because I find it simpler and I am more familiar with it."

I take that to mean that the technique presented should apply to integers ;).
Reply by ●March 9, 2007
Travis wrote:
> Thanks for your input. My responses are below.
>
> On Mar 9, 1:59 pm, Jerry Avins <j...@ieee.org> wrote:
> > Integers are represented with binary digits -- bits. The maximum values that can be represented are powers of two. Making the factor less than 2^n, where n is the number of available bits, can only increase the average quantization error.
>
> Didn't the example I gave show this to not be true?

What happens when you also scale the samples? Come to think about it, maybe you don't have to. If you don't, there may be a loss of dynamic range.

> > Go back and rehash the difference between integer and fixed point. The difference is sometimes smudged in discussing the technique, but fixed point is usually meant. Remember: the data needs to be scaled along with the coefficients, and there is no opportunity to tailor the scaling to numbers not yet measured.
>
> To quote the article:
>
> "There are generally two methods of operating on fixed-point data used today - integer and fractional. ... In this article we shall utilize the integer method because I find it simpler and I am more familiar with it."
>
> I take that to mean that the technique presented should apply to integers ;).

You're right. Maybe we've fallen into a rut through habit.

Jerry
--
Engineering is the art of making what you want from things you can get.
Reply by ●March 9, 2007
"Travis" <travis@spencertechnologies.com> writes:
> Thanks for the quick responses, I have a couple clarifying questions embedded below.
>
> On Mar 9, 3:39 pm, Randy Yates <y...@ieee.org> wrote:
> > No, that's not what I stated. I stated
> >
> > b = floor(log_2((2^(M-1) - 1) / max(abs(b_i))))
> >
> > There is no "2^" in the beginning, and abs(max(x)) ~= max(abs(x)).
>
> Yes, I was just substituting 'b' back into "B_i = round(b_i · 2^b)" and calling "2^b" the scale factor.
>
> > > Why force the factor to a power of 2? As I see it, removing this constraint will yield less quantization error while still avoiding overflow of the coefficient bit-width.
> >
> > I agree. In fact, I've had to do exactly that recently since I needed a really tough, long fixed-point FIR filter.
> >
> > The problem that accompanies this approach, which may be deemed acceptable in some situations, is that it is then impossible to establish the correct gain by simply shifting the filter result.
>
> So the outlined approach only imposes the 'power of 2' constraint because that leads to simplified gain correction - correct? If the simplified method of gain correction is not needed then a reduced quantization error is achieved by not constraining to a power of 2.

Yes, I would say that summarizes the situation.

> > > In that case, the formula for the scale factor would just be ROUND((2^(M-1) - 1) / ABS(max coeff value)).
> >
> > But if you're not going to constrain the scale factor to be exactly representable by a factor of two, then why constrain it to be integer?
>
> To achieve minimal quantization error without consideration of gain, as stated above. Is that not a valid goal?

The goal is valid. But if you don't care about the gain error, then it seems pretty clear that a scale factor that can take on any real value is going to yield less coefficient quantization error than one constrained to integer values.

Travis, thanks for questioning my assumptions and making me think harder. :)

--
% Randy Yates           % "...the answer lies within your soul
%% Fuquay-Varina, NC    % 'cause no one knows which side
%%% 919-577-9882        % the coin will fall."
%%%% <yates@ieee.org>   % 'Big Wheels', *Out of the Blue*, ELO
http://home.earthlink.net/~yatescr
Reply by ●March 9, 2007
On Mar 9, 1:59 pm, Jerry Avins <j...@ieee.org> wrote:
> Travis wrote:
> > First, let me qualify this post by stating that I am not a DSP expert - most of my work is implementing the signal processing, not designing the signal processing. I just finished reading Mr. Yates' document "Practical Considerations in Fixed-Point FIR Filter Implementations" (http://www.digitalsignallabs.com/fir.pdf) and there's a few things I don't quite understand:
> >
> > 1) According to the document (accumulator and data bit-width aside), the specified formula for determining the scale factor with the least quantization error for converting from real coefficients to integer coefficients is:
> >
> > 2^(FLOOR(log2((2^(M−1) − 1) / ABS(max coeff value)))) , where M = coefficient bit width
> >
> > Why force the factor to a power of 2? As I see it, removing this constraint will yield less quantization error while still avoiding overflow of the coefficient bit-width. In that case, the formula for the scale factor would just be ROUND((2^(M−1) − 1) / ABS(max coeff value)).
>
> Integers are represented with binary digits -- bits. The maximum values that can be represented are powers of two. Making the factor less than 2^n, where n is the number of available bits, can only increase the average quantization error.

This is true for arbitrary values, but for a fixed set of coefficients, some scale factors (including scaling to the maximum) will result in a greater average rounding error than other scale factors. See the OP's example.

> Remember: the data needs to be scaled along with the coefficients, and there is no opportunity to tailor the scaling to numbers not yet measured.

This is the key factor. If the error from scaling and quantizing a fixed set of coefficients can be lowered by more than the statistical error introduced by rescaling arbitrary data by the inverse, then you might be able to gain a net win by choosing an oddball scale factor. IMHO. YMMV.

--
rhn A.T nicholson d.0.t C-o-M
http://www.nicholson.com/rhn/dsp.html
Reply by ●March 9, 2007
Ron N. wrote:
> On Mar 9, 1:59 pm, Jerry Avins <j...@ieee.org> wrote:
...
> > Remember: the data needs to be scaled along with the coefficients, and there is no opportunity to tailor the scaling to numbers not yet measured.
>
> This is the key factor. If the error from scaling and quantizing a fixed set of coefficients can be lowered by more than the statistical error introduced by rescaling arbitrary data by the inverse, then you might be able to gain a net win by choosing an oddball scale factor.

Yes. Travis eloquently pointed that out. He's right.

Jerry
--
Engineering is the art of making what you want from things you can get.
Reply by ●March 10, 2007
On Mar 9, 7:31 pm, "Ron N." <rhnlo...@yahoo.com> wrote:
> This is the key factor. If the error from scaling and quantizing a fixed set of coefficients can be lowered by more than the statistical error introduced by rescaling arbitrary data by the inverse, then you might be able to gain a net win by choosing an oddball scale factor.

I believe that the situation Ron describes is usually true for me - perhaps it would help if I explain my world. There are two aspects of my world which make gain error a non-factor:

1) I am primarily interested in the frequency domain. Thus, my primary goal is to achieve the best signal-to-noise ratio I can. To achieve this goal, I want the response of my integer-based filter to match the response of the designed, real filter as closely as possible. This means minimizing quantization error in the integer filter.

2) My world has two parts: the first part is integer based, where the signal is digitized and preliminary integer processing is performed. The second part of my world is floating-point, where the more extensive processing is performed. So, even if gain correction were a consideration for me, I could do it in the floating-point part of my world, where I don't have to worry about integer quantization error anymore.

Hopefully this helps explain why somebody might be interested in minimizing quantization error without consideration of gain.

That said, I am still unsure how to convert a real filter to an integer filter while maintaining minimal quantization error. Randy mentioned least-squares minimization, which is what I suspected all along, but the only way I can think to accomplish this is by a brute-force comparison of quantization error across all possible scale factors. Is there a more elegant way to determine the optimal scale factor without having to check every possible value?
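One possible answer to the closing question (my own sketch, not from any of the posters): round(c·s) only changes when some c_i·s crosses a half-integer, so the error as a function of the scale s is piecewise, with finitely many breakpoints s = (n − 0.5)/|c_i|. Within an interval where the integer coefficients B_i are fixed, the squared error Σ(B_i/s − c_i)² has the closed-form minimizer s* = ΣB_i² / Σ(B_i·c_i). Enumerating the breakpoints and testing each interval's optimum avoids a dense grid:

```python
# Sketch: enumerate the breakpoints where a rounded coefficient changes,
# then test each interval's closed-form least-squares optimum. The lower
# bound of 1.0 on the scale is an arbitrary assumption of this sketch.
def optimal_scale(coeffs, M):
    limit = (2 ** (M - 1) - 1) / max(abs(c) for c in coeffs)  # overflow-safe
    pts = {1.0, limit}
    for c in (abs(c) for c in coeffs if c != 0):
        n = 1
        while (n - 0.5) / c <= limit:
            if (n - 0.5) / c >= 1.0:
                pts.add((n - 0.5) / c)
            n += 1
    pts = sorted(pts)

    def err(s):
        return sum((round(c * s) / s - c) ** 2 for c in coeffs)

    candidates = []
    for a, b in zip(pts, pts[1:]):
        mid = (a + b) / 2
        candidates.append(mid)
        B = [round(c * mid) for c in coeffs]        # fixed on this interval
        denom = sum(Bi * c for Bi, c in zip(B, coeffs))
        if denom > 0:
            s_star = sum(Bi * Bi for Bi in B) / denom   # per-interval optimum
            if 1.0 <= s_star <= limit:
                candidates.append(s_star)
    return min(candidates, key=err)

s = optimal_scale([0.4, 0.7, 0.8, 0.7, 0.4], 5)
print(s, [round(c * s) for c in [0.4, 0.7, 0.8, 0.7, 0.4]])
```

On the thread's example this lands on a scale of 10 and recovers [4, 7, 8, 7, 4] exactly, without any grid sweep.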






