# rephase my problem: how to reduce round-off error for hardware implementation?

Started by November 15, 2003
```Dear all,

I asked a question about DCT and quantization error earlier and got some very
good answers. Now I have a somewhat redefined problem to attack, and I hope
the experts here can give me some thoughts:

In an ASIC implementation of the DCT (I am currently using VHDL), how can I
efficiently reduce round-off or finite-word-length error? For instance, a DCT
transform is Y=D*X*D', where

D =

0.3536    0.3536    0.3536    0.3536    0.3536    0.3536    0.3536    0.3536
0.4904    0.4157    0.2778    0.0975   -0.0975   -0.2778   -0.4157   -0.4904
0.4619    0.1913   -0.1913   -0.4619   -0.4619   -0.1913    0.1913    0.4619
0.4157   -0.0975   -0.4904   -0.2778    0.2778    0.4904    0.0975   -0.4157
0.3536   -0.3536   -0.3536    0.3536    0.3536   -0.3536   -0.3536    0.3536
0.2778   -0.4904    0.0975    0.4157   -0.4157   -0.0975    0.4904   -0.2778
0.1913   -0.4619    0.4619   -0.1913   -0.1913    0.4619   -0.4619    0.1913
0.0975   -0.2778    0.4157   -0.4904    0.4904   -0.4157    0.2778   -0.0975

In the hardware implementation, I multiply D by 256 (a left shift of 8 bits)
and round it to

 91    91    91    91    91    91    91    91
126   106    71    25   -25   -71  -106  -126
118    49   -49  -118  -118   -49    49   118
106   -25  -126   -71    71   126    25  -106
 91   -91   -91    91    91   -91   -91    91
 71  -126    25   106  -106   -25   126   -71
 49  -118   118   -49   -49   118  -118    49
 25   -71   106  -126   126  -106    71   -25

For the final result, I divide by 65536 to get Y (i.e., I discard the
lower 16 bits of the result).

But this leads to a large round-off error. The ideal DCT (floating point in
Matlab) has a PSNR of

40.3490 dB

while the above hardware implementation gives a PSNR of

39.4801 dB

So there is a loss of about 0.9 dB in quality.

I want to know: are there any techniques to combat such a problem? Or is the
real engineering solution in industry just to live with that error?

Thanks a lot,

Walala

```
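The scaled table in the post above can be reproduced with a few lines of Python. This is only a sketch: it assumes D is the orthonormal 8-point DCT-II matrix (which agrees with the values printed to four decimals), and the names `dct_matrix` and `to_fixed` are made up for illustration.

```python
import math

N = 8          # transform size
FRAC_BITS = 8  # scale factor 2**8 = 256, as in the post

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix D (assumed), so that Y = D * X * D'."""
    rows = []
    for k in range(n):
        c = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([c * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                     for j in range(n)])
    return rows

def to_fixed(matrix, frac_bits=FRAC_BITS):
    """Scale by 2**frac_bits and round each entry to the nearest integer."""
    scale = 1 << frac_bits
    return [[int(round(v * scale)) for v in row] for row in matrix]

D_fixed = to_fixed(dct_matrix())

for row in D_fixed:
    print(" ".join(f"{v:5d}" for v in row))
```

Changing `FRAC_BITS` is then a one-line experiment when trying wider coefficient words.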
```Hi

> In ASIC implementation(I am currently using VHDL) of DCT, how to efficiently
> reduce round-off or finite word length error? For instance, a DCT transform
> is Y=D*X*D', where D=

> In hardware implementation, I multiply D with 256(left shift 8 bits) and
> round it off to
>
>     91    91    91    91    91    91    91    91
>    126   106    71    25   -25   -71  -106  -126
>    118    49   -49  -118  -118   -49    49   118
>    106   -25  -126   -71    71   126    25  -106
>     91   -91   -91    91    91   -91   -91    91
>     71  -126    25   106  -106   -25   126   -71
>     49  -118   118   -49   -49   118  -118    49
>     25   -71   106  -126   126  -106    71   -25
>
> In final result, I divided the results by 65536 and get Y(i.e., discard the
> lower 16 bits of my result to get Y)

Ah, the error is due to a fixed-point implementation of the DCT. Sure,
there are several possibilities:

Try to estimate how large the coefficients can become. Try to increase
the number of fractional bits (e.g. try 12 instead of 8). Try to include
mathematical rounding in the shift-down step if this is feasible, i.e.
instead of a right shift by 16 bits, first add 0x7fff, then right shift.
The minimum number of bits required and the choice of rounding constant
take some experimentation, and it remains open whether these modifications
are affordable (since I think you're working in hardware when using VHDL),
but other than modifying the DCT process to improve the result there's not
much you can do. Once the DCT step has run, the data lost to round-off is
gone for good.

For suitable fixed-point implementations, you might want to look into the
Independent JPEG Group implementation. IIRC, it includes a nice integer
implementation.

> I want to know is there any techniques to combat some a problem? Or the real
> engineering solution in industry is just to let that error away?

Depends on what you want to do, and where you want to go. If you are the
hardware designer, and the design goal is to make the image quality as
good as possible, then yes. Otherwise, try to estimate how much the

Greetings,
Thomas

```
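To make the "add a constant, then shift" suggestion concrete, here is a small Python sketch comparing a plain right shift with a rounding right shift. It is an illustration under assumptions, not anyone's actual design: the function names are hypothetical, and the constant used is exactly half an output LSB (0x8000 for a 16-bit shift). The 0x7fff mentioned in the post behaves the same except for exact ties, which it rounds down.

```python
SHIFT = 16  # two passes with 8 fractional bits each, as in the original post

def shift_truncate(acc, shift=SHIFT):
    """Plain arithmetic right shift: always rounds toward minus infinity."""
    return acc >> shift

def shift_round(acc, shift=SHIFT):
    """Add half an output LSB (0x8000 for shift=16), then shift."""
    return (acc + (1 << (shift - 1))) >> shift

def mean_error(fn, values, shift=SHIFT):
    """Average of (shifted result - exact quotient) over the given inputs."""
    scale = float(1 << shift)
    return sum(fn(v) - v / scale for v in values) / len(values)

# All accumulator values in [-1.0, 1.0) at 16 fractional bits
accs = range(-(1 << SHIFT), 1 << SHIFT)
bias_trunc = mean_error(shift_truncate, accs)   # systematic offset near -0.5 LSB
bias_round = mean_error(shift_round, accs)      # offset near zero

print(f"truncate bias: {bias_trunc:+.4f} LSB, round bias: {bias_round:+.4f} LSB")
```

In hardware the rounding version costs one extra adder (or a carry-in on an existing one), which is usually cheap compared with widening every multiplier.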
```"Thomas Richter" <thor@math.tu-berlin.de> wrote in message
news:3FB77F9F.1060906@math.tu-berlin.de...
> Hi
>
>
> > In ASIC implementation(I am currently using VHDL) of DCT, how to efficiently
> > reduce round-off or finite word length error? For instance, a DCT transform
> > is Y=D*X*D', where D=
>
> > In hardware implementation, I multiply D with 256(left shift 8 bits) and
> > round it off to
> >
> >     91    91    91    91    91    91    91    91
> >    126   106    71    25   -25   -71  -106  -126
> >    118    49   -49  -118  -118   -49    49   118
> >    106   -25  -126   -71    71   126    25  -106
> >     91   -91   -91    91    91   -91   -91    91
> >     71  -126    25   106  -106   -25   126   -71
> >     49  -118   118   -49   -49   118  -118    49
> >     25   -71   106  -126   126  -106    71   -25
> >
> > In final result, I divided the results by 65536 and get Y(i.e., discard the
> > lower 16 bits of my result to get Y)
>
> Ah, the error is due to a fix-point implementation of the DCT. Sure,
> several possibilities:
>
> Try to estimate how large the coefficients can become. Try to enlarge
> the number of fractional bits (i.e. try 12 instead of 8). Try to include
>   mathematical rounding in the shift-down step if this is feasible, i.e.
> instead of a rightshift by 16 bits, first add 0x7fff, then rightshift.
> The minimum number of bits required, and the addition of a rounding
> constant require some experimentation, and it still remains open whether
>   these modifications are affordable (since I think you're playing with
> hardware when using VHDL), but other than modifying the DCT process to
> improve the result there's not much you can do. After the DCT step is
> run, data has been lost due to the round-off, and that's gone then.
>
> For suitable fix-point implementations, you might want to look into the
> independet jpeg-group implementation. IIRC, it covers a nice integer
> implementation.
>
> > I want to know is there any techniques to combat some a problem? Or the real
> > engineering solution in industry is just to let that error away?
>
> Depends on what you want to do, and where you want to go. If you are the
> hardware designer, and the design goal is to make the image quality as
> good as possible, then yes. Otherwise, try to estimate how much the
>
> Greetings,
> Thomas
>

Hi, Thomas,

Thank you very much for your answer. I have a lot of information to
digest...

But here is one more question: is there any theoretical work on combating such
rounding error? (i.e., something less ad hoc than tweaking parameter settings?)

Thanks a lot, and have a good weekend,

-Walala

```
```walala wrote:

...

> But here is one more: is there any theoretical work to combat such rounding
> off error? (i.e., not too engineering by tweaking some parameter settings?)
>
> Thanks a lot, and have a good weekend,
>
> -Walala

Round-off error amounts to lost information. There may be ways to guess
what was discarded, but there is no way to get it back with certainty.
Round-off error is minimized by calculating with more precision, be
that decimal places or bits.

Jerry
--
Engineering is the art of making what you want from things you can get.


```
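The effect of "more precision" on the coefficient side can be checked numerically. The sketch below is an assumption-laden illustration: it uses a 1-D orthonormal DCT-II, a made-up row of pixel values, and keeps the accumulator exact so that only the finite coefficient word length contributes to the error. `worst_output_error` is a hypothetical helper name.

```python
import math

def dct_row(k, n=8):
    """Row k of the orthonormal n-point DCT-II matrix (assumed form of D)."""
    c = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [c * math.cos((2 * j + 1) * k * math.pi / (2 * n)) for j in range(n)]

def worst_output_error(x, frac_bits, n=8):
    """Largest |fixed-point - floating-point| over the n outputs of a 1-D DCT.

    Only the coefficients are quantized; the accumulator is kept exact, so
    the error is purely due to the coefficient word length.
    """
    scale = 1 << frac_bits
    worst = 0.0
    for k in range(n):
        row = dct_row(k, n)
        exact = sum(c * v for c, v in zip(row, x))
        acc = sum(int(round(c * scale)) * v for c, v in zip(row, x))
        worst = max(worst, abs(acc / scale - exact))
    return worst

x = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical 8-bit pixel row
errors = {bits: worst_output_error(x, bits) for bits in (8, 10, 12, 14)}

for bits, err in errors.items():
    print(f"{bits:2d} fractional bits: worst output error {err:.5f}")
```

Each coefficient is off by at most half an LSB, so the output error is bounded by `sum(|x|) / 2**(frac_bits + 1)`; every extra fractional bit halves that bound.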
```In comp.compression walala <mizhael@yahoo.com> wrote:

> But here is one more: is there any theoretical work to combat such rounding
> off error? (i.e., not too engineering by tweaking some parameter settings?)

Sure, round-off is just another "slang" word for quantization. Thus, what you
find implemented here can be modelled by an ideal DCT, followed by a
quantization step. Once you can formulate this model, you can compute the
error, and you can compute the optimal rounding.

Going back to my previous post: if your "samples" are i.i.d. and the
high-bitrate approximation holds, then the ideal quantizer is the one with all
quantization buckets equal in size and reconstruction points in the middle. In
your model, this is equivalent to "mathematical rounding" (first add half of
the interval size, then round down). The error (MSE) in the high-bitrate
approximation is \Delta^2/12, where \Delta is the size of the quantization
bucket. As you can see: the smaller the bucket, the better the quality. Since
MSE is a squared error, it is quadratic in the bucket size.

However, for practical applications, the high-bitrate approximation might not
hold, and you might need a better model. Typically, an i.i.d. model with a
distribution that is centered around zero should hold, and in this case, a
reconstruction point shifted towards zero might be better. We've had good
success with a reconstruction point of 3/8 instead of 1/2.

Greetings,
Thomas

```
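The \Delta^2/12 figure is easy to check numerically. The following sketch assumes inputs spread evenly over the buckets (the idealized high-bitrate case) and uses integer arithmetic so the result is exact; `quantization_mse` is a hypothetical helper, not part of any library.

```python
def quantization_mse(delta, rounding):
    """Mean squared error of a uniform quantizer with bucket size `delta`,
    measured over integer inputs spread evenly across ten buckets."""
    offset = delta // 2 if rounding else 0   # "add half of the interval size"
    count = 10 * delta
    total = 0
    for x in range(count):
        q = ((x + offset) // delta) * delta  # quantize, then reconstruct
        total += (x - q) ** 2
    return total / count

DELTA = 1000
mse_round = quantization_mse(DELTA, rounding=True)    # close to DELTA**2 / 12
mse_trunc = quantization_mse(DELTA, rounding=False)   # close to DELTA**2 / 3

print(f"rounding:   {mse_round:.1f}  vs  delta^2/12 = {DELTA**2 / 12:.1f}")
print(f"truncation: {mse_trunc:.1f}  vs  delta^2/3  = {DELTA**2 / 3:.1f}")
```

Under this assumption plain truncation costs about four times the MSE of rounding, because it adds a bias of half a bucket on top of the same variance.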
```>
> However, for practical applications, the high-bitrate approximation might not
> hold, and you might need a better model. Typically, an i.i.d model with a
> distribution that is centered around zero should hold, and in this case, a
> reconstruction point shifted towards zero might be better. We've made good
> success with a reconstruction point of 3/8th instead of 1/2.
>

Dear Prof. Thomas,

scheme. Could you please point me to papers on that, and is there any
document detailing how you implement it and why you choose 3/8 particularly?

Thank you very much,

-Walala

```
```Hi,

>> However, for practical applications, the high-bitrate approximation might not
>> hold, and you might need a better model. Typically, an i.i.d model with a
>> distribution that is centered around zero should hold, and in this case, a
>> reconstruction point shifted towards zero might be better. We've made good
>> success with a reconstruction point of 3/8th instead of 1/2.
>>

> Dear Prof. Thomas,

I'm only an assistant, pushing students around. (-;

> scheme. Could you please point me to papers on that, and is there any
> document detailing how you implement it and why you choose 3/8 particularly?

The very unscientific method: "trial and error". )-: What is known, however,
is that the statistics of wavelet high-passes typically follow a "long-tailed
symmetric" distribution centered around the origin. This naturally calls for a
reconstruction point offset towards zero. There are even some model
distributions that would allow you to compute *where exactly* the reconstruction
point would have to be. The reason we chose 3/8 and not the precise
minimum was very simple: the 3/8 solution is quite easy to implement in
fixed point (*3, shift right three bits).

So long,
Thomas
```
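One way to read the "*3, shift right three bits" remark is sketched below. This is a guess at the general shape, not the poster's actual code: it assumes a deadzone quantizer that truncates magnitudes toward zero (common for wavelet coefficients), and both function names are hypothetical.

```python
def quantize_deadzone(x, delta):
    """Assumed quantizer: truncate |x| / delta toward zero, keep the sign."""
    q = abs(x) // delta
    return q if x >= 0 else -q

def reconstruct(q, delta, three_eighths=True):
    """Map index q back to a value inside its bucket
    [|q|*delta, (|q|+1)*delta), mirrored for negative q."""
    if q == 0:
        return 0
    if three_eighths:
        offset = (3 * delta) >> 3   # "*3, shift right three bits"
    else:
        offset = delta >> 1         # bucket midpoint
    magnitude = abs(q) * delta + offset
    return magnitude if q > 0 else -magnitude

delta = 16
q = quantize_deadzone(37, delta)
print(q, reconstruct(q, delta), reconstruct(q, delta, three_eighths=False))
```

With a bucket size of 16 the 3/8 point sits 6 above the bucket's lower edge instead of 8, which pulls every nonzero reconstruction slightly toward zero.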
```Jerry Avins wrote:

> Round-off error amounts to lost information. There may be ways to guess
> what was discarded, but there is no way to get it back with certainty.
> Round-off error is minimized by calculating with more precision, be
> that decimal places or bits.

Assuming that there was information there in the first place.

If the roundoff is much smaller than the random noise in the signal,
then there most likely wasn't any information there.  One exception that
I can think of is where the random noise can be averaged out and small
signals restored.

A discussion in another newsgroup on floating point multiplication is
related to this.  On some machines multiplying single precision floating
point numbers generates a double precision product.  Summing such
products can generate a more accurate result than summing the rounded
(or truncated) products.  Yet most programming languages expect single
precision products.

Dithering in A/D conversion could be considered a case where rounding
increases information.

-- glen

```
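The point about summing full-precision products carries over directly to a fixed-point DCT: keep the wide products in the accumulator and round only once at the end. A minimal sketch, assuming 8 fractional bits and a made-up row of pixel values (the function names are hypothetical):

```python
FRAC_BITS = 8
HALF = 1 << (FRAC_BITS - 1)

def dot_round_each(coeffs, samples):
    """Round every product back to integer precision before summing."""
    return sum((c * s + HALF) >> FRAC_BITS for c, s in zip(coeffs, samples))

def dot_round_once(coeffs, samples):
    """Keep full-width products in the accumulator; round only the final sum."""
    acc = sum(c * s for c, s in zip(coeffs, samples))
    return (acc + HALF) >> FRAC_BITS

row = [126, 106, 71, 25, -25, -71, -106, -126]   # second row of the scaled D
x = [52, 55, 61, 66, 70, 61, 64, 73]             # hypothetical pixel values
exact = sum(c * s for c, s in zip(row, x)) / (1 << FRAC_BITS)

print(exact, dot_round_once(row, x), dot_round_each(row, x))
```

Rounding once keeps the result within half an LSB of the exact quotient, while rounding each of the eight products can accumulate up to 8 x 0.5 = 4 LSBs in the worst case. The price is a wider accumulator.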
```> The very un-scientific method. "Trial and error". )-: What is known, however,
> is that the statistics of wavelet high-passes is typically a "long tail
> symmetric" function centered around the origin. This naturally asks for a
> reconstruction point set off towards zero. There are even some model-
> distributions that would allow you to compute *where exactly* the reconstruction
> point would have to be. The reason why we choose 3/8th and not the precise
> minimum was very simple: The 3/8th solution is quite easy to implement in
> fix point. (*3, shift right three bits).
>

Dear Prof. Thomas,

give me some more detailed explanations on how you do that?

For normal JPEG decoding, it is quite simple: let's say we have
quantized and rounded coefficients Y, which form an 8x8 matrix...

We just need to multiply it element-wise by the Q matrix, then take the IDCT...

So X=IDCT2(Y.*Q)

That's all, very simple! But now with your 3/8th scheme, do we do

X=IDCT2((Y-3/8).*Q)   ???

I know this comes from the Laplacian assumption on DCT coefficients... but I
just don't know how to do it in practice...

Thanks a lot,

-Walala

```
```I have no idea what you're talking about, but I noticed the typo in
the subject line and I *really* like it.

--th
```