
DC Blocking Filter with Noise Shaping

Started by Sinclair June 12, 2012
Hi All.

I wish to use such a filter, but am having difficulty understanding how the implementation I am looking at works. In essence, the following difference equation:

y[n] = (y[n-1] - (1-pole)*y[n-1] + x[n] - x[n-1] - e[n-1]) + e[n]

is implemented with 16-bit inputs and outputs, using an accumulator that contains both the new output y[n] and the new error e[n] at the end of each iteration. Here is the code in question:

short x[], y[];
long acc, A, prev_x, prev_y;
double pole;
unsigned long n, num_samples;
pole = 0.9999;
A = (long)(32768.0*(1.0 - pole));
acc = 0;
prev_x = 0;
prev_y = 0;
for (n=0; n<num_samples; n++){
    acc   -= prev_x;
    prev_x = (long)x[n]<<15;
    acc   += prev_x;
    acc   -= A*prev_y;
    prev_y = acc>>15;               // quantization happens here
    y[n]   = (short)prev_y;
    // acc has y[n] in upper 17 bits and -e[n] in lower 15 bits
}

Can anyone explain why 'A' has been computed in the manner shown? Why has (1-pole) been multiplied by 32768? As far as I can determine, 'A' needs to be stored in {16,15} format (word length, fraction length) in order for the code to work. The reason I say this is because of the size of the following operation in the difference equation:

(1-pole)*y[n-1] = {16,15}*{16,0}
                = {32,15}

That is, in order to end up with the top 17 bits holding y[n] and the bottom 15 bits holding e[n] as stated, (1-pole) needs to be {16,15}. If anyone can shed some light on what's going on I would appreciate it.
> Can anyone explain why 'A' has been computed in the manner shown? Why has
> (1-pole) been multiplied by 32768? As far as I can determine, 'A' needs to
> be stored in {16,15} format (word length, fraction length) in order for the
> code to work.
Exactly! A must be Q15; that is the purpose of multiplying by 32768 (it is the same as left-shifting by 15 bits).
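To put numbers on it (just an illustration, using the pole from the posted code):

    pole                 = 0.9999
    1.0 - pole           = 0.0001
    32768.0*(1.0 - pole) = 3.2768
    A = (long)(3.2768)   = 3          // i.e. 3/32768 ~ 0.0000916 in Q15

So A is (1-pole) truncated into Q15, and the product A*prev_y is the {32,15} quantity that lines up with the x[n]<<15 terms already sitting in the accumulator.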
> The reason I say this is because of the size of the following operation in
> the difference equation:
>
> (1-pole)*y[n-1] = {16,15}*{16,0}
>                 = {32,15}
>
> That is, in order to end up with the top 17 bits holding y[n] and the bottom
> 15 bits holding e[n] as stated, (1-pole) needs to be {16,15}. If anyone can
> shed some light on what's going on I would appreciate it.
Because y is signed you should write {15,0} in your notation. I suspect that the x and y variables are Q15. Your accumulator is then Q30. When saving the output, the accumulator must be shifted right by 15 bits in order to get it into Q15.

I don't think that the code implements the error part. After the statement

    prev_y = acc>>15;

the error (e[n]) can be computed like this:

    error = acc - (prev_y << 15);

Also note that the output is not saturated.

Cheers!
On 6/13/12 6:31 AM, niarn wrote:
> I suspect that the x and y variables are Q15. Your accumulator is then
> Q30. When saving the output, the accumulator must be shifted right by
> 15 bits in order to get it into Q15.
>
> I don't think that the code implements the error part.
it doesn't need to. the error (or the negative of the error) is embedded into the lower 15 bits of acc and is carried over to the next sample (so you need not add back in -e[n-1] since it is already there).
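to make that explicit (this is just an illustration, not a change to the code): right after

    prev_y = acc>>15;               // quantization happens here

the dropped bits are still sitting in acc,

    long neg_e = acc & 0x7FFF;      // low 15 bits of acc are -e[n] (always >= 0);
                                    // same value as acc - (prev_y<<15) when >> is an arithmetic shift

and nothing has to be done with them; they just ride along in acc into the next iteration.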
> After the statement
>
>     prev_y = acc>>15;
>
> the error (e[n]) can be computed like this:
>
>     error = acc - (prev_y << 15);
>
> Also note that the output is not saturated.
i suppose it should be, but the overflow might happen if the DC shifting (caused by the DC blocking action) causes some little spike to reach past the rails. essentially it is because y[n] is represented as *17* bits in the upper part of acc, and we have to force it into 16 bits to store it. so you're right. i hadn't worried too much about saturation in a DC blocking filter with no gain, but perhaps i shoulda. shoulda, woulda, coulda.

short x[], y[];
long acc, A, prev_x, prev_y;

// these are 32-bit values but only the upper part is used for prev_x
// and only the lower part is used for prev_y

double pole;
unsigned long n, num_samples;
pole = 0.9999;
A = (long)(32768.0*(1.0 - pole));
acc = 0;
prev_x = 0;
prev_y = 0;

for (n=0; n<num_samples; n++) {

    // acc = 32768*y[n-1] - e[n-1]

    acc -= prev_x;

    // acc = 32768*y[n-1] - e[n-1] - 32768*x[n-1]

    prev_x = (long)x[n]<<15;

    // prev_x = 32768*x[n]  (so now it's the current x, not previous)

    acc += prev_x;

    // acc = 32768*y[n-1] - e[n-1] - 32768*x[n-1] + 32768*x[n]
    //     = 32768*( y[n-1] + x[n] - x[n-1] ) - e[n-1]

    acc -= A*prev_y;

    // acc = 32768*( y[n-1] + x[n] - x[n-1] ) - e[n-1] - A*y[n-1]
    //     = 32768*( y[n-1] - A/32768*y[n-1] + x[n] - x[n-1] ) - e[n-1]
    //     = 32768*( (1 - A/32768)*y[n-1] + x[n] - x[n-1] ) - e[n-1]
    //     = 32768*( pole*y[n-1] + x[n] - x[n-1] ) - e[n-1]

    prev_y = acc>>15;               // quantization happens here
    y[n]   = (short)prev_y;

    // y[n] = pole*y[n-1] + x[n] - x[n-1] + (e[n] - e[n-1])/32768

    // e[n] is whatever has to be added to make the bottom 15 bits zero

    // acc has y[n] in upper 17 bits and -e[n] in lower 15 bits
}

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
On 6/13/12 12:30 PM, robert bristow-johnson wrote:
...
Sinclair, here's some better code that accomplishes the same thing:

/*
   y[n]   =  x[n] - Quantize{ w[n] }

          =  x[n] - (w[n] + e[n])

   w[n+1] =  w[n] + (1-pole)*y[n]
*/

short x[], y[];
double pole = 0.9999;
register long w, A, curr_y;
register unsigned long n, num_samples;

A = (long)(32768.0*(1.0 - pole));
w = 0;

for (n=0; n<num_samples; n++)
{
    curr_y = (long)x[n] - (w>>15);      // quantization happens here
    w     += A*curr_y;
    y[n]   = (short)curr_y;
}

this was envisioned originally, as far as i know, by Tim Wescott. i think this is the mostest efficient way to block DC with an integer machine.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
> this was envisioned originally, as far as i know, by Tim Wescott. i
> think this is the mostest efficient way to block DC with an integer
> machine.
Thanks for pointing out my error regarding the error part. Not only is 'your' solution efficient, it also has a flat noise floor. Is there a reference to where the solution was first 'envisioned'? Cheers!
On 6/14/12 3:02 PM, niarn wrote:

> > Thanks for pointing out my error regarding the error part.
wasn't an error. you just didn't see how it was included in the feedback without being explicitly calculated. but you're right about the saturation; you need not test for it as long as the input remains more than 6 dB below fullscale. since the DC blocker has no gain, there should be no waveform (that is always less than FS/2) that can be DC-shifted and overflow.
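if one did want to saturate anyway, a simple clamp just before storing the output would do (illustrative only, not in the posted code):

    if (prev_y >  32767)  prev_y =  32767;     // clamp to the 16-bit output range
    if (prev_y < -32768)  prev_y = -32768;
    y[n] = (short)prev_y;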
> Not only is 'your' solution efficient, it also has a flat noise floor.
it *shouldn't* be flat. the roundoff error goes through a simple first-order "differentiator" with a zero right on z=1. the noise floor should be zero at DC and twice as big at Nyquist (compared to if no noise shaping was done). that's what the "noise shaping" is supposed to be.
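to spell that shape out: the error term (e[n] - e[n-1]) corresponds to a noise transfer function

    NTF(z) = 1 - z^-1

whose magnitude is

    |NTF(e^jw)| = |1 - e^-jw| = 2*|sin(w/2)|

which is 0 at DC (w = 0) and 2 at Nyquist (w = pi), i.e. about 6 dB above an unshaped error floor at the top of the band.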
> Is there a reference to where the solution was first 'envisioned'?
by me, at the dspguru site (and it was posted here just before Grant used it on the dspguru site). i dunno if anyone beat me to that idea or not. i s'pose some Bell Systems engineer thought of this in the mid 60s, but i dunno.

Tim's method is *more* efficient *and* it is mathematically equivalent, as best as i can tell.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
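For anyone who wants to check the claimed equivalence numerically, here is a small self-contained test (not part of the thread's code; the test signal and sizes are arbitrary) that runs both posted loops over the same input and prints the largest difference between the two outputs:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define NUM_SAMPLES 10000

int main(void)
{
    static short x[NUM_SAMPLES], y1[NUM_SAMPLES], y2[NUM_SAMPLES];
    double pole = 0.9999;
    long   A = (long)(32768.0*(1.0 - pole));
    long   acc = 0, prev_x = 0, prev_y = 0;     /* state for the accumulator version */
    long   w = 0, curr_y;                       /* state for the w-state version     */
    long   n, maxdiff = 0;

    /* test signal: a tone well below fullscale plus a large DC offset */
    for (n = 0; n < NUM_SAMPLES; n++)
        x[n] = (short)(8000.0*sin(2.0*3.14159265358979*n/97.0) + 3000.0);

    for (n = 0; n < NUM_SAMPLES; n++) {         /* accumulator version, as posted above */
        acc   -= prev_x;
        prev_x = (long)x[n]<<15;
        acc   += prev_x;
        acc   -= A*prev_y;
        prev_y = acc>>15;
        y1[n]  = (short)prev_y;
    }

    for (n = 0; n < NUM_SAMPLES; n++) {         /* w-state version, as posted above */
        curr_y = (long)x[n] - (w>>15);
        w     += A*curr_y;
        y2[n]  = (short)curr_y;
    }

    for (n = 0; n < NUM_SAMPLES; n++) {
        long d = labs((long)y1[n] - (long)y2[n]);
        if (d > maxdiff) maxdiff = d;
    }
    printf("largest |y1 - y2| over %d samples: %ld\n", NUM_SAMPLES, maxdiff);
    return 0;
}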
On Fri, 15 Jun 2012 01:58:12 -0400, robert bristow-johnson wrote:

> it *shouldn't* be flat. the roundoff error goes through a simple
> first-order "differentiator" with a zero right on z=1. the noise floor
> should be zero at DC and twice as big at Nyquist (compared to if no
> noise shaping was done). that's what the "noise shaping" is supposed to
> be.
I believe that to be correct, but I'm too lazy to verify it. Certainly the notion of shoving any residual error into an accumulator where it -- well -- accumulates tends to result in a shaped noise floor.
>> Is there a reference to where the solution was first 'envisioned'?
I don't know how it came about _first_, but the first time I thought of it was in response to a post here; I mostly used the noise shaping part in low-pass filters, so I just revamped things slightly to make (high_pass = 1 - low_pass). My career does not depend on publishing in academic journals, so there's no "letters" section to cite.
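In the code posted above that reads as y[n] = x[n] - (w>>15): w acts as a slowly updated low-pass estimate of the DC in the input, and subtracting that estimate from the input is what gives the high-pass (DC-blocking) response.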
> by me, at the dspguru site (and it was posted here just before Grant
> used it on the dspguru site). i dunno if anyone beat me to that idea or
> not. i s'pose some Bell Systems engineer thought of this in the mid
> 60s, but i dunno.
If it's a good idea, and I was proud of thinking of it, then some smart engineer at Bell or wherever came up with it before 1960, and possibly before 1940.
> Tim's method is *more* efficient *and* it is mathematically equivalent,
> as best as i can tell.
On 6/15/12 5:41 PM, Tim Wescott wrote:
> I believe that to be correct, but I'm too lazy to verify it. Certainly
> the notion of shoving any residual error into an accumulator where it --
> well -- accumulates tends to result in a shaped noise floor.
well the transfer function that comes from e[n] - e[n-1] should look pretty nonflat.

by definition:

    Quantize{ w[n] }  =  w[n] + e[n]

or

    e[n]  =  Quantize{ w[n] }  -  w[n]

where e[n] is the quantization error. for "fraction saving" (a.k.a. 1st-order noise shaping where the quantization is always rounding down), e[n] is whatever you have to add to make those bits that are getting dropped go to zero. that means e[n] is negative or zero (i guess the word for that is "non-positive") because those bits represent a value that is unsigned and non-negative.

the "let-the-bits-ride-in-the-accumulator" technique is something i thought of long ago for either 1st or 2nd-order filters. and it works very naturally for the "fraction saving" method where the quantization is very simple: mask off the bits below the quantization point.

so, this 1st-order noise shaping says to take whatever quantization error was *added* in the last sample (which is e[n-1]) and *subtract* it before quantizing the current output sample. but subtracting e[n-1] is the same as adding -e[n-1]. and since e[n-1] was whatever you had to add to make those dropped bits go to zero (last sample), then -e[n-1] *is* those bits that you dropped (zero-extended).

so, for an IIR filter, let's say a 1st-order LPF, you are taking your previous output, y[n-1], and feeding it back (scaled by a coefficient; in this case, the coefficient is the pole, p)

    y[n]  =  x[n]  +  p*y[n-1]

and, just before quantizing, to do noise-shaping, you add in the negative of the quantization error from the previous sample:

    y[n]  =  Quantize{ x[n]  +  p*y[n-1]  -  e[n-1] }

but remember, -e[n-1] is exactly whatever bits you dropped, but zero extended. what that extended-precision word was, immediately before quantization, is both the bits that are dropped *and* the bits to the left that weren't dropped:

    y[n-1] - e[n-1]

so, if i were to add y[n-1] to -e[n-1], i would have precisely what the word was just before quantization. but if i add it one place, i need to subtract it somewhere else (the old add-and-subtract-the-same-quantity trick). so it's

    y[n]  =  Quantize{ x[n]  +  (p-1)*y[n-1]  +  (y[n-1] - e[n-1]) }

or (i just like this form a little better):

    y[n]  =  Quantize{ x[n]  -  (1-p)*y[n-1]  +  (y[n-1] - e[n-1]) }

but what's in the parenths, (y[n-1] - e[n-1]), is precisely what was left in the extended-precision accumulator before quantization in the previous sample. so why bother to clear the accumulator? and fetch the error state (-e[n-1]) and zero extend it? why bother with that at all? just start with that accumulator to begin with in the current sample. then we see that

    y[n]  =  Quantize{ x[n]  -  (1-p)*y[n-1]  +  (y[n-1] - e[n-1]) }

          =  x[n]  -  (1-p)*y[n-1]  +  (y[n-1] - e[n-1])  +  e[n]

          =  x[n]  +  p*y[n-1]  +  (e[n] - e[n-1])

so if you apply the Z transform, you can see that e[n] goes through a digital differentiator with noise transfer function

    1 - z^-1

and the gain of that is zero at DC. so even though there is a DC bias to e[n] (it is always rounding down), that DC bias gets killed by the noise transfer function. you need not even adjust for that.

this is very similar for 2nd-order IIR (biquads). the coefficient, a1, must be adjusted by 1 just as p was above.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
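a minimal sketch of that 1st-order LPF with fraction saving, in the same style as the code earlier in the thread (illustrative only; it assumes Q0 samples, an arithmetic right shift, and an input scaled so that x[n] + p*y[n-1] stays within 16-bit range):

short x[], y[];
long acc, P, prev_y;
double pole;
unsigned long n, num_samples;

P      = (long)(32768.0*pole);     // the pole p in Q15, 0 < p < 1
acc    = 0;                        // carries 32768*y[n-1] - e[n-1] between samples
prev_y = 0;

for (n=0; n<num_samples; n++) {
    acc   += (long)x[n]<<15;       // acc = 32768*( x[n] + y[n-1] ) - e[n-1]
    acc   -= (32768 - P)*prev_y;   // acc = 32768*( x[n] + p*y[n-1] ) - e[n-1]
    prev_y = acc>>15;              // quantize (round down); the dropped low 15 bits are -e[n]
    y[n]   = (short)prev_y;        // y[n] = x[n] + p*y[n-1] + (e[n] - e[n-1])/32768
                                   // acc keeps -e[n] around for the next iteration
}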
> it *shouldn't* be flat. the roundoff error goes through a simple
> first-order "differentiator" with a zero right on z=1. the noise floor
> should be zero at DC and twice as big at Nyquist (compared to if no
> noise shaping was done). that's what the "noise shaping" is supposed to
> be.

True, right at DC and in an excruciatingly narrow band around DC the noise floor goes to -inf. Apart from that it is flat as a pancake. I'm not sure I follow you. I believe that the error (neglecting the error from the multiplication) 'sees' the same transfer function as the input does...
>> Is there a reference to where the solution was first 'envisioned'?
>
> by me, at the dspguru site (and it was posted here just before Grant
> used it on the dspguru site). i dunno if anyone beat me to that idea or
> not. i s'pose some Bell Systems engineer thought of this in the mid
> 60s, but i dunno.
The reason I asked is that I once worked at a place where the exact same DC blocking algo was used. That code went into production as early as 2001 and was probably 'envisioned' in 2000. Unfortunately, I don't know whether the guy responsible for the code could have read your posts, assuming they were already in the public domain in 2000.