# Re: DSP Trick: Fixed Point DC Blocking Filter with Noise-Shaping

Started by March 7, 2008
```r b-j posted a trick back in 1999 to remove the DC of a signal, which
I recently ran across in DSP guru.  It was basically a differentiator
followed by a leaky integrator with some tricks to remove the
quantization noise.  I was wanting to give it a try but I saw
something in the 56K assembly code which looked incorrect.  The
original post can be found at http://groups.google.com/group/comp.dsp/msg/9ee3ca0606fbcb27.

do      #CHUNK_SIZE,filterLoop
sub     -x0,a           x:(r1),x0       a,y0            ;
truncated state y[n-1] -> y0, a = y[n-1] - x[n-1]
add      x0,a           y0,x:(r1)+                      ;
store y[n-1] to output,       a = y[n-1] - x[n-1] + x[n]
mac
y1,y0,a                                        ;
a = y[n-1] - x[n-1] + x[n] + (pole-1)*y[n-1]
;
= (x[n]-x[n-1]) + pole*y[n-1] = y[n]
filterLoop                                                      ; now
x[n] -> x[n-1], y[n] -> y[n-1] implicitly

From what I know of 56K assembly, the "sub -x0,a" is the same as "add
x0,a", which doesn't fit the algorithm so far as I can tell.  I
realize it has been eight (!) years, but I figured someone might still
be hanging around who has used this seemingly neat trick before :-).

Thanks,
Darrell
```
```On Mar 7, 3:09 pm, Darrell <darrel...@gmail.com> wrote:
> r b-j posted a trick back in 1999 to remove the DC of a signal, which
> I recently ran across in DSP guru.  It was basically a differentiator
> followed by a leaky integrator with some tricks to remove the
> quantization noise.  I was wanting to give it a try but I saw
> something in the 56K assembly code which looked incorrect.  The
> original post can be found athttp://groups.google.com/group/comp.dsp/msg/9ee3ca0606fbcb27.
>
>         do      #CHUNK_SIZE,filterLoop
>         sub     -x0,a           x:(r1),x0       a,y0            ;
> truncated state y[n-1] -> y0, a = y[n-1] - x[n-1]
>         add      x0,a           y0,x:(r1)+                      ;
> store y[n-1] to output,       a = y[n-1] - x[n-1] + x[n]
>         mac
> y1,y0,a                                        ;
> a = y[n-1] - x[n-1] + x[n] + (pole-1)*y[n-1]
>                                                                 ;
> = (x[n]-x[n-1]) + pole*y[n-1] = y[n]
> filterLoop                                                      ; now
> x[n] -> x[n-1], y[n] -> y[n-1] implicitly
>
> From what I know of 56K assembly, the "sub -x0,a" is the same as "add
> x0,a", which doesn't fit the algorithm so far as I can tell.  I
> realize it has been eight (!) years, but I figured someone might still
> be hanging around who has used this seemingly neat trick before :-).

i think i made a "transcription" mistake in posting.  the original
file is:

(watch out for word-wrap on the end-line comments that google groups
wraps)

;    this DC blocking filter is a differentiator (that kills the DC)

;    followed by a "leaky" integrator.  the differentiator has a

;    zero placed right on DC (z = 1) and the integrator has its pole

;    very close to DC (z = pole = approx. 1).  since the integrator

;    can itself generate DC because of a limit cycle problem, first

;    order noise shaping is used (with a zero at DC) to eliminate

;    any DC output from the integrator in the case where the input

;    that goes to silence.  for pipelining and efficiency reasons,

;    the output is delayed one sample with this code.

;

;    it is coded in a way to process the samples in-place but that

;    is easily changed.

;

CHUNK_SIZE    equ    4

NUM_INPUTS    equ    8

sampleArray   equ    \$50

stateArray    equ    \$80

pole          equ    0.9999     ; 0 <= pole < 1

; pole = [1 - sin(w)]/cos(w)

;     where w = 2*pi*F_cutoff/Fs

; w = arccos[2*pole/(pole^2 + 1)]

org    p:\$40

move                #<sampleArray,r1

move                #<stateArray,r6

move    #<2,n6

move    #>(pole-1.0),y1

move    #>(pole+1.0)/2.0,x1         ; forward coefficient

do    #NUM_INPUTS,channelLoop

;   move                y:(r6)+,r1      ; get pointer to input/output
CHUNK

move                y:(r6)+,x0      ; get feedforward state

move                y:(r6)+,a       ; get feedback state (hi word)

move                y:(r6)-n6,a0    ; get feedback state (lo word)

do    #CHUNK_SIZE,filterLoop
; truncated state y[n-1] -> y0, a = y[n-1] - (pole+1)/2*x[n-1]

mac    -x1,x0,a    x:(r1),x0    a,y0
; store y[n-1] to output,  a = y[n-1] - (pole+1)/2*x[n-1] + (pole+1)/
2*x[n]

mac     x1,x0,a    y0,x:(r1)+
; a = y[n-1] - (pole+1)/2*x[n-1] + (pole+1)/2*x[n] + (pole-1)*y[n-1]

;   = (pole+1)/2*(x[n]-x[n-1]) + pole*y[n-1] = y[n]

mac     y1,y0,a

filterLoop                              ; now x[n] -> x[n-1], y[n] ->
y[n-1] implicitly

move                x0,y:(r6)+      ; save feedforward state

move                a,y:(r6)+       ; save feedback state (hi
word)

move                a0,y:(r6)+       ; save feedback state (lo
word)

channelLoop

anyway, there is a slight scaling done to the input because this DC
block filter used BLT transform to be designed.  since x1 is so close
to 1, a simpler one to look at would be (with comments removed):

CHUNK_SIZE    equ    4

NUM_INPUTS    equ    8

sampleArray   equ    \$50

stateArray    equ    \$80

pole          equ    0.9999     ; 0 <= pole < 1

; pole = [1 - sin(w)]/cos(w)

;     where w = 2*pi*F_cutoff/Fs

; w = arccos[2*pole/(pole^2 + 1)]

org    p:\$40
move                #<sampleArray,r1
move                #<stateArray,r6
move    #<2,n6
move    #>(pole-1.0),y1
do    #NUM_INPUTS,channelLoop
move                y:(r6)+,x0
move                y:(r6)+,a
move                y:(r6)-n6,a0
do    #CHUNK_SIZE,filterLoop
sub     x0,a    x:(r1),x0    a,y0
mac     y1,y0,a
filterLoop
move                x0,y:(r6)+
move                a,y:(r6)+
move                a0,y:(r6)+
channelLoop

i am not sure, but Tim Wescott made it one instruction less and i
redid (speculatively) the 56K code.  try this:

if you get that one (with 2 instructions in the inner-most loop) to
work on a 56K or derivative, please lettuce know.

L8r,

r b-j
```
```On Mar 7, 5:54&#2013266080;pm, robert bristow-johnson <r...@audioimagination.com>
wrote:
> if you get that one (with 2 instructions in the inner-most loop) to
> work on a 56K or derivative, please lettuce know.

Thanks a bunch!  Will definitely let the group know when I get it
going :-).

Darrell
```
```robert bristow-johnson wrote:

I've also used it, adapted for an FPGA of course.
```
```On Mar 10, 9:43 pm, Ray Andraka <r...@andraka.com> wrote:
> robert bristow-johnson wrote:
??
>
> I've also used it, adapted for an FPGA of course.

Tim's Quicker DC Blocker?  (or, if this precedes Tim's post, the
equivalent by another name?)

r b-j
```
```robert bristow-johnson wrote:
> On Mar 10, 9:43 pm, Ray Andraka <r...@andraka.com> wrote:
>
>>robert bristow-johnson wrote:
>
> ??
>
>>I've also used it, adapted for an FPGA of course.
>
>
> Tim's Quicker DC Blocker?  (or, if this precedes Tim's post, the
> equivalent by another name?)
>
> r b-j

No, the one you posted on DSP_guru:
http://www.dspguru.com/comp.dsp/tricks/alg/dc_block.htm
```
```On Mar 11, 4:05 pm, Ray Andraka <r...@andraka.com> wrote:
> robert bristow-johnson wrote:
> > On Mar 10, 9:43 pm, Ray Andraka <r...@andraka.com> wrote:
>
> >>robert bristow-johnson wrote:
>
> > ??
>
> >>I've also used it, adapted for an FPGA of course.
>
> > Tim's Quicker DC Blocker?  (or, if this precedes Tim's post, the
> > equivalent by another name?)
>
> > r b-j
>
> No, the one you posted on DSP_guru:http://www.dspguru.com/comp.dsp/tricks/alg/dc_block.htm

oh, that's too bad.  i haven't done anything for real with Tim's
Quicker DC Blocker and am still curious if some other guinea pig has
tried it out.

r b-j
```
```On Mar 7, 1:09 pm, Darrell <darrel...@gmail.com> wrote:
> r b-j posted a trick back in 1999 to remove the DC of a signal, which
> I recently ran across in DSP guru.  It was basically a differentiator
> followed by a leaky integrator with some tricks to remove the
> quantization noise.  I was wanting to give it a try but I saw
> something in the 56K assembly code which looked incorrect.  The
> original post can be found athttp://groups.google.com/group/comp.dsp/msg/9ee3ca0606fbcb27.
>
>         do      #CHUNK_SIZE,filterLoop
>         sub     -x0,a           x:(r1),x0       a,y0            ;
> truncated state y[n-1] -> y0, a = y[n-1] - x[n-1]
>         add      x0,a           y0,x:(r1)+                      ;
> store y[n-1] to output,       a = y[n-1] - x[n-1] + x[n]
>         mac
> y1,y0,a                                        ;
> a = y[n-1] - x[n-1] + x[n] + (pole-1)*y[n-1]
>                                                                 ;
> = (x[n]-x[n-1]) + pole*y[n-1] = y[n]
> filterLoop                                                      ; now
> x[n] -> x[n-1], y[n] -> y[n-1] implicitly

Interesting. For years now I've been using a somewhat different DC
blocking structure in my chip designs. Something like this:

- DC Estimate is subtracted from input signal
- subtracter output goes to perfect integrator where residual DC
accumulates
- Output of integrator is scaled by shifting to control loop time
constant & residual error
- Output of shifter is DC estimate which feeds back to subtracter

Any thoughts on the comparative advantages of r b-j's approach vs this
one?

EB
```
```On Mar 10, 9:43&#2013266080;pm, Ray Andraka <r...@andraka.com> wrote:
> I've also used it, adapted for an FPGA of course.

Just out of curiosity, did you essentially do what Steve did to
convert the multiply to a shift?  Other than that I really don't see
anything that isn't FPGA friendly.

I have all three algorithms working (in Matlab, using the fixed point
toolbox).  With the data sets I'm using, I see about a 12 dB
improvement in suppression using Robert's initial approach, and about
a 9-10 dB improvement using the Tim/Steve approach.  As someone
(Robert?) observed, Steve's version is the same as Tim's, converted to
feed-forward form and using a shift for the multiply.  I really like
it for a HW implementation, since you are using 1-R you get a great
selection of R values.  There is nothing stopping one from doing the
same thing with Robert's version :-).

I haven't had time to dig in too deeply on what is going on with the
estimate and why Robert's seems to work better.  Just as an FYI, the
data set I am using has a DC that is varying slowly (by my standards!)
over time.  The total error is about 16 dB above the noise floor.  I'm
also modifying the R value to go from a fast attack to steady state
value (on the order of 0.995 for the initial 50 ms down to 0.9999 by
500 ms).

Darrell
```
```On Mar 13, 8:44 am, Darrell <darrel...@gmail.com> wrote:
> On Mar 10, 9:43 pm, Ray Andraka <r...@andraka.com> wrote:
>
> > I've also used it, adapted for an FPGA of course.
>
> Just out of curiosity, did you essentially do what Steve did to
> convert the multiply to a shift?  Other than that I really don't see
> anything that isn't FPGA friendly.
>
> I have all three algorithms working (in Matlab, using the fixed point
> toolbox).  With the data sets I'm using, I see about a 12 dB
> improvement in suppression using Robert's initial approach, and about
> a 9-10 dB improvement using the Tim/Steve approach.  As someone
> (Robert?) observed, Steve's version is the same as Tim's, converted to
> feed-forward form and using a shift for the multiply.  I really like
> it for a HW implementation, since you are using 1-R you get a great
> selection of R values.  There is nothing stopping one from doing the
> same thing with Robert's version :-).
>
> I haven't had time to dig in too deeply on what is going on with the
> estimate and why Robert's seems to work better.  Just as an FYI, the
> data set I am using has a DC that is varying slowly (by my standards!)
> over time.  The total error is about 16 dB above the noise floor.  I'm
> also modifying the R value to go from a fast attack to steady state
> value (on the order of 0.995 for the initial 50 ms down to 0.9999 by
> 500 ms).
>
> Darrell

Here is my VHDL version of the dc blocker (used in an FPGA). Notice
that it uses only shifts to implement multiplication.

entity dc_block_if is
Port ( data_in : in  signed(13 downto 0);
clk : in  STD_LOGIC;
reset : in STD_LOGIC;
data_out : out  signed(13 downto 0));
end dc_block_if;

architecture arch1 of dc_block_if is

signal error : unsigned(6 downto 0);
signal data_in_dly : signed(13 downto 0);
signal dat_out : signed(13 downto 0);

constant MAX : signed(22 downto 0) := "00011111111111111111111";
constant MIN : signed(22 downto 0) := "11100000000000000000000";

begin

-- Single pole, high-pass filter

data_out <= dat_out;

dcBlockProc : process (clk, reset)
variable vDiff: signed(20 downto 0);
variable vProd: signed(20 downto 0);
variable vOutScaled: signed(22 downto 0);
begin
if (reset = '1') then
error <= "0000000";
data_in_dly <= "00000000000000";
dat_out <= "00000000000000";
elsif rising_edge (clk) then
data_in_dly <= data_in;
vDiff := (data_in - data_in_dly) & "0000000";  -- scale
data up by 128 because pole is scaled by 128

-- multiply by pole value of 127
vProd := dat_out & "0000000";               -- multiply by
128
vProd := vProd - dat_out;                   -- subtract
dat_out to produce a multiply of 127

vOutScaled := (vProd(20) & vProd(20) & vProd) + vDiff;
vOutScaled := vOutScaled + signed('0' & error);

-- saturate on overflow
if (vOutScaled > MAX) then
vOutScaled := MAX;
elsif (vOutScaled < MIN) then
vOutScaled := MIN;
else
vOutScaled := vOutScaled;
end if;

error <= unsigned(vOutScaled(6 downto 0));   -- keep fractional
part by taking remainder of integer division
dat_out <= vOutScaled(20 downto 7);          -- divide by
128
end if;
end process;

end arch1;

Darol Klawetter
```