Reply by Al Clark June 21, 2012
robert bristow-johnson <rbj@audioimagination.com> wrote in
news:jru7vc$33j$1@dont-email.me: 

> (snip)
>
> ya know, that reminds me. Randy Yates and Rick Lyons dealt with the DC
> blocking filter issues in the IEEE Sig Proc magazine. maybe someone
> with the doc can send you a copy. i read it, but i cannot find it on my
> computer.
There is a brand-new second edition of Streamlining Digital Signal Processing by Rick Lyons. It is full of tricks, many of which were contributed by members of this group and were originally published in the IEEE Sig Proc magazine. Randy's article is included.

Rick Lyons' DSP primer, Understanding Digital Signal Processing (third edition), also discusses this topic in some detail. This would be an excellent reference for the original poster.

Al Clark
www.danvillesignal.com
Reply by robert bristow-johnson June 21, 2012
On 6/20/12 4:59 PM, Sinclair wrote:
> With regards to Robert's suggestion of placing a DC blocking filter immediately after the ADC, I have a question. If the ADC was 16-bit, which of the following approaches would you adopt, and why:
>
> a. Store the ADC data as uint16 and filter accordingly
> b. Store the ADC data as int32 and filter accordingly
> c. Store the ADC data as int16 and filter accordingly
>
> My feeling is that the answer is resource (RAM, CPU cycles) and application dependent
nearly every single application of real-time DSP that i have ever done did the arithmetic in two's complement or in some form of floating point. i would say that if the ADC is measuring a bipolar signal, even if it returns an unsigned number (like where 0x8000 is in the middle and means roughly ground), then you should store the value as a signed int or long, but not uint. if the actual number returned by the ADC is a uint, then it *is* this "offset binary" format, which is the same as 2's complement but with the MSB flipped.

if, for whatever reason, the ADC is returning a value that is meant to be unipolar, so 0x8000 is meant to be more positive than 0x7FFF, then i guess you can store it as uint. but i doubt that you're doing that, because then a DC-blocking filter would not work with it. by definition, the output of a DC-blocking filter is a signed, bipolar value.
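A minimal sketch of the offset-binary conversion, assuming the ADC delivers 16-bit codes with 0x8000 at mid-scale in a `uint16_t`; the function name is hypothetical. Flipping the MSB is the same as subtracting the 0x8000 offset, and doing the subtraction in 32 bits keeps the narrowing to `int16_t` well-defined in C:

```c
#include <stdint.h>

/* Hypothetical helper: 16-bit offset-binary code -> two's-complement sample.
   0x0000 = most negative, 0x8000 = mid-scale (about ground), 0xFFFF = most
   positive. The result always lies in -32768..32767, so the cast is exact. */
static int16_t offset_binary_to_int16(uint16_t code)
{
    return (int16_t)((int32_t)code - 0x8000);
}
```

All 16 bits survive this conversion: `int16_t` spans -32768..32767, the same 65536 codes as the unsigned value, so storing the samples as int16 does not by itself cost a bit of resolution.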
> - ie, in the case of option (c), will 15-bits of resolution do?
dunno how to answer that.
> I am however interested in knowing what people with more experience think.
>
> Furthermore, the idea of combining the DC blocking filter with a lowpass filter appeals, but to be honest I lack the knowledge to do it.
the DC blocking filter is just a first-order highpass with a very low cutoff frequency. if the LPF is also a 1st-order IIR, then the result is simply a 2nd-order bandpass filter.
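A floating-point sketch of that cascade, assuming the DC blocker H_hp(z) = (1 - z^-1)/(1 - p z^-1) and a one-pole lowpass H_lp(z) = (1 - c)/(1 - c z^-1) with unity DC gain. The lowpass form and all names are assumptions, and no noise shaping is included. Multiplying the two numerators and the two denominators gives the biquad coefficients:

```c
#include <stddef.h>

/* Second-order section:
   y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2] */
typedef struct { double b0, b1, b2, a1, a2; } Biquad;

/* Cascade of a DC blocker  H_hp(z) = (1 - z^-1) / (1 - p z^-1)
   and a one-pole lowpass   H_lp(z) = (1 - c)    / (1 - c z^-1).
   Numerator:   (1 - c)(1 - z^-1)
   Denominator: (1 - p z^-1)(1 - c z^-1) = 1 - (p + c) z^-1 + p c z^-2 */
static Biquad dc_block_times_lowpass(double p, double c)
{
    Biquad q = { 1.0 - c, -(1.0 - c), 0.0, -(p + c), p * c };
    return q;
}

/* Direct form I with zero initial state. */
static void biquad_run(const Biquad *q, const double *x, double *y, size_t n)
{
    double x1 = 0.0, x2 = 0.0, y1 = 0.0, y2 = 0.0;
    for (size_t i = 0; i < n; i++) {
        double out = q->b0 * x[i] + q->b1 * x1 + q->b2 * x2
                   - q->a1 * y1 - q->a2 * y2;
        x2 = x1; x1 = x[i];
        y2 = y1; y1 = out;
        y[i] = out;
    }
}
```

The numerator keeps the zero at z = 1, so the cascade has zero gain at DC, while the lowpass pole supplies the high-frequency rolloff: together a second-order bandpass.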
> I am currently using a 64-tap equiripple filter that I designed in MATLAB, which has been implemented in symmetric direct-form. If the DC blocker did not contain noise-shaping, I think I could combine the two transfer functions, but the noise shaping component throws me.
well, then maybe your 64-tap FIR is flatter or has some otherwise better response than a 1st-order IIR. then i would still put the DC blocker first, right onto the signal from the ADC. one thing is that you could incorporate an HPF (the DC blocker) into your FIR, but it might make it longer. in that case, the tap coefficients would have to add to zero for it to block DC.
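The "taps add to zero" condition follows from H(z) = sum over k of h[k] z^-k: at DC, z = 1, so H(1) is just the sum of the taps. A sketch with hypothetical function names; the second function is only a crude way to force a zero at DC and is not a substitute for redesigning the filter:

```c
#include <stddef.h>

/* H(z) = sum_k h[k] z^-k, so the DC gain H(1) is simply the sum of the taps. */
static double fir_dc_gain(const double *h, size_t n)
{
    double sum = 0.0;
    for (size_t k = 0; k < n; k++)
        sum += h[k];
    return sum;
}

/* Crude illustration only: subtract the mean tap so the taps sum to zero.
   This equals subtracting H(1)/n times an n-tap boxcar from h, which carves
   a notch around DC on the order of fs/n wide and disturbs the passband.
   A constant offset keeps a symmetric (linear-phase) filter symmetric. */
static void fir_force_zero_dc(double *h, size_t n)
{
    double mean = fir_dc_gain(h, n) / (double)n;
    for (size_t k = 0; k < n; k++)
        h[k] -= mean;
}
```

For a 64-tap filter that notch is far wider than a DC blocker with its pole at 0.9999, which is in line with the remark that folding the HPF into the FIR may make it longer; rerunning the equiripple design with an explicit band at DC is the cleaner route.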
> Is anyone able to recommend a decent text book or online resource that would point me in the right direction?
about noise shaping and DC-blocking filters? no. i dunno any textbooks that deal with that. that's sorta why i "published" the alg as a dspguru trick.
> If so I would appreciate it.
ya know, that reminds me. Randy Yates and Rick Lyons dealt with the DC blocking filter issues in the IEEE Sig Proc magazine. maybe someone with the doc can send you a copy. i read it, but i cannot find it on my computer.

--
r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
Reply by robert bristow-johnson June 21, 2012
On 6/20/12 5:04 PM, glen herrmannsfeldt wrote:
> robert bristow-johnson<rbj@audioimagination.com> wrote:
>
> (snip, someone wrote)
>>> Ummm, maybe in some strange CPU in another universe.
>> In no machine I can think of will it make a difference.
>
>> really? sign-extension is an instruction that costs
>> no CPU clocks?
>
> In machines that overlap instructions, it is hard to count
> clock cycles.
yah. dat's a problem too.
> But what do you compare against, zero extending the value?
i was essentially expecting, in a regular ol' CPU with multiple word widths in its programming model, that both zero-extending (for unsigned shorts) and sign-extending (for signed shorts) cost about the same; one extra instruction cycle (however long that is, in clocks).
> There is at least one example, and likely many more, where > zero extending is slower than sign extending.
couldn't this 360 just move zero into a long register (maybe that would take two moves, but you would not do it to the least significant half) and then move the unsigned short into the lower half? two moves.

sign extend requires either a nice sign-extend instruction, or a test and conditional add (i can't remember the processor, but i remember writing code to do that, was it the 6800 or 6502 or Z80?). anyway, sign extend without an instruction takes a few instructions. i think i remember this:

    CLRB
    LDAA  X:0
    ROLA
    SBCB  #0
    LDAA  X:0   ; hafta reload the register

i think this is how i had to do it for some ancient microprocessor. of course to zero-extend, it was pretty simple:

    CLRB
    LDAA  X:0
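For comparison, a C sketch of the same borrow trick (function names are hypothetical): after CLRB, ROLA moves the low byte's MSB into carry and SBCB #0 leaves 0 - carry in B, i.e. 0x00 or 0xFF. Working on unsigned bit patterns keeps every step well-defined in C:

```c
#include <stdint.h>

/* High byte = 0 - (MSB of low byte): 0x00 for 0..0x7F, 0xFF for 0x80..0xFF.
   This mirrors CLRB / ROLA / SBCB #0 building the high byte in B. */
static uint16_t sign_extend_8_to_16(uint8_t lo)
{
    uint8_t hi = (uint8_t)(0u - (unsigned)(lo >> 7));
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

/* Zero-extension just clears the high byte (CLRB, then load the low byte). */
static uint16_t zero_extend_8_to_16(uint8_t lo)
{
    return (uint16_t)lo;
}
```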
> (The cost of zero extending on S/360, one instruction and few > clock cycles, is likely the reason for the 32767 byte I/O > limit instead of 65535. That limit is still there in many > cases for z/OS.)
i have rubber-banded punch cards together and stuck them in a slot where, when he got around to it, some low-paid computer operator in the glassed-off computer room unbanded them and serially fed the packs of cards into a card reader connected to a 360 or 370 (can't remember which). but i was just writing dinky little Fortran programs (or maybe ECAP or PCAP or DINAP or something like that), not dealing at all with the assembly or with low-level I/O issues. that's as close to a 360/370 as i ever got. otherwise i was pretty much just a simple Mot micro guy. maybe once i had an idea of what was going on inside an LSI-11 or PDP-11, but i can't remember if they had a sign-extend instruction.
Reply by glen herrmannsfeldt June 20, 2012
robert bristow-johnson <rbj@audioimagination.com> wrote:

(snip, someone wrote)
>> Ummm, maybe in some strange CPU in another universe.
> In no machine I can think of will it make a difference.

> really? sign-extension is an instruction that costs
> no CPU clocks?
In machines that overlap instructions, it is hard to count clock cycles.

But what do you compare against, zero extending the value? There is at least one example, and likely many more, where zero extending is slower than sign extending.

(The cost of zero extending on S/360, one instruction and a few clock cycles, is likely the reason for the 32767-byte I/O limit instead of 65535. That limit is still there in many cases for z/OS.)

-- glen
Reply by Sinclair June 20, 2012
Hi All.

With regards to Robert's suggestion of placing a DC blocking filter immediately after the ADC, I have a question. If the ADC was 16-bit, which of the following approaches would you adopt, and why:

a. Store the ADC data as uint16 and filter accordingly
b. Store the ADC data as int32  and filter accordingly
c. Store the ADC data as int16  and filter accordingly

My feeling is that the answer is resource (RAM, CPU cycles) and application dependent; i.e., in the case of option (c), will 15 bits of resolution do? I am, however, interested in knowing what people with more experience think.

Furthermore, the idea of combining the DC blocking filter with a lowpass filter appeals, but to be honest I lack the knowledge to do it. I am currently using a 64-tap equiripple filter that I designed in MATLAB, which has been implemented in symmetric direct-form. If the DC blocker did not contain noise-shaping, I think I could combine the two transfer functions, but the noise shaping component throws me.

Is anyone able to recommend a decent text book or online resource that would point me in the right direction? If so I would appreciate it.
Reply by robert bristow-johnson June 20, 2012
On 6/20/12 1:13 PM, Randy Yates wrote:
> robert bristow-johnson<rbj@audioimagination.com> writes:
>
>> On 6/20/12 12:11 PM, Randy Yates wrote:
>>
>>> Or are you saying that the process of doing the promotion/conversion is
>>> going to take time (whether implicit or not)?
>>>
>>> Ummm, maybe in some strange CPU in another universe. In no machine I can
>>> think of will it make a difference.
>>
>> really? sign-extension is an instruction that costs no CPU clocks?
>
> Well, it doesn't on the TI 28x. See, e.g., the sub acc, loc16 versus
> subl acc, loc32 instructions here:
>
> http://www.ti.com/mcu/docs/litabsmultiplefilelist.tsp?sectionId=96&tabId=1502&literatureNumber=spru430e&docCategoryId=6&familyId=2049
>
> They are both 1 cycle, and in the loc16 case it can be signed extended
> in that one cycle.
>
> Granted one case does not a proof make.
for me, most of the time when i write C code, it ends up running on a PC or Mac (and then it's really a big mystery exactly what is happening inside), or it's in an embedded processor like a 68K descendant or something like the Renesas SH3. with these simple compile-link-and-load tools you can easily see what the generated code is, and you can see exactly what happens when an unsigned short is converted to long (move 0 into the long register, move the short into the least-significant half) or when a signed short is converted to long (move and sign-extend). it turns out that most of these have just as short an execution time (1 clock) when multiplying two longs as when multiplying two shorts. so it makes sense to just make all of these intermediate variables long and cast them (from/to short) on the way in and on the way out. that's the main reason i left "A" as a long, even though only the least-significant bits were in use.
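A minimal sketch of that habit, assuming 16-bit samples and a 32-bit intermediate (`int16_t`/`int32_t` standing in for short/long); `q15_mul` is a hypothetical name. The operands are sign-extended on the way in, multiplied at full width, and narrowed with saturation on the way out:

```c
#include <stdint.h>

/* Q15 multiply: promote to 32 bits, multiply, shift back, saturate to 16.
   A 16x16 signed product always fits in 32 bits.
   Assumption: >> on a negative int32_t is an arithmetic shift
   (implementation-defined in C, but what mainstream compilers do). */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;  /* sign-extended operands */
    p >>= 15;                             /* back to Q15 scaling */
    if (p > INT16_MAX) p = INT16_MAX;     /* only (-32768) * (-32768) lands here */
    if (p < INT16_MIN) p = INT16_MIN;     /* defensive; not reachable for int16 inputs */
    return (int16_t)p;                    /* narrow on the way out */
}
```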
Reply by Randy Yates June 20, 2012
robert bristow-johnson <rbj@audioimagination.com> writes:

> On 6/20/12 12:11 PM, Randy Yates wrote:
>
>> Or are you saying that the process of doing the promotion/conversion is
>> going to take time (whether implicit or not)?
>>
>> Ummm, maybe in some strange CPU in another universe. In no machine I can
>> think of will it make a difference.
>
> really? sign-extension is an instruction that costs no CPU clocks?
Well, it doesn't on the TI 28x. See, e.g., the sub acc, loc16 versus subl acc, loc32 instructions here:

http://www.ti.com/mcu/docs/litabsmultiplefilelist.tsp?sectionId=96&tabId=1502&literatureNumber=spru430e&docCategoryId=6&familyId=2049

They are both 1 cycle, and in the loc16 case the operand can be sign-extended in that one cycle.

Granted, one case does not a proof make.

--Randy

--
Randy Yates
Digital Signal Labs
http://www.digitalsignallabs.com
Reply by robert bristow-johnson June 20, 2012
On 6/20/12 12:48 PM, gretzteam wrote:
>> On 6/20/12 12:11 PM, Randy Yates wrote:
>>
>>> Or are you saying that the process of doing the promotion/conversion is
>>> going to take time (whether implicit or not)?
>>>
>>> Ummm, maybe in some strange CPU in another universe. In no machine I
>>> can think of will it make a difference.
>>
>> really? sign-extension is an instruction that costs no CPU clocks?
>>
>
> This is when I realize that although hardware design can be a pain - mostly
> because Verilog is a horrible language on all aspects - it's very nice to
> not have to bother about this 'CPU' instruction stuff:)
when you're doing this in hardware, then you *know* that an N-bit word times an M-bit word results in precisely an (N+M)-bit word before you decide to toss bits away. and sign-extension or zero-extension (for unsigned ints) costs no clock pulses. and, if you put in the logic for a barrel shifter, neither does a shift-by-N-bits operation cost you any clocks. but when you write C code that runs on some machine, you gotta put up with both the requirements and limitations of the processor and with those of the compiler/language. that's why i just promote numbers to long in C.
Reply by Randy Yates June 20, 2012
Randy Yates <yates@digitalsignallabs.com> writes:
> [...]
> you could get a bit better precision by doing something like
>
>     acc -= (A * prev_y) >> 4;
Wups. To ensure you don't overflow, you'd need to do something like this:

    acc -= (A * (long long)prev_y) >> 4;

Then this would possibly (probably) cost you more cycles.

Bottom line: I think Tim's code is about as good as you're gonna get with a compiler. I guess that means I'm changing my stance and agreeing with you, Robert.
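A self-contained, compilable version of the noise-shaping DC blocker loop rbj posted in this thread (credited there to Tim), using `<stdint.h>` types in place of short/long. The function name and the output saturation are additions, labeled as such in the comments; this is a sketch of the posted loop, not a reference implementation:

```c
#include <stddef.h>
#include <stdint.h>

/* Noise-shaping DC blocker, following the loop posted in the thread:
 *   y[n]   = x[n] - Quantize{ w[n] }
 *   w[n+1] = w[n] + (1 - pole) * y[n]
 * w carries 15 fractional bits, so w >> 15 is the quantizer (a floor), and
 * the bits it discards stay in w and are fed back on the next sample.
 * Assumptions: pole is close to 1 (so A = (1 - pole) in Q15 is small and
 * w stays within 32 bits), and >> on a negative int32_t shifts
 * arithmetically, which is implementation-defined in C but true of
 * mainstream compilers. */
static void dc_block_q15(const int16_t *x, int16_t *y,
                         size_t num_samples, double pole)
{
    int32_t A = (int32_t)(32768.0 * (1.0 - pole));
    int32_t w = 0;

    for (size_t n = 0; n < num_samples; n++) {
        int32_t curr_y = (int32_t)x[n] - (w >> 15);  /* quantization happens here */
        w += A * curr_y;                              /* update uses the unclipped output */

        /* Added assumption: saturate on the way out; the posted code just
           cast to short, which can wrap on a large step in x. */
        if (curr_y > INT16_MAX) curr_y = INT16_MAX;
        if (curr_y < INT16_MIN) curr_y = INT16_MIN;
        y[n] = (int16_t)curr_y;
    }
}
```

This sketch resets w on every call; a streaming version would keep w in a state variable between blocks. With pole = 0.9999 the Q15 coefficient A truncates to 3, and a constant input of 1000 settles to an output of exactly 0 in fewer than 100,000 samples, since the truncation error never leaks out as a DC offset.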
Reply by Randy Yates June 20, 2012
robert bristow-johnson <rbj@audioimagination.com> writes:

> On 6/20/12 8:51 AM, Jerry Avins wrote:
>> On 6/19/2012 10:20 PM, robert bristow-johnson wrote:
>>>
>>> Tim, i just don't know what else to do. everytime this DC-blocking with
>>> noise-shaping issue comes up, i *always* point them to your algorithm,
>>> *including* writing the code so it should be easy for them to see that
>>> your alg is so much simpler and more efficient.
>>>
>>> then i *always* reiterate that "Tim's method is better."
>>
>> ...
>>
>>> is *better* and *more* efficient than the code i supplied to dspguru.
>>> that's what you should be using, Rodney.
>>>
>>> maybe i'll give up plugging your new improved alg, Tim. but it does
>>> sorta grate me to see it passed over without comment.
>>
>> Robert,
>>
>> Why don't you ask Grant to replace your code with Tim's appropriately
>> credited, or at least a link to it?
>>
>
> i think i did long ago when Tim first put it up. i can't remember if
> i got a response and Grant does not seem to hang out here anymore.
>
> i'll write him again.
>
>
> On 6/20/12 7:40 AM, Randy Yates wrote:
>> robert bristow-johnson<rbj@audioimagination.com> writes:
>>> [...]
>>> so i repeat, this alg:
>>>
>>> ______________
>>>
>>> /*
>>>
>>>   y[n]   = x[n] - Quantize{ w[n] }
>>>
>>>          = x[n] - (w[n] + e[n])
>>>
>>>   w[n+1] = w[n] + (1-pole)*y[n]
>>> */
>>>
>>> short x[], y[];
>>> double pole = 0.9999;
>>> long w, A, curr_y;
>>> unsigned long n, num_samples;
>>>
>>> A = (long)(32768.0*(1.0 - pole));
>>> w = 0;
>>>
>>> for (n=0; n<num_samples; n++)
>>> {
>>>     curr_y = (long)x[n] - (w>>15);   // quantization happens here
>>>     w += A*curr_y;
>>>     y[n] = (short)curr_y;
>>> }
>>>
>>> ______________
>>>
>>> is *better* and *more* efficient than the code i supplied to dspguru.
>>> that's what you should be using, Rodney.
>>
>> I would agree it is more efficient. I don't agree it is "better" in all
>> ways.
>
> it's more efficient, and it's mathematically equivalent. right down
> to the bit.
Yup, I made a mistake. Although now I would say it's not equivalent but rather superior!

The whole equivalent thing hinges on the fact that round(n + x) = n + round(x), when n is an integer and x is a real.

The fact is, you could use a better representation for A than (16,15). However, if you stick with (16,15), it isn't any worse than doing the update with P. And if you use a better representation, such as (12,19), you could get a bit better precision by doing something like

    acc -= (A * prev_y) >> 4;

--RY
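A small worked check of the point about A's representation, assuming the (16,15) and (12,19) notation means 15 and 19 fractional bits (19 - 15 = 4 matches the `>> 4`); the helper names are hypothetical. It truncates A = 1 - pole the same way the posted code does and reports the pole the quantized A actually realizes:

```c
/* Truncate A = (1 - pole) to frac_bits fractional bits, as in
   A = (long)(32768.0*(1.0 - pole)) for frac_bits = 15. */
static long quantize_A(double pole, int frac_bits)
{
    return (long)((double)(1L << frac_bits) * (1.0 - pole));
}

/* Pole actually realized by the quantized coefficient. */
static double realized_pole(long A, int frac_bits)
{
    return 1.0 - (double)A / (double)(1L << frac_bits);
}
```

For pole = 0.9999, 15 fractional bits give A = 3 and a realized pole of about 0.999908, while 19 fractional bits give A = 52 and about 0.999901, roughly ten times closer to the target.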