There are 6 messages in this thread.
You are currently looking at messages 1 to .
Is this discussion worth a thumbs up?
Hi guys,
I have implemented a floating point lowpass filter (sample rate 500hz,
cutoff 30hz), which is running on an XMega... It works fine, but
unfortunatly too slow. Now I'm thinking about converting the coefficents
into fixed point coefficents. Simply extending the coeeficents is not
possible, since my input values have 20bit. How can I accelerate the
filter?
// Coefficents calculation
Fo = 30; //(30hz)
Ts = 0.002; //(500Hz)
Wo = 2. * PI * Fo;
C1 = 1. / tanf(Ts * Wo / 2.);
M = 256. * (float)sqrt(26.) / 500.;
N = 34118./10000.;
N0 = 1.;
N1 = 4.;
N2 = 6.;
N3 = 4.;
N4 = 1.;
D0 = C1 * (C1 * (C1 * (C1 + M) + N) + M) + 1.;
D1 = - (C1 * (C1 * (C1 * (4. * C1 + 2.* M) ) - 2. * M) - 4.);
D2 = C1 * C1 * (6. * C1 * C1 - 2. * N) + 6.;
D3 = -(C1 * (C1 * (C1 * (4. * C1 - 2. * M)) + 2. * M) - 4.);
D4 = C1 * (C1 * (C1 * (C1 - M) + N) - M) + 1.;
uint32_t LPFilter(uint32_t value) {
RFreq[0] = value;
// Filter Algorithm
RFilt[0] = (N0 * RFreq[0] + N1 * RFreq[1] + N2 * RFreq[2] + N3 * RFreq[3]
+ N4 * RFreq[4] - D1 * RFilt[1] - D2 * RFilt[2] - D3 * RFilt[3] - D4 *
RFilt[4]) / D0;
// Shift samples for next LP Filter circle
for(uint8_t indx = 4; indx > 0; indx--) {
RFreq[indx] = RFreq[indx-1];
RFilt[indx] = RFilt[indx-1];
}
return (uint32_t)RFilt[0];
}
______________________________On Wed, 18 Jul 2012 17:31:42 -0500 "ad5TkO" <1@dsprelated> wrote: > Hi guys, > > I have implemented a floating point lowpass filter (sample rate 500hz, > cutoff 30hz), which is running on an XMega... It works fine, but > unfortunatly too slow. Now I'm thinking about converting the coefficents > into fixed point coefficents. Simply extending the coeeficents is not > possible, since my input values have 20bit. How can I accelerate the > filter? > > // Coefficents calculation > Fo = 30; //(30hz) > Ts = 0.002; //(500Hz) > Wo = 2. * PI * Fo; > C1 = 1. / tanf(Ts * Wo / 2.); > M = 256. * (float)sqrt(26.) / 500.; > N = 34118./10000.; > N0 = 1.; > N1 = 4.; > N2 = 6.; > N3 = 4.; > N4 = 1.; > D0 = C1 * (C1 * (C1 * (C1 + M) + N) + M) + 1.; > D1 = - (C1 * (C1 * (C1 * (4. * C1 + 2.* M) ) - 2. * M) - 4.); > D2 = C1 * C1 * (6. * C1 * C1 - 2. * N) + 6.; > D3 = -(C1 * (C1 * (C1 * (4. * C1 - 2. * M)) + 2. * M) - 4.); > D4 = C1 * (C1 * (C1 * (C1 - M) + N) - M) + 1.; > > > uint32_t LPFilter(uint32_t value) { > > RFreq[0] = value; > > // Filter Algorithm > RFilt[0] = (N0 * RFreq[0] + N1 * RFreq[1] + N2 * RFreq[2] + N3 * RFreq[3] > + N4 * RFreq[4] - D1 * RFilt[1] - D2 * RFilt[2] - D3 * RFilt[3] - D4 * > RFilt[4]) / D0; > > > // Shift samples for next LP Filter circle > for(uint8_t indx = 4; indx > 0; indx--) { > RFreq[indx] = RFreq[indx-1]; > RFilt[indx] = RFilt[indx-1]; > } > > return (uint32_t)RFilt[0]; > } > > You've got a major problem here in that you're trying to use an 8-bit processor to do crunching on big numbers. That means that even in fixed-point, your calculations are dominated by horrible, complex cross product multiplies. You could use 16-bit coefficients and 24-bit data, and then write the following function in assembly, taking advantage of the bounded range of the coefficients and that you never need the uppermost byte of the data. int32_t product(int16_t coef, int32_t data); But it's still going to be a mess o' math. If you can run close to fast enough in emulated floating point, you might get the speed pickup you need, but you're probably looking at, guesstimated, a 2x speed improvement? The other thing to do is fold your multiplies. Don't do A*C + B*C when you can do (A+B)*C. Addition's cheapish; multiplication is brutal. Me, I'd migrate over to a Cortex-M3. 32-bit math would make a huge difference, and the per chip cost is about the same. -- Rob Gaddi, Highland Technology -- www.highlandtechnology.com Email address domain is currently out of order. See above to fix.______________________________
"Rob Gaddi" <r...@technologyhighland.invalid> wrote in message news:2...@rg.highlandtechnology.com... > On Wed, 18 Jul 2012 17:31:42 -0500 > "ad5TkO" <1@dsprelated> wrote: > >> I have implemented a floating point lowpass filter (sample rate 500hz, >> cutoff 30hz), which is running on an XMega... It works fine, but >> unfortunatly too slow. Now I'm thinking about converting the coefficents >> into fixed point coefficents. > You've got a major problem here in that you're trying to use an 8-bit > processor to do crunching on big numbers. That means that even in > fixed-point, your calculations are dominated by horrible, complex cross > product multiplies. > Me, I'd migrate over to a Cortex-M3. 32-bit math would make a huge > difference, and the per chip cost is about the same. AVR has hardware 8 x 8 multiplication. Running at 20 MIPS, it is indeed quite capable as DSP. Doing a lowpass filter like OP's should be no problem, especially if the coefficients could be represented as 8 bit numbers. VLV______________________________
On Wed, 18 Jul 2012 17:31:42 -0500, ad5TkO wrote:
> Hi guys,
>
> I have implemented a floating point lowpass filter (sample rate 500hz,
> cutoff 30hz), which is running on an XMega... It works fine, but
> unfortunatly too slow. Now I'm thinking about converting the coefficents
> into fixed point coefficents. Simply extending the coeeficents is not
> possible, since my input values have 20bit. How can I accelerate the
> filter?
>
> // Coefficents calculation
> Fo = 30; //(30hz)
> Ts = 0.002; //(500Hz)
> Wo = 2. * PI * Fo;
> C1 = 1. / tanf(Ts * Wo / 2.);
> M = 256. * (float)sqrt(26.) / 500.;
> N = 34118./10000.;
> N0 = 1.;
> N1 = 4.;
> N2 = 6.;
> N3 = 4.;
> N4 = 1.;
> D0 = C1 * (C1 * (C1 * (C1 + M) + N) + M) + 1.; D1 = - (C1 * (C1 * (C1 *
> (4. * C1 + 2.* M) ) - 2. * M) - 4.); D2 = C1 * C1 * (6. * C1 * C1 - 2. *
> N) + 6.; D3 = -(C1 * (C1 * (C1 * (4. * C1 - 2. * M)) + 2. * M) - 4.); D4
> = C1 * (C1 * (C1 * (C1 - M) + N) - M) + 1.;
>
>
> uint32_t LPFilter(uint32_t value) {
>
> RFreq[0] = value;
>
> // Filter Algorithm
> RFilt[0] = (N0 * RFreq[0] + N1 * RFreq[1] + N2 * RFreq[2] + N3 *
> RFreq[3]
> + N4 * RFreq[4] - D1 * RFilt[1] - D2 * RFilt[2] - D3 * RFilt[3] - D4 *
> RFilt[4]) / D0;
>
>
> // Shift samples for next LP Filter circle for(uint8_t indx = 4;
indx >
> 0; indx--) {
> RFreq[indx] = RFreq[indx-1];
> RFilt[indx] = RFilt[indx-1];
> }
>
> return (uint32_t)RFilt[0];
> }
You don't show what type your RFreq and RFilt are, but I see you casting
things to "float" elsewhere -- if so, then you've got 25 bits worth of
precision in your mantissa, and you're throwing nine of it away in your
filter. That leaves you with 16 bits effective -- so why are you worried
about 20 bit input data?
If you _are_ using double, then I guess the 31 bits you're leaving
yourself is OK -- but why not do what everyone else in the world does,
and split your filter into a pair of 2nd-order filters in cascade?
Depending on your needs, you may be able to get by with implementing your
gains as simple shifts -- i.e., if you can get by with coefficient values
that are of the sort 1 - 2^(-n) and 1 + 2^(-n) for your two 2nd-order
filters, then you won't have to use a multiply at all.
Or, do like Vladimir suggests and use the processor's 8x8 multiply,
although that will require you to do a spot of assembly language
programming.
With a cutoff at 30Hz and a sampling rate of 500Hz, you _ought_ to be
able to get by with 8-bit coefficients, and 32 bit data paths should work
with cascaded filters and your 20-bit input data.
--
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?
Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
______________________________Hello, you're using a Butterworth prototype, which is one extreme corner of the design space. If you can compromise a little on passband ripple and group delay, you can get the same job done with a 3rd order filter, and save 1/4 of multiplications. Try an elliptic filter. Matlab can design one, or Freeware Octave (needs octaveForge). For example: [b, a] = ellip(3, 0.02, 55, 2*0.03); 3rd order filter 0.02 dB ripple (in this case it will be one-sided) 55 dB stopband rejection 'as close as possible' (definition of elliptic filter) and 0.03 is 30 Hz / 500 Hz / 2 The pole radius is 0.882, which is only slightly worse than the original 0.868. The group delay variations a little worse than the Butterworth prototype, but not much as the passband ripple is so small. This would deteriorate, if I'd allow more passband ripple to gain steeper rejection. Example: Frequency response http://www.dsprelated.com/blogimages/MarkusNentwig/comp.dsp/120919_analogFilter/freqresp1.png http://www.dsprelated.com/blogimages/MarkusNentwig/comp.dsp/120919_analogFilter/freqresp2.png Group delay http://www.dsprelated.com/blogimages/MarkusNentwig/comp.dsp/120919_analogFilter/grpDelay.png The black 'arbitrary' trace is a quick-and-dirty filter that was designed with numerical methods. Example coefficients: Numerator: bEllip 0.00550177914321390 0.00408766656529893 0.00408766656529893 0.00550177914321390 Denominator: aEllip 1.000000000000000 -2.461871751233348 2.077011591152998 -0.595960948502624 -markus______________________________
On Thu, 19 Jul 2012 06:08:11 -0500, mnentwig wrote: > Hello, > > you're using a Butterworth prototype, which is one extreme corner of the > design space. > > If you can compromise a little on passband ripple and group delay, you > can get the same job done with a 3rd order filter, and save 1/4 of > multiplications. Actually, this would be a good time to ask the OP what he really wants. I'm guessing that he's pulling things out of a cookbook or off of web pages, given that he's trying to make a direct-form 4th-order filter where at the very least he should be cascading two 2nd-order ones. For all we know, he could do fine with a 1st-order filter with a 10Hz cutoff, or a CIC, or something else that's way easier. -- My liberal friends think I'm a conservative kook. My conservative friends think I'm a liberal kook. Why am I not happy that they have found common ground? Tim Wescott, Communications, Control, Circuits & Software http://www.wescottdesign.com______________________________