DSPRelated.com
Forums

help with 10-point DCT

Started by YG October 8, 2003
hello,

i know that it is possible but
i don't know how. radix-2 DCT are well
covered but i can't find radix-5 references
for DCT or FFT....

can anybody drop me the algorithm ?

thanks in advance,
YG

YG wrote:
> i know that it is possible but > i don't know how. radix-2 DCT are well > covered but i can't find radix-5 references > for DCT or FFT.... > > can anybody drop me the algorithm ?
The code generator in our FFTW (www.fftw.org) can generate efficient hard-coded DCTs of arbitrary size. (It is actually able to derive them on the fly by starting with a complex FFT and performing algebraic simplifications for real-symmetric inputs/outputs. See our web page for references.) You should understand that four types of DCT (types I-IV) are common (and types V-VIII exist as well), so you need to figure out which one you want. FFTW can generate code for types I-IV. By "the DCT", however, most people (esp. in signal processing) mean type II (and "the IDCT" is type III). For reference, I attach below the output we generate for a size-10 type-II DCT. Realize, however, that choices of normalization differ for these transforms; for the definition of the transform generated by FFTW, see: http://www.fftw.org/doc/1d-Real-even-DFTs--DCTs-.html You may have to multiply some inputs/outputs by factors of 1/2 or 1/sqrt(2) or whatever, depending upon your definition. You should be able to do this without increasing the number of operations much by combining these factors with existing multiplications (and may even save an operation or two if some of the multiplications turn into unity). Cordially, Steven G. Johnson /*****************************************************************************/ /* some definitions from ifftw.h */ typedef double R; /* input/output type */ typedef double E; /* local variables */ #define K(x) ((E) x) /* define constants */ /* Generated by: /homee/stevenj/cvs/fftw3/genfft/gen_r2r -redft10 -with-istride 1 -with-ostride 1 -n 10 -name dctII_10 */ /* * This function contains 38 FP additions, 22 FP multiplications, * (or, 32 additions, 16 multiplications, 6 fused multiply/add), * 30 stack variables, and 20 memory accesses */ void dctII_10(const R * I, R * O) { const E KP2_828427124 = K(+2.828427124746190097603377448419396157139343751); const E KP1_414213562 = K(+1.414213562373095048801688724209698078569671875); const E KP2_000000000 = K(+2.000000000000000000000000000000000000000000000); const E KP4_000000000 = K(+4.000000000000000000000000000000000000000000000); const E KP250000000 = K(+0.250000000000000000000000000000000000000000000); const E KP559016994 = K(+0.559016994374947424102293417182819058860154590); const E KP293892626 = K(+0.293892626146236564584352977319536384298826219); const E KP475528258 = K(+0.475528258147576786058219666689691071702849317); { E T17; E T7; E T27; E T16; E T20; E T28; E T22; E T14; E T30; E T15; E T25; E T31; E T21; E T26; { E T6; E T19; E T3; E T18; T17 = I[2]; { E T4; E T5; E T1; E T2; T4 = I[9]; T5 = I[5]; T6 = T4 - T5; T19 = T4 + T5; T1 = I[6]; T2 = I[1]; T3 = T1 - T2; T18 = T1 + T2; } T7 = FMA(KP475528258, T3, KP293892626 * T6); T27 = KP559016994 * (T18 - T19); T16 = FNMS(KP475528258, T6, KP293892626 * T3); T20 = T18 + T19; T28 = FNMS(KP250000000, T20, T17); } { E T13; E T24; E T10; E T23; T22 = I[7]; { E T11; E T12; E T8; E T9; T11 = I[0]; T12 = I[4]; T13 = T11 - T12; T24 = T11 + T12; T8 = I[3]; T9 = I[8]; T10 = T8 - T9; T23 = T8 + T9; } T14 = FMA(KP475528258, T10, KP293892626 * T13); T30 = KP559016994 * (T24 - T23); T15 = FNMS(KP293892626, T10, KP475528258 * T13); T25 = T23 + T24; T31 = FNMS(KP250000000, T25, T22); } O[6] = KP4_000000000 * (T7 + T14); O[2] = KP4_000000000 * (T15 - T16); T21 = T17 + T20; T26 = T22 + T25; O[0] = KP2_000000000 * (T21 + T26); O[5] = KP1_414213562 * (T26 - T21); { E T34; E T29; E T32; E T33; T34 = KP2_828427124 * (T7 - T14); T29 = T27 + T28; T32 = T30 - T31; T33 = KP1_414213562 * (T29 + T32); O[1] = T33 - T34; O[4] = KP2_000000000 * (T32 - T29); O[9] = T34 + T33; } { E T35; E T36; E T37; E T38; T35 = KP2_828427124 * (T16 + T15); T36 = T28 - T27; T37 = T31 + T30; T38 = KP1_414213562 * (T36 - T37); O[3] = T35 - T38; O[8] = KP2_000000000 * (T36 + T37); O[7] = T35 + T38; } } }
Steven G. Johnson wrote:
> /* some definitions from ifftw.h */ > typedef double R; /* input/output type */ > typedef double E; /* local variables */ > #define K(x) ((E) x) /* define constants */
Whoops, you also need: #define FMA(a, b, c) (((a) * (b)) + (c)) #define FMS(a, b, c) (((a) * (b)) - (c)) #define FNMA(a, b, c) (- (((a) * (b)) + (c))) #define FNMS(a, b, c) ((c) - ((a) * (b))) (This is the default version. We have some alternate hackish definitions of these macros for gcc on the powerpc to trick it into doing the right thing; the problem is to prevent common-subexpression elimination from destroying FMA operations...unfortunately, there's no portable way of doing this because the C99 standard messed up and included only one of the four FMA operations. But if your architecture has fma instructions, you should also pass "-fma -scheduler-hack" to the FFTW generator to tell it to expose more FMA operations.) Steven
hi !

thanks for the fftw run !
if at least i had the courage to compile it at home ....

Steven G. Johnson wrote:
> Steven G. Johnson wrote: > >> /* some definitions from ifftw.h */ >> typedef double R; /* input/output type */ >> typedef double E; /* local variables */ >> #define K(x) ((E) x) /* define constants */ > > > Whoops, you also need: > > #define FMA(a, b, c) (((a) * (b)) + (c)) > #define FMS(a, b, c) (((a) * (b)) - (c)) > #define FNMA(a, b, c) (- (((a) * (b)) + (c))) > #define FNMS(a, b, c) ((c) - ((a) * (b))) >
thanks :-)))
> (This is the default version. We have some alternate hackish > definitions of these macros for gcc on the powerpc to trick it into > doing the right thing; the problem is to prevent common-subexpression > elimination from destroying FMA operations...unfortunately, there's no > portable way of doing this because the C99 standard messed up and > included only one of the four FMA operations. But if your architecture > has fma instructions, you should also pass "-fma -scheduler-hack" to the > FFTW generator to tell it to expose more FMA operations.)
hmmmm ...... this is for an ARM7TDMI, so it'll be all integers and i'll give a good try at binDCT rotations/lifting. does the binDCT "patents" prevent the FFTW code to include the dyadic trick ? My code will be really, really naughty but it has to work. seems i'll have to re-hack your output, but at least i have something to work with. thanks again !!!
> Steven >
YG