hi !
thanks for the fftw run !
if at least i had the courage to compile it at home ....
Steven G. Johnson wrote:
> Steven G. Johnson wrote:
>
>> /* some definitions from ifftw.h */
>> typedef double R; /* input/output type */
>> typedef double E; /* local variables */
>> #define K(x) ((E) x) /* define constants */
>
>
> Whoops, you also need:
>
> #define FMA(a, b, c) (((a) * (b)) + (c))
> #define FMS(a, b, c) (((a) * (b)) - (c))
> #define FNMA(a, b, c) (- (((a) * (b)) + (c)))
> #define FNMS(a, b, c) ((c) - ((a) * (b)))
>
thanks :-)))
> (This is the default version. We have some alternate hackish
> definitions of these macros for gcc on the powerpc to trick it into
> doing the right thing; the problem is to prevent common-subexpression
> elimination from destroying FMA operations...unfortunately, there's no
> portable way of doing this because the C99 standard messed up and
> included only one of the four FMA operations. But if your architecture
> has fma instructions, you should also pass "-fma -scheduler-hack" to the
> FFTW generator to tell it to expose more FMA operations.)
hmmmm ......
this is for an ARM7TDMI, so it'll be all integers
and i'll give a good try at binDCT rotations/lifting.
does the binDCT "patents" prevent the FFTW code to include the
dyadic trick ?
My code will be really, really naughty but it has to work.
seems i'll have to re-hack your output, but at least
i have something to work with.
thanks again !!!
> Steven
>
YG
Reply by Steven G. Johnson●October 9, 20032003-10-09
Steven G. Johnson wrote:
> /* some definitions from ifftw.h */
> typedef double R; /* input/output type */
> typedef double E; /* local variables */
> #define K(x) ((E) x) /* define constants */
Whoops, you also need:
#define FMA(a, b, c) (((a) * (b)) + (c))
#define FMS(a, b, c) (((a) * (b)) - (c))
#define FNMA(a, b, c) (- (((a) * (b)) + (c)))
#define FNMS(a, b, c) ((c) - ((a) * (b)))
(This is the default version. We have some alternate hackish
definitions of these macros for gcc on the powerpc to trick it into
doing the right thing; the problem is to prevent common-subexpression
elimination from destroying FMA operations...unfortunately, there's no
portable way of doing this because the C99 standard messed up and
included only one of the four FMA operations. But if your architecture
has fma instructions, you should also pass "-fma -scheduler-hack" to the
FFTW generator to tell it to expose more FMA operations.)
Steven
Reply by Steven G. Johnson●October 9, 20032003-10-09
YG wrote:
> i know that it is possible but
> i don't know how. radix-2 DCT are well
> covered but i can't find radix-5 references
> for DCT or FFT....
>
> can anybody drop me the algorithm ?
The code generator in our FFTW (www.fftw.org) can generate efficient
hard-coded DCTs of arbitrary size. (It is actually able to derive them
on the fly by starting with a complex FFT and performing algebraic
simplifications for real-symmetric inputs/outputs. See our web page for
references.)
You should understand that four types of DCT (types I-IV) are common
(and types V-VIII exist as well), so you need to figure out which one
you want. FFTW can generate code for types I-IV. By "the DCT",
however, most people (esp. in signal processing) mean type II (and "the
IDCT" is type III).
For reference, I attach below the output we generate for a size-10
type-II DCT. Realize, however, that choices of normalization differ for
these transforms; for the definition of the transform generated by FFTW,
see:
http://www.fftw.org/doc/1d-Real-even-DFTs--DCTs-.html
You may have to multiply some inputs/outputs by factors of 1/2 or
1/sqrt(2) or whatever, depending upon your definition. You should be
able to do this without increasing the number of operations much by
combining these factors with existing multiplications (and may even save
an operation or two if some of the multiplications turn into unity).
Cordially,
Steven G. Johnson
/*****************************************************************************/
/* some definitions from ifftw.h */
typedef double R; /* input/output type */
typedef double E; /* local variables */
#define K(x) ((E) x) /* define constants */
/* Generated by: /homee/stevenj/cvs/fftw3/genfft/gen_r2r -redft10
-with-istride 1 -with-ostride 1 -n 10 -name dctII_10 */
/*
* This function contains 38 FP additions, 22 FP multiplications,
* (or, 32 additions, 16 multiplications, 6 fused multiply/add),
* 30 stack variables, and 20 memory accesses
*/
void dctII_10(const R * I, R * O)
{
const E KP2_828427124 =
K(+2.828427124746190097603377448419396157139343751);
const E KP1_414213562 =
K(+1.414213562373095048801688724209698078569671875);
const E KP2_000000000 =
K(+2.000000000000000000000000000000000000000000000);
const E KP4_000000000 =
K(+4.000000000000000000000000000000000000000000000);
const E KP250000000 =
K(+0.250000000000000000000000000000000000000000000);
const E KP559016994 =
K(+0.559016994374947424102293417182819058860154590);
const E KP293892626 =
K(+0.293892626146236564584352977319536384298826219);
const E KP475528258 =
K(+0.475528258147576786058219666689691071702849317);
{
E T17;
E T7;
E T27;
E T16;
E T20;
E T28;
E T22;
E T14;
E T30;
E T15;
E T25;
E T31;
E T21;
E T26;
{
E T6;
E T19;
E T3;
E T18;
T17 = I[2];
{
E T4;
E T5;
E T1;
E T2;
T4 = I[9];
T5 = I[5];
T6 = T4 - T5;
T19 = T4 + T5;
T1 = I[6];
T2 = I[1];
T3 = T1 - T2;
T18 = T1 + T2;
}
T7 = FMA(KP475528258, T3, KP293892626 * T6);
T27 = KP559016994 * (T18 - T19);
T16 = FNMS(KP475528258, T6, KP293892626 * T3);
T20 = T18 + T19;
T28 = FNMS(KP250000000, T20, T17);
}
{
E T13;
E T24;
E T10;
E T23;
T22 = I[7];
{
E T11;
E T12;
E T8;
E T9;
T11 = I[0];
T12 = I[4];
T13 = T11 - T12;
T24 = T11 + T12;
T8 = I[3];
T9 = I[8];
T10 = T8 - T9;
T23 = T8 + T9;
}
T14 = FMA(KP475528258, T10, KP293892626 * T13);
T30 = KP559016994 * (T24 - T23);
T15 = FNMS(KP293892626, T10, KP475528258 * T13);
T25 = T23 + T24;
T31 = FNMS(KP250000000, T25, T22);
}
O[6] = KP4_000000000 * (T7 + T14);
O[2] = KP4_000000000 * (T15 - T16);
T21 = T17 + T20;
T26 = T22 + T25;
O[0] = KP2_000000000 * (T21 + T26);
O[5] = KP1_414213562 * (T26 - T21);
{
E T34;
E T29;
E T32;
E T33;
T34 = KP2_828427124 * (T7 - T14);
T29 = T27 + T28;
T32 = T30 - T31;
T33 = KP1_414213562 * (T29 + T32);
O[1] = T33 - T34;
O[4] = KP2_000000000 * (T32 - T29);
O[9] = T34 + T33;
}
{
E T35;
E T36;
E T37;
E T38;
T35 = KP2_828427124 * (T16 + T15);
T36 = T28 - T27;
T37 = T31 + T30;
T38 = KP1_414213562 * (T36 - T37);
O[3] = T35 - T38;
O[8] = KP2_000000000 * (T36 + T37);
O[7] = T35 + T38;
}
}
}
Reply by YG●October 8, 20032003-10-08
hello,
i know that it is possible but
i don't know how. radix-2 DCT are well
covered but i can't find radix-5 references
for DCT or FFT....
can anybody drop me the algorithm ?
thanks in advance,
YG