hi !

thanks for the fftw run !
if at least i had the courage to compile it at home ....

Steven G. Johnson wrote:
> Steven G. Johnson wrote:
> 
>> /* some definitions from ifftw.h */
>> typedef double R; /* input/output type */
>> typedef double E; /* local variables */
>> #define K(x) ((E) x) /* define constants */
> 
> 
> Whoops, you also need:
> 
> #define FMA(a, b, c) (((a) * (b)) + (c))
> #define FMS(a, b, c) (((a) * (b)) - (c))
> #define FNMA(a, b, c) (- (((a) * (b)) + (c)))
> #define FNMS(a, b, c) ((c) - ((a) * (b)))
> 

thanks :-)))

> (This is the default version.  We have some alternate hackish 
> definitions of these macros for gcc on the powerpc to trick it into 
> doing the right thing; the problem is to prevent common-subexpression 
> elimination from destroying FMA operations...unfortunately, there's no 
> portable way of doing this because the C99 standard messed up and 
> included only one of the four FMA operations.  But if your architecture 
> has fma instructions, you should also pass "-fma -scheduler-hack" to the 
> FFTW generator to tell it to expose more FMA operations.)

hmmmm ......

this is for an ARM7TDMI, so it'll be all integers
and i'll give a good try at binDCT rotations/lifting.
does the binDCT "patents" prevent the FFTW code to include the
dyadic trick ?

My code will be really, really naughty but it has to work.
seems i'll have to re-hack your output, but at least
i have something to work with.

thanks again !!!

> Steven
> 

YG

Steven G. Johnson wrote:
> /* some definitions from ifftw.h */
> typedef double R; /* input/output type */
> typedef double E; /* local variables */
> #define K(x) ((E) x) /* define constants */

Whoops, you also need:

#define FMA(a, b, c) (((a) * (b)) + (c))
#define FMS(a, b, c) (((a) * (b)) - (c))
#define FNMA(a, b, c) (- (((a) * (b)) + (c)))
#define FNMS(a, b, c) ((c) - ((a) * (b)))

(This is the default version.  We have some alternate hackish 
definitions of these macros for gcc on the powerpc to trick it into 
doing the right thing; the problem is to prevent common-subexpression 
elimination from destroying FMA operations...unfortunately, there's no 
portable way of doing this because the C99 standard messed up and 
included only one of the four FMA operations.  But if your architecture 
has fma instructions, you should also pass "-fma -scheduler-hack" to the 
FFTW generator to tell it to expose more FMA operations.)

Steven

YG wrote:
> i know that it is possible but
> i don't know how. radix-2 DCT are well
> covered but i can't find radix-5 references
> for DCT or FFT....
> 
> can anybody drop me the algorithm ?

The code generator in our FFTW (www.fftw.org) can generate efficient 
hard-coded DCTs of arbitrary size.  (It is actually able to derive them 
on the fly by starting with a complex FFT and performing algebraic 
simplifications for real-symmetric inputs/outputs.  See our web page for 
references.)

You should understand that four types of DCT (types I-IV) are common 
(and types V-VIII exist as well), so you need to figure out which one 
you want.  FFTW can generate code for types I-IV.  By "the DCT", 
however, most people (esp. in signal processing) mean type II (and "the 
IDCT" is type III).

For reference, I attach below the output we generate for a size-10 
type-II DCT.  Realize, however, that choices of normalization differ for 
these transforms; for the definition of the transform generated by FFTW, 
see:

	http://www.fftw.org/doc/1d-Real-even-DFTs--DCTs-.html

You may have to multiply some inputs/outputs by factors of 1/2 or 
1/sqrt(2) or whatever, depending upon your definition.  You should be 
able to do this without increasing the number of operations much by 
combining these factors with existing multiplications (and may even save 
an operation or two if some of the multiplications turn into unity).

Cordially,
Steven G. Johnson


/*****************************************************************************/

/* some definitions from ifftw.h */
typedef double R; /* input/output type */
typedef double E; /* local variables */
#define K(x) ((E) x) /* define constants */

/* Generated by: /homee/stevenj/cvs/fftw3/genfft/gen_r2r -redft10 
-with-istride 1 -with-ostride 1 -n 10 -name dctII_10 */

/*
  * This function contains 38 FP additions, 22 FP multiplications,
  * (or, 32 additions, 16 multiplications, 6 fused multiply/add),
  * 30 stack variables, and 20 memory accesses
  */

void dctII_10(const R * I, R * O)
{
     const E KP2_828427124 =
         K(+2.828427124746190097603377448419396157139343751);
     const E KP1_414213562 =
         K(+1.414213562373095048801688724209698078569671875);
     const E KP2_000000000 =
         K(+2.000000000000000000000000000000000000000000000);
     const E KP4_000000000 =
         K(+4.000000000000000000000000000000000000000000000);
     const E KP250000000 =
         K(+0.250000000000000000000000000000000000000000000);
     const E KP559016994 =
         K(+0.559016994374947424102293417182819058860154590);
     const E KP293892626 =
         K(+0.293892626146236564584352977319536384298826219);
     const E KP475528258 =
         K(+0.475528258147576786058219666689691071702849317);
     {
         E T17;
         E T7;
         E T27;
         E T16;
         E T20;
         E T28;
         E T22;
         E T14;
         E T30;
         E T15;
         E T25;
         E T31;
         E T21;
         E T26;
         {
             E T6;
             E T19;
             E T3;
             E T18;
             T17 = I[2];
             {
                 E T4;
                 E T5;
                 E T1;
                 E T2;
                 T4 = I[9];
                 T5 = I[5];
                 T6 = T4 - T5;
                 T19 = T4 + T5;
                 T1 = I[6];
                 T2 = I[1];
                 T3 = T1 - T2;
                 T18 = T1 + T2;
             }
             T7 = FMA(KP475528258, T3, KP293892626 * T6);
             T27 = KP559016994 * (T18 - T19);
             T16 = FNMS(KP475528258, T6, KP293892626 * T3);
             T20 = T18 + T19;
             T28 = FNMS(KP250000000, T20, T17);
         }
         {
             E T13;
             E T24;
             E T10;
             E T23;
             T22 = I[7];
             {
                 E T11;
                 E T12;
                 E T8;
                 E T9;
                 T11 = I[0];
                 T12 = I[4];
                 T13 = T11 - T12;
                 T24 = T11 + T12;
                 T8 = I[3];
                 T9 = I[8];
                 T10 = T8 - T9;
                 T23 = T8 + T9;
             }
             T14 = FMA(KP475528258, T10, KP293892626 * T13);
             T30 = KP559016994 * (T24 - T23);
             T15 = FNMS(KP293892626, T10, KP475528258 * T13);
             T25 = T23 + T24;
             T31 = FNMS(KP250000000, T25, T22);
         }
         O[6] = KP4_000000000 * (T7 + T14);
         O[2] = KP4_000000000 * (T15 - T16);
         T21 = T17 + T20;
         T26 = T22 + T25;
         O[0] = KP2_000000000 * (T21 + T26);
         O[5] = KP1_414213562 * (T26 - T21);
         {
             E T34;
             E T29;
             E T32;
             E T33;
             T34 = KP2_828427124 * (T7 - T14);
             T29 = T27 + T28;
             T32 = T30 - T31;
             T33 = KP1_414213562 * (T29 + T32);
             O[1] = T33 - T34;
             O[4] = KP2_000000000 * (T32 - T29);
             O[9] = T34 + T33;
         }
         {
             E T35;
             E T36;
             E T37;
             E T38;
             T35 = KP2_828427124 * (T16 + T15);
             T36 = T28 - T27;
             T37 = T31 + T30;
             T38 = KP1_414213562 * (T36 - T37);
             O[3] = T35 - T38;
             O[8] = KP2_000000000 * (T36 + T37);
             O[7] = T35 + T38;
         }
     }
}