Reply by Paul Russell September 21, 20032003-09-21
Eric Raas wrote:
> > Does anybody use these functions? Searching Google Groups shows no hits ( > other than this post) for fft2d_rip, fft2d_zrip, fft_rip, and fft_zrip. > I did find some discussion on web forums but nothing very extensive. > There is a paper out describing (presumably) these algorithms which one > should be able to find by searching for g4fft.pdf >
The routines described in g4fft.pdf are Apple's early attempt at optimised FFT's for AltiVec, and you can download the source for these from developer.apple.com. The routines in vDSP are somewhat higher performance and were licensed from a third party. Paul
Reply by Eric Raas September 21, 20032003-09-21
..

>> interleaved complex: >> riririririririri -> riririririririri0000000000000000 >> >> split complex: >> rrrrrrrriiiiiiii -> rrrrrrrr00000000iiiiiiii00000000 >> > > Thanks - for some reason (and perhaps I have some other bug) this > approach does not seem to work for the 2D case I am dealing with. For > example
.. I finally got this to work using a simpler workaround - instead of using the fft function for real input (fft2d_zrip) I just used the simpler complex version (fft2d_zip) which does not produce oddly packed output arrays. I have to copy my real image data into a complex variable, but the overhead does not seem to be that great. Still I wonder whether there is a performance difference between the real and complex forms of Apple's vDSP FFT functions. I guess the complex form takes a bit more memory, but my images are generally smaller than 512x512. The output of fft2d_zip produces a 2D array in which the four quadrants are rearranged according to a fairly obvious scheme (the center of frequency space maps to the four corners of the array - is clear using my handy sinc test pattern). Does anybody use these functions? Searching Google Groups shows no hits ( other than this post) for fft2d_rip, fft2d_zrip, fft_rip, and fft_zrip. I did find some discussion on web forums but nothing very extensive. There is a paper out describing (presumably) these algorithms which one should be able to find by searching for g4fft.pdf ER
Reply by Eric Raas September 20, 20032003-09-20
> Just treat the real and imaginary arrays separately. For example, if > you are doing a 2X interpolation: > > interleaved complex: > riririririririri -> riririririririri0000000000000000 > > split complex: > rrrrrrrriiiiiiii -> rrrrrrrr00000000iiiiiiii00000000 >
Thanks - for some reason (and perhaps I have some other bug) this approach does not seem to work for the 2D case I am dealing with. For example interpolating riri riri in the row direction by a factor of two by zero-padding like this: riri riri 0000 0000 does not seem to work (lots of severe artifact with alternating bands at Nyquist). However this: riri 0000 0000 riri Almost works (looks like interpolated image with some artifact). When I use a test image consisting of a real-valued 2D sinc function ( should have 2DFT equal to rect function) it looks like the array is stored in a peculiar way - the first N/4 rows contain what looks like the zero'th to N/4'th rows in ascending order. The next N/4 rows appear to contain the remaining rows, but in descending rather than ascending order (the outermost rows of the frequency domain are at the center of the image, and the top and bottom rows appear to contain rows that in reality would be adjacent). Placing rows of padding zeros at the *middle* of the array appears to almost work, but there is still a distortion of the image. My SINC <-> RECT test also suggests that the first column of the 2D FFT output is different from the other columns. I know that Apple uses a scheme to exploit the Hermitian symmetry of real functions, and this is documented briefly in the vDSP pdf docs. However what I see does not seem to correspond to the scheme described in the document. Thanks again for the suggestion though - I was not sure if the functions ctoz and ztoc simply convert between interleaved or split complex representations, or whether they might also reverse some of the special packing output Apple's 1D and 2D FFT routines. Your post suggests that those conversion functions only change the interleaving - they do not account for FFT packing (still not sure about this though - again wish the docs were better). Will keep plugging... I think the 'center-out' scheme described above is almost it.
Reply by Tim Olson September 20, 20032003-09-20
In article <20030919224434922-0400@news.verizon.net>,
 Eric Raas <foo@bar.net> wrote:

| The FFT routine 
| in question makes use of what Apple calls a "split complex vector" 
| format, which is a reordering of array elements to improve performance.  
| I know that such reordering is fairly common in high performance FFT's, 
| but Apple's scheme is not very well documented. 

The reordering should just be collecting all of the real components in 
one array, followed by all the imaginary components in a second array.

| 1) can anyone give a hint on how to correctly zero pad split complex 
| output of the vDSP FFT to achieve sinc interpolation?

Just treat the real and imaginary arrays separately.  For example, if 
you are doing a 2X interpolation:

interleaved complex:
riririririririri     -> riririririririri0000000000000000

split complex:
rrrrrrrriiiiiiii     -> rrrrrrrr00000000iiiiiiii00000000


   -- Tim Olson
Reply by Eric Raas September 19, 20032003-09-19
Hi -

I am trying to use Apple's AltiVec FFT routine fft2d_zrip in a scheme 
for doing 2D sinc interpolation by zero-padding the 2D DFT of an image.  
I am able to do forward and reverse 2D FFT's and recover the original 
image, but when I zero pad I get clearly wrong results.  The FFT routine 
in question makes use of what Apple calls a "split complex vector" 
format, which is a reordering of array elements to improve performance.  
I know that such reordering is fairly common in high performance FFT's, 
but Apple's scheme is not very well documented.  

It is possible to go between a more conventional 'interleaved' complex 
format and the split format using the vDSP functions ctoz and ztoc, and 
I had planned to use these to convert the 2D FFT back to an interleaved 
format to simplify zero-padding in a preliminary version.  However there 
is no indication of what a 2D FFT will look like after processing the 
output with ztoc (the function that converts split to interleaved).

So my questions are:

1) can anyone give a hint on how to correctly zero pad split complex 
output of the vDSP FFT to achieve sinc interpolation?

2) am I overlooking any easier way to do interpolation using AltiVec 
functions.

thanks for any suggestions,  code fragment is posted below

ER


   // This code fragment seems to work ok - it should return the 
original image
   //  I can't figure out how to properly zero pad though

  // Set the input vector of length n: [(1+j1),...,(1+j1)], where j^2 = -
1.
  UInt32 ix;
  for ( ix = 0; ix < n; ix++ ) {
    origReal[ix] = (float)bytes[ix];
}  }

  // Treat originalRealInput as interleaved complex number. change the 
format to split
  // so that we can pass it to fft functions [sic]
  ctoz( (COMPLEX*) origReal, 2, &A, 1, nOver2 );
  
  // Set up the required memory for the FFT routines and check its 
availability.
  FFTSetup setup = create_fftsetup ( MAX (log2nr,log2nc ), FFT_RADIX2 );
  if( setup == NULL ) {
    NSLog(@"\nFFT_Setup failed to allocate enough memory.\n" );
    exit(EXIT_FAILURE);
}  }

  // Compute 2D FFT of original image
  fft2d_zrip ( setup, &A, rowStride, columnStride, log2nc, log2nr, FFT_
FORWARD );

  // Temporarily convert back to interleaved complex format
  COMPLEX *tempA = (COMPLEX*)calloc(n , sizeof(COMPLEX));
  ztoc( &A, 1, tempA, 2, nOver2);
  
  // Zero pad FFT the interleaved complex FFT data
  // But how?	
  
  // Now restore to split format
  COMPLEX_SPLIT A2;
  A2.realp = (float*)calloc(nOver2, sizeof(float));
  A2.imagp = (float*)calloc(nOver2, sizeof(float));
  ctoz( tempA, 2, &A2, 1, nOver2 );
  
  // Now compute inverse FFT
  fft2d_zrip ( setup, &A2, rowStride, columnStride, log2nc, log2nr, FFT_
INVERSE );
  
  // scale the result
  float scale = 1.0 / (float) (2*n);    // get scale factor
  vsmul( A2.realp, 1, &scale, A2.realp, 1, nOver2);
  vsmul( A2.imagp, 1, &scale, A2.imagp, 1, nOver2);

  // The output signal is now in a split real form.  Use the function
  // ztoc to get a split real vector.
  ztoc( &A2, 1, ( COMPLEX * ) newReal, 2, nOver2);