Eric Raas wrote:
> 
> Does anybody use these functions?  Searching Google Groups shows no hits (
> other than this post) for fft2d_rip, fft2d_zrip, fft_rip, and fft_zrip.  
> I did find some discussion on web forums but nothing very extensive.  
> There is a paper out  describing (presumably) these algorithms which one 
> should be able to find by searching for g4fft.pdf
> 

The routines described in g4fft.pdf are Apple's early attempt at 
optimised FFT's for AltiVec, and you can download the source for these 
from developer.apple.com.

The routines in vDSP are somewhat higher performance and were licensed 
from a third party.

Paul

..

>> interleaved complex:
>> riririririririri     -> riririririririri0000000000000000
>> 
>> split complex:
>> rrrrrrrriiiiiiii     -> rrrrrrrr00000000iiiiiiii00000000
>> 
> 
> Thanks - for some reason (and perhaps I have some other bug) this 
> approach does not seem to work for the 2D case I am dealing with.  For 
> example

..

I finally got this to work using a simpler workaround - instead of using 
the fft function for real input (fft2d_zrip) I just used the simpler 
complex version (fft2d_zip) which does not produce oddly packed output 
arrays.  I have to copy my real image data into a complex variable, but 
the overhead does not seem to be that great.  Still I wonder whether 
there is a performance difference between the real and complex forms of 
Apple's vDSP FFT functions.  I guess the complex form takes a bit more 
memory, but my images are generally smaller than 512x512.

The output of fft2d_zip produces a 2D array in which the four quadrants 
are rearranged according to a fairly obvious scheme (the center of 
frequency space maps to the four corners of the array - is clear using 
my handy sinc test pattern).

Does anybody use these functions?  Searching Google Groups shows no hits (
other than this post) for fft2d_rip, fft2d_zrip, fft_rip, and fft_zrip.  
I did find some discussion on web forums but nothing very extensive.  
There is a paper out  describing (presumably) these algorithms which one 
should be able to find by searching for g4fft.pdf

ER

> Just treat the real and imaginary arrays separately.  For example, if 
> you are doing a 2X interpolation:
> 
> interleaved complex:
> riririririririri     -> riririririririri0000000000000000
> 
> split complex:
> rrrrrrrriiiiiiii     -> rrrrrrrr00000000iiiiiiii00000000
> 

Thanks - for some reason (and perhaps I have some other bug) this 
approach does not seem to work for the 2D case I am dealing with.  For 
example

interpolating

riri
riri

in the row direction by a factor of two by zero-padding like this:

riri
riri
0000
0000

does not seem to work (lots of severe artifact with alternating bands at 
Nyquist).  

However this:

riri
0000
0000
riri

Almost works (looks like interpolated image with some artifact).

When I use a test image consisting of a real-valued 2D sinc function (
should have 2DFT equal to rect function) it looks like the array is 
stored in a peculiar way - the first N/4 rows contain what looks like 
the zero'th  to N/4'th  rows in ascending order.  The next N/4 rows 
appear to contain the remaining rows, but in descending rather than 
ascending order (the outermost rows of the frequency domain are at the 
center of the image, and the top and bottom rows appear to contain rows 
that in reality would be adjacent).  

Placing rows of padding zeros at the *middle* of the array appears to 
almost work, but there is still a distortion of the image.  My SINC <-> 
RECT test also suggests that the first column of the 2D FFT output is 
different from the other columns.  I know that Apple uses a scheme to 
exploit the Hermitian symmetry of real functions, and this is documented 
briefly in the vDSP pdf docs.  However what I see does not seem to 
correspond to the scheme described in the document.

Thanks again for the suggestion though - I was not sure if the functions 
ctoz and ztoc simply convert between interleaved or split complex 
representations, or whether they might also reverse some of the special 
packing output Apple's 1D and 2D FFT routines.  Your post suggests that 
those conversion functions only change the interleaving - they do not 
account for FFT packing (still not sure about this though - again wish 
the docs were better).

Will keep plugging...  I think the 'center-out' scheme described above 
is almost it.

In article <20030919224434922-0400@news.verizon.net>,
 Eric Raas <foo@bar.net> wrote:

| The FFT routine 
| in question makes use of what Apple calls a "split complex vector" 
| format, which is a reordering of array elements to improve performance.  
| I know that such reordering is fairly common in high performance FFT's, 
| but Apple's scheme is not very well documented. 

The reordering should just be collecting all of the real components in 
one array, followed by all the imaginary components in a second array.

| 1) can anyone give a hint on how to correctly zero pad split complex 
| output of the vDSP FFT to achieve sinc interpolation?

Just treat the real and imaginary arrays separately.  For example, if 
you are doing a 2X interpolation:

interleaved complex:
riririririririri     -> riririririririri0000000000000000

split complex:
rrrrrrrriiiiiiii     -> rrrrrrrr00000000iiiiiiii00000000


   -- Tim Olson

Hi -

I am trying to use Apple's AltiVec FFT routine fft2d_zrip in a scheme 
for doing 2D sinc interpolation by zero-padding the 2D DFT of an image.  
I am able to do forward and reverse 2D FFT's and recover the original 
image, but when I zero pad I get clearly wrong results.  The FFT routine 
in question makes use of what Apple calls a "split complex vector" 
format, which is a reordering of array elements to improve performance.  
I know that such reordering is fairly common in high performance FFT's, 
but Apple's scheme is not very well documented.  

It is possible to go between a more conventional 'interleaved' complex 
format and the split format using the vDSP functions ctoz and ztoc, and 
I had planned to use these to convert the 2D FFT back to an interleaved 
format to simplify zero-padding in a preliminary version.  However there 
is no indication of what a 2D FFT will look like after processing the 
output with ztoc (the function that converts split to interleaved).

So my questions are:

1) can anyone give a hint on how to correctly zero pad split complex 
output of the vDSP FFT to achieve sinc interpolation?

2) am I overlooking any easier way to do interpolation using AltiVec 
functions.

thanks for any suggestions,  code fragment is posted below

ER


   // This code fragment seems to work ok - it should return the 
original image
   //  I can't figure out how to properly zero pad though

  // Set the input vector of length n: [(1+j1),...,(1+j1)], where j^2 = -
1.
  UInt32 ix;
  for ( ix = 0; ix < n; ix++ ) {
    origReal[ix] = (float)bytes[ix];
}  }

  // Treat originalRealInput as interleaved complex number. change the 
format to split
  // so that we can pass it to fft functions [sic]
  ctoz( (COMPLEX*) origReal, 2, &A, 1, nOver2 );
  
  // Set up the required memory for the FFT routines and check its 
availability.
  FFTSetup setup = create_fftsetup ( MAX (log2nr,log2nc ), FFT_RADIX2 );
  if( setup == NULL ) {
    NSLog(@"\nFFT_Setup failed to allocate enough memory.\n" );
    exit(EXIT_FAILURE);
}  }

  // Compute 2D FFT of original image
  fft2d_zrip ( setup, &A, rowStride, columnStride, log2nc, log2nr, FFT_
FORWARD );

  // Temporarily convert back to interleaved complex format
  COMPLEX *tempA = (COMPLEX*)calloc(n , sizeof(COMPLEX));
  ztoc( &A, 1, tempA, 2, nOver2);
  
  // Zero pad FFT the interleaved complex FFT data
  // But how?	
  
  // Now restore to split format
  COMPLEX_SPLIT A2;
  A2.realp = (float*)calloc(nOver2, sizeof(float));
  A2.imagp = (float*)calloc(nOver2, sizeof(float));
  ctoz( tempA, 2, &A2, 1, nOver2 );
  
  // Now compute inverse FFT
  fft2d_zrip ( setup, &A2, rowStride, columnStride, log2nc, log2nr, FFT_
INVERSE );
  
  // scale the result
  float scale = 1.0 / (float) (2*n);    // get scale factor
  vsmul( A2.realp, 1, &scale, A2.realp, 1, nOver2);
  vsmul( A2.imagp, 1, &scale, A2.imagp, 1, nOver2);

  // The output signal is now in a split real form.  Use the function
  // ztoc to get a split real vector.
  ztoc( &A2, 1, ( COMPLEX * ) newReal, 2, nOver2);