"James Van Buskirk" <not_valid@comcast.net> wrote in message news:YOednbtmQ_OJY-XfRVn-hA@comcast.com...> Actually, for symmetric or antisymmetric inputs one would > take RF(n)=(RF(n)*X(n))*X_inverse(n) because computing > the action of X_inverse(n) is a no-op for such data. It > is left as an exercise to the reader to show that RF(n)*X(n) > is also block diagonal.<gack> I was originally thinking about another factorization where the X matrix would be symmetric. Since that is not the case in the given factorization, the correct factorization for symmetric or antisymmetric data should have been: RF(n) = (RF(n)*transpose(X(n)))*transpose(X_inverse(n)) and then I should have said that computing the action of transpose(X_inverse(n)) is a no-op for symmetric or antisymmetric data and that RF(n)*transpose(X(n)) is block diagonal. Sorry for the inaccuracies. -- write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, & 6.0134700243160014d-154/),(/'x'/)); end
FFT By Decimation In Time paper or succinct explanitory resource
Started by ●May 1, 2005
Reply by ●May 4, 20052005-05-04
Reply by ●May 6, 20052005-05-06
(A real-symmetric/anti-symmetric DFT is also called a type-I DCT/DST, up to normalization factors.) You're right that the standard algorithm (ala Numerical Recipes) to compute a real-even DFT of length N (N/2+1 non-redundant data points) via a real-data FFT of length N (or, equivalently, a complex FFT of length N/2 by the usual trick) suffers from accuracy problems. In particular, we've found it to have rms errors that grow as O(sqrt(N)) rather than O(sqrt(log(N))) for the ordinary FFT. Better algorithms are mostly based on taking an ordinary FFT algorithm (e.g. Cooley-Tukey) and pruning out the redundant computations for symmetric data. Many of the DCT algorithms in the literature boil down to this, and anything based on this approach will be as accurate as the ordinary FFT. For example, a particularly simple form is achieved if you apply the split-radix algorithm; after pruning the redundant computations for a real-symmetric DFT of length N (N/2+1 data points), you are left with a real-data DFT of length N/4 and a real-symmetric DFT of length N/2 (N/4+1 data points), the latter being computed recursively by the same algorithm. A DIF version of this approach was described by Duhamel in 1986 (IEEE Trans. ASSP 34, 285), and the next release of FFTW will implement a DIT version. The main idea, in FFTW's case, being to convert to ordinary FFTs (which are highly optimized) as quickly as possible (the recursive DCT-I's apply to a smaller and smaller fraction of the data). (I don't recall reading the Tolimieri paper you mention, but it sounds basically like a radix-2 variant of this approach. I'm not sure what you mean when you say it doesn't take advantage of the symmetries - if the sub-transforms are real-even DFTs, then that is exploiting the symmetry right there.) FFTW's generator can also automatically generate DCT-I codelets (the symbolic optimizations in the generator are powerful enough to prune out the redundant computations directly from any complex-data FFT algorithm), if you just want a fast DCT-I of a small size. (See also our recent paper in Proc. IEEE 93, 216 (2005).) Cordially, Steven G. Johnson
Reply by ●May 6, 20052005-05-06
Steve, thanks for the reply. Please see my comments below. stev...@alum.mit.edu wrote:> You're right that the standard algorithm (ala Numerical Recipes) to > compute a real-even DFT of length N (N/2+1 non-redundant data points) > via a real-data FFT of length N (or, equivalently, a complex FFT of > length N/2 by the usual trick) suffers from accuracy problems. In > particular, we've found it to have rms errors that grow as O(sqrt(N)) > rather than O(sqrt(log(N))) for the ordinary FFT.Primarily, I'm not concerned with accuracy problems but with execution speed. Here is my take on it: The DFT is a unitary transform acting on a N-dimensional complex vector space H. We can view H as a 2N-dimensional real vector space. DFT : H -> H. If we are interested in real vectors only, then we can reduce the DFT to map the real vectors (call this space V) to the hermitian symmetric complex vectors (call this space W). Both V and W are N-dimensional real vector spaces, as well as subsets of H. Such an algorithm I call rFFT, and I expect it to take about c1 N log N steps, where c1 is some constant. If we are only interested in mapping real symmetric / antisymmetric vectors (call these vector spaces V_s / V_a) to symmetric real / imaginary vectors (W_r / W_i), we are further reducing the DFT to map between two N/2-dimensional real vector spaces: srDFT :=3D DFT| : V_s ---> W_r, |V_s arDFT :=3D DFT| : V_a ---> W_i, |V_a where dim(V_s) =3D dim(W_r) =3D dim(V_a) =3D dim(W_i) =3D N/2. Therefore, t= he maps srDFT and arDFT can be computed by the rFFT algorithm in c1 N/2 log N/2 steps. We do however need the transforms A_s : V ---> V_s, A_a : V ---> V_a, B_r : W_r ---> W, B_i : W_i ---> W. One can immagine that A_s and A_a are 2N x N/2 matrices, while B_r and B_i are N/2 x 2N matrices (remember that V and W are subsets of a 2N-dimensional vector space). Now the algorithm described in [2] constructs the maps A_s, A_a, B_r and B_i and shows that each of them can be computed in c2 N steps (they are sparse matrices). Therefore, the maps A_s =B0 srDFT =B0 B_r H ---> V ---> V_s ---> W_r ---> W ---> H and A_a =B0 arDFT =B0 B_i H ---> V ---> V_a ---> W_i ---> W ---> H can both be computed in (c1 N/2 log N/2 + c2 N) steps. Contrast this with the algorithm in [3] which takes 2 c1 N/2 log N/2. As it turns out, c2 is so small that this matters considerably already for small N. That's what I mean by not taking advantage of the symmetries.> FFTW's generator can also automatically generate DCT-I codelets (the > symbolic optimizations in the generator are powerful enough to prune > out the redundant computations directly from any complex-data FFT > algorithm), if you just want a fast DCT-I of a small size. (See also > our recent paper in Proc. IEEE 93, 216 (2005).)Cool. However, the reason why I want to use the given rFFT code without modifications is that it is highly optimized for the architecture at hand. It computes a single-precision real FFT on 1024 points on a 60 MHz processor (general purpose DSP) in what amounts to 144 mflops (I have computed this number from the defintion given on your webpage). Contrast this with the FFTW2 code which requires about 3250 mflops on a 2 GHz G5, or about 1750 mflops on a 2.8 GHz P4. This means that the 60 MHz processor is about 12 times faster in absolute time than the 2.8 GHz P4, and about 22 times faster in absolute timwe than the 2 GHz G5!> > Cordially, > Steven G. JohnsonBest regards, Andor [2] L. R. Rabiner, "On the Use of Symmetry in FFT Computation", IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. ASSP-27, No. 3, June 1979 [3] C. Lu, R. Tolimieri, "New Algorithms for the FFT computation of Symmetric and Translational Complex Conjugate Sequences",=20 Proc. ICASSP'92, pp. 13-16, March 1992
Reply by ●May 6, 20052005-05-06
I wrote: " Contrast this with the FFTW2 code which requires about 3250 mflops on a 2 GHz G5, or about 1750 mflops on a 2.8 GHz P4. This means that the 60 MHz processor is about 12 times faster in absolute time than the 2.8 GHz P4, and about 22 times faster in absolute timwe than the 2 GHz G5! " I just noticed that I misunderstood the "mflops" definition! The 60 MHz processor is 12 times _slower_ than the 2.8 GHz P4 and 22 times slower than the 2 GHz G5. I thought those numbers looked a bit strange ... sorry for the mixup. Andor.
Reply by ●May 6, 20052005-05-06
Made another mistake, please correct:> The DFT is a unitary transform acting on a N-dimensional complexvector> space H. We can view H as a 2N-dimensional real vector space. > > DFT : H -> H. > > If we are interested in real vectors only, then we can reduce the DFT > to map the real vectors (call this space V) to the hermitiansymmetric> complex vectors (call this space W). Both V and W are N-dimensional > real vector spaces, as well as subsets of H.Let's just say that H is M-dimensional, and in the following set N=M/2.> Such an algorithm I call rFFT, and I expect it to take about > c1 N log N steps, where c1 is some constant.So this should be c1 N log N = c1 M/2 log M/2, etc.
Reply by ●May 7, 20052005-05-07
Andor wrote:> can both be computed in (c1 N/2 log N/2 + c2 N) steps. Contrast this > with the algorithm in [3] which takes 2 c1 N/2 log N/2.If you're saying that the Tolimieri paper takes asymptotically twice as many operations as other real-symmetric FFT algorithms, then I believe you've miscounted. I just looked up the Tolimieri paper, and found the recurrence F(N) = 5/4 N + 2 F(N/2) + O(1) for the number of flops (adds+multiplies) of the algorithm they described, and from this you get that F(N) ~ 5/4 N lg N asymptotically, versus ~ 5 N lg N for the radix-2 Cooley-Tukey FFT algorithm of complex data. This is exactly the factor of 4 savings that one would expect from real data + symmetry. Of course, the Tolimieri paper is not optimal because it is based on radix 2. The complex split-radix algorithm has flops that go as ~ 4 N lg N and thus one can achieve ~ N lg N flops for real-symmetric data (c.f. Duhamel, etc.), which is 20% better than the 5/4 of Tolimier (but not a factor of 2). (In fact, in [1] we describe how to do the real-symmetric DFT in asymptotically 17/18 N lg N flops, which gains you another 5.6%, related to JVB's work.) Steven [1] http://www.fftw.org/newsplit.pdf
Reply by ●May 12, 20052005-05-12
stevenj@alum.mit.edu wrote:> Andor wrote: > > can both be computed in (c1 N/2 log N/2 + c2 N) steps. Contrastthis> > with the algorithm in [3] which takes 2 c1 N/2 log N/2. > > If you're saying that the Tolimieri paper takes asymptotically twiceas> many operations as other real-symmetric FFT algorithms, then Ibelieve> you've miscounted. > > I just looked up the Tolimieri paper, and found the recurrence F(N) = > 5/4 N + 2 F(N/2) + O(1) for the number of flops (adds+multiplies) of > the algorithm they described, and from this you get that F(N) ~ 5/4 N > lg N asymptotically, versus ~ 5 N lg N for the radix-2 Cooley-TukeyFFT> algorithm of complex data. This is exactly the factor of 4 savingsthat> one would expect from real data + symmetry.I see your point now. One has to apply Tolimieri's algorithm recursively. I still prefer Rabiner's algorithm (respectively Cooley's), as it does not need recursion. Instead it shows how to compute the DFT of a real symmetric vector of length N using some pre- and post-processing and a DFT of a real length N/2 vector (which can be computed using a N/4 point complex FFT).> Of course, the Tolimieri paper is not optimal because it is based on > radix 2. The complex split-radix algorithm has flops that go as ~ 4N> lg N and thus one can achieve ~ N lg N flops for real-symmetric data > (c.f. Duhamel, etc.), which is 20% better than the 5/4 of Tolimier(but> not a factor of 2). > > (In fact, in [1] we describe how to do the real-symmetric DFT in > asymptotically 17/18 N lg N flops, which gains you another 5.6%, > related to JVB's work.) > > Steven > > [1] http://www.fftw.org/newsplit.pdfCongratulations to James and you guys and thanks for the link! Best regards, Andor






