Hi I am looking for an optimal FFT algorithm with low computation complexity and significant simplicity for hardware implementation. there are several FFT algorithm like Cooley-Tukey or Prime-factor or Split-radix ... each have several version with different complexity. would you mind to help me for finding better algorithm ?
Optimal FFT algorithm for hardware implementation ?
Started by ●May 2, 2008
Reply by ●May 2, 20082008-05-02
>Hi > >I am looking for an optimal FFT algorithm with low computation >complexity and significant simplicity for hardware implementation. >there are several FFT algorithm like Cooley-Tukey or Prime-factor or >Split-radix ... each have several version with different complexity. > >would you mind to help me for finding better algorithm ? >The short answer is that "optimal" depends upon your platform for implementation. Go to www.fftw.org. The FFTW code is adaptable and often provides the fastest implementation (on average) for a variety of sizes. Before execution it tests a variety of potential plans and determines the fastest and uses that. In my experience, what often seems to be the fastest is not due to memory/cache issues as well as instruction scheduling, etc. Mark
Reply by ●May 2, 20082008-05-02
On May 2, 9:04 am, "markt" <tak...@pericle.com> wrote:> >Hi > > >I am looking for an optimal FFT algorithm with low computation > >complexity and significant simplicity for hardware implementation. > >there are several FFT algorithm like Cooley-Tukey or Prime-factor or > >Split-radix ... each have several version with different complexity. > > >would you mind to help me for finding better algorithm ? > > The short answer is that "optimal" depends upon your platform for > implementation. > > Go towww.fftw.org. The FFTW code is adaptable and often provides the > fastest implementation (on average) for a variety of sizes. Before > execution it tests a variety of potential plans and determines the fastest > and uses that. In my experience, what often seems to be the fastest is not > due to memory/cache issues as well as instruction scheduling, etc. > > MarkHi Mark Thank you for reply, I didn't speak about target hardware platform because I want to design my own FFT processor, I don't want to use FFT in my application because FFT is my application :) I found intresting C implementions in FFTW, did you think they have required computation and implementation simlicity for my work ?
Reply by ●May 3, 20082008-05-03
The FFTW source code and design will be a good source but in general for new hardware design it may not provide too much insight because it was designed to be an optimal FFT for current processors (ie Intel, etc). There is an interesting forward in IEEE SP Magazine from the last couple months that touches on the subject of the FFTW. If you are experimenting with new hardware design you have much more flexibility in your data flow, computation, etc. You will have to balance resources, size, power consumption, etc. If you do some searches you will find interesting implementation based on CORDIC, Quaterion blocks, and others. Good Luck.
Reply by ●May 4, 20082008-05-04
The FFTW source code and design will be a good source but in general for new hardware design it may not provide too much insight because it was designed to be an optimal FFT for current processors (ie Intel, etc). There is an interesting forward in IEEE SP Magazine from the last couple months that touches on the subject of the FFTW. If you are experimenting with new hardware design you have much more flexibility in your data flow, computation, etc. You will have to balance resources, size, power consumption, etc. If you do some searches you will find interesting implementation based on CORDIC, Quaterion blocks, and others. Good Luck.
Reply by ●May 5, 20082008-05-05
>I found intresting C implementions in FFTW, did you think they have >required computation and implementation simlicity for my work ?Hard to say. As Chris noted, if you're developing your own hardware processor it's a different game. FFTW optimizes itself based on what it finds to be the fastest for your specific platform. It has a variety of builtins that are "best" for most of the mainstream processors. I particularly worked on porting the FFTW code to work with the SIMD processor in a MIPS-based system on a chip (Broadcom's quad-core 1480). If you don't have a specific architecture, then FFTW isn't really applicable in that sense. FFTW has just about every major algorithm in its code-base, but it is written through OCAML (as I recall), and difficult to decipher for a port. When I ported the SIMD code to the 1480, I had to actually take the PowerPC code and substitute all of my assembly primitives, with some modification in the routines, to get that to work. Of course, after all that work Code Sourcery comes out with a public port NOW, rather than 18 months ago. ;) Mark
Reply by ●May 7, 20082008-05-07
On May 5, 8:54�am, "markt" <tak...@pericle.com> wrote:> >I found intresting C implementions in FFTW, did you think they have > >required computation and implementation simlicity for my work ? > > Hard to say. �As Chris noted, if you're developing your own hardware > processor it's a different game. �FFTW optimizes itself based on what it > finds to be the fastest for your specific platform. �It has a variety of > builtins that are "best" for most of the mainstream processors. �I > particularly worked on porting the FFTW code to work with the SIMD > processor in a MIPS-based system on a chip (Broadcom's quad-core 1480). �If > you don't have a specific architecture, then FFTW isn't really applicable > in that sense. �FFTW has just about every major algorithm in its code-base, > but it is written through OCAML (as I recall), and difficult to decipher > for a port. �When I ported the SIMD code to the 1480, I had to actually > take the PowerPC code and substitute all of my assembly primitives, with > some modification in the routines, to get that to work. �Of course, after > all that work Code Sourcery comes out with a public port NOW, rather than > 18 months ago. ;) > > MarkHi Mark Thanks for your reply, seems you done hard but interesting work, congratulation! ARH
Reply by ●May 7, 20082008-05-07
Hi Chris Sorry for my late reply, I am new to FFT and DSP algorithms, so I try to found more about CORDIC algorithm in wiki and I found that this algorithm developed to calculate hyperbolic and trigonometric functions no hardware multiplier is available but I have no limitation in my design so I want to use multiplier for upgrading matrix multiplication performance. Regarding to Quaternion I found no related text in the net, I think it would be an extension of complex numbers ! I think it would be better for me to forget optimize FFT algorithm, because these algorithms have complexity which aren�t suitable for starting a design. So I prefer to start with a very simple FFT algorithm, would you mind help me to figure out this from jungle of information in the net ? Regards ARH On May 4, 4:21 pm, chris <chris.fel...@gmail.com> wrote:> The FFTW source code and design will be a good source but in general > for new hardware design it may not provide too much insight because > it > was designed to be an optimal FFT for current processors (ie Intel, > etc). There is an interesting forward in IEEE SP Magazine from the > last couple months that touches on the subject of the FFTW. > > If you are experimenting with new hardware design you have much more > flexibility in your data flow, computation, etc. You will have to > balance resources, size, power consumption, etc. If you do some > searches you will find interesting implementation based on CORDIC, > Quaterion blocks, and others. > > Good Luck.
Reply by ●May 7, 20082008-05-07
>I think it would be better for me to forget optimize FFT algorithm, >because these algorithms have complexity which aren=92t suitable for >starting a design. So I prefer to start with a very simple FFT >algorithm, would you mind help me to figure out this from jungle of >information in the net ?There's another thread in the first page or two here that gets in to some of the concepts involved with optimizing an FFT. Steve Johnson, one of the authors of FFTW, has made several posts in this regard. The term optimal, ultimately, needs to be referred to some parameter such as FLOPS or speed. Fewer FLOPS does not always equal faster speed for a variety of reasons. The simplest FFT design (sort of) is a standard radix-2 decomposition assuming you have a power of 2 data set. Every power of 2 in data size requires another column of radix-2 butterflies, but the butterflies are easy to construct (the "hard" part is determining the mapping from column to column). Mark
Reply by ●May 8, 20082008-05-08
Thanks again, Chris also sent a useful mail for me which mentioned similar solution, for gathering all of your respect informations together I was paste his email here: Chris: " If you are not too worried about hardware resources and want a fairly straight forward implementation, use the cooley-tukey method. This is the original FFT algorithm. It is usually implemented in code with 2 for loops. For hardware you can unroll the loops and have it fully parallel. For an FFT, depending on the radix, there is usually two parts. FIrst a 2-point DFT (again depending on radix) then the basic butterfly algebra. If you start with a simple 8 point FFT, you can implement 4 parallel 2-point FFT and then the butterfly. The book "DSP Integrated Circuits" by Lars Wanhammar, has a case study at the end of each chapter, piece by piece, developing different FFT cores. I don't know of any particular online resources, the online Numerical Recipes (http://www.nrbook.com/nr3/) and DSPGuide (http:// www.dspguide.com/) will be good general information (not hardware specific) online resources. Hope that helps."






