I am doing a project in matlab to compare the computational complexities of two systems the problem I have is that I need to work out how many multiply and adds a 150 point ifft uses in matlab from what I can work out MATLAB uses the Prime Factor algorithm to do this but I am having trouble working out how many MACS are done. Does matlab have the ability to feed back this information? Thanks Pikey
Calculate MACs in MATLAB
Started by ●April 29, 2008
Reply by ●April 29, 20082008-04-29
>I am doing a project in matlab to compare the computational complexitiesof>two systems the problem I have is that I need to work out how manymultiply>and adds a 150 point ifft uses in matlab from what I can work out MATLAB >uses the Prime Factor algorithm to do this but I am having troubleworking>out how many MACS are done. Does matlab have the ability to feed backthis>information? > >Thanks > >Pikey >Well, MATLAB used to allow you to calculate the number of flops (floating point operations), but as the help on flops indicates, since the inclusion of LAPACK it is no longer practical. You can time the two functions using tic (before the function call) and toc (after the function call) and compare times, which should give you an approximation of relative complexity assuming the arrays have the same memory structure. Mark
Reply by ●April 29, 20082008-04-29
>>I am doing a project in matlab to compare the computational complexities >of >>two systems the problem I have is that I need to work out how many >multiply >>and adds a 150 point ifft uses in matlab from what I can work outMATLAB>>uses the Prime Factor algorithm to do this but I am having trouble >working >>out how many MACS are done. Does matlab have the ability to feed back >this >>information? >> >>Thanks >> >>Pikey >> > >Well, MATLAB used to allow you to calculate the number of flops(floating>point operations), but as the help on flops indicates, since theinclusion>of LAPACK it is no longer practical. You can time the two functionsusing>tic (before the function call) and toc (after the function call) and >compare times, which should give you an approximation of relative >complexity assuming the arrays have the same memory structure. > >MarkAgree the tic toc is useful but I am doing one method by hand and the other practically on matlab so I can't use it. Pikey
Reply by ●April 30, 20082008-04-30
>Agree the tic toc is useful but I am doing one method by hand and the >other practically on matlab so I can't use it.Well, you can time your functions then use an approximation to calculate the equivalent FLOP count. FFTW uses MFLOPS = 5*N*log2(N)/time (in us) for complex data, divide by 2 for real data. You can force MATLAB to use FFTW, though I do not know if there are any timing differences between FFTs within MATLAB, or whether the command-line executable for FFTW is faster or slower. Mark
Reply by ●May 1, 20082008-05-01
>>Agree the tic toc is useful but I am doing one method by hand and the >>other practically on matlab so I can't use it. > >Well, you can time your functions then use an approximation to calculate >the equivalent FLOP count. FFTW uses MFLOPS = 5*N*log2(N)/time (in us)for>complex data, divide by 2 for real data. You can force MATLAB to useFFTW,>though I do not know if there are any timing differences between FFTs >within MATLAB, or whether the command-line executable for FFTW is fasteror>slower. > >Mark >thats ok for an approximation I think I might ignore the matlab part and strive to find out what the least number of MACS for a 150 point fft using complex data which will solve my problem(s) If any one know the answer I would appreciate it cheers Pikey
Reply by ●May 1, 20082008-05-01
>thats ok for an approximation I think I might ignore the matlab part and >strive to find out what the least number of MACS for a 150 point fftusing>complex data which will solve my problem(s) If any one know the answer I >would appreciate it cheers >Pikey5*N*log2(N) is the asymptotic number of FLOPS for a radix-2 Cooley-Tukey algorithm (approaches this value for large N). Different algorithms will perform (speed) differently, sometimes using more FLOPS but at faster speeds. This is usually due to memory architecture and the number of reads/writes that your algorithm requires. There are many ways to do a 150 point FFT, so there's no single answer. N = 150 decomposes into 5*5*3*2, and you can configure your FFT in just about any combination of these primes, each probably has a different number of FLOPS. Good luck. :) Mark
Reply by ●May 2, 20082008-05-02
On Apr 29, 1:53 pm, "pikey" <pi...@amor888.fsnet.co.uk> wrote:> I am doing a project in matlab to compare the computational complexities of > two systems the problem I have is that I need to work out how many multiply > and adds a 150 point ifft uses in matlab from what I can work out MATLAB > uses the Prime Factor algorithm to do this but I am having trouble working > out how many MACS are done. Does matlab have the ability to feed back this > information?You don't need to try to reverse-engineer what algorithm Matlab's FFT is using; you can just download the source code from fftw.org (or just read the papers on the web site, which describe the algorithms used). Actually, "doc fft" within Matlab has some information on the algorithms too. Short answer: FFTW uses the prime-factor algorithm for small sizes (<= 16) within its codelets at the leaves of its recursion. Larger composite sizes are broken down into smaller sizes using mixed-radix Cooley-Tukey. It is O(N log N) for all N. If you call FFTW from C, there is a function fftw_flops to get an count of arithmetic operations if you really need that for some reason. (Some other posters have suggested timing the FFT, but that's pretty useless except for very crude estimates of flop counts.) Running it on my machine, FFTW 3.1.2 reports that it requires 3100 additions and 1480 multiplications for an N=150 complex-data FFT. If you really care about N=150, you can generate a hard-coded routine using FFTW's code generator. For complex data it generates a routine with 3012 additions and 1304 multiplications. (This is not optimal, although it should be reasonably close. If I remember correctly, it turned out that the algorithm with the lowest-known arithmetic count for size 5 actually performed worse in practice than an algorithm with more arithmetic, as the latter pipelined better last we checked.) Regards, Steven G. Johnson
Reply by ●May 2, 20082008-05-02
On May 2, 1:27 pm, "Steven G. Johnson" <stev...@alum.mit.edu> wrote:> If you really care about N=150, you can generate a hard-coded routine > using FFTW's code generator. For complex data it generates a routine > with 3012 additions and 1304 multiplications. (This is not optimal, > although it should be reasonably close. If I remember correctly, it > turned out that the algorithm with the lowest-known arithmetic count > for size 5 actually performed worse in practice than an algorithm with > more arithmetic, as the latter pipelined better last we checked.)Actually, I take that back; the difference was for sizes 7 and 11. For sizes 3 and 5 our kernels match the lowest known flop count, as far as I can tell (although there is no proof that any of these algorithms are optimal). Steven
Reply by ●May 5, 20082008-05-05
>(although there is no proof that any of these >algorithms are optimal).Yeah, as you note, even "optimal" in terms of the least amount of FLOPS is not necessarily "optimal" in terms of speed. I had a professor once that used to get quite ticked every time he heard somebody mention "optimal" without reference to some criteria. Set your minimum FLOP count algorithm up in such a way that your next instruction is always waiting for completion of the current instruction (a pipeline stall) and you'll find it may not work very well compared to another algorithm with more required FLOPS, but a better overall flow. Mark
Reply by ●May 6, 20082008-05-06
Steven G. Johnson wrote: (snip)> Short answer: FFTW uses the prime-factor algorithm for small sizes (<= > 16) within its codelets at the leaves of its recursion. Larger > composite sizes are broken down into smaller sizes using mixed-radix > Cooley-Tukey. It is O(N log N) for all N.An interesting statement. O() notation gives the order as N goes to infinity, and it may or may not be of any use for small N. Now, is infinity prime or composite? -- glen