# Calculate MACs in MATLAB

Started by Pikey, April 29, 2008
```
I am doing a project in MATLAB to compare the computational complexities of
two systems. The problem I have is that I need to work out how many multiplies
and adds a 150 point ifft uses in MATLAB. From what I can work out, MATLAB
uses the Prime Factor algorithm to do this, but I am having trouble working
out how many MACS are done. Does MATLAB have the ability to feed back this
information?

Thanks

Pikey
```
```
>I am doing a project in MATLAB to compare the computational complexities of
>two systems. The problem I have is that I need to work out how many multiplies
>and adds a 150 point ifft uses in MATLAB. From what I can work out, MATLAB
>uses the Prime Factor algorithm to do this, but I am having trouble working
>out how many MACS are done. Does MATLAB have the ability to feed back this
>information?
>
>Thanks
>
>Pikey
>

Well, MATLAB used to allow you to calculate the number of flops (floating
point operations), but as the help on flops indicates, since the inclusion
of LAPACK it is no longer practical.  You can time the two functions using
tic (before the function call) and toc (after the function call) and
compare times, which should give you an approximation of relative
complexity assuming the arrays have the same memory structure.

Mark
```
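Mark's tic/toc suggestion can be sketched outside MATLAB as well. Here is a rough Python/NumPy analogue (an assumption for illustration only; the thread itself is about MATLAB, and `numpy.fft.ifft` stands in for MATLAB's `ifft`):

```python
# A rough analogue of the tic/toc timing suggested above, sketched in
# Python/NumPy (numpy.fft.ifft stands in for MATLAB's ifft here).
import time
import numpy as np

def time_ifft(n, reps=200):
    """Average wall-clock time of an n-point complex inverse FFT."""
    x = np.random.randn(n) + 1j * np.random.randn(n)
    np.fft.ifft(x)                    # warm-up call, excluded from timing
    t0 = time.perf_counter()          # "tic"
    for _ in range(reps):
        np.fft.ifft(x)
    t1 = time.perf_counter()          # "toc"
    return (t1 - t0) / reps

t150 = time_ifft(150)
print(f"average time for a 150-point ifft: {t150:.2e} s")
```

As Mark notes, this only gives relative complexity, and only if the two inputs have the same memory layout.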
```
(snip)
>Well, MATLAB used to allow you to calculate the number of flops (floating
>point operations), but as the help on flops indicates, since the inclusion
>of LAPACK it is no longer practical.  You can time the two functions using
>tic (before the function call) and toc (after the function call) and
>compare times, which should give you an approximation of relative
>complexity assuming the arrays have the same memory structure.
>
>Mark
Agree the tic toc is useful, but I am doing one method by hand and the
other practically in MATLAB, so I can't use it.

Pikey

```
```
>Agree the tic toc is useful, but I am doing one method by hand and the
>other practically in MATLAB, so I can't use it.

Well, you can time your functions then use an approximation to calculate
the equivalent FLOP count.  FFTW uses MFLOPS = 5*N*log2(N)/time (in us) for
complex data, divide by 2 for real data.  You can force MATLAB to use FFTW,
though I do not know if there are any timing differences between FFTs
within MATLAB, or whether the command-line executable for FFTW is faster or
slower.

Mark
```
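The conversion Mark describes can be written out directly. A minimal sketch of FFTW's speed metric, using the formula quoted above (the function name `fftw_mflops` is just an illustrative label, not an FFTW API):

```python
# FFTW's conventional speed metric, as quoted in the post above:
# MFLOPS = 5*N*log2(N) / time-in-microseconds, halved for real data.
import math

def fftw_mflops(n, time_us, complex_data=True):
    """Estimated MFLOPS for an n-point FFT that took time_us microseconds."""
    flops = 5 * n * math.log2(n)
    if not complex_data:
        flops /= 2
    return flops / time_us

# e.g. a 150-point complex FFT that took 10 us rates roughly 542 MFLOPS:
print(round(fftw_mflops(150, 10.0)))  # -> 542
```

Note this is a nominal count based on the radix-2 estimate, not the actual number of operations any particular algorithm performs.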
```
>>Agree the tic toc is useful, but I am doing one method by hand and the
>>other practically in MATLAB, so I can't use it.
>
>Well, you can time your functions then use an approximation to calculate
>the equivalent FLOP count.  FFTW uses MFLOPS = 5*N*log2(N)/time (in us) for
>complex data, divide by 2 for real data.  You can force MATLAB to use FFTW,
>though I do not know if there are any timing differences between FFTs
>within MATLAB, or whether the command-line executable for FFTW is faster or
>slower.
>
>Mark
>
That's OK for an approximation. I think I might ignore the MATLAB part and
strive to find out the least number of MACS for a 150 point FFT using
complex data, which will solve my problem(s). If anyone knows the answer I
would appreciate it. Cheers,
Pikey
```
```
>That's OK for an approximation. I think I might ignore the MATLAB part and
>strive to find out the least number of MACS for a 150 point FFT using
>complex data, which will solve my problem(s). If anyone knows the answer I
>would appreciate it. Cheers,
>Pikey

5*N*log2(N) is the asymptotic number of FLOPS for a radix-2 Cooley-Tukey
algorithm (it approaches this value for large N).  Different algorithms will
perform (speed) differently, sometimes using more FLOPS but running at
faster speeds.  This is usually due to memory architecture and the number
of memory accesses required.

There are many ways to do a 150 point FFT, so there's no single answer.  N
= 150 decomposes into 5*5*3*2, and you can configure your FFT in just about
any combination of these primes, each probably with a different number of
FLOPS.

Good luck. :)

Mark
```
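The decomposition Mark mentions is easy to verify. A short sketch (the `prime_factors` helper is just an illustrative routine, not from the thread) that factors N = 150 the way a mixed-radix FFT would, alongside the 5*N*log2(N) rule-of-thumb estimate:

```python
# Factor N = 150 into the small primes a mixed-radix FFT would use,
# and print the 5*N*log2(N) rule-of-thumb FLOP estimate for comparison.
import math

def prime_factors(n):
    """Prime factorization by trial division, smallest factors first."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

print(prime_factors(150))               # -> [2, 3, 5, 5]
print(round(5 * 150 * math.log2(150)))  # -> 5422
```

Any ordering of those radices (e.g. 5-5-3-2 versus 2-3-5-5) gives a valid plan, generally with a different operation count.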
```
On Apr 29, 1:53 pm, "pikey" <pi...@amor888.fsnet.co.uk> wrote:
> I am doing a project in matlab to compare the computational complexities of
> two systems the problem I have is that I need to work out how many multiply
> and adds a 150 point ifft uses in matlab from what I can work out MATLAB
> uses the Prime Factor algorithm to do this but I am having trouble working
> out how many MACS are done. Does matlab have the ability to feed back this
> information?

You don't need to try to reverse-engineer what algorithm Matlab's FFT
is using; you can just download the source code from fftw.org (or just
read the papers on the web site, which describe the algorithms used).
Actually, "doc fft" within Matlab has some information on the
algorithms too.

Short answer: FFTW uses the prime-factor algorithm for small sizes (<=
16) within its codelets at the leaves of its recursion.  Larger
composite sizes are broken down into smaller sizes using mixed-radix
Cooley-Tukey.  It is O(N log N) for all N.

If you call FFTW from C, there is a function fftw_flops to get a
count of arithmetic operations if you really need that for some
reason.  (Some other posters have suggested timing the FFT, but that's
pretty useless except for very crude estimates of flop counts.)
Running it on my machine, FFTW 3.1.2 reports that it requires 3100
additions and 1480 multiplications for an N=150 complex-data FFT.

If you really care about N=150, you can generate a hard-coded routine
using FFTW's code generator.  For complex data it generates a routine
with 3012 additions and 1304 multiplications.  (This is not optimal,
although it should be reasonably close.  If I remember correctly, it
turned out that the algorithm with the lowest-known arithmetic count
for size 5 actually performed worse in practice than an algorithm with
more arithmetic, as the latter pipelined better last we checked.)

Regards,
Steven G. Johnson
```
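As a quick arithmetic check (not from the thread itself), the totals quoted above can be compared against the radix-2 rule of thumb:

```python
# FFTW's reported operation totals for N = 150 (figures quoted in the
# post above) versus the 5*N*log2(N) radix-2 rule of thumb.
import math

reported  = 3100 + 1480   # fftw_flops, FFTW 3.1.2: adds + multiplies
generated = 3012 + 1304   # FFTW's hard-coded generated routine
estimate  = 5 * 150 * math.log2(150)

print(reported, generated, round(estimate))  # -> 4580 4316 5422
```

Both actual counts come in well under the asymptotic estimate, which is expected: the rule of thumb is calibrated to radix-2 and large N, while a mixed-radix plan for a small composite size does less work.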
```
On May 2, 1:27 pm, "Steven G. Johnson" <stev...@alum.mit.edu> wrote:
> If you really care about N=150, you can generate a hard-coded routine
> using FFTW's code generator.  For complex data it generates a routine
> with 3012 additions and 1304 multiplications.  (This is not optimal,
> although it should be reasonably close.  If I remember correctly, it
> turned out that the algorithm with the lowest-known arithmetic count
> for size 5 actually performed worse in practice than an algorithm with
> more arithmetic, as the latter pipelined better last we checked.)

Actually, I take that back; the difference was for sizes 7 and 11.
For sizes 3 and 5 our kernels match the lowest known flop count, as
far as I can tell (although there is no proof that any of these
algorithms are optimal).

Steven
```
```
>(although there is no proof that any of these
>algorithms are optimal).

Yeah, as you note, even "optimal" in terms of the fewest FLOPS is
not necessarily "optimal" in terms of speed.  I had a professor who
used to get quite ticked every time he heard somebody mention "optimal"
without reference to some criterion.  Set your minimum-FLOP-count algorithm
up in such a way that the next instruction is always waiting for
completion of the current instruction (a pipeline stall) and you'll find it
may not work very well compared to another algorithm with more required
FLOPS but a better overall flow.

Mark
```
```
Steven G. Johnson wrote:
(snip)

> Short answer: FFTW uses the prime-factor algorithm for small sizes (<=
> 16) within its codelets at the leaves of its recursion.  Larger
> composite sizes are broken down into smaller sizes using mixed-radix
> Cooley-Tukey.  It is O(N log N) for all N.

An interesting statement.  O() notation gives the order as
N goes to infinity, and it may or may not be of any use
for small N.  Now, is infinity prime or composite?

-- glen

```