Jimm Avante wrote:
> Yes, I need to calculate the FFT as fast as possible.

As fast as possible from when to what?
And with which what computational resources versus
what data rate?

> I have never heard of colomn-wise DFT. I bet the performance of any new
> algorithm must match the efficiency of well-known Radix 2, 4, 8 algorithms
> before the new algorithm becomes feasible.

It's not a new algorithm; DFT's pre-dates FFT's.

IMHO. YMMV.
-- 
rhn A.T nicholson d.0.t C-o-M

Jimm Avante wrote:

...
> I read about DIT & DIF algorithm, both operates like but(0, N/2-1), but(1,
> N/2), ...
> meaning the butterfly can only start once the (N/2-1)th has been received.
> Or else I have got wrong reference books!?!?

Probably.

>
> I use Digital SIgnal Processing by S Salivahanan, A. Vallavarak. And
> Digital Signal Processing John G. Proakis, Dimitris K Manolakis

I have neither. You should find abundent information on this topic in
the net.

> In an OFDM communication system, the receiver receives 0, 1, 2, 3, ... N-1.
> Of course starting BUT operation at (N/2-1)th doesn't need extra hardware,
> but eliminating this latency might be useful.
>
>
>
>
>
> > I guess your aim is to produce FFT bins as fast as possible, without
> > waiting for the whole input to be available. Obviously, the full FFT
> > output vector is only available once the whole input vector is
> > available. You might want to consider computing a column-wise DFT. For
> > each input sample, you have to accumulate N output samples (N is the
> > size of the DFT). That is, once your last sample is available, you have
> > to perform another N multiply-accumulate instructions, then your full
> > DFT output is ready.
> >
>
> Yes, I need to calculate the FFT as fast as possible.
>
> I have never heard of colomn-wise DFT. I bet the performance of any new
> algorithm must match the efficiency of well-known Radix 2, 4, 8 algorithms
> before the new algorithm becomes feasible.

The point is this: any output bin of the FFT will only be valid after
you have the last sample available. This is because each output bin is,
in general, affected by each input "bin" (time sample). So you can
present your final DFT result of the input only after you have received
*all* input samples. The question is this: which algorithm let's you
finish the DFT of the input vector as fast as possible after you have
received the last sample, allowing arbitrary pre-computation on the
first N-1 samples? The DIT & DIF FFT algorithms require all samples to
be available to compute the first stage, so after you have recieved the
last input sample, you still have O(N log (N-1)) computations ahead.

However, if you perform a column-wise DFT, you only have O(N)
computations to produce the full DFT output after having received the
last input sample.

Even though the overall computation of the naive DFT is O(N^2),
therefore slow compared to the FFT, it is faster in the above special
case (provided you have enough time in between each sample to compute a
full N-point multiply-accumulate).

Regards,
Andor

"Andor" <andor.bariska@gmail.com> wrote in message 
news:1167911123.631108.197800@q40g2000cwq.googlegroups.com...
>
>
> On 4 Jan., 12:07, "Jimm Avante" <jimava...@google.com> wrote:
>> I am new to FFT but after reading, all FFT starts butterfly operation 
>> during
>> the
>> middle of the frame. For example, a radix-2, it starts until N/2 sample 
>> has
>> been
>> read in from ADC, i.e. but(0, N/2-1), but(1, N/2), ...
>>
>> Is it possible to design it in such a way it performs a butterfly 
>> operation
>> real time every two samples? but(0, 1), but(2, 3), ...
>
> I don't know what you mean by real time, but yes, the initial ordering
> of the samples is irrelevant (assuming that you don't care about the
> ordering of the frequency domain samples). If you use the order that
> you describe, you end up with a decimation-in-frequency algorithm.
>

I read about DIT & DIF algorithm, both operates like but(0, N/2-1), but(1, 
N/2), ...
meaning the butterfly can only start once the (N/2-1)th has been received. 
Or else I
have got wrong reference books!?!?

I use Digital SIgnal Processing by S Salivahanan, A. Vallavarak. And
Digital Signal Processing John G. Proakis, Dimitris K Manolakis

In an OFDM communication system, the receiver receives 0, 1, 2, 3, ... N-1.
Of course starting BUT operation at (N/2-1)th doesn't need extra hardware, 
but
eliminating this latency might be useful.

> I guess your aim is to produce FFT bins as fast as possible, without
> waiting for the whole input to be available. Obviously, the full FFT
> output vector is only available once the whole input vector is
> available. You might want to consider computing a column-wise DFT. For
> each input sample, you have to accumulate N output samples (N is the
> size of the DFT). That is, once your last sample is available, you have
> to perform another N multiply-accumulate instructions, then your full
> DFT output is ready.
>

Yes, I need to calculate the FFT as fast as possible.

I have never heard of colomn-wise DFT. I bet the performance of any new
algorithm must match the efficiency of well-known Radix 2, 4, 8 algorithms
before the new algorithm becomes feasible.

> If you have to repeatedly apply this procedure to produce several
> consecutive DFT vectors, you might also want to consider the sliding
> DFT algorithm.
>
> Regards,
> Andor
>

On 4 Jan., 12:07, "Jimm Avante" <jimava...@google.com> wrote:
> I am new to FFT but after reading, all FFT starts butterfly operation during
> the
> middle of the frame. For example, a radix-2, it starts until N/2 sample has
> been
> read in from ADC, i.e. but(0, N/2-1), but(1, N/2), ...
>
> Is it possible to design it in such a way it performs a butterfly operation
> real time every two samples? but(0, 1), but(2, 3), ...

I don't know what you mean by real time, but yes, the initial ordering
of the samples is irrelevant (assuming that you don't care about the
ordering of the frequency domain samples). If you use the order that
you describe, you end up with a decimation-in-frequency algorithm.

I guess your aim is to produce FFT bins as fast as possible, without
waiting for the whole input to be available. Obviously, the full FFT
output vector is only available once the whole input vector is
available. You might want to consider computing a column-wise DFT. For
each input sample, you have to accumulate N output samples (N is the
size of the DFT). That is, once your last sample is available, you have
to perform another N multiply-accumulate instructions, then your full
DFT output is ready.

If you have to repeatedly apply this procedure to produce several
consecutive DFT vectors, you might also want to consider the sliding
DFT algorithm.

Regards,
Andor

Jimm Avante wrote:
> I am new to FFT but after reading, all FFT starts butterfly operation during 
> the
> middle of the frame. For example, a radix-2, it starts until N/2 sample has 
> been
> read in from ADC, i.e. but(0, N/2-1), but(1, N/2), ...
> 
> Is it possible to design it in such a way it performs a butterfly operation 
> real time
> every two samples? but(0, 1), but(2, 3), ...
> 

It sounds like you may want a "decimation-in-frequency" FFT,  as opposed
to "decimation-in-time".

Given the way you stated your problem, you may also be interested in
"sliding DFT".

These terms should help you and Google find better explanations than I
can easily give.

-- Mark Borgerding