DSPRelated.com
Forums

Fast convolution

Started by venkat ramanan February 23, 2005

Hi,
I am searching for a fast convolution algorithm with order less than O(MN). I have already tried the FFT. Is there any other method available to compute the correlation in a faster manner?
Awaiting response...

venkat
Mistakes are not the end of the world, but repeating them is




Venkat--

May I ask what you mean by O(MN)? I am not sure fast convolution, which is normally made of 3 FFTs + N multiplies and a little more computation, can be described as O(MN). Either you are off the mark there or I am not understanding your notation.

Anyway, my understanding is that fast convolution is a better method (duh!) for convolution when compared to the direct, time-domain version, which costs on the order of N^2 (almost never feasible for real systems). But this is true only when you have a fairly large N, and that N itself is unknown a priori, which is mostly the case. The fast convolution itself costs in the range of N + (3N/2) log N *complex* multiplies and 3 (N log N) *complex* adds. And if the impulse response H(k) can be calculated a priori, it is just the cost of 2 FFTs + N multiplies to deal with.
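For reference, that recipe (2 forward FFTs, a pointwise multiply, and 1 inverse FFT when H can be precomputed) might be sketched like this in NumPy; the function name `fast_convolve` and the next-power-of-2 padding are my own choices, not anything from the thread:

```python
import numpy as np

def fast_convolve(x, h):
    """Linear convolution of x and h via the FFT.
    Output length is len(x) + len(h) - 1."""
    n = len(x) + len(h) - 1
    nfft = 1 << (n - 1).bit_length()   # next power of 2 >= n
    X = np.fft.rfft(x, nfft)           # FFT of the input
    H = np.fft.rfft(h, nfft)           # FFT of the impulse response (precomputable)
    return np.fft.irfft(X * H, nfft)[:n]  # pointwise multiply, inverse FFT, trim

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 1.0])
print(fast_convolve(x, h))  # matches np.convolve(x, h)
```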

I know of no better method than this to achieve convolution.

--Bhooshan



Bhooshan-

> I know of no better method than this to achieve convolution.

Search for "fast Hartley transform", or DHT. Input length has to be a power of 2,
but in some cases it seems the DHT might be faster for convolution.

Also from the planet's FFT expert:

http://www.fftw.org/burrus-notes.html

"A very interesting general length FFT system called the FFTW has been
developed by Frigo and Johnson at MIT which uses a library of efficient
"codelets" which are composed for a very efficient calculation of the
DFT on a wide variety of computers [6, 7]. For most lengths and on
most computers, this is the fastest FFT today"

-Jeff






Dear Jeff, Venkat, Shree,

I am putting below a few random notes on numerical convolution
(so this is not about an integral of the form k(x-y)f(y)dy):

> Date: 25 Feb 2005 03:31:46 -0000
> From: "venkat ramanan" <>
> Subject: Re: Re: [speechcoding] Fast convolution
>
> I have a difficulty in taking the FFT. I have to take a 40-point FFT (all the inputs
> are always real). So I padded 24 zeros to make it a power of 2 and took a 64-point FFT of
> both the impulse response and the input, then multiplied the two and took the inverse
> transform. But the result is not very accurate. (I am using fixed-point arithmetic.)
> Can anybody suggest a very good way of implementing a 40-point FFT?

Try modeling the process in floating point, in either single or double
precision, and estimate the errors of the fixed-point version against it.
If the errors are big, try increasing the bit length of the fixed-point
representation. If the results fall outside the dynamic range of the
fixed-point system, increase the range at the expense of a smaller fraction
part. A fixed-point representation that keeps a reasonable number of
fractional bits might not have enough dynamic range anyway, so you might
want to use floating-point numbers in your production code.
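That "model it in floating point first" advice could be sketched roughly like this; the Q15 format and the choice to quantize only the inputs are simplifying assumptions of mine (a real fixed-point FFT also quantizes twiddle factors and intermediate results, so its error would be larger):

```python
import numpy as np

def to_q15(x):
    """Round to a hypothetical Q15 (16-bit signed fraction) grid and back to float."""
    return np.round(x * 32768.0) / 32768.0

rng = np.random.default_rng(0)
x = rng.uniform(-0.5, 0.5, 40)   # 40 real input samples
h = rng.uniform(-0.5, 0.5, 16)   # short impulse response so 40+16-1 fits in 64

# Double-precision reference: pad to 64, multiply spectra, inverse transform.
ref = np.fft.irfft(np.fft.rfft(x, 64) * np.fft.rfft(h, 64), 64)

# Crude fixed-point model: quantize the inputs, redo the same computation.
fx = np.fft.irfft(np.fft.rfft(to_q15(x), 64) * np.fft.rfft(to_q15(h), 64), 64)

print("max abs error:", np.max(np.abs(ref - fx)))
```

If the measured error is too large for the application, that is the signal to widen the fixed-point word or move to floating point, as suggested above.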

> Date: Thu, 24 Feb 2005 20:10:42 -0600
> From: Jeff Brower <>
> Subject: Re: Fast convolution
>
> Search for "fast Hartley transform", or DHT. Input length has to be a power of 2,
> but in some cases it seems the DHT might be faster for convolution.

I am not sure this holds for the actual number of arithmetic
operations; however, it might well be correct for the actual runtime, due
to e.g. better locality/cacheability of the FHT (is that true?).

Otherwise, the only difference between the DHT and the DFT is the kernel
function, which is defined as cas(x) = cos(x) + sin(x) instead of
cos(x) + i*sin(x); therefore the DHT of a real input is always real. There
is a simple formula to transform the DHT into the DFT.
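For real input and NumPy's forward-transform sign convention, one version of that simple formula is DHT(x)[k] = Re(DFT(x)[k]) - Im(DFT(x)[k]). A quick check (the O(N^2) `dht` below is only for verification, not for actual use):

```python
import numpy as np

def dht(x):
    """Direct DHT with the cas kernel cos + sin (O(N^2), for checking only)."""
    n = len(x)
    k = np.arange(n)
    arg = 2.0 * np.pi * np.outer(k, k) / n
    return (np.cos(arg) + np.sin(arg)) @ x

x = np.random.default_rng(1).standard_normal(8)
F = np.fft.fft(x)
print(np.allclose(dht(x), F.real - F.imag))  # True
```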

There is also a relation between convolution in the function domain and
the DHT image domain. I don't remember it exactly, but since the DHT and
DFT are mathematically more or less equivalent, I suppose the relation must
be similar; therefore the number of arithmetic operations would probably
be the same up to an O(1) term.

> Date: Thu, 24 Feb 2005 20:46:12 -0800 (PST)
> From: Shree Jaisimha <>
> Subject: Re: Fast convolution]
>
> You mentioned you are doing a 40 point FFT. Please note that any type
> of fast convolution has some disadvantages.

Just like any tool we might use. In Russian they say (in loose translation)
that it is not practical to shoot sparrows with an 18" gun :)

> They are as follows:
> (i) Increased Memory locations on your DSP RAM to do the block
> processing, example : Y(w) = H(w)X(w)

Yep, but memory is a cheap resource these days.

> (ii) There is also inherent processing delay. You need to store the
> Frame and make sure its full before you do the frequency domain
> multiplication as compared to time domain convolution which is sample by
> sample.

I guess the delay would be the same as in the function domain. Once you
have collected a buffer of the right size to perform convolution, you can
always apply the circular-buffer method to perform a "sliding" FFT by
deleting the "oldest" sample and adding the "latest" sample, thus obtaining
the convolution result sample by sample.
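For what it's worth, the textbook way to stream block FFT convolution is overlap-save (a different scheme from the sliding FFT described above, but it addresses the same block-processing concern). A rough sketch, assuming NumPy:

```python
import numpy as np

def overlap_save(x, h, nfft=64):
    """Filter a long signal x with h in nfft-point FFT blocks, discarding the
    first len(h)-1 (circularly aliased) samples of each block's output."""
    m = len(h)
    step = nfft - (m - 1)                 # new samples consumed per block
    H = np.fft.rfft(h, nfft)              # impulse-response spectrum, computed once
    # prepend m-1 zeros so every block carries the history it needs
    xp = np.concatenate([np.zeros(m - 1), x])
    out = []
    for i in range(0, len(x), step):
        block = xp[i:i + nfft]            # m-1 old samples + up to step new ones
        y = np.fft.irfft(np.fft.rfft(block, nfft) * H, nfft)
        out.append(y[m - 1:nfft])         # keep only the valid samples
    return np.concatenate(out)[:len(x)]
```

Each block still has to fill before it can be processed, so the latency point above stands; overlap-save just keeps the output continuous across block boundaries.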

> With regards to computational savings, as Bhooshan pointed out, with fast convolution
> it is O(4*(3N/2 log2(N) + N)) while normal time domain convolution will use up
> 2N - 1 assuming equal length sequences of length N.

I am not sure if you meant N^2 here instead of 2N - 1? Numerically, the
convolution is a dot product of two vectors of length N. It might be
calculated for a sequence of samples of length M, but for every output
sample the dot-product complexity is O(N), which gives a total complexity
of O(MN). If M = N, we end up with O(N^2) operations.
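The O(MN) count, made concrete: each output sample of a direct time-domain convolution is a dot product of up to N terms. A deliberately naive sketch (NumPy used only for arrays and for checking):

```python
import numpy as np

def direct_convolve(x, h):
    """Direct (time-domain) linear convolution: one dot product per output
    sample, O(M*N) multiplies in total for len(x)=M, len(h)=N."""
    m, n = len(x), len(h)
    y = np.zeros(m + n - 1)
    for i in range(m + n - 1):       # one pass per output sample
        for j in range(n):           # N multiply-adds per output sample
            if 0 <= i - j < m:
                y[i] += h[j] * x[i - j]
    return y

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 1.0])
print(direct_convolve(x, h))  # matches np.convolve(x, h)
```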

> Fast Convolution is ONLY faster if the following inequality is satisfied:
>
> 6Nlog(base2) N + 4N < N^2

Assuming the inequality is written correctly:

> In your case N=40, please do the computation above and see if the
> inequality is satisfied. If not, fast convolution is not for your end
> application as it is not going to help in terms of computational savings.
> However, since N must be a power of 2, you need to append zeroes up to the
> nearest N=64.

Let's check it:

log2(64) = 6, thus 6*64*6 + 4*64 = (36 + 4)*64 = 40*64 = 2560,

which is less than 64^2 = 4096. So even for small N, convolution in the image
domain seems to be profitable, and more and more so for larger N.
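The break-even check can be repeated for a few power-of-two sizes with plain Python; note the crossover lands right around N = 32..64:

```python
from math import log2

# Compare the operation-count estimates 6*N*log2(N) + 4*N (fast convolution)
# against N^2 (direct convolution) for power-of-two lengths.
for n in [8, 16, 32, 64, 128]:
    fast = 6 * n * log2(n) + 4 * n
    direct = n ** 2
    winner = "fast wins" if fast < direct else "direct wins"
    print(f"N={n:4d}  fast={fast:8.0f}  direct={direct:8d}  {winner}")
```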

Sorry for a long post.

Andrew