comp.dsp | Circular convolution optimization

Hi there. I'm going to implement circular convolution with zero latency,
however here I'll simplify it to say 1024 points latency, because I need to
understand where is the problem.

So let's say we have an extremely long kernel, like 1000 * 1024 points. Now
this is the input:
i1 i2 i3 .. i1000 (each 1024 points long).

And this is the kernel:
k1 k2 k3 .. k1000 (each 1024 points long).

To perform convolution we can zero pad each input and kernel block to 2048
samples and:
out1 = idft( dft(i1 zero padded) * dft(k1 zero padded) )

And this is the result of convolution for input i1000:
finalout1000 = out1 + out2 + .. + out1000 + leakagefromfinalout999

We can obviously precalculate dft's for k1 .. k1000. But we can also
calculate input dft's just once, when processing realtime,
because what is now i1000, will soon be i999, and then i998 etc., right?

And here's my dilemma - I think we can also calculate output idft once. Why
summing out1+out2.., when we can sum it in frequency domain? So, is this
true: ?

finalout1000 = idft( 
  dft(i1 zero padded) * dft(k1 zero padded) +
  dft(i2 zero padded) * dft(k2 zero padded) + 
  ..
  dft(i1000 zero padded) * dft(k1000 zero padded)) +
  leakagefromfinalout999

If that would be true, the convolution could be done too quickly I believe,
since FFT is the most time-consuming task here. But why it is wrong?

Thank in advance.

Reply by Andor ●May 17, 20102010-05-17

On 17 Mai, 15:58, "jungledmnc" <jungledmnc@n_o_s_p_a_m.gmail.com>
wrote:
> Hi there. I'm going to implement circular convolution with zero latency,
> however here I'll simplify it to say 1024 points latency, because I need =
to
> understand where is the problem.
>
> So let's say we have an extremely long kernel, like 1000 * 1024 points. N=
ow
> this is the input:
> i1 i2 i3 .. i1000 (each 1024 points long).
>
> And this is the kernel:
> k1 k2 k3 .. k1000 (each 1024 points long).
>
> To perform convolution we can zero pad each input and kernel block to 204=
8
> samples and:
> out1 =3D idft( dft(i1 zero padded) * dft(k1 zero padded) )
>
> And this is the result of convolution for input i1000:
> finalout1000 =3D out1 + out2 + .. + out1000 + leakagefromfinalout999
>
> We can obviously precalculate dft's for k1 .. k1000. But we can also
> calculate input dft's just once, when processing realtime,
> because what is now i1000, will soon be i999, and then i998 etc., right?
>
> And here's my dilemma - I think we can also calculate output idft once. W=
hy
> summing out1+out2.., when we can sum it in frequency domain? So, is this
> true: ?
>
> finalout1000 =3D idft(
> =A0 dft(i1 zero padded) * dft(k1 zero padded) +
> =A0 dft(i2 zero padded) * dft(k2 zero padded) +
> =A0 ..
> =A0 dft(i1000 zero padded) * dft(k1000 zero padded)) +
> =A0 leakagefromfinalout999
>
> If that would be true, the convolution could be done too quickly I believ=
e,
> since FFT is the most time-consuming task here. But why it is wrong?
>
> Thank in advance.

I'm not quite sure what your question is, but you might find answers
here:

http://pcfarina.eng.unipr.it/Public/AES-113/

Regards,
Andor

Reply by jungledmnc ●May 17, 20102010-05-17

Aaaa, thanks a lot! It's in the Microsoft patent!

Circular convolution optimization

Sign in

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About DSPRelated.com

Social Networks

The Related Media Group