comp.dsp | partitioned/fast convolution| page 3

Reply by Andor ●December 25, 20072007-12-25

SteveSmith wrote:
> Keep in mind that it has been over 15 years since I prepared Fig 18-3, so
> my explanation may have some gaps.  Here is the link to the figure:
>
> http://www.dspguide.com/ch18/3.htm
>
> This graph is based on several measured values with smooth lines drawn
> between.  In retrospect, this graph should have shown the data markers for
> these measurement points. (In particular, the curve for the FFT graph is
> not this smooth, since the FFT length jumps by powers of two.)   The
> measured data  were taken on a 100 MHz Pentium processor, running under
> DOS, in compiled QuickBasic.  I know that one of the measured points was
> for a 400 point filter kernel with 1024 point FFTs being used.   My guess
> is that the other points were powers of two of this, e.g. an 800 point
> filter kernel with 2048 point FFTs.
>
> I didn't investigate how the length of the FFT affects the execution
> speed.  For instance, I simply state that a 600 point filter kernel could
> use an FFT of 1024 or 2048 points. Robert, it seems you looked at this
> issue, but I couldn't see the answer in your post.  For a filter kernel
> of length, L, what is the optimum FFT size, N?

I calculated this for the SHARC DSP (2nd generation with SIMD) in this
post:

http://groups.google.com/group/comp.dsp/browse_frm/thread/0985ad0ca9d10309/e416169ba33926a0?#e416169ba33926a0

The numbers are highly hard- and software dependent. The ADI library
for complex vector MAC uses 6 cycles per MAC*), and I coded it in 5
cycles per MAC, etc. I once worked on another system with shared
SDRAM, where the bottle neck was the memory interface. That system was
also more complex, in that the impulse response was partitioned into
multiple blocks of different size (the Gardener concept), same thing
that the OP has in mind. In such systems, calculating the break-even
point isn't that simple, but the idea remains the same.

Regards,
Andor

*) see here:
http://www.analog.com/processors/sharc/technicalLibrary/codeExamples/2116x_simd_code.html

Reply by robert bristow-johnson ●December 26, 20072007-12-26

On Dec 25, 4:40 am, Andor <andor.bari...@gmail.com> wrote:
> SteveSmith wrote:
> > Keep in mind that it has been over 15 years since I prepared Fig 18-3, so
> > my explanation may have some gaps.  Here is the link to the figure:
>
> >http://www.dspguide.com/ch18/3.htm
>
> > This graph is based on several measured values with smooth lines drawn
> > between.  In retrospect, this graph should have shown the data markers for
> > these measurement points. (In particular, the curve for the FFT graph is
> > not this smooth, since the FFT length jumps by powers of two.)   The
> > measured data  were taken on a 100 MHz Pentium processor, running under
> > DOS, in compiled QuickBasic.  I know that one of the measured points was
> > for a 400 point filter kernel with 1024 point FFTs being used.   My guess
> > is that the other points were powers of two of this, e.g. an 800 point
> > filter kernel with 2048 point FFTs.
>
> > I didn't investigate how the length of the FFT affects the execution
> > speed.  For instance, I simply state that a 600 point filter kernel could
> > use an FFT of 1024 or 2048 points. Robert, it seems you looked at this
> > issue, but I couldn't see the answer in your post.  For a filter kernel
> > of length, L, what is the optimum FFT size, N?
>
> I calculated this for the SHARC DSP (2nd generation with SIMD) in this
> post:
>
> http://groups.google.com/group/comp.dsp/browse_frm/thread/0985ad0ca9d...

i remember that.  i was also singing the same tired old tune then.

> The numbers are highly hard- and software dependent.

but i'd bet that it cost/sample function still fits the form:

   cost(L,N) = (A*N*log2(N) + B*N + C*log2(N) + D)/(N-(L-1)) .

other than independently hand-coded FFTs (for every N=2^p), i would
think that using a vanilla flavored, radix-2, general N=2^p FFT
program, the fast convolution would fit that cost function for
constant A, B, C, and D.

r b-j

Reply by SteveSmith ●December 26, 20072007-12-26

Hi Robert,
I agree with you assessment.  The &ldquo;FFT&rdquo; curve in my Fig. 18-3 would be
about 20% lower if longer FFTs had been used for the measured data.   

I used your general method on FFT lengths from 512 to 16,384.  If one is
satisfied to be within about 10% of optimal operation, the FFT length
needs to be at least 3-4 times the length of the filter kernel.  If you
want to be within a few percent, its more like 6-8 times the length of the
filter kernel.  These are much longer than I expected&ndash; thanks for the
good data. 

Regards,
Steve Smith
Steve.Smith1@SpectrumSDI.com

Previous 1 23Next

partitioned/fast convolution

Sign in

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About DSPRelated.com

Social Networks

The Related Media Group