DSPRelated.com
Forums

partitioned/fast convolution

Started by peterworth December 19, 2007
SteveSmith wrote:
> Keep in mind that it has been over 15 years since I prepared Fig 18-3, so > my explanation may have some gaps. Here is the link to the figure: > > http://www.dspguide.com/ch18/3.htm > > This graph is based on several measured values with smooth lines drawn > between. In retrospect, this graph should have shown the data markers for > these measurement points. (In particular, the curve for the FFT graph is > not this smooth, since the FFT length jumps by powers of two.) The > measured data were taken on a 100 MHz Pentium processor, running under > DOS, in compiled QuickBasic. I know that one of the measured points was > for a 400 point filter kernel with 1024 point FFTs being used. My guess > is that the other points were powers of two of this, e.g. an 800 point > filter kernel with 2048 point FFTs. > > I didn't investigate how the length of the FFT affects the execution > speed. For instance, I simply state that a 600 point filter kernel could > use an FFT of 1024 or 2048 points. Robert, it seems you looked at this > issue, but I couldn't see the answer in your post. For a filter kernel > of length, L, what is the optimum FFT size, N?
I calculated this for the SHARC DSP (2nd generation with SIMD) in this post: http://groups.google.com/group/comp.dsp/browse_frm/thread/0985ad0ca9d10309/e416169ba33926a0?#e416169ba33926a0 The numbers are highly hard- and software dependent. The ADI library for complex vector MAC uses 6 cycles per MAC*), and I coded it in 5 cycles per MAC, etc. I once worked on another system with shared SDRAM, where the bottle neck was the memory interface. That system was also more complex, in that the impulse response was partitioned into multiple blocks of different size (the Gardener concept), same thing that the OP has in mind. In such systems, calculating the break-even point isn't that simple, but the idea remains the same. Regards, Andor *) see here: http://www.analog.com/processors/sharc/technicalLibrary/codeExamples/2116x_simd_code.html
On Dec 25, 4:40 am, Andor <andor.bari...@gmail.com> wrote:
> SteveSmith wrote: > > Keep in mind that it has been over 15 years since I prepared Fig 18-3, so > > my explanation may have some gaps. Here is the link to the figure: > > >http://www.dspguide.com/ch18/3.htm > > > This graph is based on several measured values with smooth lines drawn > > between. In retrospect, this graph should have shown the data markers for > > these measurement points. (In particular, the curve for the FFT graph is > > not this smooth, since the FFT length jumps by powers of two.) The > > measured data were taken on a 100 MHz Pentium processor, running under > > DOS, in compiled QuickBasic. I know that one of the measured points was > > for a 400 point filter kernel with 1024 point FFTs being used. My guess > > is that the other points were powers of two of this, e.g. an 800 point > > filter kernel with 2048 point FFTs. > > > I didn't investigate how the length of the FFT affects the execution > > speed. For instance, I simply state that a 600 point filter kernel could > > use an FFT of 1024 or 2048 points. Robert, it seems you looked at this > > issue, but I couldn't see the answer in your post. For a filter kernel > > of length, L, what is the optimum FFT size, N? > > I calculated this for the SHARC DSP (2nd generation with SIMD) in this > post: > > http://groups.google.com/group/comp.dsp/browse_frm/thread/0985ad0ca9d...
i remember that. i was also singing the same tired old tune then.
> The numbers are highly hard- and software dependent.
but i'd bet that it cost/sample function still fits the form: cost(L,N) = (A*N*log2(N) + B*N + C*log2(N) + D)/(N-(L-1)) . other than independently hand-coded FFTs (for every N=2^p), i would think that using a vanilla flavored, radix-2, general N=2^p FFT program, the fast convolution would fit that cost function for constant A, B, C, and D. r b-j
Hi Robert,
I agree with you assessment.  The &ldquo;FFT&rdquo; curve in my Fig. 18-3 would be
about 20% lower if longer FFTs had been used for the measured data.   

I used your general method on FFT lengths from 512 to 16,384.  If one is
satisfied to be within about 10% of optimal operation, the FFT length
needs to be at least 3-4 times the length of the filter kernel.  If you
want to be within a few percent, its more like 6-8 times the length of the
filter kernel.  These are much longer than I expected&ndash; thanks for the
good data. 

Regards,
Steve Smith
Steve.Smith1@SpectrumSDI.com