hey guys! i've got a sorta holiday DSP present for you. Steve Smith's
plot in Figure 18-3 presents some interesting data regarding the
cost of fast convolution (which is something that i've messed around
with theoretically a little bit). i had some old notes on how to approach
the optimization problem (given an FIR length, L, how to choose the
FFT size, N, so that the number of operations per processed sample is
minimized). so i took the notes and coded it in MATLAB (i'm sure that
Octave will work too) and it's at the bottom of this post. have fun with
it. i think it's pretty conclusive.
anyway, it *does* show that if you optimize the FFT size, the cost
of fast convolution increases linearly with the logarithm of the FIR
length L. i.e., once your FIR length gets big enough, doubling the FIR
length, L, increases the per-sample time cost of processing by a
constant amount.
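if you want to poke at that claim without firing up MATLAB, here's a
little Python sketch (note: the coefficient choices A=1, B=C=D=E=0 are
placeholders i made up, not measured costs) that brute-forces the best
power-of-2 FFT size for each L and prints the per-sample cost as L
doubles:

```python
import math

def cost_per_sample(N, L):
    # per-sample cost of overlap-save fast convolution: FFT work divided
    # by the N-L+1 samples produced per block (A=1, B=C=D=E=0 assumed)
    return N * math.log2(N) / (N - L + 1)

def best_cost(L):
    # brute-force search over powers of 2 with N >= L
    p0 = math.ceil(math.log2(L))
    return min(cost_per_sample(2**p, L) for p in range(p0, p0 + 12))

for q in range(6, 19, 2):
    L = 2**q
    print("log2(L) = %2d   cost/sample = %6.2f" % (q, best_cost(L)))
```

the printed costs step up by a nearly constant amount each time log2(L)
increases, which is the linear-in-log behavior.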
Melly Clistmist everyone,
r b-j
On Dec 22, 6:22 pm, robert bristow-johnson <r...@audioimagination.com>
wrote:
>
> Now, I have a question for Steve Smith:
>
> this has to do with Figure 3-18 on page 183 of your book,
i meant to say Figure 18-3 on page 318.
> http://www.dspguide.com/CH18.PDF. it's displaying the net execution
> time for the fast convolution in ms/L (L is the FIR length, or FIR
> order+1). can you tell me, as a function of the FIR length L, what
> size FFT (N) you used? then we know, per block, that the total
> cost of processing is divided by (N-L+1) to get the cost of processing
> per sample.
>
> i've posted to this newsgroup a couple of times, a methodology for
> picking the optimal value of N as a function of L, the desired FIR
> length (that is usually determined by the minimum performance needs in
> the filter spec as it gets designed by P-McC or similar).
>
> so, for comparison, the value of N i would choose would come from this:
>
> the cost of the FFT would likely be of this form:
>
> A1*N*log2(N) + B1*N + C1*log2(N) + D1
>
> for some constants, A1, B1, C1, and D1, given the nature of the
> hardware and software that implements the FFT. we expect the cost of
> multiplication in the frequency domain to be
>
> B2*N + D2
>
> and there might be more overhead regarding moving samples around. in
> OLS, i would expect that cost to be proportional to the number of
> samples processed per block
>
> E*(N-L+1)
>
> where E is the cost per sample of moving data. anyway, given those
> cost functions, the total cost to process one frame of samples would
> be
>
> Cost/Block = A*N*log2(N) + B*N + C*log2(N) + D + E*(N-L+1)
>
> where the A, B, C, D, and E coefficients are easily determined from
> the cost coefficients above (remember there are 2 FFTs). then, the
> cost per sample, the performance measure you have plotted in your
> book, Steve, is a function that should look like:
>
> Cost/Samp = (A*N*log2(N) + B*N + C*log2(N) + D)/(N-L+1) + E
>
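interjecting into my own quote here: if you want to see the shape of
that Cost/Samp function, here's a quick Python check (the coefficients
A=1, B=C=D=E=0 are placeholders i picked, not measured costs) that, for
a fixed L, the per-sample cost falls to a single minimum and then rises
as N doubles:

```python
import math

def cost_per_samp(N, L, A=1.0, B=0.0, C=0.0, D=0.0, E=0.0):
    # Cost/Samp = (A*N*log2(N) + B*N + C*log2(N) + D)/(N-L+1) + E
    return (A*N*math.log2(N) + B*N + C*math.log2(N) + D) / (N - L + 1) + E

L = 1000  # arbitrary fixed FIR length
for p in range(10, 17):
    print("p = %2d   cost/sample = %8.2f" % (p, cost_per_samp(2**p, L)))
```

there's only one sign change in the successive differences, which
matches the derivative argument below.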
> Now, given an arbitrary (but fixed) L, you can come up with a
> recursion equation that compares the Cost/Samp for N at
> adjacent powers of 2 (we'll call them N and N/2). That difference is:
>
> (A*N*log2(N) + B*N + C*log2(N) + D)/(N-L+1)
> - (A*N/2*log2(N/2) + B*N/2 + C*log2(N/2) + D)/(N/2-L+1)
>
> (note the E's kill each other off).
>
> It initially looks like you can simplify that cost delta function, but
> because the two denominators are different, you can't do too much to
> it (oh, maybe you can put it on a common denominator) except evaluate
> it, given A, B, C, D, and L, for increasing powers of 2 for N. you
> can also look at the same cost function in terms of log2(N), treating
> that variable as continuous (so we're essentially sampling that
> function at every integer value of log2(N), for radix-2 FFTs). when
> you do that, you can show by taking the derivative that there is only
> one value of log2(N) that minimizes the Cost/Samp function. so that
> means you can take the above cost difference equation (with L fixed)
> and repeatedly double N until that difference first becomes positive.
> that N would be one power of two beyond your optimal power of 2.
>
> anyway, what i would like to know, Steve, is how you chose your FFT
> size, N, as a function of FIR length, L? and then assuming you pick
> that N, is that the cost/L function that you have plotted in Figure
> 18-3?
>
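to make the quoted recipe concrete, here's the doubling search sketched
in Python (again, A=1 and B=C=D=0 are placeholder coefficients; plug in
measured costs for your platform): keep doubling N as long as the
larger FFT is still cheaper per sample, and stop at the last N that
improved things.

```python
import math

def cost_per_samp(N, L, A=1.0, B=0.0, C=0.0, D=0.0):
    # Cost/Samp without the constant E term (E cancels in the difference)
    return (A*N*math.log2(N) + B*N + C*math.log2(N) + D) / (N - L + 1)

def optimal_fft_size(L, A=1.0, B=0.0, C=0.0, D=0.0):
    # start at the smallest power of 2 with N >= L, so N-L+1 >= 1
    N = 2 ** math.ceil(math.log2(L))
    # double N while the bigger FFT still lowers the per-sample cost
    while cost_per_samp(2*N, L, A, B, C, D) < cost_per_samp(N, L, A, B, C, D):
        N *= 2
    return N

print(optimal_fft_size(1000))  # prints 8192 with these placeholder costs
```

and the MATLAB below does the same thing, but by delimiting the range
of L for each p instead of searching per L.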
%
% An investigation into the optimal radix-2 FFT size, N = 2^p,
% for fast convolution with an FIR of length L
%
% Cost function:
%
% cost/sample = (cost/block) / (num_processed_samples/block)
%
% = (A*p*N + B*N + C*p + D) / (N - (L-1))
%
% N = 2^p = size of radix-2 FFT
% p = log2(N) = number of passes in radix-2 FFT
% L = FIR length ( L-1 is FIR order )
% N - (L-1) = number of samples processed in each block
%
% p*N/2 = number of FFT butterflies in one radix-2 FFT
% N-1 = number of FFT groups in one radix-2 FFT
% N/2 = number of complex multiplies needed for scaling spectrum
% p = number of FFT passes in one radix-2 FFT
%
% A = cost of a single FFT butterfly
% B = twice the overhead cost of setting up each FFT group
% + half the cost of one complex multiply by the transfer function
% C = twice the overhead cost of setting up each FFT pass
% D+B = twice the overhead cost of setting up an FFT
% + other constant costs per block
% (the factors of two are there because there are 2 FFTs per block)
%
% Fixing the FIR length to L, the optimal power, p,
% of two for N = 2^p is such a p that
%
% L-1 >= (A*2^p - C*(p-2) - D)/(A*(p+1) + B + C*2^(1-p))
%
% and
%
% L-1 <= (A*2^(p+1) - C*(p-1) - D)/(A*(p+2) + B + C*2^(-p))
%
% So, instead of solving for p as a function of L, for every
% integer value of p (or for every power of two, N), we delimit
% the range of L for which that particular integer power p is
% optimal. If, as L increases, it exceeds the upper end of that
% range, p is incremented by 1 (which shifts the range so that
% L is again inside the range for which p is optimal).
%
% The blue graph is the cost function for increasing L using the
% optimal value for p or N, while the red and green graphs are the
% cost functions for an FFT size N twice as large and half as large
% as the optimal N, respectively. Note that the red and green
% graphs never get below the blue (optimal) graph and touch it
% only at points where the FIR length L is about to have the
% optimal N double.
%
% From Robert Bristow-Johnson, (c) 2007
%
if ~exist('maxP', 'var')
    maxP = 32
end
if ~exist('L_per_oct', 'var')
    L_per_oct = 32
end
if ~exist('A', 'var')
    A = 1
end
if ~exist('B', 'var')
    B = 0
end
if ~exist('C', 'var')
    C = 0
end
if ~exist('D', 'var')
    D = 0
end
p = linspace(1, 40, 40);
L_optimal = (A*2.^p - C*(p-2) - D)./(A*(p+1) + B + C*2.^(-p+1)) + 1;
clear p;
LL = round(2.^linspace(1, maxP, maxP*L_per_oct+1));
cost = zeros(size(LL));
p = 1;
for k = 1:maxP*L_per_oct+1
    L = LL(k);
    while (L >= L_optimal(p+1))  % advance p until it is optimal for this L
        p = p+1;
    end;
    N = 2^p; % optimal FFT size N for FIR length L
    cost(k) = (A*p*N + B*N + C*p + D)/(N - (L-1)); % cost per sample
end;
semilogx(LL, cost, 'b');
if 1 % set to 0 to skip the comparative plots
    hold on;
    p = 1;
    for k = 1:maxP*L_per_oct+1
        L = LL(k);
        while (L >= L_optimal(p+1))
            p = p+1;
        end;
        N = 2^(p-1); % half of optimal FFT size N for FIR length L
        if (N > L-1)
            cost(k) = (A*(p-1)*N + B*N + C*(p-1) + D)/(N - (L-1));
        else
            cost(k) = NaN; % half-size FFT too small to output any samples
        end;
    end;
    semilogx(LL, cost, 'g');
    p = 1;
    for k = 1:maxP*L_per_oct+1
        L = LL(k);
        while (L >= L_optimal(p+1))
            p = p+1;
        end;
        N = 2^(p+1); % twice the optimal FFT size N for FIR length L
        cost(k) = (A*(p+1)*N + B*N + C*(p+1) + D)/(N - (L-1));
    end;
    semilogx(LL, cost, 'r');
    hold off;
end;