bump.

Mark Borgerding wrote:
> I recently implemented fast convolution filtering for the "tools" 
> portion of KISS FFT ( http://sourceforge.net/projects/kissfft )
> 
> I've seen a few realizations of the overlap-scrap (aka overlap save) 
> method. They all have the scrapped output samples at the beginning of 
> the output buffer.  Those realizations require an additional buffer copy 
> to write continuous output.
> 
> I found a way to get around this buffer copy.  After zero padding the 
> impulse response up to Nfft, rotate the buffer left by Nh - 1 samples.
> 
> The junk samples end up at the end of the ifft'd buffer.  Thus the IFFT 
> can use the output buffer for filtering as its output buffer. Only 
> Nfft-Nh+1 of those samples are fit for consumption, but the scrap 
> samples can be overwritten by the next chunk.
> 
> I searched around a little bit, but couldn't find mention of this 
> technique anywhere.  Is this new, or is it so obvious as to not warrant 
> description?
> 
> - Mark Borgerding
> 
> 
> PS. Here's some octave (Matlab clone) code that I used for 
> proof-of-concept.  If you want a working version, see kiss_fastfir.c in 
> the latest version of KISS FFT.
> 
> function maxabsdiff=tailscrap()
> % test code for circular convolution with the scrapped portion
> % at the tail of the buffer, rather than the front
> %
> % The idea is to rotate the zero-padded h (impulse response) buffer
> % to the left nh-1 samples, rotating the junk samples as well.
> % This could be very handy in avoiding buffer copies during fast filtering.
> nh=10;
> nfft=256;
> 
> h=rand(1,nh);
> x=rand(1,nfft);
> 
> hpad=[ h(nh) zeros(1,nfft-nh) h(1:nh-1) ];
> 
> % baseline comparison
> y1 = filter(h,1,x);
> y1_notrans = y1(nh:nfft);
> 
> % fast convolution
> y2 = ifft( fft(hpad) .* fft(x) );
> y2_notrans=y2(1:nfft-nh+1);
> 
> maxabsdiff = max(abs(y2_notrans - y1_notrans))
> 
> end
>

I recently implemented fast convolution filtering for the "tools" 
portion of KISS FFT ( http://sourceforge.net/projects/kissfft )

I've seen a few realizations of the overlap-scrap (aka overlap save) 
method. They all have the scrapped output samples at the beginning of 
the output buffer.  Those realizations require an additional buffer copy 
to write continuous output.

I found a way to get around this buffer copy.  After zero padding the 
impulse response up to Nfft, rotate the buffer left by Nh - 1 samples.

The junk samples end up at the end of the ifft'd buffer.  Thus the IFFT 
can use the output buffer for filtering as its output buffer. Only 
Nfft-Nh+1 of those samples are fit for consumption, but the scrap 
samples can be overwritten by the next chunk.

I searched around a little bit, but couldn't find mention of this 
technique anywhere.  Is this new, or is it so obvious as to not warrant 
description?

- Mark Borgerding


PS. Here's some octave (Matlab clone) code that I used for 
proof-of-concept.  If you want a working version, see kiss_fastfir.c in 
the latest version of KISS FFT.

function maxabsdiff=tailscrap()
% test code for circular convolution with the scrapped portion
% at the tail of the buffer, rather than the front
%
% The idea is to rotate the zero-padded h (impulse response) buffer
% to the left nh-1 samples, rotating the junk samples as well.
% This could be very handy in avoiding buffer copies during fast filtering.
nh=10;
nfft=256;

h=rand(1,nh);
x=rand(1,nfft);

hpad=[ h(nh) zeros(1,nfft-nh) h(1:nh-1) ];

% baseline comparison
y1 = filter(h,1,x);
y1_notrans = y1(nh:nfft);

% fast convolution
y2 = ifft( fft(hpad) .* fft(x) );
y2_notrans=y2(1:nfft-nh+1);

maxabsdiff = max(abs(y2_notrans - y1_notrans))

end