DSPRelated.com
Forums

Convolution versus LPFiltering

Started by XED February 22, 2004
Jerry Avins wrote:
> Mark Borgerding wrote:
>> Jerry Avins wrote:
>>> Fred Marshall wrote:
>>>
>>> ...
>>>
>>>> Did I get it right?
>>>>
>>>> Fred
>>>
>>> Mark's discussion about process never
>>> addressed the startup transient as far as I could see,
>>
>> I would've thought this did (copied from my earlier post):
>>
>> "The transient response is still there. It's the last nh-1 samples of
>> the output buffer. The invalid output samples get shifted there by a
>> linear phase multiplication in the frequency domain (i.e. convolution
>> with a delta pulse in the time domain)."
>
> Mark,
>
> But then it isn't a start-up transient.
The transients with circular convolution are not really "startup" transients anyway. The transients are caused by the filter overlapping both the front and back of the signal buffer.

If the signal buffer is zeropadded with nh-1 zeros, as with overlap-add, then the first nh-1 samples are numerically the same as the startup transient, since the back is all zeros. If the signal buffer is *not* zeropadded, as with overlap-save, the first nh-1 samples are due to circular convolution with the back of the buffer. These transient samples are meaningless and must be discarded.

All I am saying is that you can easily shift those nh-1 samples from the front of the buffer to the back. Just rotate the impulse response. In other words, circularly convolve the impulse response with a delta pulse.
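Mark's rotation trick can be sketched in a few lines of Python. This is only an illustrative sketch: direct (O(n^2)) circular convolution stands in for the fft -> multiply -> ifft chain, and the block values, lengths, and helper names are mine, not from the thread.

```python
# Where do the aliased ("transient") samples land in a circular convolution,
# and how does rotating the impulse response move them?

def circ_conv(x, h):
    """Circular convolution of block x with filter h; output length len(x)."""
    n = len(x)
    hp = list(h) + [0] * (n - len(h))  # zero pad h to the block length
    return [sum(x[(k - m) % n] * hp[m] for m in range(n)) for k in range(n)]

x = [1, 2, 3, 4, 5, 6, 7, 8]   # one signal block, NOT zero padded (overlap-save style)
h = [1, 2, 3]                  # nh = 3 taps
n, nh = len(x), len(h)

y = circ_conv(x, h)            # first nh-1 samples are aliased by the wrap-around

# Rotate h left by nh-1 samples (equivalently: circularly convolve h with a
# shifted delta pulse).  The aliased samples now land at the BACK of the block.
hp = h + [0] * (n - nh)
h_rot = [hp[(m + nh - 1) % n] for m in range(n)]
y_rot = circ_conv(x, h_rot)
```

The valid samples are identical in both cases; only their placement within the output block changes, which is all the rotation buys you.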
> The ordinary operation of a
> usual FIR has output less than perfect until as many samples have been
> accepted as the filter has taps. It is possible to suppress the output
> until that number of samples have been loaded, discarding the imperfect
> ones, but that delays when output begins.
The idea of delays is tricky with regards to block processing. There is no constant delay. The first output block is not available until the entire first input block is read. Then the entire output block is available. The input-to-output latency of any useful fft based filtering is always worse than with a transversal (direct form FIR) filter. (see Note 1)
> You seem to claim that perfect output begins immediately with "overlap scrap", before a transversal
> calculation would make it available or that the initial filter inputs
> actually produce good output.
I claim no such thing. I simply say that you can control the placement of the transient samples within the output block. That output block is not computed until an entire input block has been read. The time travel buggy does not shift into reverse.
>> Was this unclear?

> I hope so, for my sanity's sake.
Shall I send the nice men in white suits to take you to the soft room? :)

-- Mark Borgerding

Note 1: One could contend that the ridiculous case where an fft->mult->ifft is done for each input sample results in a constant lag, but this is a pretty silly use of resources.
On Wed, 25 Feb 2004 17:41:03 -0500, Jerry Avins wrote:
> But then it isn't a start-up transient. The ordinary operation of a
> usual FIR has output less than perfect until as many samples have been
> accepted as the filter has taps. It is possible to suppress the output
> until that number of samples have been loaded, discarding the imperfect
> ones, but that delays when output begins. You seem to claim that perfect
> output begins immediately with "overlap scrap", before a transversal
> calculation would make it available or that the initial filter inputs
> actually produce good output.
The startup transient that you speak of is not the same transient that Mark is discussing, I believe, which is (in his case) the discarded circular convolution part.

I'm afraid that I didn't get your meaning at first either, and in retrospect, that is interesting to me. You see, I work/think in a world of filters defined by time domain impulse responses, where initial conditions are defined to be zero (all other situations are covered by linear shift invariance). I.e., signals are always {x[n] = 0, n < 0}.

What you call the "startup transient" is therefore all that I'm interested in. You only get yourself into a state where you would call this transient a "less than perfect" part of the response if you were only interested in steady state (perhaps "frequency domain") performance, where that steady state was some non-zero sequence, or at least did not contain periods of zeroes longer than the impulse response length.

Interesting. Thanks for the kick to my mental perspective. I hope that this is the heart of the matter that you found confusing?

Cheers,
--
Andrew
On Wed, 25 Feb 2004 15:26:57 -0500, Mark Borgerding wrote:
> But if the output from the ifft begins with invalid samples, where can
> you point the ifft output so that the first valid sample from block N
> will be placed right after the last valid sample from block N-1?
You're only thinking in terms of canned FFT routines, which is fair, I admit. You don't have to compute all of the last pass of your IFFT if you don't want to. You could choose to only compute the results for the valid samples, and store them wherever you like. Even into the correct location in a circular output buffer.
> I don't see how a circular output buffer obviates the buffer copy with
> textbook overlap-save.
It doesn't. It's just the usual way to marshal data for output, on DSPs.

--
Andrew
Andrew Reilly wrote:

> On Wed, 25 Feb 2004 17:41:03 -0500, Jerry Avins wrote:
>
>> But then it isn't a start-up transient. The ordinary operation of a
>> usual FIR has output less than perfect until as many samples have been
>> accepted as the filter has taps. It is possible to suppress the output
>> until that number of samples have been loaded, discarding the imperfect
>> ones, but that delays when output begins. You seem to claim that perfect
>> output begins immediately with "overlap scrap", before a transversal
>> calculation would make it available or that the initial filter inputs
>> actually produce good output.
>
> The startup transient that you speak of is not the same
> transient that Mark is discussing, I believe, which is (in his
> case) the discarded circular convolution part.
>
> I'm afraid that I didn't get your meaning at first either,
> and in retrospect, that is interesting to me. You see, I
> work/think in a world of filters defined by time domain impulse
> responses, where initial conditions are defined to be zero
> (all other situations are covered by linear shift invariance).
> I.e., signals are always {x[n] = 0, n < 0}.
>
> What you call the "startup transient" is therefore all that I'm
> interested in. You only get yourself into a state where you
> would call this transient a "less than perfect" part of the
> response if you were only interested in steady state (perhaps
> "frequency domain") performance, where that steady state was
> some non-zero sequence, or at least did not contain periods of
> zeroes longer than the impulse response length.
>
> Interesting. Thanks for the kick to my mental perspective.
>
> I hope that this is the heart of the matter that you found
> confusing?
>
> Cheers,
What I refer to as the startup transient is also evident as an edge effect when processing images with a kernel that involves pixels surrounding the one being output. At the edge of the image the kernel is partly off the image.

The discussion began when the OP asked why FFT convolution gave a different result than ordinary convolution, expecting them to be precise equivalents. (As far as I know, they are sample-for-sample equivalents.) Mark seemed to suggest that, while ordinary convolution had the kind of start-up transient I've been harping on, his overlap-scrap method does not. Maybe I completely misunderstood, but that's how I construed his contribution.

Jerry
--
Engineering is the art of making what you want from things you can get.
Andrew Reilly wrote:
> On Wed, 25 Feb 2004 15:26:57 -0500, Mark Borgerding wrote:
>
>> But if the output from the ifft begins with invalid samples, where can
>> you point the ifft output so that the first valid sample from block N
>> will be placed right after the last valid sample from block N-1?
>
> You're only thinking in terms of canned FFT routines, which
> is fair, I admit. You don't have to compute all of the last
> pass of your IFFT if you don't want to. You could choose to
> only compute the results for the valid samples, and store them
> wherever you like. Even into the correct location in a circular
> output buffer.
True. This *would* have computational advantages. Do you know of [m]any off-the-shelf fft routines that can do this? I've thought of adding this capability to kissfft. I'm concerned about the costs in terms of simplicity and performance. -- Mark
"Mark Borgerding" <mark@borgerding.net> wrote in message
news:403d3535$1@news.xetron.com...
> Jerry Avins wrote:
>
>> Mark Borgerding wrote:
>
> The transients with circular convolution are not really "startup"
> transients anyway. The transients are caused by the filter overlapping
> both the front and back of the signal buffer.
Actually they are startup transients - no matter what you call them. Signal buffers are a detail of implementation and startup transients are a matter of physics. Then, there's the "mixed mode" where delays due to physics and delays due to implementation both matter.
> If the signal buffer is zeropadded with nh-1 zeros, as with overlap-add,
> then the first nh-1 samples are numerically the same as the startup
> transient, since the back is all zeros.
>
> If the signal buffer is *not* zeropadded, as with overlap-save,
> the first nh-1 samples are due to circular convolution with the back of
> the buffer. These transient samples are meaningless, and must be
> discarded.

I need to figure out how you do fft/mult/ifft circular convolution without zeropadding. Well, that is, unless you're going to allow temporal aliasing. You have to zero pad at least enough to make temporal room for the filter impulse response length. Otherwise there will be aliasing, won't there? Both arrays have to be the same length... N+L-1, right?

I guess if you're going to scrap some of the output then you can just not zero-pad and, instead, data-pad. Is that the idea? If the filter is rotated by its entire length, and the data isn't zero-padded, then the arrays must still be of the proper length. So, let's choose L << N. The filter has to be zero-padded out to N. If the data isn't zero-padded but part of the output is ignored, then maybe that works; is that the point, eh?
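Fred's aliasing point can be checked numerically: without zero padding, the circular result is the linear result with its tail folded back onto the first nh-1 samples. A small illustrative sketch (not code from the thread; values and helper names are mine):

```python
# Temporal aliasing demo: circular convolution without zero padding equals
# linear convolution with its tail wrapped back onto the first nh-1 samples.

def lin_conv(x, h):
    """Full linear convolution, output length len(x) + len(h) - 1."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def circ_conv(x, h):
    """Circular convolution of block x with filter h; output length len(x)."""
    n = len(x)
    hp = list(h) + [0] * (n - len(h))
    return [sum(x[(k - m) % n] * hp[m] for m in range(n)) for k in range(n)]

x = [1, 2, 3, 4, 5, 6, 7, 8]
h = [1, 2, 3]
n = len(x)

yl = lin_conv(x, h)   # length n + nh - 1 = 10: the true (linear) result
yc = circ_conv(x, h)  # length n = 8: what fft/mult/ifft gives WITHOUT padding

# Fold the linear tail back onto the front, sample for sample:
folded = [yl[k] + (yl[k + n] if k + n < len(yl) else 0) for k in range(n)]
```

Only the first nh-1 circular samples differ from the linear result; the rest are untouched, which is why overlap-save can simply scrap them.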
> All I am saying is that you can easily shift those nh-1 samples from the
> front of the buffer to the back. Just rotate the impulse response. In
> other words, circularly convolve the impulse response with a delta pulse.
It can be done but it only changes the index of the samples.
>> The ordinary operation of a
>> usual FIR has output less than perfect until as many samples have been
>> accepted as the filter has taps. It is possible to suppress the output
>> until that number of samples have been loaded, discarding the imperfect
>> ones, but that delays when output begins.
Isn't suppression the same as "ignoring" or throwing out the initial transient?
> The idea of delays is tricky with regards to block processing. There is
> no constant delay.
Yes, but there is a constant latency which is determined by how soon you can start taking data out (at the proper rate). So, the bursts of data make no difference. Just like the ideal case of streaming audio.
> The first output block is not available until the entire first input
> block is read. Then the entire output block is available.
Exactly. That pretty much defines the latency. (I say latency to avoid using the word "delay" and getting into the mixed mode).
> The input-to-output latency of any useful fft based filtering is always
> worse than with a transversal (direct form FIR) filter. (see Note 1)
Interesting. I hadn't thought about it. I guess it must be that for a real time filter either approach must be "fast enough" for throughput and the fft/mult/ifft process with the smaller compute load adds latency. Makes sense. Fred
Fred Marshall wrote:
> "Mark Borgerding" <mark@borgerding.net> wrote in message
> news:403d3535$1@news.xetron.com...
>
>> Jerry Avins wrote:
>>
>>> Mark Borgerding wrote:
>>
>> The transients with circular convolution are not really "startup"
>> transients anyway. The transients are caused by the filter overlapping
>> both the front and back of the signal buffer.
>
> Actually they are startup transients - no matter what you call them. Signal
> buffers are a detail of implementation and startup transients are a matter
> of physics. Then, there's the "mixed mode" where delays due to physics and
> delays due to implementation both matter.
>
>> If the signal buffer is zeropadded with nh-1 zeros, as with overlap-add,
>> then the first nh-1 samples are numerically the same as the startup
>> transient, since the back is all zeros.
>>
>> If the signal buffer is *not* zeropadded, as with overlap-save,
>> the first nh-1 samples are due to circular convolution with the back of
>> the buffer. These transient samples are meaningless, and must be
>> discarded.
>
> I need to figure how you do fft/mult/ifft circular convolution without
> zeropadding.
Search the index of your favorite DSP book for "overlap save" or "overlap scrap". It will surely explain better than I do.

(Note: "Overlap save" and "overlap scrap" both refer to the same technique. This thread has made it obvious that by using the latter, less common, term, I've been making it harder to understand. My apologies.)

> Well, that is, unless you're going to allow temporal aliasing.
> You have to at least zero pad enough to make temporal room for the filter
> impulse response length. Otherwise there will be aliasing won't there?
> Both arrays have to be the same length.... N+L-1, right? I guess if you're
> going to scrap some of the output then you can just not zero-pad and,
> instead, data-pad. Is that the idea?
Exactly. The output from the ifft stage of Overlap-Save filtering *does* have temporal aliasing. The affected nh-1 samples must be scrapped.

Overlap-Add filtering performs circular convolution without temporal aliasing, thanks to the zero padding. Unfortunately, this means that an addition step must be done to account for the transients.

Overlap-Save fills up the data buffer completely, ffts, multiplies by the frequency response of the filter, then inverse ffts. The nh-1 aliased samples must then be scrapped. This is less intuitive than overlap-add, but more efficient, since the addition step is unnecessary. The rotation technique I add to the Overlap-Save method further reduces intuitiveness in exchange for another small gain in efficiency (in at least *some* situations, where it obviates the buffer copy).
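The overlap-save procedure described above can be sketched end to end. This is only a sketch: direct circular convolution stands in for the fft -> multiply -> ifft step, and the block size, test signal, and function names are illustrative choices of mine.

```python
# Overlap-save sketch: fill each block completely with data (no zero padding),
# circularly convolve, scrap the nh-1 aliased samples, keep the rest.

def circ_conv(x, h):
    """Circular convolution of block x with filter h; output length len(x)."""
    n = len(x)
    hp = list(h) + [0] * (n - len(h))
    return [sum(x[(k - m) % n] * hp[m] for m in range(n)) for k in range(n)]

def overlap_save(x, h, nfft):
    nh = len(h)
    step = nfft - nh + 1                    # new samples consumed per block
    buf = [0] * (nh - 1) + list(x)          # nh-1 samples of history (silence at startup)
    y = []
    for start in range(0, len(x), step):
        block = buf[start:start + nfft]
        block += [0] * (nfft - len(block))  # pad only the final partial block
        yb = circ_conv(block, h)            # stand-in for fft -> multiply -> ifft
        y.extend(yb[nh - 1:])               # scrap the nh-1 aliased samples
    return y[:len(x)]

def direct_fir(x, h):
    """Transversal (direct form) FIR with zero initial conditions, for reference."""
    return [sum(h[m] * x[k - m] for m in range(len(h)) if k - m >= 0)
            for k in range(len(x))]

x = list(range(1, 21))
h = [1, 2, 3]
```

Block by block, the kept samples line up exactly with the transversal filter's output, startup transient included.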
>> All I am saying is that you can easily shift those nh-1 samples from the
>> front of the buffer to the back. Just rotate the impulse response. In
>> other words, circularly convolve the impulse response with a delta pulse.
>
> It can be done but it only changes the index of the samples.
Right. It shifts the first valid sample to the first index of the ifft result. This can result in a reduction in buffer copies.

Say you need to filter a large input buffer of samples to a large output buffer.

With *standard* Overlap-Save: The output from the ifft is nh-1 junk samples, followed by nfft-nh+1 valid output samples. The valid samples must be copied from your ifft output to your large output buffer.

With *shifted* Overlap-Save: The output from the ifft is nfft-nh+1 valid output samples, followed by nh-1 junk samples.

Minor distinction, but the latter technique allows you to send the ifft output directly to your large output buffer. The previous ifft's junk samples can be overwritten with this ifft's valid samples.
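That distinction can be sketched as well (again with direct circular convolution standing in for the ifft, and with illustrative names and sizes of my own): because the rotated filter puts the valid samples first, each block's output can be written straight into the large output buffer with no offset copy.

```python
# Shifted overlap-save sketch: rotate h so each block's valid samples come
# FIRST, letting the "ifft" output land directly in the big output buffer.

def circ_conv(x, h):
    """Circular convolution of block x with filter h; output length len(x)."""
    n = len(x)
    hp = list(h) + [0] * (n - len(h))
    return [sum(x[(k - m) % n] * hp[m] for m in range(n)) for k in range(n)]

def shifted_overlap_save(x, h, nfft):
    nh = len(h)
    step = nfft - nh + 1
    hp = list(h) + [0] * (nfft - nh)
    h_rot = [hp[(m + nh - 1) % nfft] for m in range(nfft)]  # rotate h left by nh-1
    buf = [0] * (nh - 1) + list(x)          # nh-1 samples of history
    y = [0] * len(x)                        # the large output buffer
    for start in range(0, len(x), step):
        block = buf[start:start + nfft]
        block += [0] * (nfft - len(block))  # pad only the final partial block
        yb = circ_conv(block, h_rot)        # valid samples first, junk samples last
        keep = min(step, len(x) - start)
        y[start:start + keep] = yb[:keep]   # direct write; junk is never stored
    return y

def direct_fir(x, h):
    """Transversal FIR with zero initial conditions, for reference."""
    return [sum(h[m] * x[k - m] for m in range(len(h)) if k - m >= 0)
            for k in range(len(x))]

x = list(range(1, 21))
h = [1, 2, 3]
```

The output is numerically identical to standard overlap-save; only the placement of the junk samples within each block, and hence the copying, changes.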
>> The input-to-output latency of any useful fft based filtering is always
>> worse than with a transversal (direct form FIR) filter. (see Note 1)

> Interesting. I hadn't thought about it. I guess it must be that for a real
> time filter either approach must be "fast enough" for throughput and the
> fft/mult/ifft process with the smaller compute load adds latency. Makes
> sense.
It's the old tradeoff of speed vs. latency. Push it in one place and it pops out another.
> Fred
Thanks for the input,
--
Mark Borgerding

Jerry Avins wrote:
> Andrew Reilly wrote:
>
>> On Wed, 25 Feb 2004 17:41:03 -0500, Jerry Avins wrote:
>>
>>> But then it isn't a start-up transient. The ordinary operation of a
>>> usual FIR has output less than perfect until as many samples have been
>>> accepted as the filter has taps. It is possible to suppress the output
>>> until that number of samples have been loaded, discarding the imperfect
>>> ones, but that delays when output begins. You seem to claim that perfect
>>> output begins immediately with "overlap scrap", before a transversal
>>> calculation would make it available or that the initial filter inputs
>>> actually produce good output.
>>
>> The startup transient that you speak of is not the same
>> transient that Mark is discussing, I believe, which is (in his
>> case) the discarded circular convolution part.
>>
>> I'm afraid that I didn't get your meaning at first either,
>> and in retrospect, that is interesting to me. You see, I
>> work/think in a world of filters defined by time domain impulse
>> responses, where initial conditions are defined to be zero
>> (all other situations are covered by linear shift invariance).
>> I.e., signals are always {x[n] = 0, n < 0}.
>>
>> What you call the "startup transient" is therefore all that I'm
>> interested in. You only get yourself into a state where you
>> would call this transient a "less than perfect" part of the
>> response if you were only interested in steady state (perhaps
>> "frequency domain") performance, where that steady state was
>> some non-zero sequence, or at least did not contain periods of
>> zeroes longer than the impulse response length.
>>
>> Interesting. Thanks for the kick to my mental perspective.
>>
>> I hope that this is the heart of the matter that you found
>> confusing?
>>
>> Cheers,
>
> What I refer to as the startup transient is also evident as edge effect
> when processing images with a kernel that involves pixels surrounding
> the one being output. At the edge of the image the kernel is partly off
> the image.
>
> The discussion began when the OP asked why FFT convolution gave a
> different result than ordinary convolution, expecting them to be precise
> equivalents. (As far as I know, they are sample-for-sample equivalents.)
> Mark seemed to suggest that, while ordinary convolution had the kind of
> start-up transient I've been harping on, his overlap-scrap method does
> not. Maybe I completely misunderstood, but that's how I construed his
> contribution.
Yes, but using your image processing analogy - depending on how you implement it, the edge effect can be at one edge, or the other, or split between both (we'll assume the output and input are the same size).

To get the same effect as you expect to see, Mark just needs to start his process with one additional empty block and then discard his last block. He's not doing that, so he doesn't see the same thing as your method. Or you could just discard the first block and get what he's getting (if you process one additional empty block at the end).

-jim
In article <403a251b$0$3095$61fed72c@news.rcn.com>,
Jerry Avins  <jya@ieee.org> wrote:
> For the record, the startup transient in overlap-add and overlap-save
> arises from there being no prior iteration.
The startup transient arises because this method implies a two-sided interpolator or filter when there is only one-sided data at the start. Given constraints of some degree of polynomial smoothness, or some realistic frequency rolloff before the Nyquist frequency, it seems that a one-sided or asymmetric interpolator or filter might give better results near the start (and end) of the complete data sequence (e.g. not just the current block), certainly better than just using a two-sided linear phase filter against a lot of unrelated zero fill.

An interesting question might be: what would this one-sided filter look like? A minimum (or negative) phase conversion of the linear phase FIR? And how would one transition from the one-sided filter near the startup (or end) to the two-sided filter once enough data becomes available? Perhaps dual-filter the first data block and cross-fade between the two filtered results?

IMHO. YMMV.
--
Ron Nicholson    rhn AT nicholson DOT com    http://www.nicholson.com/rhn/
#include <canonical.disclaimer>   // only my own opinions, etc.