DSPRelated.com
Forums

Convolution versus LPFiltering

Started by XED February 22, 2004
Jerry Avins wrote:
> Mark Borgerding wrote:
>> Jerry Avins wrote:
>>> Fred Marshall wrote:
>>>
>>> ...
>>>
>>>> Did I get it right?
>>>>
>>>> Fred
>>>
>>> Mark's discussion about process never
>>> addressed the startup transient as far as I could see,
>>
>> I would've thought this did (copied from my earlier post):
>>
>> "The transient response is still there. It's the last nh-1 samples of
>> the output buffer. The invalid output samples get shifted there by a
>> linear phase multiplication in the frequency domain (i.e. convolution
>> with a delta pulse in the time domain)."
>
> Mark,
>
> But then it isn't a start-up transient.
The transients with circular convolution are not really "startup" transients anyway. The transients are caused by the filter overlapping both the front and back of the signal buffer.

If the signal buffer is zeropadded with nh-1 zeros, as with overlap-add, then the first nh-1 samples are numerically the same as the startup transient, since the back is all zeros. If the signal buffer is *not* zeropadded, as with overlap-save, the first nh-1 samples are due to circular convolution with the back of the buffer. These transient samples are meaningless and must be discarded.

All I am saying is that you can easily shift those nh-1 samples from the front of the buffer to the back. Just rotate the impulse response. In other words, circularly convolve the impulse response with a delta pulse.
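Mark's rotation trick can be sketched in a few lines of Python. This is only an illustrative sketch: direct (O(n^2)) circular convolution stands in for the fft -> multiply -> ifft chain, and the block values, lengths, and helper names are mine, not from the thread.

```python
# Where do the aliased ("transient") samples land in a circular convolution,
# and how does rotating the impulse response move them?

def circ_conv(x, h):
    """Circular convolution of block x with filter h; output length len(x)."""
    n = len(x)
    hp = list(h) + [0] * (n - len(h))  # zero pad h to the block length
    return [sum(x[(k - m) % n] * hp[m] for m in range(n)) for k in range(n)]

x = [1, 2, 3, 4, 5, 6, 7, 8]   # one signal block, NOT zero padded (overlap-save style)
h = [1, 2, 3]                  # nh = 3 taps
n, nh = len(x), len(h)

y = circ_conv(x, h)            # first nh-1 samples are aliased by the wrap-around

# Rotate h left by nh-1 samples (equivalently: circularly convolve h with a
# shifted delta pulse).  The aliased samples now land at the BACK of the block.
hp = h + [0] * (n - nh)
h_rot = [hp[(m + nh - 1) % n] for m in range(n)]
y_rot = circ_conv(x, h_rot)
```

The valid samples are identical in both cases; only their placement within the output block changes, which is all the rotation buys you.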
> The ordinary operation of a
> usual FIR has output less than perfect until as many samples have been
> accepted as the filter has taps. It is possible to suppress the output
> until that number of samples have been loaded, discarding the imperfect
> ones, but that delays when output begins.
The idea of delays is tricky with regards to block processing. There is no constant delay. The first output block is not available until the entire first input block is read. Then the entire output block is available. The input-to-output latency of any useful fft based filtering is always worse than with a transversal (direct form FIR) filter. (see Note 1)
> You seem to claim that perfect output begins immediately with "overlap scrap", before a transversal
> calculation would make it available or that the initial filter inputs
> actually produce good output.
I claim no such thing. I simply say that you can control the placement of the transient samples within the output block. That output block is not computed until an entire input block has been read. The time travel buggy does not shift into reverse.
>> Was this unclear?

> I hope so, for my sanity's sake.
Shall I send the nice men in white suits to take you to the soft room? :)

-- Mark Borgerding

Note 1: One could contend that the ridiculous case where an fft->mult->ifft is done for each input sample results in a constant lag, but this is a pretty silly use of resources.
On Wed, 25 Feb 2004 17:41:03 -0500, Jerry Avins wrote:
> But then it isn't a start-up transient. The ordinary operation of a
> usual FIR has output less than perfect until as many samples have been
> accepted as the filter has taps. It is possible to suppress the output
> until that number of samples have been loaded, discarding the imperfect
> ones, but that delays when output begins. You seem to claim that perfect
> output begins immediately with "overlap scrap", before a transversal
> calculation would make it available or that the initial filter inputs
> actually produce good output.
The startup transient that you speak of is not the same transient that Mark is discussing, I believe, which is (in his case) the discarded circular convolution part.

I'm afraid that I didn't get your meaning at first either, and in retrospect, that is interesting to me. You see, I work/think in a world of filters defined by time domain impulse responses, where initial conditions are defined to be zero (all other situations are covered by linear shift invariance). I.e., signals are always {x[n] = 0, n < 0}.

What you call the "startup transient" is therefore all that I'm interested in. You only get yourself into a state where you would call this transient a "less than perfect" part of the response if you were only interested in steady state (perhaps "frequency domain") performance, where that steady state was some non-zero sequence, or at least did not contain periods of zeroes longer than the impulse response length.

Interesting. Thanks for the kick to my mental perspective. I hope that this is the heart of the matter that you found confusing?

Cheers,
--
Andrew
On Wed, 25 Feb 2004 15:26:57 -0500, Mark Borgerding wrote:
> But if the output from the ifft begins with invalid samples, where can
> you point the ifft output so that the first valid sample from block N
> will be placed right after the last valid sample from block N-1?
You're only thinking in terms of canned FFT routines, which is fair, I admit. You don't have to compute all of the last pass of your IFFT if you don't want to. You could choose to only compute the results for the valid samples, and store them wherever you like. Even into the correct location in a circular output buffer.
> I don't see how a circular output buffer obviates the buffer copy with
> textbook overlap-save.
It doesn't. It's just the usual way to marshal data for output, on DSPs.

--
Andrew
Andrew Reilly wrote:

> On Wed, 25 Feb 2004 17:41:03 -0500, Jerry Avins wrote:
>
>> But then it isn't a start-up transient. The ordinary operation of a
>> usual FIR has output less than perfect until as many samples have been
>> accepted as the filter has taps. It is possible to suppress the output
>> until that number of samples have been loaded, discarding the imperfect
>> ones, but that delays when output begins. You seem to claim that perfect
>> output begins immediately with "overlap scrap", before a transversal
>> calculation would make it available or that the initial filter inputs
>> actually produce good output.
>
> The startup transient that you speak of is not the same
> transient that Mark is discussing, I believe, which is (in his
> case) the discarded circular convolution part.
>
> I'm afraid that I didn't get your meaning at first either,
> and in retrospect, that is interesting to me. You see, I
> work/think in a world of filters defined by time domain impulse
> responses, where initial conditions are defined to be zero
> (all other situations are covered by linear shift invariance).
> I.e., signals are always {x[n] = 0, n < 0}.
>
> What you call the "startup transient" is therefore all that I'm
> interested in. You only get yourself into a state where you
> would call this transient a "less than perfect" part of the
> response if you were only interested in steady state (perhaps
> "frequency domain") performance, where that steady state was
> some non-zero sequence, or at least did not contain periods of
> zeroes longer than the impulse response length.
>
> Interesting. Thanks for the kick to my mental perspective.
>
> I hope that this is the heart of the matter that you found
> confusing?
>
> Cheers,
What I refer to as the startup transient is also evident as an edge effect when processing images with a kernel that involves pixels surrounding the one being output. At the edge of the image the kernel is partly off the image.

The discussion began when the OP asked why FFT convolution gave a different result than ordinary convolution, expecting them to be precise equivalents. (As far as I know, they are sample-for-sample equivalents.) Mark seemed to suggest that, while ordinary convolution had the kind of start-up transient I've been harping on, his overlap-scrap method does not. Maybe I completely misunderstood, but that's how I construed his contribution.

Jerry
--
Engineering is the art of making what you want from things you can get.
Andrew Reilly wrote:
> On Wed, 25 Feb 2004 15:26:57 -0500, Mark Borgerding wrote:
>
>> But if the output from the ifft begins with invalid samples, where can
>> you point the ifft output so that the first valid sample from block N
>> will be placed right after the last valid sample from block N-1?
>
> You're only thinking in terms of canned FFT routines, which
> is fair, I admit. You don't have to compute all of the last
> pass of your IFFT if you don't want to. You could choose to
> only compute the results for the valid samples, and store them
> wherever you like. Even into the correct location in a circular
> output buffer.
True. This *would* have computational advantages. Do you know of [m]any off-the-shelf fft routines that can do this? I've thought of adding this capability to kissfft. I'm concerned about the costs in terms of simplicity and performance. -- Mark
"Mark Borgerding" <mark@borgerding.net> wrote in message
news:403d3535$1@news.xetron.com...
> Jerry Avins wrote:
>
>> Mark Borgerding wrote:
>
> The transients with circular convolution are not really "startup"
> transients anyway. The transients are caused by the filter overlapping
> both the front and back of the signal buffer.
Actually they are startup transients - no matter what you call them. Signal buffers are a detail of implementation and startup transients are a matter of physics. Then, there's the "mixed mode" where delays due to physics and delays due to implementation both matter.
> If the signal buffer is zeropadded with nh-1 zeros, as with overlap-add,
> then the first nh-1 samples are numerically the same as the startup
> transient, since the back is all zeros.
>
> If the signal buffer is *not* zeropadded, as with overlap-save,
> the first nh-1 samples are due to circular convolution with the back of
> the buffer. These transient samples are meaningless, and must be
> discarded.

I need to figure out how you do fft/mult/ifft circular convolution without zeropadding. Well, that is, unless you're going to allow temporal aliasing. You have to zero pad at least enough to make temporal room for the filter impulse response length. Otherwise there will be aliasing, won't there? Both arrays have to be the same length... N+L-1, right?

I guess if you're going to scrap some of the output then you can just not zero-pad and, instead, data-pad. Is that the idea? If the filter is rotated by its entire length, and the data isn't zero-padded, then the arrays must still be of the proper length. So, let's choose L << N. The filter has to be zero-padded out to N. If the data isn't zero-padded but part of the output is ignored, then maybe that works; is that the point, eh?
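Fred's aliasing point can be checked numerically: without zero padding, the circular result is the linear result with its tail folded back onto the first nh-1 samples. A small illustrative sketch (not code from the thread; values and helper names are mine):

```python
# Temporal aliasing demo: circular convolution without zero padding equals
# linear convolution with its tail wrapped back onto the first nh-1 samples.

def lin_conv(x, h):
    """Full linear convolution, output length len(x) + len(h) - 1."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def circ_conv(x, h):
    """Circular convolution of block x with filter h; output length len(x)."""
    n = len(x)
    hp = list(h) + [0] * (n - len(h))
    return [sum(x[(k - m) % n] * hp[m] for m in range(n)) for k in range(n)]

x = [1, 2, 3, 4, 5, 6, 7, 8]
h = [1, 2, 3]
n = len(x)

yl = lin_conv(x, h)   # length n + nh - 1 = 10: the true (linear) result
yc = circ_conv(x, h)  # length n = 8: what fft/mult/ifft gives WITHOUT padding

# Fold the linear tail back onto the front, sample for sample:
folded = [yl[k] + (yl[k + n] if k + n < len(yl) else 0) for k in range(n)]
```

Only the first nh-1 circular samples differ from the linear result; the rest are untouched, which is why overlap-save can simply scrap them.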
> All I am saying is that you can easily shift those nh-1 samples from the
> front of the buffer to the back. Just rotate the impulse response. In
> other words, circularly convolve the impulse response with a delta pulse.
It can be done but it only changes the index of the samples.
>> The ordinary operation of a
>> usual FIR has output less than perfect until as many samples have been
>> accepted as the filter has taps. It is possible to suppress the output
>> until that number of samples have been loaded, discarding the imperfect
>> ones, but that delays when output begins.
Isn't suppression the same as "ignoring" or throwing out the initial transient?
> The idea of delays is tricky with regards to block processing. There is
> no constant delay.
Yes, but there is a constant latency which is determined by how soon you can start taking data out (at the proper rate). So, the bursts of data make no difference. Just like the ideal case of streaming audio.
> The first output block is not available until the entire first input
> block is read. Then the entire output block is available.
Exactly. That pretty much defines the latency. (I say latency to avoid using the word "delay" and getting into the mixed mode).
> The input-to-output latency of any useful fft based filtering is always
> worse than with a transversal (direct form FIR) filter. (see Note 1)
Interesting. I hadn't thought about it. I guess it must be that for a real time filter either approach must be "fast enough" for throughput and the fft/mult/ifft process with the smaller compute load adds latency. Makes sense. Fred
Fred Marshall wrote:
> "Mark Borgerding" <mark@borgerding.net> wrote in message
> news:403d3535$1@news.xetron.com...
>
>> Jerry Avins wrote:
>>
>>> Mark Borgerding wrote:
>>
>> The transients with circular convolution are not really "startup"
>> transients anyway. The transients are caused by the filter overlapping
>> both the front and back of the signal buffer.
>
> Actually they are startup transients - no matter what you call them. Signal
> buffers are a detail of implementation and startup transients are a matter
> of physics. Then, there's the "mixed mode" where delays due to physics and
> delays due to implementation both matter.
>
>> If the signal buffer is zeropadded with nh-1 zeros, as with overlap-add,
>> then the first nh-1 samples are numerically the same as the startup
>> transient, since the back is all zeros.
>>
>> If the signal buffer is *not* zeropadded, as with overlap-save,
>> the first nh-1 samples are due to circular convolution with the back of
>> the buffer. These transient samples are meaningless, and must be
>> discarded.
>
> I need to figure how you do fft/mult/ifft circular convolution without
> zeropadding.
Search the index of your favorite DSP book for "overlap save" or "overlap scrap". It will surely explain better than I do.

(Note: "Overlap save" and "overlap scrap" both refer to the same technique. This thread has made it obvious that by using the latter, less common, term, I've been making it harder to understand. My apologies.)

> Well, that is, unless you're going to allow temporal aliasing.
> You have to at least zero pad enough to make temporal room for the filter
> impulse response length. Otherwise there will be aliasing won't there?
> Both arrays have to be the same length.... N+L-1, right? I guess if you're
> going to scrap some of the output then you can just not zero-pad and,
> instead, data-pad. Is that the idea?
Exactly. The output from the ifft stage of Overlap-Save filtering *does* have temporal aliasing. The affected nh-1 samples must be scrapped.

Overlap-Add filtering performs circular convolution without temporal aliasing, thanks to the zero padding. Unfortunately, this means that an addition step must be done to account for the transients.

Overlap-Save fills up the data buffer completely, ffts, multiplies by the frequency response of the filter, then inverse ffts. The nh-1 aliased samples must then be scrapped. This is less intuitive than overlap-add, but more efficient, since the addition step is unnecessary. The rotation technique I add to the Overlap-Save method further reduces intuitiveness in exchange for another small gain in efficiency (in at least *some* situations, where it obviates the buffer copy).
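The overlap-save procedure described above can be sketched end to end. This is only a sketch: direct circular convolution stands in for the fft -> multiply -> ifft step, and the block size, test signal, and function names are illustrative choices of mine.

```python
# Overlap-save sketch: fill each block completely with data (no zero padding),
# circularly convolve, scrap the nh-1 aliased samples, keep the rest.

def circ_conv(x, h):
    """Circular convolution of block x with filter h; output length len(x)."""
    n = len(x)
    hp = list(h) + [0] * (n - len(h))
    return [sum(x[(k - m) % n] * hp[m] for m in range(n)) for k in range(n)]

def overlap_save(x, h, nfft):
    nh = len(h)
    step = nfft - nh + 1                    # new samples consumed per block
    buf = [0] * (nh - 1) + list(x)          # nh-1 samples of history (silence at startup)
    y = []
    for start in range(0, len(x), step):
        block = buf[start:start + nfft]
        block += [0] * (nfft - len(block))  # pad only the final partial block
        yb = circ_conv(block, h)            # stand-in for fft -> multiply -> ifft
        y.extend(yb[nh - 1:])               # scrap the nh-1 aliased samples
    return y[:len(x)]

def direct_fir(x, h):
    """Transversal (direct form) FIR with zero initial conditions, for reference."""
    return [sum(h[m] * x[k - m] for m in range(len(h)) if k - m >= 0)
            for k in range(len(x))]

x = list(range(1, 21))
h = [1, 2, 3]
```

Block by block, the kept samples line up exactly with the transversal filter's output, startup transient included.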
>> All I am saying is that you can easily shift those nh-1 samples from the
>> front of the buffer to the back. Just rotate the impulse response. In
>> other words, circularly convolve the impulse response with a delta pulse.
>
> It can be done but it only changes the index of the samples.
Right. It shifts the first valid sample to the first index of the ifft result. This can result in a reduction in buffer copies.

Say you need to filter a large input buffer of samples to a large output buffer.

With *standard* Overlap-Save: The output from the ifft is nh-1 junk samples, followed by nfft-nh+1 valid output samples. The valid samples must be copied from your ifft output to your large output buffer.

With *shifted* Overlap-Save: The output from the ifft is nfft-nh+1 valid output samples, followed by nh-1 junk samples.

Minor distinction, but the latter technique allows you to send the ifft output directly to your large output buffer. The previous ifft's junk samples can be overwritten with this ifft's valid samples.
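That distinction can be sketched as well (again with direct circular convolution standing in for the ifft, and with illustrative names and sizes of my own): because the rotated filter puts the valid samples first, each block's output can be written straight into the large output buffer with no offset copy.

```python
# Shifted overlap-save sketch: rotate h so each block's valid samples come
# FIRST, letting the "ifft" output land directly in the big output buffer.

def circ_conv(x, h):
    """Circular convolution of block x with filter h; output length len(x)."""
    n = len(x)
    hp = list(h) + [0] * (n - len(h))
    return [sum(x[(k - m) % n] * hp[m] for m in range(n)) for k in range(n)]

def shifted_overlap_save(x, h, nfft):
    nh = len(h)
    step = nfft - nh + 1
    hp = list(h) + [0] * (nfft - nh)
    h_rot = [hp[(m + nh - 1) % nfft] for m in range(nfft)]  # rotate h left by nh-1
    buf = [0] * (nh - 1) + list(x)          # nh-1 samples of history
    y = [0] * len(x)                        # the large output buffer
    for start in range(0, len(x), step):
        block = buf[start:start + nfft]
        block += [0] * (nfft - len(block))  # pad only the final partial block
        yb = circ_conv(block, h_rot)        # valid samples first, junk samples last
        keep = min(step, len(x) - start)
        y[start:start + keep] = yb[:keep]   # direct write; junk is never stored
    return y

def direct_fir(x, h):
    """Transversal FIR with zero initial conditions, for reference."""
    return [sum(h[m] * x[k - m] for m in range(len(h)) if k - m >= 0)
            for k in range(len(x))]

x = list(range(1, 21))
h = [1, 2, 3]
```

The output is numerically identical to standard overlap-save; only the placement of the junk samples within each block, and hence the copying, changes.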
>> The input-to-output latency of any useful fft based filtering is always
>> worse than with a transversal (direct form FIR) filter. (see Note 1)

> Interesting. I hadn't thought about it. I guess it must be that for a real
> time filter either approach must be "fast enough" for throughput and the
> fft/mult/ifft process with the smaller compute load adds latency. Makes
> sense.
It's the old tradeoff of speed vs. latency. Push it in one place and it pops out another.
> Fred
Thanks for the input,
--
Mark Borgerding

Jerry Avins wrote:
> Andrew Reilly wrote:
>
>> On Wed, 25 Feb 2004 17:41:03 -0500, Jerry Avins wrote:
>>
>>> But then it isn't a start-up transient. The ordinary operation of a
>>> usual FIR has output less than perfect until as many samples have been
>>> accepted as the filter has taps. It is possible to suppress the output
>>> until that number of samples have been loaded, discarding the imperfect
>>> ones, but that delays when output begins. You seem to claim that perfect
>>> output begins immediately with "overlap scrap", before a transversal
>>> calculation would make it available or that the initial filter inputs
>>> actually produce good output.
>>
>> The startup transient that you speak of is not the same
>> transient that Mark is discussing, I believe, which is (in his
>> case) the discarded circular convolution part.
>>
>> I'm afraid that I didn't get your meaning at first either,
>> and in retrospect, that is interesting to me. You see, I
>> work/think in a world of filters defined by time domain impulse
>> responses, where initial conditions are defined to be zero
>> (all other situations are covered by linear shift invariance).
>> I.e., signals are always {x[n] = 0, n < 0}.
>>
>> What you call the "startup transient" is therefore all that I'm
>> interested in. You only get yourself into a state where you
>> would call this transient a "less than perfect" part of the
>> response if you were only interested in steady state (perhaps
>> "frequency domain") performance, where that steady state was
>> some non-zero sequence, or at least did not contain periods of
>> zeroes longer than the impulse response length.
>>
>> Interesting. Thanks for the kick to my mental perspective.
>>
>> I hope that this is the heart of the matter that you found
>> confusing?
>>
>> Cheers,
>
> What I refer to as the startup transient is also evident as edge effect
> when processing images with a kernel that involves pixels surrounding
> the one being output. At the edge of the image the kernel is partly off
> the image.
>
> The discussion began when the OP asked why FFT convolution gave a
> different result than ordinary convolution, expecting them to be precise
> equivalents. (As far as I know, they are sample-for-sample equivalents.)
> Mark seemed to suggest that, while ordinary convolution had the kind of
> start-up transient I've been harping on, his overlap-scrap method does
> not. Maybe I completely misunderstood, but that's how I construed his
> contribution.
Yes, but using your image processing analogy - depending on how you implement it, the edge effect can be at one edge, or the other, or split between both (we'll assume the output and input are the same size).

To get the same effect as you expect to see, Mark just needs to start his process with one additional empty block and then discard his last block. He's not doing that, so he doesn't see the same thing as your method. Or you could just discard the first block and get what he's getting (if you process one additional empty block at the end).

-jim
In article <403a251b$0$3095$61fed72c@news.rcn.com>,
Jerry Avins  <jya@ieee.org> wrote:
> For the record, the startup transient in overlap-add and overlap-save
> arises from there being no prior iteration.
The startup transient arises because this method implies a two-sided interpolator or filter when there is only one-sided data at the start. Given constraints of some degree of polynomial smoothness, or some realistic frequency rolloff before the Nyquist frequency, it seems that a one-sided or asymmetric interpolator or filter might give better results near the start (and end) of the complete data sequence (e.g. not just the current block), certainly better than just using a two-sided linear phase filter against a lot of unrelated zero fill.

An interesting question might be: what would this one-sided filter look like? A minimum (or negative) phase conversion of the linear phase FIR? And how would one transition from the one-sided filter near the startup (or end) to the two-sided filter once enough data becomes available? Perhaps dual-filter the first data block and cross-fade between the two filtered results?

IMHO. YMMV.
--
Ron Nicholson    rhn AT nicholson DOT com    http://www.nicholson.com/rhn/
#include <canonical.disclaimer>   // only my own opinions, etc.