DSPRelated.com

Convolution

The most important operation in DSP

You know that an LTI system is completely described by its impulse response h[n]. You know that any input can be decomposed into shifted, scaled impulses. The final piece: how do you actually compute the output?

The answer is convolution. It is the single most important operation in digital signal processing. Every filter, every reverb effect, every smoothing algorithm, every matched detector: they all compute a convolution.

The formula is simple. The insight behind it is even simpler. Let's build it from two different directions and see why they meet at the same answer.

The Input-Side View

Think of it this way: each input sample is a tiny event, like dropping a pebble into water. When sample x[k] arrives at time k, it triggers the system's impulse response, but scaled by x[k] and shifted to start at time k.

So x[0] triggers x[0] · h[n]. Then x[1] triggers x[1] · h[n - 1] (same shape, shifted one step later). Then x[2] triggers x[2] · h[n - 2]. And so on.

The output is the sum of all these overlapping copies:

y[n] = ∑ₖ x[k] · h[n - k]

This is convolution. Each input sample creates a ripple (a scaled copy of h[n]). Where the ripples overlap, they add together. The output at any moment is the sum of all the ripples that are still ringing at that moment.
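The pebble picture translates directly into code. Here is a short NumPy sketch (the function name is ours, for illustration) that builds the output literally as a sum of scaled, shifted copies of h:

```python
import numpy as np

def convolve_input_side(x, h):
    """Convolution as a sum of ripples: each input sample x[k]
    triggers a scaled copy of h starting at time k, and the
    overlapping copies add up."""
    x, h = np.asarray(x, float), np.asarray(h, float)
    y = np.zeros(len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        # Input sample k contributes x[k] * h, delayed by k steps.
        y[k:k + len(h)] += xk * h
    return y

x = [1, 2, 3, 2, 1]
h = [1, 0.5, 0.25]
print(convolve_input_side(x, h))  # [1.   2.5  4.25 4.   2.75 1.   0.25]
print(np.convolve(x, h))          # NumPy's built-in agrees
```

The loop runs over input samples, not output samples: each pass drops one pebble and adds its whole ripple into the output buffer at once.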

Try it: The input x[n] = [1, 2, 3, 2, 1] passes through a system with h[n] = [1, 0.5, 0.25]. Each input sample triggers a colored copy of h[n]. Set the slider to "all" to see every copy stacked. The green output at the bottom is their sum. Now slide through k = 0, 1, 2, ... to see each copy individually. Notice how later copies are shifted right (delayed) and scaled by the input value at that time.
Key Insight: The input-side view is the most intuitive way to think about convolution. Every input sample drops a pebble, creating a ripple shaped like h[n]. The output is where all the ripples add up. This is how reverb works: each sound triggers the room's impulse response, and what you hear is the sum of all the overlapping echoes.

The Output-Side View: Flip and Slide

There's a completely different way to compute the same result. Instead of asking "what does each input sample contribute?", ask: "to get one output sample y[n], which input samples matter and how much does each contribute?"

The recipe:

  1. Flip the impulse response: reverse h[m] to get h[-m]
  2. Slide the flipped version to position n: h[n - m]
  3. Multiply point-by-point with x[m] wherever they overlap
  4. Sum the products. That sum is y[n].

Then slide to the next n and repeat. Each position gives you one output sample.
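The four-step recipe above can be sketched as a double loop (again, the function name is ours): the outer loop slides to each output position n, and the inner loop multiplies and sums wherever the flipped h overlaps x:

```python
import numpy as np

def convolve_flip_and_slide(x, h):
    """Compute each output sample y[n] by flipping h, sliding it to
    position n, multiplying against x where they overlap, and summing."""
    N, M = len(x), len(h)
    y = np.zeros(N + M - 1)
    for n in range(len(y)):          # one pass per output sample
        for k in range(N):
            m = n - k                # index into the flipped, shifted h
            if 0 <= m < M:           # only where the signals overlap
                y[n] += x[k] * h[m]
    return y

x = [1, 2, 3, 2, 1]
h = [1, 0.5, 0.25]
print(convolve_flip_and_slide(x, h))  # same result as the input-side view
```

Notice the contrast with the input-side loop: there, one pass handled one input sample completely; here, one pass handles one output sample completely.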

Try it: Press play or drag the slider. The blue stems are x[n] (fixed). The orange stems are h[n - k] (flipped and sliding). At each position, the green/red products appear where the signals overlap, and their sum becomes y[n]. Watch the output build up one sample at a time.
Key Insight: The flip-and-slide view answers a different question: "what goes into computing output sample y[n]?" The answer is: all input samples x[k] that fall under the flipped, shifted impulse response, each weighted by the corresponding h value. It's a moving weighted average.

Same Formula, Two Perspectives

The input-side view and the flip-and-slide view aren't two different operations. They're two ways to read the same sum:

y[n] = ∑ₖ x[k] · h[n - k]

  • Input-side: Fix k and let n vary. The products x[k] · h[n - k] trace out a scaled copy of h shifted to start at time k: input sample k's ripple. The output is the sum of these ripples over all k.
  • Output-side: Fix n and let k vary. The products x[k] · h[n - k] are the values where x overlaps the flipped, shifted impulse response. Summing them over k produces one output sample, y[n], at a time.

It's the same grid of numbers, just summed in a different order. Rows versus columns. The total is the same.
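You can make the grid explicit. In this sketch (variable names are ours), row k of the grid holds input sample k's ripple, and column n holds everything that overlaps at output time n; reading by rows or by columns necessarily gives the same totals:

```python
import numpy as np

x = np.array([1, 2, 3, 2, 1.0])
h = np.array([1, 0.5, 0.25])
N, M = len(x), len(h)

# Grid of all products x[k] * h[n - k]: row index k, column index n.
grid = np.zeros((N, N + M - 1))
for k in range(N):
    grid[k, k:k + M] = x[k] * h   # row k is input sample k's ripple

# Input-side reading: stack all the ripple rows on top of each other.
y_input_side = grid.sum(axis=0)

# Output-side reading: column n collects the products under the
# flipped, shifted h; sum each column for one output sample.
y_output_side = np.array([grid[:, n].sum() for n in range(N + M - 1)])

print(np.allclose(y_input_side, y_output_side))  # True
```

Both readings reduce to the same column sums, which is exactly the point: one grid of numbers, two orders of summation, one answer.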

Try it: Both sides compute y[n] for x = [1, 2, 3, 2, 1] and h = [1, 0.5, 0.25]. The left panel shows the input-side view: colored copies stacking up as each input sample triggers h[n]. The right panel shows the flip-and-slide view: h[n-k] sliding across x[n] one position at a time. The output at the bottom is shared — both sides produce the same green stems. Press play and watch them build the same answer from opposite directions.
Key Insight: The two views are not two algorithms. They are two ways of reading the same sum. The input-side view is better for intuition ("each sample creates a ripple"). The output-side view is better for computation ("to get y[n], flip and slide"). Use whichever helps you think about the problem at hand.

Properties Worth Knowing

  • Output length: Convolving N samples with M samples produces N + M - 1 output samples. The "extra" M - 1 samples come from the impulse response ringing out after the last input sample.
  • Commutativity: x * h = h * x. You get the same output regardless of which signal you call the "input" and which you call the "filter." This follows from a simple change of variable in the sum.
  • Identity: x * δ = x. Convolving with an impulse leaves the signal unchanged. This is why h[n] is defined as the output when the input is δ[n].
  • Associativity: (x * h1) * h2 = x * (h1 * h2). Cascading two systems is equivalent to convolving their impulse responses first, then filtering once.
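All four properties are easy to check numerically. This sketch exercises each one with np.convolve on random signals (the signal lengths are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
h1 = rng.standard_normal(4)
h2 = rng.standard_normal(3)
delta = np.array([1.0])        # unit impulse

# Output length: N + M - 1
assert len(np.convolve(x, h1)) == len(x) + len(h1) - 1

# Commutativity: x * h = h * x
assert np.allclose(np.convolve(x, h1), np.convolve(h1, x))

# Identity: x * delta = x
assert np.allclose(np.convolve(x, delta), x)

# Associativity: (x * h1) * h2 = x * (h1 * h2)
assert np.allclose(np.convolve(np.convolve(x, h1), h2),
                   np.convolve(x, np.convolve(h1, h2)))

print("all four properties hold")
```

Associativity is worth pausing on: it means a chain of filters can be pre-combined into a single equivalent impulse response before any signal arrives.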

What Convolution Does

Different impulse responses produce different effects:

  • h = [1/3, 1/3, 1/3] — 3-point moving average. Smooths the signal by averaging neighbors. A simple low-pass filter.
  • h = [1, -1] — First difference. Enhances changes and suppresses constants. A simple high-pass filter.
  • h = [1, 0, 0, 0.5] — Echo. Adds a delayed, quieter copy of the input to itself. Used in audio effects.
  • h = δ[n] — Identity. The output equals the input. Convolution with an impulse leaves the signal unchanged.

The impulse response is the system's personality. Change h, and you change what the system does. The convolution formula stays the same.
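To see these personalities side by side, here is a quick sketch that runs a small pulse through each of the four impulse responses listed above:

```python
import numpy as np

x = np.array([0, 0, 1, 1, 1, 1, 0, 0], dtype=float)  # a short pulse

smooth = np.convolve(x, [1/3, 1/3, 1/3])  # moving average: softened edges
diff   = np.convolve(x, [1, -1])          # first difference: marks the edges
echo   = np.convolve(x, [1, 0, 0, 0.5])   # delayed, quieter copy added in
same   = np.convolve(x, [1])              # identity: output equals input

print(diff)  # [ 0.  0.  1.  0.  0.  0. -1.  0.  0.]
```

The first difference output is nonzero only where the pulse turns on and off: the constant middle is suppressed entirely, which is the high-pass behavior described above.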

Looking Ahead

Convolution works, but it's expensive: convolving an N-sample input with an M-sample impulse response requires roughly N × M multiplications. For large signals and long filters, this adds up fast.

There is a remarkable shortcut. It turns out that convolution in the time domain corresponds to multiplication in the frequency domain. If you transform both signals to the frequency domain (using the DFT), multiply them point-by-point, and transform back, you get the same result as convolution, but much faster for large signals.
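The shortcut is a few lines with NumPy's FFT routines. The one subtlety, shown in this sketch (the function name is ours), is zero-padding both signals to the full N + M - 1 output length so the frequency-domain product matches linear, not circular, convolution:

```python
import numpy as np

def fft_convolve(x, h):
    """Convolve via the frequency domain: zero-pad to the full output
    length, multiply the DFTs point-by-point, and transform back."""
    L = len(x) + len(h) - 1            # linear-convolution output length
    X = np.fft.rfft(x, n=L)            # DFT of zero-padded x
    H = np.fft.rfft(h, n=L)            # DFT of zero-padded h
    return np.fft.irfft(X * H, n=L)    # pointwise product, then inverse

x = [1, 2, 3, 2, 1]
h = [1, 0.5, 0.25]
print(np.allclose(fft_convolve(x, h), np.convolve(x, h)))  # True
```

For signals this small the direct sum is faster, but for long signals the FFT route drops the cost from roughly N × M to roughly L log L multiplications.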

This is one of the key reasons the DFT matters, and it's what we'll explore in the next chapter.

Key Insight: Convolution connects the time domain and the frequency domain. Every filter you'll ever design is specified by its impulse response h[n], and its effect on each frequency is determined by the DFT of h[n]. Understanding convolution is the gateway to understanding filtering.

Frequently Asked Questions

Is convolution the same as multiplication?

No. Convolution involves flipping, shifting, multiplying, and summing, which is much more involved than element-wise multiplication. However, there is a deep connection: convolution in the time domain is equivalent to multiplication in the frequency domain. This convolution theorem is one of the most important results in DSP and is the reason the FFT is so useful for filtering.

Why is the output longer than the input?

When the last input sample arrives, the impulse response has just been triggered. It takes M - 1 more time steps for that last copy to finish ringing out. So the output extends M - 1 samples beyond the last input, making the total length N + M - 1.

Does the order matter? Is x * h the same as h * x?

Yes, convolution is commutative: x[n] * h[n] = h[n] * x[n]. You get the same output regardless of which signal you treat as the input and which as the filter. This follows directly from the convolution formula by a change of summation variable.

Quick Check

Test your understanding of the key concepts from this lesson.