Convolution is a mathematical operation that combines two sequences (or functions) to produce a third, expressing how one sequence modifies or is shaped by the other. In discrete-time signal processing, it is the fundamental operation linking a linear time-invariant (LTI) system's input, its impulse response, and its output.
In practice
In embedded DSP, convolution appears most directly as FIR filtering. A FIR filter of length N computes one output sample by multiplying the last N input samples against N stored coefficients and summing the products — this is the discrete convolution sum. On a Cortex-M4 or Cortex-M7 with a single-cycle MAC (multiply-accumulate) instruction, a length-64 FIR filter requires 64 multiply-accumulate operations per output sample; actual cycle counts also depend on memory accesses, pipeline effects, and whether the implementation uses loop unrolling or SIMD, but the MAC count provides a useful baseline for budgeting. CMSIS-DSP provides optimized `arm_fir_f32` and `arm_fir_q15` implementations that exploit available DSP and SIMD instructions on cores and build configurations that support them.
For longer filters or large block sizes, direct convolution becomes expensive: convolving an N-point signal with an M-point filter naively costs O(N·M) multiplications (for the full output; exact count varies with boundary convention). Overlap-add and overlap-save methods decompose the problem into shorter FFT-based convolutions, reducing complexity to roughly O(N·log N) in practical terms, though the precise scaling depends on block size and filter length. This tradeoff matters on resource-constrained MCUs where a length-1024 direct convolution may be prohibitive but an FFT-based approach fits within cycle budget. The post "Simultaneously Computing a Forward FFT and an Inverse FFT Using a Single FFT" describes techniques relevant to efficient FFT-based processing pipelines.
Convolution is also used to combine the impulse responses of cascaded LTI stages. If two discrete-time systems have impulse responses h1[n] and h2[n], the overall impulse response is h1[n] * h2[n] (their convolution). This is directly useful when designing or analyzing filter chains, as discussed in "Coefficients of Cascaded Discrete-Time Systems."
A common pitfall is boundary handling. Discrete convolution of an N-point signal with an M-point filter produces N+M-1 output samples; implementations that assume the output length equals the input length silently discard or distort edge samples. Fixed-point implementations add another concern: accumulator width must be wide enough to prevent overflow during the summation, which is why some MCU DSP libraries use 64-bit or saturating accumulators for Q15 and Q31 data types, though accumulator width and scaling strategies vary across libraries and cores. The inverse problem — recovering an input from a known output and impulse response — is deconvolution, covered in "Deconvolution by least squares (Using the power of linear algebra in signal processing)."
Learn this in DSP Foundations
Discussed on DSPRelated
Frequently asked
What is the difference between linear convolution and circular convolution?
Linear convolution of an N-point and an M-point sequence produces an (N+M-1)-point result and correctly models LTI system behavior.
Circular convolution wraps around modulo some length L and is what the
DFT/
FFT naturally computes via pointwise multiplication in the
frequency domain. To use the FFT to compute linear convolution without
aliasing, L must be at least N+M-1, and the sequences must be zero-padded to that length before transforming.
How do I choose between a direct FIR implementation and an FFT-based convolution on a microcontroller?
Direct convolution costs roughly N multiply-accumulate operations per output sample (N = filter length) and has low overhead, making it efficient for short filters or low sample rates.
FFT-based overlap-add/overlap-save methods have higher per-block setup cost but scale as O(L·log L) for a block of length L, so they become favorable when filter length is long enough that the per-sample savings outweigh the block overhead — often cited as a rough heuristic around tens of taps, but the actual crossover is highly dependent on the specific MCU, available instructions, and implementation. Profiling both approaches on the target hardware is the most reliable guide.
Why is FIR filtering described as convolution but IIR filtering is not?
A
FIR filter computes its output as a finite weighted sum of current and past inputs — exactly the discrete convolution sum with a finite-length
impulse response. An
IIR filter feeds back past output values, giving it an infinitely long impulse response that cannot be expressed as a finite convolution kernel. IIR filters are instead described by a difference equation or, equivalently, by a ratio of polynomials in the z-domain.
What accumulator width is needed for fixed-point convolution to avoid overflow?
For
Q15 input and Q15 coefficients, each product is Q30, and summing N such products can grow the result by up to ceil(log2(N)) bits in the worst case. A 64-bit accumulator provides enough headroom for filters of typical embedded DSP lengths under normal scaling assumptions. Many Cortex-M4/M7 DSP instructions support a 64-bit accumulator directly (SMLAL, SMLALD), and CMSIS-DSP q15
FIR routines rely on this. On cores without wide accumulators, guard bits or scaling before accumulation are required to prevent overflow.
How does convolution relate to the frequency-domain multiplication property?
For LTI systems, convolution in the
time domain corresponds to pointwise multiplication in the
frequency domain: Y(f) = X(f) · H(f). This duality is why
FIR filter design is often done by specifying a desired
frequency response H(f) and transforming back to get the
impulse response h[n]. It is also the basis for
FFT-based fast convolution: transform both sequences, multiply pointwise, then inverse-transform the result.
Differentiators vs similar concepts
Convolution is sometimes confused with correlation.
Cross-correlation measures the similarity between two signals as a function of a time lag and uses the same sum-of-products structure, but one sequence is not time-reversed as it is in convolution. For symmetric
FIR filters the two operations produce identical results, which can obscure the distinction in practice. Convolution is also distinct from polynomial multiplication in name only — the operations are mathematically identical, a fact exploited in z-domain analysis of cascaded systems.