Algorithm Adaptation and Optimization of a Novel DSP Vector Co-processor
The Division of Computer Engineering at Linköping University is currently researching the possibility of creating a highly parallel DSP platform that can keep up with the computational needs of upcoming standards for various applications, at low cost and low power consumption. The architecture is called ePUMA, and it combines a general RISC DSP master processor with eight SIMD co-processors on a single chip. The master processor acts as the main processor for general tasks and execution control, while the co-processors accelerate compute-intensive, parallel DSP kernels.

This thesis investigates the performance potential of the co-processors by implementing matrix algebra kernels for QR decomposition, LU decomposition, matrix determinant and matrix inverse that run on a single co-processor. The kernels are then evaluated to find possible problems with the co-processors' microarchitecture and to suggest solutions to the problems found. The evaluation shows that the performance potential is very good, but it identifies a few problems that cause significant overhead in the kernels. Pipeline mismatches, which occur because different instructions have different pipeline lengths, cause pipeline hazards, and the current solution to this does not allow effective use of the pipeline. In some cases the single-port memories cause bottlenecks, but the thesis suggests that the situation could be greatly improved by using buffered memory write-back. In addition, the lack of register forwarding makes kernels with many data dependencies run unnecessarily slowly.
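To make the LU decomposition and determinant kernels concrete, the following is a minimal reference sketch in Python. It is not the thesis's ePUMA implementation: it assumes Doolittle elimination without pivoting, and the names `lu_decompose` and `determinant` are hypothetical. The innermost row update is the part a SIMD unit would handle as one vector operation, and each elimination step needs the pivot row produced by the previous step, which is the kind of data dependency the abstract refers to.

```python
def lu_decompose(a):
    """Doolittle LU without pivoting (assumption): A = L * U, L has a unit diagonal."""
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[float(x) for x in row] for row in a]
    for k in range(n - 1):
        pivot = upper[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        for i in range(k + 1, n):
            factor = upper[i][k] / pivot
            lower[i][k] = factor
            # Row update: row_i -= factor * row_k, a natural vector operation.
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


def determinant(a):
    """det(A) is the product of U's diagonal, since det(L) = 1."""
    _, upper = lu_decompose(a)
    det = 1.0
    for k in range(len(a)):
        det *= upper[k][k]
    return det
```

For example, `determinant([[4, 3], [6, 3]])` evaluates to `-6.0`. A production kernel would normally add pivoting for numerical stability; it is left out here to keep the dependency structure easy to see.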
Summary
This thesis evaluates algorithm adaptation and optimization for ePUMA, a heterogeneous DSP platform that pairs a RISC master processor with eight SIMD co-processors. It shows how matrix-algebra kernels (QR decomposition, LU decomposition, inverse, determinant) can be mapped and tuned to a single SIMD co-processor for low-power, real-time signal processing, and it identifies the microarchitectural bottlenecks that limit them.
Key Takeaways
- Quantify performance and energy benefits of offloading matrix- and kernel-intensive DSP tasks to SIMD co-processors on the ePUMA platform.
- Demonstrate concrete mapping and vectorization strategies for QR and LU decompositions and related matrix kernels to maximize SIMD utilization.
- Optimize memory layout and scheduling to reduce bandwidth stalls and improve sustained throughput on a multi-co-processor DSP chip.
- Assess trade-offs between numerical precision, parallelism, and power when adapting signal-processing algorithms to a vector co-processor.
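As an illustration of why QR decomposition maps well onto SIMD hardware, here is a small Python sketch using modified Gram-Schmidt. This is an assumption for illustration only: the page does not say which QR algorithm the thesis uses, and `qr_mgs` is a hypothetical name. The matrix is held column by column so that every inner step is either a dot product or a scaled vector subtraction over a contiguous column.

```python
import math


def qr_mgs(a):
    """Modified Gram-Schmidt QR (assumed method): A = Q * R, Q has orthonormal columns."""
    m, n = len(a), len(a[0])
    # Store columns contiguously so each step below works on whole vectors.
    cols = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        norm = math.sqrt(sum(x * x for x in cols[k]))
        if norm == 0.0:
            raise ValueError("columns are linearly dependent")
        r[k][k] = norm
        cols[k] = [x / norm for x in cols[k]]
        for j in range(k + 1, n):
            # Dot product, then a scaled vector subtraction (both vector operations).
            r[k][j] = sum(q * x for q, x in zip(cols[k], cols[j]))
            cols[j] = [x - r[k][j] * q for q, x in zip(cols[k], cols[j])]
    q = [[cols[j][i] for j in range(n)] for i in range(m)]
    return q, r
```

Calling `qr_mgs([[3, 0], [4, 5]])` returns a `q` whose columns are orthonormal and an upper-triangular `r` such that their product reproduces the input up to rounding.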
Who Should Read This
Advanced DSP engineers, system architects, and researchers focused on SIMD/vector accelerator design and algorithm mapping for low-power, real-time signal-processing systems.
Still Relevant · Advanced
Related Documents
- A New Approach to Linear Filtering and Prediction Problems (Timeless, Advanced)
- A Quadrature Signals Tutorial: Complex, But Not Complicated (Timeless, Intermediate)
- An Introduction To Compressive Sampling (Timeless, Intermediate)
- Lecture Notes on Elliptic Filter Design (Timeless, Advanced)
- Computing FFT Twiddle Factors (Timeless, Advanced)







