Implementation of Elementary Functions for a Fixed Point SIMD DSP Coprocessor
This thesis describes the implementation of the reciprocal, square root, inverse square root and logarithm functions on a DSP platform. A multi-core DSP platform consisting of one master processor core and several SIMD coprocessor cores is currently being designed by a team at the Computer Engineering Department of Linköping University. The arithmetic logic unit (ALU) of each SIMD coprocessor has 16 multipliers to support vector multiplication instructions. Using these 16 multipliers efficiently makes it possible to evaluate polynomials very quickly. The ALU has no hardware support for floating point arithmetic, so the challenge is to achieve good precision with fixed point arithmetic. Precise and fast implementations of the mathematical functions are obtained in three steps: converting the fixed point input to a soft floating point format before polynomial approximation, choosing the polynomial based on an error analysis of the approximation, and using Newton-Raphson or Goldschmidt iterations to improve the precision of the polynomial result. Finally, changes and additions to the instruction set architecture are suggested that would make the implementations faster by using the existing hardware more efficiently.
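The three steps named in the abstract can be illustrated for the reciprocal with a minimal Python sketch using integer arithmetic only. Everything specific here is an assumption made for illustration and is not taken from the thesis: the Q16.16 input format, the Q2.30 working format, the textbook degree-1 seed 48/17 - (32/17)m, the iteration count, and all function names.

```python
FRAC = 30                      # assumed Q2.30 working format
ONE = 1 << FRAC
C0 = (48 << FRAC) // 17        # 48/17 in Q2.30
C1 = (32 << FRAC) // 17        # 32/17 in Q2.30


def normalize(x_q16):
    """Soft-float split: mantissa m in [0.5, 1) as Q2.30, plus the bit length n."""
    n = x_q16.bit_length()
    m = x_q16 << (FRAC - n) if n <= FRAC else x_q16 >> (n - FRAC)
    return m, n


def seed(m):
    """Degree-1 polynomial estimate of 1/m on [0.5, 1); relative error up to 1/17."""
    return C0 - ((C1 * m) >> FRAC)


def newton(m, y, steps=3):
    """Newton-Raphson refinement: y <- y + y*(1 - m*y)."""
    for _ in range(steps):
        e = ONE - ((m * y) >> FRAC)   # residual 1 - m*y
        y += (y * e) >> FRAC
    return y


def goldschmidt(m, y, steps=3):
    """Goldschmidt refinement: scale num and den together until den reaches 1."""
    num, den = y, (m * y) >> FRAC
    for _ in range(steps):
        f = 2 * ONE - den
        num = (num * f) >> FRAC       # these two products are independent,
        den = (den * f) >> FRAC       # so they could be issued in parallel
    return num


def recip_q16(x_q16, refine=newton):
    """Approximate 1/x for a positive Q16.16 integer; result is Q16.16, unsaturated."""
    m, n = normalize(x_q16)
    y = refine(m, seed(m))            # about 1/m in Q2.30, in (1, 2]
    shift = n - 2                     # undo the normalization
    if shift <= 0:
        return y << -shift
    return (y + (1 << (shift - 1))) >> shift   # round to nearest
```

For example, `recip_q16(3 << 16) / 65536` should come out close to 1/3, and passing `refine=goldschmidt` swaps the refinement step. Newton-Raphson corrects rounding errors from earlier iterations, whereas Goldschmidt lets them accumulate but exposes two independent multiplications per step.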
Summary
This master’s thesis presents methods to implement reciprocal, square root, inverse square root and logarithm functions on a fixed-point SIMD DSP coprocessor. It shows how to combine range reduction, polynomial/lookup approximations and iterative refinement while exploiting a 16-multiplier SIMD ALU to achieve high speed and good precision for real-time DSP tasks.
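As a rough illustration of how range reduction, a lookup approximation and iterative refinement might fit together, the following Python sketch computes an inverse square root in integer arithmetic. The Q16.16 interface, the 12-entry seed table, the three Newton-Raphson steps and the name `rsqrt_q16` are all hypothetical choices for this sketch, not the thesis's design.

```python
import math

FRAC = 30                     # assumed Q2.30 working format
ONE = 1 << FRAC
# 12-entry seed table for 1/sqrt(m), m in [1, 4), indexed by the top bits of m.
# Built with floating point here; on a target it would be a precomputed constant.
SEED = [round(ONE / math.sqrt((i + 4.5) / 4)) for i in range(12)]


def rsqrt_q16(x_q16, steps=3):
    """Approximate 1/sqrt(x) for a positive Q16.16 integer; result is Q16.16."""
    n = x_q16.bit_length()
    s = 30 - n
    if s % 2:                 # keep the shift even so the exponent halves exactly
        s -= 1
    # Range reduction: m in [1, 4) as Q4.28, with x = m * 2**(12 - s)
    m = x_q16 << s if s >= 0 else x_q16 >> -s
    y = SEED[(m >> 26) - 4]   # table lookup, Q2.30
    for _ in range(steps):    # Newton-Raphson: y <- y * (3 - m*y*y) / 2
        t = (m * ((y * y) >> FRAC)) >> 28
        y = (y * (3 * ONE - t)) >> (FRAC + 1)
    shift = (40 - s) // 2     # undo the range reduction, back to Q16.16
    return (y + (1 << (shift - 1))) >> shift
```

With these assumptions, `rsqrt_q16(4 << 16) / 65536` should be close to 0.5. A larger table would allow fewer refinement steps, which is the kind of memory-versus-cycles trade-off the summary alludes to.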
Key Takeaways
- Implement range reduction and Q-format fixed-point scaling to prepare inputs for polynomial and table-based approximations.
- Design polynomial and lookup-table hybrid approximations (minimax/Chebyshev style) for reciprocal, sqrt, inv-sqrt and log tailored to fixed-point SIMD evaluation.
- Apply iterative refinement (e.g., Newton–Raphson) to rapidly converge to target precision with few iterations.
- Exploit the SIMD coprocessor’s 16 multipliers to vectorize and pipeline polynomial evaluation for low-latency, high-throughput implementations.
- Measure and trade off precision vs. cycle count; validate algorithms with error bounds and test vectors to meet application requirements.
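The polynomial-design and validation takeaways above can be sketched in Python as follows. The target function log2(1 + f) on [0, 1), the degree 5, the Q4.28 format and the helper names `cheb_fit` and `horner_q` are assumptions for illustration. Interpolation at Chebyshev nodes stands in for a true minimax fit, and the coefficients are computed in floating point as an offline design step.

```python
import math

FRAC = 28                     # assumed Q4.28 for coefficients and the argument
ONE = 1 << FRAC


def cheb_fit(func, lo, hi, degree):
    """Monomial coefficients (lowest order first) of the polynomial that
    interpolates func at degree+1 Chebyshev nodes on [lo, hi]."""
    k = degree + 1
    xs = [0.5 * (lo + hi) + 0.5 * (hi - lo) * math.cos(math.pi * (2 * i + 1) / (2 * k))
          for i in range(k)]
    dd = [func(x) for x in xs]
    for j in range(1, k):                       # Newton divided differences
        for i in range(k - 1, j - 1, -1):
            dd[i] = (dd[i] - dd[i - 1]) / (xs[i] - xs[i - j])
    coeffs = [0.0] * k                          # expand Newton form to monomials
    for i in range(k - 1, -1, -1):
        # coeffs <- coeffs * (x - xs[i]) + dd[i]
        for d in range(k - 1, 0, -1):
            coeffs[d] = coeffs[d - 1] - xs[i] * coeffs[d]
        coeffs[0] = dd[i] - xs[i] * coeffs[0]
    return coeffs


def horner_q(coeffs_q, f_q):
    """Evaluate the polynomial at f_q using integer multiply-shift steps only."""
    acc = coeffs_q[-1]
    for c in reversed(coeffs_q[:-1]):
        acc = ((acc * f_q) >> FRAC) + c
    return acc


coeffs = cheb_fit(lambda f: math.log2(1.0 + f), 0.0, 1.0, 5)
coeffs_q = [round(c * ONE) for c in coeffs]     # quantize coefficients

max_err = 0.0
for i in range(1024):                           # test vectors across [0, 1)
    f_q = i << (FRAC - 10)
    approx = horner_q(coeffs_q, f_q) / ONE
    max_err = max(max_err, abs(approx - math.log2(1.0 + f_q / ONE)))
```

Printing `max_err` for different degrees and Q-formats gives a simple precision-versus-cost table. Horner's rule is serial for a single input, so on a 16-multiplier unit one would presumably evaluate several inputs in parallel lanes or regroup the terms; how the thesis maps the evaluation onto the hardware is not stated in this summary.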
Who Should Read This
Advanced DSP or embedded systems engineers working on fixed-point SIMD/multi-core DSP platforms who need efficient, accurate implementations of elementary math functions for real-time audio, radar or communications systems.
Still Relevant, Advanced
Related Documents
- A New Approach to Linear Filtering and Prediction Problems (Timeless, Advanced)
- A Quadrature Signals Tutorial: Complex, But Not Complicated (Timeless, Intermediate)
- An Introduction To Compressive Sampling (Timeless, Intermediate)
- Lecture Notes on Elliptic Filter Design (Timeless, Advanced)
- Computing FFT Twiddle Factors (Timeless, Advanced)







