
Floating point DSPs

Started by HardySpicer July 6, 2010
For floating point arithmetic, how much faster is an add/subtract than
a multiply/accumulate (percentage-wise)?


Hardy
On 7/6/10 6:33 PM, HardySpicer wrote:
> For floating point arithmetic, how much faster is an add/subtract than
> a multiply/accumulate (percentage-wise)?
Probably depends on the chip. The last time I used a floating point DSP (C30!), all floating point ops (add, sub, mul, mac) finished in a single cycle. (I think.)

Ray
On 7/6/2010 6:33 PM, HardySpicer wrote:
> For floating point arithmetic, how much faster is an add/subtract than
> a multiply/accumulate (percentage-wise)?
It depends on the chip. Done in software, multiplication is often faster than addition.

Jerry
--
Engineering is the art of making what you want from things you can get.
> On 7/6/10 6:33 PM, HardySpicer wrote:
>> For floating point arithmetic, how much faster is an add/subtract than
>> a multiply/accumulate (percentage-wise)?
>
> Probably depends on the chip. The last time I used a floating point DSP
> (C30!), all floating point ops (add, sub, mul, mac) finished in a single
> cycle. (I think.)
It could issue a new floating point instruction each cycle, but each instruction took a number of cycles to move through the pipeline and pop out the end. I've never seen a floating point unit that attempted to complete its instructions in a single cycle; it would be extremely inefficient.

Steve
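That latency-versus-throughput distinction is easy to see on any pipelined FPU. A minimal C sketch, assuming a hosted compiler with clock() (constants and loop counts are arbitrary; build with modest optimization, e.g. -O1, so the loops survive):

/* A chain of dependent multiplies runs at the multiply *latency*,
 * while independent multiplies can issue back to back and run at
 * the pipeline *throughput* (often one per cycle). */
#include <stdio.h>
#include <time.h>

#define N 100000000L

int main(void)
{
    volatile float seed = 1.000001f;   /* volatile blocks constant folding */
    float a = seed, b = seed, c = seed, d = seed;
    clock_t t0, t1;
    long i;

    /* Dependent chain: each multiply waits for the previous result. */
    t0 = clock();
    for (i = 0; i < N; i++)
        a *= 0.9999999f;
    t1 = clock();
    printf("dependent:   %.2f s (a=%g)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC, a);

    /* Four independent chains: the pipeline can overlap them. */
    t0 = clock();
    for (i = 0; i < N; i += 4) {
        a *= 0.9999999f;
        b *= 0.9999999f;
        c *= 0.9999999f;
        d *= 0.9999999f;
    }
    t1 = clock();
    printf("independent: %.2f s (check=%g)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC, a + b + c + d);
    return 0;
}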
Raymond Toy <toy.raymond@gmail.com> wrote in news:i10bse$h90$1@news.eternal-september.org:

> On 7/6/10 6:33 PM, HardySpicer wrote:
>> For floating point arithmetic, how much faster is an add/subtract than
>> a multiply/accumulate (percentage-wise)?
>
> Probably depends on the chip. The last time I used a floating point DSP
> (C30!), all floating point ops (add, sub, mul, mac) finished in a single
> cycle. (I think.)
>
> Ray
I entered in the middle of this thread, so I may have the context wrong....

On a SHARC, floating point multiply and floating point add have the same cost - one instruction. Actually, you can do two of each in SIMD, with some constraints. Fixed point math also operates in one cycle.

Instructions on a SHARC operate at the core clock, which can be as high as 450 MHz. They all execute in 1 cycle. I assume that the TI floating point DSPs would be similar. Single cycle (1 instruction) processing is quite normal for DSPs.

Algorithms that trade off multiplies for adds are not generally helpful with DSPs. OTOH, these techniques can be very useful for other types of devices, such as FPGAs or GP microcontrollers.

Al Clark
www.danvillesignal.com
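For example, the inner loop of an FIR filter maps each tap onto a single multiply-accumulate, so there is nothing to gain by restructuring the math to use fewer multiplies. A minimal sketch in generic C (illustrative names, not SHARC assembly):

/* FIR inner loop. On a DSP with a single-cycle MAC, the multiply and
 * the add below fuse into one instruction per tap. */
float fir(const float *x, const float *h, int ntaps)
{
    float acc = 0.0f;
    int k;
    for (k = 0; k < ntaps; k++)
        acc += h[k] * x[k];    /* one MAC per tap */
    return acc;
}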
On Jul 6, 3:33 pm, HardySpicer <gyansor...@gmail.com> wrote:
> For floating point arithmetic, how much faster is an add/subtract than
> a multiply/accumulate (percentage-wise)?
>
> Hardy
I have experience with the C32 and C33. Multiplies and adds or multiplies and subtracts can happen at a rate of 1 per clock cycle, but that doesn't mean they complete in that time, as others have mentioned. The C32 and C33 can sometimes do two floating point operations per cycle, but usually one of them is fetching from memory.

The big enemy is not multiplies, adds, or subtracts, but divides. Also pipeline stalls due to getting data from, and storing data back to, memory. To estimate time, I usually count memory cycles.

Peter Nachtwey
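A common way to dodge the divide, sketched in generic C (illustrative names; on a part without a hardware divider the division would call an iterative library routine): divide once outside the loop and multiply by the reciprocal inside it.

/* Hoist a per-sample divide out of the loop by multiplying with the
 * reciprocal instead: one expensive divide total, rather than one per
 * sample. (Note: not bit-identical to dividing each sample.) */
void scale_naive(float *y, const float *x, int n, float d)
{
    int i;
    for (i = 0; i < n; i++)
        y[i] = x[i] / d;     /* costly divide every iteration */
}

void scale_recip(float *y, const float *x, int n, float d)
{
    float r = 1.0f / d;      /* divide once */
    int i;
    for (i = 0; i < n; i++)
        y[i] = x[i] * r;     /* cheap multiply every iteration */
}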

pnachtwey wrote:
> On Jul 6, 3:33 pm, HardySpicer <gyansor...@gmail.com> wrote:
>> For floating point arithmetic, how much faster is an add/subtract than
>> a multiply/accumulate (percentage-wise)?
>>
>> Hardy
>
> I have experience with the C32 and C33. Multiplies and adds or
> multiplies and subtracts can happen at a rate of 1 per clock cycle,
> but that doesn't mean they complete in that time, as others have
> mentioned. The C32 and C33 can sometimes do two floating point
> operations per cycle, but usually one of them is fetching from memory.
> The big enemy is not multiplies, adds, or subtracts, but divides.
IIRC there is no penalty for floating point division in Intel P5+ CPUs; with their huge pipelines, all arithmetic operations have the same cost.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
On Jul 6, 11:33 pm, HardySpicer <gyansor...@gmail.com> wrote:
> For floating point arithmetic, how much faster is an add/subtract than
> a multiply/accumulate (percentage-wise)?
>
> Hardy
It depends on your code, compiler, and architecture. Best practice is to measure this statistically.

I tend to think architecture all the time. Sometimes a memory-memory instruction can make all the difference in the world, and even with increased compiler smartness, there is no substitute for human prudence, because nothing but you can understand exactly what you're after and what you can do with or without.

-Momo
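A minimal measurement sketch along those lines, assuming a hosted C environment with clock() (loop counts and constants are arbitrary; results vary with compiler flags and chip):

/* Time a long run of plain adds against a long run of multiply-
 * accumulates and report the ratio. volatile keeps the compiler from
 * folding the loops away; build with modest optimization (e.g. -O1). */
#include <stdio.h>
#include <time.h>

#define N 50000000L

static double time_add(void)
{
    volatile float x = 1.000001f;
    float acc = 0.0f;
    long i;
    clock_t t0 = clock();
    for (i = 0; i < N; i++)
        acc += x;                       /* plain add */
    clock_t t1 = clock();
    fprintf(stderr, "(acc=%g)\n", acc); /* keep the result live */
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

static double time_mac(void)
{
    volatile float x = 1.000001f;
    float acc = 0.0f;
    long i;
    clock_t t0 = clock();
    for (i = 0; i < N; i++)
        acc += x * 0.999999f;           /* multiply-accumulate */
    clock_t t1 = clock();
    fprintf(stderr, "(acc=%g)\n", acc);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    double ta = time_add(), tm = time_mac();
    printf("add: %.2f s  mac: %.2f s  mac/add ratio: %.2f\n", ta, tm, tm / ta);
    return 0;
}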
On Jul 7, 11:45 pm, Manny <mlou...@hotmail.com> wrote:
> On Jul 6, 11:33 pm, HardySpicer <gyansor...@gmail.com> wrote:
>> For floating point arithmetic, how much faster is an add/subtract than
>> a multiply/accumulate (percentage-wise)?
>>
>> Hardy
>
> It depends on your code, compiler, and architecture. Best practice is
> to measure this statistically.
>
> I tend to think architecture all the time. Sometimes a memory-memory
> instruction can make all the difference in the world, and even with
> increased compiler smartness, there is no substitute for human
> prudence, because nothing but you can understand exactly what you're
> after and what you can do with or without.
>
> -Momo
Ah. Well, that was in reply to other posts rather than your original. If you're building a case against something, power might be of relevance here.

-Momo
> For floating point arithmetic, how much faster is an add/subtract than
> a multiply/accumulate (percentage-wise)?
>
> Hardy
The previous replies are correct if your metric is programmable processor clock cycles. In the hardware:

A floating point number consists of a mantissa (normalized fractional portion) and an exponent (the power of 2 of the number). When multiplying, the two mantissas are multiplied in a fashion similar to fixed-point multiplies, and the exponents are added. The result is then adjusted in its exponent to re-normalize the mantissa. The mantissa is generally normalized to lie on [0.5, 1.0), meaning two mantissas multiplied together will range on [0.25, 1.0); to re-normalize this result back to the [0.5, 1.0) range, there can be an extra 1-bit shift of the mantissa (with a corresponding decrement of the resultant exponent). However, standard floating point multipliers also check for floating-point overflow (exponent too large) and zero, which adds another level of logic at the output.

So: the mantissa multiply will be roughly of (relative) complexity M^2, where M = number of mantissa bits. The exponent add is of (relative) complexity N, where N = number of exponent bits. The single-bit shift, depending on how it's done, can be extremely simple, but let's call it complexity N because of the exponent decrement.

The addition requires that the two numbers be adjusted so they have the same exponent. This takes a compare of the two exponents (complexity N), a shift of the smaller number to match the larger number (complexity 2M), then an add of the mantissas (complexity M), a small-shift adjustment of the result (complexity N), plus the misc logic to check overflow and zero.

So, ignoring the output checking logic, a VERY rough estimate is that a floating point multiply is of complexity (M^2 + N + N), while a floating point add is of complexity (N + 2M + M + N). Based on your specific floating point format, you can then calculate your percentage comparison.

That being said, there are tricks to simplify this. For instance, the final single-bit adjust of the multiply output can be incorporated into the exponent add with some look-ahead logic. I also add the caveat that I am making a gross assumption that size / # of gates <--> delay.

Bryant Sorensen
DSP Platforms
Starkey Laboratories
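To make those steps concrete, here is a toy software model in C of the multiply and add paths described above. The format is made up for illustration (a 16-bit mantissa on [0.5, 1.0) plus a signed exponent, positive values only, no rounding and no overflow/zero checks); it is not IEEE 754.

#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Toy format: value = (mant / 2^16) * 2^exp, with mant in
 * [0x8000, 0xFFFF] so the fraction lies on [0.5, 1.0). */
typedef struct {
    uint16_t mant;
    int      exp;
} toyfloat;

static double toy_val(toyfloat f)
{
    return ldexp(f.mant / 65536.0, f.exp);
}

static toyfloat toy_mul(toyfloat a, toyfloat b)
{
    toyfloat r;
    /* Mantissa multiply (the ~M^2 part): product fraction is on [0.25, 1.0). */
    uint32_t p = (uint32_t)a.mant * b.mant;
    /* Exponent add (the ~N part). */
    r.exp = a.exp + b.exp;
    /* Re-normalize: at most one left shift, with an exponent decrement. */
    if (p < 0x80000000u) {
        p <<= 1;
        r.exp -= 1;
    }
    r.mant = (uint16_t)(p >> 16);   /* truncate; real hardware rounds */
    return r;
}

static toyfloat toy_add(toyfloat a, toyfloat b)
{
    toyfloat r, t;
    uint32_t ma, mb, s;
    int shift;
    /* Exponent compare (~N): make 'a' the operand with the larger exponent. */
    if (b.exp > a.exp) { t = a; a = b; b = t; }
    shift = a.exp - b.exp;
    /* Align the smaller operand (the ~2M barrel shift); 8 guard bits. */
    ma = (uint32_t)a.mant << 8;
    mb = (shift < 24) ? (((uint32_t)b.mant << 8) >> shift) : 0;
    /* Mantissa add (~M). */
    s = ma + mb;
    r.exp = a.exp;
    /* Small-shift adjustment (~N): the sum may reach [1.0, 2.0). */
    if (s >= (1u << 24)) {
        s >>= 1;
        r.exp += 1;
    }
    r.mant = (uint16_t)(s >> 8);
    return r;
}

int main(void)
{
    toyfloat a = { 0xC000, 0 };   /* 0.75            */
    toyfloat b = { 0xA000, 2 };   /* 0.625 * 4 = 2.5 */
    printf("mul: %g (expect %g)\n", toy_val(toy_mul(a, b)), 0.75 * 2.5);
    printf("add: %g (expect %g)\n", toy_val(toy_add(a, b)), 0.75 + 2.5);
    return 0;
}

Counting the commented steps against the rough complexity estimates above is exactly the percentage comparison the original question asks for, for this particular (toy) format.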