Q1.15 calculation

Started by Frank June 8, 2010
Hey guys,

I was wondering if you could help me optimize the code below (if possible):

R = (B*X);
tmp = (int16_t) (R>>15);
R = ((int32_t)tmp)+Z;
tmp = (int16_t)R;
R = (A*((int32_t)Y));
tmp2 = -(int16_t) (R>>15);
Rout=tmp+tmp2;

Rout is a SINT16 variable containing a Q1.15 value
A and B will be replaced by a number between -32768 and 32767
tmp and tmp2 are SINT16 variables; each containing a Q1.15 value
Z is a SINT32 variable containing a Q1.15 value
Y is a SINT16 variable containing a Q1.15 value
R is a SINT32 variable containing the result of an operation on two Q1.15 
values
X is a SINT32 variable containing a Q1.15 value

The above code looks too messy to me and I'm sure it can be optimized..Or 
maybe I am wrong?
Basically I just want to do some operations on Q1.15 numbers in a 32-bit 
cell.

Thank you in advance.


On Jun 8, 5:49 am, "Frank" <Fr...@invalidmail.com> wrote:
> Basically I just want to do some operations on Q1.15 numbers in a 32-bit
> cell.

so left-justify it and call it a Q1.31 number (where the 16 LSBs are zero). or right justify it and call it a Q16.16 number.

the code looks like C where you've typedeffed some things.

r b-j
On 06/08/2010 02:49 AM, Frank wrote:
> I was wondering if you could help me optimize the code below (if possible):
> [...]
> Basically I just want to do some operations on Q1.15 numbers in a 32-bit
> cell.
Is this C? Do you have to stick to C?

C really doesn't like Q1.anything -- its only native fixed-point data type is integer, and it sticks to it like glue. You can do fractional fixed-point arithmetic _much_ faster in assembly -- if I'm working with a processor that doesn't have really fast floating point math then I'll write a fractional math library for it -- it usually takes less than a day.

See chapter 10 of my book -- http://www.wescottdesign.com/actfes/actfes.html -- it presents a Q1.31 library for the x86; if you can understand assembly language programming at all it should make clear what you need to do for Q1.whatever math on your processor.

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com
> Is this C? Do you have to stick to C?
It is C and I have to stick to C.
> C really doesn't like Q1.anything -- it's only native fixed-point data type
> is integer, and it sticks to it like glue. You can do fractional
> fixed-point arithmetic _much_ faster in assembly -- if I'm working with a
> processor that doesn't have really fast floating point math then I'll write
> one -- it usually takes less than a day.
I was trying to do it with some defines like these:

#define TOQ15(x32) (((int16_t)(x32>>15)) & (int16_t)(((x32>>31)<<15) | 65535))

The idea here is to shift a 32 bit value 15 bits down and set bit 15 (the sign bit) correctly.

#define MULXY(x16,y16) TOQ15(((int32_t)x16)*((int32_t)y16))
#define ADDXY(x16,y16) TOQ15(((int32_t)x16)+((int32_t)y16))

Some basic operations. However, there seems to be an error in the above defines... Have to debug it...
> See chapter 10 of my book --
> http://www.wescottdesign.com/actfes/actfes.html -- it presents a Q1.31
> library for the x86; if you can understand assembly language programming
> at all it should make clear what you need to do for Q1.whatever math on
> your processor.
Thank you. I will have a look at it (even though I still have to do the operations in C)....
> so left-justify it and call it a Q1.31 number (where the 16 LSBs are
> zero). or right justify it and call it a Q16.16 number.
Yeah...i guess...but how does that answer my question? :o)
On 06/08/2010 09:55 AM, Frank wrote:
> [...]
> Thank you. I will have a look at it (even though I still have to do the
> operations in C)....
You can do it all in ANSI-C as well (and I show it in the book, although in the context of implementing Q1.31 using long long for arithmetic). I just make functions and a test framework rather than using defines -- even with function call overhead you'll still be way faster than floating point if the processor doesn't support it.

You're on the right track.

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com
> You're on the right track.
Worked on it and came up with these macros:

#define TOQ15(x32) (((x32>>31)<<15)|(x32&0x7fff))
#define MULXY(x32,y32) TOQ15(((x32*y32)>>15))

So when I execute this code:

int16_t x1=-24576; // representing -0.75 in Q1.15
int16_t x2=4915;   // representing 0.15 in Q1.15
int32_t acc=0;     // expected 16bit signed integer value -3687

acc=MULXY(x1,x2);

I get -3687 in acc.....

Any comments? Have I overlooked something?
On 06/08/2010 12:07 PM, Frank wrote:
> [...]
> I get -3687 in acc.....
>
> Any comments ? Have I overlooked something?
Test the hell out of it. In particular test it for x1 = x2 = 0x8000 -- that's an amazing number in two's complement, and it can cause much grief.

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com
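A minimal sketch of the kind of edge-case check suggested here. The helper name q15_mul_sat is hypothetical (it is not from the thread), and the code assumes that >> on a negative signed value is an arithmetic shift, which is implementation-defined in C but is what common compilers do. Saturating is only one possible way to handle 0x8000 * 0x8000.

```c
#include <stdint.h>

/* Hypothetical helper: Q1.15 multiply that clamps instead of wrapping.
   Assumes arithmetic right shift for negative values. */
static inline int16_t q15_mul_sat(int16_t a, int16_t b)
{
    /* 16x16 -> 32-bit product, rescaled back to 15 fractional bits */
    int32_t p = ((int32_t)a * (int32_t)b) >> 15;

    /* -1.0 * -1.0 gives +1.0 (32768), which Q1.15 cannot represent */
    if (p > INT16_MAX) p = INT16_MAX;
    if (p < INT16_MIN) p = INT16_MIN;
    return (int16_t)p;
}
```

With x1 = x2 = -32768 (0x8000) the rescaled product is +32768, one more than the largest Q1.15 value. The sketch clamps it to 32767; an unchecked cast to int16_t would typically wrap it to -32768 instead.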
"Frank" <Frank@invalidmail.com> writes:

> Worked on it and came up with these macros:
>
> #define TOQ15(x32) (((x32>>31)<<15)|(x32&0x7fff))
> #define MULXY(x32,y32) TOQ15(((x32*y32)>>15))
>
> [...]
>
> Any comments ? Have I overlooked something?
Hi Frank,

You're making a mess!

First note that the compiler promotes the 16x16 bit multiply to a 32-bit result in MULXY.

Then note that shifting a 32-bit value right 31 bits leaves only the sign. Since these are signed values, the compiler sign-extends the shift, so the result is either 0 or -1. Shifting 0 or -1 back left 15 bits gives you 0 or -32768, respectively. OR'ing that with the x32&0x7FFF buys you nothing.

So all the work is really done in the argument to TOQ15, x32*y32 >> 15, which is the right way to multiply two Q1.15 numbers and get a Q17.15 result.

--
Randy Yates                      % "So now it's getting late,
Digital Signal Labs              %    and those who hesitate
mailto://yates@ieee.org          %    got no one..."
http://www.digitalsignallabs.com % 'Waterfall', *Face The Music*, ELO
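The two points above can be sketched in plain C. The function names sign_mask and q15_mul are illustrative only and do not appear in the thread, and the code assumes an arithmetic right shift for negative values (implementation-defined in C, but usual in practice).

```c
#include <stdint.h>

/* What (x32 >> 31) evaluates to: 0 for non-negative x, -1 for negative x,
   assuming >> on a signed value sign-extends. */
static inline int32_t sign_mask(int32_t x)
{
    return x >> 31;
}

/* Q1.15 * Q1.15 -> Q17.15. The result stays in 32 bits, so the one
   out-of-range case, -1.0 * -1.0 = +1.0 (32768), is still representable. */
static inline int32_t q15_mul(int16_t a, int16_t b)
{
    return ((int32_t)a * (int32_t)b) >> 15;
}
```

Keeping the result as a 32-bit Q17.15 value defers the question of what to do when it does not fit in 16 bits to the point where it is stored.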
Tim Wescott <tim@seemywebsite.now> writes:
> [...]
> Is this C? Do you have to stick to C? C really doesn't like
> Q1.anything -- it's only native fixed-point data type is integer, and
> it sticks to it like glue.
Er, all fixed-point processing uses plain integer operations. Fixed-point is all in how you interpret things.

There is nothing inherently "non-fixed-point" in C. C is just fine for fixed-point processing.

--
Randy Yates                      % "Maybe one day I'll feel her cold embrace,
Digital Signal Labs              %    and kiss her interface,
mailto://yates@ieee.org          %    til then, I'll leave her alone."
http://www.digitalsignallabs.com % 'Yours Truly, 2095', *Time*, ELO
In article <4c0e124a$0$272$14726298@news.sunsite.dk>,
Frank <Frank@invalidmail.com> wrote:
>Hey guys,
>
>I was wondering if you could help me optimize the code below (if possible):
>
>R = (B*X);
>tmp = (int16_t) (R>>15);
>R = ((int32_t)tmp)+Z;
>tmp = (int16_t)R;
>R = (A*((int32_t)Y));
>tmp2 = -(int16_t) (R>>15);
>Rout=tmp+tmp2;
The first thing is to get rid of the casts. They don't generate code, they just suppress warnings. But you want to hear the warnings!

The second is to inspect the assembly code generated. The code must be the same after getting rid of the casts. Furthermore the assembler code may be much easier to understand.

Third, I can't believe that you supply this code without the declarations of R and tmp. But of course R is 32 bit and tmp is 16 bit. Make tmp 32 bit and we need not worry about a thing.

Fourth, extract the formula:

scale = 1<<15
result = (B*X)/scale - (A*Y)/scale + Z

So apparently we are talking about fixed point numbers. If we multiply them we need to scale them back, but this may require a double precision intermediate result.

In Forth this would be

\ Fixed point multiplication
: *F $8000 */ ;

\ And then the code looks like this:
: calc B X *F A Y *F - Z + ;
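For comparison, a rough C sketch of the extracted formula with 32-bit temporaries throughout. It follows the original post's code, where the A*Y term is negated before being added. The function name calc_rout is made up for illustration, and the sketch assumes all inputs are within the Q1.15 range (so each product fits in 32 bits) and that >> on negative values is an arithmetic shift.

```c
#include <stdint.h>

/* Illustrative only: Rout = (B*X >> 15) + Z - (A*Y >> 15),
   computed with 32-bit intermediates and no 16-bit temporaries.
   Inputs are assumed to lie in [-32768, 32767]. */
static inline int32_t calc_rout(int32_t a, int32_t b,
                                int32_t x, int32_t y, int32_t z)
{
    int32_t bx = (b * x) >> 15;   /* Q1.15 * Q1.15, rescaled */
    int32_t ay = (a * y) >> 15;   /* Q1.15 * Q1.15, rescaled */
    return bx + z - ay;           /* may exceed 16 bits; caller decides how to store */
}
```

Note that a shift rounds toward minus infinity while the division in the formula above rounds toward zero, so the two can differ by one LSB for negative products.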
>The above code looks too messy to me and I'm sure it can be optimized..Or
>maybe I am wrong?
This code looks messy, so surely it is "optimized".
>Basically I just want to do some operations on Q1.15 numbers in a 32-bit
>cell.
>
>Thank you in advance.
This triggers me to throw a rant that belongs in comp.lang.c

int sf( int B, int X )
{
    long R;
    int result;
    R = (B*X)/0x8000;
    result = R;
    return result;
}

This code has a defect and will generate the warning

result = R
conversion of a long to an int without a cast, might not fit.

This is stupid. If I store the result of a long in an int, that is what I want to do. Of course I have checked that it fits. Even more stupid is a management that wants a clean compile, without warnings, such that a cast is added to this line. The poor programmer wastes his time, and the defect is not cured.

The line that is wrong is

R = (B*X)/0x8000;

I have never seen the warning: Possible overflow on intermediate result.

Correct code would be (long calculations throughout)

R = B;
R *= X;
R /= 0x8000;

Groetjes Albert

--
-- Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
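The same "widen first, then multiply" idea, sketched with fixed-width types so that it does not depend on how large int and long happen to be on a given platform. The name scaled_product and the choice of 32-bit operands with a 64-bit intermediate are assumptions made for this illustration, not something from the post.

```c
#include <stdint.h>

/* Illustrative only: (b * x) / 0x8000 with the multiply done in 64 bits.
   Assigning b to the wide variable first makes every later step wide. */
static inline int32_t scaled_product(int32_t b, int32_t x)
{
    int64_t r = b;    /* widen before multiplying */
    r *= x;           /* cannot overflow: 32 x 32 bits fits in 64 */
    r /= 0x8000;      /* division truncates toward zero */
    return (int32_t)r; /* caller must know the result fits in 32 bits */
}
```

For example, 100000 * 100000 overflows a 32-bit intermediate, but here it yields 305175 after scaling. Because the division truncates toward zero, negative results can differ by one from a version that uses >> 15.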
glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:

> Randy Yates <yates@ieee.org> wrote:
> (snip)
>
>> Er, all fixed-point processing uses plain integer
>> operations. Fixed-point is all in how you interpret things.
>
> Yes, but you need different product bits for multiply,
Then tell me glen what's the difference between the result of a fixed-point product and an integer product?

--
Randy Yates                      % "My Shangri-la has gone away, fading like
Digital Signal Labs              %    the Beatles on 'Hey Jude'"
mailto://yates@ieee.org          %
http://www.digitalsignallabs.com % 'Shangri-La', *A New World Record*, ELO
Tim Wescott <tim@seemywebsite.now> wrote:
(snip, I wrote)

>> I believe that some Cray machines (not currently in production)
>> are 64 bit word addressed. But even that doesn't mean that
>> all the types need to be word sized.

>> There are C compilers for the 36 bit word addressed PDP-10 that
>> use, I believe, 9 bits for char. It is a little tricky, but
>> it can be done.

> It depends on how hard the compiler writer wants to work for it. In C,
> a pointer has to fit in a long, and it has to be able to point to a
> unique character. If you want to use a smaller element of storage than
> what the machine points to natively, you have to kludge up your own
> pointer representation, and use it _everywhere_, or have a non-compliant
> compiler.
C allows for a different representation for different pointers, as long as the casts work right. One possibility is to use high order bits for the character in word. The basic PDP-10 addressing is 18 bits in the low half of the word, so it isn't so hard to do.

-- glen
Randy Yates <yates@ieee.org> writes:
> [...]
I should also add that I use stdint.h extensively. It's the mostest wonderfullest thing that's happened to C in 3 decades. --Randy
On 6/9/10 7:52 PM, Tim Wescott wrote:
> [...]
> It depends on how hard the compiler writer wants to work for it. In C,
> a pointer has to fit in a long, and it has to be able to point to a
> unique character. If you want to use a smaller element of storage than
> what the machine points to natively, you have to kludge up your own
> pointer representation, and use it _everywhere_, or have a non-compliant
> compiler.
Long ago I used a 24-bit word addressable machine. It had 8-bit chars that were packed into a word. I don't remember exactly how it worked, but you could address each char. Somehow a 24-bit address indicated which char.

It was a pretty bizarre architecture. The Fortran compiler supported integer*3 and integer*6, but integer*6 wasn't twice the size of integer*3. Two words were used, but some bits weren't used, so the max size was less than 2^48.

This was long ago. I might have gotten the details wrong.

Ray
Randy Yates <yates@ieee.org> writes:

> There is an option on several for automatically shifting
> right by 1 after multiplying so the result is fractional.
> Is that what you mean?
Correction: left by 1.

--
Randy Yates                      % "She tells me that she likes me very much,
Digital Signal Labs              %    but when I try to touch, she makes it
mailto://yates@ieee.org          %    all too clear."
http://www.digitalsignallabs.com % 'Yours Truly, 2095', *Time*, ELO
Tim Wescott <tim@seemywebsite.now> writes:

> On 06/08/2010 06:58 PM, Randy Yates wrote:
>> Tim Wescott<tim@seemywebsite.now> writes:
>>> [...]
>>> Is this C? Do you have to stick to C? C really doesn't like
>>> Q1.anything -- it's only native fixed-point data type is integer, and
>>> it sticks to it like glue.
>>
>> Er, all fixed-point processing uses plain integer
>> operations. Fixed-point is all in how you interpret things.
>>
>> There is nothing inherently "non-fixed-point" in C. C is just fine for
>> fixed-point processing.
>
> In most processor instruction sets, multiplying two N-bit registers
> coughs up an 2*N bit answer. In most of the rest there's a "multiply
> and return high" and a "multiply and return low". In most of the slim
> remainder (getting fatter with all the fixed point DSP's out there)
> there is some fast way of getting one or the other part -- including
> many DSP chips that have a straight "multiply fractional and
> accumulate"
I've worked on the TI C5x, C55x, C64x, ADI SHARC 21369, and TigerSHARC TS201. I've not seen a "multiply fractional and accumulate" on these. Can you give me an example? Yes, the old Moto 56k did an automatic shift left by one. That's one out of how many? It's not the norm.
> or "multiply with shift and accumulate".
There is an option on several for automatically shifting right by 1 after multiplying so the result is fractional. Is that what you mean? If so, that's hardly a great reason for avoiding fixed-point in C. I'm the last person in the world that would advocate C over assembly. But in this case, it's usually justified.
> C discards the high half, and returns the low.
Are you really complaining this hard about having to do an explicit typecast? (See my other post tonight for an example.)
> For fractional arithmetic, you want the high half, shifted up one.
Big whup.
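The "high half, shifted up one" operation from the quoted line can be written in C roughly as follows. This is only a sketch: the name q15_mul_high is hypothetical, the shifting is done on an unsigned copy of the product to avoid left-shifting a negative value, and the final conversion back to int16_t relies on the usual two's-complement wrap (implementation-defined in C).

```c
#include <stdint.h>

/* Illustrative only: take the 32-bit product of two Q1.15 values,
   shift it up by one, and keep the high 16 bits. This selects
   bits 30..15 of the product, i.e. the same bits as (product >> 15). */
static inline int16_t q15_mul_high(int16_t a, int16_t b)
{
    /* explicit cast so the multiply is done in 32 bits */
    uint32_t p = (uint32_t)((int32_t)a * (int32_t)b);

    /* shift up one, keep the high half */
    uint16_t hi = (uint16_t)((p << 1) >> 16);

    return (int16_t)hi;  /* reinterpret as signed Q1.15 */
}
```

Whether a compiler turns this into a single fractional-multiply instruction depends on the target and the toolchain; the sketch makes no claim about that.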
> C is just fine for _integer_ processing, but wastes a lot of steps for
> fixed point _fractional_ processing.
Maybe one or two, and maybe not even that. It depends on how smart the compiler is, e.g., can it utilize the auto-left-shift mode present in some fixed-point processors. But in any case, I don't call that "a lot."

Also, there are many cases (most?) where you are not just doing one multiply and then socking away the result, but rather a series of multiplies. In those cases the extra shift is amortized over the whole block of instructions and is hardly worth noting.

One thing that *is* rather expensive in fixed-point processing that I've not heard mentioned is saturating. Yes, if you're implementing an equalizer with lots of saturates, shifts, etc., at a high sample rate in a tight inner loop on a small fixed-point DSP with power constraints, yada yada yada, it's going to be worth going to assembly. That's hardly the norm, though.

--
Randy Yates                      % "How's life on earth?
Digital Signal Labs              %  ... What is it worth?"
mailto://yates@ieee.org          % 'Mission (A World Record)',
http://www.digitalsignallabs.com % *A New World Record*, ELO
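To make the cost of saturating concrete, here is a small sketch of a saturating Q1.15 add in portable C. The name q15_add_sat is made up for this example. The two compares and branches per operation are the overhead being referred to; many DSPs do the same thing in hardware with a saturation mode.

```c
#include <stdint.h>

/* Illustrative only: add two Q1.15 values and clamp to the Q1.15 range
   instead of wrapping around on overflow. */
static inline int16_t q15_add_sat(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + (int32_t)b;  /* exact sum, needs 17 bits */

    if (s > INT16_MAX) return INT16_MAX;  /* clamp positive overflow */
    if (s < INT16_MIN) return INT16_MIN;  /* clamp negative overflow */
    return (int16_t)s;
}
```

Note that an add needs no rescaling shift at all, only the range check: both operands already share the same 15 fractional bits.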
On 06/09/2010 04:46 PM, glen herrmannsfeldt wrote:
> [...]
> I believe that some Cray machines (not currently in production)
> are 64 bit word addressed. But even that doesn't mean that
> all the types need to be word sized.
>
> There are C compilers for the 36 bit word addressed PDP-10 that
> use, I believe, 9 bits for char. It is a little tricky, but
> it can be done.
It depends on how hard the compiler writer wants to work for it. In C, a pointer has to fit in a long, and it has to be able to point to a unique character. If you want to use a smaller element of storage than what the machine points to natively, you have to kludge up your own pointer representation, and use it _everywhere_, or have a non-compliant compiler.

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com
Tim Wescott <tim@seemywebsite.now> wrote:
(snip, someone wrote)

>> Did you really mean a 64-bit machine uses 64 bits for short and int and
>> long? That's not true for every 64-bit machine. Don't most C compilers
>> for 64-bit machines have 16-bit shorts, 32-bit ints, and 64-bit longs?

> I was generalizing. Probably most C compilers for 64-bit byte-addressed
> machines will do it the way you said (plus 8-bit characters). But if
> it's a 64-bit word-addressed machine, then everything including chars
> will be 64-bit.
I believe that some Cray machines (not currently in production) are 64 bit word addressed. But even that doesn't mean that all the types need to be word sized.

There are C compilers for the 36 bit word addressed PDP-10 that use, I believe, 9 bits for char. It is a little tricky, but it can be done.

-- glen
On 06/09/2010 03:22 PM, Raymond Toy wrote:
> On 6/9/10 10:33 AM, Tim Wescott wrote:
>>
>> So generally on an 8- or 16-bit machine the tools will set short and int
>> to 16 bits, and long to 32. On a 64-bit machine, everything will be 64
>> bits.
>
> Did you really mean a 64-bit machine uses 64 bits for short and int and
> long? That's not true for every 64-bit machine. Don't most C compilers
> for 64-bit machines have 16-bit shorts, 32-bit ints, and 64-bit longs?
I was generalizing. Probably most C compilers for 64-bit byte-addressed machines will do it the way you said (plus 8-bit characters). But if it's a 64-bit word-addressed machine, then everything including chars will be 64-bit.

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com