It's been some time since I was last forced to use a LUT and interpolation to implement functions that the target architecture either didn't offer or offered only with poor performance. So, out of curiosity, I asked myself what the cleverest way I could come up with would be to implement, for example, the log function on the compact domain [1,2] in fixed-point arithmetic. I used Matlab to calculate the values for a 16-segment LUT (17 breakpoints), and then I wrote the C code to implement the logarithm (actually, I didn't like the disassembly very much, so in the end I wrote it directly in x86 assembler myself).

Since it's just a game after all, I don't have any special requirements, but if anyone has some insight into how to make it even faster (without a substantial increase in memory usage, of course), or how to keep it as fast but with a smaller memory footprint, I'd be glad to hear your comments. One thing I'm especially interested in is whether there's a different or cleverer way to suppress the conditional jump than the one I used.

Thanks,
kl31n

//---CODE
// The function calculates the natural logarithm of a 16-bit fixed-point number
// over the compact domain [1,2]. The input is unsigned with 14 fractional bits
// (16384 = 1.0, 32768 = 2.0); the result has 16 fractional bits
// (65536 = 1.0, so ln 2 is about 45426).
unsigned short log_fixed(unsigned short & x)
{
    // The last value is repeated so that x = 2.0 (index 16) interpolates
    // with a zero slope instead of needing a conditional jump
    static unsigned short LUT_Y[18] = {
            0, 3973, 7719,11262,
        14624,17821,20870,23783,
        26573,29248,31818,34292,
        36675,38975,41196,43345,
        45426,45426};
    static unsigned short LUT_X[17] = {
        16384,17408,18432,19456,
        20480,21504,22528,23552,
        24576,25600,26624,27648,
        28672,29696,30720,31744,
        32768};
    unsigned int accumulator;
/*
    unsigned char index;
    // The second operand of the OR maps x = 2.0 (0x8000) to index 16,
    // which avoids a conditional jump for the endpoint
    index = (unsigned char)(((x & 0x3FFF) >> 0xA) | ((x & 0x8000) >> 0xB));
    accumulator = (((LUT_Y[index+1] - LUT_Y[index]) * (x - LUT_X[index])) >> 10) + LUT_Y[index];
*/
    _asm{
        mov ebx,dword ptr [x]
        mov ax,word ptr [ebx]
        and eax,00003FFFh
        sar eax,0Ah
        mov dx,word ptr [ebx]
        and edx,00008000h
        sar edx,0Bh
        or eax,edx
        xor ecx, ecx
        mov cx,word ptr LUT_Y+2 [eax*2]
        mov dx,word ptr LUT_Y [eax*2]
        sub cx,dx
        mov dx,word ptr [ebx]
        mov bx,word ptr LUT_X [eax*2]
        sub dx,bx
        imul ecx,edx    ; both operands 32-bit; upper half of edx is 0 (edx held 0 or 16)
        sar ecx,0Ah
        mov dx,word ptr LUT_Y [eax*2]
        add ecx,edx
        mov dword ptr [accumulator],ecx
    }
    return((unsigned short) accumulator);
}
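The commented-out C formula can be simplified under the post's own assumptions. A minimal sketch, assuming the input always lies in [16384, 32768] (1.0 to 2.0 with 14 fractional bits): because every breakpoint in LUT_X is 16384 + 1024*i, subtracting 16384 once gives both the segment index (the bits above bit 9) and the position inside the segment (the low 10 bits). That reaches index 16 for x = 2.0 without a branch or the 0x8000 OR trick, and makes LUT_X unnecessary. The names lut_y and log_fixed_c are hypothetical, and the parameter is passed by value rather than by reference.

```c
#include <assert.h>

/* Hypothetical sketch, not the original poster's code.
   Input:  unsigned, 14 fractional bits, 16384 = 1.0 ... 32768 = 2.0.
   Output: ln(x), unsigned, 16 fractional bits, 65536 = 1.0. */

/* round(ln(1 + i/16) * 65536) for i = 0..16 (same values as LUT_Y).
   The last entry is repeated so that x == 2.0 (index 16) interpolates
   with a zero slope. */
static const unsigned short lut_y[18] = {
        0,  3973,  7719, 11262,
    14624, 17821, 20870, 23783,
    26573, 29248, 31818, 34292,
    36675, 38975, 41196, 43345,
    45426, 45426
};

static unsigned short log_fixed_c(unsigned short x)
{
    assert(x >= 16384u && x <= 32768u);   /* domain [1.0, 2.0] */

    /* Offset from 1.0: 0 .. 16384. Shifting right by 10 yields the
       segment index 0 .. 16 directly, with no branch. */
    unsigned int t = (unsigned int)x - 16384u;
    unsigned int index = t >> 10;

    /* Position inside the segment, 0 .. 1023. Since every breakpoint is
       16384 + 1024*i, this equals x - LUT_X[index]. */
    unsigned int frac = t & 0x3FFu;

    /* Linear interpolation; dy * frac fits easily in 32 bits. */
    unsigned int dy = (unsigned int)lut_y[index + 1] - lut_y[index];
    return (unsigned short)(((dy * frac) >> 10) + lut_y[index]);
}
```

This drops the 34-byte LUT_X table and the separate mask-and-OR on bit 15. Whether it actually beats the hand-written assembler on a particular CPU would have to be measured; the sketch only shows that the same results can be computed with less data.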
LUT and interpolation
Started May 21, 2007