DSPRelated.com

LUT and interpolation

Started by kl31n May 21, 2007
It's been some time since I last had to use a LUT with interpolation to
implement functions that the target architecture either didn't offer or
offered only with poor performance. So, out of curiosity, I asked myself
what the cleverest way would be to implement, for example, the log function
over the compact domain [1,2] in fixed-point arithmetic. I used Matlab to
calculate the values for a 16-band LUT and then wrote the C code to
implement the logarithm (actually, I didn't much like the disassembly, so in
the end I wrote it directly in x86 assembler myself). Since this is just a
game, I have no special requirements, but if someone has some insight into
how to make it even faster (without a substantial increase in memory usage,
of course), or how to keep it as fast but with a lower memory footprint, I'd
be glad to hear your comments. One thing I'm especially interested in is
whether there's a way to eliminate the conditional jump that is different
from, or cleverer than, how I did it.

Thanks,

kl31n
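
For reference, the table values in the code below can be reproduced with a few lines of C. This is only a sketch under assumptions read off the tables themselves: breakpoints every 1024 counts starting at 16384 (1.0), and outputs equal to ln(x) scaled by 65536 and rounded to nearest. The original values came from Matlab and the exact procedure isn't stated. The names `ln_series` and `build_log_tables` are made up for this example, and the series is used instead of `log()` from `<math.h>` only so the sketch needs no math library.

```c
#include <stdint.h>

/* ln(x) for x >= 1 through ln(x) = 2*atanh(z), z = (x-1)/(x+1).
   For x in [1,2], z <= 1/3, so the series converges very quickly. */
static double ln_series(double x)
{
    double z = (x - 1.0) / (x + 1.0);
    double z2 = z * z;
    double term = z;      /* z^(2k+1) */
    double sum = 0.0;
    for (int k = 0; k < 40; ++k) {
        sum += term / (2 * k + 1);
        term *= z2;
    }
    return 2.0 * sum;
}

/* Fill the 17 breakpoints and the 18 table values (hypothetical helper).
   Scale and rounding are assumptions inferred from the posted tables. */
static void build_log_tables(uint16_t lut_x[17], uint16_t lut_y[18])
{
    for (int i = 0; i <= 16; ++i) {
        lut_x[i] = (uint16_t)(16384 + 1024 * i);              /* 1 + i/16 */
        lut_y[i] = (uint16_t)(ln_series(1.0 + i / 16.0) * 65536.0 + 0.5);
    }
    lut_y[17] = lut_y[16];   /* duplicated last entry, as in the post */
}
```

Calling `build_log_tables(lx, ly)` on two local arrays should give the same numbers as LUT_X and LUT_Y in the listing.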

//---CODE

// The function calculates the natural logarithm of a 16-bit fixed-point
// number in 1QN format over the compact domain [1,2]. As the tables show,
// an input of 1.0 is 16384 and the result is ln(x) scaled by 65536.

unsigned short log_fixed(unsigned short & x) {

 // The repetition at the end of the LUT is needed to avoid conditional jumps.

 static unsigned short LUT_Y[18] = {     0, 3973, 7719,11262,
                                     14624,17821,20870,23783,
                                     26573,29248,31818,34292,
                                     36675,38975,41196,43345,
                                     45426,45426};
 static unsigned short LUT_X[17] = { 16384,17408,18432,19456,
                                     20480,21504,22528,23552,
                                     24576,25600,26624,27648,
                                     28672,29696,30720,31744,
                                     32768};

 unsigned int accumulator;

 /*
  unsigned char index;
  // The second operand of the OR is there to avoid using a conditional jump:
  // bit 15 is set only when x == 32768, and shifting it down gives index 16.
  index = (unsigned char)(((x & 0x3FFF) >> 0xA) | ((x & 0x8000) >> 0xB));
  accumulator = (((LUT_Y[index+1] - LUT_Y[index]) * (x - LUT_X[index])) >> 10)
                + LUT_Y[index];
 */

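 _asm{
  mov   ebx,dword ptr [x]            ; ebx = address of x (x is a reference)
  mov   ax,word ptr [ebx]
  and   eax,00003FFFh                ; keep the fractional bits 0..13
  sar   eax,0Ah                      ; eax = band number 0..15
  mov   dx,word ptr [ebx]
  and   edx,00008000h                ; bit 15 is set only when x == 32768
  sar   edx,0Bh                      ; ... and then becomes 16
  or    eax,edx                      ; eax = index, no conditional jump
  xor   ecx, ecx
  mov   cx,word ptr LUT_Y+2 [eax*2]  ; cx = LUT_Y[index+1]
  mov   dx,word ptr LUT_Y [eax*2]
  sub   cx,dx                        ; ecx = LUT_Y[index+1] - LUT_Y[index]
  mov   dx,word ptr [ebx]
  mov   bx,word ptr LUT_X [eax*2]
  sub   dx,bx                        ; edx = x - LUT_X[index], upper half is zero
  imul  ecx,edx                      ; 32-bit multiply of slope and offset
  sar   ecx,0Ah                      ; divide by the band width (1024)
  mov   dx,word ptr LUT_Y [eax*2]
  add   ecx,edx                      ; add LUT_Y[index]
  mov   dword ptr [accumulator],ecx
 }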

 return((unsigned short) accumulator);
}
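
For anyone who wants to try this without MSVC-style inline assembly, here is a minimal portable C sketch. `log_fixed_ref` mirrors the commented-out C expression above. `log_fixed_alt` is only a possible variation, not the method used in the post: it assumes the breakpoints are exactly 16384 + 1024*index, so both the index and the offset within the band can be taken from x - 16384 and LUT_X is not needed. Both function names are invented for this example.

```c
#include <stdint.h>

/* Tables copied from the listing above. */
static const uint16_t LUT_Y[18] = {
        0,  3973,  7719, 11262, 14624, 17821, 20870, 23783,
    26573, 29248, 31818, 34292, 36675, 38975, 41196, 43345,
    45426, 45426
};
static const uint16_t LUT_X[17] = {
    16384, 17408, 18432, 19456, 20480, 21504, 22528, 23552,
    24576, 25600, 26624, 27648, 28672, 29696, 30720, 31744,
    32768
};

/* Mirror of the commented-out C expression; valid for 16384 <= x <= 32768. */
static uint16_t log_fixed_ref(uint16_t x)
{
    /* Bits 10..13 pick the band; bit 15 (set only at x == 32768) maps to 16. */
    unsigned index = ((x & 0x3FFFu) >> 10) | ((x & 0x8000u) >> 11);
    unsigned slope = (unsigned)LUT_Y[index + 1] - LUT_Y[index];
    unsigned frac  = (unsigned)x - LUT_X[index];
    return (uint16_t)(((slope * frac) >> 10) + LUT_Y[index]);
}

/* Variation (an assumption, see the note above): no LUT_X, no OR trick. */
static uint16_t log_fixed_alt(uint16_t x)
{
    unsigned offset = (unsigned)x - 16384u;   /* 0 .. 16384 */
    unsigned index  = offset >> 10;           /* 0 .. 16 */
    unsigned frac   = offset & 0x3FFu;        /* distance into the band */
    unsigned slope  = (unsigned)LUT_Y[index + 1] - LUT_Y[index];
    return (uint16_t)(((slope * frac) >> 10) + LUT_Y[index]);
}
```

At x == 32768 the offset is 16384, so the index is 16 and the fractional part is 0, which is the same case the duplicated LUT_Y entry covers in the original. Whether this is actually faster than the assembly above would have to be measured on the target.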