Hey guys.

So far, I've posted on a forum and a mailing list with no results, and only just realized it's probably best to ask this question on a DSP-specialized group.

(Sorry that my first post is a question >_<)

I'm developing a high-fidelity, high-compression format [64:17 compression] that uses linear prediction to smooth the waves and 4-bit error correction [with a scalar per frame] to keep the sound closer to the original. It is made specifically for internet-friendly transmission of sound banks (think DLS files, but VERY much simplified).

The idea is to use up to 16 linear prediction models that best describe the original sound data.

What I'm doing at the moment is:
  -Break data up into sample frames [32 samples per frame]
  -Compute linear prediction coefficients for each frame and save
  -After I have all coefficients, merge all sets together into 16 sets by iterating through all the original coefficient sets and, for each one, finding the most closely related of the 16 target sets.
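
To make the first two steps concrete, here is a minimal Python sketch. It assumes the autocorrelation method with the Levinson-Durbin recursion, which is one common way to get LPC coefficients and not necessarily what the format uses. The predictor order (ORDER = 4) and all function names are placeholders chosen for illustration.

```python
FRAME_SIZE = 32  # samples per frame, as stated above
ORDER = 4        # hypothetical predictor order, for illustration only


def split_frames(samples, frame_size=FRAME_SIZE):
    """Break the sample stream into consecutive full frames (a short tail is dropped)."""
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, frame_size)]


def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], the autocorrelation of one frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return a[1..order] such that x_hat[t] = sum(a[k] * x[t - k])."""
    a = [0.0] * (order + 1)
    err = r[0]
    if err == 0.0:
        return a[1:]  # silent frame: all-zero predictor
    for i in range(1, order + 1):
        # reflection coefficient for this order
        k_i = (r[i] - sum(a[k] * r[i - k] for k in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k_i
        for k in range(1, i):
            new_a[k] = a[k] - k_i * a[i - k]
        a = new_a
        err *= 1.0 - k_i * k_i
        if err <= 0.0:
            break  # frame is already perfectly predicted
    return a[1:]


def lpc_per_frame(samples, order=ORDER):
    """One coefficient set per 32-sample frame."""
    return [levinson_durbin(autocorrelation(frame, order), order)
            for frame in split_frames(samples)]
```

Calling lpc_per_frame(samples) on a list of sample values gives one list of ORDER coefficients per frame; those lists are the "source coefficients" that the merge step below has to reduce to 16.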

Pseudo-code:

srcCoef    = Source coefficients
dstCoef    = Target coefficients (assumed to be 0 initialized)
numSrcCoef = Number of source coefficients (variable, depends on file)
numDstCoef = Number of destination coefficients (16)

for(i = 0 ; i < numSrcCoef ; i++) {
  bestPos      = 0;
  bestRelation = 0;
  for(j = 0; j < numDstCoef ; j++) {
    // find relation
    // dot(x,y) returns the normalized dot product
    // I'm using abs instead of acos because they both
    // decay and have no practical difference in calculating
    // the highest value
    relation = abs(dot(srcCoef[i], dstCoef[j]));

    // find best relation
    if(relation > bestRelation) bestPos = j, bestRelation = relation;
  }

  // average coefficients
  dstCoef[bestPos] += srcCoef[i];
  dstCoef[bestPos] /= 2;
}
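
For anyone who wants to run it, here is a direct Python transcription of the loop above as a minimal sketch. One assumption is added that the pseudo-code does not spell out: the normalized dot product is taken to be 0 when either vector is all zeros (otherwise it would divide by zero on the zero-initialized targets). The names normalized_dot and merge_coefs are placeholders.

```python
import math


def normalized_dot(x, y):
    """Cosine of the angle between x and y.

    Assumption: returns 0.0 if either vector is all zeros.
    """
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / norm


def merge_coefs(src_coef, num_dst=16):
    """Transcription of the pseudo-code; returns (dst_coef, assignments)."""
    order = len(src_coef[0])
    dst_coef = [[0.0] * order for _ in range(num_dst)]  # 0 initialized
    assignments = []  # which target slot each source set was merged into
    for src in src_coef:
        best_pos, best_relation = 0, 0.0
        for j, dst in enumerate(dst_coef):
            relation = abs(normalized_dot(src, dst))
            if relation > best_relation:
                best_pos, best_relation = j, relation
        # average coefficients: new = (old + src) / 2
        dst_coef[best_pos] = [(d + s) / 2
                              for d, s in zip(dst_coef[best_pos], src)]
        assignments.append(best_pos)
    return dst_coef, assignments
```

Under that zero-vector assumption, running this shows two things about the loop as written. An empty (all-zero) target never scores above 0, so no source set is ever assigned to slots 1 to 15 and everything ends up merged into slot 0. And because each merge halves the existing target, the result is not a true mean: the most recent source set carries weight 1/2, the one before it 1/4, and so on.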

I'm not entirely sure I'm doing the last step (the merge into 16 sets) correctly.

What would you guys do to reduce the hundreds [sometimes a couple of thousand] of coefficient pairs to a maximum of 16?