so this is sorta a specific question (and better suited to comp.dsp than dsp.stackexchange.com): there is this O(log2(N)) complexity max (or min) algorithm, perhaps originally from:

Brookes: "Algorithms for Max and Min Filters with Improved Worst-Case Performance", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 47, NO. 9, SEPTEMBER 2000

i have a very short C program for it (actually it's a min algorithm, but it's the same thing) at the bottom. there is an exclusive-or operation done on the index value which is necessary instead of incrementing. in the old Mot 56K i was able to do this modulo-2 address mode at every pair of samples in a buffer that started at an even address. the 56K *infers* its base address from the index address, but the SHARC makes you *load* a base address (which would be the same as the index but with the LSB set to zero, which is another bunch of operations), so it's not as efficient.

so, it appears that an assembly language version using the DAG registers judiciously won't be any better than the C program below. but in that C program, there will be pipelining overhead transferring an ALU register, where the index manipulation is done, to a DAG register where it will be used in table lookup.

Q1: anyone else use this algorithm?

Q2: have you any insight into how to efficiently pipeline this in SHARC assembly? not that i want specific code (but it wouldn't hurt). i know about the Fn = MAX(Fx, Fy) instruction. it's more about the index manipulation. it's like there will be 9 or 10 instructions per loop, and even if that is multiplying ceil(log2(N)) instead of N, if N is like 16, it gets a lot more expensive than, say, a decaying envelope which is a max() and a multiply.

sorry, my SHARC chops have long been atrophied.

--

r b-j rbj@audioimagination.com

"Imagination is more important than knowledge."
```c
#include <stdlib.h>

#define A_REALLY_LARGE_NUMBER 1.0e20

typedef struct {
    unsigned long filter_length;   // array_size/2 < filter_length <= array_size
    unsigned long array_size;      // must be a power of 2 for this simple implementation
    unsigned long input_index;     // the actual sample placement is at (array_size + input_index)
    float* big_array_base;         // the big array is malloc()ed separately and is actually twice array_size
} search_tree_array_data;

void initSearchArray(unsigned long filter_length, search_tree_array_data* array_data)
{
    array_data->filter_length = filter_length;
    array_data->array_size = 1;
    filter_length--;
    while (filter_length > 0) {
        array_data->array_size <<= 1;
        filter_length >>= 1;
    }
    // array_size is a power of 2 such that
    // filter_length <= array_size < 2*filter_length
    array_data->input_index = 0;
    array_data->big_array_base = (float*)malloc(sizeof(float)*2*array_data->array_size);
    // dunno what to do if malloc() fails.
    for (unsigned long n = 0; n < 2*array_data->array_size; n++) {
        array_data->big_array_base[n] = A_REALLY_LARGE_NUMBER;   // init array.
    }
    // big_array_base[0] is never used.
}

/*
 * findMinSample(new_sample, &array_data) will place new_sample into the circular
 * buffer in the latter half of the array pointed to by array_data.big_array_base.
 * it will then compare the value in new_sample to its "sibling" value, take the
 * minimum of the two, and then pop up one generation to the parent node where
 * this parent also has a sibling, and repeat the process. since the other parent
 * nodes already have the min value of their two child nodes, when getting to the
 * top-level parent node, this node will have the minimum value of all the samples
 * in the big array. the number of iterations of this loop is ceil(log2(filter_length)).
 */
float findMinSample(float value, search_tree_array_data* array_data)
{
    register float* big_array = array_data->big_array_base;
    register unsigned long index = array_data->array_size + array_data->input_index;
    // our main buffer is in the latter half of the big array.
    while (index > 1UL) {
        big_array[index] = value;
        register float sibling_value = big_array[index ^ 1UL];
        // the upper bits of the sibling address are the same.
        if (value > sibling_value) {
            value = sibling_value;   // use minimum of the two values
        }
        index >>= 1;   // parent address is index/2 (drop remainder or "sibling bit")
    }
    array_data->input_index++;
    if (array_data->input_index >= array_data->filter_length) {
        array_data->input_index = 0;
    }
    return value;
}

float getInput(void);
void putOutput(float);

int main(int argc, const char* argv[])
{
    search_tree_array_data array_data;
    unsigned long filter_length = 2048;   // filter_length need not be a power of 2

    initSearchArray(filter_length, &array_data);

    while (1) {   // loop forever
        float input_sample  = getInput();
        float output_sample = findMinSample(input_sample, &array_data);
        putOutput(output_sample);
    }
    return 0;
}
```

# fast sliding max algorithm in SHARC assembly.

Started by ● April 13, 2015

Reply by ● April 14, 2015

On 4/13/15 10:32 PM, robert bristow-johnson wrote:

> so this is sorta a specific question (and better suited to comp.dsp than
> dsp.stackexchange.com): there is this O(log2(N)) complexity max (or min)
> algorithm perhaps originally from:
>
> Brookes: "Algorithms for Max and Min Filters with Improved Worst-Case
> Performance" IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: ANALOG AND
> DIGITAL SIGNAL PROCESSING, VOL. 47, NO. 9, SEPTEMBER 2000
>
> i have a very short C program for it (actually it's a min algorithm, but
> it's the same thing) at the bottom. there is an exclusive-or operation
> done on the index value which is necessary instead of incrementing. in
> the old Mot 56K i was able to do this modulo-2 address mode at every pair
> of samples in a buffer that started at an even address. the 56K *infers*
> its base address from the index address, but the SHARC makes you *load*
> a base address (which would be the same as the index but with the LSB
> set to zero which is another bunch of operations), so it's not as
> efficient.

just to spark your memory: the SHARC uses 4 registers (index I, modifier M, length L, base B) to do circular addressing and it's more "general", you can place any size circular buffer and base it at any address:

B <= I < B + L

whereas the 56K uses 3 registers (index R, modifier N, length M+1) to do circular addressing. the base or bottom of the circular buffer is inferred from this formula:

base = 2^ceil(log2(M+1)) * floor( R / 2^ceil(log2(M+1)) )

so the base is at an integer multiple of a power of 2 that is at least as large as the length of the buffer. that's how the 56K infers that information from two of the existing three registers. i could use this to advantage to get to the "sibling" value in that max-tree algorithm that my C code depicts by setting M+1 to 2 and either incrementing or decrementing (didn't matter). so if one value was at an odd address, the sibling was at an even address where the upper bits were all identical.
in the C code we do that with the XOR operation on the LSB.

> so, it appears that an assembly language version using the DAG registers
> judiciously won't be any better than the C program below. but in that C
> program, there will be pipelining overhead transferring an ALU register,
> where the index manipulation is done, to a DAG register where it will be
> used in table lookup.
>
> Q1: anyone else use this algorithm?
>
> Q2: have you any insight in how to efficiently pipeline this in SHARC
> assembly? not that i want specific code (but it wouldn't hurt). i know
> about the Fn = MAX(Fx, Fy) instruction. it's more about the index
> manipulation. it's like there will be 9 or 10 instructions per loop and
> even if that is multiplying ceil(log2(N)) instead of N, if N is like 16,

i meant if log2(N) = 16, for N might be as big as 64K. so there could still be a good 150 instructions to do this...

> it gets a lot more expensive than, say a decaying envelope which is a
> max() and a multiply.

like a capacitor-filtered rectifier. simple and cheap.

> sorry, my SHARC chops have long been atrophied.

but my lamb chops are right on the money. much easier chopping up a lamb than a shark. lambs don't put up much of a fuss.

still hoping that someone has a wonderful insight that i do not.

--

r b-j rbj@audioimagination.com

"Imagination is more important than knowledge."