Reply by robert bristow-johnson●April 14, 20152015-04-14
On 4/13/15 10:32 PM, robert bristow-johnson wrote:
>
> so this is sorta a specific question (and better suited to comp.dsp than
> dsp.stackexchange.com):
>
> there is this O(log2(N)) complexity max (or min) algorithm perhaps
> originally from:
>
> Brookes: "Algorithms for Max and Min Filters with Improved Worst-Case
> Performance" IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS�II: ANALOG AND
> DIGITAL SIGNAL PROCESSING, VOL. 47, NO. 9, SEPTEMBER 2000
>
> i have a very short C program for it (actually it's a min algorithm, but
> it's the same thing) at the bottom. there is an exclusive-or operation
> done on the index value which is necessary instead of incrementing. in
> the old Mot 56K i was able to do this modulo2 address mode at every pair
> of samples in a buffer that started at an even address. the 56K *infers*
> its base address from the index address, but the SHARC makes you *load*
> a base address (which would be the same as the index but with the LSB
> set to zero which is another bunch of operations), so it's not as
> efficient.
just to spark your memory: the SHARC uses 4 registers (index I, modifier
M, length L, base B) to do circular addressing and it's more "general",
you can place any size circular buffer and base it at any address:
B <= I < B + L
whereas the 56K uses 3 registers (index R, modifier N, length M+1) to do
circular addressing. the base or bottom of the circular buffer is
inferred from this formula:
base = 2^(ceil(log2(M+1)) * floor( R / 2^(ceil(log2(M+1)) )
so the base is at an integer multiple of a power of 2 that is at least
as large as the length of the buffer, that's how the 56K infers that
information from two of the existing three registers. i could use this
to advantage to get to the "sibling" value in that maxtree algorithm
that my C code depicts by setting M+1 to 2 and either incrementing or
decrementing (didn't matter). so if one value was at an odd address,
the sibling was at an even address where the upper bits were all
identical. in the C code we do that with the XOR operation on the LSB.
>
> so, it appears that an assembly language version using the DAG registers
> judiciously won't be any better than the C program below. but in that C
> program, there will be pipelining overhead transferring an ALU register,
> where the index manipulation is done, to a DAG register where it will be
> used in table lookup.
>
> Q1: anyone else use this algorithm?
>
> Q2: have you any insight in how to efficiently pipeline this in SHARC
> assembly? not that i want specific code (but it wouldn't hurt). i know
> about the Fn = MAX(Fx, Fy) instruction. it's more about the index
> manipulation. it's like there will be 9 or 10 instructions per loop and
> even if that is multiplying ceil(log2(N)) instead of N, if N is like 16,
i meant of log2(N) = 16 or N might be as big as 64K. so there could
still be a good 150 instructions to do this...
> it gets a lot more expensive than, say a decaying envelope which is a
> max() and a multiply.
like a capacitor-filtered rectifier. simple and cheap.
>
> sorry, my SHARC chops have long been atrophied.
but my lamb chops are right on the money. much easier chopping up a
lamb than a shark. lambs don't put up much of a fuss.
still hoping that someone has a wonderful insight that i do not.
--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
Reply by robert bristow-johnson●April 13, 20152015-04-13
so this is sorta a specific question (and better suited to comp.dsp than
dsp.stackexchange.com):
there is this O(log2(N)) complexity max (or min) algorithm perhaps
originally from:
Brookes: "Algorithms for Max and Min Filters with Improved Worst-Case
Performance" IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS�II: ANALOG AND
DIGITAL SIGNAL PROCESSING, VOL. 47, NO. 9, SEPTEMBER 2000
i have a very short C program for it (actually it's a min algorithm, but
it's the same thing) at the bottom. there is an exclusive-or operation
done on the index value which is necessary instead of incrementing. in
the old Mot 56K i was able to do this modulo2 address mode at every pair
of samples in a buffer that started at an even address. the 56K
*infers* its base address from the index address, but the SHARC makes
you *load* a base address (which would be the same as the index but with
the LSB set to zero which is another bunch of operations), so it's not
as efficient.
so, it appears that an assembly language version using the DAG registers
judiciously won't be any better than the C program below. but in that C
program, there will be pipelining overhead transferring an ALU register,
where the index manipulation is done, to a DAG register where it will be
used in table lookup.
Q1: anyone else use this algorithm?
Q2: have you any insight in how to efficiently pipeline this in SHARC
assembly? not that i want specific code (but it wouldn't hurt). i know
about the Fn = MAX(Fx, Fy) instruction. it's more about the index
manipulation. it's like there will be 9 or 10 instructions per loop and
even if that is multiplying ceil(log2(N)) instead of N, if N is like 16,
it gets a lot more expensive than, say a decaying envelope which is a
max() and a multiply.
sorry, by SHARC chops have long been atrophied.
--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
#define A_REALLY_LARGE_NUMBER 1.0e20
typedef struct
{
unsigned long filter_length; // array_size/2 <
filter_length <= array_size
unsigned long array_size; // must be power of 2 for
this simple implementation
unsigned long input_index; // the actual sample
placement is at (array_size + input_index);
float* big_array_base; // the big array is malloc()
separately and is actually twice array_size;
} search_tree_array_data;
void initSearchArray(unsigned long filter_length,
search_tree_array_data* array_data)
{
array_data->filter_length = filter_length;
array_data->array_size = 1;
filter_length--;
while (filter_length > 0)
{
array_data->array_size <<= 1;
filter_length >>= 1;
}
// array_size is a power of
2 such that
// filter_length <=
array_size < 2*filter_length
array_data->input_index = 0;
array_data->big_array_base =
(float*)malloc(sizeof(float)*2*array_data->array_size); // dunno
what to do if malloc() fails.
for (unsigned long n=0; n<2*array_data->array_size; n++)
{
array_data->big_array_base[n] = A_REALLY_LARGE_NUMBER;
// init array.
}
// array_base[0] is never used.
}
/*
* findMinSample(new_sample, &array_data) will place new_sample into
the circular
* buffer in the latter half of the array pointed to by
array_data.big_array_base .
* it will then compare the value in new_sample to its "sibling"
value, takes the
* minimum of the two and then pops up one generation to the parent
node where
* this parent also has a sibling and repeats the process. since the
other parent
* nodes already have the min value of the two child nodes, when
getting to the
* top-level parent node, this node will have the minimum value of
all the samples
* in the big_array. the number of iterations of this loop is
ceil(log2(filter_length)).
*/
float findMinSample(float value, search_tree_array_data* array_data)
{
register float* big_array = array_data->big_array_base;
register unsigned long index = array_data->array_size +
array_data->input_index; // our main buffer is in the latter half
of the big array.
while (index > 1UL)
{
big_array[index] = value;
register float sibling_value = big_array[index ^ 1UL];
// the upper bits of the sibling address are the same.
if (value > sibling_value)
{
value = sibling_value; // use
minimum of the two values
}
index >>= 1; // parent
address is index/2 (drop remainder or "sibling bit")
}
array_data->input_index++;
if (array_data->input_index >= array_data->filter_length)
{
array_data->input_index = 0;
}
return value;
}
float getInput(void);
void putOutput(float);
int main(int argc, const char* argv[])
{
search_tree_array_data array_data;
unsigned long filter_length = 2048; //
filter_length need not be a power of 2
initSearchArray(filter_length, &array_data);
while (1) // loop forever
{
float input_sample = getInput();
float output_sample = findMinSample(input_sample, &array_data);
putOutput(output_sample);
}
return 0;
}