On Monday, April 27, 2015 at 5:19:52 PM UTC-7, robert bristow-johnson wrote:
> ...
> but i just cannot figure out how to do a sliding median without first
> sorting the buffer (or at least sorting it half way) and then picking
> the value halfway into the buffer (or midway between the bottom and the
> top of the sorted buffer).
> ...
> r b-j
What I've used is known as a 'dual heap algorithm'. Sometimes it now appears as 'streaming median'.
In:
http://stackoverflow.com/questions/10930732/c-efficiently-calculating-a-running-median
look under the algorithm called 'streaming median'
Note: the desired order is selected by how the heaps are initialized, by filling with min-representable and max-representable values.
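A minimal sketch of the dual-heap scheme, with a std::multiset standing in for each heap so the departing sample can be erased exactly (real implementations often use fixed-size heaps with lazy deletion instead; class and member names here are illustrative):

```cpp
#include <cstddef>
#include <deque>
#include <iterator>
#include <set>

// Sliding-window median via two sorted halves ("dual heap"):
// 'lo' holds the smaller half (its max is the lower median),
// 'hi' holds the larger half.  With the window full and N odd,
// the median is always the largest element of 'lo'.
class SlidingMedian {
    std::multiset<double> lo, hi;   // lo: smaller half, hi: larger half
    std::deque<double> window;      // chronological FIFO of the samples
    std::size_t N;                  // window length, assumed odd
public:
    explicit SlidingMedian(std::size_t n) : N(n) {}

    void push(double x) {
        window.push_back(x);
        // Insert the new sample into the correct half.
        if (lo.empty() || x <= *lo.rbegin()) lo.insert(x);
        else hi.insert(x);
        // Evict the oldest sample once the window is full.
        if (window.size() > N) {
            double old = window.front();
            window.pop_front();
            auto it = lo.find(old);
            if (it != lo.end()) lo.erase(it);  // erase ONE instance only
            else hi.erase(hi.find(old));
        }
        rebalance();
    }

    // Keep |lo| == |hi| or |hi| + 1, so the median is *lo.rbegin().
    void rebalance() {
        while (lo.size() > hi.size() + 1) {
            hi.insert(*lo.rbegin());
            lo.erase(std::prev(lo.end()));
        }
        while (hi.size() > lo.size()) {
            lo.insert(*hi.begin());
            hi.erase(hi.begin());
        }
    }

    double median() const { return *lo.rbegin(); }
};
```

Each push costs O(log N); with N odd and the window full, median() returns the middle sample (and the lower median while the window is still filling).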
Also:
IEEE Transactions on Signal Processing, vol. 39, no. 1, Jan. 1991, p. 204
"Comparison of algorithms for standard median filtering"
M. Juhola, J. Katajainen, T. Raita
Dept. of Computer Science, University of Turku, Finland
Abstract:
Standard median filtering, in which a median is repeatedly searched from a sample set that changes only slightly between subsequent searches, is discussed. Several well-known methods for solving this running median problem are reviewed, the (asymptotic) time complexities of the methods are analyzed, and simple variants are proposed which are especially suited for small sample sets, a frequent situation. Although the discussion is restricted to the one-dimensional case, the ideas are easily extended to higher dimensions.
Dale B. Dalrymple
Reply by glen herrmannsfeldt ● April 29, 2015
robert bristow-johnson <rbj@audioimagination.com> wrote:
> i also posted this to the programmers stack exchange. but comp.dsp
> peeps seem to know a lot about stuff that's broader than DSP.
> so i know how to do a fast O(log2(N)) sliding max or sliding min algorithm.
The other posts probably answer your question, but for the non-sliding
case there is an O(n) median algorithm in Knuth, "The Art of Computer
Programming", I believe volume 3.
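For that non-sliding case, C++'s std::nth_element performs exactly this kind of selection (typically introselect, expected linear time; the guaranteed-linear method is median-of-medians selection). A quick sketch:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Expected-O(n) selection of the median of an odd-length buffer,
// without fully sorting it.  nth_element partially reorders 'v' so
// that the element at index k is the one that would be there if the
// whole vector were sorted.
double median_of(std::vector<double> v) {
    std::size_t k = v.size() / 2;  // middle index, N odd
    std::nth_element(v.begin(), v.begin() + k, v.end());
    return v[k];
}
```

Note that nth_element takes the buffer by value here, since it reorders its input.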
-- glen
Reply by robert bristow-johnson ● April 28, 2015
On 4/28/15 5:43 PM, Marcel Mueller wrote:
> On 28.04.15 20.06, martinvicanek wrote:
>> For a sliding window median the fastest algorithms use histograms, as in
>> Huang et al. 1979
>> https://www.freelists.org/archives/ingsw3/11-2013/pdfNNRwXsvXQN.pdf.
>> In 1D
>> Huang's scheme is O(1), as updating the histogram is simpler than adding
>> data to an ordered list.
>
> Well, this only works if the domain of the samples is strictly limited.
> No big deal with 8 or 10 bit image data, no longer that efficient for 16
> bit data and complete garbage for 32 bit or floating point data. So I
> did not recommend it, since the OP did not specify the range of values.
at the time i wasn't worrying about the range of values. until today, i
really didn't imagine that it was salient to getting the median (i.e. i
thought the "compare" instruction worked independent of any implied
range). still trying to grok this range and precision thingie.
the only issue i was worrying about is pretty basic:
new sample, x[n], comes in,
old sample, x[n-N], is pushed off the edge,
N-1 of the N samples under consideration are the same as before,
what is the most efficient way (fewest instructions worst case) to
recalculate the median?
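A sketch of the O(N)-per-step baseline implied here: the two binary searches are O(log N), but the vector erase and insert each shift up to N elements, which is the cost everyone in this thread is trying to beat (names are illustrative):

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Naive sliding median: a FIFO for chronology plus a parallel sorted
// vector.  Each step costs O(N) worst case because of the shifts in
// erase() and insert(), even though the searches are O(log N).
struct NaiveSlidingMedian {
    std::deque<double> fifo;     // chronological order
    std::vector<double> sorted;  // same samples, sorted by value
    std::size_t N;
    explicit NaiveSlidingMedian(std::size_t n) : N(n) {}

    double step(double x) {
        if (fifo.size() == N) {  // x[n-N] falls off the edge
            double old = fifo.front();
            fifo.pop_front();
            sorted.erase(std::lower_bound(sorted.begin(), sorted.end(), old));
        }
        fifo.push_back(x);
        sorted.insert(std::upper_bound(sorted.begin(), sorted.end(), x), x);
        return sorted[(sorted.size() - 1) / 2];  // lower median
    }
};
```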
--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
Reply by Marcel Mueller ● April 28, 2015
On 28.04.15 20.06, martinvicanek wrote:
> For a sliding window median the fastest algorithms use histograms, as in
> Huang et al. 1979
> https://www.freelists.org/archives/ingsw3/11-2013/pdfNNRwXsvXQN.pdf. In 1D
> Huang's scheme is O(1), as updating the histogram is simpler than adding
> data to an ordered list.
Well, this only works if the domain of the samples is strictly limited.
No big deal with 8 or 10 bit image data, no longer that efficient for 16
bit data and complete garbage for 32 bit or floating point data. So I
did not recommend it, since the OP did not specify the range of values.
Marcel
Reply by Marcel Mueller ● April 28, 2015
On 28.04.15 21.19, robert bristow-johnson wrote:
>> Furthermore, you do not need two buffers. A single sorted one is
>> sufficient.
>
> then how do you know which sample in the sorted buffer is the one that
> falls offa the edge when the window is bumped over by one sample location?
No problem. See below.
> it seems to me to be more logical to maintain a list that is
> chronologically ordered (like a FIFO) and then link the values in that
> list to their neighbors in the hypothetical sorted list.
Of course, you need this kind of list. But there is no need to store
them in an array since all you need is forward iteration and append to
the end. So a linked list is perfectly sufficient.
>> All you need to do for this purpose is to add a pointer to
>> each element that links to the next sample respectively, building a
>> linked list of samples.
>
> so the samples are sorted by value, but they are linked (with a
> bidirectional linked list, i think) to their chronological neighbors?
A single pointer to the successor makes a unidirectional linked list.
You never need to know the predecessor.
>> Now all you need to do to remove a sample is to
>> take the pointer from the last removed element. Insert operations are
>> naturally ordered by sample value. But they need to update the link
>> pointer of the last sample.
>
> now, if we don't close up the hole created by removing the old sample
> and we just plop the new sample, without inserting, onto the end of the
> buffer, what keeps the buffer from just expanding forever?
The buffer is simply a pool of elements to be allocated from. Nothing
else. The position in the buffer has no meaning at all.
To iterate to sample n+1 you first determine the oldest sample to
remove, perform the required unlink operations on the tree structure.
Then you have a tree item in your buffer that could be discarded.
However, you will immediately need a new item to insert the next sample
into the tree, so just take the free slot and reuse it for the insert
operation. I.e. fill in the sample, set up the link pointer in the most
recently inserted item to point at the current slot, and finally insert
the slot into the tree structure.
So the buffer is just an array with N tree nodes. It will never grow nor
shrink.
If you have C++, a map<sample_value, element_type> with a custom
allocator that allocates from your array of map<...>::value_type should
do the job. element_type is just map<...>::iterator, i.e. the link to
the next sample. I am just not sure whether C++ accepts this recursive
type definition.
I am pretty sure that there are even faster algorithms possible. In fact
you do not need a sorted structure of all sample values in the window
all the time. It is likely that extreme sample values will never
contribute to the median. So in fact you only need the set of samples
greater than the median and the set of samples less than the median.
However, when the median moves the set has to be partially sorted to
determine the minimum or maximum value depending on the move direction.
This could be cheaper than sorting the entire array all the time, at
least amortized.
Another idea might be to use a fully balanced binary tree, e.g. an AVL
tree. In this case the root node of the tree is always just the median
and there are no further actions required to update the median pointer.
The expense is that you need to do more node rotations than with a
partially balanced tree.
Marcel
Reply by robert bristow-johnson ● April 28, 2015
On 4/28/15 2:06 PM, martinvicanek wrote:
> For a sliding window median the fastest algorithms use histograms, as in
> Huang et al. 1979
> https://www.freelists.org/archives/ingsw3/11-2013/pdfNNRwXsvXQN.pdf. In 1D
> Huang's scheme is O(1), as updating the histogram is simpler than adding
> data to an ordered list. More sophisticated schemes exist for 2D which are
> also O(1), e.g. https://nomis80.org/ctmf.pdf. For 8 bit resolution, say,
> you can use a histogram straight away, while for higher bit depths it is
> more efficient to maintain one coarse and one or more finer histograms.
downloaded both .pdf files. just started reading. never thought of
speed (paid for with approximation) based on lowered resolution before.
--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
Reply by robert bristow-johnson ● April 28, 2015
On 4/28/15 3:01 AM, Marcel Mueller wrote:
>
> Furthermore, you do not need two buffers. A single sorted one is
> sufficient.
then how do you know which sample in the sorted buffer is the one that
falls offa the edge when the window is bumped over by one sample location?
it seems to me to be more logical to maintain a list that is
chronologically ordered (like a FIFO) and then link the values in that
list to their neighbors in the hypothetical sorted list. but i don't
see how to do an efficient O(log2(N)) binary search to find the insert
or remove points without a parallel sorted array.
> All you need to do for this purpose is to add a pointer to
> each element that links to the next sample respectively, building a
> linked list of samples.
so the samples are sorted by value, but they are linked (with a
bidirectional linked list, i think) to their chronological neighbors?
> Now all you need to do to remove a sample is to
> take the pointer from the last removed element. Insert operations are
> naturally ordered by sample value. But they need to update the link
> pointer of the last sample.
now, if we don't close up the hole created by removing the old sample
and we just plop the new sample, without inserting, onto the end of the
buffer, what keeps the buffer from just expanding forever?
hmmmm, i can kinda see that this buffer would never expand larger than
twice the original size. but i still haven't completely grok'd this.
--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
Reply by martinvicanek ● April 28, 2015
For a sliding window median the fastest algorithms use histograms, as in
Huang et al. 1979
https://www.freelists.org/archives/ingsw3/11-2013/pdfNNRwXsvXQN.pdf. In 1D
Huang's scheme is O(1), as updating the histogram is simpler than adding
data to an ordered list. More sophisticated schemes exist for 2D which are
also O(1), e.g. https://nomis80.org/ctmf.pdf. For 8 bit resolution, say,
you can use a histogram straight away, while for higher bit depths it is
more efficient to maintain one coarse and one or more finer histograms.
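For 8-bit samples the histogram scheme might be sketched as follows. For simplicity this version rescans the 256 bins to locate the median each step (a bounded cost, independent of the window size N), whereas Huang's actual algorithm walks incrementally from the previous median:

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Sliding median over 8-bit samples via a 256-bin histogram.
// Updating the histogram is O(1) per step; locating the median
// costs at most a 256-bin scan, independent of N.
struct HistogramMedian {
    int hist[256] = {0};
    std::deque<std::uint8_t> window;
    std::size_t N;
    explicit HistogramMedian(std::size_t n) : N(n) {}

    std::uint8_t step(std::uint8_t x) {
        if (window.size() == N) {  // drop x[n-N]
            --hist[window.front()];
            window.pop_front();
        }
        ++hist[x];
        window.push_back(x);
        // Scan bins until half the window has been counted.
        std::size_t half = (window.size() + 1) / 2, acc = 0;
        int bin = 0;
        for (; bin < 256; ++bin) {
            acc += hist[bin];
            if (acc >= half) break;
        }
        return static_cast<std::uint8_t>(bin);
    }
};
```

For higher bit depths, the same bookkeeping applies per level of the coarse/fine histogram hierarchy mentioned above.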
---------------------------------------
Posted through http://www.DSPRelated.com
Reply by Marcel Mueller ● April 28, 2015
On 28.04.15 02.19, robert bristow-johnson wrote:
> but i just cannot figure out how to do a sliding median without first
> sorting the buffer (or at least sorting it half way) and then picking
> the value halfway into the buffer (or midway between the bottom and the
> top of the sorted buffer).
>
> given a finite-length buffer (length N) that is chronologically sorted
> (like a FIFO) and another with the same N values sorted by value, then
> when the buffer window "slides" by one and the oldest value falls offa
> the edge and a new one is input, to insert that new value into the
> sorted buffer, that is an O(N) operation (probabilistically).
>
> what's a faster way to do this?
You will never succeed with a sorted array as buffer, because the insert
and delete operations are too expensive. You will need a tree structure,
preferably a red-black tree.
Furthermore, you do not need two buffers. A single sorted one is
sufficient. All you need to do for this purpose is to add a pointer to
each element that links to the next sample respectively, building a
linked list of samples. Now all you need to do to remove a sample is to
take the pointer from the last removed element. Insert operations are
naturally ordered by sample value. But they need to update the link
pointer of the last sample. So you need to keep a pointer to the
previously added sample as well. The operation is O(M log N), with M the
number of processed samples and N the window size. Due to the nature of
RB-trees, worst-case complexity might be worse because of balancing, but
I am not that familiar with RB-Trees.
Of course, the nodes of your RB-Tree might be stored in an array of size
N. You need only to track the slot that was recently removed for reuse
by the new sample. So all you need is the array and a few pointers:
'head' (first sample), 'tail' (last sample for link updates), tree
'root' and 'median' to speed up median lookup. The median pointer has to
be moved left or right by one when the added and removed sample are on
different sides of the median. For this purpose I recommend using
stable sorting, because then you know that an insert with the same value
as the median is always to the right of the current median and a removed
one with the same value is to the left.
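A sketch of this scheme, assuming a full window of odd length N (at least 3): std::set plays the red-black tree, an arrival sequence number provides the stable tie-break recommended above, and a maintained iterator tracks the median (class and member names are illustrative):

```cpp
#include <deque>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

// Sliding median with a single balanced tree (std::set is typically a
// red-black tree) plus a maintained median iterator.  Ties are broken
// by an arrival sequence number, so an incoming sample equal in value
// to the median sorts to its right, and the departing one among equals
// is the leftmost, exactly the stable-sorting convention above.
class TreeMedian {
    using Key = std::pair<double, unsigned long>;  // (value, arrival seq)
    std::set<Key> tree;
    std::deque<std::set<Key>::iterator> fifo;      // chronological order
    std::set<Key>::iterator mid;                   // points at the median
    unsigned long seq = 0;
public:
    // Start with a full window of N samples, N odd and >= 3.
    explicit TreeMedian(const std::vector<double>& init) {
        for (double v : init)
            fifo.push_back(tree.insert({v, seq++}).first);
        mid = std::next(tree.begin(), tree.size() / 2);
    }

    double step(double x) {
        // Insert the new sample; a strictly smaller key shifts the
        // median one slot to the left.
        auto it = tree.insert({x, seq++}).first;
        if (*it < *mid) --mid;
        fifo.push_back(it);
        // Remove the oldest sample; removing the median itself, or a
        // strictly smaller key, shifts the median one slot to the right.
        auto old = fifo.front();
        fifo.pop_front();
        if (old == mid) { ++mid; tree.erase(old); }
        else if (*old < *mid) { tree.erase(old); ++mid; }
        else tree.erase(old);
        return mid->first;
    }
};
```

Each step is two O(log N) tree operations plus an O(1) median-pointer adjustment; the FIFO of iterators means no search is needed to find the sample that falls off the edge.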
Maybe there are other even more efficient algorithms around. This was
just the first one that came to mind, and it satisfies logarithmic
complexity.
Marcel
Reply by robert bristow-johnson ● April 27, 2015
i also posted this to the programmers stack exchange. but comp.dsp
peeps seem to know a lot about stuff that's broader than DSP.
so i know how to do a fast O(log2(N)) sliding max or sliding min algorithm.
Brookes: "Algorithms for Max and Min Filters with Improved Worst-Case
Performance", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND
DIGITAL SIGNAL PROCESSING, VOL. 47, NO. 9, SEPTEMBER 2000
but i just cannot figure out how to do a sliding median without first
sorting the buffer (or at least sorting it half way) and then picking
the value halfway into the buffer (or midway between the bottom and the
top of the sorted buffer).
given a finite-length buffer (length N) that is chronologically sorted
(like a FIFO) and another with the same N values sorted by value, then
when the buffer window "slides" by one and the oldest value falls offa
the edge and a new one is input, to insert that new value into the
sorted buffer, that is an O(N) operation (probabilistically).
what's a faster way to do this?
--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."