On Monday, April 27, 2015 at 5:19:52 PM UTC-7, robert bristow-johnson wrote:
> ...
> but i just cannot figure out how to do a sliding median without first
> sorting the buffer (or at least sorting it half way) and then picking
> the value halfway into the buffer (or midway between the bottom and the
> top of the sorted buffer).
> ...
> r b-j
What I've used is known as a 'dual heap algorithm'. Sometimes it now appears as 'streaming median'.
In:
http://stackoverflow.com/questions/10930732/c-efficiently-calculating-a-running-median
look under the algorithm called 'streaming median'
Note: the desired order is selected by how the heaps are initialized, by filling with min-representable and max-representable values.
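A minimal sketch of the dual-heap scheme, with a std::multiset standing in for each heap so the departing sample can be erased exactly (real implementations often use fixed-size heaps with lazy deletion instead; class and member names here are illustrative):

```cpp
#include <cstddef>
#include <deque>
#include <iterator>
#include <set>

// Sliding-window median via two sorted halves ("dual heap"):
// 'lo' holds the smaller half (its max is the lower median),
// 'hi' holds the larger half.  With the window full and N odd,
// the median is always the largest element of 'lo'.
class SlidingMedian {
    std::multiset<double> lo, hi;   // lo: smaller half, hi: larger half
    std::deque<double> window;      // chronological FIFO of the samples
    std::size_t N;                  // window length, assumed odd
public:
    explicit SlidingMedian(std::size_t n) : N(n) {}

    void push(double x) {
        window.push_back(x);
        // Insert the new sample into the correct half.
        if (lo.empty() || x <= *lo.rbegin()) lo.insert(x);
        else hi.insert(x);
        // Evict the oldest sample once the window is full.
        if (window.size() > N) {
            double old = window.front();
            window.pop_front();
            auto it = lo.find(old);
            if (it != lo.end()) lo.erase(it);  // erase ONE instance only
            else hi.erase(hi.find(old));
        }
        rebalance();
    }

    // Keep |lo| == |hi| or |hi| + 1, so the median is *lo.rbegin().
    void rebalance() {
        while (lo.size() > hi.size() + 1) {
            hi.insert(*lo.rbegin());
            lo.erase(std::prev(lo.end()));
        }
        while (hi.size() > lo.size()) {
            lo.insert(*hi.begin());
            hi.erase(hi.begin());
        }
    }

    double median() const { return *lo.rbegin(); }
};
```

Each push costs O(log N); with N odd and the window full, median() returns the middle sample (and the lower median while the window is still filling).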
Also:
IEEE Transactions on Signal Processing, vol. 39, no. 1, Jan. 1991, p. 204
"Comparison of algorithms for standard median filtering"
M. Juhola, J. Katajainen, T. Raita
Dept. of Computer Science, University of Turku, Finland
Abstract:
Standard median filtering, in which a median is repeatedly searched from a sample set that changes only slightly between subsequent searches, is discussed. Several well-known methods for solving this running median problem are reviewed, the (asymptotic) time complexities of the methods are analyzed, and simple variants are proposed which are especially suited for small sample sets, a frequent situation. Although the discussion is restricted to the one-dimensional case, the ideas are easily extended to higher dimensions.
Dale B. Dalrymple
Reply by glen herrmannsfeldt ● April 29, 2015
robert bristow-johnson <rbj@audioimagination.com> wrote:
> i also posted this to the programmers stack exchange. but comp.dsp
> peeps seem to know a lot about stuff that's broader than DSP.
> so i know how to do a fast O(log2(N)) sliding max or sliding min algorithm.
The other posts probably answer your question, but for the non-sliding
case there is an O(n) median algorithm in Knuth, "The Art of Computer
Programming", I believe volume 3.
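For that non-sliding case, C++'s std::nth_element performs exactly this kind of selection (typically introselect, expected linear time; the guaranteed-linear method is median-of-medians selection). A quick sketch:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Expected-O(n) selection of the median of an odd-length buffer,
// without fully sorting it.  nth_element partially reorders 'v' so
// that the element at index k is the one that would be there if the
// whole vector were sorted.
double median_of(std::vector<double> v) {
    std::size_t k = v.size() / 2;  // middle index, N odd
    std::nth_element(v.begin(), v.begin() + k, v.end());
    return v[k];
}
```

Note that nth_element takes the buffer by value here, since it reorders its input.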
-- glen
Reply by robert bristow-johnson ● April 28, 2015
On 4/28/15 5:43 PM, Marcel Mueller wrote:
> On 28.04.15 20.06, martinvicanek wrote:
>> For a sliding window median the fastest algorithms use histograms, as in
>> Huang et al. 1979
>> https://www.freelists.org/archives/ingsw3/11-2013/pdfNNRwXsvXQN.pdf.
>> In 1D
>> Huang's scheme is O(1), as updating the histogram is simpler than adding
>> data to an ordered list.
>
> Well, this only works if the domain of the samples is strictly limited.
> No big deal with 8 or 10 bit image data, no longer that efficient for 16
> bit data and complete garbage for 32 bit or floating point data. So I
> did not recommend it, since the OP did not specify the range of values.
at the time i wasn't worrying about the range of values. until today, i
really didn't imagine that it was salient to getting the median (i.e. i
thought the "compare" instruction worked independent of any implied
range). still trying to grok this range and precision thingie.
the only issue i was worrying about is pretty basic:
new sample, x[n], comes in,
old sample, x[n-N], is pushed off the edge,
N-1 of the N samples under consideration are the same as before,
what is the most efficient way (fewest instructions worst case) to
recalculate the median?
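A sketch of the O(N)-per-step baseline implied here: the two binary searches are O(log N), but the vector erase and insert each shift up to N elements, which is the cost everyone in this thread is trying to beat (names are illustrative):

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Naive sliding median: a FIFO for chronology plus a parallel sorted
// vector.  Each step costs O(N) worst case because of the shifts in
// erase() and insert(), even though the searches are O(log N).
struct NaiveSlidingMedian {
    std::deque<double> fifo;     // chronological order
    std::vector<double> sorted;  // same samples, sorted by value
    std::size_t N;
    explicit NaiveSlidingMedian(std::size_t n) : N(n) {}

    double step(double x) {
        if (fifo.size() == N) {  // x[n-N] falls off the edge
            double old = fifo.front();
            fifo.pop_front();
            sorted.erase(std::lower_bound(sorted.begin(), sorted.end(), old));
        }
        fifo.push_back(x);
        sorted.insert(std::upper_bound(sorted.begin(), sorted.end(), x), x);
        return sorted[(sorted.size() - 1) / 2];  // lower median
    }
};
```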
--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
Reply by Marcel Mueller ● April 28, 2015
On 28.04.15 20.06, martinvicanek wrote:
> For a sliding window median the fastest algorithms use histograms, as in
> Huang et al. 1979
> https://www.freelists.org/archives/ingsw3/11-2013/pdfNNRwXsvXQN.pdf. In 1D
> Huang's scheme is O(1), as updating the histogram is simpler than adding
> data to an ordered list.
Well, this only works if the domain of the samples is strictly limited.
No big deal with 8 or 10 bit image data, no longer that efficient for 16
bit data and complete garbage for 32 bit or floating point data. So I
did not recommend it, since the OP did not specify the range of values.
Marcel
Reply by Marcel Mueller ● April 28, 2015
On 28.04.15 21.19, robert bristow-johnson wrote:
>> Furthermore, you do not need two buffers. A single sorted one is
>> sufficient.
>
> then how do you know which sample in the sorted buffer is the one that
> falls offa the edge when the window is bumped over by one sample location?
No problem. See below.
> it seems to me to be more logical to maintain a list that is
> chronologically ordered (like a FIFO) and then link the values in that
> list to their neighbors in the hypothetical sorted list.
Of course, you need this kind of list. But there is no need to store
them in an array since all you need is forward iteration and append to
the end. So a linked list is perfectly sufficient.
>> All you need to do for this purpose is to add a pointer to
>> each element that links to the next sample respectively, building a
>> linked list of samples.
>
> so the samples are sorted by value, but they are linked (with a
> bidirectional linked list, i think) to their chronological neighbors?
A single pointer to the successor makes a unidirectional linked list.
You never need to know the predecessor.
>> Now all you need to do to remove a sample is to
>> take the pointer from the last removed element. Insert operations are
>> naturally ordered by sample value. But they need to update the link
>> pointer of the last sample.
>
> now, if we don't close up the hole created by removing the old sample
> and we just plop the new sample, without inserting, onto the end of the
> buffer, what keeps the buffer from just expanding forever?
The buffer is simply a pool of elements to be allocated from. Nothing
else. The position in the buffer has no meaning at all.
To iterate to sample n+1 you first determine the oldest sample to
remove, perform the required unlink operations on the tree structure.
Then you have a tree item in your buffer that could be discarded.
However, you will immediately need a new item to insert the next sample
into the tree, so just take the free slot and reuse it for the insert
operation. I.e. fill in the sample, set up the link pointer in the most
recently inserted item to point at the current slot, and finally insert
the slot into the tree structure.
So the buffer is just an array with N tree nodes. It will never grow nor
shrink.
If you have C++, a map<sample_value, element_type> with a custom
allocator that allocates from your array of map<...>::value_type should
do the job. element_type is just map<...>::iterator, i.e. the link to
the next sample. I am just not sure whether C++ accepts this recursive
type definition.
I am pretty sure that there are even faster algorithms possible. In fact
you do not need a sorted structure of all sample values in the window
all the time. It is likely that extreme sample values will never
contribute to the median. So in fact you only need the set of samples
greater than the median and the set of samples less than the median.
However, when the median moves the set has to be partially sorted to
determine the minimum or maximum value depending on the move direction.
This could be cheaper than sorting the entire array all the time, at
least amortized.
Another idea might be to use a fully balanced binary tree, e.g. an AVL
tree. In this case the root node of the tree is always just the median
and there are no further actions required to update the median pointer.
The expense is that you need to do more node rotations than with a
partially balanced tree.
Marcel
Reply by robert bristow-johnson ● April 28, 2015
On 4/28/15 2:06 PM, martinvicanek wrote:
> For a sliding window median the fastest algorithms use histograms, as in
> Huang et al. 1979
> https://www.freelists.org/archives/ingsw3/11-2013/pdfNNRwXsvXQN.pdf. In 1D
> Huang's scheme is O(1), as updating the histogram is simpler than adding
> data to an ordered list. More sophisticated schemes exist for 2D which are
> also O(1), e.g. https://nomis80.org/ctmf.pdf. For 8 bit resolution, say,
> you can use a histogram straight away, while for higher bit depths it is
> more efficient to maintain one coarse and one or more finer histograms.
downloaded both .pdf files. just started reading. never thought of
speed (paid for with approximation) based on lowered resolution before.
--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
Reply by robert bristow-johnson ● April 28, 2015
On 4/28/15 3:01 AM, Marcel Mueller wrote:
>
> Furthermore, you do not need two buffers. A single sorted one is
> sufficient.
then how do you know which sample in the sorted buffer is the one that
falls offa the edge when the window is bumped over by one sample location?
it seems to me to be more logical to maintain a list that is
chronologically ordered (like a FIFO) and then link the values in that
list to their neighbors in the hypothetical sorted list. but i don't
see how to do an efficient O(log2(N)) binary search to find the insert
or remove points without a parallel sorted array.
> All you need to do for this purpose is to add a pointer to
> each element that links to the next sample respectively, building a
> linked list of samples.
so the samples are sorted by value, but they are linked (with a
bidirectional linked list, i think) to their chronological neighbors?
> Now all you need to do to remove a sample is to
> take the pointer from the last removed element. Insert operations are
> naturally ordered by sample value. But they need to update the link
> pointer of the last sample.
now, if we don't close up the hole created by removing the old sample
and we just plop the new sample, without inserting, onto the end of the
buffer, what keeps the buffer from just expanding forever?
hmmmm, i can kinda see that this buffer would never expand larger than
twice the original size. but i still haven't completely grok'd this.
--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
Reply by martinvicanek ● April 28, 2015
For a sliding window median the fastest algorithms use histograms, as in
Huang et al. 1979
https://www.freelists.org/archives/ingsw3/11-2013/pdfNNRwXsvXQN.pdf. In 1D
Huang's scheme is O(1), as updating the histogram is simpler than adding
data to an ordered list. More sophisticated schemes exist for 2D which are
also O(1), e.g. https://nomis80.org/ctmf.pdf. For 8 bit resolution, say,
you can use a histogram straight away, while for higher bit depths it is
more efficient to maintain one coarse and one or more finer histograms.
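For 8-bit samples the histogram scheme might be sketched as follows. For simplicity this version rescans the 256 bins to locate the median each step (a bounded cost, independent of the window size N), whereas Huang's actual algorithm walks incrementally from the previous median:

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Sliding median over 8-bit samples via a 256-bin histogram.
// Updating the histogram is O(1) per step; locating the median
// costs at most a 256-bin scan, independent of N.
struct HistogramMedian {
    int hist[256] = {0};
    std::deque<std::uint8_t> window;
    std::size_t N;
    explicit HistogramMedian(std::size_t n) : N(n) {}

    std::uint8_t step(std::uint8_t x) {
        if (window.size() == N) {  // drop x[n-N]
            --hist[window.front()];
            window.pop_front();
        }
        ++hist[x];
        window.push_back(x);
        // Scan bins until half the window has been counted.
        std::size_t half = (window.size() + 1) / 2, acc = 0;
        int bin = 0;
        for (; bin < 256; ++bin) {
            acc += hist[bin];
            if (acc >= half) break;
        }
        return static_cast<std::uint8_t>(bin);
    }
};
```

For higher bit depths, the same bookkeeping applies per level of the coarse/fine histogram hierarchy mentioned above.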
---------------------------------------
Posted through http://www.DSPRelated.com
Reply by Marcel Mueller ● April 28, 2015
On 28.04.15 02.19, robert bristow-johnson wrote:
> but i just cannot figure out how to do a sliding median without first
> sorting the buffer (or at least sorting it half way) and then picking
> the value halfway into the buffer (or midway between the bottom and the
> top of the sorted buffer).
>
> given a finite-length buffer (length N) that is chronologically sorted
> (like a FIFO) and another with the same N values sorted by value, then
> when the buffer window "slides" by one and the oldest value falls offa
> the edge and a new one is input, to insert that new value into the
> sorted buffer, that is an O(N) operation (probabilistically).
>
> what's a faster way to do this?
You will never succeed with a sorted array as buffer, because the insert
and delete operations are too expensive. You will need a tree structure,
preferably a red-black tree.
Furthermore, you do not need two buffers. A single sorted one is
sufficient. All you need to do for this purpose is to add a pointer to
each element that links to the next sample respectively, building a
linked list of samples. Now all you need to do to remove a sample is to
take the pointer from the last removed element. Insert operations are
naturally ordered by sample value. But they need to update the link
pointer of the last sample. So you need to keep a pointer to the
previously added sample as well. The operation is O(M log N), with M the
number of processed samples and N the window size. Due to the nature of
RB-trees, worst-case complexity might be worse because of balancing, but
I am not that familiar with RB-Trees.
Of course, the nodes of your RB-Tree might be stored in an array of size
N. You need only to track the slot that was recently removed for reuse
by the new sample. So all you need is the array and a few pointers:
'head' (first sample), 'tail' (last sample for link updates), tree
'root' and 'median' to speed up median lookup. The median pointer has to
be moved left or right by one when the added and removed sample are on
different sides of the median. For this purpose I recommend using
stable sorting, because then you know that an insert with the same value
as the median is always to the right of the current median and a removed
one with the same value is to the left.
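A sketch of this scheme, assuming a full window of odd length N (at least 3): std::set plays the red-black tree, an arrival sequence number provides the stable tie-break recommended above, and a maintained iterator tracks the median (class and member names are illustrative):

```cpp
#include <deque>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

// Sliding median with a single balanced tree (std::set is typically a
// red-black tree) plus a maintained median iterator.  Ties are broken
// by an arrival sequence number, so an incoming sample equal in value
// to the median sorts to its right, and the departing one among equals
// is the leftmost, exactly the stable-sorting convention above.
class TreeMedian {
    using Key = std::pair<double, unsigned long>;  // (value, arrival seq)
    std::set<Key> tree;
    std::deque<std::set<Key>::iterator> fifo;      // chronological order
    std::set<Key>::iterator mid;                   // points at the median
    unsigned long seq = 0;
public:
    // Start with a full window of N samples, N odd and >= 3.
    explicit TreeMedian(const std::vector<double>& init) {
        for (double v : init)
            fifo.push_back(tree.insert({v, seq++}).first);
        mid = std::next(tree.begin(), tree.size() / 2);
    }

    double step(double x) {
        // Insert the new sample; a strictly smaller key shifts the
        // median one slot to the left.
        auto it = tree.insert({x, seq++}).first;
        if (*it < *mid) --mid;
        fifo.push_back(it);
        // Remove the oldest sample; removing the median itself, or a
        // strictly smaller key, shifts the median one slot to the right.
        auto old = fifo.front();
        fifo.pop_front();
        if (old == mid) { ++mid; tree.erase(old); }
        else if (*old < *mid) { tree.erase(old); ++mid; }
        else tree.erase(old);
        return mid->first;
    }
};
```

Each step is two O(log N) tree operations plus an O(1) median-pointer adjustment; the FIFO of iterators means no search is needed to find the sample that falls off the edge.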
Maybe there are other even more efficient algorithms around. This was
just the first one that came to mind, and it satisfies logarithmic
complexity.
Marcel
Reply by robert bristow-johnson ● April 27, 2015
i also posted this to the programmers stack exchange. but comp.dsp
peeps seem to know a lot about stuff that's broader than DSP.
so i know how to do a fast O(log2(N)) sliding max or sliding min algorithm.
Brookes: "Algorithms for Max and Min Filters with Improved Worst-Case
Performance", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND
DIGITAL SIGNAL PROCESSING, VOL. 47, NO. 9, SEPTEMBER 2000
but i just cannot figure out how to do a sliding median without first
sorting the buffer (or at least sorting it half way) and then picking
the value halfway into the buffer (or midway between the bottom and the
top of the sorted buffer).
given a finite-length buffer (length N) that is chronologically sorted
(like a FIFO) and another with the same N values sorted by value, then
when the buffer window "slides" by one and the oldest value falls offa
the edge and a new one is input, to insert that new value into the
sorted buffer, that is an O(N) operation (probabilistically).
what's a faster way to do this?
--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."