Reply by dbd May 1, 2015
On Monday, April 27, 2015 at 5:19:52 PM UTC-7, robert bristow-johnson wrote:
> ...
> but i just cannot figure out how to do a sliding median without first
> sorting the buffer (or at least sorting it half way) and then picking
> the value halfway into the buffer (or midway between the bottom and the
> top of the sorted buffer).
> ...
> r b-j
What I've used was known as a 'dual heap algorithm'. Sometimes it now appears as 'streaming median'. In:
http://stackoverflow.com/questions/10930732/c-efficiently-calculating-a-running-median
look under the algorithm called 'streaming median'. Note: the desired order statistic is selected by how the heaps are initialized by filling with min-representable and max-representable values.

Also: IEEE Transactions on Signal Processing, vol. 39, no. 1, Jan. 1991, p. 204:
"Comparison of algorithms for standard median filtering"
Juhola, M.; Katajainen, J.; Raita, T.
Dept. of Comput. Sci., Turku Univ., Finland

Abstract: Standard median filtering, in which a median is searched repeatedly from a sample set that changes only slightly between the subsequent searches, is discussed. Several well-known methods for solving this running median problem are reviewed, the (asymptotical) time complexities of the methods are analyzed, and simple variants are proposed which are especially suited for small sample sets, a frequent situation. Although the discussion is restricted to the one-dimensional case, the ideas are easily extended to higher dimensions.

Dale B. Dalrymple
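For illustration, a minimal Python sketch of the dual-heap idea (this is the growing-stream variant; a true sliding window additionally needs to evict the expired sample, e.g. via the lazy deletion discussed in the stackoverflow thread):

```python
import heapq

def streaming_medians(stream):
    """Yield the running median after each sample using two heaps.

    'low' holds the smaller half as a max-heap (values negated),
    'high' holds the larger half as a min-heap; the median sits at
    the heap tops. Each update costs O(log n).
    """
    low, high = [], []
    for x in stream:
        if low and x > -low[0]:
            heapq.heappush(high, x)
        else:
            heapq.heappush(low, -x)
        # rebalance so len(low) == len(high) or len(low) == len(high) + 1
        if len(low) > len(high) + 1:
            heapq.heappush(high, -heapq.heappop(low))
        elif len(high) > len(low):
            heapq.heappush(low, -heapq.heappop(high))
        yield -low[0] if len(low) > len(high) else (-low[0] + high[0]) / 2
```

Dropping the oldest window sample is the only missing piece; the usual trick is to remember expired values in a hash table and discard them lazily when they surface at a heap top.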
Reply by glen herrmannsfeldt April 29, 2015
robert bristow-johnson <rbj@audioimagination.com> wrote:
 
> i also posted this to the programmers stack exchange. but comp.dsp
> peeps seem to know a lot about stuff that's broader than DSP.
> so i know how to do a fast O(log2(N)) sliding max or sliding min algorithm.
The other posts probably answer your question, but for the non-sliding case there is an O(n) median algorithm in Knuth, "The Art of Computer Programming", I believe volume 3.

-- glen
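The algorithm glen refers to is selection; a hedged sketch in Python (randomized quickselect, expected O(n); the guaranteed-O(n) variant Knuth covers, median of medians, differs only in how the pivot is chosen):

```python
import random

def quickselect(a, k):
    """Return the k-th smallest element of a (0-based), expected O(n)."""
    pivot = random.choice(a)
    lt = [x for x in a if x < pivot]   # strictly below the pivot
    eq = [x for x in a if x == pivot]  # equal to the pivot
    gt = [x for x in a if x > pivot]   # strictly above the pivot
    if k < len(lt):
        return quickselect(lt, k)
    if k < len(lt) + len(eq):
        return pivot
    return quickselect(gt, k - len(lt) - len(eq))

def median(a):
    """Median of an unordered buffer without fully sorting it."""
    return quickselect(list(a), len(a) // 2)
```

This answers a one-shot median query; it does not by itself exploit the overlap between successive windows.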
Reply by robert bristow-johnson April 28, 2015
On 4/28/15 5:43 PM, Marcel Mueller wrote:
> On 28.04.15 20.06, martinvicanek wrote:
>> For a sliding window median the fastest algorithms use histograms, as in
>> Huang et al 1979
>> https://www.freelists.org/archives/ingsw3/11-2013/pdfNNRwXsvXQN.pdf.
>> In 1D Huang's scheme is O(1), as updating the histogram is simpler than
>> adding data to an ordered list.
>
> Well, this only works if the domain of the samples is strictly limited.
> No big deal with 8 or 10 bit image data, no longer that efficient for 16
> bit data and completely garbage for 32 bit or floating point data. So I
> did not recommend it, since the OP did not specify the range of values.
at the time i wasn't worrying about the range of values. until today, i really didn't imagine that it was salient to getting the median (i.e. i thought the "compare" instruction worked independent of any implied range). still trying to grok this range and precision thingie.

the only issue i was worrying about is pretty basic: new sample, x[n], comes in, old sample, x[n-N], is pushed off the edge, N-1 of the N samples under consideration are the same as before. what is the most efficient way (fewest instructions worst case) to recalculate the median?

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
Reply by Marcel Mueller April 28, 2015
On 28.04.15 20.06, martinvicanek wrote:
> For a sliding window median the fastest algorithms use histograms, as in
> Huang et al 1979
> https://www.freelists.org/archives/ingsw3/11-2013/pdfNNRwXsvXQN.pdf.
> In 1D Huang's scheme is O(1), as updating the histogram is simpler than
> adding data to an ordered list.
Well, this only works if the domain of the samples is strictly limited. No big deal with 8 or 10 bit image data, no longer that efficient for 16 bit data and completely garbage for 32 bit or floating point data. So I did not recommend it, since the OP did not specify the range of values.

Marcel
Reply by Marcel Mueller April 28, 2015
On 28.04.15 21.19, robert bristow-johnson wrote:
>> Furthermore, you do not need two buffers. A single sorted one is
>> sufficient.
>
> then how do you know which sample in the sorted buffer is the one that
> falls offa the edge when the window is bumped over by one sample location?
No problem. See below.
> it seems to me to be more logical to maintain a list that is
> chronologically ordered (like a FIFO) and then link the values in that
> list to their neighbors in the hypothetical sorted list.
Of course, you need this kind of list. But there is no need to store them in an array since all you need is forward iteration and append to the end. So a linked list is perfectly sufficient.
>> All you need to do for this purpose is to add a pointer to
>> each element that links to the next sample respectively, building a
>> linked list of samples.
>
> so the samples are sorted by value, but they are linked (with a
> bidirectional linked list, i think) to their chronological neighbors?
A single pointer to the successor is a unidirectional linked list. You never need to know the predecessor.
>> Now all you need to do to remove a sample is to
>> take the pointer from the last removed element. Insert operations are
>> naturally ordered by sample value. But they need to update the link
>> pointer of the last sample.
>
> now, if we don't close up the hole created by removing the old sample
> and we just plop the new sample, without inserting, onto the end of the
> buffer, what keeps the buffer from just expanding forever?
The buffer is simply a pool of elements to be allocated from. Nothing else. The position in the buffer has no meaning at all.

To iterate to sample n+1 you first determine the oldest sample to remove and perform the required unlink operations on the tree structure. Then you have a tree item in your buffer that could be discarded. However, you will immediately need a new item to insert the next sample into the tree, so just take the free slot and reuse it for the insert operation. I.e. fill in the sample, set up the link pointer in the last-inserted item to the current slot, and finally insert the slot into the tree structure. So the buffer is just an array with N tree nodes. It will never grow nor shrink.

If you have C++, a map<sample_value, element_type> with a custom allocator that allocates from your array of map<...>::value_type should do the job. element_type is just map<...>::iterator, i.e. the link to the next sample. I am just not sure whether C++ accepts this recursive type definition.

I am pretty sure that even faster algorithms are possible. In fact you do not need a sorted structure of all sample values in the window all the time. It is likely that extreme sample values will never contribute to the median. So in fact you only need the set of samples greater than the median and the set of samples less than the median. However, when the median moves, the set has to be partially sorted to determine the minimum or maximum value depending on the move direction. This could be cheaper than sorting the entire array all the time, at least amortized.

Another idea might be to use a fully balanced binary tree, e.g. an AVL tree. In this case the root node of the tree is always just the median and there are no further actions required to update the median pointer. The expense is that you need to do more node rotations than with a partially balanced tree.

Marcel
Reply by robert bristow-johnson April 28, 2015
On 4/28/15 2:06 PM, martinvicanek wrote:
> For a sliding window median the fastest algorithms use histograms, as in
> Huang et al 1979
> https://www.freelists.org/archives/ingsw3/11-2013/pdfNNRwXsvXQN.pdf.
> In 1D Huang's scheme is O(1), as updating the histogram is simpler than
> adding data to an ordered list. More sophisticated schemes exist for 2D
> which are also O(1), e.g. https://nomis80.org/ctmf.pdf. For 8 bit
> resolution, say, you can use a histogram straight away, while for higher
> bit depths it is more efficient to maintain one coarse and one or more
> finer histograms.
downloaded both .pdf files. just started reading. never thought of speed (paid for with approximation) based on lowered resolution before.

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
Reply by robert bristow-johnson April 28, 2015
On 4/28/15 3:01 AM, Marcel Mueller wrote:
> Furthermore, you do not need two buffers. A single sorted one is
> sufficient.
then how do you know which sample in the sorted buffer is the one that falls offa the edge when the window is bumped over by one sample location? it seems to me to be more logical to maintain a list that is chronologically ordered (like a FIFO) and then link the values in that list to their neighbors in the hypothetical sorted list. but i don't see how to do an efficient O(log2(N)) binary search to find the insert or remove points without a parallel sorted array.
> All you need to do for this purpose is to add a pointer to
> each element that links to the next sample respectively, building a
> linked list of samples.
so the samples are sorted by value, but they are linked (with a bidirectional linked list, i think) to their chronological neighbors?
> Now all you need to do to remove a sample is to
> take the pointer from the last removed element. Insert operations are
> naturally ordered by sample value. But they need to update the link
> pointer of the last sample.
now, if we don't close up the hole created by removing the old sample and we just plop the new sample, without inserting, onto the end of the buffer, what keeps the buffer from just expanding forever?

hmmmm, i can kinda see that this buffer would never expand larger than twice the original size. but i still haven't completely grok'd this.

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
Reply by martinvicanek April 28, 2015
For a sliding window median the fastest algorithms use histograms, as in
Huang et al 1979
https://www.freelists.org/archives/ingsw3/11-2013/pdfNNRwXsvXQN.pdf. In 1D
Huang's scheme is O(1), as updating the histogram is simpler than adding
data to an ordered list. More sophisticated schemes exist for 2D which are
also O(1), e.g. https://nomis80.org/ctmf.pdf. For 8 bit resolution, say,
you can use a histogram straight away, while for higher bit depths it is
more efficient to maintain one coarse and one or more finer histograms.
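As a sketch of the histogram approach for 8-bit data in Python (this version rescans the 256 bins at each step, which is still constant with respect to the window size; Huang's refinement tracks the median bin and a below-median count incrementally instead):

```python
def sliding_median_u8(samples, window):
    """Sliding-window median for integer samples in [0, 255].

    Per-sample cost is bounded by the (fixed) number of histogram
    bins, independent of the window size.
    """
    assert window % 2 == 1, "odd window so the median is a single sample"
    hist = [0] * 256
    medians = []
    for i, x in enumerate(samples):
        hist[x] += 1                        # newest sample enters
        if i >= window:
            hist[samples[i - window]] -= 1  # oldest falls off the edge
        if i >= window - 1:
            need = window // 2 + 1          # rank of the median
            acc = 0
            for v in range(256):            # walk bins to the median rank
                acc += hist[v]
                if acc >= need:
                    medians.append(v)
                    break
    return medians
```

The dependence on bin count is exactly the limitation raised above: it is viable for 8 or 10 bit data but degrades for wide or floating-point sample ranges.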
---------------------------------------
Posted through http://www.DSPRelated.com
Reply by Marcel Mueller April 28, 2015
On 28.04.15 02.19, robert bristow-johnson wrote:
> but i just cannot figure out how to do a sliding median without first
> sorting the buffer (or at least sorting it half way) and then picking
> the value halfway into the buffer (or midway between the bottom and the
> top of the sorted buffer).
>
> given a finite-length buffer (length N) that is chronologically sorted
> (like a FIFO) and another with the same N values sorted by value, then
> when the buffer window "slides" by one and the oldest value falls offa
> the edge and a new one is input, to insert that new value into the
> sorted buffer, that is an O(N) operation (probabilistically).
>
> what's a faster way to do this?
You will never succeed with a sorted array as buffer, because the insert and delete operations are too expensive. You will need a tree structure, preferably a red-black tree.

Furthermore, you do not need two buffers. A single sorted one is sufficient. All you need to do for this purpose is to add a pointer to each element that links to the next sample respectively, building a linked list of samples. Now all you need to do to remove a sample is to take the pointer from the last removed element. Insert operations are naturally ordered by sample value. But they need to update the link pointer of the last sample. So you need to keep a pointer to the previously added sample as well.

The operation is O(M log N), with M the number of processed samples and N the window size. Due to the nature of RB-trees the worst-case complexity might be worse because of balancing, but I am not that familiar with RB-trees.

Of course, the nodes of your RB-tree might be stored in an array of size N. You need only to track the slot that was recently removed for reuse by the new sample. So all you need is the array and a few pointers: 'head' (first sample), 'tail' (last sample for link updates), tree 'root' and 'median' to speed up median lookup. The median pointer has to be moved left or right by one when the added and removed sample are on different sides of the median. For this purpose I recommend using stable sorting, because then you know that an inserted sample with the same value as the median is always to the right of the current median and a removed one with the same value is to the left.

Maybe there are other even more efficient algorithms around. This was just the first one that came into my mind, and it satisfies logarithmic complexity.

Marcel
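As an illustration only (a Python stand-in, not the red-black tree described above): a value-sorted list plays the tree's role, and an age-ordered FIFO plays the role of the links that identify which sample falls off the edge. The bisect insert/delete here cost O(N) array shifts; swapping in a balanced tree is what yields the O(log N) per-sample bound.

```python
from bisect import bisect_left, insort
from collections import deque

class WindowMedian:
    """Sliding median: value-sorted list plus age-ordered FIFO."""

    def __init__(self, n):
        self.n = n            # window length
        self.by_value = []    # stand-in for the sorted tree
        self.by_age = deque() # identifies the oldest sample

    def push(self, x):
        if len(self.by_age) == self.n:
            oldest = self.by_age.popleft()
            # locate and unlink the expired sample by value
            del self.by_value[bisect_left(self.by_value, oldest)]
        self.by_age.append(x)
        insort(self.by_value, x)

    def median(self):
        s = self.by_value
        m = len(s) // 2
        return s[m] if len(s) % 2 else (s[m - 1] + s[m]) / 2
```

A 'median' pointer nudged left or right on each insert/remove pair, as suggested above, removes even the O(1) indexing step of `median()` in the tree version.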
Reply by robert bristow-johnson April 27, 2015
i also posted this to the programmers stack exchange.  but comp.dsp 
peeps seem to know a lot about stuff that's broader than DSP.

so i know how to do a fast O(log2(N)) sliding max or sliding min algorithm.

Brookes: "Algorithms for Max and Min Filters with Improved Worst-Case 
Performance," IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: ANALOG AND 
DIGITAL SIGNAL PROCESSING, VOL. 47, NO. 9, SEPTEMBER 2000
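For comparison, a sliding max can be sketched with the standard monotonic-deque technique, O(1) amortized per sample (this is not Brookes' algorithm, which improves the worst case rather than the amortized cost):

```python
from collections import deque

def sliding_max(samples, window):
    """Sliding-window maximum, O(1) amortized per sample.

    The deque holds indices of candidate maxima in decreasing value
    order; the front index is always the current window's maximum.
    """
    dq = deque()
    out = []
    for i, x in enumerate(samples):
        while dq and samples[dq[-1]] <= x:
            dq.pop()            # dominated: can never be the max again
        dq.append(i)
        if dq[0] <= i - window:
            dq.popleft()        # expired out of the window
        if i >= window - 1:
            out.append(samples[dq[0]])
    return out
```

The pruning step is what has no analogue for the median: a dominated sample can never be the max again, but a mid-valued sample can later become the median, which is why the sliding median resists this trick.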

but i just cannot figure out how to do a sliding median without first 
sorting the buffer (or at least sorting it half way) and then picking 
the value halfway into the buffer (or midway between the bottom and the 
top of the sorted buffer).

given a finite-length buffer (length N) that is chronologically sorted 
(like a FIFO) and another with the same N values sorted by value, then 
when the buffer window "slides" by one and the oldest value falls offa 
the edge and a new one is input, to insert that new value into the 
sorted buffer, that is an O(N) operation (probabilistically).

what's a faster way to do this?


-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."