DSPRelated.com
Forums

Fast sorting algorithm

Started by Marcel Luttgens March 20, 2004
Andor Bariska <an2or@nospam.net> wrote in message news:<405c3d21$1@pfaff2.ethz.ch>...
> Marcel Luttgens wrote: > > The following sorting algorithm (written in Basic), which could be easily adapted > > to other kinds of data, is theoretically much faster than present algorithms. > > I don't know why you are posting this here (people around here don't > take sorting very seriously, check a recent thread about sorting on DSPs :). > > You have hit on what is called pigeonhole search (the pigeonholes are > your n(i)).
No, because objects in the array can be identical. Try to run the Basic program.
> > It in fact runs in linear time. Its drawback is that you can only sort > values taken from a discrete set. More here: > http://en.wikipedia.org/wiki/Pigeonhole_sort >
What I presented is the main routine, which can easily be used to sort character chains (names, etc...) by using the ASCII values, which are of course integers.
> > Regards, > Andor
Marcel
"Till Crueger" <TillFC@gmx.net> wrote in message news:<c3hsk0$sea$1@f1node01.rhrz.uni-bonn.de>...
> On Sat, 20 Mar 2004 06:59:13 +0000, Marcel Luttgens wrote: > > You are right, but this is a new approach, and the progam should take > > the nature of the data into account. > > It should be possible to emulate the data with some set of integers, > > sort them and then retrieve the original data. > > Hmm, The only way I can think of right now to emulate any data with > integers is to sort it and number it. But for this you would already need > another sorting algorithm. If you have any other way to represent for > example strings throuch integers, so that the Alphabetic order is kept in > the representation, that would really be a great way to sort through large > sets of Data. >
For instance in Basic, the function ASC(x$) gives the ASCII value (an integer) of the first character of the chain x$.
> > I only gave the principle. With this approach, the number of steps > > depends essentially on the number of data to be sorted, not on its > > square, which is the case of all known sorting algorithms (AFAIK). > > I think there are several sorting algorithms wich have a complexity > x*log(x) but I might be mistaken there. > > TIll
Marcel
"Unbeliever" <alfkatz@remove.the.bleedin.obvious.ieee.org> wrote in message news:<405c34eb$0$31906$afc38c87@news.optusnet.com.au>...
> > The following sorting algorithm (written in Basic), which could be easily > adapted > > to other kinds of data, is theoretically much faster than present > algorithms. > > > > Marcel Luttgens > > > > 'Integers to sort: > > > > DATA 15,5,8,3,25,15,7,1,17,25,4,19,2,15,18,7,18 > > > > 'Number of integers = 17 > > 'Biggest integer = 25 > > > > DIM n(25) > > > > FOR i = 1 TO 17 > > READ a > > PRINT a; > > n(a) = n(a) + 1 > > NEXT i > > PRINT > > > > FOR i = 1 TO 25 > > FOR j = 1 TO n(i) > > IF n(i) <> 0 THEN PRINT i; > > NEXT j > > NEXT i > > This is a counter suitable for counting arrays of small integral values, not > a sort, though it can be used as a sort for integral values where the number > of integral values (m) is known. When used as such it is O (n * m). >
Yes
> OK for sorting where and < n. Where m>n, as m and n become large, this > rapidly becomes slower than the naive O(n*n) sorts such as bubble, > insertion, selection and shell sorts and much slower than the O (nlog n) > sorts such as quick heap and mergesorts.
There is no problem with m. If some integers are very large, they can be temporarily replaced by smaller ones.
> > m rapidly becomes unmanageable for string data. > > It cannot be used for non integral data.
Yes, but ASCII values are integers, so string data like words can be sorted.
> > Keep thinking, though. > > Cheers, > Alf >
Marcel
glen herrmannsfeldt wrote:

> Bob Cain wrote: > >> Till Crueger wrote: >> >>> On Sat, 20 Mar 2004 11:07:18 +0000, Bob Cain wrote: > > >>>> Any data in a computer whose representation in bits has a fixed >>>> ordering >>>> of signifigance can be sorted in O(N) operations by a radix sort. > > >>> Thanks for pointing that one out. I didn't know radix sort yet. Do I see >>> it correctly though, that the complexity is then also dependend on the >>> length of the words, since you have to sort the array once for every >>> digit? > > >> Right. From which an argument can be made that it is still >> O(N*log(N)) but that doesn't change the fact that in real application >> it's O(N). The thing that's nice is that a comparison operation can >> be replaced by a bit test. All those with ones at the current bit >> position go to the top and all the zeros to the bottom. That's not as >> signifigant as it once was. > > > It can also be done in bases other than two. IBM used to make card > sorting machines that would read one column of a punched card and > drop the card in the appropriate bin. The result is a decimal > based radix sort. For alphanumeric data you need two passes per > card column, one for the digits and one for the zones. > > -- glen
Knuth. I think, gives an example with a deck of cards. First sort into 13 piles by value, without regard to suit. Then reconstitute the deck in value order, and sort by suit. (The card sorters could sort on more than one column at a time.) Jerry -- Engineering is the art of making what you want from things you can get. &#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;
On Sun, 21 Mar 2004 01:07:23 +0000, Marcel Luttgens wrote:

>> Hmm, The only way I can think of right now to emulate any data with >> integers is to sort it and number it. But for this you would already >> need another sorting algorithm. If you have any other way to represent >> for example strings throuch integers, so that the Alphabetic order is >> kept in the representation, that would really be a great way to sort >> through large sets of Data. >> >> > For instance in Basic, the function ASC(x$) gives the ASCII value (an > integer) of the first character of the chain x$.
Well, I ment a function that would reduce the complexity. Such as a function that you could give an array of n strings and it will return numbers from 0 to n-1 for each string. Just taking the ASCII value will not really reduce the complexity of the problem. The range of numbers you would get by just using the ASC function would be 27^length (without special chars), since you would also have to take into account the other characters of the strings. This would make your loop where you go through your array really time consuming. Till -- Please add "Salt and Peper" to the subject line to bypass my spam filter
glen herrmannsfeldt wrote:


> > It can also be done in bases other than two. IBM used to make card > sorting machines that would read one column of a punched card and > drop the card in the appropriate bin. The result is a decimal > based radix sort. For alphanumeric data you need two passes per > card column, one for the digits and one for the zones.
That's exactly where Luther got the idea for the radix 2 sort and the data structure built around it. He was a genius computer operator before becoming a genius computer programmer. :-) Bob -- "Things should be described as simply as possible, but no simpler." A. Einstein
Marcel Luttgens wrote:

> The following sorting algorithm (written in Basic), which could be > easily adapted to other kinds of data, is theoretically much > faster than present algorithms. > > Marcel Luttgens > > 'Integers to sort: > > DATA 15,5,8,3,25,15,7,1,17,25,4,19,2,15,18,7,18 > > 'Number of integers = 17 > 'Biggest integer = 25 > > DIM n(25) > > FOR i = 1 TO 17 > READ a > PRINT a;
Hi Marcel, I just looked at your piece of code and wondered how the next line works:
> n(a) = n(a) + 1
if this isn't a typo: which value do the n(a) have before entering the loop? and which content would vector n have after the loop? or should it read something like: n(a) = n(a-1) + 1 with n(0)=0 Bernhard
> NEXT i > PRINT > > FOR i = 1 TO 25 > FOR j = 1 TO n(i) > IF n(i) <> 0 THEN PRINT i; > NEXT j > NEXT i
-- before sending to the above email-address: replace deadspam.com by foerstergroup.de
Bernhard Holzmayer <holzmayer.bernhard@deadspam.com> wrote in message news:<12000672.EL1ESm1pHD@holzmayer.ifr.rt>...
> Marcel Luttgens wrote: > > > The following sorting algorithm (written in Basic), which could be > > easily adapted to other kinds of data, is theoretically much > > faster than present algorithms. > > > > Marcel Luttgens > > > > 'Integers to sort: > > > > DATA 15,5,8,3,25,15,7,1,17,25,4,19,2,15,18,7,18 > > > > 'Number of integers = 17 > > 'Biggest integer = 25 > > > > DIM n(25) > > > > FOR i = 1 TO 17 > > READ a > > PRINT a; > > Hi Marcel, I just looked at your piece of code and wondered how the > next line works: > > > n(a) = n(a) + 1 > > if this isn't a typo: > which value do the n(a) have before entering the loop? > and which content would vector n have after the loop?
n(a) contains the number of occurences of the same integer.
> > or should it read something like: n(a) = n(a-1) + 1 with n(0)=0 > > Bernhard > > > NEXT i > > PRINT > > > > FOR i = 1 TO 25 > > FOR j = 1 TO n(i) > > IF n(i) <> 0 THEN PRINT i; > > NEXT j > > NEXT i
As you are interested in sorting, this is a new procedure (written in Qbasic, try to run it): 'NO-SWAP SORTING PROGRAM CLS 'DATA contains the chains to be sorted. Multiple occurence 'of the same chain is allowed. Of course, the program can be 'adapted to chains contained in files. DATA "My","You","I","Keyb","Fool","A","Thank","Key","Sort","Get","No","Kid","Key" n = 13 'n = number of items DIM d$(n) 'will receive the original chains DIM r(n) 'will contain the rank of the chains DIM sort$(n) 'will contain the sorted chains 'Filling d$(n) FOR i = 1 TO n READ a$ d$(i) = a$: PRINT d$(i); " "; NEXT i PRINT 'Rank determination. A quicker procedure could be used, 'the following one being intended as an exemple of a sorting 'algorithm which doesn't use swapping, a rather lengthy operation. 'Iow, a "No-swap" program should be faster than classical sorting 'algoriths using swapping. m = 1 Again: FOR i = 1 TO n IF d$(m) > d$(i) THEN inf = inf + 1 NEXT i r(m) = inf + 1: PRINT r(m); m = m + 1: inf = 0: IF m <= n THEN GOTO Again PRINT 'Filling d$(n). In case of multiple occurences, some 'values of d$ remain void (for instance, d$(4) = ""). FOR i = 1 TO n sort$(r(i)) = d$(i) NEXT i 'Filling the voids and printing the sorted chains. FOR i = 1 TO n IF sort$(i) = "" THEN sort$(i) = sort$(i - 1) PRINT sort$(i); " "; NEXT i END Marcel
"Unbeliever" <alfkatz@remove.the.bleedin.obvious.ieee.org> wrote in message news:<405c34eb$0$31906$afc38c87@news.optusnet.com.au>...

> Keep thinking, though.
I did, see below. Thanks.
> > Cheers, > Alf
'NO-SWAP SORTING QBASIC PROGRAM CLS 'DATA contains the chains to be sorted. Multiple occurence 'of the same chain is allowed. Of course, the program can be 'adapted to chains contained in files. DATA "My","You","I","Keyb","Fool","A","Thank","Key","Sort","Get","No","Kid","Key" n = 13 'n = number of items DIM d$(n) 'will receive the original chains DIM r(n) 'will contain the rank of the chains DIM sort$(n) 'will contain the sorted chains 'Filling d$(n) FOR i = 1 TO n READ a$ d$(i) = a$: PRINT d$(i); " "; NEXT i PRINT 'Rank determination. A quicker procedure could be used, 'the following one being intended as an exemple of a sorting 'algorithm which doesn't use swapping, a rather lengthy operation. 'Iow, a "No-swap" program should be faster than classical sorting 'algoriths using swapping. m = 1 Again: FOR i = 1 TO n IF d$(m) > d$(i) THEN inf = inf + 1 NEXT i r(m) = inf + 1: PRINT r(m); m = m + 1: inf = 0: IF m <= n THEN GOTO Again PRINT 'Filling d$(n). In case of multiple occurences, some 'values of d$ remain void (for instance, d$(4) = ""). FOR i = 1 TO n sort$(r(i)) = d$(i) NEXT i 'Filling the voids and printing the sorted chains. FOR i = 1 TO n IF sort$(i) = "" THEN sort$(i) = sort$(i - 1) PRINT sort$(i); " "; NEXT i END Marcel
Marcel Luttgens wrote:

> Bernhard Holzmayer <holzmayer.bernhard@deadspam.com> wrote in > message news:<12000672.EL1ESm1pHD@holzmayer.ifr.rt>... >> Marcel Luttgens wrote: >> >> > The following sorting algorithm (written in Basic), which could >> > be easily adapted to other kinds of data, is theoretically much >> > faster than present algorithms. >> > >> > Marcel Luttgens >> > >> > 'Integers to sort: >> > >> > DATA 15,5,8,3,25,15,7,1,17,25,4,19,2,15,18,7,18 >> > >> > 'Number of integers = 17 >> > 'Biggest integer = 25 >> > >> > DIM n(25) >> > >> > FOR i = 1 TO 17 >> > READ a >> > PRINT a; >> >> Hi Marcel, I just looked at your piece of code and wondered how >> the next line works: >> >> > n(a) = n(a) + 1 >> >> if this isn't a typo: >> which value do the n(a) have before entering the loop? >> and which content would vector n have after the loop? > > n(a) contains the number of occurences of the same integer.
I just wondered if/how the container has been initialized. I guess you just assume that it is zero by default.
> >> >> or should it read something like: n(a) = n(a-1) + 1 with n(0)=0 >> >> Bernhard >> >> > NEXT i >> > PRINT >> > >> > FOR i = 1 TO 25 >> > FOR j = 1 TO n(i) >> > IF n(i) <> 0 THEN PRINT i; >> > NEXT j >> > NEXT i > > As you are interested in sorting, this is a new procedure > (written in Qbasic, try to run it): >
Sorry, I'm not interested in sorting algorithms. Nor in Qbasic, and I don't have an appropriate interpreter around. I was simply trying to understand your code. Thanks for your description. Bernhard