DSPRelated.com
Forums

Fast sorting algorithm

Started by Marcel Luttgens March 20, 2004
On Wed, 24 Mar 2004 04:05:10 +0000, Marcel Luttgens wrote:

> 'NO-SWAP SORTING QBASIC PROGRAM > > CLS > > 'DATA contains the chains to be sorted. Multiple occurence 'of the same > chain is allowed. Of course, the program can be 'adapted to chains > contained in files. > > DATA > "My","You","I","Keyb","Fool","A","Thank","Key","Sort","Get","No","Kid","Key" > > n = 13 'n = number of items > > DIM d$(n) 'will receive the original chains DIM r(n) 'will > contain the rank of the chains DIM sort$(n) 'will contain the sorted > chains > > 'Filling d$(n) > > FOR i = 1 TO n > READ a$ > d$(i) = a$: PRINT d$(i); " "; > NEXT i > PRINT > > 'Rank determination. A quicker procedure could be used, 'the following > one being intended as an exemple of a sorting 'algorithm which doesn't > use swapping, a rather lengthy operation. 'Iow, a "No-swap" program > should be faster than classical sorting 'algoriths using swapping. > > m = 1 > Again: > FOR i = 1 TO n > IF d$(m) > d$(i) THEN inf = inf + 1 > NEXT i > > r(m) = inf + 1: PRINT r(m); > m = m + 1: inf = 0: IF m <= n THEN GOTO Again PRINT > > 'Filling d$(n). In case of multiple occurences, some 'values of d$ > remain void (for instance, d$(4) = ""). > > FOR i = 1 TO n > sort$(r(i)) = d$(i) > NEXT i > > 'Filling the voids and printing the sorted chains. > > FOR i = 1 TO n > IF sort$(i) = "" THEN sort$(i) = sort$(i - 1) PRINT sort$(i); " "; > NEXT i > > END
Hmm, I think this gives you a complexity of O(x^2) since you have to go through the array once for each element. I also don't really see the difference to a normal bubble sort, except that it is implementet in a way that might give you a liitle bit of enhancement since you don't use Swaping. Keep on thinking Till -- Please add "Salt and Peper" to the subject line to bypass my spam filter
"Till Crueger" <TillFC@gmx.net> wrote in message news:<c3uber$u3a$1@f1node01.rhrz.uni-bonn.de>...
> On Wed, 24 Mar 2004 04:05:10 +0000, Marcel Luttgens wrote: > > > 'NO-SWAP SORTING QBASIC PROGRAM > > > > CLS > > > > 'DATA contains the chains to be sorted. Multiple occurence 'of the same > > chain is allowed. Of course, the program can be 'adapted to chains > > contained in files. > > > > DATA > > "My","You","I","Keyb","Fool","A","Thank","Key","Sort","Get","No","Kid","Key" > > > > n = 13 'n = number of items > > > > DIM d$(n) 'will receive the original chains DIM r(n) 'will > > contain the rank of the chains DIM sort$(n) 'will contain the sorted > > chains > > > > 'Filling d$(n) > > > > FOR i = 1 TO n > > READ a$ > > d$(i) = a$: PRINT d$(i); " "; > > NEXT i > > PRINT > > > > 'Rank determination. A quicker procedure could be used, 'the following > > one being intended as an exemple of a sorting 'algorithm which doesn't > > use swapping, a rather lengthy operation. 'Iow, a "No-swap" program > > should be faster than classical sorting 'algoriths using swapping. > > > > m = 1 > > Again: > > FOR i = 1 TO n > > IF d$(m) > d$(i) THEN inf = inf + 1 > > NEXT i > > > > r(m) = inf + 1: PRINT r(m); > > m = m + 1: inf = 0: IF m <= n THEN GOTO Again PRINT > > > > 'Filling d$(n). In case of multiple occurences, some 'values of d$ > > remain void (for instance, d$(4) = ""). > > > > FOR i = 1 TO n > > sort$(r(i)) = d$(i) > > NEXT i > > > > 'Filling the voids and printing the sorted chains. > > > > FOR i = 1 TO n > > IF sort$(i) = "" THEN sort$(i) = sort$(i - 1) PRINT sort$(i); " "; > > NEXT i > > > > END > > > Hmm, I think this gives you a complexity of O(x^2) since you have to go > through the array once for each element. I also don't really see the > difference to a normal bubble sort, except that it is implementet in a way > that might give you a liitle bit of enhancement since you don't use > Swaping. > Keep on thinking > Till
Yes, but you overlooked what I wrote: "A quicker procedure could be used,the following one being intended as an exemple of a sorting 'algorithm which doesn't use swapping, a rather lengthy operation." Marcel
muttgens@wanadoo.fr (Marcel Luttgens) wrote in message news:<b45b8808.0403200204.48bcd2c7@posting.google.com>...
> The following sorting algorithm (written in Basic), which could be easily adapted > to other kinds of data, is theoretically much faster than present algorithms. > > Marcel Luttgens > > 'Integers to sort: > > DATA 15,5,8,3,25,15,7,1,17,25,4,19,2,15,18,7,18 > > 'Number of integers = 17 > 'Biggest integer = 25 > > DIM n(25) > > FOR i = 1 TO 17 > READ a > PRINT a; > n(a) = n(a) + 1 > NEXT i > PRINT > > FOR i = 1 TO 25 > FOR j = 1 TO n(i) > IF n(i) <> 0 THEN PRINT i; > NEXT j > NEXT i
Hi This is just what is called a basket sort. It can be used to sort things that have only a few fields, this is faster than quick sort by a lot. In fact, given enough data, regardless of the number fields the basket sort will be faster. Lets say one wanted to sort full 16 bit integers. This would require a large array unless you did this as a radixed sort. The idea here is that you use array sizes that are some integer root of the size. For 16 bits, square root works well since that produces array sizes of 256 instead of 64K. You make two passes through the data. First you do the low order 8 bits and then you repeat for the high order. You do need to watch out for the sign bit. This is easiest to deal with by splitting the second printout part in two where you pick out the positive stuff first and then go back for the negative. ( two loops of 128 each ). This sort looks like it is actually a mixture of the basket sort and the distribution sort because it keeps track of counts rather than values or pointer. One most computers, the pure distribution sort will be a little faster than the true basket sort. It has the main advantage in that if you are keeping the data ( needed when doing most radixed methods ) it can be done in half the memory space. Both of these sorts have the advantage that they sort in N(K+F)+C, where N is the number of items, K is the cost for a single radixed item, F is the number of radixed bins and C is some constant overhead( initializing and such ). This means that given enough data this will always be faster than sort with N*LogN or N^2*LogN type sorts. The trade off point is related mostly to K and not so much to C, since C is mostly things like array initializing. Anyway, I once won a magazine sort constest for best sort that used a combination factor of speed and memory useage with a radixed basket sort. It was fun because for 1000 intergers, it ran something like 30 times faster than the closest sort. In fact, it ran so fast that using the time function on a PC, the timing resolution was not fine enough to give meaning full times. I played with my code compared to a quick sort. I found that for 16 bit integers, the balance point was about 10 items. This tells me that one should always look at options before assuming that the best general case sort is the best for the data you have. Dwight
muttgens@wanadoo.fr (Marcel Luttgens) wrote in message news:<b45b8808.0403200204.48bcd2c7@posting.google.com>...
> The following sorting algorithm (written in Basic), which could be easily adapted > to other kinds of data, is theoretically much faster than present algorithms. > > Marcel Luttgens > > 'Integers to sort: > > DATA 15,5,8,3,25,15,7,1,17,25,4,19,2,15,18,7,18 > > 'Number of integers = 17 > 'Biggest integer = 25 > > DIM n(25) > > FOR i = 1 TO 17 > READ a > PRINT a; > n(a) = n(a) + 1 > NEXT i > PRINT > > FOR i = 1 TO 25 > FOR j = 1 TO n(i) > IF n(i) <> 0 THEN PRINT i; > NEXT j > NEXT i
Hi This seems to be a variation of a distribution sort or a form of basket sort. These sorts are quite fast for things with a limited number of fields. For a large number of fields. Things like quicksort take over. Still, for something like sorting 16 bit integers. This general type of sort will beat quicksort for anything greater than about 10 items. Even for sorts with a large number of fields, this algorithm quickly catches up with quicksort, if the data set is large enough. Years ago I wrote a distribution sort for a friend that was sorting a mail list for the boy scouts. He was shock at how much faster it was with only about 500 names to sort than a good clean quicksort. I'd also done a similar one for a sorting contest it was so much faster than the nearest competition at sorting 1000 integers that there timing scheme could accurately time it. Dwight