[Moderating]
As a matter of policy, we generally don’t delete or edit posts merely because they contain errors. Go ahead and post any corrections you wish to make in this thread.
Colibri
General Questions Moderator
My analysis and time-testing setups are separate.
Analysis of Snakesort is pretty easy.
It is a natural merge sort (nothing more than small lists merged together to get larger lists, until one sorted list exists). The only real difference between Snakesort and a natural mergesort is the first round of merges. As the program takes small lists from the “random” file, Ellis deals with both ascending and descending lists. However, after the first merge loop all merges are done on ascending data, so accommodating descending data is a waste of time and effort. I modified the program to handle the descending lists only on the first loop, and saw a 15% improvement. Actually, I sorted the small descending lists before the merging starts, and eliminated all other handling of descending lists.
However, there aren’t many other improvements that I can see to make. Once the lists are all ascending, you have a natural mergesort, and unless you improve the merge process itself, you’re not going to improve this much. As for a worst case, look for the worst cases for merges and you will find this program’s weak spot. Merging doesn’t have many worst-case scenarios that I’m aware of, though.
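To show the shape being described, here is a minimal natural merge sort sketch in Python (my own illustration, not Ellis’s code; the function names are just placeholders). It finds the ascending and descending runs in one pass, reverses the descending runs up front, and then merges ascending runs until one remains:

[code]
def natural_merge_sort(data):
    """Sort by repeatedly merging the natural runs already present in the data."""
    if not data:
        return []

    # Pass 1: split the input into runs. Descending runs are reversed here,
    # once, so every later pass only ever sees ascending runs.
    runs = []
    start = 0
    while start < len(data):
        end = start + 1
        if end < len(data) and data[end] < data[start]:      # descending run
            while end < len(data) and data[end] < data[end - 1]:
                end += 1
            runs.append(list(reversed(data[start:end])))
        else:                                                 # ascending run
            while end < len(data) and data[end] >= data[end - 1]:
                end += 1
            runs.append(data[start:end])
        start = end

    # Later passes: merge adjacent ascending runs until one sorted run is left.
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs) - 1, 2):
            merged.append(merge(runs[i], runs[i + 1]))
        if len(runs) % 2:                                     # odd run carries over
            merged.append(runs[-1])
        runs = merged
    return runs[0]

def merge(a, b):
    """Standard stable two-way merge of two ascending lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:]); out.extend(b[j:])
    return out
[/code]

Handling descending data only in that first run-finding pass is exactly the simplification described above.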
The problem came during my setup for speed testing. Somehow, I managed to copy my sorted lists into the random files used for testing Snakesort. At first I didn’t believe the speed of the sort myself (merges aren’t that fast). However, I checked “most” everything and thought the setup was correct. The testing showed Snakesort was faster than Quicksort, but this only occurs while sorting an already SORTED list, because the sort is not doing any real work yet. The program simply exits, after defining the lists from the “random” data, if a single list is received, which is what occurs when a sorted list is fed to the program.
The program is not so efficient when truly random data, or even nearly sorted data, is fed to it. It performs about 175% slower than quicksort on random files, and closer to 200% slower on nearly sorted data. I tested the program with files ranging in size from 100,000 to 50,000,000 Longs. I rechecked all of my testing setup, including the files being fed to the program (though I did not verify every number in the large files, I did look to make sure they were not sorted this time, unless intended).
I’m truly sorry for misleading anyone. That was not my intention. I was trying to tell everyone about a program I believed was worth further investigation. I would be happy to provide actual test data, including the (very large) input files and test results, if anyone is interested.
My analysis shows that Snakesort is a stable algorithm, as are most other merge sorts, including natural merge sort.
Let me correct myself. It is only stable when the input file is not reverse sorted. Snakesort simply reverses the file, in this situation, and the manner in which it is reversed is unstable.
Stability shouldn’t matter too much, though, since it’s straightforward to stabilize an otherwise unstable algorithm. Just append another field to each element and populate it with the rank of that item in the original order, then use that new field as least-significant in your comparisons.
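Concretely, the trick looks roughly like this Python sketch (my own illustration; the names stabilized_sort and heap_sort_in_place are made up for the example, and the underlying sort is assumed to sort a list in place):

[code]
import heapq

def stabilized_sort(items, key, unstable_sort):
    """Make any comparison sort behave stably by tagging each item with
    its original position and using that position as a tiebreaker."""
    # Decorate: (primary key, original rank, item)
    decorated = [(key(item), rank, item) for rank, item in enumerate(items)]
    # The rank breaks ties, so even an unstable sort keeps equal keys in order.
    unstable_sort(decorated)              # assumed to sort the list in place
    # Undecorate: strip the extra fields back off.
    return [item for _, _, item in decorated]

def heap_sort_in_place(lst):
    """Heapsort -- normally unstable -- used here as the underlying sort."""
    heapq.heapify(lst)
    lst[:] = [heapq.heappop(lst) for _ in range(len(lst))]

records = [("apple", 2), ("banana", 1), ("apple", 1)]
print(stabilized_sort(records, key=lambda r: r[0], unstable_sort=heap_sort_in_place))
# The two "apple" records come out in their original relative order.
[/code]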
Chronos,
That might be practical for shorter lists, but it becomes impractical, from a memory standpoint, for very long lists (10–50 million items). There is also the overhead, to the algorithm, of maintaining and checking this extra value for a stability that should have been there in the first place.
Stabilization “should” be planned into a really practical algorithm at the design phase, if the algorithm needs to be stable. Some algorithms are easier to stabilize than others. And some become VERY popular because their speed, or some other feature, usually programmer familiarity, outweighs the stability factor (Quicksort is a good example of speed chosen over mergesort’s, or some other sort’s, stability).
Most people, a lot of fairly good programmers included, don’t even know what “stable” means. Most of the time application developers have a sort routine to rely on, which is always stable. The problem comes in when the provided sort routine is not used, and the application developer becomes an algorithm user rather than an application user. They should know what the algorithms do, and how, but often do not. They put quicksort, or another unstable algorithm, in places where a stable algorithm belongs, and the end user has to deal with the fact that the application makes their data look bad, or that they cannot get their data into the order they want. Or, worse, as the application’s database grows, the application slows to a crawl because an inefficient sort algorithm was used. All developers “should” have a basic knowledge of sorts, and of which to use in a particular application.
The sort algorithm developer should determine, while designing the algorithm, whether or not it will be stable. Though some algorithms come to mind that resist stability: I have a couple of algorithms that I just cannot seem to find a practical way to stabilize, and they may be publicized as unstable. I try, very hard, to plan stability into all my sort algorithms, because I think it is the most important feature of any sort algorithm intended for use in a sort application. I have stable sort algorithms that are faster than quicksort, so I don’t have to give up speed for stability. Someday I may make my sort algorithms public knowledge.
However, with all that said (sorry for the lengthy reply), and with memory so plentiful today, even on small machines, this is one way to make an algorithm stable, and it could be used for fairly long lists. I’d still recommend using a mergesort, a ternary quicksort, or any other stable sort if stability is required; I think your method would negate any speed offered by an unstable algorithm.
Darrold
Really? Sort algorithms are pretty much Chapter 1 of any course on algorithms, a course which really ought to be required for any computer degree. I mean, I can understand if someone can’t remember how to implement a heapsort, say, but folks should at least know that all good sort algorithms are n log(n), and what it means for an algorithm to be stable or unstable.
(self-nitpick: Yes, I do know that there are some applications where, say, insertion sort is better than any n log(n) method, but those are pretty specialized, and anyone with common sense would recognize them right away anyway.)
I generally don’t care about stability when sorting, but when I do I usually use insertion sort, though I’ll switch to merge sort if I’m dealing with more than a thousand or so items. I dislike merge sort because of the code sprawl. Snakesort is far too much code and too esoteric to be practical, IMO, similar to how smoothsort is too much code and too complex to be practical. Then again, my personal favorite algorithm is comb sort, simply because it’s the easiest one to adapt for sorting UDT (user-defined type) arrays, which cannot be sorted using a generic function. Each type of UDT array needs its very own sorting function.
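For reference, comb sort really is only about a dozen lines, which is part of why it is easy to re-type per record type in a language without generics. A rough Python sketch (my own, not code posted in the thread; the key parameter stands in for whatever UDT field you are sorting on):

[code]
def comb_sort(arr, key=lambda x: x):
    """Comb sort: bubble sort with a shrinking gap (classic shrink factor 1.3)."""
    gap = len(arr)
    swapped = True
    while gap > 1 or swapped:
        gap = max(1, int(gap / 1.3))      # shrink the gap each pass
        swapped = False
        for i in range(len(arr) - gap):
            if key(arr[i]) > key(arr[i + gap]):
                arr[i], arr[i + gap] = arr[i + gap], arr[i]
                swapped = True
    return arr
[/code]

Note that comb sort, like quicksort, is not stable, which fits the “don’t usually care about stability” usage described above.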
Is that really what they teach? Because all comparison sorts suck balls compared to radix sort, which shatters n log(n) run times.
Well, it’s what they teach for the problem of doing a general comparison sort*; if you’re tackling the problem of sorting data which you can manipulate/inspect in ways other than comparison, then, yeah, you may want to use another, more efficient method applicable in that case.
*: The full details, for anyone who doesn’t know: if you can only obtain up to some fixed number of bits of data in a single step, then of course it will take at least log(n!) [= Theta(n log(n))] steps in the worst case to even discern the entire ordering of the input (which, if you can only act upon the input by performing a particular rearrangement of it, is necessary for sorting).
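To spell out the counting behind that footnote (my summary of the standard argument): there are n! possible orderings of the input, and each comparison yields at most one bit of information, so distinguishing them requires at least log2(n!) comparisons in the worst case, and

\[
\frac{n}{2}\log_2\frac{n}{2} \;\le\; \sum_{i=1}^{n}\log_2 i \;=\; \log_2(n!) \;\le\; n\log_2 n,
\]

which pins log2(n!) at Theta(n log n).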
Ex CS professor here. I’ve taught sorting and such to thousands of students. The ability of a typical college student these days to retain such knowledge is near zero. I couldn’t get most students to recall on Wednesday something I specifically told them to remember on Monday. Maybe 1 in 100 could remember the concept or purpose of a stable sort 5 years later.
(I have some quite well known ex-students who have publicly posted methods to do sorting that start with step 1: Hash the keys. Something I specifically teach won’t work, and why. Ulp.)
Ellis Dee: Radix sort is only fast on short keys. Virtually all textbooks ignore this which is a real shame. I break down the advantages and disadvantages of each style for my students. (Including using optimizations that the texts don’t cover.) My rule of thumb is that radix starts to lose on keys longer than log[sup]2[/sup] n. But like all things sorting, it is affected by the nature of the keys, the number of records vs. memory and a thousand other factors.
There is no such thing as one universal, one-size-fits-all, super-duper sorting algorithm.
I don’t even understand; why would anyone want to hash the keys en route to sorting them? What potential benefit could one have, even misguidedly, hoped to have gained by this?
I think you have to be sufficiently confused to even consider it.
That’s the point I was making. Their step 2 would be “then iterate thru the hash table to get the values in sorted order.” Like I said: Ulp.
And these were (and are) considered top students. Full professors now. One was mentioned in a Slashdot article the other day.
If these were good students who can’t remember the basics of sorting, imagine what the rest are like.
I’m a TA, and I can certainly agree that there are some abysmally bad students out there, but most of the bad students in physics are non-physics majors, who view the physics class as an irrelevant distraction from whatever it is they want to do for a living. So yes, I am surprised that people actually going into computer-related fields would be so brain-dead on the subject.
To be absolutely fair to the hash-sorters, though, one application for sorting is as the first step in removing duplicate entries from a list. If that’s the reason why you’re sorting in the first place, then it might make sense to hash first, especially if your comparisons are relatively expensive.
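Roughly what that looks like (a sketch of my own, assuming deduplication is the actual goal; dedupe_then_sort is a made-up name for illustration):

[code]
def dedupe_then_sort(items):
    """Remove duplicates with a hash set (expected O(n) lookups), then
    sort only the survivors. This can beat sort-then-scan when there are
    many duplicates and comparisons are relatively expensive."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:      # hash lookup, no comparisons needed
            seen.add(item)
            unique.append(item)
    return sorted(unique)         # only sort what's left
[/code]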
Well, I can assure you that the CS majors of the world want to get out there and write programs, and equally disdain the theoretical end of the field. I could see why a programmer who will never build their own compiler would disdain a formal language course (“What does A -> Aa | λ have to do with writing software?”), but even a cursory understanding of algorithm analysis could be a vital tool in an individual’s programming toolbox.
I could use some assistance. I need a radix sort, preferably one set up for numbers, but I can convert one for strings if that’s all you have. I need it for testing purposes.
Thanks
Here’s an implementation for strings:
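A minimal MSD (most-significant-digit-first) radix sort for strings, sketched here in Python as an illustration of the general approach rather than the exact code posted, looks roughly like this:

[code]
def radix_sort_strings(strings):
    """MSD radix sort for strings: buckets by character position instead
    of comparing whole keys."""
    def sort_bucket(bucket, pos):
        if len(bucket) <= 1:
            return bucket
        done = []                 # strings that end at this position sort first
        buckets = {}              # next character -> list of strings
        for s in bucket:
            if pos >= len(s):
                done.append(s)
            else:
                buckets.setdefault(s[pos], []).append(s)
        out = done
        for ch in sorted(buckets):                  # small alphabet, cheap
            out.extend(sort_bucket(buckets[ch], pos + 1))
        return out
    return sort_bucket(list(strings), 0)

print(radix_sort_strings(["banana", "band", "apple", "ban", "bandana"]))
# ['apple', 'ban', 'banana', 'band', 'bandana']
[/code]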
Thank You, Ellis.