QuickSort help: deriving an iterative version from a recursive one

I’m working on a project for algorithms, we aren’t graded on our code, just our report of the test results. Just covering my arse against the ‘no homework help’ policy.

I wrote a QuickSort method using recursion that worked perfectly in testing because I was only feeding it small sets of data from the keyboard. When I attempt to run it on the test data, between 125000 and 2000000 elements, I run out of memory because it has to recurse so much. I’m having trouble breaking the recursive version back down into an iterative one.

Googling the hell out of this has yielded only course notes on recursive vs iterative algorithms in general. Any help in going from my current recursive work, or a pointer to some code/pseudocode for an iterative quicksort, would be much appreciated.

Code follows:



std::vector <unsigned int> DQuickSort(std::vector <unsigned int> A, int first, int last)
{
	//pivot value
	int pivot;

	//if it's not a single-element array
	if (first < last)
	{
		//split the array
		DSplit(A, first, last, pivot);
		//sort each half
		A = DQuickSort(A, first, pivot - 1);
		A = DQuickSort(A, pivot + 1, last);
	}

	return A;
}

void DSplit(std::vector <unsigned int> &A, int first, int last, int &pivot)
{
	//temp value and a comparison value
	unsigned int temp = 0;
	unsigned int x = A[first];

	pivot = first;

	//step through the array
	for (int i = first + 1; i <= last; i++)
	{
		OpCount++;

		if (A[i] < x)
		{
			//move the pivot
			pivot++;
			//swap A[i] with A[pivot]
			temp = A[i];
			A[i] = A[pivot];
			A[pivot] = temp;
		}
	}
	
	//swap the first and the pivot
	temp = A[first];
	A[first] = A[pivot];
	A[pivot] = temp;
	//reset the comparison value
	//x = A[first];
}


It’s too late at night for my brain to work on code, but you’re not running out of memory. You’re running out of stack space. I’m fairly sure that there’s a java runtime flag that would let you adjust your stack to a size large enough to accommodate your program.

-lv

[QUOTE=NevarMore]
Googling the hell out of this has yielded only course notes on recursive vs iterative algorithms in general. Any help in going from my current recursive work or pointing me at some code/psuedocode for an iterative quick sort would be much appreciated.
[/QUOTE]

I found this…

http://www.mathcs.carleton.edu/courses/course_resources/cs227_w96/wightmaj/quickdef.html

…it supposedly contains both an iterative and a recursive version…

here’s a link describing it.

http://www.mathcs.carleton.edu/courses/course_resources/cs227_w96/wightmaj/algory.html

Don’t copy too much.

p.s. I searched on google “code sample iterative recursive” hit number 18, ymmv

Yeah, I thought of that too, but wouldn’t it be more fun to make 'im code the iterative version?

I meant, EDUCATIONAL!

p.s. hijack, as a charter member, I believe I should be allowed to post every 30 seconds!

Yep, I found that exact site; however, the source code seems to be clipped. Searching through the file only finds the declaration and a call to QuickSortI, but it is not defined in that file. I’ll have to read it tomorrow when I’m less cracked out.

I will be checking your search terms. I was using something a bit different to search.

This is an interesting problem.

If the data sets you are trying to sort are random, then you should require only about 20-25 levels of recursion (21 if it behaves perfectly), since a properly written quicksort on a random data set should, on average, recursively call itself on half-size data sets at each level.
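(For scale, 2^21 = 2,097,152, so even your two-million-element run should only need about 21 levels when the splits stay near the middle.)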

Quicksort has the problem that if the wrong pivot is chosen, you can end up with the partitioning into subproblems being entirely lopsided. That’s where you end up with goofy runtime and stack space usage. You might end up having a number of calls, in the worst case, proportional to the size of your array. This is, of course, bad.

In other words, if your pivot selection is the worst, then you’ll break the problem down so that you have to make as many recursive function calls as elements, and you’ll use up all the space reserved for storing function call information. Blow your stack.
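(Concretely: with a first-element pivot like the DSplit above and input that is already sorted, every split peels off just one element, so a 2,000,000-element run would try to nest roughly 2,000,000 calls deep.)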

There are ways to counteract this. You can try to choose a pivot value from among a fixed-size sample of the values in your array. You can do some sort of averaging. None of them work 100% of the time.

Here’s the kicker:

You can’t solve this fundamental problem (as far as I know) by making it iterative.

Because of the way that quicksort breaks down its subproblems into partitions that may not be of equal size (it’s good if they are, but you can’t guarantee it), you need to store where these partitions occur at each level. That is, if the first partition happens at n/2+5 (5 to the right of the middle on a set of size n), and then at the next layer the first n/2+5 elements are partitioned at n/4-3 and the other half at 3n/4-2, you need all these pivot indices to properly perform the pivoting.

This is implicit in the recursive variation. It’s explicit in the iterative version. You store, in some sort of separate array or stack that you control, the various partition indices.

So you’re really not saving anything. Well, yes, it’ll be faster, most likely, if you’re really careful, but it won’t, in the worst case, use less space, except by a constant factor.

Of course, it’s better to use lots of space on the heap (creating a new array to explicitly store your partitioning points) than trying to do it all by way of recursion on the stack, since the heap is huge, and the stack is relatively small. So there’s probably a modest speed improvement, maybe a space improvement, and you don’t blow your stack in the worst case.
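Just to make that concrete, here’s roughly what the explicit-stack version might look like. This is an untested sketch that reuses your DSplit as-is, and the function name is made up:


#include <stack>
#include <utility>
#include <vector>

//iterative version: the (first, last) ranges the recursive calls would
//have handled are kept on an explicit stack on the heap instead of the
//call stack (assumes the DSplit posted above)
void DQuickSortIter(std::vector <unsigned int> &A)
{
	std::stack< std::pair<int, int> > ranges;
	ranges.push(std::make_pair(0, (int)A.size() - 1));

	while (!ranges.empty())
	{
		int first = ranges.top().first;
		int last = ranges.top().second;
		ranges.pop();

		//single-element (or empty) range: nothing to do
		if (first >= last)
			continue;

		//split the range, then push the two halves instead of recursing
		int pivot;
		DSplit(A, first, last, pivot);
		ranges.push(std::make_pair(first, pivot - 1));
		ranges.push(std::make_pair(pivot + 1, last));
	}
}


The two recursive calls just become two pushes, and the base case becomes a “skip this range” check.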

Is it worth it?

Better to find a smarter way to choose your pivot to help control how the recursion splits the subproblems. Why? Because this will help you in both the recursive and iterative cases.

If you choose a smart pivot, the recursion won’t blow your stack. Or at least it’ll be much, much harder to find a dataset that will blow your stack. So long as you stay below 100 levels of recursion you’re probably just fine, and with balanced partitioning, 100 levels covers far more elements than you could ever store in an array on your computer.

And then, you won’t need an iterative solution.

Except to geek out, like I would, and try to do it anyway.

By searching for ‘iterative quicksort’ in google, I came up with this:

http://ei.cs.vt.edu/~cs1704/spring.98/10Sorts.pdf

I should mention: the worst pivot to choose, if you choose a single element, is either the greatest or least element. If this is always done, you end up with the extreme recursion depth.

If you could always choose the ‘middle’ (median) element, you’d have a perfect quicksort, as your partition would always land in the middle. But you can’t do this cheaply without having the data already sorted.

You can try taking the average of all the values (but then your pivot isn’t one of the elements, which needs a minor tweak to the algorithm). Or you can try taking the median of 3 values (say, the beginning, middle, and end of the current partition being sorted).
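For instance, a median-of-three helper might look something like this. It’s just a sketch, the helper name is made up, and it assumes the split takes its pivot value from A[first], like the DSplit above:


#include <algorithm>
#include <vector>

//hypothetical helper: put the median of A[first], A[mid], A[last] into
//A[first], so a split that takes its pivot value from A[first] (like
//DSplit above) ends up using the median of the three samples
void MedianOfThreeToFront(std::vector <unsigned int> &A, int first, int last)
{
	int mid = first + (last - first) / 2;

	//order the three samples so the median lands in A[mid]
	if (A[mid] < A[first]) std::swap(A[mid], A[first]);
	if (A[last] < A[first]) std::swap(A[last], A[first]);
	if (A[last] < A[mid]) std::swap(A[last], A[mid]);

	//move the median into the pivot slot
	std::swap(A[first], A[mid]);
}


Call it right before the split and nothing else has to change.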

These help. They don’t guarantee anything, recursive or iterative.

Sigh. One last post. On reading some of the googled stuff, I note that you’d probably use a lot less space doing the iterative version if you are careful about what partitioning bounds you store, even in the worst case.

Neat what you learn when you hunt around.

Still, choosing a better pivot’s a lot less painful to start out with.

Sorry if I’ve rambled, but it is late, and you probably knew all this.

I hadn’t considered the pivot choice. I don’t think we had discussed it in class or addressed it in any assignments. I’ll change that and see what happens.

Normally you just choose a specific item, say the first item, in the partition to be sorted.

No specific item, especially in a random array, is any better than any other. So at best you’re trying to fight the unlikely circumstance of having a ‘degenerate case’ where almost every choice of pivot is a bad one.

Just to put it another way.

=)

An interesting note is that if the partition is sorted, or almost sorted, the first or last element is almost certainly a bad choice. Better to choose the one in the middle. But even better to choose the median of 3 elements, or an average, or something else that takes more into account than just a single element.

Glad I could be of some help.

An interesting note: there is a variation of quicksort that I learned about recently (and actually implemented) that guarantees O(n log n) worst case, rather than typical quicksort, which is O(n log n) average and O(n ^ 2) worst case.

(For nitpickers, I don’t know where the omega and theta signs are, so don’t give me grief on the notation).

There’s a dirt-simple trick you can use that’ll very likely get you a good pivot value: simply choose three (or five) elements at random and take their average. There’s still a chance you’ll get a bad pivot value, but it’s not a large chance.

There’s also an algorithm to calculate the median of unsorted data in linear time. It’s described in Cormen et al., 2e (and possibly 1e). This won’t interfere with the asymptotic performance of quicksort, but it does add real time. It might be worth checking out.
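If you want to play with the “pivot on the median” idea without implementing that selection algorithm, std::nth_element will get you the median in average linear time (it’s not the worst-case-linear select from the book). Something like this sketch, with a made-up helper name:


#include <algorithm>
#include <vector>

//illustration only: std::nth_element is average linear time, not the
//worst-case-linear select from Cormen et al., but it shows the idea:
//move the true median of A[first..last] into A[first] before splitting
void MedianToFront(std::vector <unsigned int> &A, int first, int last)
{
	int mid = first + (last - first) / 2;
	std::nth_element(A.begin() + first, A.begin() + mid, A.begin() + last + 1);
	std::swap(A[first], A[mid]);
}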

Holy effing profanity profanity!!

All I had to do was add this:


std::vector <unsigned int> DQEntry(std::vector <unsigned int> A)
{
	DQuickSort(A, 0, A.size() - 1);
	return A;
}

And change my DQuickSort function to handle A by reference rather than by return. Damned thing was passing the entire vector around each time. No wonder it chunked all over the place.
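In case anyone wants it spelled out, the changed function is along these lines:


//by reference now, so the vector isn't copied on every call
void DQuickSort(std::vector <unsigned int> &A, int first, int last)
{
	if (first < last)
	{
		int pivot;
		//split the array
		DSplit(A, first, last, pivot);
		//sort each half in place
		DQuickSort(A, first, pivot - 1);
		DQuickSort(A, pivot + 1, last);
	}
}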

Egad, egad, egad.

Computers are not recursive. At the machine level, all recursive code runs as ordinary instructions plus a stack; there is nothing the hardware does for recursion that you can’t do yourself with an explicit stack.

There is a simple, standard construction in computer science for converting any recursive program into an iterative one. Note that almost all of the standard “has to be done recursively” examples given in low-level CS classes can in fact be done iteratively, faster and without a stack.

In the particular case of so-called Quicksort, the method to limit the stack size to log n is quite trivial: push the larger range on first, then the smaller range. The smaller range will be immediately popped at the top of the loop, so at any moment you only have about log n deferred ranges. This is an extremely well known trick and I am shocked that people Googling on iterative Quicksort haven’t seen it.
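Concretely, it’s the same explicit-stack loop as sketched earlier in the thread, just with the two pushes ordered by size. Again a rough, untested sketch that reuses the posted DSplit, and the name is made up:


#include <stack>
#include <utility>
#include <vector>

//same explicit-stack loop as the earlier sketch, but the larger half is
//pushed first so the smaller half is always popped and handled next,
//which keeps the pending ranges to roughly log2(n) at any one time
void DQuickSortIterBounded(std::vector <unsigned int> &A)
{
	std::stack< std::pair<int, int> > ranges;
	ranges.push(std::make_pair(0, (int)A.size() - 1));

	while (!ranges.empty())
	{
		int first = ranges.top().first;
		int last = ranges.top().second;
		ranges.pop();

		if (first >= last)
			continue;

		int pivot;
		DSplit(A, first, last, pivot);

		if (pivot - first > last - pivot)
		{
			//left half is larger: defer it, handle the right half next
			ranges.push(std::make_pair(first, pivot - 1));
			ranges.push(std::make_pair(pivot + 1, last));
		}
		else
		{
			//right half is larger: defer it, handle the left half next
			ranges.push(std::make_pair(pivot + 1, last));
			ranges.push(std::make_pair(first, pivot - 1));
		}
	}
}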

Yeah, okay, I already admitted I goofed.

Sigh.

And now that I think about it, while I realize all recursive algorithms can be made iterative by inventing whatever stacks you need, there’s still an important thing to note:

You can’t improve the worst-case performance, and thus the running time, of quicksort by making it iterative, which was what I was trying to address as the culprit in the original problem.

I still feel like a schmuck about the iterative running space, though. :smiley: So I am suitably humbled.

But anyway, it’s moot. Question’s been beaten to death.