Quicksort is not my favourite sorting algorithm.

Sep 15, 2009 10:36

We've been trying to hire some more programmers at work, and being able to sort the wheat from the chaff is a useful thing. Normally we ask a handful of basic Java questions (abstract class vs. interface) and have candidates write a compareTo method.

Asking ridiculously easy questions is a good thing - it's surprising how often people get them wrong. Given the chance, I'd love to ask harder questions, like:

What's wrong with Quicksort?

Quicksort is often the first sorting algorithm that comes to mind. It's easy to define and generally accepted to be 'fast enough'. Unfortunately, most people aren't aware of the drawbacks of using quicksort:



It's an unstable sort - I'd like my sort algorithm to preserve the existing order of equal elements in my lists.

It has a worst-case runtime of O(n²) - it may run quickly on most data, but it's quite easy to construct inputs that trigger the quadratic behaviour.

(I know that Apache suffered a denial-of-service attack because it used an algorithm with quadratic behaviour to remove '..' from URLs.)
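The instability is easy to demonstrate. Here's a minimal sketch (my own toy example, not any library's implementation): an in-place quicksort using Lomuto partitioning with the last element as pivot, run over (key, label) pairs, compared against Python's built-in stable sort.

```python
def naive_quicksort(a, lo=0, hi=None):
    """In-place quicksort: Lomuto partition, last element as pivot."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[hi][0]
    i = lo
    for j in range(lo, hi):
        if a[j][0] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]       # long-range swaps are what break stability
    naive_quicksort(a, lo, i - 1)
    naive_quicksort(a, i + 1, hi)

pairs = [(2, 'a'), (1, 'b'), (2, 'c'), (1, 'd')]
naive_quicksort(pairs)
print(pairs)   # [(1, 'd'), (1, 'b'), (2, 'a'), (2, 'c')] - 'd' jumped ahead of 'b'

stable = sorted([(2, 'a'), (1, 'b'), (2, 'c'), (1, 'd')], key=lambda p: p[0])
print(stable)  # [(1, 'b'), (1, 'd'), (2, 'a'), (2, 'c')] - original order kept
```

The keys end up sorted either way; it's the relative order of the two key-1 records that quicksort loses.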

A large problem with quicksort is that it's easy to implement badly and hard to implement well.

Implementing quicksort well is described in 'Engineering a Sort Function' by Bentley and McIlroy. They describe how to pick a pivot element, how to partition and swap efficiently, and the trick of falling back to insertion sort for small lists. Their scheme is the basis of the quicksort implementations in the Java and C standard libraries.
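A rough sketch of two of those tricks - a median-of-three pivot and an insertion-sort fallback for small ranges. The cutoff value here is my guess, not the paper's tuned constant, and the partition is a plain Hoare-style one rather than their three-way scheme:

```python
import random

CUTOFF = 16   # small-range threshold; a guess, not the paper's tuned value

def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] inclusive by insertion - cheap on tiny ranges."""
    for i in range(lo + 1, hi + 1):
        x, j = a[i], i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def median_of_three(a, lo, hi):
    """Sort a[lo], a[mid], a[hi] in place and return the middle value."""
    mid = (lo + hi) // 2
    if a[mid] < a[lo]: a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]: a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]: a[mid], a[hi] = a[hi], a[mid]
    return a[mid]

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if hi - lo < CUTOFF:              # small range: hand off to insertion sort
        insertion_sort(a, lo, hi)
        return
    pivot = median_of_three(a, lo, hi)
    i, j = lo, hi                     # Hoare-style partition around the pivot value
    while i <= j:
        while a[i] < pivot: i += 1
        while a[j] > pivot: j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i, j = i + 1, j - 1
    quicksort(a, lo, j)
    quicksort(a, i, hi)

data = [random.randrange(1000) for _ in range(200)]
quicksort(data)                       # data is now sorted ascending
```

Median-of-three defuses the classic sorted-input worst case, though, as the adversary below shows, it doesn't make the worst case go away.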

Generating malicious input is described by McIlroy in 'A Killer Adversary for Quicksort'. The trick is to write a comparison function that constructs the worst-case input on the fly.
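The adversary is short enough to sketch. This is a hypothetical Python rendition of the gas/solid idea from the paper, attacking a naive first-element-pivot quicksort: every element starts out as undecided 'gas', and the comparator pins values as late as possible so the pivot always ends up nearly extremal.

```python
def make_adversary(n):
    """Return (cmp, state): a comparator that decides values lazily so any
    quicksort-like algorithm degrades towards quadratic comparisons."""
    gas = n                       # sentinel meaning 'value not yet decided'
    val = [gas] * n
    state = {'nsolid': 0, 'cand': 0, 'ncmp': 0}

    def cmp(x, y):
        state['ncmp'] += 1
        if val[x] == gas and val[y] == gas:
            victim = x if x == state['cand'] else y
            val[victim] = state['nsolid']   # freeze: pin it to the next low value
            state['nsolid'] += 1
        if val[x] == gas:
            state['cand'] = x
        elif val[y] == gas:
            state['cand'] = y
        return val[x] - val[y]

    return cmp, state

def quicksort(a, cmp):
    """Naive quicksort over indices, first element as pivot."""
    if len(a) <= 1:
        return a
    p, less, geq = a[0], [], []
    for x in a[1:]:
        (less if cmp(x, p) < 0 else geq).append(x)
    return quicksort(less, cmp) + [p] + quicksort(geq, cmp)

n = 100
cmp, state = make_adversary(n)
quicksort(list(range(n)), cmp)
print(state['ncmp'])   # close to n*n/2, versus roughly n*log2(n) for a good run
```

The pivot gets frozen to a small value almost immediately, so nearly everything else lands on one side of every partition.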

So what would I like to see people using instead? Merge sort and Radix sort would be my preferred candidates.

Merge sort is stable and has a worst-case runtime of O(n log n). It's also very easy to write, and if you're sneaky you can make it run closer to O(n) on mostly-sorted data. The trick is to partition the list into ascending and descending sequences, and then merge those.
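That adaptive trick is sometimes called natural merge sort, and it's easy to sketch: split the input into maximal ascending runs (reversing strictly descending ones, so equal elements never swap), then merge adjacent runs pairwise until one remains. On already-sorted input the run detection finds a single run and the whole sort is one linear pass.

```python
def runs(xs):
    """Split xs into maximal ascending runs, reversing descending ones."""
    out, i, n = [], 0, len(xs)
    while i < n:
        j = i + 1
        if j < n and xs[j] < xs[i]:
            while j < n and xs[j] < xs[j - 1]:   # strictly descending, so the
                j += 1                           # reversal stays stable
            out.append(xs[i:j][::-1])
        else:
            while j < n and xs[j] >= xs[j - 1]:
                j += 1
            out.append(xs[i:j])
        i = j
    return out

def merge(a, b):
    """Standard stable merge: take from a on ties."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if b[j] < a[i]:
            out.append(b[j]); j += 1
        else:
            out.append(a[i]); i += 1
    return out + a[i:] + b[j:]

def natural_merge_sort(xs):
    rs = runs(xs) or [[]]
    while len(rs) > 1:                # merge adjacent runs pairwise
        rs = [merge(rs[k], rs[k + 1]) if k + 1 < len(rs) else rs[k]
              for k in range(0, len(rs), 2)]
    return rs[0]

print(runs([1, 2, 5, 3, 2, 4]))                 # [[1, 2, 5], [2, 3], [4]]
print(natural_merge_sort([1, 2, 5, 3, 2, 4]))   # [1, 2, 2, 3, 4, 5]
```

A real implementation would merge runs bottom-up without building all the intermediate lists, but the run-detection idea is the whole trick.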

This trick is used in a few places, notably in Python's default sorting method, 'timsort'. Tim Peters has written an excellent explanation of the code, and of a few clever optimizations for speeding things up in an imperative version.

Adaptive merge sort can be written in functional languages too. 'Runsort' outlines the same trick in Prolog, along with a handful of tricks to make it faster. (I managed to rattle off a version of this in Haskell.)

Radix sort, on the other hand, shines when sorting lists of strings (i.e. when comparisons become relatively expensive), usually outperforming quicksort by a factor of two. A good introduction is 'Engineering Radix Sort' by McIlroy (again!) et al.

They describe a variant of radix sort - 'American Flag Sort' - that minimizes space overhead by permuting the list in place.
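A minimal out-of-place sketch of the MSD (most-significant-digit-first) bucketing idea on strings - the in-place permutation that gives American Flag Sort its name is more involved, but the recursion is the same:

```python
from collections import defaultdict

def msd_radix_sort(strs, d=0):
    """Sort strings by their d-th character onwards, one bucket per character."""
    if len(strs) <= 1:
        return list(strs)
    shorter, buckets = [], defaultdict(list)
    for s in strs:
        if len(s) <= d:
            shorter.append(s)        # exhausted strings sort before longer ones
        else:
            buckets[s[d]].append(s)
    out = shorter
    for c in sorted(buckets):        # a real version indexes a 256-entry table
        out += msd_radix_sort(buckets[c], d + 1)
    return out

words = ['banana', 'band', 'bag', 'ban', 'apple']
print(msd_radix_sort(words))   # ['apple', 'bag', 'ban', 'banana', 'band']
```

Each string is examined only as far as needed to separate it from its neighbours, which is why this wins when full comparisons are expensive.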

Admittedly, quicksort is not 'the great satan of sorting functions', but knowing when to use it means knowing its weaknesses. Even so, naive quicksort is still the easiest sort function to write on a whiteboard.

(And the recent two-pivot quicksort variant is still a damn cool bit of work.)