I've been benchmarking various sorting algorithms against the built-in sort property and tango.core.Array.sort. Benchmark output follows:

Using arrays of lengths [15,1000,100000,5000000] containing numbers in the range -1000-1000.
Note that odd lengths will be increased by one for the 'median-of-3 killer' sequences used to test quicksort. This means that if a length is odd, one of the arrays will actually have a length one greater than the rest.
Generating 16 arrays................ done.
Each of the 5 algorithms will run 5 times over 4 arrays of each length, for a total of 80 passes for each algorithm, for a total of 400 passes globally.
Heap sort.... .... .... .... ....
15: 0 0 0 | 0 0 0 | 0 0 0 | 0 0 0
1000: 1 1 1 | 1 1 1 | 1 1 1 | 1 1 1
100000: 277 272 285 | 241 232 259 | 232 227 238 | 255 246 269
5000000: 38341 38303 38393 | 17230 17203 17245 | 15873 15853 15891 | 19273 19230 19372
Built-in sort (UNFAIR: DOESN'T CALL FUNCTION FOR COMPARISON).... .... .... .... ....
15: 0 0 0 | 0 0 0 | 0 0 0 | 0 0 0
1000: 1 1 1 | 1 1 1 | 0 0 0 | 1 1 1
100000: 255 251 268 | 160 159 161 | 154 154 155 | 347 333 358
5000000: 16492 16439 16552 | 11571 11526 11666 | 11280 11224 11376 | 27067 26990 27180
Tango sort.... .... .... .... ....
15: 0 0 0 | 0 0 0 | 0 0 0 | XXX
1000: 1 1 1 | 20 20 20 | 24 24 25 | XXX
100000: 194 179 233 | 6372 6347 6384 | 6761 6719 6800 | XXX
5000000: 11201 9917 12065 | 330943 330626 331187 | 351592 351474 351658 | XXX
Introspective sort.... .... .... .... ....
15: 0 0 0 | 0 0 0 | 0 0 0 | 0 0 0
1000: 1 1 1 | 0 0 0 | 0 0 0 | 1 1 1
100000: 190 181 209 | 81 72 115 | 84 70 113 | 244 197 294
5000000: 10088 9916 10251 | 6261 6044 6572 | 5937 5625 6313 | 16955 16393 17550
Merge sort.... .... .... .... ....
15: 0 0 0 | 0 0 0 | 0 0 0 | 0 0 0
1000: 1 1 1 | 2 2 2 | 0 0 0 | 0 0 0
100000: 256 245 281 | 158 151 172 | 113 106 128 | 133 132 136
5000000: 18696 18442 18771 | 9050 9014 9087 | 9348 9331 9358 | 11141 11028 11423

The numbers on the left represent array lengths. For each length, there are four sets of data. In each set, the first number is the average, the second the minimum, and the third the maximum time taken, measured using StopWatch?.

The first set of data is for randomized data in the shown range, generated with Random.
The second set is for a reverse-sorted array.
The third set is for a sorted array.
The fourth set are "median-of-3 killer" sequences. Tango's sort actually crashes due to stack overflow for the bigger arrays here, and hence I've manually corrected the data above to say "XXX".

For information on introspective sort and the "median-of-3 killers" see this well-written paper.

I suggest that Tango switch to introsort for its core sorting algorithm, as it has the best general-case performance, being on par with the current sort with the randomized data and besting it by a large margin with the other data sets. Note that though merge sort is faster for the median-of-3 killers, it uses O(n) extra storage space.

I also suggest that, to improve performance in the general case, a templated version of sort which does not take a predicate be written. In my benchmarking I've noticed that run times can even halve when "cmp(a, b)" is rewritten to "a < b". It would be simple to, for instance, have sort take as a template argument a boolean which dictates whether to sort in ascending or descending order. (Note that Tango's current sort as well as introsort outperform the built-in sort, even though they prevent the compiler from inlining comparisons. Allowing inlining would give them a far greater advantage.)

For now I'm only attaching the benchmark code, which includes all of the sort algorithms mentioned above, as well as some others. If asked, I can write versions that are more ready to go in Tango, templated as mentioned above.