**Lecture 39 - Sorting**

Our next sorting algorithm proceeds as follows:

- First, our base case: If the array contains 0 or 1 elements, there is nothing to do. It is already sorted.
- If the array has two or more elements in it, we will break it in half, sort the two halves, and then go through and merge the elements.

The Java method to do it:

```java
public void sort(int[] array) {
    // create tempArray for use in merging
    int[] tempArray = new int[array.length];
    mergeSort(array, 0, array.length - 1, tempArray);
}

/*
 * PRE:  left and right are valid indexes of array.
 *       tempArray.length == array.length
 * POST: Sorts array from left to right.
 */
public void mergeSort(int[] array, int left, int right, int[] tempArray) {
    if (left < right) {
        int middle = (right + left) / 2;
        mergeSort(array, left, middle, tempArray);
        mergeSort(array, middle + 1, right, tempArray);
        merge(array, left, middle, right, tempArray);
    }
}
```

The method `merge` takes the sorted elements in `array[left..middle]` and `array[middle+1..right]` and merges them
together using the array `tempArray`, and then copies them back
into `array`.

```java
/*
 * PRE:  left <= middle <= right, and left, middle, right are valid
 *       indices for array.
 *       tempArray.length == array.length
 *       array[left..middle] and array[middle+1..right] must both be sorted.
 * POST: Merges the two halves (array[left..middle] and
 *       array[middle+1..right]) together, and array[left..right] is then
 *       sorted.
 */
private void merge(int[] array, int left, int middle, int right, int[] tempArray) {
    int indexLeft = left;
    int indexRight = middle + 1;
    int target = left;

    // Copy both pieces into tempArray.
    for (int i = left; i <= right; i++) {
        tempArray[i] = array[i];
    }

    // Merge them together back in array while there are
    // elements left in both halves.
    while (indexLeft <= middle && indexRight <= right) {
        if (tempArray[indexLeft] < tempArray[indexRight]) {
            array[target] = tempArray[indexLeft];
            indexLeft++;
        } else {
            array[target] = tempArray[indexRight];
            indexRight++;
        }
        target++;
    }

    // Move any remaining elements from the left half.
    while (indexLeft <= middle) {
        array[target] = tempArray[indexLeft];
        indexLeft++;
        target++;
    }

    // Move any remaining elements from the right half.
    while (indexRight <= right) {
        array[target] = tempArray[indexRight];
        indexRight++;
        target++;
    }
}
```

Again we'd like to count the number of comparisons necessary in order
to sort an array of *n* elements. Notice that all the comparisons
happen in the merge method. If we are trying to merge two sorted
lists, every time we compare two elements from the lists we will put
one in its correct position. When we run out of elements in one
of the lists, we put the remaining elements into the last slots of the
sorted list. As a result, merging two lists which have a total of *n*
elements requires at most *n-1* comparisons.
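
The count above can be checked with a small sketch. This is my own illustration, not the lecture's `merge` method; the class and variable names are invented, and it merges two separate arrays rather than two halves of one array:

```java
// Counts comparisons while merging two sorted arrays (illustration only).
public class MergeCount {
    static int comparisons = 0;

    static int[] mergeCounting(int[] a, int[] b) {
        int[] result = new int[a.length + b.length];
        int i = 0, j = 0, t = 0;
        while (i < a.length && j < b.length) {
            comparisons++;                       // one comparison per element placed
            if (a[i] < b[j]) result[t++] = a[i++];
            else             result[t++] = b[j++];
        }
        while (i < a.length) result[t++] = a[i++];  // no comparisons here
        while (j < b.length) result[t++] = b[j++];
        return result;
    }

    public static void main(String[] args) {
        // Worst case: the two halves interleave perfectly, so every
        // element except the last one placed costs a comparison.
        mergeCounting(new int[]{1, 3, 5, 7}, new int[]{2, 4, 6, 8});
        System.out.println(comparisons);  // 7 comparisons for n = 8 elements
    }
}
```

If one half is exhausted early (say, merging `{1, 2, 3, 4}` with `{5, 6, 7, 8}`), the remaining elements are copied without any comparisons, so *n-1* is only an upper bound.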

Suppose we start with a list of n elements. Let *T(n)* be a function
telling us the number of comparisons necessary to mergesort an array
with *n* elements. As we noted above, we break the list in half,
mergesort each half, and then merge the two pieces. Thus the total
number of comparisons needed is the number of comparisons to
mergesort each half plus the number of comparisons necessary to merge
the two halves. By the remarks above, the number of comparisons to do
the final merge is no more than *n-1*. Thus *T(n) <= T(n/2) + T(n/2)
+ n-1*. For simplicity we'll replace the *n-1* comparisons for the
merging by the even larger *n* in order to make it easier to see how
to approximate this result. We have *T(n) = 2·T(n/2) + n* and
if we find a function that satisfies that equation, then we have an
upper bound on the number of comparisons made during a mergesort.

Looking at our algorithm, no comparisons are necessary when the size
of the array is 0 or 1. Thus T(0) = T(1) = 0. Let us see if we can
solve this for small values of *n*. Because we are constantly
dividing the number of elements in half it will be most convenient to
start with values of *n* which are a power of two. Here we list a
table of values:

| n | T(n) |
|---|---|
| 1 = 2^{0} | 0 |
| 2 = 2^{1} | 2*T(1)+2 = 2 = 2*1 |
| 4 = 2^{2} | 2*T(2)+4 = 8 = 4*2 |
| 8 = 2^{3} | 2*T(4)+8 = 24 = 8*3 |
| 16 = 2^{4} | 2*T(8)+16 = 64 = 16*4 |
| 32 = 2^{5} | 2*T(16)+32 = 160 = 32*5 |
| ... | ... |
| n = 2^{k} | 2*T(n/2)+n = n*k |

Notice that if *n = 2^{k}* then *k = log _{2} n*, so the table tells us that *T(n) = n·k = n·log _{2} n*. In other words, mergesort makes at most about *n·log _{2} n* comparisons, far fewer than the roughly *n^{2}/2* made by selection sort.
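
The table values can also be generated directly from the recurrence. Here is a short sketch (my own illustration; the class name and the use of `Long.numberOfTrailingZeros` to recover *k* are my choices, not part of the lecture code):

```java
// Evaluate the recurrence T(n) = 2*T(n/2) + n for powers of two
// and compare the result with n * log2(n).
public class Recurrence {
    static long T(long n) {
        if (n <= 1) return 0;          // T(0) = T(1) = 0: nothing to sort
        return 2 * T(n / 2) + n;       // two half-size sorts plus the merge
    }

    public static void main(String[] args) {
        for (long n = 1; n <= 32; n *= 2) {
            long k = Long.numberOfTrailingZeros(n);  // k = log2(n) since n = 2^k
            System.out.println("T(" + n + ") = " + T(n) + ",  n*k = " + n * k);
        }
    }
}
```

Running this reproduces the table: T(8) = 24 = 8*3, T(32) = 160 = 32*5, and so on.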

This explains why, when we run the algorithms, the time for mergesort is almost insignificant compared to that for selection sort. Below are some results for these two sorting algorithms along with the results for the searching algorithms we looked at earlier this week.

| Algorithm | # comparisons | n = 10 | n = 100 | n = 1000 | n = 1,000,000 |
|---|---|---|---|---|---|
| Linear search | n/2 | 5 | 50 | 500 | 500,000 |
| Binary search | log _{2} n | 4 | 7 | 10 | 20 |
| Selection sort | (n^{2} - n)/2 | 45 | 4950 | 499,500 | 499,999,500,000 |
| Merge sort | n·log _{2} n | 40 | 700 | 10,000 | 20,000,000 |

Also, notice that while binary search is more efficient than linear search, especially for large arrays, sorting is much more expensive than linear search. Therefore, it only makes sense to sort an array if we will actually be doing a reasonable number of searches.
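
As a rough back-of-the-envelope check of that claim, we can use the comparison counts from the table above as cost estimates. This sketch is my own illustration (the class name and the choice of search counts are invented):

```java
// Rough estimate: when does "mergesort once, then binary search k times"
// beat doing k linear searches on the unsorted array?
public class BreakEven {
    public static void main(String[] args) {
        long n = 1_000_000;
        double log2n = Math.log(n) / Math.log(2);        // about 20 for one million
        for (long k : new long[]{10, 40, 100}) {
            double linearCost = k * (n / 2.0);           // k average-case linear searches
            double sortedCost = n * log2n + k * log2n;   // sort once, then k binary searches
            System.out.println(k + " searches: linear ~" + (long) linearCost
                               + " vs sort+binary ~" + (long) sortedCost);
        }
    }
}
```

For a million elements the sorting cost is roughly 20,000,000 comparisons, so sorting pays for itself somewhere around 40 searches; below that, repeated linear search is cheaper.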

Demo.