CS062, Lecture 8

Searching

Searching and sorting are important operations and also important examples of the use of complexity analysis.

Code for all searches is on-line in the Sort program example, SortingDemo.java. Linear search takes O(n) comparisons, while binary search takes O(log n) comparisons.

Concrete comparison of worst cases (# of comparisons):

Search \ # elts    10    100    1000    1,000,000
linear             10    100    1000    1,000,000
binary              8     14      20           40
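
The counts in the table can be reproduced with a small sketch that counts comparisons. This is not the code from SortingDemo.java: the class and method names are made up for illustration, and it assumes that each probe in binary search costs two comparisons (one for equality, one for ordering), which is consistent with the numbers above.

```java
// Hypothetical demo class, not part of SortingDemo.java.
class SearchCount {
    // Linear search for target; returns the number of element comparisons made.
    static int linearComparisons(int[] a, int target) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            count++;                       // one equality test per element
            if (a[i] == target) break;
        }
        return count;
    }

    // Binary search on a sorted array; returns the number of comparisons,
    // assuming two comparisons (==, then <) per probe.
    static int binaryComparisons(int[] a, int target) {
        int count = 0;
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            count++;                       // equality test
            if (a[mid] == target) break;
            count++;                       // ordering test
            if (a[mid] < target) lo = mid + 1;
            else hi = mid - 1;
        }
        return count;
    }

    // Sorted array 0, 1, ..., n-1.
    static int[] range(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        return a;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000, 1000000}) {
            int[] a = range(n);
            // Target n is larger than every element: a worst case for both searches.
            System.out.println(n + ": linear " + linearComparisons(a, n)
                               + ", binary " + binaryComparisons(a, n));
        }
    }
}
```

Searching for a value larger than every element forces linear search to look at all n elements, and forces this binary search to make floor(log2 n) + 1 probes.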

Recursion & Induction

RECURSIVE ALGORITHMS & Proofs of Correctness

Many algorithms can be designed recursively. Once you are used to them, recursive algorithms can be easier to understand (& prove correct) than iterative algorithms:

protected void recSelSort(int lastIndex, 
                          Comparable[] elts) {
    if (lastIndex > 0)   // more than 1 element to sort
    {
        int extreme = 0;    // index of element w/ largest value 
                    
        // Find "extreme", index of elts w/ largest value. 
        for (int searchIndex = 1; searchIndex <= lastIndex; searchIndex++) 
        {
            if (elts[extreme].lessThan(elts[searchIndex])) 
                extreme = searchIndex;
        }
        // elts[extreme] >= every element of elts[0..lastIndex]

        // swap largest elt (at extreme) w/ one at lastIndex
        Comparable tempElt = elts[extreme]; 
        elts[extreme] = elts[lastIndex];
        elts[lastIndex] = tempElt;
        // elts[lastIndex] now largest in   
        //  elts[0..lastIndex]
            
        recSelSort(lastIndex-1,elts);
       // elts[0..lastIndex] are sorted.
    } 
}

How can we prove it correct?

Mathematical Induction:

A. Prove base case(s). (Usually this is trivial)

B. Show that if the algorithm works correctly for all simpler inputs, then it will work for the current input.

Reason by induction on size of array (i.e. on lastIndex)

Base: If lastIndex <= 0 then at most one element in elts, and hence sorted - correct.

Induction: Suppose it works if lastIndex < n. Show it works if lastIndex = n (> 0).

(I.e. believe recursive call works, try to show for entire algorithm given call works!)

Loop finds largest element and then swaps with elts[lastIndex].

Thus elts[lastIndex] holds largest elt of list, while others held in elts[0..lastIndex-1]

Since lastIndex-1 < lastIndex, we know (by induction hypothesis) that
recSelSort(lastIndex-1,elts) sorts elts[0..lastIndex-1].

Because elts[0..lastIndex-1] in order and elts[lastIndex] is >= all of them, elts[0..lastIndex] is sorted.

Claim: recSelSort(n-1,elts) (i.e, on n elements) involves n*(n-1)/2 comparisons of elements of array.

Base: n = 0 or 1, 0 comparisons and n*(n-1)/2 = 0.

Induction hypothesis:
Suppose recSelSort(k-1,elts) takes k*(k-1)/2 comparisons for all k < n. Show true for n as well!

Look at algorithm: Run recSelSort(n-1,elts), so lastIndex (abbreviated "last" below) is n-1.

Go through for loop "last" = n-1 times, w/ one comparison each time through.

Thus n-1 comparisons. Only other comparisons are in recursive call:
recSelSort(last-1,elts) where last = n-1.

But by induction (since last < n), this takes last*(last-1)/2 = (n-1)*(n-2)/2 comparisons.

Therefore altogether have (n-1) + (n-1)*(n-2)/2 = (n-1)*2/2 + (n-1)*(n-2)/2
= (n-1)*(2 + n-2)/2 = (n-1)*n/2 = n(n-1)/2 comparisons.

Finished proof.

Therefore recSelSort takes O(n^2) comparisons.
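
The claim can also be checked empirically. The following is a minimal sketch, not the on-line course code: the class name, the counter, and countFor are hypothetical, and compareTo from java.lang.Comparable is used in place of the lessThan method of the Comparable type used in these notes.

```java
// Hypothetical demo class; compareTo stands in for the notes' lessThan.
class SelSortCount {
    static long comparisons = 0;   // number of element comparisons so far

    // Recursive selection sort of elts[0..lastIndex], same structure as recSelSort.
    static void recSelSort(int lastIndex, Integer[] elts) {
        if (lastIndex > 0) {
            int extreme = 0;       // index of largest element seen so far
            for (int i = 1; i <= lastIndex; i++) {
                comparisons++;
                if (elts[extreme].compareTo(elts[i]) < 0) extreme = i;
            }
            Integer temp = elts[extreme];
            elts[extreme] = elts[lastIndex];
            elts[lastIndex] = temp;
            recSelSort(lastIndex - 1, elts);
        }
    }

    // Sorts an n-element test array and returns the comparison count.
    static long countFor(int n) {
        Integer[] elts = new Integer[n];
        for (int i = 0; i < n; i++) elts[i] = (i * 7) % n;  // arbitrary order
        comparisons = 0;
        recSelSort(n - 1, elts);
        return comparisons;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 10, 100}) {
            System.out.println(n + " elements: " + countFor(n)
                               + " comparisons, n(n-1)/2 = " + (long) n * (n - 1) / 2);
        }
    }
}
```

The count does not depend on the order of the input, since the for loop always runs lastIndex times.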

Note: Iterative version of SelectionSort is similar, but needs an extra for loop. See on-line code in Sort.

Recursive insertion sort

protected void recInsSort(int last, Comparable[] elts){
    if (last > 0){
        recInsSort(last-1, elts); // Sort elts[0..last-1]
            
        int posn= last-1;   // start search just below last

        // Search for first elt (from rear) <=  elts[last]
        while (posn >= 0 && elts[last].lessThan(elts[posn]))
            posn--;

        posn++; // insert elts[last] at posn

        // shift elts[posn .. last-1] up one slot to make room at posn
        Comparable tempElt = elts[last];
        for (int moveIndex = last-1; moveIndex >= posn; moveIndex--)
            elts[moveIndex+1] = elts[moveIndex];

      // insert element into proper posn
        elts[posn] = tempElt;
      // now elts[0..last] are in order
    }
}

Correctness:

Straightforward: by induction on the value of the parameter last.

Complexity

Complexity is O(n^2) again because can show recInsSort(n-1,elts) takes <= n*(n-1)/2 comparisons by induction. (I.e., show for n = 1. Then assume induction hypothesis for k < n, and show for n.)

Because the while loop may quit early, insertion sort uses only about half as many comparisons (on average) as selection sort. Thus it is usually about twice as fast (but same "O" complexity).
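
A minimal sketch for comparing the counts on sorted, reversed, and shuffled input. The class and helper names are hypothetical, compareTo stands in for lessThan, and the search and shift steps of recInsSort are folded into a single loop (which does not change the number of comparisons).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Random;

// Hypothetical demo class; compareTo stands in for the notes' lessThan.
class InsSortCount {
    static long comparisons = 0;   // element comparisons made so far

    // Recursive insertion sort of elts[0..last].
    static void recInsSort(int last, Integer[] elts) {
        if (last > 0) {
            recInsSort(last - 1, elts);        // sort elts[0..last-1]
            Integer tempElt = elts[last];
            int posn = last - 1;
            while (posn >= 0) {
                comparisons++;
                if (tempElt.compareTo(elts[posn]) >= 0) break;  // slot found
                elts[posn + 1] = elts[posn];   // shift larger element up
                posn--;
            }
            elts[posn + 1] = tempElt;
        }
    }

    // Sorts elts in place and returns the number of comparisons used.
    static long countFor(Integer[] elts) {
        comparisons = 0;
        recInsSort(elts.length - 1, elts);
        return comparisons;
    }

    // 0..n-1 in increasing order, or in decreasing order if descending is true.
    static Integer[] ramp(int n, boolean descending) {
        Integer[] elts = new Integer[n];
        for (int i = 0; i < n; i++) elts[i] = descending ? n - 1 - i : i;
        return elts;
    }

    public static void main(String[] args) {
        int n = 1000;
        Integer[] shuffled = ramp(n, false);
        Collections.shuffle(Arrays.asList(shuffled), new Random(42));  // fixed seed
        System.out.println("selection sort (always):   " + (long) n * (n - 1) / 2);
        System.out.println("insertion, sorted input:   " + countFor(ramp(n, false)));
        System.out.println("insertion, reversed input: " + countFor(ramp(n, true)));
        System.out.println("insertion, shuffled input: " + countFor(shuffled));
    }
}
```

Sorted input needs only n-1 comparisons, reversed input needs the full n*(n-1)/2, and shuffled input should land near the midpoint of the two.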

Merge sort

Divide and conquer sort:

  /** 
    POST -- elementArray is sorted into non-decreasing order  
  **/
    public void sort(Comparable[] elementArray)
    {
        recMergeSort(0,elementArray.length -1,elementArray);
    }

    /**
        pre: first, last are legal indices of elementArray
        post:  elementArray[first..last] is sorted in 
                    non-decreasing order
    **/
    protected void recMergeSort(int first, int last, Comparable[] elementArray)
    {
        int middle = (first+last)/2;    // middle index of array
        
        if (last - first > 0)        // more than 1 elt
        {
                // Sort first half of list  
            recMergeSort(first,middle,elementArray);   
                // Sort second half of list
            recMergeSort(middle+1,last,elementArray);  
                // Merge two halves
            mergeRuns(first,middle,last,elementArray);
        }
        }
    }

    
Easy to show recMergeSort is correct if mergeRuns is.

Method mergeRuns is where all the work takes place. Note that you can't easily merge the halves in place. Hence we use an auxiliary array to copy into.

Note that mergeRuns is not recursive!

/** 
    PRE -- sortArray[first..middle] and 
        sortArray[middle+1..last] are sorted,
        and each range is non-empty.
    POST -- sortArray[first..last] is sorted
**/                             
    protected void mergeRuns (int first, int middle, int last, 
                                            Comparable[] sortArray)
    {
        int elementCount = last-first+1;    // # elts in array
            // temp array to hold elts to be merged
        Comparable[] tempArray = new Comparable[elementCount];      

        // copy elts of sortArray into tempArray in preparation 
        // for merging 
        for (int index=0; index < elementCount; index++)
            tempArray[index] = sortArray[first+index];
        
        
        int outIndex = first;   // posn written to in sortArray 
        int run1 = 0;               // current index in 1st run (in tempArray)
        int run2 = middle-first+1;  // current index in 2nd run (in tempArray)
        int endRun1 = middle-first; // last index of 1st run
        int endRun2 = last-first;   // last index of 2nd run

        // merge runs until one of them is exhausted 
        while (run1 <= endRun1 && run2 <= endRun2) 
        {
            if (tempArray[run1].lessThan(tempArray[run2]))  
            {   // if elt from run1 is smaller add it to sortArray
                sortArray[outIndex] = tempArray[run1];      
                run1++;
            }
            else        
            {               // add elt from run2 to sortArray
                sortArray[outIndex] = tempArray[run2];      
                run2++;
            }
            outIndex++;
        }  // while

        // Out of elts from one run, but other may have elts
        // add remaining elements from run1 if any left 
        while (run1 <= endRun1) 
        {
            sortArray[outIndex] = tempArray[run1];
            outIndex++;
            run1++;
        }

        // add remaining elements from run2 if any left 
        while (run2 <= endRun2) 
        {
            sortArray[outIndex] = tempArray[run2];
            outIndex++;
            run2++;
        }
    }

It is easy to convince yourself that mergeRuns is correct. (A formal proof of correctness of iterative algorithms is actually harder than for recursive programs!)

It is also easy to see that if the portion of the array under consideration has k elements (i.e., k = last-first+1), then the complexity of mergeRuns is O(k):

If we count only comparisons, it is clear that every comparison (i.e., call to lessThan) in the if statement in the while loop results in an element being copied into sortArray.

In the worst case, you run out of all elts in one run when there is only 1 element left in the other run: k-1 comparisons, giving O(k)

If we count copies of elements, then it is also O(k), since there are k copies in copying sortArray into tempArray, and then k more copies in putting elts back (in order) into sortArray.
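
A small sketch of the comparison count for a single merge. It follows the same scheme as mergeRuns but works on int arrays to stay self-contained. The class name, the counter, and countFor are hypothetical.

```java
// Hypothetical demo class: counts comparisons in one merge of two sorted runs.
class MergeCount {
    static int comparisons = 0;

    // Merges sorted runs a[first..middle] and a[middle+1..last].
    static void mergeRuns(int first, int middle, int last, int[] a) {
        int[] temp = java.util.Arrays.copyOfRange(a, first, last + 1);
        int out = first, run1 = 0, run2 = middle - first + 1;
        int endRun1 = middle - first, endRun2 = last - first;
        while (run1 <= endRun1 && run2 <= endRun2) {
            comparisons++;                    // one comparison per element placed here
            if (temp[run1] < temp[run2]) a[out++] = temp[run1++];
            else a[out++] = temp[run2++];
        }
        while (run1 <= endRun1) a[out++] = temp[run1++];   // leftovers, no comparisons
        while (run2 <= endRun2) a[out++] = temp[run2++];
    }

    // a must consist of two sorted halves of equal length.
    static int countFor(int[] a) {
        comparisons = 0;
        mergeRuns(0, a.length / 2 - 1, a.length - 1, a);
        return comparisons;
    }

    public static void main(String[] args) {
        int[] interleaved = {1, 3, 5, 7, 2, 4, 6, 8};  // runs alternate: k-1 comparisons
        int[] separated   = {1, 2, 3, 4, 5, 6, 7, 8};  // run 1 entirely below run 2
        System.out.println("interleaved: " + countFor(interleaved));  // prints 7
        System.out.println("separated:   " + countFor(separated));    // prints 4
    }
}
```

With k = 8 elements, the interleaved input reaches the worst case of k-1 = 7 comparisons, while the separated input exhausts the first run after only 4.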

Can use this to determine the complexity of recMergeSort.
Claim complexity is O(n log n) for sort of n elements.

Easiest to prove this if n = 2^m for some m.

Prove by induction on m that sort of n = 2^m elements takes <= n log n = 2^m * m compares.

Base case: m=0, so n = 1. Don't do anything, so 0 compares <= 2^0 * 0.

Suppose true for m-1 and show for m.

recMergeSort of n = 2^m elements proceeds by doing recMergeSort of two lists of size n / 2 = 2^(m-1), and then call of mergeRuns on list of size n = 2^m.

Therefore,

#(compares) <=  2^(m-1) * (m-1) + 2^(m-1) * (m-1) + 2^m
            = 2*(2^(m-1) * (m-1)) + 2^m
            = 2^m * (m-1) + 2^m
            = 2^m * ((m-1) + 1)
            = 2^m * m
Therefore #(compares) <= 2^m * m = n log n

End of proof.

Thus if n = 2^m for some m, #(compares) <= n log n to do recMergeSort

It is not hard to show that a similar bound holds for n not a power of 2.

Therefore O(n log n) compares. Same for number of copies.
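
The bound can be checked for powers of 2 with a sketch along the following lines. It mirrors recMergeSort and mergeRuns but sorts int arrays. The class name, the counter, and sortAndCount are hypothetical, and the test data is pseudo-random with a fixed seed.

```java
import java.util.Random;

// Hypothetical demo class: counts comparisons made by merge sort.
class MergeSortCount {
    static long comparisons = 0;

    // Sorts a[first..last].
    static void recMergeSort(int first, int last, int[] a) {
        if (last - first > 0) {              // more than 1 element
            int middle = (first + last) / 2;
            recMergeSort(first, middle, a);
            recMergeSort(middle + 1, last, a);
            mergeRuns(first, middle, last, a);
        }
    }

    // Merges sorted runs a[first..middle] and a[middle+1..last].
    static void mergeRuns(int first, int middle, int last, int[] a) {
        int[] temp = new int[last - first + 1];
        System.arraycopy(a, first, temp, 0, temp.length);
        int out = first, run1 = 0, run2 = middle - first + 1;
        int endRun1 = middle - first, endRun2 = last - first;
        while (run1 <= endRun1 && run2 <= endRun2) {
            comparisons++;
            if (temp[run1] < temp[run2]) a[out++] = temp[run1++];
            else a[out++] = temp[run2++];
        }
        while (run1 <= endRun1) a[out++] = temp[run1++];
        while (run2 <= endRun2) a[out++] = temp[run2++];
    }

    // Sorts a in place and returns the number of comparisons used.
    static long sortAndCount(int[] a) {
        comparisons = 0;
        recMergeSort(0, a.length - 1, a);
        return comparisons;
    }

    public static void main(String[] args) {
        Random rng = new Random(1);          // fixed seed for repeatable runs
        for (int m = 0; m <= 10; m++) {
            int n = 1 << m;                  // n = 2^m
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = rng.nextInt(1000);
            System.out.println("n = " + n + ": " + sortAndCount(a)
                               + " compares, n log n = " + (long) n * m);
        }
    }
}
```

For each n = 2^m the measured count should come out at or below n log n = 2^m * m, as the proof predicts.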

Can cut down the number of copies significantly by merging back and forth between two arrays rather than copying and then merging.
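
One possible way to merge back and forth is sketched below. This is an illustration under assumptions, not the course's implementation: the class and method names are hypothetical, it sorts int arrays, and it uses half-open ranges [lo, hi). The two arrays swap roles at each level of recursion, so each merge writes directly into the array where the result is needed.

```java
// Hypothetical sketch of merge sort that alternates between two arrays.
class PingPongMergeSort {
    static long copies = 0;   // number of element copies performed

    static void sort(int[] a) {
        int[] b = a.clone();          // one up-front copy of all n elements
        copies += a.length;
        sortInto(b, a, 0, a.length);  // sorted result ends up in a
    }

    // Pre:  src[lo..hi) and dst[lo..hi) have identical contents.
    // Post: dst[lo..hi) is sorted; src[lo..hi) has been used as scratch space.
    static void sortInto(int[] src, int[] dst, int lo, int hi) {
        if (hi - lo < 2) return;      // 0 or 1 element: dst is already correct
        int mid = lo + (hi - lo) / 2;
        sortInto(dst, src, lo, mid);  // roles swapped: sorted halves land in src
        sortInto(dst, src, mid, hi);
        merge(src, dst, lo, mid, hi); // merge the halves from src into dst
    }

    // Merges sorted src[lo..mid) and src[mid..hi) into dst[lo..hi).
    static void merge(int[] src, int[] dst, int lo, int mid, int hi) {
        int i = lo, j = mid;
        for (int k = lo; k < hi; k++) {
            if (i < mid && (j >= hi || src[i] <= src[j])) dst[k] = src[i++];
            else dst[k] = src[j++];
            copies++;
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 8, 1, 9, 3, 7, 4};
        sort(a);
        System.out.println(java.util.Arrays.toString(a) + ", copies = " + copies);
    }
}
```

For n = 2^m this version makes n copies up front plus n per level of merging, n + n*m in all, compared with 2*n*m when every merge first copies into a temporary array and then merges back.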