CS150 - Fall 2011 - Class 23

  • exercises

  • administrative
       - final exam (let me know by the end of this week if you'd like to switch)   

  • "For me, great algorithms are the poetry of computation. Just like verse, they can be terse, allusive, dense and even mysterious. But once unlocked, they cast a brilliant new light on some aspect of computing." -- Francis Sullivan

  • What is an algorithm?
       - a way of solving a problem
       - a sequence of steps to accomplish a task

  • Examples
       - sort a list of numbers
       - find a route from one place to another (cars, packet routing, phone routing, ...)
       - find the longest common substring between two strings
       - add two numbers
       - microchip wiring/design (VLSI)
       - solving sudoku
       - cryptography
       - compression (file, audio, video)
       - spell checking
       - pagerank
       - classify a web page
       - ...

  • Main parts to algorithm analysis
       - developing algorithms that work
       - making them faster
       - analyzing/understanding the efficiency/run-time

  • What questions might we want to ask/measure about an algorithm?
       - how long does it take to run? How efficient is it?
       - how much memory does it use?
       - does it finish?
       - is it right?
       - how hard is it to code?

  • Sorting
       Input: A list of numbers nums
       Output: The list of numbers in sorted order, i.e. nums[i] <= nums[j] for all i < j

       - cards
          - sort cards: all cards in view
          - sort cards: only view one card at a time
       - many different ways to sort a list
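       - A minimal sketch of the output condition in the sorting specification above, written as a check. The name is_sorted is an assumption for illustration, not something from the course code. Comparing adjacent pairs is enough because <= is transitive:

```python
def is_sorted(nums):
    """Return True if nums[i] <= nums[i + 1] for every adjacent pair."""
    return all(nums[i] <= nums[i + 1] for i in range(len(nums) - 1))


print(is_sorted([1, 2, 2, 5]))  # True
print(is_sorted([3, 1, 2]))     # False
```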

  • Selection sort
       - high-level:
          - starting from the beginning of the list and working to the back, find the smallest element in the remaining list
             - in the first position, put the smallest item in the list
             - in the second position, put the next smallest item in the list
             - ...
          - to find the smallest item in the remaining list, simply traverse it, keeping track of the smallest value
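          - A minimal sketch of the high-level idea above. This is not necessarily the course's own code; the snake_case names and the in-place swap are assumptions made for illustration:

```python
def index_of_smallest(nums, start_index):
    """Return the index of the smallest value in nums[start_index:]."""
    smallest = start_index
    for i in range(start_index + 1, len(nums)):
        if nums[i] < nums[smallest]:
            smallest = i
    return smallest


def selection_sort(nums):
    """Sort nums in place by repeatedly selecting the smallest remaining item."""
    for i in range(len(nums)):
        j = index_of_smallest(nums, i)
        # Put the smallest remaining item into position i.
        nums[i], nums[j] = nums[j], nums[i]


data = [5, 2, 9, 1, 5]
selection_sort(data)
print(data)  # [1, 2, 5, 5, 9]
```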

       - look at selection_sort function in sorting.py code
          - How many operations does the algorithm take? How long will it take? How efficient is it?
             - what counts as an operation?
                - Different operations take different amounts of time. Even from run to run, factors such as caching change the measured time
             - will depend on the input
          - We want a tool to allow us to talk about and compare different algorithms while hiding the details that don't matter
          - asymptotic analysis
             - Key idea: how does the run-time grow as we increase the input size?
                - in our case, as we sort more numbers, roughly how will the run-time increase
                - for example, if we double the number of numbers we're sorting, what will happen to the run-time?
                   - unchanged?
                   - double?
                   - triple?
                   - quadruple?
             - Big-O notation: an upper bound on the run-time
                - Gives us the big picture, without worrying about details
                - Given a function/method, how will its run-time grow? linearly? quadratically?
                - Examples:
                   - n^2 is O(n^2)
                   - n^2 + n + 200 is O(n^2)
                   - 5n + 10 is O(n)
                   - ...
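                - One way to see why the lower-order terms in these examples don't matter is to check what happens when n doubles. A small sketch (growth_ratio is a made-up helper name, and n = 1,000,000 is an arbitrary choice):

```python
def growth_ratio(f, n):
    """Return f(2n) / f(n): how much f grows when the input size doubles."""
    return f(2 * n) / f(n)


n = 1_000_000
linear = growth_ratio(lambda n: 5 * n + 10, n)
quadratic = growth_ratio(lambda n: n**2 + n + 200, n)
print(linear)     # very close to 2: doubling n doubles a linear function
print(quadratic)  # very close to 4: doubling n quadruples a quadratic function
```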

             - runtimes table
                - this gives us groups of methods/functions that behave similarly

          - What is the running time of selection sort?
             - We'll use the variable n to describe the length of the array/input
             - How many times do we go through the for loop in selectionSort?
                - n times
             - Each time through the for loop in selectionSort, we call indexOfSmallest. How many times do we go through the for loop in indexOfSmallest?
                - end_index - start_index + 1
                - first call: n-1, second call: n-2, third call: n-3, ...
                - O(n)
             - what is the overall cost for selectionSort?
                - we go through the for loop n times
                - each time we go through the for loop we incur a cost of roughly n
                - O(n^2)
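             - The counting argument above can be checked by tallying comparisons directly. A sketch, assuming an implementation that compares each remaining item exactly once per pass (the function name is hypothetical):

```python
def count_selection_sort_comparisons(n):
    """Count the comparisons selection sort makes on a list of length n."""
    comparisons = 0
    for start in range(n):             # outer loop: n passes
        for _ in range(start + 1, n):  # inner scan over the remaining items
            comparisons += 1
    return comparisons


for n in [10, 20, 40]:
    # The tally matches (n-1) + (n-2) + ... + 0 = n(n-1)/2.
    print(n, count_selection_sort_comparisons(n), n * (n - 1) // 2)
```

             - The totals are 45, 190 and 780: each doubling of n roughly quadruples the work, which is the O(n^2) behavior.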
       
  • Insertion sort
       - high-level: starting from the beginning of the list and working towards the end, keep the list items we've seen so far in sorted order. For each new item, traverse the list of things sorted already and insert it in the correct place.
       - look at insertion_sort function in sorting.py code
          - what is the running time?
             - How many times do we iterate through the while loop?
                - in the best case: no times
                   - when does this happen?
                   - what is the running time? linear, O(n)
                - in the worst case: j - 1 times
                   - when does this happen?
                   - what is the running time?
                      - \sum_{j=1}^n-1 j = ((n-1)n)/2
                      - O(n^2)
                - average case: (j-1)/2 times
                   - O(n^2)
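             - A sketch of insertion sort that also counts while-loop iterations, to make the best and worst cases concrete. It is an illustration, not necessarily the course's own code, and returning the counter is an assumption added here. Indices are 0-based, so item j needs at most j iterations rather than j - 1:

```python
def insertion_sort(nums):
    """Sort nums in place; return the number of while-loop iterations."""
    iterations = 0
    for j in range(1, len(nums)):
        current = nums[j]
        i = j - 1
        # Shift larger items in the sorted prefix one slot to the right.
        while i >= 0 and nums[i] > current:
            nums[i + 1] = nums[i]
            i -= 1
            iterations += 1
        nums[i + 1] = current
    return iterations


print(insertion_sort([1, 2, 3, 4, 5]))  # 0: already sorted (best case)
print(insertion_sort([5, 4, 3, 2, 1]))  # 10 = (5-1)*5/2: reverse sorted (worst case)
```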

  • Merge sort
       - high-level
          - if you had two lists that were already sorted, how could you create a large list that was sorted and contained all of the elements in both lists?
             - walk through them at the same time keeping an index in both of them
             - compare the current values in the two lists and copy the smaller one over into the final list and move the index up on that list
          - how can we use this method to sort numbers?
             - think of a list as starting out as a bunch of sorted lists with just one item in them
             - "merge" adjacent numbers in the list
             - this gives you a bunch of sorted lists with two items in them
             - merge these sorted lists of two into sorted lists with four items
             - ...
          - what is the runtime?
             - look at the layers
             - each layer processes n items
             - how many layers are there?
                - each time we split the data in half
                - after i splits the pieces have size n/2^i, so we stop when 2^i = n
                - i = log_2(n) levels
             - O( n log n )
       - look at merge_sort function in sorting.py code
          - we can do the above idea recursively:
             - recursive case:
                - split the list in two
                - call merge_sort on the two halves
                - "assume" that the recursive calls work and sort the two halves, then merge these two halves
             - base case:
                - lists with 0 or 1 items in them are already sorted
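          - A minimal sketch of the merge step and the recursive structure described above. This is not necessarily the course's own code; returning a new list instead of sorting in place is an assumption made here for brevity:

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    # Walk both lists at once, copying over the smaller current value.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One list is exhausted; the rest of the other is already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(nums):
    """Return a new sorted list containing the items of nums."""
    if len(nums) <= 1:  # base case: 0 or 1 items are already sorted
        return nums[:]
    mid = len(nums) // 2
    return merge(merge_sort(nums[:mid]), merge_sort(nums[mid:]))


print(merge_sort([8, 3, 5, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```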

  • Timing
       - we can analyze the times of the different algorithms as we increase the size of the list to be sorted
       - run compare_sorting function in sorting.py code
          - we see that both insertion sort and selection sort are much slower than merge_sort
             - as predicted, their growth is quadratic
             - insertion sort is a little faster than selection sort
       - run plot_merge_sort function in sorting.py code
          - merge_sort is fast, but it's not linear... it's O(n log n)
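          - A rough sketch of how such a timing comparison could be set up. It does not reproduce compare_sorting or plot_merge_sort; time_sort and quadratic_sort are hypothetical names, and Python's built-in sorted stands in for an O(n log n) sort:

```python
import random
import time


def time_sort(sort_fn, n, seed=0):
    """Return the seconds sort_fn takes on a random list of n integers."""
    rng = random.Random(seed)
    nums = [rng.randint(0, n) for _ in range(n)]
    start = time.perf_counter()
    sort_fn(nums)
    return time.perf_counter() - start


def quadratic_sort(nums):
    """Selection sort, used here only as an example O(n^2) algorithm."""
    for i in range(len(nums)):
        # Index of the smallest value in nums[i:].
        smallest = min(range(i, len(nums)), key=nums.__getitem__)
        nums[i], nums[smallest] = nums[smallest], nums[i]


for n in [500, 1000, 2000]:
    # Columns: list size, quadratic sort seconds, built-in sort seconds.
    print(n, time_sort(quadratic_sort, n), time_sort(sorted, n))
```

          - Exact times vary by machine and from run to run; the pattern to look for is that doubling n roughly quadruples the quadratic sort's time.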