CS62 - Spring 2010 - Lecture 17

  • heap of trouble?

  • announcements
       - office hours will start late ~11 on Wed.
       - class participation if you've seen the material before...
       - cs lunch on Thursday at noon in Frank West

  • tail recursion
       - a recursive method is tail recursive if the last thing it does is call itself recursively
       - which of the following are tail recursive?

       public static int factorial(int n){
          if( n <= 1 ){
             return 1;
          }else{
             return n * factorial(n-1);
          }
       }

       public static int mystery1(int num){
          if( num%2 == 0 ){
             return num;
          }else{
             return mystery1(num/2);
          }
       }

       public static int pow1(int n, int i){
          if (i == 0){
             return 1;
          }else{
             return n * pow1(n, i-1);
          }
       }

       public static int pow2(int n, int i, int acc){
          if (i == 0){
             return acc;
          }else{
             return pow2(n, i-1, n * acc);
          }
       }

       - what about the heapify method in ArrayListPriorityQueue?
       - why do we care about tail recursion?
           - let's look at a call like pow1(n, 10)
             - what would the call stack look like when we get to the base case?
                 - we'll have a stack frame for each of the calls to pow1 with i = 10, 9, 8, ..., 0 (11 frames in all)
                - why do we need these?
                   - when we finally get to the base case, we can return the value, pop that frame off of the call-stack and then use the returned value in the next frame on the stack
                   - eventually, we'll pop all of the call frames off the stack and we will return our answer
           - let's look at the corresponding call pow2(n, 10, 1)
             - what would the call stack look like when we get to the base case?
                - same as above
             - do we need the call-stack in this case? What happens when we get to the base case?
                - once we get to the base case, we actually have our answer and we just need to return the result
          - for tail recursive methods, the last thing a method does is make the recursive call, therefore, we don't need to keep track of the actual call-stack since when we're done, we can just return our answer
          - what is the benefit of this?
             - less memory usage
                 - for example, if we call pow1 with a large enough i (e.g. pow1(1, 1000000)) we will get a StackOverflowError; in theory we could avoid this, since the stack isn't necessary (Java doesn't optimize for tail recursion, though :( )
              - actually faster, since we don't have to deal with creating, pushing and popping stack frames
          - tail recursion is interesting for two reasons:
             - tail recursive methods are easy to convert into iterative methods
             - some compilers will optimize for tail recursion
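              - as a sketch of the first point, pow2 converts almost mechanically into a loop (powIter is a made-up name; the accumulator update becomes the loop body):

```java
// iterative version of pow2: i counts down while acc accumulates
// the running product, exactly like the accumulator parameter
public static int powIter(int n, int i) {
    int acc = 1;
    while (i != 0) {
        acc = n * acc;
        i = i - 1;
    }
    return acc;
}
```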
          
       - converting to tail recursion
          - often, we can convert a method to a tail recursive method with a little bit of work
           - a common approach is what we saw above with pow2: introduce an additional accumulator parameter that you pass along to keep track of the running result. When you hit the base case, you just return the accumulator.
          - write a tail recursive version of the factorial method
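           - one possible answer, a sketch following the accumulator pattern from pow2 (factorialHelper is a made-up helper name):

```java
// tail-recursive factorial: the helper carries the running product
public static int factorial(int n) {
    return factorialHelper(n, 1);
}

public static int factorialHelper(int n, int acc) {
    if (n <= 1) {
        return acc;                             // base case: accumulator holds the answer
    } else {
        return factorialHelper(n - 1, n * acc); // tail call: nothing left to do after it
    }
}
```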

  • search
       - look at the SimpleMap interface in Hashtables code
          - basic set of operations to keep track of a set
             - add things to the set (via put)
             - check if something exists in the set (via containsKey)
             - remove things from the set
          - how quickly can we implement this using a:
             - ArrayList
                - version 1: append to the end
                   - put: O(1)
                   - contains: O(n)
                - version 2: keep in sorted order
                   - put: O(n)
                   - contains: O(log n)
             - BST (balanced, e.g. Red-Black tree)
                - put: O(log n)
                - contains: O(log n)
          - can we do better?
             - what if we knew that the keys were in a certain range, e.g. between 0 and 1000?
                - make an array of booleans from 0 to 1000
                - the put method would simply switch that entry to true (remove to false)
                - contains would just return whether that entry was true or not
                - running time?
                   - put: O(1)
                   - contains: O(1)
                - what is the problem with this type of approach?
                   - not very memory efficient. What if we just have 10 things in our set?
                   - sometimes infeasible
                      - what if we want to keep track of the last names?
                         - last census: 88,799 last names
                         - how big of an array would we need, let's say assuming they're all < 10 characters long?
                             - 26^10 ≈ 1.4 × 10^14, a really big number
                         - even if we could store that, we'd be wasting a bunch of space
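        - the 0-to-1000 idea above, sketched in code (DirectAddressSet is a made-up name):

```java
// direct-address set for integer keys in the range 0..1000;
// put, remove and contains are all O(1) array accesses
public class DirectAddressSet {
    private boolean[] present = new boolean[1001];

    public void put(int key)         { present[key] = true;  }
    public void remove(int key)      { present[key] = false; }
    public boolean contains(int key) { return present[key]; }
}
```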

  • universe of keys
       - we have some universe of keys (often called U) that we want to store, be it numbers, strings, objects, etc.
       - the problem with using an array-based approach is that the array has to be the size of the universe of keys

  • hash functions
       - a hash function is a function that maps the universe of keys to a restricted range, call it m, where m << |U|, that is m is much smaller than the universe of keys
       - how does this help us?
          - now we don't have to have an array of size |U|, just have to have an array of size m
          - a hashtable is a data structure that uses an array of some sort to store the items. Using a hash function, any item is mapped to the array.
          - to find if an item exists in the hash table, we hash the item and see if it exists in the table at the specified entry
       - what can happen if m < |U|?
          - we can have two things map to the same position in the array even though they're not equivalent, that is h(x) == h(y) even though !x.equals(y)
          - this is called a "collision"
          - a good hash function will try to avoid them but if m < |U|, they are inevitable
             - why?
                 - pigeonhole principle: if n items are put into m pigeonholes with n > m, then at least one pigeonhole must contain more than one item
                - simple idea, but often useful for proving things

  • hashCode
        - every object in Java has a method called hashCode that attempts to return a unique integer for that object
       - how does this happen?
          - it's another method (like equals and toString) that is inherited from the Object class
        - the hashCode method for Object is based on the object's location in memory and does a fairly good job of providing unique numbers, however...
        - if you plan on using an object as a key in Maps or hashtables (in particular, if you override equals), you should also override the hashCode method
       - the two requirements of the hashCode method are (http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Object.html#hashCode()):
          - "Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified."
          - "If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result."
       - A number of the common classes (like String, Integer, etc) do have overridden hashCode methods
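        - for example, a sketch of a class that overrides both equals and hashCode so the two requirements hold (Point is a made-up class):

```java
public class Point {
    private final int x, y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // depends only on the fields used by equals, so two equal
        // points always produce the same integer result
        return 31 * x + y;
    }
}
```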
       - how would you write a hashCode method for String?
          - via ASCII we can easily get a number between 0 and 255 for each character. Now what?
          - could just use the first letter?
             - meets the requirement, but not a very good hash function since we'll get clumps
          - add them up?
             - a little bit better, but still not great (see Figure 15.7 from Bailey)
           - treat it just like a big base-256 number:
             \sum_{i = 0}^{l-1} s[i]c^i
          
             where c = 256
             - see Figure 15.9 from Bailey
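           - the base-256 sum above can be computed with Horner's rule, working from the last character (hash is a made-up method name, not String's real hashCode, which uses a constant of 31 instead):

```java
// computes sum_{i=0}^{l-1} s[i] * 256^i by Horner's rule,
// starting from the last character; int overflow simply wraps
public static int hash(String s) {
    int h = 0;
    for (int i = s.length() - 1; i >= 0; i--) {
        h = h * 256 + s.charAt(i);
    }
    return h;
}
```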
       - how did Bailey generate these tables?

  • collision resolution by chaining
       - ideas for solving this problem of collisions?
       - a common approach is to allow multiple items to occupy a given entry in our array. How?
           - rather than just storing the item at the entry, store a linked list of items
       - put: if two items hash to the same location in the array, just add them to the linked list
        - contains: search the linked list at that entry to see if the item being searched for is there
       - walk through an example
       - show ChainedHashtable class in Hashtables code
          - hashCodes are integers
          - our table has a fixed length
          - how do we remedy this?
             - % table.length (look at getEntry method)
       - what is the run-time of the put and containsKey methods?
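        - a minimal sketch of the chaining idea (not the actual ChainedHashtable class from the course code; ChainedSet and its method names are made up):

```java
import java.util.LinkedList;

// each array entry holds a linked list of the keys that hash there
public class ChainedSet<K> {
    private LinkedList<K>[] table;

    @SuppressWarnings("unchecked")
    public ChainedSet(int size) {
        table = (LinkedList<K>[]) new LinkedList[size];
        for (int i = 0; i < size; i++) {
            table[i] = new LinkedList<K>();
        }
    }

    // map an arbitrary hashCode into the range 0..table.length-1
    private int getEntry(K key) {
        return Math.abs(key.hashCode() % table.length);
    }

    public void put(K key) {
        LinkedList<K> bucket = table[getEntry(key)];
        if (!bucket.contains(key)) {   // avoid duplicates in the chain
            bucket.add(key);
        }
    }

    public boolean containsKey(K key) {
        return table[getEntry(key)].contains(key);
    }
}
```

          put and containsKey are O(1) on average when the keys spread evenly over the table, but O(n) in the worst case when everything chains to the same entry.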