CS62 - Spring 2010 - Lecture 17

  • heap of trouble?

  • announcements
       - office hours will start late ~11 on Wed.
       - class participation if you've seen the material before...
       - cs lunch on Thursday at noon in Frank West

  • tail recursion
       - a recursive method is tail recursive if the last thing it does is call itself recursively
       - which of the following are tail recursive?

       public static int factorial(int n){
          if( n <= 1 ){
             return 1;
          }else{
             return n * factorial(n-1);
          }
       }

       public static int mystery1(int num){
          if( num%2 == 0 ){
             return num;
          }else{
             return mystery1(num/2);
          }
       }

       public static int pow1(int n, int i){
          if (i == 0){
             return 1;
          }else{
             return n * pow1(n, i-1);
          }
       }

       public static int pow2(int n, int i, int acc){
          if (i == 0){
             return acc;
          }else{
             return pow2(n, i-1, n * acc);
          }
       }

       - what about the heapify method in ArrayListPriorityQueue?
       - why do we care about tail recursion?
           - let's look at a call like pow1(n, 10)
             - what would the call stack look like when we get to the base case?
                 - we'll have a stack frame for each of the calls to pow1 with i = 10, 9, 8, ..., 0 (11 frames in all)
                - why do we need these?
                   - when we finally get to the base case, we can return the value, pop that frame off of the call-stack and then use the returned value in the next frame on the stack
                   - eventually, we'll pop all of the call frames off the stack and we will return our answer
           - let's look at the corresponding call pow2(n, 10, 1)
             - what would the call stack look like when we get to the base case?
                - same as above
             - do we need the call-stack in this case? What happens when we get to the base case?
                - once we get to the base case, we actually have our answer and we just need to return the result
          - for tail recursive methods, the last thing a method does is make the recursive call, therefore, we don't need to keep track of the actual call-stack since when we're done, we can just return our answer
          - what is the benefit of this?
             - less memory usage
                 - for example, if we call pow1 with a large enough i (e.g. pow1(1, 1000000)) we will get a StackOverflowError; in theory we could avoid this, since the stack isn't necessary (Java doesn't optimize for tail recursion, though :( )
              - actually faster, since we don't have to deal with creating, pushing and popping stack frames
          - tail recursion is interesting for two reasons:
             - tail recursive methods are easy to convert into iterative methods
             - some compilers will optimize for tail recursion
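              - as a sketch of the first point, pow2 converts almost mechanically into a loop (powIter is a made-up name; the accumulator update becomes the loop body):

```java
// iterative version of pow2: i counts down while acc accumulates
// the running product, exactly like the accumulator parameter
public static int powIter(int n, int i) {
    int acc = 1;
    while (i != 0) {
        acc = n * acc;
        i = i - 1;
    }
    return acc;
}
```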
          
       - converting to tail recursion
          - often, we can convert a method to a tail recursive method with a little bit of work
           - a common approach is what we saw above with pow2: introduce an additional accumulator parameter that you pass along to keep track of the running result. When you hit the base case, you just return the accumulator.
          - write a tail recursive version of the factorial method
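           - one possible answer, a sketch following the accumulator pattern from pow2 (factorialHelper is a made-up helper name):

```java
// tail-recursive factorial: the helper carries the running product
public static int factorial(int n) {
    return factorialHelper(n, 1);
}

public static int factorialHelper(int n, int acc) {
    if (n <= 1) {
        return acc;                             // base case: accumulator holds the answer
    } else {
        return factorialHelper(n - 1, n * acc); // tail call: nothing left to do after it
    }
}
```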

  • search
       - look at the SimpleMap interface in Hashtables code
          - basic set of operations to keep track of a set
             - add things to the set (via put)
             - check if something exists in the set (via containsKey)
             - remove things from the set
          - how quickly can we implement this using a:
             - ArrayList
                - version 1: append to the end
                   - put: O(1)
                   - contains: O(n)
                - version 2: keep in sorted order
                   - put: O(n)
                   - contains: O(log n)
             - BST (balanced, e.g. Red-Black tree)
                - put: O(log n)
                - contains: O(log n)
          - can we do better?
             - what if we knew that the keys were in a certain range, e.g. between 0 and 1000?
                - make an array of booleans from 0 to 1000
                - the put method would simply switch that entry to true (remove to false)
                - contains would just return whether that entry was true or not
                - running time?
                   - put: O(1)
                   - contains: O(1)
                - what is the problem with this type of approach?
                   - not very memory efficient. What if we just have 10 things in our set?
                   - sometimes infeasible
                      - what if we want to keep track of the last names?
                         - last census: 88,799 last names
                         - how big of an array would we need, let's say assuming they're all < 10 characters long?
                             - 26^10 ≈ 1.4 × 10^14, a really big number
                         - even if we could store that, we'd be wasting a bunch of space
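        - the 0-to-1000 idea above, sketched in code (DirectAddressSet is a made-up name):

```java
// direct-address set for integer keys in the range 0..1000;
// put, remove and contains are all O(1) array accesses
public class DirectAddressSet {
    private boolean[] present = new boolean[1001];

    public void put(int key)         { present[key] = true;  }
    public void remove(int key)      { present[key] = false; }
    public boolean contains(int key) { return present[key]; }
}
```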

  • universe of keys
       - we have some universe of keys (often called U) that we want to store, be it numbers, strings, objects, etc.
       - the problem with using an array-based approach is that the array has to be the size of the universe of keys

  • hash functions
       - a hash function is a function that maps the universe of keys to a restricted range, call it m, where m << |U|, that is m is much smaller than the universe of keys
       - how does this help us?
          - now we don't have to have an array of size |U|, just have to have an array of size m
          - a hashtable is a data structure that uses an array of some sort to store the items. Using a hash function, any item is mapped to the array.
          - to find if an item exists in the hash table, we hash the item and see if it exists in the table at the specified entry
       - what can happen if m < |U|?
          - we can have two things map to the same position in the array even though they're not equivalent, that is h(x) == h(y) even though !x.equals(y)
          - this is called a "collision"
          - a good hash function will try to avoid them but if m < |U|, they are inevitable
             - why?
                 - pigeonhole principle: if n items are put into m pigeonholes with n > m, then at least one pigeonhole must contain more than one item
                - simple idea, but often useful for proving things

  • hashCode
        - every object in Java has a method called hashCode that attempts to return a unique integer for that object
       - how does this happen?
          - it's another method (like equals and toString) that is inherited from the Object class
        - the hashCode method for Object is based on the object's location in memory and does a fairly good job of providing unique numbers, however...
        - if you plan on using an object as a key in Maps or hashtables (in particular, if you override equals), you should also override the hashCode method
       - the two requirements of the hashCode method are (http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Object.html#hashCode()):
          - "Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified."
          - "If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result."
       - A number of the common classes (like String, Integer, etc) do have overridden hashCode methods
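        - for example, a sketch of a class that overrides both equals and hashCode so the two requirements hold (Point is a made-up class):

```java
public class Point {
    private final int x, y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // depends only on the fields used by equals, so two equal
        // points always produce the same integer result
        return 31 * x + y;
    }
}
```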
       - how would you write a hashCode method for String?
          - via ASCII we can easily get a number between 0 and 255 for each character. Now what?
          - could just use the first letter?
             - meets the requirement, but not a very good hash function since we'll get clumps
          - add them up?
             - a little bit better, but still not great (see Figure 15.7 from Bailey)
           - treat it just like a big base-256 number:
             \sum_{i = 0}^{l-1} s[i]c^i
          
             where c = 256
             - see Figure 15.9 from Bailey
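           - the base-256 sum above can be computed with Horner's rule, working from the last character (hash is a made-up method name, not String's real hashCode, which uses a constant of 31 instead):

```java
// computes sum_{i=0}^{l-1} s[i] * 256^i by Horner's rule,
// starting from the last character; int overflow simply wraps
public static int hash(String s) {
    int h = 0;
    for (int i = s.length() - 1; i >= 0; i--) {
        h = h * 256 + s.charAt(i);
    }
    return h;
}
```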
       - how did Bailey generate these tables?

  • collision resolution by chaining
       - ideas for solving this problem of collisions?
       - a common approach is to allow multiple items to occupy a given entry in our array. How?
           - rather than just storing the item at the entry, store a linked list of items
       - put: if two items hash to the same location in the array, just add them to the linked list
        - contains: search the linked list at that entry to see if the item being searched for is there
       - walk through an example
       - show ChainedHashtable class in Hashtables code
          - hashCodes are integers
          - our table has a fixed length
          - how do we remedy this?
             - % table.length (look at getEntry method)
       - what is the run-time of the put and containsKey methods?
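        - a minimal sketch of the chaining idea (not the actual ChainedHashtable class from the course code; ChainedSet and its method names are made up):

```java
import java.util.LinkedList;

// each array entry holds a linked list of the keys that hash there
public class ChainedSet<K> {
    private LinkedList<K>[] table;

    @SuppressWarnings("unchecked")
    public ChainedSet(int size) {
        table = (LinkedList<K>[]) new LinkedList[size];
        for (int i = 0; i < size; i++) {
            table[i] = new LinkedList<K>();
        }
    }

    // map an arbitrary hashCode into the range 0..table.length-1
    private int getEntry(K key) {
        return Math.abs(key.hashCode() % table.length);
    }

    public void put(K key) {
        LinkedList<K> bucket = table[getEntry(key)];
        if (!bucket.contains(key)) {   // avoid duplicates in the chain
            bucket.add(key);
        }
    }

    public boolean containsKey(K key) {
        return table[getEntry(key)].contains(key);
    }
}
```

          put and containsKey are O(1) on average when the keys spread evenly over the table, but O(n) in the worst case when everything chains to the same entry.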