CS201 - Spring 2014 - Class 28

  • exercise

  • admin
       - office hours will start late (~10:30) on Tue.

  • quick recap of data structures
       - No free lunch: there is no single best data structure
          - different data structures (and different implementations of those structures) are good at different things
       - What are the following structures good for/at?
          - Arrays/ArrayLists
             - storing sequential data
             - O(1) access to elements at particular indices
             - ArrayLists allow for amortized O(1) adds
          - Linked lists
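             - add/remove at the beginning in O(1); at the end in O(1) too, given a tail reference (removing from the end also needs a doubly linked list)
             - can delete from the middle in O(1) if we have a reference to the node (doubly linked list)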
          - Binary search trees
             - If balanced, O(log n):
                - add
                - delete
                - search
          - Heaps
             - good for min/max requests
             - O(log n) add and extractMin/Max
       - We've also looked at some meta-data structures that help facilitate certain operations
          - Stacks/Queues
          - Trees
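
       - a minimal sketch of the recap, assuming the standard java.util classes stand in for the structures above (ArrayList, LinkedList, TreeSet as a balanced BST, PriorityQueue as a min-heap); RecapDemo and its method names are made up for illustration

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.PriorityQueue;
import java.util.TreeSet;

class RecapDemo {
    // Heap: returns the smallest value (assumes values is non-empty).
    static int smallest(int[] values) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int v : values) {
            heap.add(v);              // O(log n) add
        }
        return heap.poll();           // O(log n) extractMin
    }

    // Balanced BST: O(log n) add and search.
    static boolean containsViaTree(int[] values, int target) {
        TreeSet<Integer> tree = new TreeSet<>();
        for (int v : values) {
            tree.add(v);
        }
        return tree.contains(target);
    }

    public static void main(String[] args) {
        ArrayList<Integer> list = new ArrayList<>();
        list.add(10);                     // amortized O(1) add at the end
        list.add(20);
        System.out.println(list.get(1));  // O(1) access by index

        LinkedList<Integer> linked = new LinkedList<>();
        linked.addFirst(1);               // O(1) at the beginning
        linked.addLast(2);                // O(1) at the end
        linked.removeFirst();

        int[] data = {5, 2, 9};
        System.out.println(smallest(data));
        System.out.println(containsViaTree(data, 9));
    }
}
```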

  • sets
       - we'd like to be able to support set-like data structures
          - objects can be added
          - and we can ask if an object belongs to a set
       - look at the Set interface in Hashtables code
       - can we support this type of thing with anything we have so far?
          - binary search trees do this in O(log n)
       - can we do better?
          - are there additional things we can do with binary search trees that we don't need?
             - there is still an ordering
                - things like successor can be done quickly
                - can print out the data in order
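
       - a rough sketch of a set backed by a balanced binary search tree; the course's Set interface is not reproduced in these notes, so SimpleSet, TreeBackedSet and successor are hypothetical names, and java.util.TreeSet is assumed as the tree

```java
import java.util.TreeSet;

// Hypothetical interface: just the two set operations we need.
interface SimpleSet<E> {
    void add(E item);
    boolean contains(E item);
}

// Set backed by a balanced BST: add and contains are O(log n).
class TreeBackedSet<E extends Comparable<E>> implements SimpleSet<E> {
    private final TreeSet<E> tree = new TreeSet<>();

    public void add(E item) {
        tree.add(item);
    }

    public boolean contains(E item) {
        return tree.contains(item);
    }

    // Extra ordering operation a plain set does not need:
    // the smallest element strictly greater than item, or null if none.
    public E successor(E item) {
        return tree.higher(item);
    }
}

class SetSketch {
    public static void main(String[] args) {
        TreeBackedSet<String> names = new TreeBackedSet<>();
        names.add("banana");
        names.add("apple");
        System.out.println(names.contains("apple"));
        System.out.println(names.successor("apple"));
    }
}
```

       - the tree keeps everything ordered even though add and contains never ask for an order; that unused work is what a hashtable gives up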

  • set applications?

  • universe of keys
       - we have some universe of keys (often called U) that we want to store, be it numbers, strings, objects, etc.
          - for example:
             - all Middlebury ID numbers
             - all social security numbers
             - all last names
             - all names of people (first and last together)
             - all tweets from today
             - ...
       - if you know the min and max key, any approach?
          - store them in an array indexed by key (index = key - min)
       - Any problems?
          - for any given run, we don't need to store ALL keys, just a subset
          - the array has to be at least the size of the universe of keys (all possible keys!)
          - lots of wasted space
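
       - a minimal sketch of the array approach, assuming integer keys with a known min and max; DirectAddressSet is a hypothetical name and the key range in main is made up

```java
class DirectAddressSet {
    private final int min;
    private final boolean[] present;   // one slot per possible key

    DirectAddressSet(int min, int max) {
        this.min = min;
        this.present = new boolean[max - min + 1];
    }

    void add(int key) {
        present[key - min] = true;     // O(1)
    }

    boolean contains(int key) {
        return present[key - min];     // O(1)
    }

    // Number of array slots allocated, regardless of how many keys are stored.
    int slots() {
        return present.length;
    }

    public static void main(String[] args) {
        // Made-up universe: every four-digit number from 1000 to 9999.
        DirectAddressSet ids = new DirectAddressSet(1000, 9999);
        ids.add(1234);
        System.out.println(ids.contains(1234));
        System.out.println(ids.contains(1235));
        System.out.println(ids.slots());   // 9000 slots for a single stored key
    }
}
```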

  • hash functions
       - a hash function is a function that maps the universe of keys to a restricted range of size m (e.g., the integers 0 to m-1), where m << |U|, that is, m is much smaller than the size of the universe of keys
       - how does this help us?
          - now we don't have to have an array of size |U|, just have to have an array of size m
          - a hashtable is a data structure that uses an array of some sort to store the items. Using a hash function, any item is mapped to an index in the array.
          - to find if an item exists in the hash table, we hash the item and see if it exists in the table at the specified entry
       - what can happen if m < |U|?
          - we can have two things map to the same position in the array even though they're not equivalent, that is h(x) == h(y) even though !x.equals(y)
          - this is called a "collision"
          - a good hash function will try to avoid them but if m < |U|, they are inevitable
             - why?
                - pigeonhole principle: if n items are put into m pigeonholes with n > m, then at least one pigeonhole must contain more than one item
                - simple idea, but often useful for proving things
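
       - a small sketch of a hash function and a collision, assuming integer keys and h(x) = x mod m; this is just one simple choice for illustration, and ModHash and firstCollision are hypothetical names

```java
class ModHash {
    // Maps any int key to a slot in 0..m-1 (floorMod keeps negatives in range).
    static int hash(int key, int m) {
        return Math.floorMod(key, m);
    }

    // Returns the first pair of distinct keys that land in the same slot,
    // or null if no two keys collide.
    static int[] firstCollision(int[] keys, int m) {
        int[] occupant = new int[m];
        boolean[] used = new boolean[m];
        for (int key : keys) {
            int slot = hash(key, m);
            if (used[slot] && occupant[slot] != key) {
                return new int[] {occupant[slot], key};
            }
            used[slot] = true;
            occupant[slot] = key;
        }
        return null;
    }

    public static void main(String[] args) {
        // Pigeonhole: 6 distinct keys into m = 5 slots must collide.
        int[] pair = firstCollision(new int[] {0, 1, 2, 3, 4, 5}, 5);
        System.out.println(pair[0] + " and " + pair[1] + " collide");
    }
}
```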

  • hashCode
       - every object in Java has a method called hashCode that returns an integer for that object; the goal is for different objects to get different integers, but uniqueness is not guaranteed
       - how does this happen?
          - it's another method (like equals and toString) that is inherited from the Object class
       - the hashCode method for Object is typically based on the object's location in memory and does a fairly good job of providing unique numbers, however...
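       - if you plan on using Maps or hashtables with your own class, you should consider overriding the hashCode method (and if you override equals, you need to override hashCode so that equal objects get the same hash code)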
       - the two requirements of the hashCode method are ( http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode() ):
          - "Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified."
          - "If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result."
       - A number of the common classes (like String, Integer, etc.) do have overridden hashCode methods
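
       - a minimal sketch of overriding equals and hashCode together so the two requirements above hold; the Student class and its fields are hypothetical, and java.util.Objects.hash is just one convenient way to combine fields

```java
import java.util.HashSet;
import java.util.Objects;

class Student {
    private final String name;
    private final int id;

    Student(String name, int id) {
        this.name = name;
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Student)) {
            return false;
        }
        Student s = (Student) other;
        return id == s.id && name.equals(s.name);
    }

    // Uses exactly the fields that equals compares, so equal objects
    // always produce the same hash code.
    @Override
    public int hashCode() {
        return Objects.hash(name, id);
    }
}

class HashCodeDemo {
    public static void main(String[] args) {
        HashSet<Student> roster = new HashSet<>();
        roster.add(new Student("Ada", 42));
        // A separate but equal object is found because the hash codes match.
        System.out.println(roster.contains(new Student("Ada", 42)));
    }
}
```

       - without the hashCode override, the two equal Student objects would most likely hash to different places and the lookup in main would fail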