CS201 - Spring 2014 - Class 25

  • exercises

  • binary search tree height
       - most methods on a binary search tree are bounded by its height
       - what is the worst-case height?
          - O(n): the "twig", where every node has only one child
          - when does this happen?
             - insert elements in sorted or reverse sorted order
       - what is the best case height?
          - O(log_2 n)
          - when it's a complete tree

       - Randomized BST: if the values are inserted in random order, the expected height of the resulting binary search tree is O(log n)
          - this is only useful if we know all of the data we'll be inserting beforehand
          - does this give you an idea for a sorting algorithm?
             - randomly insert the data into a binary search tree
             - in-order traversal of the tree
             - running time
                - best-case: O(n log n)
                - worst-case: O(n^2) - we could still get unlucky
                - average-case: O(n log n)
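       - a minimal sketch of this sorting idea, assuming int values and a plain unbalanced BST; the names RandomizedTreeSort, insert, inOrder and sort are made up for illustration and are not from the course code

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: shuffle the input, insert into a plain BST,
// then read the values back with an in-order traversal.
class RandomizedTreeSort {
    // Minimal BST node; duplicates go to the right subtree.
    private static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Iterative BST insert; returns the (possibly new) root.
    private static Node insert(Node root, int value) {
        Node fresh = new Node(value);
        if (root == null) return fresh;
        Node current = root;
        while (true) {
            if (value < current.value) {
                if (current.left == null) { current.left = fresh; break; }
                current = current.left;
            } else {
                if (current.right == null) { current.right = fresh; break; }
                current = current.right;
            }
        }
        return root;
    }

    // In-order traversal appends the values in ascending order.
    private static void inOrder(Node node, List<Integer> out) {
        if (node == null) return;
        inOrder(node.left, out);
        out.add(node.value);
        inOrder(node.right, out);
    }

    static List<Integer> sort(List<Integer> input, Random rng) {
        List<Integer> shuffled = new ArrayList<>(input);
        Collections.shuffle(shuffled, rng);   // random insertion order
        Node root = null;
        for (int v : shuffled) root = insert(root, v);
        List<Integer> sorted = new ArrayList<>();
        inOrder(root, sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(5, 1, 4, 2, 3, 2);
        System.out.println(sort(data, new Random()));  // [1, 2, 2, 3, 4, 5]
    }
}
```

       - the shuffle only changes the shape of the tree, never the result: an unlucky shuffle still produces a twig and O(n^2) total insertion time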

  • balanced trees
       - even randomized trees don't guarantee O(log n) height; it is only the expected height
       - however, there are approaches that can guarantee this by making sure the tree doesn't become too "unbalanced"
          - AVL trees
          - red-black trees
          - B-trees (used in databases and for "on-disk" trees)

  • red-black trees
       - a binary search tree with additional constraints
          - a binary search tree
          - each node is also labeled with a color, red or black
          - the root is always black (this isn't technically required, but doesn't hurt us and makes our life easier)
          - all red nodes have two children that are colored black
          - for a given node, the number of black nodes on any path from that node to any leaf is the same
       
       - how does this guarantee us our height?
          - what is the shortest possible path from the root to any leaf?
             - all black nodes
          - what is the longest possible path from the root to any leaf?
             - alternating red and black nodes (since a red node has to have two black children)
          - what is the biggest difference between the longest and shortest path?
             - since all paths must have the same number of black nodes, the longest path can be at most twice as long
             - the tree can be imbalanced by at most a factor of 2, which still guarantees O(log n) height, since 2 is just a constant multiplier
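       - a rough sketch of checking these constraints on a hand-built tree; RedBlackCheck and its methods are hypothetical names, missing (null) children are treated as the leaves, and BST ordering is not checked

```java
// Hypothetical sketch: verifies the color constraints and compares
// the shortest and longest root-to-leaf paths (counted in nodes).
class RedBlackCheck {
    static class Node {
        final int key;
        final boolean red;
        final Node left, right;
        Node(int key, boolean red, Node left, Node right) {
            this.key = key; this.red = red; this.left = left; this.right = right;
        }
    }

    // Returns the number of black nodes on every path below n,
    // or -1 if a red node has a red child or the counts disagree.
    static int blackHeight(Node n, boolean parentRed) {
        if (n == null) return 0;
        if (n.red && parentRed) return -1;
        int l = blackHeight(n.left, n.red);
        int r = blackHeight(n.right, n.red);
        if (l < 0 || r < 0 || l != r) return -1;
        return l + (n.red ? 0 : 1);
    }

    // Valid if the root is black and no constraint is violated below it.
    static boolean isValid(Node root) {
        return root == null || (!root.red && blackHeight(root, false) >= 0);
    }

    static int shortest(Node n) {
        return n == null ? 0 : 1 + Math.min(shortest(n.left), shortest(n.right));
    }

    static int longest(Node n) {
        return n == null ? 0 : 1 + Math.max(longest(n.left), longest(n.right));
    }

    public static void main(String[] args) {
        // Black root 2, black child 1, red child 4 with black children 3 and 5.
        Node root = new Node(2, false,
            new Node(1, false, null, null),
            new Node(4, true,
                new Node(3, false, null, null),
                new Node(5, false, null, null)));
        System.out.println(isValid(root));                         // true
        System.out.println(shortest(root) + " " + longest(root));  // 2 3
    }
}
```

       - in the example the longest path (3 nodes) stays within twice the shortest (2 nodes), as the argument above predicts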

       - insertion into a red-black tree
          - we insert as normal into the binary tree at a leaf
          - we color the node inserted as red
          - then we need to fix up the tree to make it maintain the constraints
          - like delete for normal BSTs, there are a number of cases, with some more complicated than others
          - beyond the scope of this class, but they utilize "rotations" of the tree to alter the structure

       - rotations:
          - basic idea is to rotate a child up into its parent's position: the old parent becomes a child of the promoted node, and the promoted node's subtree on that side is handed to the old parent
          - left-rotation
             - x with left subtree alpha and right subtree y with left subtree beta and right subtree gamma
             - becomes: y with right subtree gamma and left subtree x with left subtree alpha and right subtree beta
          - right rotation is in the opposite direction
          - how might this help us?
          - insert: 1, 2, 3 into the tree
             - inserting 1 and 2 is fine
             - after inserting 3, we have a twig
             - if we rotate left at the root (1), 2 becomes the root with children 1 and 3, which is balanced
       - look at demo: http://www.cs.usfca.edu/~galles/visualization/RedBlack.html
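       - a minimal sketch of the two rotations, assuming nodes without parent pointers and a non-null child on the side being rotated up; RotationSketch and its Node class are illustrative names only

```java
// Hypothetical sketch of tree rotations on a bare node type.
class RotationSketch {
    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Left rotation: x's right child y moves up, x becomes y's left child,
    // and y's old left subtree (beta) becomes x's right subtree.
    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        return y;          // y is the new root of this subtree
    }

    // Right rotation is the mirror image.
    static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right;
        x.right = y;
        return x;          // x is the new root of this subtree
    }

    // Height counted in nodes; an empty tree has height 0.
    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    public static void main(String[] args) {
        // Inserting 1, 2, 3 in order gives a twig leaning right.
        Node root = new Node(1);
        root.right = new Node(2);
        root.right.right = new Node(3);
        System.out.println(height(root));   // 3

        root = rotateLeft(root);
        System.out.println(root.value);     // 2
        System.out.println(height(root));   // 2
    }
}
```

       - the caller has to store the returned node where the old subtree root used to be, since the sketch has no parent pointers to update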

  • n-ary trees

  • data structures with a purpose
       - as I've mentioned before, there is no one single best data structure
       - data structures help us speed up certain operations
       - what was the purpose of binary search trees?
          - speed up searching for items when we have a dynamically changing set
             - balanced BSTs have O(log n) search, insert and delete

  • priority queues
       - what did queues allow us to do efficiently?
          - keep track of a sequential ordering of items
          - add to the back and remove from the front in constant time
       - Queues work well when all items are equally important, but this is often not the case
       - A priority queue is a queue where order is determined by an associated priority
          - items with a smaller priority value exit the queue before items with a larger one (the minimum comes out first)
       - look at PriorityQueue interface in PriorityQueue code
          - very simple interface (like queue)
          - we can add elements
          - the only way we can remove elements is via the extractMin method, which removes the smallest element from the set
       - when/where might priority queues be useful? common in scheduling tasks:
          - process scheduling
             - there are many processes running on your computer at any given time
             - each application you run has one or more processes associated with it
             - the operating system has many processes associated with it
             - why do we need priorities associated with processes?
                - some processes are just more important than others
                - enforce fairness (we can adjust priorities of those processes that aren't getting much processor time)
             - the "top" command (on Macs and Linux machines) shows you the processes and their priorities (on Windows this information is in the Task Manager: press ctrl+alt+del, select Task Manager and then select the Processes tab)
                - shows a variety of information on the machine about the number of processes, cpu usage, memory usage, etc.
                - also shows each individual process and the cpu usage and the priority
                - typing 'q' exits top
          - network traffic scheduling
             - different information floating around the net may have higher priority than others
             - what might be some examples?
                - real-time/streamed data has higher priority than things like e-mail, etc.
                - certain customers might have higher priority
                - P2P protocol traffic (like bittorrent) often has lower priority
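       - a small scheduling sketch using the standard library's java.util.PriorityQueue, which is a different class from the course's PriorityQueue interface; its poll method plays the role of extractMin here, and the Task class, task names and priority numbers are invented for illustration

```java
import java.util.PriorityQueue;

// Hypothetical sketch: tasks leave the queue in order of increasing priority value.
class SchedulingSketch {
    static class Task implements Comparable<Task> {
        final String name;
        final int priority;   // smaller value = served first
        Task(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
        @Override
        public int compareTo(Task other) {
            return Integer.compare(priority, other.priority);
        }
    }

    // Adds every task, then drains the queue and records the order served.
    static String runOrder(Task... tasks) {
        PriorityQueue<Task> queue = new PriorityQueue<>();
        for (Task t : tasks) queue.add(t);
        StringBuilder order = new StringBuilder();
        while (!queue.isEmpty()) {
            if (order.length() > 0) order.append(", ");
            order.append(queue.poll().name);   // poll removes the smallest element
        }
        return order.toString();
    }

    public static void main(String[] args) {
        System.out.println(runOrder(
            new Task("email sync", 20),
            new Task("video stream", 0),
            new Task("backup", 10)));
        // video stream, backup, email sync
    }
}
```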

  • implementing a priority queue
       - what would be possible approaches?
          - use an ArrayList (or similar expandable linear structure)
          - two options:
             1) add at the back of the list (unsorted)
                - add: O(1)
                - extractMin: O(n)
             2) keep in sorted order with the highest-priority item (the minimum) at the back
                - add: O(n)
                - extractMin: O(1)

          - look at SimpleArrayListPriorityQueue class in PriorityQueue code
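       - a rough sketch of both options, not the course's SimpleArrayListPriorityQueue: the MinQueue interface and the UnsortedMinQueue and SortedMinQueue classes are hypothetical names, and only extractMin is borrowed from the notes

```java
import java.util.ArrayList;
import java.util.NoSuchElementException;

// Hypothetical interface for illustration only.
interface MinQueue<E extends Comparable<E>> {
    void add(E item);
    E extractMin();
    boolean isEmpty();
}

// Option 1: unsorted list. add is O(1) amortized, extractMin scans in O(n).
class UnsortedMinQueue<E extends Comparable<E>> implements MinQueue<E> {
    private final ArrayList<E> items = new ArrayList<>();

    public void add(E item) { items.add(item); }

    public E extractMin() {
        if (items.isEmpty()) throw new NoSuchElementException("queue is empty");
        int minIndex = 0;
        for (int i = 1; i < items.size(); i++) {
            if (items.get(i).compareTo(items.get(minIndex)) < 0) minIndex = i;
        }
        // Move the last item into the minimum's slot so no shifting is needed.
        E min = items.get(minIndex);
        int last = items.size() - 1;
        items.set(minIndex, items.get(last));
        items.remove(last);
        return min;
    }

    public boolean isEmpty() { return items.isEmpty(); }
}

// Option 2: list kept in descending order, so the minimum is always at the back.
class SortedMinQueue<E extends Comparable<E>> implements MinQueue<E> {
    private final ArrayList<E> items = new ArrayList<>();

    public void add(E item) {
        int i = 0;
        // Skip everything larger than the new item, then insert; shifting is O(n).
        while (i < items.size() && items.get(i).compareTo(item) > 0) i++;
        items.add(i, item);
    }

    public E extractMin() {
        if (items.isEmpty()) throw new NoSuchElementException("queue is empty");
        return items.remove(items.size() - 1);   // removing the last item is O(1)
    }

    public boolean isEmpty() { return items.isEmpty(); }
}
```

       - which option is better depends on the mix of operations: option 1 suits many adds and few extractMin calls, option 2 the reverse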
          
  • restricting generic types
       - If we declare a generic type variable (e.g. <E>) this can be instantiated with ANY class
       - There are situations where we need to restrict the types that can be used to instantiate the type variable
          - most often when you need to require that the class have certain attributes, e.g.
             - implement a particular interface
             - extend a particular class
          - you can add restrictions to the type variable
          - For example: <E extends Comparable<E>>
             - defines a type parameter E
             - the classes that can be used to instantiate this type parameter must implement "Comparable<E>"
             - in the code, we can then assume that anything of type E has the compareTo method!
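       - a minimal sketch of a class with a restricted type parameter; OrderedPair and its smaller method are made-up names used only to show the bound in action

```java
// Hypothetical sketch: E is restricted to types that implement Comparable<E>.
class OrderedPair<E extends Comparable<E>> {
    private final E first;
    private final E second;

    OrderedPair(E first, E second) {
        this.first = first;
        this.second = second;
    }

    // compareTo is guaranteed to exist because of the bound on E.
    E smaller() {
        return first.compareTo(second) <= 0 ? first : second;
    }

    public static void main(String[] args) {
        OrderedPair<Integer> numbers = new OrderedPair<>(7, 3);
        System.out.println(numbers.smaller());   // 3

        OrderedPair<String> words = new OrderedPair<>("pear", "apple");
        System.out.println(words.smaller());     // apple

        // OrderedPair<Object> would not compile: Object does not implement Comparable<Object>.
    }
}
```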