CS62 - Spring 2010 - Lecture 15

  • administrative
       - get back into the habit of doing the assigned practice problems
       - we will be starting C++ in 2 weeks
          - There is an optional book on the course web page. It is not required, but I'd recommend it. It's not at the bookstore, but you should be able to find it online at your favorite online bookstore.
       - Note that there is required reading for lab on Wednesday. It shouldn't take more than 15 minutes or so, but it will make the lab much more valuable if you've read the material beforehand.

  • what is a binary search tree?
       - binary tree where everything in the left subtree < this.data() <= everything in the right subtree
       - how does this help us?
          - when searching for an item, we know which subtree to search in
       - what is the best and worst case running time for search?
          - worst: O(n) when we have a twig (the height is n)
          - best: O(log n) when we have a full tree (the height is about log n)
       - write a method that counts the number of times a particular element occurs in the search tree (assuming we can have duplicates)
          public int occurCount(E item){
             if( isEmpty() ){
                // an empty tree contains no occurrences
                return 0;
             }else if( data.equals(item) ){
                // duplicates are stored to the right (left < data <= right)
                return 1 + right.occurCount(item);
             }else if( item.compareTo(data) < 0 ){
                return left.occurCount(item);
             }else{
                return right.occurCount(item);
             }
          }
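       - a minimal, self-contained sketch of the ideas above, assuming int data and a null reference for the empty tree. The class name BstSketch and its helper methods are made up for illustration and are not the course's tree class. It shows the twig and full-tree heights and counts duplicates.

```java
// Hypothetical sketch using the ordering rule from the notes:
// everything in the left subtree < data <= everything in the right subtree.
class BstSketch {
    int data;
    BstSketch left, right;

    BstSketch(int data) { this.data = data; }

    // Insert a value; duplicates go to the right subtree.
    static BstSketch insert(BstSketch node, int value) {
        if (node == null) return new BstSketch(value);
        if (value < node.data) node.left = insert(node.left, value);
        else node.right = insert(node.right, value);
        return node;
    }

    // Height counted in nodes: the empty tree has height 0.
    static int height(BstSketch node) {
        if (node == null) return 0;
        return 1 + Math.max(height(node.left), height(node.right));
    }

    // Count occurrences; after a match, further copies can only be to the right.
    static int occurCount(BstSketch node, int item) {
        if (node == null) return 0;
        if (item == node.data) return 1 + occurCount(node.right, item);
        return item < node.data ? occurCount(node.left, item)
                                : occurCount(node.right, item);
    }

    // Build a tree by inserting the values in the given order.
    static BstSketch build(int... values) {
        BstSketch root = null;
        for (int v : values) root = insert(root, v);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(height(build(1, 2, 3, 4, 5, 6, 7))); // sorted input: a twig
        System.out.println(height(build(4, 2, 6, 1, 3, 5, 7))); // full tree
        System.out.println(occurCount(build(4, 2, 4, 6, 4), 4)); // three copies of 4
    }
}
```

       - the same seven values give a height of 7 or 3 depending only on insertion order, which is the O(n) versus O(log n) gap described above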

  • red-black trees
       - what is the problem with traditional binary search trees?
          - most of the operations run in time O(h)
          - depending on the order in which elements are inserted, we can get an unbalanced tree where h is on the order of n
       - red-black trees are a type of "balanced" tree: they make sure the tree remains roughly balanced, keeping h = O(log n)
       - a binary search tree with additional constraints
          - a binary search tree
          - each node is also labeled with a color, red or black
          - all empty nodes are black (i.e. children of leaves)
          - the root is always black (this isn't technically required, but doesn't hurt us and makes our life easier)
          - all red nodes have two black children
          - for a given node, the number of black nodes on any path from that node to any leaf is the same
       - how does this guarantee us our height?
          - what is the shortest possible path from the root to any leaf?
             - all black nodes
          - what is the longest possible path from the root to any leaf?
             - alternating red and black nodes (since a red node has to have two black children)
       - what is the biggest difference between the longest and shortest path?
          - since all paths must have the same number of black nodes, the longest path can be at most twice as long as the shortest
          - the tree can be imbalanced by at most a factor of 2, which still guarantees us O(log n) height, since 2 is just a constant multiplier
       - insertion into a red-black tree
          - we insert as normal into the binary tree at a leaf
          - we color the node inserted as red
          - then we need to fix up the tree to make it maintain the constraints
             - like delete for normal BSTs, there are a number of cases, with some more complicated than others
             - beyond the scope of this class, but they utilize "rotations" of the tree to alter the structure
       - rotations:
          - basic idea is to rotate the child up into the parent position and then give the child on the side of the rotation to the old parent
          - left-rotation
             - x with left subtree alpha and right subtree y with left subtree beta and right subtree gamma
             - becomes: y with right subtree gamma and left subtree x with left subtree alpha and right subtree beta
          - right-rotation
             - the same thing in the opposite direction
       - how might this help us?
          - insert: 1, 2, 3 into the tree
          - inserting 1 and 2 is fine
          - after inserting 3, we have a twig
          - if we rotate left at the root (1), 2 becomes the root with children 1 and 3, which is a balanced tree
       - look at demo: http://www.ece.uc.edu/~franco/C321/html/RedBlack/rb.orig.html
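       - the rotations described above can be sketched as follows. This is only the structural part: node colors and the red-black fix-up cases are left out, and the names RotationSketch, Node, rotateLeft and rotateRight are assumptions made for this example. rotateLeft assumes x has a right child, and rotateRight assumes y has a left child.

```java
// Hypothetical sketch of tree rotations (no colors, no parent pointers).
class RotationSketch {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Left rotation: x's right child y moves up into x's position,
    // x becomes y's left child, and y's old left subtree (beta)
    // becomes x's right subtree. Returns the new subtree root y.
    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left; // beta moves over to x
        y.left = x;
        return y;
    }

    // Right rotation: the mirror image of rotateLeft.
    static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right; // beta moves back to y
        x.right = y;
        return x;
    }

    // Height counted in nodes: the empty tree has height 0.
    static int height(Node n) {
        if (n == null) return 0;
        return 1 + Math.max(height(n.left), height(n.right));
    }

    public static void main(String[] args) {
        // Inserting 1, 2, 3 in order into a plain BST gives a twig.
        Node root = new Node(1);
        root.right = new Node(2);
        root.right.right = new Node(3);
        System.out.println(height(root)); // height of the twig

        root = rotateLeft(root);
        System.out.println(root.key + " " + root.left.key + " " + root.right.key);
        System.out.println(height(root)); // height after the rotation
    }
}
```

       - a rotation only moves a constant number of references, so it is O(1), and it keeps the binary search tree ordering intact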
       
  • n-ary trees

  • data structures with a purpose
       - as I've mentioned before, there is no one single best data structure
       - data structures help us speed up certain operations
       - what was the purpose of binary search trees?
          - speed up searching for items when we have a dynamically changing set
             - balanced BSTs have O(log n) search, insert and delete

  • priority queues
       - what did queues allow us to do efficiently?
          - keep track of a sequential ordering of items
          - add to the back and remove from the front in constant time
       - Queues work well when all items are equally important, but this is often not the case
       - A priority queue is a queue where order is determined by an associated priority
          - items with the smallest priority value exit the queue before items with a larger priority value
       - look at PriorityQueue interface in PriorityQueue code
          - very simple interface (like queue)
          - we can add elements
          - the only way we can remove elements is via the extractMin method, which removes the smallest element from the set
       - when/where might priority queues be useful? common in scheduling tasks:
          - process scheduling
             - there are many processes running on your computer at any given time
             - each application you run has one or more processes associated with it
             - the operating system has many processes associated with it
             - why do we need priorities associated with processes?
                - some processes are just more important than others
                - enforce fairness (we can adjust priorities of those processes that aren't getting much processor time)
             - the "top" command (on macs and linux machines) shows you the processes and their priorities (on windows this information is in the task manager, type ctrl+alt+del, select task manager and then select the processes tab)
                - shows a variety of information on the machine about the number of processes, cpu usage, memory usage, etc.
                - also shows each individual process and the cpu usage and the priority
                - typing 'q' exits top
          - network traffic scheduling
             - different kinds of traffic on the net may have different priorities
             - what might be some examples?
                - real-time/streamed data has higher priority over things like e-mail, etc.
                - certain customers might have higher priority
                - P2P protocol traffic (like bittorrent) often has lower priority
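       - a small scheduling sketch of the idea above. It uses java.util.PriorityQueue from the standard library as a stand-in, not the course's PriorityQueue interface: its poll() method plays the role of extractMin. The SchedulerSketch and Task names and the priority numbers are invented for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: tasks leave the queue in order of priority value,
// smallest value first, regardless of the order they were added in.
class SchedulerSketch {
    static class Task {
        final String name;
        final int priority; // smaller value = served earlier
        Task(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    // Returns the task names in the order the queue would release them.
    static List<String> runOrder(List<Task> tasks) {
        PriorityQueue<Task> queue =
            new PriorityQueue<>(Comparator.comparingInt((Task t) -> t.priority));
        queue.addAll(tasks);
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().name); // poll() removes the minimum
        }
        return order;
    }

    public static void main(String[] args) {
        List<Task> tasks = new ArrayList<>();
        tasks.add(new Task("e-mail", 5));
        tasks.add(new Task("video stream", 1));
        tasks.add(new Task("bittorrent", 9));
        System.out.println(runOrder(tasks));
    }
}
```

       - the example avoids ties on purpose: java.util.PriorityQueue does not promise any particular order among items with equal priority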

  • implementing a priority queue
       - what would be possible approaches?
          - use an ArrayList (or similar expandable linear structure)
             - look at SimpleArrayListPriorityQueue class in PriorityQueue code
             - what are the runtimes of:
                - insert? O(1) (amortized)
                - extractMin? O(n)
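             - can we do better?
                - not really: if we keep the list sorted (with the smallest element last), we only swap the two costs
                - insert: O(n)
                - extractMin: O(1)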
          - Another approach: use a binary tree :)
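       - the two ArrayList trade-offs above can be sketched side by side. This is a hypothetical illustration for int values and is not the course's SimpleArrayListPriorityQueue class. The names ListPQSketch, Unsorted and Sorted are assumptions.

```java
import java.util.ArrayList;
import java.util.NoSuchElementException;

// Hypothetical sketch of two list-based priority queues.
class ListPQSketch {

    // Unsorted list: cheap insert, expensive extractMin.
    static class Unsorted {
        private final ArrayList<Integer> items = new ArrayList<>();

        void insert(int x) {
            items.add(x); // O(1) amortized: append at the end
        }

        int extractMin() {
            if (items.isEmpty()) throw new NoSuchElementException("empty queue");
            int minIdx = 0;
            for (int i = 1; i < items.size(); i++) { // O(n) scan for the minimum
                if (items.get(i) < items.get(minIdx)) minIdx = i;
            }
            int min = items.get(minIdx);
            int last = items.remove(items.size() - 1); // removing the last slot is O(1)
            if (minIdx < items.size()) items.set(minIdx, last); // fill the hole
            return min;
        }
    }

    // Sorted list (largest first, smallest last): expensive insert, cheap extractMin.
    static class Sorted {
        private final ArrayList<Integer> items = new ArrayList<>();

        void insert(int x) {
            int i = 0;
            while (i < items.size() && items.get(i) >= x) i++; // O(n) search
            items.add(i, x); // O(n) shift of everything after position i
        }

        int extractMin() {
            if (items.isEmpty()) throw new NoSuchElementException("empty queue");
            return items.remove(items.size() - 1); // O(1): the minimum is last
        }
    }

    public static void main(String[] args) {
        Unsorted u = new Unsorted();
        Sorted s = new Sorted();
        for (int x : new int[] {5, 3, 8, 1}) {
            u.insert(x);
            s.insert(x);
        }
        for (int i = 0; i < 4; i++) {
            System.out.println(u.extractMin() + " " + s.extractMin());
        }
    }
}
```

       - both versions return the same values in the same order; they only differ in which operation pays the O(n) cost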

  • heaps
       - a heap is a binary tree where:
          - the value of a parent is less than or equal to the value of its children
          - common additional restriction: the tree is complete
             - recall: a complete tree is a full binary tree except that the last level is filled in from left to right
       - draw a binary heap:
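          - [16], [14, 10], [8, 7, 9, 3], [2, 4, 1]
          - note: this particular example is drawn with the largest value at the root (a max-heap, each parent >= its children); a min-heap, as defined above, flips the comparison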
       - A few other observations about binary heaps...
          - the smallest value in a heap is the root node
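          - like binary trees, every subtree of a heap is itself a heap
          - level does NOT indicate size: only a node and its descendants are ordered relative to each other, so nodes in different subtrees can compare either way regardless of level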
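       - a minimal sketch of checking the parent <= children rule from the definition above on a reference-based tree. The names HeapCheckSketch, Node and isMinHeapOrdered are assumptions for this example, and the check covers only the ordering rule, not completeness.

```java
// Hypothetical sketch: verify the min-heap ordering rule recursively.
class HeapCheckSketch {
    static class Node {
        final int value;
        final Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // True if every parent is <= each of its children.
    // An empty tree (null) is trivially ordered.
    static boolean isMinHeapOrdered(Node n) {
        if (n == null) return true;
        if (n.left != null && n.left.value < n.value) return false;
        if (n.right != null && n.right.value < n.value) return false;
        // every subtree must itself be a heap
        return isMinHeapOrdered(n.left) && isMinHeapOrdered(n.right);
    }

    public static void main(String[] args) {
        // Levels: [1], [7, 2], [8, 9, 3]
        Node heap = new Node(1,
            new Node(7, new Node(8, null, null), new Node(9, null, null)),
            new Node(2, new Node(3, null, null), null));
        System.out.println(isMinHeapOrdered(heap));

        // A child smaller than its parent breaks the rule.
        Node broken = new Node(5, new Node(3, null, null), null);
        System.out.println(isMinHeapOrdered(broken));
    }
}
```

       - in the first tree, 3 sits one level below 7 yet is smaller than it, because the two are in different subtrees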

  • representing a heap
       - we could store the heap using references as we have with other binary trees
       - we can also store it using an array (or ArrayList) by leveraging the fact that it is a complete tree
          left(i) = 2i + 1
          right(i) = 2i + 2
          parent(i) = floor((i-1)/2)
       - for example, the tree above would be:
          [16, 14, 10,  8,  7,  9,  3,  2,  4,  1]
          [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9]
       - what is the left child of 10?
          - index 2*2+1 = 5, or the value 9
       - what is the parent of value 2?
          - index (7-1)/2 = 3, or the value 8
       - what are the advantages of array-based heaps/binary trees?
          - memory efficiency
       - can we do this with all binary trees?
          - yes
       - why don't we?
          - unless the tree is full or complete, there can be a large amount of wasted space
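       - the index formulas and the wasted-space point can be sketched as follows, using the array from the notes. The class name ArrayHeapIndexSketch and the helper slotsForRightTwig are made up for illustration.

```java
// Hypothetical sketch of array-based tree indexing (root at index 0).
class ArrayHeapIndexSketch {
    static int left(int i)  { return 2 * i + 1; }
    static int right(int i) { return 2 * i + 2; }

    // Integer division acts as floor for i >= 1.
    // The root (i = 0) has no parent, so callers should not ask for it.
    static int parent(int i) { return (i - 1) / 2; }

    // Array length needed to store an n-node twig (n >= 1) in which
    // every node is the right child of the previous one.
    static int slotsForRightTwig(int n) {
        int i = 0;
        for (int k = 1; k < n; k++) {
            i = right(i); // follow the right child n - 1 times
        }
        return i + 1; // the last index used, plus one
    }

    public static void main(String[] args) {
        int[] heap = {16, 14, 10, 8, 7, 9, 3, 2, 4, 1};
        System.out.println(heap[left(2)]);   // left child of the value 10
        System.out.println(heap[parent(7)]); // parent of the value 2
        System.out.println(slotsForRightTwig(10)); // slots for a 10-node twig
    }
}
```

       - the complete 10-node tree fits in exactly 10 slots, while a 10-node twig would need 1023 slots under the same indexing, which is why the array form is normally reserved for complete trees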