CS62 - Spring 2010 - Lecture 15

  • what is a binary search tree?
       - binary tree where everything in the left subtree < this.data() <= everything in the right subtree
       - how does this help us?
          - when searching for an item, we know which subtree to search in
       - what is the best and worst case running time for search?
          - worst: O(n) when we have a twig
          - best: O(log n) when we have a full tree
       - write a method that counts the number of times a particular element occurs in the search tree (assuming we can have duplicates)
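          public int occurCount(E item){
             if( isEmpty() ){
                return 0;   // an empty tree contains no copies of item
             }else if( data.equals(item) ){
                // duplicates go right (left < data <= right), so keep looking there
                return 1 + right.occurCount(item);
             }else if( item.compareTo(data) < 0 ){
                return left.occurCount(item);
             }else{
                return right.occurCount(item);
             }
          }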
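       - a minimal, self-contained sketch of the two points above: the shape of the tree decides the search cost, and duplicates can be counted by following the right subtree. The class BstShapeDemo and its int-only Node are made up for illustration; they are not the course's binary search tree code.

```java
// Hypothetical illustration class; not part of the course code.
class BstShapeDemo {
    // Minimal BST node holding an int key.
    static class Node {
        int data;
        Node left, right;
        Node(int data) { this.data = data; }
    }

    // Standard BST insert: smaller keys go left, equal or larger keys go right.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.data) root.left = insert(root.left, key);
        else root.right = insert(root.right, key);
        return root;
    }

    // Builds a tree by inserting the keys in the given order.
    static Node build(int... keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return root;
    }

    // Height counted in nodes on the longest root-to-leaf path (empty tree = 0).
    static int height(Node root) {
        if (root == null) return 0;
        return 1 + Math.max(height(root.left), height(root.right));
    }

    // Counts occurrences of key; duplicates can only be in the right subtree.
    static int occurCount(Node root, int key) {
        if (root == null) return 0;
        if (key == root.data) return 1 + occurCount(root.right, key);
        return key < root.data ? occurCount(root.left, key)
                               : occurCount(root.right, key);
    }

    public static void main(String[] args) {
        Node twig = build(1, 2, 3, 4, 5, 6, 7);   // sorted order: a twig
        Node full = build(4, 2, 6, 1, 3, 5, 7);   // middle key first: a full tree
        System.out.println("twig height: " + height(twig));   // 7
        System.out.println("full height: " + height(full));   // 3
        System.out.println("copies of 5: " + occurCount(build(5, 3, 5, 8, 5), 5));   // 3
    }
}
```

       - the same 7 keys give height 7 or height 3 depending only on insertion order, which is the O(n) vs. O(log n) gap for search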

  • red-black trees
       - what is the problem with traditional binary search trees?
          - most of the operations run in time O(h)
          - depending on how the tree is constructed/elements inserted, we can get an unbalanced tree where h is on the order of n, making these operations O(n)
       - red-black trees are a type of "balanced" tree: they make sure that the tree remains roughly balanced, keeping h = O(log n)
       - a binary search tree with additional constraints
          - a binary search tree
          - each node is also labeled with a color, red or black
          - all empty nodes are black (i.e. children of leaves)
          - the root is always black (this isn't technically required, but doesn't hurt us and makes our life easier)
          - all red nodes have two black children
          - for a given node, the number of black nodes on any path from that node to any leaf is the same
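          - a minimal sketch of checking the color constraints listed above. The class RedBlackCheck, its Node type and the method names are made up for illustration; null stands in for a black empty node, and the check looks only at colors (it assumes the keys already satisfy the binary search tree ordering).

```java
// Hypothetical illustration class; not part of the course code.
class RedBlackCheck {
    static class Node {
        int data;
        boolean red;          // false means black
        Node left, right;     // null plays the role of a black empty node
        Node(int data, boolean red, Node left, Node right) {
            this.data = data;
            this.red = red;
            this.left = left;
            this.right = right;
        }
    }

    static boolean isRed(Node n) {
        return n != null && n.red;   // empty nodes are black
    }

    // Returns the number of black nodes on every path from n down to an
    // empty node (counting the empty node), or -1 if a constraint is broken.
    static int blackHeight(Node n) {
        if (n == null) return 1;
        // a red node must have two black children
        if (n.red && (isRed(n.left) || isRed(n.right))) return -1;
        int lh = blackHeight(n.left);
        int rh = blackHeight(n.right);
        // every path must contain the same number of black nodes
        if (lh == -1 || rh == -1 || lh != rh) return -1;
        return lh + (n.red ? 0 : 1);
    }

    // Checks: black root, red nodes have black children, equal black counts.
    static boolean isValid(Node root) {
        return !isRed(root) && blackHeight(root) != -1;
    }

    public static void main(String[] args) {
        Node ok = new Node(2, false,
                new Node(1, true, null, null),
                new Node(3, true, null, null));
        Node twig = new Node(1, false, null,
                new Node(2, false, null, null));
        System.out.println(isValid(ok));     // true
        System.out.println(isValid(twig));   // false: black counts differ
    }
}
```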
       - how does this guarantee us our height?
          - what is the shortest possible path from the root to any leaf?
             - all black nodes
          - what is the longest possible path from the root to any leaf?
             - alternating red and black nodes (since a red node has to have two black children)
       - what is the biggest difference between the longest and shortest path?
          - since all paths must have the same number of black nodes, the longest path can be at most twice as long
          - the tree can be no more than a factor of 2 out of balance, which still guarantees us O(log n) height, since 2 is just a constant multiplier
       - insertion into a red-black tree
          - we insert as normal into the binary tree at a leaf
          - we color the node inserted as red
          - then we need to fix up the tree to make it maintain the constraints
             - like delete for normal BSTs, there are a number of cases, with some more complicated than others
             - beyond the scope of this class, but they utilize "rotations" of the tree to alter the structure
       - rotations:
          - basic idea is to rotate a child up into its parent's position; the old parent becomes a child of that node, and the rotated node's subtree on the side of the rotation is handed over to the old parent
          - left-rotation
             - x with left subtree alpha and right subtree y with left subtree beta and right subtree gamma
             - becomes: y with right subtree gamma and left subtree x with left subtree alpha and right subtree beta
          - right-rotation
             - the same thing in the opposite direction
       - how might this help us?
          - insert: 1, 2, 3 into the tree
          - inserting 1 and 2 is fine
          - after inserting 3, we have a twig
          - if we rotate left, it looks more like a balanced tree
       - look at demo: http://www.ece.uc.edu/~franco/C321/html/RedBlack/rb.orig.html
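       - a minimal sketch of the two rotations described above, applied to the 1, 2, 3 twig. The class RotationDemo and its Node type are made up for illustration; this is only the rotation step, not a red-black insertion (no colors, no fix-up cases).

```java
// Hypothetical illustration class; not part of the course code.
class RotationDemo {
    static class Node {
        int data;
        Node left, right;
        Node(int data) { this.data = data; }
    }

    // Left rotation around x (x.right must not be null):
    // y = x.right moves up, x becomes y's left child,
    // and y's old left subtree (beta) becomes x's right subtree.
    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;   // hand beta to the old parent
        y.left = x;
        return y;           // new root of this subtree
    }

    // Right rotation around y (y.left must not be null): the mirror image.
    static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right;   // hand beta back
        x.right = y;
        return x;           // new root of this subtree
    }

    // The twig produced by inserting 1, 2, 3 in order: 1 -> 2 -> 3.
    static Node twig123() {
        Node one = new Node(1);
        one.right = new Node(2);
        one.right.right = new Node(3);
        return one;
    }

    public static void main(String[] args) {
        Node root = rotateLeft(twig123());
        // After the rotation, 2 is the root with children 1 and 3.
        System.out.println(root.left.data + " <- " + root.data + " -> " + root.right.data);
    }
}
```

       - the rotation only rewires two references, so it takes constant time, and it keeps the binary search tree ordering intact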
  • data structures with a purpose
       - as I've mentioned before, there is no one single best data structure
       - data structures help us speed up certain operations
       - what was the purpose of binary search trees?
          - speed up searching for items when we have a dynamically changing set
             - balanced BSTs have O(log n) search, insert and delete

  • priority queues
       - what did queues allow us to do efficiently?
          - keep track of a sequential ordering of items
          - add to the back and remove from the front in constant time
       - Queues work well when every item is equally important, but this is often not the case
       - A priority queue is a queue where order is determined by an associated priority
          - items with the lowest priority value exit the queue before items with a larger priority value
       - look at PriorityQueue interface in PriorityQueue code
          - very simple interface (like queue)
          - we can add elements
          - the only way we can remove elements is via the extractMin method, which removes the smallest element from the set
       - when/where might priority queues be useful? common in scheduling tasks:
          - process scheduling
             - there are many processes running on your computer at any given time
             - each application you run has one or more processes associated with it
             - the operating system has many processes associated with it
             - why do we need priorities associated with processes?
                - some processes are just more important than others
                - enforce fairness (we can adjust priorities of those processes that aren't getting much processor time)
             - the "top" command (on macs and linux machines) shows you the processes and their priorities (on windows this information is in the task manager, type ctrl+alt+del, select task manager and then select the processes tab)
                - shows a variety of information on the machine about the number of processes, cpu usage, memory usage, etc.
                - also shows each individual process and the cpu usage and the priority
                - typing 'q' exits top
          - network traffic scheduling
             - different kinds of traffic on the net may have different priorities
             - what might be some examples?
                - real-time/streamed data has higher priority than things like e-mail, etc.
                - certain customers might have higher priority
                - P2P protocol traffic (like bittorrent) often has lower priority
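       - a minimal sketch of priority-based scheduling like the examples above. It uses Java's built-in java.util.PriorityQueue (whose poll() plays the role of extractMin) rather than the course's PriorityQueue interface; the class SchedulerDemo, the Task type, and the task names and priority numbers are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration class; not part of the course code.
class SchedulerDemo {
    static class Task {
        final String name;
        final int priority;   // smaller value = leaves the queue sooner
        Task(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    // Returns the task names in the order the priority queue releases them.
    static List<String> runOrder(List<Task> tasks) {
        Comparator<Task> byPriority = Comparator.comparingInt(t -> t.priority);
        PriorityQueue<Task> queue = new PriorityQueue<>(byPriority);
        queue.addAll(tasks);
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().name);   // poll() removes the smallest item
        }
        return order;
    }

    public static void main(String[] args) {
        List<Task> traffic = Arrays.asList(
                new Task("email", 5),
                new Task("video stream", 1),
                new Task("bittorrent", 9));
        System.out.println(runOrder(traffic));   // [video stream, email, bittorrent]
    }
}
```

       - the arrival order (email first) does not matter; only the priority values decide who leaves first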

  • implementing a priority queue
       - what would be possible approaches?
          - use an ArrayList (or similar expandable linear structure)
             - look at SimpleArrayListPriorityQueue class in PriorityQueue code
             - what are the runtimes of:
                - insert? O(1) (amortized)
                - extractMin? O(n)
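             - can we do better?
                - not for both operations at once: keeping the list sorted (smallest item at the end) just swaps the costs
                - insert: O(n)
                - extractMin: O(1)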
          - Another approach: use a binary tree :)
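       - a minimal sketch of the two list-based approaches above, side by side. The classes UnsortedListPQ and SortedListPQ are made up for illustration and are not the course's SimpleArrayListPriorityQueue; they hold plain ints and assume extractMin is never called on an empty queue.

```java
import java.util.ArrayList;

// Hypothetical illustration classes; not part of the course code.
class ListPriorityQueues {
    // Unsorted list: insert is O(1) amortized, extractMin is O(n).
    static class UnsortedListPQ {
        private final ArrayList<Integer> items = new ArrayList<>();

        void insert(int x) {
            items.add(x);   // just append
        }

        int extractMin() {
            int minIndex = 0;
            for (int i = 1; i < items.size(); i++) {   // linear scan for the min
                if (items.get(i) < items.get(minIndex)) minIndex = i;
            }
            int min = items.get(minIndex);
            int last = items.size() - 1;
            items.set(minIndex, items.get(last));   // fill the hole with the last item
            items.remove(last);                     // removing the end is O(1)
            return min;
        }
    }

    // List kept in descending order: insert is O(n), extractMin is O(1).
    static class SortedListPQ {
        private final ArrayList<Integer> items = new ArrayList<>();

        void insert(int x) {
            int i = items.size();
            items.add(x);   // grow by one slot
            while (i > 0 && items.get(i - 1) < x) {   // shift smaller items right
                items.set(i, items.get(i - 1));
                i--;
            }
            items.set(i, x);
        }

        int extractMin() {
            return items.remove(items.size() - 1);   // smallest item is at the end
        }
    }

    public static void main(String[] args) {
        UnsortedListPQ a = new UnsortedListPQ();
        SortedListPQ b = new SortedListPQ();
        for (int x : new int[] {5, 1, 4, 2}) {
            a.insert(x);
            b.insert(x);
        }
        System.out.println(a.extractMin() + " " + b.extractMin());   // 1 1
    }
}
```

       - either way one of the two operations is linear, which is the motivation for trying a tree next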

  • heaps
       - a heap is a binary tree where:
          - the value of a parent is less than or equal to the value of its children
          - common additional restriction: the tree is complete
             - recall: a complete tree is a full binary tree except that the bottom level may be only partly filled, with its leaves filled in from left to right
       - draw a binary heap:
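          - [16], [14, 10], [8, 7, 9, 3], [2, 4, 1]
          - note: this example is drawn largest-at-the-root (each parent >= its children), the mirror image of the definition above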
       - A few other observations about binary heaps...
          - the smallest value in a heap is the root node
          - as with binary trees, the subtree rooted at any node of a heap is itself a heap
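          - level does NOT indicate size: values are only ordered along each root-to-leaf path, so nodes in different subtrees can be in either order regardless of level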

  • representing a heap
       - we could store the heap using references as we have with other binary trees
       - we can also store it using an array (or ArrayList) by leveraging the fact that it is a complete tree
          left(i) = 2i + 1
          right(i) = 2i + 2
          parent(i) = floor((i-1)/2)
       - for example, the tree above would be:
          [16, 14, 10,  8,  7,  9,  3,  2,  4,  1]   (values)
          [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9]   (indices)
       - what is the left child of 10?
          - index 2*2+1 = 5, or the value 9
       - what is the parent of value 2?
          - index (7-1)/2 = 3, or the value 8
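       - a minimal sketch of the index formulas above, checked against the same example array. The class HeapIndexDemo and its method names are made up for illustration; indices are 0-based as in the formulas.

```java
// Hypothetical illustration class; not part of the course code.
class HeapIndexDemo {
    static int left(int i)   { return 2 * i + 1; }
    static int right(int i)  { return 2 * i + 2; }

    // Integer division rounds down for i >= 1, matching floor((i-1)/2).
    // The root (i = 0) has no parent, so only call this with i >= 1.
    static int parent(int i) { return (i - 1) / 2; }

    // The example tree from the notes, stored level by level.
    static final int[] HEAP = {16, 14, 10, 8, 7, 9, 3, 2, 4, 1};

    public static void main(String[] args) {
        // left child of the value 10 (index 2)
        System.out.println(HEAP[left(2)]);     // 9
        // parent of the value 2 (index 7)
        System.out.println(HEAP[parent(7)]);   // 8
    }
}
```

       - no references are stored at all: a node's children and parent are found with arithmetic on its index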
       - what are the advantages of array-based heaps/binary trees?
          - memory efficiency
       - can we do this with all binary trees?
          - yes
       - why don't we?
          - unless the tree is full or complete, there can be a large amount of wasted space
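       - a minimal sketch of how bad the wasted space can get, using the worst shape: a twig that always goes right. The class ArrayTreeSpace and its method name are made up for illustration; the calculation just follows right(i) = 2i + 2 from the root.

```java
// Hypothetical illustration class; not part of the course code.
class ArrayTreeSpace {
    // Array slots needed to store a right-leaning twig of n nodes
    // (1 <= n <= 62 so the result fits in a long).
    static long slotsForRightTwig(int n) {
        long index = 0;                 // the root lives at index 0
        for (int k = 1; k < n; k++) {
            index = 2 * index + 2;      // step to the right child
        }
        return index + 1;               // the array must reach the last node
    }

    public static void main(String[] args) {
        // A complete tree of 10 nodes needs exactly 10 slots;
        // a 10-node twig needs 2^10 - 1 of them.
        System.out.println(slotsForRightTwig(10));   // 1023
    }
}
```

       - only n of those slots hold data, so the array grows exponentially with the height of an unbalanced tree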