CS62 - Spring 2010 - Lecture 11

  • quiz: Problem 9.5
       - Write a method for a singly linked list called reverse that reverses the order of the elements in the list. This method should be destructive: it should modify the list upon which it acts.

  • administrative
       - read 10.6 for the lab
       - midterm
          - 3/4 (next week!)
          - will cover everything up through 2/25 (list applications), but will not cover binary trees
          - I'll post some sample questions sometime soon
          - lab 6 will be a review session for the midterm
       - next two assignments
          - assignment 5 will be due at midnight 3/5 (the Friday after the midterm)
          - assignment 6 will be due at midnight 3/12
       - no class on 3/11
       - will have lab on 3/10

  • Linear structures
       - similar to lists in that there is a sequential nature to the data
       - unlike lists, though, we can only add and remove items, but cannot access items by index or iterate through the items
       - look at Linear interface in StacksAndQueues code
          - we can:
             - add items
             - remove an item (and return it)
             - take a look at the next item
             - see if we have items left
          - notice that we don't have any notion of an index

  • stacks
       - Last In First Out (LIFO)
       - two basic operations: push and pop
          - push adds another item on to the top of the stack
          - pop removes the item on the top of the stack
       - think about a stack of plates at a buffet. The last plate to be put on top will be the first plate to be removed.
       - draw a picture
          - stacks usually grow up
          - everything happens on the top
       - what are they used for?
          - run-time (or call) stack example
             - we can write a simple method sum and follow it through the call-stack using the debugger
             public static int sum(int n){
                String temp = "Num: " + n;
                System.out.println(temp);
          
                if( n <= 1 ){
                   return 1;
                }else{
                   return n + sum(n-1);
                }
             }
          - searching
          - parsing (linguistics, code)
       - look at the Stack interface in StacksAndQueues code
          - push
          - pop
          - peek (when we just want to see what's on top, but don't want to modify the stack)
          - empty
       - how could we implement this?
          - Linked list
             - always just manipulate the head (the beginning) of the linked list
             - to push an item, just add it to the front
             - to pop an item, remove it from the front
             - look at LinkedStack in StacksAndQueues code
             - singly linked or doubly linked?
             - runtime of the different operations:
                - push: O(1)
                - pop: O(1)
                - peek: O(1)
                - empty: O(1)
          - ArrayList
             - where should the top of the stack go?
                - remember we'll be adding and deleting at the top of the stack
                - put the top of the stack at the END of the ArrayList
             - to push an item, add it to the end of the ArrayList
             - to pop an item, remove it from the end of the ArrayList
             - look at ArrayListStack in StacksAndQueues code
             - runtime of the different operations:
                - push: O(1)
                - pop: O(1)
                - peek: O(1)
                - empty: O(1)
          - which is better?
             - an ArrayList push is "amortized" O(1): any individual push could be O(n) when the underlying array has to be resized, while a linked list push is always O(1)
             - Memory trade-off is less clear
                - ArrayList could have lots of "open" memory
                - LinkedList has an extra reference for each data item
       - java.util.Stack (http://java.sun.com/j2se/1.4.2/docs/api/java/util/Stack.html)
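
The linked-list approach described above can be sketched as follows. This is a minimal illustration, not the LinkedStack from the StacksAndQueues code: the class name SketchLinkedStack and its Node helper are hypothetical, and throwing NoSuchElementException on an empty stack is an assumption about the desired behavior.

```java
import java.util.NoSuchElementException;

// Hypothetical sketch of a singly linked stack; not the course's LinkedStack.
class SketchLinkedStack<E> {
    // One node per item; 'next' refers to the item directly below it.
    private static class Node<E> {
        final E value;
        final Node<E> next;

        Node(E value, Node<E> next) {
            this.value = value;
            this.next = next;
        }
    }

    private Node<E> head; // top of the stack; null when the stack is empty

    // O(1): the new node becomes the head.
    public void push(E item) {
        head = new Node<>(item, head);
    }

    // O(1): unlink the head and return its value.
    public E pop() {
        if (head == null) {
            throw new NoSuchElementException("pop from empty stack");
        }
        E value = head.value;
        head = head.next;
        return value;
    }

    // O(1): look at the top item without removing it.
    public E peek() {
        if (head == null) {
            throw new NoSuchElementException("peek at empty stack");
        }
        return head.value;
    }

    // O(1): the stack is empty exactly when there is no head node.
    public boolean empty() {
        return head == null;
    }
}
```

Because every operation touches only the head, a singly linked list is enough here; the extra "previous" reference of a doubly linked list would never be used.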

  • queues
       - if you're from the UK, you call a "line" (like waiting in line) a queue
       - First In First Out (FIFO)
       - two basic operations: enqueue and dequeue
          - enqueue adds another item onto the end of the queue
          - dequeue removes an item from the front of the queue
       - notice that, like a stack, the only way we manipulate the data is by adding and removing items; the difference is where we add and remove the items
       - draw a picture
       - what are they used for?
          - scheduling tasks
             - which process should run next
          - modeling real world phenomena (lines show up in lots of places)
          - searching
       - look at the Queue interface in StacksAndQueues code
          - enqueue
          - dequeue
          - peek (when we just want to see what's at the front, but don't want to modify the queue)
          - empty
       - how could we implement this?
          - Linked list
             - where should we add and remove items?
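                - either end works as long as both ends can be updated in O(1): a doubly linked list handles either choice, while a singly linked list with a tail reference is enough only if we add at the tail and remove from the head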
                - we'll add them at the back and remove them from the front
             - look at LinkedQueue in StacksAndQueues code
             - runtime: assuming we use a doubly linked list, O(1)
          - ArrayList based
             - where should we add and remove the items?
                - if we add them to the back, then the remove (dequeue) operation is going to be expensive
                - if we add them to the front, then the add (enqueue) operation is going to be expensive
             - we'll add at the back again and remove them from the front
             - look at ArrayListQueue in StacksAndQueues code
             - runtime of different operations:
                - enqueue: O(1)
                - dequeue: O(n)
                - peek: O(1)
                - empty: O(1)
             - Even though both implement the List interface, it makes a difference which underlying data structure we use... pick the right one!
          - Array based
             - what if we wanted to implement it using an array, with the additional knowledge that we know the largest capacity required?
             - keep track of where our head is and where our tail is and have the data wrap around at the end of the array
                - always add at the tail
                - always remove from the head
             - is there an easy way to calculate indices?
                - use modular arithmetic!
                - if we want to increment the head, for example:
                   - instead of: headIndex++
                   - headIndex = (headIndex + 1) % data.length
             - look at ArrayQueue in StacksAndQueues code
       - java.util.Queue (http://java.sun.com/javase/6/docs/api/java/util/Queue.html)
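
The array-based idea above (a fixed capacity, a head index, and wrap-around via modular arithmetic) can be sketched as follows. This is not the ArrayQueue from the StacksAndQueues code: the class name SketchArrayQueue is hypothetical, and tracking a size field instead of a separate tail index is one design choice among several.

```java
import java.util.NoSuchElementException;

// Hypothetical sketch of a fixed-capacity circular queue; not the course's ArrayQueue.
class SketchArrayQueue<E> {
    private final Object[] data;
    private int headIndex = 0; // index of the next item to dequeue
    private int size = 0;      // number of items currently stored

    SketchArrayQueue(int capacity) {
        data = new Object[capacity];
    }

    // O(1): the tail position is derived from head and size, wrapping around.
    public void enqueue(E item) {
        if (size == data.length) {
            throw new IllegalStateException("queue is full");
        }
        int tailIndex = (headIndex + size) % data.length;
        data[tailIndex] = item;
        size++;
    }

    // O(1): remove from the head, then advance the head with wrap-around.
    @SuppressWarnings("unchecked")
    public E dequeue() {
        if (size == 0) {
            throw new NoSuchElementException("dequeue from empty queue");
        }
        E item = (E) data[headIndex];
        data[headIndex] = null; // drop the reference so it can be collected
        headIndex = (headIndex + 1) % data.length;
        size--;
        return item;
    }

    // O(1): look at the front item without removing it.
    @SuppressWarnings("unchecked")
    public E peek() {
        if (size == 0) {
            throw new NoSuchElementException("peek at empty queue");
        }
        return (E) data[headIndex];
    }

    // O(1)
    public boolean empty() {
        return size == 0;
    }
}
```

With a capacity of 3, enqueueing 1, 2, 3, dequeueing once, and then enqueueing 4 stores the 4 in slot 0, which is the wrap-around case the modular arithmetic handles.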

  • look at the Linear interface in StacksAndQueues code
       - all of the implementations of stacks and queues we saw today implemented the Linear interface. Why? and how?
          - for stacks
             - add is push
             - remove is pop
          - for queues
             - add is enqueue
             - remove is dequeue
       - this will allow us to use a stack OR a queue in methods where we just need a generic add/remove/empty operation
       - see how the choice of a stack vs. a queue changes the behavior of those methods
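
To make the add/remove mapping above concrete, here is a small sketch. The interface SketchLinear and the classes DequeStack, DequeQueue, and LinearDemo are hypothetical names invented for illustration; the actual Linear interface in the StacksAndQueues code may declare different method names or signatures. Both implementations are backed by java.util.ArrayDeque purely to keep the example short.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the course's Linear interface.
interface SketchLinear<E> {
    void add(E item);
    E remove();
    E peek();
    boolean empty();
}

// Stack: add is push, remove is pop (both act on the same end).
class DequeStack<E> implements SketchLinear<E> {
    private final ArrayDeque<E> items = new ArrayDeque<>();
    public void add(E item) { items.addFirst(item); }
    public E remove() { return items.removeFirst(); }
    public E peek() { return items.peekFirst(); }
    public boolean empty() { return items.isEmpty(); }
}

// Queue: add is enqueue (at the back), remove is dequeue (from the front).
class DequeQueue<E> implements SketchLinear<E> {
    private final ArrayDeque<E> items = new ArrayDeque<>();
    public void add(E item) { items.addLast(item); }
    public E remove() { return items.removeFirst(); }
    public E peek() { return items.peekFirst(); }
    public boolean empty() { return items.isEmpty(); }
}

class LinearDemo {
    // Works with any SketchLinear; only the removal order differs.
    static List<Integer> addThenDrain(SketchLinear<Integer> linear) {
        for (int i = 1; i <= 3; i++) {
            linear.add(i);
        }
        List<Integer> out = new ArrayList<>();
        while (!linear.empty()) {
            out.add(linear.remove());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(addThenDrain(new DequeStack<>())); // prints [3, 2, 1]
        System.out.println(addThenDrain(new DequeQueue<>())); // prints [1, 2, 3]
    }
}
```

The method addThenDrain never mentions stacks or queues, yet the order of its result depends entirely on which structure it is handed.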

  • search
       - basic search framework
          - we have places/items
          - given an item, we can find out what items are adjacent
       - what types of things might we search?
          - maps
          - web
          - social networks (six degrees of Kevin Bacon game)
       - look at the basic search method in Search code
          - keep track of two sets of items
             - those to be visited
             - those that we have already visited (why do we need this?)
          - the only requirement we have for the items we're traversing is that they specify their neighbors
          - as long as we still have items to visit
             - get the next item
             - if we haven't visited it already:
                - add it to visited
                - add all of its neighbors
                   - why do we have to check in two places if visited contains the item?
       - what happens if we use a stack?
          - depth first search (DFS)
          - we go further and further out, until we reach an ending and only then do we go back and try the next immediate neighbor from our starting point
       - what happens if we use a queue?
          - breadth first search (BFS)
          - we explore all of the immediate neighbors first, before going any further
          - can be seen as exploring one level out at a time
          - if we're looking for something, BFS will find a shortest path to it (the fewest steps from the starting point)
       - Could we rewrite either of these recursively?
          - would be a bit of a challenge for BFS
          - for DFS, much more straightforward: use the call-stack to keep track of where we came from
       - look at dfsRecursive method in Search code
          - we need a helper method to pass along the visited List, but notice we don't need an explicit stack anymore!
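
The search framework above can be sketched in a few lines. This is not the Search code from the course: the class SearchSketch, the map-of-neighbors graph representation, the useStack flag, and the example graph are all assumptions made for illustration. A java.util.ArrayDeque plays the role of either the stack or the queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the basic search loop; not the course's Search code.
class SearchSketch {
    // Returns the order in which items are visited starting from 'start'.
    // useStack == true gives DFS; useStack == false gives BFS.
    static List<String> search(Map<String, List<String>> neighbors, String start, boolean useStack) {
        Deque<String> toVisit = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        List<String> order = new ArrayList<>();
        toVisit.addLast(start);
        while (!toVisit.isEmpty()) {
            // Stack: take the most recently added item. Queue: take the oldest.
            String next = useStack ? toVisit.removeLast() : toVisit.removeFirst();
            // An item can be added more than once before its first visit, so check here too.
            if (visited.add(next)) {
                order.add(next);
                for (String n : neighbors.getOrDefault(next, Collections.<String>emptyList())) {
                    if (!visited.contains(n)) {
                        toVisit.addLast(n);
                    }
                }
            }
        }
        return order;
    }

    // Recursive DFS: the call-stack replaces the explicit stack.
    static List<String> dfsRecursive(Map<String, List<String>> neighbors, String start) {
        List<String> order = new ArrayList<>();
        dfsHelper(neighbors, start, new HashSet<String>(), order);
        return order;
    }

    private static void dfsHelper(Map<String, List<String>> neighbors, String item,
                                  Set<String> visited, List<String> order) {
        if (!visited.add(item)) {
            return; // already visited
        }
        order.add(item);
        for (String n : neighbors.getOrDefault(item, Collections.<String>emptyList())) {
            dfsHelper(neighbors, n, visited, order);
        }
    }

    // Small made-up graph with a cycle: A -> B, C; B -> D; C -> D; D -> A.
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> g = new HashMap<>();
        g.put("A", Arrays.asList("B", "C"));
        g.put("B", Arrays.asList("D"));
        g.put("C", Arrays.asList("D"));
        g.put("D", Arrays.asList("A"));
        return g;
    }
}
```

On the example graph, the queue version visits A, B, C, D (one level at a time), the stack version visits A, C, D, B, and the recursive version visits A, B, D, C. The two depth-first orders differ only in which neighbor is explored first. In the queue run, D is added to the to-visit structure twice (once from B and once from C), which is why the visited check is needed when an item is removed as well as when its neighbors are added.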
             

  • searching the web (web crawlers)
       - search engines crawl the web in a fashion similar to what we saw above to figure out what's out there
       - the web pages are the nodes and the neighbors are determined by hyperlinks in the text
       - a web page resides on a server out there somewhere
       - being a good web crawler
          - robots.txt
             - sometimes pages may be publicly available, but their owners don't want them searched (or indexed)
             - the web server can announce this via robots.txt
             - see http://www.robotstxt.org/ for more details
             - a crawler should obey the requests in robots.txt and not crawl pages it has been asked to skip
          - frequency
             - it's very easy to inundate a web server by requesting many pages from the same server very quickly (which can easily happen when you're following links, since many links are internal)
             - the easiest way to be server friendly is to crawl no more than one page every, say, 500 ms (which is too slow for general-purpose crawling, but will work well for our experiments)
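
One way to enforce the delay described above is to remember, per server, the earliest time the next request is allowed. The sketch below is only an illustration under stated assumptions: the class PolitenessSketch and its reserve method are hypothetical, the caller supplies the current time in milliseconds, and the crawler is assumed to be single-threaded.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that spaces out requests to the same host.
class PolitenessSketch {
    private final long minDelayMillis;
    // For each host, the earliest time (in ms) at which the next request may start.
    private final Map<String, Long> nextAllowed = new HashMap<>();

    PolitenessSketch(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    // Returns how many milliseconds the caller should wait before requesting
    // a page from 'host', and reserves that time slot for the request.
    long reserve(String host, long nowMillis) {
        long allowedAt = nextAllowed.getOrDefault(host, nowMillis);
        long startAt = Math.max(allowedAt, nowMillis);
        nextAllowed.put(host, startAt + minDelayMillis);
        return startAt - nowMillis;
    }
}
```

A crawler loop could call reserve(host, System.currentTimeMillis()) and pass the result to Thread.sleep before fetching the page. Requests to different hosts do not delay each other, since only links back to the same server cause the inundation problem.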