CS136, Lecture 9

  1. Complexity
  2. Searching
    1. Linear search
    2. Binary search
  3. Data Structures
    1. We want parameterized data structures in Java!
    2. List
The purpose of this course is not just to help you write bigger and more complex programs, but to help you become smarter programmers. Thus we focus on the careful design and analysis of data structures and algorithms to solve problems. This will be an important focus of the examinations.

I will presume that you have read the "text", on-line lecture notes, and sample code discussed in class.

Please come prepared for labs by having thought through the material and created a design for the program so that you can use the lab time effectively.

Complexity

Rather than keeping an exact count of operations, we use an order-of-magnitude measure of complexity.

In general, if we have a polynomial of the form a_0 n^k + a_1 n^(k-1) + ... + a_k, we say it is O(n^k). For example, 3n^2 + 5n + 2 is O(n^2).

Most common are

O(1) - for any constant

O(log n), O(n), O(n log n), O(n^2), ..., O(2^n)

Usually use these to measure time and space complexity of algorithms.

Insertion of a new first element in an array of size n is O(n), since we must bump all the other elements up by one place.

Insertion of new last element in a vector of size n is O(1) if enough room for it, O(n) otherwise.

We saw that increasing the array size by 1 at a time to build up to size n takes time n*(n-1)/2, which is O(n^2).

We saw that increasing the array size to n by doubling each time takes time n-1, which is O(n).
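These two growth strategies can be compared by counting element copies directly. The sketch below is my own illustration, not from the course code; each growth step from size s copies the s existing elements into the new array.

```java
// Count element copies needed to grow an array up to capacity n,
// comparing grow-by-one against doubling.
public class GrowthCost {
    // Growing 1 slot at a time: each step from size s copies s elements.
    static long growByOne(int n) {
        long copies = 0;
        for (int size = 1; size < n; size++) {
            copies += size;          // copy existing elements over
        }
        return copies;               // 1 + 2 + ... + (n-1) = n(n-1)/2
    }

    // Doubling: steps at sizes 1, 2, 4, ..., n/2, each copying size elements.
    static long doubling(int n) {
        long copies = 0;
        for (int size = 1; size < n; size *= 2) {
            copies += size;
        }
        return copies;               // 1 + 2 + ... + n/2 = n-1 (n a power of 2)
    }

    public static void main(String[] args) {
        System.out.println(growByOne(1024));  // 523776 -- quadratic
        System.out.println(doubling(1024));   // 1023   -- linear
    }
}
```

For n = 1024 the grow-by-one strategy makes 523,776 copies while doubling makes only 1,023.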

Make table of values to show difference.

Suppose we have operations with time complexity O(log n), O(n), O(n log n), O(n^2), and O(2^n).

And suppose all of them work on a problem of size n in time t. How much time does it take to do a problem 10, 100, or 1000 times larger?

size          10n          100n          1000n
O(log n)     ~t+3         ~t+7          ~t+10
O(n)          10t          100t         1,000t
O(n log n)   >10t         >100t        >1,000t
O(n^2)        100t       10,000t    1,000,000t
O(2^n)       ~t^10        ~t^100      ~t^1000
TIME TO SOLVE PROBLEM

*Note that the last row (O(2^n)) depends on the fact that the constant is 1; otherwise the times are somewhat different.

Suppose we get a new machine that allows a certain speed-up. How much larger a problem can be solved? If the original machine allowed the solution of a problem of size k in time t, then:

speed-up      1x     10x       100x      1000x
O(log n)      k      k^10      k^100     k^1000
O(n)          k      10k       100k      1,000k
O(n log n)    k      <10k      <100k     <1,000k
O(n^2)        k      3k+       10k       30k+
O(2^n)        k      k+3       k+7       k+10
SIZE OF PROBLEM

We will use big Oh notation to help us measure complexity of algorithms.

Searching

Searching and sorting are important operations, and also important examples of the use of complexity analysis.

We deal only with searches here; we will come back to sorts later.

The code for all of the searches is on-line in the Sort program example.

Linear search

Pretty straightforward: compare the element we are looking for with successive elements of the list until we either find it or run out of elements.

If the list has n elements, then we need n compares in the worst case.
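The actual code is in the on-line Sort example; a minimal sketch of linear search over an array of Objects might look like this (class and method names are my own):

```java
// Linear search: compare the target with successive elements until
// it is found or the array is exhausted -- n compares in the worst case.
public class LinearSearch {
    // Returns the index of the first element equal to target, or -1.
    static int search(Object[] elts, Object target) {
        for (int i = 0; i < elts.length; i++) {
            if (elts[i].equals(target)) {
                return i;               // found it
            }
        }
        return -1;                      // ran out of elements
    }

    public static void main(String[] args) {
        Object[] words = { "ant", "bee", "cat" };
        System.out.println(search(words, "cat"));   // 2
        System.out.println(search(words, "dog"));   // -1
    }
}
```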

Binary search

Binary search is cleverer, but requires an ordered list. Look at the middle element: if it is the one we are looking for, we are done; if it is larger than the target, search the first half of the list; otherwise search the second half. Notice this is recursive.

With each recursive call do at most two compares.

What is maximum number of recursive calls?

At most (log n) + 1 invocations of the routine (logs are base 2), and therefore at most 2*((log n) + 1) comparisons; that is, O(log n) comparisons.
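A sketch of the recursive version (my own code, not the on-line example) makes the two-compares-per-call bound visible, here over a sorted int array:

```java
// Recursive binary search on a sorted array: at most two compares per
// invocation, and at most (log n) + 1 invocations.
public class BinarySearch {
    static int search(int[] elts, int target) {
        return search(elts, target, 0, elts.length - 1);
    }

    static int search(int[] elts, int target, int low, int high) {
        if (low > high) {
            return -1;                                  // ran out of elements
        }
        int mid = (low + high) / 2;
        if (target < elts[mid]) {                       // compare 1
            return search(elts, target, low, mid - 1);  // search first half
        } else if (target > elts[mid]) {                // compare 2
            return search(elts, target, mid + 1, high); // search second half
        } else {
            return mid;                                 // found it
        }
    }

    public static void main(String[] args) {
        int[] sorted = { 2, 3, 5, 7, 11, 13 };
        System.out.println(search(sorted, 7));   // 3
        System.out.println(search(sorted, 4));   // -1
    }
}
```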

A concrete comparison of worst cases (number of comparisons):

# elts      10     100    1000    1,000,000
linear      10     100    1000    1,000,000
binary       8      14      20        40

We can actually make binary search faster if we don't compare for equality until only one element is left!

Data Structures

Most of the rest of this course we will focus on the design of data structures to satisfy particular problems. For each of these we shall first specify what operations we need to support, then examine various implementations of the data structure, and finally compare them to see the advantages and disadvantages of each.

The data structures we examine are sometimes called container classes because they contain collections of elements. Virtually all of the data structures we will be studying in this course have interfaces which extend Container (in the structure package):

package structure;
public interface Container 
{
    public int size();
    // post: returns number of elts contained in container.

    public boolean isEmpty();
    // post: returns true iff container is empty

    public void clear();
    // post: clears container
}

We want parameterized data structures in Java!

In Java, it is easy to define an array of any type of element. All of the array operations work no matter what type of element the array is composed of.

Unfortunately, Java does not currently allow the user to define data structures with the same flexibility. For example, we saw earlier that Vectors were defined to hold values of type Object. This had the disadvantage that if we put in some specific type of element, we often had to insert a downcast in order to use an element when removed from the Vector.

Of course we could have defined vectors specifically to hold elements of any specific type -- Renderable, for example. The problem is that for each application we might have to write a different version.

Until Java puts in features to write parameterized data structures (they're thinking about it), we will instead imitate Vector and write data structures which are designed to hold elements of type Object. This way we will be able to insert elements of any object type, while base types can be packed into their corresponding object forms and inserted.
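A small example of this Object-based style, using Vector directly: anything going in is treated as an Object, so a downcast is needed coming back out, and base types like int must be wrapped.

```java
import java.util.Vector;

// Storing elements as Object: a downcast is required on removal, and
// base types must be packed into their corresponding object forms.
public class ObjectContainers {
    public static void main(String[] args) {
        Vector items = new Vector();
        items.addElement("hello");            // any object type goes in
        items.addElement(new Integer(42));    // base type int, wrapped

        // Coming back out, everything is just an Object...
        String s = (String) items.elementAt(0);   // downcast required
        Integer i = (Integer) items.elementAt(1);

        System.out.println(s.length());       // 5
        System.out.println(i.intValue() + 1); // 43
    }
}
```

If we downcast to the wrong type, the error shows up only at run time as a ClassCastException, which is exactly the weakness parameterized types would fix.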

List

We will begin our study of data structures with lists. These are structures whose elements are in a linear order.

public interface List extends Container {
    public Iterator elements(); // ignore for now!
    // post: returns an iterator allowing 
    //   ordered traversal of elements in list

    public int size();          // from Container
    // post: returns number of elements in list

    public boolean isEmpty();   // from Container
    // post: returns true iff list has no elements

    public void clear();        // from Container
    // post: empties list

    public void addToHead(Object value);
    // post: value is added to beginning of list

    public void addToTail(Object value);
    // post: value is added to end of list

    public Object peek();
    // pre: list is not empty
    // post: returns first value in list

    public Object tailPeek();
    // pre: list is not empty
    // post: returns last value in list

    public Object removeFromHead();
    // pre: list is not empty
    // post: removes first value from the list

    public Object removeFromTail();
    // pre: list is not empty
    // post: removes the last value from the list

    public boolean contains(Object value);
    // post: returns true iff list contains an object equal     
    //  to value

    public Object remove(Object value);
    // post: removes and returns element equal to value
    //       otherwise returns null
}

We can imagine other useful operations on lists, such as returning the nth element, but we'll stick with this simple specification for now.

The text has a simple example of reading in successive lines from a text and adding each line to the end of a list if it doesn't duplicate an element already in the list. This is easily handled with the operations provided.
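That example might be sketched as follows. To keep the fragment self-contained I use java.util.Vector in place of our List implementation; its contains and addElement play the roles of the List operations contains and addToTail.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Vector;

// Sketch of the text's example: add each line read to the end of the
// list only if it does not duplicate an element already in the list.
public class UniqueLines {
    static Vector uniqueLines(BufferedReader in) {
        Vector list = new Vector();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (!list.contains(line)) {   // List's contains()
                    list.addElement(line);    // List's addToTail()
                }
            }
        } catch (IOException e) {
            // on a read error, just return the lines seen so far
        }
        return list;
    }

    public static void main(String[] args) {
        BufferedReader in =
            new BufferedReader(new StringReader("ant\nbee\nant\ncat\nbee\n"));
        System.out.println(uniqueLines(in));   // [ant, bee, cat]
    }
}
```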

Suppose we decided to implement List using a vector:

public class VectList implements List
{
    protected Vector listElts;

    public VectList()
    {
        listElts = new Vector();
    }
....
}
How expensive would each of the operations be (worst case) if the VectList contains n elements?

Some are easy. Following are O(1). Why?

    size(), isEmpty(), peek(), tailPeek(),  removeFromTail()

Others take more thought:
    clear();      // O(n) currently, because we reset all slots to null,
                  // but it could be O(1)
    addToHead(Object value);    //O(n) - must move contents
    removeFromHead();           //O(n) - must move contents
    contains(Object value);     //O(n) - must search
    remove(Object value);       //O(n) - must search & move contents

The last is the trickiest:
    addToTail(Object value);

If the vector holding the values is large enough, then it is clearly O(1), but if the vector needs to increase in size then it is O(n). If we use the doubling strategy then, as we saw, this is O(1) on average, but it is O(n) on average if we increase the size by a fixed amount each time.
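The two cases can be seen in a sketch of addToTail over a plain array (a hypothetical class of my own, not the VectList above, which delegates this work to Vector):

```java
// addToTail over a plain array: O(1) when there is room, O(n) when the
// array must grow.  Doubling the capacity makes the average cost O(1).
public class ArrayListSketch {
    protected Object[] elts = new Object[2];
    protected int count = 0;

    public void addToTail(Object value) {
        if (count == elts.length) {      // no room: grow, copying n elements
            Object[] bigger = new Object[2 * elts.length];   // doubling
            System.arraycopy(elts, 0, bigger, 0, count);
            elts = bigger;
        }
        elts[count++] = value;           // the usual O(1) case
    }

    public int size() {
        return count;
    }

    public static void main(String[] args) {
        ArrayListSketch list = new ArrayListSketch();
        for (int i = 0; i < 100; i++) {
            list.addToTail(new Integer(i));
        }
        System.out.println(list.size());   // 100
    }
}
```

With doubling, the occasional O(n) copy is spread over the n cheap insertions that preceded it, which is why the average cost per addToTail is O(1).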

All of the other operations have the same "O" complexity in the average case as for the best case.