CS 51 Homework Laboratory # 10
Web log trivia

Objective: To gain experience working with Strings and extracting information.

For this program, you will be working with a very large file (over 6 megabytes!): the log file covering several days of activity on the CS department web server. This file contains a log entry for every access of a web page on our servers. Here is a sample of what the entries look like:

134.173.95.84 - - [11/Nov/2007:12:50:35 -0800] "GET /classes/cs151/ HTTP/1.1" 
134.173.95.84 - - [11/Nov/2007:12:50:38 -0800] "GET /classes/cs151/assignments.html HTTP/1.1"
134.173.95.84 - - [11/Nov/2007:12:50:57 -0800] "GET /classes/cs151/assignments/insurance.xml HTTP/1.1"
66.249.73.50 - - [11/Nov/2007:12:51:01 -0800] "GET /~tzuyi/Classes/ProblemSets/ps10a.ps HTTP/1.1"
66.249.73.50 - - [11/Nov/2007:12:51:04 -0800] "GET /~tzuyi/Classes/Notes/030129.pdf HTTP/1.1"
66.249.73.50 - - [11/Nov/2007:12:51:48 -0800] "GET /~tzuyi/Classes/Notes/tree_code.pdf HTTP/1.1"
66.249.73.50 - - [11/Nov/2007:12:52:17 -0800] "GET /~tzuyi/Classes/Notes/041116.ppt.pdf HTTP/1.1"
65.55.209.100 - - [11/Nov/2007:12:52:47 -0800] "GET /~kim/CSC051S06/demos/PolyStructure/structure/Iterable.java HTTP/1.0"
65.55.209.102 - - [11/Nov/2007:12:54:19 -0800] "GET /cs080/ HTTP/1.0" 
Actually, I cheated a bit and truncated each entry to get rid of some long but irrelevant information at the end of each line.

The beginning of each entry consists of the IP address of the computer making the request. This consists of 4 groups of digits, separated by periods. Starting immediately after the first "[" is the date and time (in "universal" 24 hour format) of the request. Finally, the request itself comes after the "GET" and continues to the next occurrence of a double quote; it consists of the URL of the page requested followed by the protocol used. Thus the first line represents a request from IP address 134.173.95.84. It was made on 11 November 2007 at 12:50:35 pm Pacific Standard Time. It involved a request for the page classes/cs151/ on www.cs.pomona.edu (the trailing HTTP/1.1 describes the protocol used).

The starter folder you copy will include a class ParseEntry that you will need to write. It should be used to break up a line from the log into four strings corresponding to the IP address, date, time, and URL. The constructor takes a String corresponding to a line of the file as a parameter, and the class should provide methods getAddress(), getDate(), getTime(), and getURL() that return Strings representing the appropriate part of each entry. Warning: A few lines do not contain the "GET". For those lines, getURL() should return the string "No URL".
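As a rough illustration of the string operations involved, here is a minimal sketch that pulls the four fields out of one line with indexOf and substring. It is not the required solution: the class name ParseEntrySketch is hypothetical, the field positions are inferred from the sample entries above, and it assumes every line is well formed (it has a space after the address and a "[" before the date).

```java
// Hypothetical sketch; field positions are inferred from the sample entries,
// and well-formed lines are assumed (no error handling for malformed input).
class ParseEntrySketch {
    private static final String NO_URL = "No URL";
    private static final String GET_MARKER = "\"GET /";
    private static final int TIME_LENGTH = 8; // length of HH:MM:SS

    private final String address;
    private final String date;
    private final String time;
    private final String url;

    ParseEntrySketch(String line) {
        // The IP address runs from the start of the line to the first space.
        address = line.substring(0, line.indexOf(' '));

        // The date starts right after '[' and ends at the first colon.
        int open = line.indexOf('[');
        int firstColon = line.indexOf(':', open);
        date = line.substring(open + 1, firstColon);

        // The time is the eight characters HH:MM:SS after that colon.
        time = line.substring(firstColon + 1, firstColon + 1 + TIME_LENGTH);

        // The URL follows "GET /" and stops at the space before the protocol.
        int get = line.indexOf(GET_MARKER);
        if (get < 0) {
            url = NO_URL;
        } else {
            int start = get + GET_MARKER.length();
            int end = line.indexOf(' ', start);
            if (end < 0) {
                end = line.length();
            }
            url = line.substring(start, end);
        }
    }

    String getAddress() { return address; }
    String getDate() { return date; }
    String getTime() { return time; }
    String getURL() { return url; }

    public static void main(String[] args) {
        // First entry from the sample above.
        ParseEntrySketch entry = new ParseEntrySketch(
            "134.173.95.84 - - [11/Nov/2007:12:50:35 -0800] \"GET /classes/cs151/ HTTP/1.1\"");
        // prints: 134.173.95.84: 11/Nov/2007: 12:50:35: classes/cs151/
        System.out.println(entry.getAddress() + ": " + entry.getDate() + ": "
            + entry.getTime() + ": " + entry.getURL());
    }
}
```

The sketch drops the leading "/" from the URL so that the result has the same shape as the page name classes/cs151/ used in the description above.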

The log file "access.log" can be found in your start-up folder. Your job is to use the log file to answer the following questions:

  1. How many off-campus accesses were made to the server (i.e., where the address starts with something other than 134.173.)? Answer the same question for only CS 51 pages, that is, count only those entries whose URL starts with classes/cs051.
  2. Which students on campus accessed the CS 51 web pages after midnight and before 6 a.m. from their rooms? (On-campus sites have addresses starting with 134.173. Computers in our lab have addresses starting with 134.173.66.) To help answer this question we have provided you with a class NameServer. It has a parameterless constructor and a method lookup(address), which takes an address (a string) and returns a string that is the symbolic version of the IP address. Campus dorm IP addresses end with res.pomona.edu. For each computer in a dorm that accessed the CS 51 web site, please list the earliest and latest times within the midnight to 6 a.m. time frame that it accessed the web site. One line of your output should look something like:
            89-120.res.pomona.edu from 04:01:21 to 05:40:41
        
  3. Which hours during the day had the highest and lowest number of accesses to the web pages? Give the same information for only those entries that are on CS 51 pages. [Hint: Use an array to accumulate the data.]
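Two of the bookkeeping tasks in the questions above can be sketched as follows: an array of 24 counters for the hourly totals, and a pair of maps that remember the earliest and latest early-morning access per host. The class name LogTallies and its method names are hypothetical, and using maps is only one possible choice (parallel arrays would also work). The sketch assumes times are zero-padded HH:MM:SS strings, so comparing them as strings orders them chronologically.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; assumes times are zero-padded HH:MM:SS strings.
class LogTallies {
    static final int HOURS_PER_DAY = 24;
    static final String WINDOW_END = "06:00:00"; // only accesses before 6 a.m. count

    final int[] hourCounts = new int[HOURS_PER_DAY];
    final Map<String, String> earliest = new TreeMap<>();
    final Map<String, String> latest = new TreeMap<>();

    // Adds one access at the given time to the counter for its hour.
    void countHour(String time) {
        int hour = Integer.parseInt(time.substring(0, 2));
        hourCounts[hour]++;
    }

    // Remembers the earliest and latest access before 6 a.m. for a host.
    void recordEarlyAccess(String host, String time) {
        if (time.compareTo(WINDOW_END) >= 0) {
            return; // outside the midnight to 6 a.m. window
        }
        String first = earliest.get(host);
        if (first == null || time.compareTo(first) < 0) {
            earliest.put(host, time);
        }
        String last = latest.get(host);
        if (last == null || time.compareTo(last) > 0) {
            latest.put(host, time);
        }
    }

    // Hour with the most accesses; ties go to the earliest hour.
    int busiestHour() {
        int best = 0;
        for (int hour = 1; hour < HOURS_PER_DAY; hour++) {
            if (hourCounts[hour] > hourCounts[best]) {
                best = hour;
            }
        }
        return best;
    }

    // Hour with the fewest accesses; ties go to the earliest hour.
    int quietestHour() {
        int best = 0;
        for (int hour = 1; hour < HOURS_PER_DAY; hour++) {
            if (hourCounts[hour] < hourCounts[best]) {
                best = hour;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        LogTallies tallies = new LogTallies();
        for (String time : new String[] {"12:50:35", "12:50:38", "06:26:43"}) {
            tallies.countHour(time);
        }
        String host = "89-120.res.pomona.edu";
        tallies.recordEarlyAccess(host, "05:40:41");
        tallies.recordEarlyAccess(host, "04:01:21");
        tallies.recordEarlyAccess(host, "12:50:35"); // ignored: after 6 a.m.
        // prints: 89-120.res.pomona.edu from 04:01:21 to 05:40:41
        System.out.println(host + " from " + tallies.earliest.get(host)
            + " to " + tallies.latest.get(host));
        // prints: busiest hour: 12
        System.out.println("busiest hour: " + tallies.busiestHour());
    }
}
```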
You may work in pairs on this assignment and turn in only one program with both names on it.

Implementation.

Begin by writing the ParseEntry class and try it out. The begin method of class Trivia already contains a call to the method testParseEntry, which we have provided to test whether the methods of your ParseEntry class work. If they work correctly, it should print out:

    74.6.24.136: 11/Nov/2007: 06:26:43: classes/cs051/demos/Interesting/?N=D
    URL represents: lj511642.crawl.yahoo.net

When you are convinced that your ParseEntry class is correct, please remove the call to testParseEntry() from the begin method and erase the method.

The begin method opens a file and prepares it for reading (we'll talk about exactly how that works soon). It then calls the method answerQuestion(). You are to fill in the body of answerQuestion so that you can get the answers to the questions above. The structure of the method is as follows:

    private void answerQuestion() throws IOException {
        // declarations and initialization code
        String line = theFile.readLine();
        while (line != null) {
            // do whatever is necessary to process line
            line = theFile.readLine();
        }
        // display answers as necessary
    }
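To see the read loop in action outside the lab framework, here is a small self-contained sketch that follows the same pattern to count off-campus accesses. It assumes theFile behaves like a java.io.BufferedReader (readLine() returns null at the end of the input); the class name OffCampusCount is hypothetical, and the sketch reads from a string rather than from access.log so that it can run on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-alone version of the read loop.
class OffCampusCount {
    static final String CAMPUS_PREFIX = "134.173.";

    // Counts the lines whose address does not start with the campus prefix.
    static int countOffCampus(BufferedReader reader) throws IOException {
        int offCampus = 0;
        String line = reader.readLine();
        while (line != null) {
            // The address is the first thing on the line, so startsWith suffices here.
            if (!line.startsWith(CAMPUS_PREFIX)) {
                offCampus++;
            }
            line = reader.readLine();
        }
        return offCampus;
    }

    // Convenience wrapper that reads the log entries from a string.
    static int countOffCampus(String logText) {
        try {
            return countOffCampus(new BufferedReader(new StringReader(logText)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Three entries copied from the sample above: one on campus, two off campus.
        String sample =
            "134.173.95.84 - - [11/Nov/2007:12:50:35 -0800] \"GET /classes/cs151/ HTTP/1.1\"\n"
            + "66.249.73.50 - - [11/Nov/2007:12:51:01 -0800] \"GET /~tzuyi/Classes/ProblemSets/ps10a.ps HTTP/1.1\"\n"
            + "65.55.209.102 - - [11/Nov/2007:12:54:19 -0800] \"GET /cs080/ HTTP/1.0\"\n";
        // prints: Off-campus accesses: 2
        System.out.println("Off-campus accesses: " + countOffCampus(sample));
    }
}
```

In the lab itself, the test inside the loop would more naturally use the address returned by a ParseEntry object built from the line.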

Please have your answers displayed in the TextArea named display that is created in the begin method. Remember to use the append method to add new content to the TextArea.

Submitting Your Work

All programs will be due by Wednesday at 5 p.m., though I hope you will be prepared well enough for lab that you will finish by the end of lab. When your work is complete, you should deposit a copy of the entire folder containing all of your .java files in the appropriate dropoff folder. Before you do this, make sure the folder name includes your name(s) and the phrase "Lab 10". Also make sure to double-check your work for correctness, organization, and style.

Grading Point Allocations

Value      Feature

Syntax style (3 pts total)
1 pt.      Descriptive comments
1/2 pt.    Good names
1 pt.      Good use of constants
1/2 pt.    Appropriate formatting

Semantic style (3 pts total)
1 pt.      Conditionals and loops
1 pt.      General correctness/design/efficiency issues
1 pt.      Parameters, variables, and scoping

Correctness (4 pts total)
1 pt.      Parsing log lines
3 pts.     Correct answers to questions


Computer Science 051, Department of Computer Science, Pomona College