Objectives

For this assignment, you will:

Description

All of the sorting algorithms we’ve looked at so far assume that the data is in memory and that we can swap data elements efficiently. Sometimes, because of the size of the data you cannot fit all of it in memory. In these situations, many of the traditional sorting algorithms fail miserably; the algorithms do not preserve data locality and end up accessing the disk frequently, resulting in very slow running times.

For this assignment, you will be implementing an on-disk sorting algorithm that is designed to use the disk efficiently to sort large data sets. The sorting algorithm will work in two phases:

The correctness of the assignments in this class will be automatically verified. For this reason, you must follow all naming conventions specified in this assignment.

Classes

OnDiskSort class

We have provided you with a skeleton OnDiskSort class that you will need to fill in the details for. We encourage you to add additional private methods, but do not change the names or parameters of the methods we have provided you. This will make our life much easier when we grade the assignment. As an aside, we have made some of the methods protected where normally we would have made them private to, again, assist us in grading.

You will need to fill in the following methods:

To assist you, we have also provided a few helper methods in the OnDiskSort class that you may find useful. They primarily do some simple operations with files. If there is any confusion about what these methods do, please come talk to us. In addition, these helper methods may also help you understand basic Java file I/O.

MergeSort class

Implementation of the Mergesort algorithm

QuickSort class

Implementation of the Quicksort algorithm

Sorter interface

An interface for sorting algorithms. Implemented by the MergeSort and Quicksort class.

WordScanner class

Implements the Java Iterator interface. An iterator over Strings read in from file.

Getting Started

  1. As usual, create a new project in Eclipse and copy the starter code into the src directory of your newly created project from:

    /common/cs/cs062/assignments/assignment03/
  2. You will also need a directory in which to put the files to be sorted. Create a directory called “sorting_run” in the parent directory (or in the same directory as the src and bin folder). In that directory put a file containing a copy of King’s “I have a dream” speech. It is in a file named “Ihaveadream.txt” and is in with files from last week’s assignment. Be sure to name these exactly as given here. (If not, then the program won’t find them and it will crash.) See the main method of OnDiskSort for the names. Note that we may test your code using a different directory for temporary files, so your code shouldn’t for OnDiskSort shouldn’t use the name “sorting_run” except in its main method as a default value.

  3. You are now ready to get started! Again, try to code incrementally one method at a time.

Grading

You will be graded based on the following criteria:

Criterion Points
Merge 3
MergeFiles 3
Sort 3
Uses only two Strings in merge 2
Cleans up temporary files appropriately 2
General correctness 2
Appropriate comments (including JavaDoc) 3
Appropriate use of generics 2
Style and formatting 2
Submitted correctly 1
Extra credit 2

NOTE: Code that does not compile will get a zero! Make sure that your code compiles before submitting.

Submitting Your Work

  1. A reminder: in addition to Javadoc comments, your code must adhere to the (CS 62 Style Guide)[http://www.cs.pomona.edu/classes/cs062/notes/cs062-style.html].

  2. As usual, export the entire folder from Eclipse to your desktop.

  3. Change the name of the folder to Assignment03_LastNameFirstName. Make sure the folder contains both a src and a bin directory.

  4. Make sure to fill out the asg03.json file (found in the same place as the assignment starter code) with your username and whether or not you did extra credit. Add it to the top level of your submission directory.

  5. Unlike last time, use the new submission script. This script will make sure that everything is named correctly and that you don’t forget any of the required folders. To use it, first copy your assignment folder onto one of the lab machines (or a remotely-accessible machine like little.cs.pomona.edu), and then run the submission script at /common/cs/cs062/bin/submit with the following arguments:

    Once you call the script, it should print “You are:” followed by your username. If it prints the wrong username, you’re using someone else’s login and your submission won’t be recognized as yours. After confirming your username, the script will tell you what file it submitted for what assignment/class and the name of the submitted file. If you get the arguments wrong, it will print an error message instead (which should be informative). Remember you can use tab-completion for filenames to spell them correctly (filenames with spaces in them can accidentally be treated as multiple separate arguments unless you’re careful). Using the submit command looks like this:

    /common/cs/cs062/bin/submit cs062 asg03 ~/Desktop/Assignment03_Name

    You can use a program like ssh (should be installed by default on Macs) or PuTTY (available for free for Windows) to login to little.cs.pomona.edu remotely to run the submit command.

Additional Information

File I/O in Java

For those that haven’t had any file I/O experience in Java, we’ll give a brief intro here, but also take a look at the streams cheat sheet available off of the course Documentation page. You can also look up information about the classes seen in the code and discussed here via the Java libraries link there. For most I/O, you’ll need to import java.io.*.

The two main classes you’ll be concerned with when doing file I/O in java are BufferedReader for reading data and PrintWriter for writing data. To read data, you can create a new reader by:

BufferedReader in = new BufferedReader(new FileReader(...))

where “…” can be either replaced with a String or can be replaced with a File object. To write data, you can create a new writer by:

PrinterWriter out = new PrintWriter(new FileOutputStream(...))

In both cases, you will need to surround these with a try-catch to handle the IOException.

The file system

The file system on these computers starts at the root directory ‘/’. Everything is then expanded out based on directories. For example “/home/america/” means that in the root directory ‘/’ there is a directory named “home” and inside this directory is another directory named “pmawhorter”. The ‘/’ symbol is called the file separator and is different depending on the operating system (e.g. it’s ‘’ on windows computers).

Filenames can be specified as relative filenames, where the filename is specified relative to the current location of the program (or user), and absolute filenames where the entire path, starting from the root directory, is specified. For example, the absolute filename for the src directory of my Assignment03 folder in my Eclipse workspace is:

/home/pmawhorter/Documents/cs062/workspace/Assignment03/src

The relative filename for the src directory depends on the current directory. If the current location of the program (or user) is the bin directory, then the relative filename is “../src.” The “..” symbol means go up one directory. So, from the bin directory, the relative filename “../src” means go up one directory and then go to the src directory. Note that relative filenames do NOT start with a ‘/’.

It can be confusing telling exactly where your program currently is when running it, so often the best approach when writing programs is to use a full path which starts at the root directory. (We won’t do that this time because we want the program you turn in to work when copied to the TA’s directory.)

If you ever want to know where you are when you’re in the Terminal, the pwd command (which stands for print working directory). You can try it out by just typing pwd and hitting return (though that won’t work in Eclipse – you must be in the Terminal)!. Rather than hard-coding in a file or directory, you can also pop up a dialog box and let the user choose the file. We did that with last week’s assignment. In fact, the startup code given to you in the main class of TextGenerator is a good example of how to use file input.