CSCI 136: Assignment 3 -- Words

Due 3/1/98


Description

In this assignment we will try an experiment in Artificial Intelligence. The idea is to write a program which will read in text and then use that text to generate some new text. The method for generating the text uses simple probability - we read the text character by character. We keep track of how often each three-character sequence appears. From this we can compute the probability that a certain character will immediately follow two given characters. For example, if the text is: "the theatre is their thing", e appears after th 3 times, and i appears after th 1 time; no other letters appear after th. So the probability that e follows th is .75; the probability that i follows th is .25; the probability that any other letter follows th is 0.

Once we have the text processed, and stored in a structure that allows us to check probabilities, we then pick two letters (for example, the first two in the input text) to use as a beginning for our new text. Finally we use random numbers to choose subsequent characters based on the preceding two characters and the probability information. Limit the output text to no more than 200 characters.

Goals

The goals are to become proficient in using objects to build more complex structures, to learn the importance of careful class design, and to learn the Vector, Association and ReadStream classes.

Important Considerations

You should think about the design of this program carefully before entering the lab. What would constitute a good data structure for this problem? The table of information should support requests of the form: A 3-dimensional array might seem reasonable at first, but its size would be quite large (approximately 27,000 entries). Instead I would like you to use a table which is implemented as a Vector of Associations. Each Association would have a 2-character pair as its key, along with a value which is a frequency list. The frequency list would keep track of which characters appeared after the given 2-character pair, along with a frequency.

How should the frequency list be implemented? Well, ...another Vector of Associations sounds like a good idea. Thus the frequency list's Vector would consist of Associations in which the key was a single character and the value was a count of the number of times that letter occurred after the pair with which the list is associated. Think carefully about what methods the frequency list needs to support.

The data structure design built from these two classes has the benefit of having only as many entries as necessary for the given input text.

You will find it extremely helpful to look carefully at the word frequency program on page 35 of Bailey (or better yet, a revised version of the program in the Sample Programs folder on-line).

Warning: When you import the package with the class Random, use

   import java.util.Random;
If you write "import java.util.*;", the program will get confused as to which version of the Vector class it should use as there is one in java.util as well as one in structure.

Input

I suggest you first debug your program using as input a String constant (e.g., "the theater is their thing") until it works properly. Then you can take input from the keyboard. As you can see in the word frequency example, a Java program running on the Macintosh which takes input from the user needs to include the following import statement:
   import com.mw.*;
and in the main method you need the following statement:
   SystemInput sysIn = new SystemInput();
Interestingly, you need not mention sysIn again in your program. The result of creating a new SystemInput is that a window is popped up on the screen with a field for typing input.

The ReadStream class is part of the Bailey's structure package. You can find the documentation from the 136 web page. The methods you will have the most need for include

   char readChar();
   boolean eoln();
   boolean eof();
End of file is signalled for Java on the Mac by typing "fan-e", where the "fan" key is sometimes referred to as the "apple" key. Of course, you must create a new ReadStream before using any of these methods. If you use the constructor with no parameters, the input will come from the window created by SystemInput.

Extras

There are many ways in which you might extend such a program. For example, as described, letter pairs which never appeared in the input will never appear in the output. Is there a way you could introduce a bit of "mutation" to allow new pairs to appear?

Instead of working at the character-pair level, you could work at the word-pair level, although this results in huge numbers of legal word pairs, so only attempt this after you get the required work finished.

What to hand in

A complete design for the two classes is due at the beginning of lab on Wednesday. You must turn in a copy of the design which includes a listing of instance variables for each class as well as the declaration of each method in each of the classes (for your own benefit you should also include the bodies of these methods). Finally you must include a description of how you will test these classes. Keep a copy of this for your own use as it may take us a while to grade these (though we intend to get them back to you in the early part of the lab). This time we will assign a grade (10% of the grade for this assignment) based on the correctness and completeness of the information you hand in. The program is due at 11:59 p.m. on Sunday, March 1. Programs are handed in by putting a folder, whose name is your last name, into the CS136 drop-off folder on Cider Press. The folder should contain the Metrowerks project folders for all of the classes needed to run your program. As always, try to test your classes thoroughly.

Finally, I will be out of town Thursday through Sunday of this week. Thus it will be to your advantage to be so well prepared that you finish or nearly finish in lab. Professor Lenhart will teach for me on Friday and can answer questions then, and the TA's will also be available as usual during their evening hours, but I probably will be unable to answer e-mail queries until Sunday evening.

Back to:

  • CS 136 Assignments
  • CS 136 home page
  • Kim Bruce's home page
  • CS Department home page
  • kim@cs.williams.edu