### CS150 - Fall 2011 - Class 10

• Pi video

- No lab prep for Friday
- I did add a few pages to read from the book about file I/O
- Lab on Friday will be partnered
- Test project 1 out today
- due next Friday (10/21) at 6pm
- honor code
- must work alone
- may only use: book, your notes, class notes, python.org documentation
- may NOT: get help from other students, get help from the tutors (except for file issues, etc), look online for solutions
- 3 problems
- required to do some extra credit
- 63 points total, but only 60 for just doing what I stated
- More than a third of the points come from code style and commenting
- follow instructions carefully!

• problem set, problem 1c.
- what does it do?
- how does it work?

• aliasing
- what will be the output of my_list after doing the following:

>>> my_list = [1, 2, 3, 4, 5]
>>> other_list = my_list
>>> other_list[2] = 100
>>> other_list
[1, 2, 100, 4, 5]
>>> my_list

- [1, 2, 100, 4, 5] ... why?
- my_list and other_list are just references to the same object
- this is called aliasing, since other_list is an alias (another name) for my_list
- saying other_list = my_list does not copy anything; that is, it does NOT create a new list that is a copy of the original list
- draw a picture

- notice that if I make changes to either one, changes will be seen in the other
>>> my_list
[1, 2, 100, 4, 5]
>>> other_list
[1, 2, 100, 4, 5]
>>> my_list[0] = 0
>>> other_list[1] = 1000
>>> my_list
[0, 1000, 100, 4, 5]
>>> other_list
[0, 1000, 100, 4, 5]
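One way to check whether two names are aliases is Python's "is" operator, which asks whether two names refer to the same object, whereas == only compares contents. A minimal sketch (the name separate_list is made up for illustration; print is written with parentheses so the snippet also runs under Python 3):

```python
my_list = [1, 2, 3, 4, 5]
other_list = my_list               # alias: a second name for the same list
separate_list = [1, 2, 3, 4, 5]    # a different list that happens to have equal contents

is_alias = other_list is my_list        # True: two names, one object
is_separate = separate_list is my_list  # False: two distinct objects
equal_contents = separate_list == my_list  # True: == compares the contents

other_list[2] = 100                # mutate through the alias

print(my_list)        # [1, 2, 100, 4, 5]
print(separate_list)  # [1, 2, 3, 4, 5]
```

Only the alias sees the change; the separate list is unaffected even though it started out equal.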

- aliasing can also show up in other places
>>> my_list = [1, 2, 3, 4, 5]
>>> def mystery(x):
...    x[0] = 1000
...
>>> my_list
[1, 2, 3, 4, 5]
>>> mystery(my_list)
>>> my_list
[1000, 2, 3, 4, 5]

- parameters are passed as an alias (a reference to the same object), not as a copy
- "parameter passing" describes how the values that are input to the function (i.e. the arguments) are bound to the parameters inside the function
- be careful!
- why do you think this is done?
- a deep copy can be a lot of work
- also allows us to write functions that manipulate the parameter (which we may or may not do)
- notice that a function cannot change which object my_list references (it can only mutate that object)

def mystery(alist):
    alist = [0]*10
    print alist

>>> my_list = [1, 2, 3, 4, 5]
>>> mystery(my_list)
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> my_list
[1, 2, 3, 4, 5]
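If a function should hand a brand-new list back to the caller, two common options are returning the new list or using slice assignment to overwrite the existing list's contents in place. A rough sketch of both; the function names make_zeros and replace_contents are hypothetical and not from the class code:

```python
def make_zeros(alist):
    # Build a new list and return it; the caller decides what to do with it.
    new_list = [0] * len(alist)
    return new_list

def replace_contents(alist):
    # Slice assignment mutates the existing list object in place,
    # so the caller's list changes too.
    alist[:] = [0] * 10

my_list = [1, 2, 3, 4, 5]
zeros = make_zeros(my_list)
print(my_list)   # [1, 2, 3, 4, 5]
print(zeros)     # [0, 0, 0, 0, 0]

replace_contents(my_list)
print(my_list)   # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

The difference from mystery above is that alist[:] = ... changes the object that alist refers to, while alist = ... only makes the local name refer to something else.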

- slicing does create a new copy
>>> my_list = [1, 2, 3, 4, 5]
>>> other_list = my_list[2:4]
>>> other_list
[3, 4]
>>> other_list[0] = 100
>>> other_list
[100, 4]
>>> my_list
[1, 2, 3, 4, 5]

- given this, how could we create a copy of my_list? (slicing makes a shallow copy, which is all we need for a list of numbers)
>>> my_list = [1, 2, 3, 4, 5]
>>> other_list = my_list[:]
>>> other_list[3] = 100
>>> other_list
[1, 2, 3, 100, 5]
>>> my_list
[1, 2, 3, 4, 5]
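The copy made by my_list[:] is a shallow copy: the outer list is new, but the items inside are shared. That makes no difference for a list of numbers, but it matters for a list of lists. A small sketch using copy.deepcopy from the standard library for comparison (the variable names are made up for illustration):

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested[:]            # new outer list, but the same inner lists
deep = copy.deepcopy(nested)   # new outer list and new inner lists

nested[0][0] = 100             # mutate one of the inner lists

print(shallow)  # [[100, 2], [3, 4]]
print(deep)     # [[1, 2], [3, 4]]
```

The shallow copy still aliases the inner lists, so it sees the change; the deep copy does not.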

• run the sentence_stats function from word-stats.py code
- similar idea to our scores functions except now we're doing it over strings instead of numbers
- the string class has a "split" method that splits up a sentence into a list of words by splitting on whitespace

>>> "this is a sentence".split()
['this', 'is', 'a', 'sentence']

- optionally, you can specify what to split on (though this is much less common)

>>> "this is a sentence".split("s")
['thi', ' i', ' a ', 'entence']
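To show how split feeds into word statistics, here is a rough sketch of a sentence-statistics function. This is not the sentence_stats code from word-stats.py (which is not shown in these notes); the name sentence_stats_sketch and its return values are assumptions, and it assumes the sentence contains at least one word:

```python
def sentence_stats_sketch(sentence):
    """Return (word count, longest word, shortest word, average word length)."""
    words = sentence.split()   # split on whitespace

    # Assumes at least one word; an empty sentence would fail here.
    longest = words[0]
    shortest = words[0]
    total_length = 0

    for word in words:
        if len(word) > len(longest):
            longest = word
        if len(word) < len(shortest):
            shortest = word
        total_length += len(word)

    # float() keeps the division from truncating under Python 2.
    average = total_length / float(len(words))
    return len(words), longest, shortest, average

print(sentence_stats_sketch("this is a sentence"))
# (4, 'sentence', 'a', 3.75)
```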

• files
- what is a file?
- a chunk of data stored on the hard disk
- why do we need files?
- hard drives persist state regardless of whether the power is on or not
- when a program is running, all the data it is generating/processing is in main memory (e.g. RAM)
- main memory is faster, but doesn't persist when the power goes off

- to read a file in Python we first need to open it

file = open("some_file_name", "r")

- open is another function that takes two parameters
- the first parameter is a string identifying the filename
- be careful about the path/directory. Python looks for the file relative to the current working directory (usually the same directory as the program's .py file when you run it from there) unless you tell it to look elsewhere
- the second parameter is another string telling Python what you want to do with the file
- "r" stands for "read", that is, we're going to read some data from the file
- open returns a "file" object that we can use later on for reading purposes
- above, I've saved that in a variable called "file", but I could have called it anything else

>>> open("english.txt", "r")
<open file 'english.txt', mode 'r' at 0x10120a030>
>>> type(open("english.txt", "r"))
<type 'file'>

- once we have a file open, we can read a line at a time from the file using a for loop:

for <variable> in <file_variable>:
    # do something

- for each line in the file, the loop will get run
- each time the variable will get assigned to the next line in the file
- the line will be of type string
- the line will also have an endline character at the end of it, which you'll often want to get rid of (the string's strip() method is often good for this)
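
The loop and the endline behavior described above can be tried out with a small throwaway file. This is only a sketch: the file name words_demo.txt and its contents are made up, and the file is first written with mode "w" (writing is not covered in these notes) just so the example is self-contained.

```python
import os
import tempfile

# Create a small throwaway file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "words_demo.txt")
out = open(path, "w")
out.write("apple\nbanana\ncherry\n")
out.close()

raw_lines = []
clean_lines = []

infile = open(path, "r")
for line in infile:
    raw_lines.append(line)            # each line still ends with "\n"
    clean_lines.append(line.strip())  # strip() removes surrounding whitespace, including "\n"
infile.close()

print(raw_lines)    # ['apple\n', 'banana\n', 'cherry\n']
print(clean_lines)  # ['apple', 'banana', 'cherry']
```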

• look at the file_stats function in word-stats.py code
- what does it do?
- opens a file
- reads a line at a time
- appends each entry in the file to a list called words (stripping off the end of line)
- prints out the statistics of the word file

- in this same directory I have a file called "english.txt" that has a large list of English words

>>> file_stats("english.txt")
Number of words: 47158
Longest word: antidisestablishmentarianism
Shortest word: Hz
Avg. word length: 8.37891768099

- notice how quickly it can process through the file
- computers are fast!
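
The steps listed for file_stats can be sketched roughly as follows. This is not the code from word-stats.py (which is not shown in these notes): the name file_stats_sketch, the return value, and the three-word demo file are all assumptions for illustration, and the file is assumed to hold one word per line with at least one word.

```python
import os
import tempfile

def file_stats_sketch(filename):
    """Print and return (count, longest, shortest, average length)."""
    # Open the file and collect one stripped word per line.
    words = []
    infile = open(filename, "r")
    for line in infile:
        words.append(line.strip())
    infile.close()

    # Assumes the file contains at least one word.
    longest = words[0]
    shortest = words[0]
    total_length = 0
    for word in words:
        if len(word) > len(longest):
            longest = word
        if len(word) < len(shortest):
            shortest = word
        total_length += len(word)
    average = total_length / float(len(words))

    print("Number of words: " + str(len(words)))
    print("Longest word: " + longest)
    print("Shortest word: " + shortest)
    print("Avg. word length: " + str(average))
    return len(words), longest, shortest, average

# Hypothetical demo file standing in for english.txt.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_words.txt")
out = open(demo_path, "w")
out.write("Hz\ncomputer\nantidisestablishmentarianism\n")
out.close()

demo_stats = file_stats_sketch(demo_path)
```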