### CS150 - Fall 2011 - Class 14

- CS lunch tomorrow (Tuesday) in Ross at 12:25
- keep up with the exercises and reading
- test project 1
- generally looked pretty good
- scores were good (average was 58)
- make sure you understand the places where you lost points (if there are any questions, come talk to me)
- word of warning: I will be stricter with points for the final test project

• exercise

• you can only put immutable objects in a set
- any guesses as to why?
- objects are kept track of based on their contents
- if their contents change, there is no easy way to let the set know this
- what can/can't we store in a set?
- can store:
- ints
- floats
- strings
- bools
- can't store
- lists
- sets
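One way to check the lists above is to try adding each kind of object to a set and see what happens. The sketch below uses a made-up helper name, `can_store_in_set`, and relies on Python raising a TypeError when a mutable object such as a list or set is added; it is only an illustration, not part of the course code.

```python
def can_store_in_set(value):
    """Return True if value can be added to a set, False otherwise."""
    test_set = set()
    try:
        test_set.add(value)
        return True
    except TypeError:
        # raised for mutable (unhashable) objects such as lists and sets
        return False

results = [
    can_store_in_set(5),          # int
    can_store_in_set(2.5),        # float
    can_store_in_set("hello"),    # string
    can_store_in_set(True),       # bool
    can_store_in_set([1, 2]),     # list
    can_store_in_set(set([1])),   # set
]
print(results)   # [True, True, True, True, False, False]
```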

• tuples
- there are occasions when we want a list of things that is immutable
- for example, if we want to keep track of a list of things in a set
- a "tuple" is an immutable list
- tuples can be created as literals using parentheses (instead of square brackets)
>>> my_tuple = (1, 2, 3, 4)
>>> my_tuple
(1, 2, 3, 4)
>>> another_tuple = ("a", "b", "c", "d")
>>> another_tuple
('a', 'b', 'c', 'd')

- notice that when they print out they are also shown using parentheses
- tuples are sequential and share many behaviors with lists
>>> my_tuple[0]
1
>>> my_tuple[3]
4
>>> for i in range(len(my_tuple)):
...    print my_tuple[i]
...
1
2
3
4
>>> my_tuple[1:3]
(2, 3)
- tuples are immutable!
>>> my_tuple[0] = 1
Traceback (most recent call last):
File "<string>", line 1, in <fragment>
TypeError: 'tuple' object does not support item assignment
>>> my_tuple.append(1)
Traceback (most recent call last):
File "<string>", line 1, in <fragment>
AttributeError: 'tuple' object has no attribute 'append'

>>> my_tuple = another_tuple
>>> my_tuple
('a', 'b', 'c', 'd')
>>> another_tuple
('a', 'b', 'c', 'd')

- this is perfectly legal. We're not mutating a tuple, just reassigning our variable
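Because tuples are immutable, they can go where lists cannot: inside a set. A minimal sketch, assuming we want to keep track of (x, y) points (the variable name `points` is just for illustration):

```python
# a set of (x, y) points; each point is a tuple, so it is immutable
points = set()
points.add((1, 2))
points.add((3, 4))
points.add((1, 2))   # a duplicate, so the set still holds only one copy

print(len(points))        # 2
print((3, 4) in points)   # True
```

Trying the same thing with `points.add([1, 2])` would raise a TypeError, since lists are mutable.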

• generating histograms
- we'd like to write a function that generates a histogram based on some input data
- what is a histogram?
- shows the "distribution" of the data (i.e. where the values range)
- often visualized as a bar chart
- along the x axis are the values (or bins)
- and the y axis shows the frequency of those values (or bins)
- for example, run histogram.py code
>>> data = [1, 1, 2, 3, 1, 5, 4, 2, 1]
>>> print_counts(get_counts(data))

- we can use Excel again to visualize this as a histogram
- how can we do this?
- we could do this like we did in assignment 5, where we sort and then count
- but there's an easier way...
- do it on paper: [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
- how did you do it?
- kept a tally for each number
- each time you saw a new number, added it to your list with a count of 1
- each time you saw a number already on your list, added 1 to its count
- key idea, keeping track of two things:
- a key, which is the thing you're looking up
- a value, which is associated with each key

• dictionaries (aka maps)
- store keys and an associated value
- each key is associated with a value
- lookup can be done based on the key
- this is a very common phenomenon in the real world. What are some examples?
- social security number
- key = social security number
- value = name, address, etc
- phone numbers in your phone (and phone directories in general)
- key = name
- value = phone number
- websites
- key = url
- value = location of the computer that hosts this website
- license plates
- key = license plate number
- value = owner, type of car, ...
- flight information
- key = flight number
- value = departure city, destination city, time, ...
- like sets, dictionaries allow us to efficiently look up (and update) keys in the dictionary
- creating new dictionaries
- dictionaries can be created using curly braces
>>> d = {}
>>> d
{}
- dictionaries function similarly to lists, except we can put things in any index and can use non-numerical indices
>>> d[15] = 1
>>> d
{15: 1}

- notice when a dictionary is printed out, we get the key AND the associated value

>>> d[100] = 10
>>> d
{100: 10, 15: 1}
>>> my_list = []
>>> my_list[15] = 1
Traceback (most recent call last):
File "<string>", line 1, in <fragment>
IndexError: list assignment index out of range

- dictionaries are different from lists...
- we can also update the values already in a dictionary
>>> d[15] = 2
>>> d
{100: 10, 15: 2}
>>> d[100] += 1
>>> d
{100: 11, 15: 2}
- keys in the dictionary can be ANY immutable object
>>> d2 = {}
>>> d2["dave"] = 1
>>> d2["anna"] = 1
>>> d2["anna"] = 2
>>> d2["seymore"] = 100
>>> d2
{'seymore': 100, 'dave': 1, 'anna': 2}
- the values can be ANY object
>>> d3 = {}
>>> d3["dave"] = set()
>>> d3["anna"] = set()
>>> d3
{'dave': set([]), 'anna': set([])}
>>> d3["dave"].add(1)
>>> d3["dave"].add(40)
>>> d3["anna"].add("abcd")
>>> d3
{'dave': set([40, 1]), 'anna': set(['abcd'])}
- be careful to put the key in the dictionary before trying to use it
>>> d3["steve"]
Traceback (most recent call last):
File "<string>", line 1, in <fragment>
KeyError: 'steve'
- how do you think we can create non-empty dictionaries from scratch?
>>> another_dict = {"dave": 1, "anna":100, "seymore": 21}
>>> another_dict
{'seymore': 21, 'dave': 1, 'anna': 100}
- what are some other methods you might want for dictionaries (things you might want to ask about them)?
- does it have a particular key
- how many key/value pairs are in the dictionary
- what are all of the values in the dictionary
- what are all of the keys in the dictionary
- remove all of the items in the dictionary
- dictionaries support most of the other things you'd expect them to that we've seen in other data structures
>>> "seymore" in another_dict
True
>>> len(another_dict)
3
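The `in` test gives us a way to avoid a KeyError when we are not sure a key is present. A small sketch, where the function name `lookup` is hypothetical and chosen only for this example:

```python
another_dict = {"dave": 1, "anna": 100, "seymore": 21}

def lookup(dictionary, key):
    """Return the value for key, or None if key is not in the dictionary."""
    if key in dictionary:
        return dictionary[key]
    else:
        return None

print(lookup(another_dict, "anna"))    # 100
print(lookup(another_dict, "steve"))   # None
```

Dictionaries also have a built-in `get` method that behaves similarly: `another_dict.get("steve")` returns None rather than raising an error.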
- dictionaries are a class of objects, just like everything else we've seen (called dict ... short for dictionary)
>>> help(dict)
- some of the more relevant methods:
>>> d2
{'seymore': 100, 'dave': 1, 'anna': 2}
>>> d2.values()
[100, 1, 2]
>>> d2.keys()
['seymore', 'dave', 'anna']
>>> d2.clear()
>>> d2
{}

• back to our histogram program
- how could we use a dictionary to generate the counts?
>>> data = [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
>>> print_counts(get_counts(data))
1   3
2   3
3   2
4   2
5   2
- first, we need to store them in a dictionary
- look at the get_counts function in histogram.py code
- creates an empty dictionary
- iterates through the data
- check if the data item is in the dictionary already
- if it is, just increment the count by 1
- if it's not, add it to the dictionary with a count of 1
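The steps above can be sketched as follows. This is a guess at what get_counts in histogram.py might look like based on the description, not a copy of the actual file.

```python
def get_counts(data):
    """Return a dictionary mapping each item in data to how many times it occurs."""
    counts = {}                   # start with an empty dictionary
    for item in data:             # iterate through the data
        if item in counts:
            counts[item] += 1     # seen before: increment the count by 1
        else:
            counts[item] = 1      # first time: add it with a count of 1
    return counts

counts = get_counts([1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5])
print(counts[1])   # 3
print(counts[5])   # 2
```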
- what types of things could we call get_counts on?
- anything that is iterable!
>>> get_counts(data)
{1: 3, 2: 3, 3: 2, 4: 2, 5: 2}
>>> get_counts("this is a string and strings are iterable")
{'a': 4, ' ': 7, 'b': 1, 'e': 3, 'd': 1, 'g': 2, 'i': 5, 'h': 1, 'l': 1, 'n': 3, 's': 5, 'r': 4, 't': 4}
>>> s = set([1, 2, 3, 4, 1, 2])
>>> s
set([1, 2, 3, 4])
>>> get_counts(s)
{1: 1, 2: 1, 3: 1, 4: 1}

- though sets aren't that interesting :)
- now that we have the dictionary of counts, how can we print them out?
- there are many ways we could iterate over the things in the dictionary
- iterate over the values
- iterate over the keys
- iterate over the key/value pairs
- which one is most common?
- since lookups are done based on the keys, iterating over the keys is the most common
- look at print_counts function in histogram.py code
- by default, if you say:

for key in dictionary:
...

key will take on each key in the dictionary, one at a time.
- this is the same as writing

for key in dictionary.keys():
...

- once we have the key, we can use it to lookup the value associated with that key and do whatever we want with the pair
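Putting these pieces together, a print function that iterates over the keys might look like the sketch below. It is an assumption about print_counts in histogram.py, not the actual code, and the tab between key and count is a guess at the formatting. It uses `print(...)` with a single argument so that it runs under both the Python 2 used in these notes and Python 3.

```python
def print_counts(counts):
    """Print each key in counts next to its associated value, one pair per line."""
    for key in counts:              # iterates over the keys of the dictionary
        value = counts[key]         # look up the value associated with this key
        print(str(key) + "\t" + str(value))

print_counts({"dave": 1, "anna": 15})
```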
- if you want to iterate over the values, use the values() method, which returns a list of the values
- what if you want to iterate over the key/value pairs?
- there is a method called items() that returns the key/value pairs as 2-tuples
>>> my_dict = {"dave": 1, "anna": 15}
>>> my_dict.items()
[('dave', 1), ('anna', 15)]
- how could we use this in a loop?

for (key, value) in my_dict.items():
    print "Key: " + str(key)
    print "Value: " + str(value)
- items() returns a list of 2-tuples, which we're iterating over

- Does the following print like you'd like it to?
>>> print_counts(get_counts("this is some string"))
3
e   1
g   1
i   3
h   1
m   1
o   1
n   1
s   4
r   1
t   2

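- prints in an arbitrary order (dictionaries don't keep their keys sorted)
- the first line of output, with no visible key, is the count for the space character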
- how could we print this in sorted order?
- get the keys first
- sort them
- then use them to iterate over the data
- look at print_counts_sorted in histogram.py code
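The three steps above could be written along these lines. As before, this is a sketch of what print_counts_sorted in histogram.py might do, not the actual code, and the tab-separated format is assumed.

```python
def print_counts_sorted(counts):
    """Print the key/count pairs in counts, ordered by key."""
    keys = list(counts.keys())    # get the keys first
    keys.sort()                   # sort them
    for key in keys:              # then use them to iterate over the data
        print(str(key) + "\t" + str(counts[key]))

print_counts_sorted({"t": 2, "h": 1, "i": 3, "s": 4})
```

With this input the lines come out in the key order h, i, s, t, regardless of how the dictionary stores them internally.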