### CS51A - Spring 2019 - Class 13

• write a function called read_numbers that takes a file of numbers (one per line) and generates a list consisting of the numbers in that file
- look at read_numbers function in dictionaries.py code
- if you're reading numbers, don't forget to turn them into ints using "int"

>>> data = read_numbers('numbers.txt')
>>> data
[1, 2, 3, 2, 1, 1, 2, 6, 7, 8, 10, 1, 5, 5, 5, 3, 8, 6, 7, 6, 4, 1, 1, 2, 3, 1, 2, 3]
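
A minimal sketch of what read_numbers might look like (the version in dictionaries.py may differ; skipping blank lines is an assumption about the file format, not something the notes specify):

```python
def read_numbers(filename):
    """Return a list of the integers stored in filename, one per line."""
    numbers = []
    with open(filename) as file:
        for line in file:
            line = line.strip()
            if line:  # assumption: blank lines should be ignored
                # each line is read as a string, so convert it with int
                numbers.append(int(line))
    return numbers
```

Without the call to int, the list would contain strings like '1' rather than the number 1.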

• what if we wanted to find the most frequent value in this data?
- how would you do it?
- do it on paper: [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
- how did you do it?
- kept a tally of the numbers
- each time you saw a new number, added it to your list with a count of 1
- if it was something you'd seen already, add another tally/count
- key idea, keeping track of two things:
- a key, which is the thing you're looking up
- a value, which is associated with each key

• dictionaries (aka maps)
- store keys and an associated value
- each key is associated with a value
- lookup can be done based on the key
- this is a very common phenomenon in the real world. What are some examples?
- social security number
- key = social security number
- value = name, address, etc
- phone numbers in your phone (and phone directories in general)
- key = name
- value = phone number
- websites
- key = url
- value = location of the computer that hosts this website
- car license plates
- key = license plate number
- value = owner, type of car, ...
- flight information
- key = flight number
- value = departure city, destination city, time, ...
- creating new dictionaries
- dictionaries can be created using curly braces
>>> d = {}
>>> d
{}
- dictionaries function similarly to lists, except we can put things in ANY index and can use non-numerical indices
>>> d[15] = 1
>>> d
{15: 1}

- notice when a dictionary is printed out, we get the key AND the associated value

>>> d[100] = 10
>>> d
{100: 10, 15: 1}
>>> my_list = []
>>> my_list[15] = 1
Traceback (most recent call last):
File "<string>", line 1, in <fragment>
IndexError: list assignment index out of range

- dictionaries ARE very different than lists....
- we can also update the values already in a dictionary
>>> d[15] = 2
>>> d
{100: 10, 15: 2}
>>> d[100] += 1
>>> d
{100: 11, 15: 2}
- keys in the dictionary can be ANY immutable object
>>> d2 = {}
>>> d2["dave"] = 1
>>> d2["anna"] = 1
>>> d2["anna"] = 2
>>> d2["seymore"] = 100
>>> d2
{'seymore': 100, 'dave': 1, 'anna': 2}
- the values can be ANY object
>>> d3 = {}
>>> d3["dave"] = []
>>> d3
{'dave': []}
>>> d3["dave"].append(1)
>>> d3["dave"].append(40)
>>> d3
{'dave': [1, 40]}
- be careful to put the key in the dictionary before trying to use it
>>> d3["steve"]
Traceback (most recent call last):
File "<string>", line 1, in <fragment>
KeyError: 'steve'
>>> d3["steve"].append(1)
Traceback (most recent call last):
File "<string>", line 1, in <fragment>
KeyError: 'steve'
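
Two common ways to avoid the KeyError, sketched below using the same d3 dictionary. The "anna" lookup and the variable name missing are made up for illustration:

```python
d3 = {"dave": [1, 40]}

# Option 1: check for the key with "in" and add it if it's missing
if "steve" not in d3:
    d3["steve"] = []
d3["steve"].append(1)

# Option 2: the get method returns a default value instead of raising
# a KeyError when the key is missing (it does NOT add the key)
missing = d3.get("anna", [])
```

After this runs, d3 is {'dave': [1, 40], 'steve': [1]} and missing is an empty list.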
- how do you think we can create non-empty dictionaries from scratch?
>>> another_dict = {"dave": 1, "anna":100, "seymore": 21}
>>> another_dict
{'seymore': 21, 'dave': 1, 'anna': 100}
- what are some other methods you might want for dictionaries (things you might want to ask about them)?
- does it have a particular key?
- how many key/value pairs are in the dictionary?
- what are all of the values in the dictionary?
- what are all of the keys in the dictionary?
- remove all of the items in the dictionary?
- dictionaries support most of the other things you'd expect them to that we've seen in other data structures
>>> "seymore" in another_dict
True
>>> len(another_dict)
3
- dictionaries are a class of objects, just like everything else we've seen (called dict ... short for dictionary)
>>> help(dict)
- some of the more relevant methods:
>>> d2
{'seymore': 100, 'dave': 1, 'anna': 2}
>>> d2.values()
dict_values([100, 1, 2])
>>> d2.keys()
dict_keys(['seymore', 'dave', 'anna'])
>>> d2.pop('seymore')
100
>>> d2
{'dave': 1, 'anna': 2}
>>> d2.clear()
>>> d2
{}

• generating counts
- We're going to use dictionaries to store counts like we did on paper
- Write a function called get_counts that takes a list of numbers and returns a dictionary containing the counts of each of the numbers
- Key idea:

def get_counts(numbers):
    d = {}

    for num in numbers:
        # do something here

    return d

- There are two cases we need to contend with:
1) if the number isn't in the dictionary

- In this case we need to add it with the value 1

d[num] = 1

2) if the number is in the dictionary

- In this case, we just need to increment it

d[num] = d[num] + 1

which can also be written

d[num] += 1

- Look at the get_counts function in dictionaries.py code
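
Putting the two cases together, a sketch of get_counts could look like this (the version in dictionaries.py may differ in its details):

```python
def get_counts(numbers):
    """Return a dictionary mapping each number to how many times it occurs."""
    d = {}

    for num in numbers:
        if num in d:
            # case 2: we've seen this number before, so increment its count
            d[num] += 1
        else:
            # case 1: first time seeing this number, so add it with a count of 1
            d[num] = 1

    return d
```

On the list we tallied on paper, [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5], this gives counts of 3 for 1 and 2, and counts of 2 for 3, 4 and 5.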

- We now can generate the counts from our file

>>> data = read_numbers('numbers.txt')
>>> data
[1, 2, 3, 2, 1, 1, 2, 6, 7, 8, 10, 1, 5, 5, 5, 3, 8, 6, 7, 6, 4, 1, 1, 2, 3, 1, 2, 3]
>>> get_counts(data)
{1: 7, 2: 5, 3: 4, 6: 3, 7: 2, 8: 2, 10: 1, 5: 3, 4: 1}

• Iterating over dictionaries
- We're almost to the point where we can find the most frequent value.
- Next, we need to go through all of the values in the dictionary to find the most frequent one.

- there are many ways we could iterate over the things in a dictionary
- iterate over the values
- iterate over the keys
- iterate over the key/value pairs
- which one is most common?
- since lookups are done based on the keys, iterating over the keys is the most common
- by default, if you say:

for key in dictionary:
    ...

key will get associated with each key in the dictionary.
- once we have the key, we can use it to lookup the value associated with that key and do whatever we want with the pair
for key in dictionary:
    value = dictionary[key]
    ...
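
For comparison, here is a short sketch of all three ways of iterating. The counts dictionary and the variable names keys_seen, total and pairs are made up for illustration:

```python
counts = {"dave": 1, "anna": 100, "seymore": 21}

# iterate over the keys (the default)
keys_seen = []
for key in counts:
    keys_seen.append(key)

# iterate over just the values
total = 0
for value in counts.values():
    total += value

# iterate over the key/value pairs; items gives us each pair as a tuple
pairs = []
for key, value in counts.items():
    pairs.append((key, value))
```

Iterating over the keys and looking up counts[key] does the same job as items, which is why the key-based loop is enough for everything in these notes.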

- look at the print_counts function in dictionaries.py code
- "\t" is the tab character

>>> data = read_numbers('numbers.txt')
>>> counts = get_counts(data)
>>> print_counts(counts)
1   7
2   5
3   4
6   3
7   2
8   2
10   1
5   3
4   1

Notice that the keys are not in numerical order. In Python 3.7 and later, keys come back in the order they were first inserted; in earlier versions there's no guarantee about the ordering at all, only that you'll iterate over all of them.
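
A sketch of what print_counts might look like, plus a variant that prints the keys in numerical order. The version in dictionaries.py may differ, and print_counts_sorted is a hypothetical helper that is not part of the course code:

```python
def print_counts(counts):
    """Print each key and its count, separated by a tab."""
    for key in counts:
        print(str(key) + "\t" + str(counts[key]))


def print_counts_sorted(counts):
    """Hypothetical variant: same output, but with the keys in sorted order."""
    # sorted returns a new list of the keys in increasing order
    for key in sorted(counts):
        print(str(key) + "\t" + str(counts[key]))
```

If the order matters, sort the keys explicitly like this rather than relying on the order the dictionary happens to give you.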

• look at the get_most_frequent_value function in dictionaries.py code
- Looks very similar to the my_max function we wrote in lecture8 (http://www.cs.pomona.edu/~dkauchak/classes/cs51a/lectures/lecture8-sequences.html)
- We keep a variable (max_value) that stores the largest value we've seen so far
- We'll initialize it to -1 assuming that the numbers are all positive
- See problem set 6 for a general solution
- We then iterate through each of the key/value pairs in our dictionary
- We compare the value (i.e. counts[key]) to the largest value we've seen so far
- If it's larger, we update max_value
- The only difference from my_max is that we want to return the *key* associated with the largest value
- We need another variable (max_key) that stores this key
- Whenever we update max_value, we also update max_key

>>> data = read_numbers('numbers.txt')
>>> get_most_frequent_value(data)
1
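
The steps above can be sketched as follows. This assumes the function takes the list of numbers and builds the counts itself (as the call above suggests); the version in dictionaries.py may be organized differently, and starting max_key at None is a choice made here:

```python
def get_counts(numbers):
    """Return a dictionary mapping each number to how many times it occurs."""
    d = {}
    for num in numbers:
        if num in d:
            d[num] += 1
        else:
            d[num] = 1
    return d


def get_most_frequent_value(numbers):
    """Return the number that occurs most often in numbers."""
    counts = get_counts(numbers)

    max_value = -1   # largest count seen so far; real counts are always >= 1
    max_key = None   # the key that goes with max_value (assumed starting value)

    for key in counts:
        if counts[key] > max_value:
            # found a larger count, so remember both the count and its key
            max_value = counts[key]
            max_key = key

    return max_key
```

Because the comparison uses > rather than >=, a tie goes to whichever of the tied keys is visited first.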

• It may also be useful to not only get the most frequent value, but also how frequent it is
- Anytime you want to return more than one value from a function, a tuple is often a good option
- Look at the get_most_frequent function in dictionaries.py code
- only difference is that we return a tuple and also include the max_value

>>> data = read_numbers('numbers.txt')
>>> get_most_frequent(data)
(1, 7)
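
A sketch of the tuple-returning version, along with how a caller can unpack the result. As before, this is one possible implementation rather than the exact code in dictionaries.py; the names most_common and how_many are made up for illustration:

```python
def get_most_frequent(numbers):
    """Return a tuple (value, count) for the most frequent number."""
    # build the counts dictionary
    counts = {}
    for num in numbers:
        if num in counts:
            counts[num] += 1
        else:
            counts[num] = 1

    # find the key with the largest count
    max_value = -1
    max_key = None
    for key in counts:
        if counts[key] > max_value:
            max_value = counts[key]
            max_key = key

    # return both pieces of information together as a tuple
    return (max_key, max_value)


data = [1, 2, 3, 2, 1, 1, 2, 6, 7, 8, 10, 1, 5, 5, 5, 3, 8, 6, 7, 6, 4, 1, 1, 2, 3, 1, 2, 3]

# a tuple can be unpacked directly into two variables
most_common, how_many = get_most_frequent(data)
```

Here most_common ends up as 1 and how_many as 7, matching the output above.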