CS150 - Fall 2012 - Class 15

  • quick review of dictionaries
       - creating dictionaries
          - creating an empty dictionary
             >>> d = {}
             >>> d
             {}

          - creating dictionaries with values
             >>> d = {"apple": 2, "banana": 3, "pears": 15}
             >>> d
             {'apple': 2, 'pears': 15, 'banana': 3}

       - accessing keys
          >>> d["apple"]
          2
          >>> d["banana"]
          3
          >>> d["grapefruit"]
          Traceback (most recent call last):
           File "<string>", line 1, in <fragment>
          KeyError: 'grapefruit'

       - updating/adding values
          >>> d["apple"] = 10
          >>> d
          {'apple': 10, 'pears': 15, 'banana': 3}
          >>> d["pineapple"] = 1
          >>> d
          {'pineapple': 1, 'apple': 10, 'pears': 15, 'banana': 3}

       - deleting values
          - sometimes we want to delete a key/value pair
          - "del" does this
          >>> d
          {'pineapple': 1, 'apple': 10, 'pears': 15, 'banana': 3}
          >>> del d["pineapple"]
          >>> d
          {'apple': 10, 'pears': 15, 'banana': 3}

       - useful built-in methods
          - clear(): removes everything
          - keys(): gets the keys as a list
          - values(): gets the values as a list   
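
          - a minimal sketch of these three methods (not from the course code; the names fruit_names and fruit_counts are made up for illustration, and sorted() is used so the result is a list in a predictable order, even under Python 3 where keys() and values() return views rather than lists):

```python
d = {"apple": 2, "banana": 3, "pears": 15}

# keys() gives the keys, values() gives the values;
# sorted() copies them into a new list in sorted order
fruit_names = sorted(d.keys())
fruit_counts = sorted(d.values())
print(fruit_names)     # ['apple', 'banana', 'pears']
print(fruit_counts)    # [2, 3, 15]

# clear() removes every key/value pair, leaving an empty dictionary
d.clear()
print(d)               # {}
```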

  • exercises: histogram program
       - how could we use a dictionary to generate the counts?
          >>> data = [1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]
          >>> print_counts(get_counts(data))
          1   3
          2   3
          3   2
          4   2
          5   2
       - first, we need to store them in a dictionary
          - look at the get_counts function in histogram.py code
             - creates an empty dictionary
             - iterates through the data
             - checks if each item is already in the dictionary
                - if it is, just increment the count by 1
                - if it's not, add it to the dictionary with a count of 1
          - what types of things could we call get_counts on?
             - anything that is iterable!
                >>> get_counts(data)
                {1: 3, 2: 3, 3: 2, 4: 2, 5: 2}
                >>> get_counts("this is a string and strings are iterable")
                {'a': 4, ' ': 7, 'b': 1, 'e': 3, 'd': 1, 'g': 2, 'i': 5, 'h': 1, 'l': 1, 'n': 3, 's': 5, 'r': 4, 't': 4}
                >>> s = set([1, 2, 3, 4, 1, 2])
                >>> s
                set([1, 2, 3, 4])
                >>> get_counts(s)
                {1: 1, 2: 1, 3: 1, 4: 1}

                - though sets aren't that interesting :)
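
          - the steps described above might look roughly like this (a sketch written from the description, not the actual histogram.py code, so the real function may differ in its details):

```python
def get_counts(data):
    """Return a dictionary mapping each item in data to how often it occurs."""
    counts = {}                  # start with an empty dictionary
    for item in data:            # works for any iterable: list, string, set, ...
        if item in counts:
            counts[item] += 1    # seen before: increment the count by 1
        else:
            counts[item] = 1     # first time: add it with a count of 1
    return counts


print(get_counts([1, 2, 3, 2, 3, 2, 1, 1, 5, 4, 4, 5]))
```
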
       - now that we have the dictionary of counts, how can we print them out?
          - there are many ways we could iterate over the things in a dictionary
             - iterate over the values
             - iterate over the keys
             - iterate over the key/value pairs
          - which one is most common?
             - since lookups are done based on the keys, iterating over the keys is the most common
          - look at print_counts function in histogram.py code
             - by default, if you say:

                for key in dictionary:
                   ...

                key will take on each key in the dictionary, one at a time.
             - this is the same as writing
             
                for key in dictionary.keys():
                   ...

             - once we have the key, we can use it to look up the value associated with that key and do whatever we want with the pair
          - if you want to iterate over the values, use the values() method, which returns a list of the values
          - what if you want to iterate over the key/value pairs?
             - there is a method called items() that returns the key/value pairs as a list of 2-tuples
                >>> my_dict = {"dave": 1, "anna": 15}
                >>> my_dict.items()
                [('dave', 1), ('anna', 15)]
             - how could we use this in a loop?
                
                for (key, value) in my_dict.items():
                   print "Key: " + str(key)
                   print "Value: " + str(value)
             - items() returns a list of 2-tuples, which we're iterating over

       - Does the following print like you'd like it to?
          >>> print_counts(get_counts("this is some string"))
              3
          e   1
          g   1
          i   3
          h   1
          m   1
          o   1
          n   1
          s   4
          r   1
          t   2

          - prints in an arbitrary order
          - like the values in sets, there is NO inherent ordering to the keys in a dictionary
       - how could we print this in sorted order?
          - get the keys first
          - sort them
          - then use them to iterate over the data
       - look at print_counts_sorted in histogram.py code
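
          - one way the three steps could be written (a sketch, not the actual histogram.py code; the helper sorted_count_lines and the tab separator are assumptions made for illustration):

```python
def sorted_count_lines(counts):
    """Return one 'key<TAB>count' string per key, in sorted key order."""
    keys = list(counts.keys())   # get the keys first
    keys.sort()                  # sort them
    lines = []
    for key in keys:             # then use them to look up each count
        lines.append(str(key) + "\t" + str(counts[key]))
    return lines


def print_counts_sorted(counts):
    """Print the counts, one key per line, in sorted key order."""
    for line in sorted_count_lines(counts):
        print(line)


print_counts_sorted({"t": 2, "s": 4, "i": 3})
```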

  • more tuple fun
       - even if you don't supply parentheses, comma-separated values are interpreted as a tuple
          >>> 1, 2
          (1, 2)
          >>> [1, 2], "b"
          ([1, 2], 'b')
       
       - unpacking tuples
          - given a tuple, we can "unpack" its values into variables:
             >>> my_tuple = (3, 2, 1)
             >>> my_tuple
             (3, 2, 1)
             >>> (x, y, z) = my_tuple
             >>> x
             3
             >>> y
             2
             >>> z
             1

             - be careful with unpacking
                >>> (x, y) = my_tuple
                Traceback (most recent call last):
                 File "<string>", line 1, in <fragment>
                ValueError: too many values to unpack
                >>> (a, b, c, d) = my_tuple
                Traceback (most recent call last):
                 File "<string>", line 1, in <fragment>
                ValueError: need more than 3 values to unpack
          - notice that we can store anything in a tuple and unpack anything in a tuple
             >>> my_tuple = ([1], [2], [3])
             >>> (x, y, z) = my_tuple
             >>> x
             [1]
             >>> x.append(4)
             >>> my_tuple
             ([1, 4], [2], [3])
             >>>

             - tuples are immutable; however, the objects inside a tuple may be mutable
             - why doesn't this count as changing the tuple?
                - the tuple still references the same three lists
                - we've just appended something on to the list

          - unpacking, combined with what we saw before, allows us to do some nice things:
             - initializing multiple variables

             >>> x, y, z = 1, 2, 3
             >>> x
             1
             >>> y
             2
             >>> z
             3
             >>> first, last = "dave", "kauchak"
             >>> first
             'dave'
             >>> last
             'kauchak'
          
             - say we have two variables x and y, how can we swap the values?
                >>> x = 10
                >>> y = 15
                # now swap the values
                >>> temp = x
                >>> x = y
                >>> y = temp
                >>> x
                15
                >>> y
                10
             - is there a better way?
                >>> x
                15
                >>> y
                10
                >>> x, y = y, x
                >>> x
                10
                >>> y
                15

  • returning multiple values from a function
       - there are times when it's useful to return more than one value from a function
       - any ideas how we might do this?
          - return a tuple of the values

          def multiple_return():
              x = 10
              y = 15

              return x, y

          >>> multiple_return()
          (10, 15)
       - if a function returns multiple values, how can we get it into multiple variables?
          - unpacking!

          >>> a, b = multiple_return()
          >>> a
          10
          >>> b
          15
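
       - a slightly more practical sketch of the same idea (min_and_max is a made-up function for illustration, not part of the course code, and it assumes the list is not empty):

```python
def min_and_max(values):
    """Return the smallest and largest item of a non-empty list as a tuple."""
    smallest = values[0]
    largest = values[0]
    for value in values:
        if value < smallest:
            smallest = value
        if value > largest:
            largest = value
    return smallest, largest     # the comma makes this a 2-tuple


# unpack the returned tuple into two variables
low, high = min_and_max([4, 9, 1, 7])
print(low)     # 1
print(high)    # 9
```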

  • matplotlib
       - matplotlib is a module that allows us to create our own plots within python
          - any guess where the name comes from?
             - "matlab plotting library"
          - it's a set of modules that supports plotting functionality similar to that available in matlab
          - it is NOT built-in to python
             - you have to download and install it
                - I'll post instructions on the course web page for installation
          - documentation
             - General: http://matplotlib.sourceforge.net/
             - Basic tutorial: http://matplotlib.sourceforge.net/users/pyplot_tutorial.html
             - Examples: http://matplotlib.sourceforge.net/examples/index.html
       - think about creating a graph/plot. What functionality will we want?
          - plot data
             - as points
             - as lines
             - as bars
             - ...
          - label axes
          - set the title
          - add a legend
          - annotate the graph
          - add grid
          - ...
       - matplotlib supports all of this functionality (though some is easier to get at than others)

  • look at basic_plotting.py code
       - look at simple_plot function
          - what does it do?
             - plots x values vs. y values
          - from matplotlib, we've imported pyplot; however, pyplot isn't a function, it's a module
             - ideas?
             - matplotlib is what is called a package
             - a package is a way of organizing modules
                - all the modules in a package are referenced using the . notation
             - why might this be done?
                - to avoid naming conflicts
                - just like we use modules to hold functions to avoid functions with the same name, packages keep modules with the same name from conflicting
          - notice that after plotting, we call the show() function
             - why do you think it's done this way?
                - so that we can make additional alterations to the plot, before displaying it
          - run simple_plot function
             - generates a nice looking graph
             - notice that matplotlib does a pretty good job at picking default axis values, etc.
             - a new window opens with your plot in it
                - this new window has some interactive functionality
                - most importantly, the ability to save the plot (as a .png file)

       - look at multiple_plot
          - what does it do?
             - plots two lines with the same x coordinates, but different y
          - the plot function can take any number of pairs of lists of x and y to plot multiple lines
          - notice that we can either do it with multiple arguments to plot or with multiple separate calls to plot
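
          - the two styles might look like this (a sketch, not the actual basic_plotting.py code; the function names and the data are made up, matplotlib must be installed to call the functions, and the import sits inside each function only so that the data part runs without matplotlib):

```python
# made-up example data: the same x values with two different sets of y values
x = [1, 2, 3, 4, 5]
squares = []
doubles = []
for value in x:
    squares.append(value * value)
    doubles.append(2 * value)


def multiple_plot_one_call(x, y1, y2):
    """Plot two lines by passing two (x, y) pairs to a single plot call."""
    from matplotlib import pyplot   # pyplot is a module inside the matplotlib package
    pyplot.plot(x, y1, x, y2)
    pyplot.show()                   # nothing is displayed until show() is called


def multiple_plot_separate_calls(x, y1, y2):
    """Plot the same two lines using one plot call per line."""
    from matplotlib import pyplot
    pyplot.plot(x, y1)
    pyplot.plot(x, y2)
    pyplot.show()
```

          - calling multiple_plot_one_call(x, squares, doubles) or multiple_plot_separate_calls(x, squares, doubles) should open a window showing the same two lines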

       - look at fancy_multiple_plot
          - what does it do?
             - plot two data sets
          - plot can optionally take arguments that affect how each line is drawn
             - in addition to the x and y values, you can give a third argument that is a string of settings
             - each thing plotted is then a triplet: x values, y values, and a settings string
          - there are a whole bunch of options, see the documentation on plot for a description:
             - http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot
                - r = red
                - o = circle marker
                - b = blue
                - + = plus marker
                - - = solid line style
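
          - putting those options together (a sketch, not the actual fancy_multiple_plot code; the data and the particular settings strings are made up, and matplotlib must be installed to call the function):

```python
# made-up example data
x = [1, 2, 3, 4]
y1 = [1, 4, 9, 16]
y2 = [2, 4, 6, 8]

red_circles = "ro"        # r = red, o = circle marker (no connecting line)
blue_plus_line = "b+-"    # b = blue, + = plus marker, - = solid line


def fancy_multiple_plot(x, y1, y2):
    """Plot two data sets, each given as a triplet of x, y and settings string."""
    from matplotlib import pyplot   # imported here so the data above runs without matplotlib
    pyplot.plot(x, y1, red_circles, x, y2, blue_plus_line)
    pyplot.show()
```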

  • plotting sets vs lists
       - last week, we ran an experiment that compared lists vs. sets
       - we timed varying numbers of queries, each asking whether a list or a set contained a particular value
       - we wanted to plot the times as we increased the number of queries
       
       - how did we do this last time?
          - generated the data
          - printed it out tab delimited so that we could copy and paste into Excel to plot
          - run speed_test_old in lists_vs_sets_improved.py code
       
       - could we do this using matplotlib?
          - first, what do we want to plot (i.e., what goes on the x axis and the y axis)?
             - x axis is the number of queries
             - y axis represents the time
             - plot two lines
                - one for list times
                - one for set times
          - what do we need to change in the code?
             - we could just put the graphing code in the speed_data function instead of printing it out
                - any problem with this?
                   - we lose the original functionality
             - we could copy it and put the graphing code in
                - any problem with this?
                   - we have duplicate code
             - better idea: change speed_data to generate the data and then just store it and return it
                - we can then use this data however we want (e.g. print it, plot it, etc.)
          - look at speed_data in lists_vs_sets_improved.py code
             - generate three empty lists
             - populate these lists as we get our timing data
          - look at plot_speed_data in lists_vs_sets_improved.py code
             - takes the three lists as parameters and uses those to generate a plot
             - adds a few more things to the plot to make it nicer
                - xlabel: puts some text under the x-axis
                - ylabel: puts some text next to the y-axis
                - title: adds a title to the plot
                - legend: adds a legend to the plot
          - look at run_experiment in lists_vs_sets_improved.py code
             - generates the speed data and unpacks to three different variables
             - passes these three variables to our plot_speed_data
       - what about printing functionality?
          - write another function that just prints out the values obtained from speed_data
          - look at print_speed_data in lists_vs_sets_improved.py code
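
       - the overall design might be sketched like this (written from the description above, not copied from lists_vs_sets_improved.py; the parameters, the helper time_queries, the data sizes and the label text are all assumptions, and matplotlib must be installed to call plot_speed_data):

```python
import random
import time


def time_queries(container, queries):
    """Return how many seconds it takes to ask 'query in container' for every query."""
    start = time.time()
    for query in queries:
        found = query in container   # the membership test being timed
    return time.time() - start


def speed_data(query_counts, data_size):
    """Return three parallel lists: query counts, list times and set times."""
    data_list = list(range(data_size))
    data_set = set(data_list)

    counts = []        # three empty lists, filled in as we get timing data
    list_times = []
    set_times = []

    for num_queries in query_counts:
        queries = []
        for i in range(num_queries):
            queries.append(random.randint(0, 2 * data_size))

        counts.append(num_queries)
        list_times.append(time_queries(data_list, queries))
        set_times.append(time_queries(data_set, queries))

    return counts, list_times, set_times


def print_speed_data(counts, list_times, set_times):
    """Print the data tab delimited, as the old version did."""
    print("queries\tlist\tset")
    for i in range(len(counts)):
        print(str(counts[i]) + "\t" + str(list_times[i]) + "\t" + str(set_times[i]))


def plot_speed_data(counts, list_times, set_times):
    """Plot list times and set times against the number of queries."""
    from matplotlib import pyplot   # imported here so the rest runs without matplotlib
    pyplot.plot(counts, list_times, label="list")
    pyplot.plot(counts, set_times, label="set")
    pyplot.xlabel("number of queries")
    pyplot.ylabel("time (seconds)")
    pyplot.title("list vs. set queries")
    pyplot.legend()
    pyplot.show()


def run_experiment():
    """Generate the data once, unpack it, and hand it to the plotting function."""
    counts, list_times, set_times = speed_data([1000, 2000, 4000], 10000)
    plot_speed_data(counts, list_times, set_times)
```

       - because speed_data only returns the data, print_speed_data and plot_speed_data can share it without any duplicated timing code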