CS150 - Fall 2013 - Class 17

  • today: a collection of topics
       - navigating the file system with Terminal
          - calling python from Terminal
       - passing command-line arguments to our program
       - reading data from urls
       - writing files

  • File system basics
       - what is a file system?
          - it's a way of organizing files on a hard disk
       - For most file systems, how are the files organized?
          - they are hierarchical with nested directories
             - on Macs and Linux everything starts at '/'
             - on Windows everything starts at a "drive", e.g. "C:\"
       - What is a directory?
          - a directory is a container for files and other directories
       - What is the "path" of a file?
          - The path of a file is the sequence of directories leading up to the file, followed by the file's name, e.g. /Users/dkauchak/Desktop/blah.png
       - What is a home directory?
          - On systems where multiple users can login, the home directory is the location in the file system where a user's files reside
             - commonly: /home/<username>/ on Linux, for example /home/dkauchak/, and /Users/<username>/ on Macs, for example /Users/dkauchak/
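       - a small sketch (not from the original notes) of how a path breaks down into directories and a file name, using Python's posixpath module; the file name notes.txt is a made-up example

```python
import posixpath

# Hypothetical path: notes.txt is an invented file name for illustration
path = "/Users/dkauchak/classes/cs150/notes.txt"

# The directories leading up to the file
directory = posixpath.dirname(path)

# The name of the file itself
filename = posixpath.basename(path)

# Each piece of the path, in order from the root '/'
pieces = path.strip("/").split("/")

print(directory)
print(filename)
print(pieces)
```

       - posixpath always uses '/' as the separator; on Windows paths the separator would be a backslash instead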

  • navigating the file system with Terminal
       - How do you normally navigate through the file system?
          - Using Finder/Explorer
          - By clicking on directories
          - Using the mouse
       - Terminal is a program that allows you to navigate the file system and run commands/programs using just the keyboard
          - Terminal is an interactive shell for the operating system (just like Python has an interactive shell)
       - When we first start Terminal, we see the prompt:

          dkauchak-15819:~

          (Note: due to various configurations, your prompt will likely be different)
          
       - Terminal has a variety of commands that we can type that allow us to move throughout the file system without clicking
          - pwd (print working directory)
             - prints the current directory that you are in
             
                dkauchak-15819:~ dkauchak$ pwd
                /Users/dkauchak
                dkauchak-15819:~ dkauchak$
       
             - when Terminal starts, it starts in your home directory
          - ls
             - lists the contents of the current directory

                dkauchak-15819:~ dkauchak$ ls
                Desktop Downloads Movies Pictures Sites classes research software workspaces
                Documents Library Music Public bin data resources temp
                dkauchak-15819:~ dkauchak$

             - notice that if we navigate to the same place with Finder, we see the same files

          - cd (change directory)
             - changes the current directory
             - we can move around the different directories by changing our current directory

                dkauchak-15819:~ dkauchak$ cd Desktop
                dkauchak-15819:Desktop dkauchak$ pwd
                /Users/dkauchak/Desktop
                dkauchak-15819:Desktop dkauchak$ ls
                00006.MTS Screen shot 2011-11-02 at 10.14.37 PM.png evis_flight.pdf
                DMV-VD119-Vehicle_Reg_Tax_Title_App.pdf albanian.rtf movies.rtf
                DMV-VT028-Tax_Title_Application.pdf blah.png zipf_corollary.png
                dkauchak-15819:Desktop dkauchak$

             - if we want to go up a directory, we use "..", e.g. "cd .." goes up one directory
                
                dkauchak-15819:Desktop dkauchak$ pwd
                /Users/dkauchak/Desktop
                dkauchak-15819:Desktop dkauchak$ cd ..
                dkauchak-15819:~ dkauchak$ pwd
                /Users/dkauchak
                dkauchak-15819:~ dkauchak$ ls
                Desktop Downloads Movies Pictures Sites classes research software workspaces
                Documents Library Music Public bin data resources temp

             - if you type "cd" without any arguments, it takes you back to your home directory
                dkauchak-15819:~ dkauchak$ cd classes/
                dkauchak-15819:classes dkauchak$ cd cs150/
                dkauchak-15819:cs150 dkauchak$ pwd
                /Users/dkauchak/classes/cs150
                dkauchak-15819:cs150 dkauchak$ cd
                dkauchak-15819:~ dkauchak$ pwd
                /Users/dkauchak

          - lots of other commands for moving, creating and manipulating files and directories
             - a fairly comprehensive list:
                http://ss64.com/bash/
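
       - for comparison, a rough sketch of the same three commands written as Python functions using the os module; the function names pwd, ls and cd are chosen here only to mirror the Terminal commands and are not part of Python itself

```python
import os

def pwd():
    """Return the current working directory (like "pwd")."""
    return os.getcwd()

def ls(directory="."):
    """Return a sorted list of the names in a directory (like "ls")."""
    return sorted(os.listdir(directory))

def cd(directory=None):
    """Change the current directory (like "cd").

    With no argument, go to the home directory, as "cd" does in Terminal.
    """
    if directory is None:
        directory = os.path.expanduser("~")
    os.chdir(directory)

start = pwd()
cd("..")          # go up one directory
parent = pwd()
cd(start)         # and back to where we started
print(start)
print(parent)
```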

  • windows equivalents
       - Windows has a similar program called "Command Prompt"
          - To run Command Prompt, under the start menu go to "run" and then run "cmd"
          - if you really want to be hard-core, download Cygwin, which has a similar interface to Terminal
       - I've posted the equivalent commands for windows on the course web page in the "Resources" section at the bottom, but here's a quick review
          - directories are delimited by a backslash instead of a forward slash
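          - "cd" works the same for changing directories (but "cd" with no arguments prints the current directory instead of going to your home directory)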
          - instead of "ls" use "dir" to list the directory contents

  • Python and Terminal
       - besides navigating files we can also run commands/programs from within Terminal
       - for example, we can run Python by typing "python"!
          
          dkauchak-15819:~ dkauchak$ python
          Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34)
          [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
          Type "help", "copyright", "credits" or "license" for more information.
          >>>

       - when we run Python from the Terminal, it executes the Python shell
       - we can interact with it just like we did in the Python shell within Wing
          >>> print "hello"
          hello
          >>> x = 4
          >>> x
          4
          >>> import math
          >>> math.sqrt(x)
          2.0

       - Wing is an IDE
          - what does IDE stand for?
             - Integrated Development Environment
          - what does that mean?
             - Wing has the python shell built into it
             - but it also has an editor for editing our programs
             - allows us to run and debug our programs
          - but it is built on top of the exact same Python we can run from Terminal

  • running Python programs from the Terminal
       - just like we can run Python programs in Wing, we can also run Python programs from the Terminal
       - first, you need to change your directory into the directory where your .py file is (using "cd")
       - once you're there, you can run your program by typing python followed by the name of your .py file (i.e. the name of your program)

          dkauchak-15819:examples dkauchak$ python print_vs_return.py
          100
          100
          25
          25
          None
          25
          dkauchak-15819:examples dkauchak$

       - when you run a program from Terminal:
          - the program executes each step, just as if we'd run it in Wing
          - Any input/output (e.g. print or raw_input) happens through the Terminal window
          - when the program finishes, you end up back at the Terminal prompt (i.e. python exits)

       - if you still want to be able to call functions, etc. after running a file (like in Wing), you need to run python in interactive mode:
          dkauchak-15819:examples dkauchak$ python -i print_vs_return.py
          100
          100
          25
          25
          None
          25
          >>>

  • command-line parameters
       - when you run Python programs from the Terminal, you can also specify arguments to pass extra information to the program
       - these arguments are added after the "python program_name.py"
       - look at sys_args.py code
          - there is a module called "sys"
             - has lots of functionality regarding the Python system
          - inside the module is a variable called argv (short for argument vector)
             - this variable is a list and contains all of the things that were typed on the command-line when python started, after "python"
             - if you're running a program, the first thing in the list is always the name of the .py file
             - everything after that is any other arguments that you want to pass to your program
                
                dkauchak-15819:examples dkauchak$ python sys_args.py
                Arguments: ['sys_args.py']
                0: sys_args.py
                dkauchak-15819:examples dkauchak$ python sys_args.py information
                Arguments: ['sys_args.py', 'information']
                0: sys_args.py
                1: information
                dkauchak-15819:examples dkauchak$ python sys_args.py these are some arguments
                Arguments: ['sys_args.py', 'these', 'are', 'some', 'arguments']
                0: sys_args.py
                1: these
                2: are
                3: some
                4: arguments
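
       - the code for sys_args.py is not reproduced in these notes; the following is only a guess at a program that would produce the output above (the function name describe_arguments is made up, and the print calls are written so they work in both Python 2 and Python 3)

```python
import sys

def describe_arguments(argv):
    """Build the lines of output for a list of command-line arguments."""
    lines = ["Arguments: " + str(argv)]
    for i in range(len(argv)):
        # number each argument, starting with the program name at index 0
        lines.append(str(i) + ": " + argv[i])
    return lines

if __name__ == "__main__":
    # sys.argv holds everything typed after "python" on the command-line
    for line in describe_arguments(sys.argv):
        print(line)
```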

       - how might this be useful? what type of information might we pass it? How is this different than, say, raw_input?
          - another way of interacting with the program
          - often pass things like filenames, urls, numbers, etc. (similar types of things you might use for raw_input)
          - allows for repeatability
             - like in the Wing shell, we can just hit up to run the program again
             - can run this program externally, without requiring user interaction
          - this is a common phenomenon for many programs
             - compare, for example, running Word by itself, vs. running it by double-clicking on a .doc(x) file

  • web pages
       - what is a web page or more specifically what's in a web page?
          - just a text file with a list of text, formatting information, commands, etc.
          - written mostly in html, but can also contain some scripting (i.e. mini-programs), for example in javascript
             - sometimes this content can be automatically generated by a program
          - this text is then parsed by the web browser to display the content
       - you can view the html source of a web page from your browser
          - in Safari: View->View Source
          - in Firefox: View->Page Source
          - in Chrome: View->Developer->View Source
       - html content
          - html consists of tags (a tag starts with a '<' and ends with a '>')
          - generally tags come in pairs, with an opening tag and closing tag, e.g. <html> ... </html>
          - lots of documentation online for html

  • reading from web pages using urllib
       - look at url_basics.py code : what does this program do?
          - uses sys.argv to get input from the user
             - if the user does not provide exactly one argument, it calls print_usage, which prints out how to run the program
          - takes a single argument from the command-line
          - if that argument is a url (web page address)
             - use the urllib module to open a connection to the web page
             - urllib.urlopen takes a url (web page address) as a parameter and opens a reader to that web page that reads a line at a time
                - this is almost identical to file reading!
                - only difference is how we open it (open, for a file vs. urllib.urlopen, for a url)
          - otherwise assume it's a file
             - and open it using open
          - print out the contents of whatever was opened
             - notice that we can read from a web page in the same way that we read from a file, a line at a time
             - we can interchange a file or a web page reader once it's opened, since the functionality is the same
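
       - the actual url_basics.py is not shown in these notes; below is a sketch of the idea under a few assumptions: an argument counts as a url if it starts with "http", the helper names is_url, open_reader and read_lines are invented here, and the import is written so the code runs in both Python 2 (urllib.urlopen) and Python 3 (urllib.request.urlopen)

```python
import sys

try:
    from urllib import urlopen            # Python 2, as used in these notes
except ImportError:
    from urllib.request import urlopen    # Python 3 moved urlopen here

def print_usage():
    print("url_basics.py <filename or url>")

def is_url(name):
    # Assumption: anything starting with "http" is treated as a url
    return name.startswith("http")

def open_reader(name):
    """Open a url or a file; either way we get something we can loop over."""
    if is_url(name):
        return urlopen(name)
    return open(name, "r")

def read_lines(reader):
    """Read a line at a time from any reader and return the lines as strings."""
    lines = []
    for line in reader:
        if not isinstance(line, str):
            # In Python 3 a web page reader yields bytes rather than strings
            line = line.decode("utf-8", "replace")
        lines.append(line.rstrip("\n"))
    return lines

def main():
    if len(sys.argv) != 2:
        print_usage()
    else:
        reader = open_reader(sys.argv[1])
        for line in read_lines(reader):
            print(line)
        reader.close()

if __name__ == "__main__":
    main()
```

       - read_lines never needs to know whether it was handed a file or a web page, which is the interchangeability described above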

       - we can run this from the command-line
          - if we don't give it any arguments on the command-line, we get the usage
             dkauchak-15819:examples dkauchak$ python url_basics.py
             url_basics.py <filename or url>

          - if we give it a web page, it prints out the source for the page
             dkauchak-15819:examples dkauchak$ python url_basics.py http://www.cs.middlebury.edu/~dkauchak/classes/cs150/
             <html>
             <head>
             <title>CS 150 - Computing for the Sciences - Fall 2011</title>
             ...

          - if we give it a file, it prints out the text in the file
             dkauchak-15819:examples dkauchak$ python url_basics.py url_basics.py
             import urllib
             import sys

             def print_data(reader):
             ...


  • reading web pages: ethics
       - you are reading a file on a remote server
          - you shouldn't fetch the same page over and over, since every request puts load on someone else's server
          - if you're trying to debug some code, copy the source into a file and debug that way before running live
       - web site owners may restrict which content they want programs to look at
          - see http://www.robotstxt.org/

  • look at url_extractor.py code
       - what does the get_note_urls function do?
          - opens up the course web page
          - reads a line at a time
             - checks each line to see if it contains any lecture notes
             - if so, keeps track of it in a list
       - what does write_list_to_file do?
          - opens a file, this time with "w" instead of "r"
             - "w" stands for write
             - if the file doesn't exist it will create it
             - if the file does exist, it will erase the current contents and overwrite it (be careful!)
          - we can also write to a file without overwriting the contents, but instead appending to the end
             - "a" stands for append
          - just like with reading from a file, we get a file object from open
          - the "write" method writes a string to the file (other objects need to be converted with str first)
          - why do I have the "\n" appended on to the end of item?
             - write does NOT add a line return at the end of what it writes
             - if you want one, you need to put it in yourself
       - what does this program do?
          - gets the lecture urls from the course web page
          - writes them to a file called "lectures.txt"
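
       - url_extractor.py itself is not included in these notes; the sketch below only illustrates the two steps under stated assumptions: a line counts as a lecture note if it mentions "lecture" and contains an href="..." link, and the names extract_note_urls and append_to_file are made up (write_list_to_file is named after the function discussed above, but its body here is a guess)

```python
def extract_note_urls(lines):
    """Collect the link targets from lines that mention "lecture"."""
    urls = []
    marker = 'href="'
    for line in lines:
        if "lecture" in line.lower() and marker in line:
            start = line.index(marker) + len(marker)
            end = line.find('"', start)
            if end != -1:
                urls.append(line[start:end])
    return urls

def write_list_to_file(items, filename):
    """Write each item on its own line, replacing any existing contents."""
    out = open(filename, "w")          # "w" erases the file if it exists
    for item in items:
        out.write(str(item) + "\n")    # write adds no line return of its own
    out.close()

def append_to_file(item, filename):
    """Add one more line to the end of the file without erasing it."""
    out = open(filename, "a")          # "a" stands for append
    out.write(str(item) + "\n")
    out.close()
```

       - to stay within the ethics guidelines above, the lines passed to extract_note_urls can come from a saved copy of the page while debugging, and from urllib only once the code works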