CS312 - Spring 2012 - Class 8

  • administrative
       - CS talks tomorrow starting at 12:35
       - go/codegolf

  • databases
       - what is a database?
          - data
             - a collection of data
             - stored on disk
             - in a structured manner
          - interface
             - a well-defined way for inserting, deleting and querying the data
       - why are they useful?
          - structured data shows up in many places
          - highly optimized to answer queries on the structured data
          - deals with data that is too big to fit into memory
          - centralized location for data
          - often allow for multiple access and will maintain data consistency
       - what aren't they good at?
          - free formed querying
          - still on disk, so can be slow for some applications
       - many different kinds of databases
          - MySQL
          - Oracle
          - DB2
          - SQLite

  • SQL
       - structured query language
       - programming language for interacting/querying databases
       - supported by most common databases
          - there is a standard
          - however, some slight variation from database to database
       

  • SQLite
       - open source
          - you can grab your own copy: http://www.sqlite.org/
       - operates via a single file on disk
       - comes installed with many linux/unix/mac distributions
       - upsides
          - low barrier to entry
          - easy setup
       - downsides
          - doesn't have all the performance optimization of mysql and others
          - may not fare as well with multiple processes
          - no server model
       - for now, this is what we'll use, though most of the commands we'll examine are portable across databases

  • tables
       - a database consists of one or more tables
       - a table contains a collection of information with shared attributes
       - a table contains columns
          - columns indicate data types
          - for example, we might have a table for students with columns
             - id
             - first name
             - last name
             - address
             - city
             - gpa
             - ...
       - the columns dictate the form of the data that will be stored in this table
          - columns have names
          - columns have types (i.e. the type of data that is stored)
          - may also have other associated with them
             - what the default value is
             - whether or not a value is required
       - tables contain data
          - an entry in a table is called a record
          - each record consists a value for each column in the table
       - look at friends.db in SQL code
          - we can start sqlite by calling it with a database name on the command-line
             $ sqlite3 friends.db

          - just like the command-line prompt and the irb shell, we can issue commands interactively from there

             sqlite>

          - we can enter queries/commands to the tab
          - inside this database are multiple tables one of which is called friends
          - the friends table consists of the following columns
             - an id
             - last_name
             - first_name
             - email
             - and a profession_id (more on this soon)

  • examining the table entries
       - all examples were run on friends.db from SQL code
       - there are many "special" commands for sqlite that are non-standard for SQL
          - in sqlite they all start with a period, e.g. ".exit" exits
          - http://www.sqlite.org/sqlite.html
       - we can find out what tables are associated with a database using the .tables command
          sqlite> .tables
          friends professions

          - this example has two different tables
       - if we want to know how a given table is structured, we can use the .schema command
          - called by itself, we get the schema (i.e. table definition) for all tables

          sqlite> .schema
          CREATE TABLE friends (
           friend_id INTEGER PRIMARY KEY AUTOINCREMENT,
           last_name TEXT NOT NULL,
           first_name TEXT NOT NULL,
           email TEXT NOT NULL DEFAULT '',
           prof_id INTEGER /* foreign key */
          );
          CREATE TABLE professions (
           prof_id INTEGER PRIMARY KEY AUTOINCREMENT,
           name TEXT NOT NULL default ''
          );


          - or you can call it and pass it a particular table name

          sqlite> .schema friends
          CREATE TABLE friends (
           friend_id INTEGER PRIMARY KEY AUTOINCREMENT,
           last_name TEXT NOT NULL,
           first_name TEXT NOT NULL,
           email TEXT NOT NULL DEFAULT '',
           prof_id INTEGER /* foreign key */
          );


       - the schema command gives a lot of information about a table
          - it tells use the columns by name:
             friend_id
             last_name
             first_name
             email
             prof_id

          - it also tells us information about what type of data is stored in that column
             - sqlite has a limited number of "data types" compared to other databases
             - http://sqlite.org/datatype3.html
                - integer
                - text
                - numeric
                - real
                - none
             - in fact, everything is basically stored as text in an sqlite database
             - other database schemas allow you much more control over the types
                - this can be useful for type checking
                - and can result in improved performance
                - but it takes more work and finesse
          - it also tells us some other information
             - NOT NULL indicates that we must specify a value (instead of allowing it to be NULL)
             - DEFAULT allows us to specify a default value
             - PRIMARY KEY
                - requires uniqueness (though you can also use UNIQUE)
                   - unique keys allow for much faster indexing
                - cannot be NULL
                - indicates to other people looking at your schema that this is the main thing to index off of
             - AUTOINCREMENT
                - if a NULL value is specified, the next integer in a sequence will be used

  • select command
       - all examples below were run on friends.db from SQL code
       - the select command allows us to select records from the database
       - in its most basic form:

          SELECT [column_name(s)] FROM [table]

          sqlite> select first_name from friends;
          Joe
          Phoebe
          Tony
          Jennifer

          - NOTE: sql is case insensitive
             - good form usually capitalized keywords (e.g. SELECT), however, it is not required
                - I often won't do it when typing interactively, but will if I put it in code or in a document

       - we can select multiple columns by separating them by a ,

          sqlite> select first_name, last_name from friends;
          Joe|Smith
          Phoebe|Lee
          Tony|Speaker
          Jennifer|Johnson
          sqlite> select last_name, first_name from friends;
          Smith|Joe
          Lee|Phoebe
          Speaker|Tony
          Johnson|Jennifer

          - entries are separated by a | (though this can be changed)
          - and are ordered based on how they were ordered in a query

       - we can select all of the columns using *

          sqlite> select * from friends;
          1|Smith|Joe|jsmith@woohoo.com|1
          2|Lee|Phoebe|superchef@aol.com|4
          3|Speaker|Tony|ts@xyz.edu|5
          4|Johnson|Jennifer|jenniferj@medny.net|2      

       - if you want to toggle the column headers on or off for your select queries use .header on and .header off
          
          sqlite> .header on
          sqlite> select * from friends;
          friend_id|last_name|first_name|email|prof_id
          1|Smith|Joe|jsmith@woohoo.com|1
          2|Lee|Phoebe|superchef@aol.com|4
          3|Speaker|Tony|ts@xyz.edu|5
          4|Johnson|Jennifer|jenniferj@medny.net|2

          sqlite> .header off
          sqlite> select * from friends;
          1|Smith|Joe|jsmith@woohoo.com|1
          2|Lee|Phoebe|superchef@aol.com|4
          3|Speaker|Tony|ts@xyz.edu|5
          4|Johnson|Jennifer|jenniferj@medny.net|2

          - special commands in sqlite are inticated by a prefixed .

  • additional modifiers to SELECT queries
       - WHERE clause
          - allows us to set criterion for those that we select
          - follows the basic select statement

          sqlite> select * from friends where first_name='Tony';
          3|Speaker|Tony|ts@xyz.edu|5
          sqlite> select first_name, last_name from friends where first_name='Tony';
          Tony|Speaker
          sqlite> select * from friends where first_name <> 'Tony';
          1|Smith|Joe|jsmith@woohoo.com|1
          2|Lee|Phoebe|superchef@aol.com|4
          4|Johnson|Jennifer|jenniferj@medny.net|2


          - we can use a variety of operators
             = (equal)
             <> (not equal)
             > (greater than)
             < (less than)
             >=
             <=
             BETWEEN (between a range, range is separated with "and", e.g. "between 2 and 3")
             LIKE (pattern search)
             IN (values in a set of values, specified by a set, e.g. "(2, 3)")

          - we can use AND and OR to combine multiple filtering criterion
          
          sqlite> select * from friends where first_name='Tony' or email like '%woohoo.com';
          1|Smith|Joe|jsmith@woohoo.com|1
          3|Speaker|Tony|ts@xyz.edu|5
          
             - % is a wildcard like * that matches anything
             - '_' matches just one character

          sqlite> select * from friends where first_name='Tony' and friend_id < 3;
          sqlite>
       - LIMIT
          - we can limit the number of results we want to see using the LIMIT command

          sqlite> select * from friends limit 2;
          1|Smith|Joe|jsmith@woohoo.com|1
          2|Lee|Phoebe|superchef@aol.com|4
          sqlite> select * from friends where friend_id <= 3 limit 1;
          1|Smith|Joe|jsmith@woohoo.com|1

             - this can be useful for larger databases when you want to make sure your query is right
             - sometimes you also just want a few results

       - ORDER BY
          - we can order the results by any column we want

          sqlite> select * from friends order by first_name;
          4|Johnson|Jennifer|jenniferj@medny.net|2
          1|Smith|Joe|jsmith@woohoo.com|1
          2|Lee|Phoebe|superchef@aol.com|4
          3|Speaker|Tony|ts@xyz.edu|5
          sqlite> select * from friends order by first_name desc;
          3|Speaker|Tony|ts@xyz.edu|5
          2|Lee|Phoebe|superchef@aol.com|4
          1|Smith|Joe|jsmith@woohoo.com|1
          4|Johnson|Jennifer|jenniferj@medny.net|2
          sqlite> select * from friends where email like '%.com' order by last_name;
          2|Lee|Phoebe|superchef@aol.com|4
          1|Smith|Joe|jsmith@woohoo.com|1

          - they are ordered in ascending order by default
          - the order is determined by the column type
             - strings/characters are alphabetical
             - numbers are numerical
             - etc
          - if you want it to be in descending order, add DESC after the column name
          - you can put multiple columns separated by a ',' in which case ties are broken by the next column in the list

  • interacting with multiple tables
       - so far, all the queries that we've looked at have only dealt with a single table
       - often we split up the data across multiple tables
       - in our friends.db, we also have a professions table
       
          sqlite> .header on
          sqlite> select * from professions;
          prof_id|name
          1|Software Developer
          2|Medical Doctor
          3|Financial Analyst
          4|Chef
          5|Professor

       - the two tables are linked to eachother via the prof_id index
          - the friends table contains an prof_id index corresponding to the prof_id index in the professions table
       - why separate this data across multiple tables? why not just put this all in one table?
          - more space efficient
             - just have to store an id rather than all the data in the friends table
          - easier to manage
             - in this case, we only have a the name of a profession
             - but we may have more data associated with a profession (e.g. salary, degree required, etc.)
             - we don't want to append all this information in the friends table
                 - this information is associated with the profession
          - easier/faster to interact with
             - we can query each of these tables independently if we want
          - shared information
             - there may be other tables besides friends that also index into the professions table
       
  • JOIN
       - the JOIN command allows us to merge two tables on a shared index
          SELECT column_name(s)
          FROM table_name1
          JOIN table_name2
          ON table_name1.column_name=table_name2.column_name

          - if the column names are ambiguous (i.e. to tables share the same column name), then you can use table.column to specify a particular table entry

          sqlite> select * from friends join professions on friends.prof_id=professions.prof_id;
          friend_id|last_name|first_name|email|prof_id|prof_id|name
          1|Smith|Joe|jsmith@woohoo.com|1|1|Software Developer
          2|Lee|Phoebe|superchef@aol.com|4|4|Chef
          3|Speaker|Tony|ts@xyz.edu|5|5|Professor
          4|Johnson|Jennifer|jenniferj@medny.net|2|2|Medical Doctor

          sqlite> select first_name,last_name,name from friends join professions on friends.prof_id=professions.prof_id;
          first_name|last_name|name
          Joe|Smith|Software Developer
          Phoebe|Lee|Chef
          Tony|Speaker|Professor
          Jennifer|Johnson|Medical Doctor
       
          sqlite> select friends.first_name, friends.last_name, professions.name from friends join professions on friends.prof_id=professions.prof_id;
          first_name|last_name|name
          Joe|Smith|Software Developer
          Phoebe|Lee|Chef
          Tony|Speaker|Professor
          Jennifer|Johnson|Medical Doctor

       - we can also combine this with any of the modifiers for select we say above

          sqlite> select first_name,last_name,name from friends join professions on friends.prof_id=professions.prof_id where name='Chef';
          first_name|last_name|name
          Phoebe|Lee|Chef

       - AS: it can be confusing when we start to join multiple tables. the AS modifier allows you to change the name of how the header is displayed

          sqlite> select first_name as "First", last_name as "Last", name as "Profession" from friends join professions on friends.prof_id=professions.prof_id where name='Chef';
          First|Last|Profession
          Phoebe|Lee|Chef

  • inserting into a table
       - so far, all we've done is query existing data in the table
       - we can both insert, update and delete data into the tables
       - INSERT: two basic forms:
          - specify all of the values
          INSERT INTO table_name VALUES (value1, value2, value3,...)

          - specify particular column values (assuming a default is allowed for that column)
          INSERT INTO table_name (column1, column2, column3,...) VALUES (value1, value2, value3,...)

          sqlite> insert into friends (last_name, first_name, email, prof_id) values("Z", "Bobby", "bobbyz@phoenix.edu", 5);
          sqlite> select * from friends;
          friend_id|last_name|first_name|email|prof_id
          1|Smith|Joe|jsmith@woohoo.com|1
          2|Lee|Phoebe|superchef@aol.com|4
          3|Speaker|Tony|ts@xyz.edu|5
          4|Johnson|Jennifer|jenniferj@medny.net|2
          5|Z|Bobby|bobbyz@phoenix.edu|5

       - NULL
          - NULL represents no value in sql (like a null pointer)
          - our friends table has 5 values
          - the first is an friends_id
             - we could specify this ourselves, but better to let sqlite just take care of it for us
             - we can say we don't have a value by specifying null

             sqlite> insert into friends values(NULL, "Z", "Bob", "bob@aol.com", 2);
             sqlite> select * from friends;
             friend_id|last_name|first_name|email|prof_id
             1|Smith|Joe|jsmith@woohoo.com|1
             2|Lee|Phoebe|superchef@aol.com|4
             3|Speaker|Tony|ts@xyz.edu|5
             4|Johnson|Jennifer|jenniferj@medny.net|2
             5|Z|Bobby|bobbyz@phoenix.edu|5
             6|Z|Bob|bob@aol.com|2

             - if we didn't include the NULL we'd get an error, since we need to specify 5 things

  • DISTINCT
       - sometimes we only want those entries that are unique
       - the DISTINCT keyword allows us to specify that
          sqlite> select last_name from friends;
          last_name
          Smith
          Lee
          Speaker
          Johnson
          Z
          Z

          sqlite> select distinct last_name from friends;
          last_name
          Johnson
          Lee
          Smith
          Speaker
          Z