
From Information to Knowledge
ID1 - Section 3 - Fall 2011

TR 11-12:15 in Edmunds 217
Prof. Chen

2/2/2012: I've archived all of the course materials (i.e., most of the links below don't work) and am leaving up only the syllabus for reference. If you're looking for the material, you may be able to find a link to a more recent offering of the class here or here.


Please check here regularly for general announcements/thoughts/etc. You are responsible for things posted here.
  • 11/22/2011: Look at the schedule below for the changes to the last 3 classes. You will get an email by the end of the day tomorrow with your peer review assignment for Tuesday 11/29.
  • 10/14/2011: presentation/response schedule available here: <pdf>
  • Information

    Last year Eric Schmidt, then the CEO of Google, said that "There were 5 Exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days." As a result, rather than deciding what additional data should be collected to answer specific questions, the challenge has shifted to determining how to generate meaningful knowledge from vast stores of existing data. Consider, for example, the IBM Watson system that became famous on Jeopardy earlier this year. What technical innovations were needed to design a system that could mine over four terabytes of data to answer individual questions in under three seconds? How might similar techniques be used to improve doctors' ability to successfully treat patients? How about to better predict hospitalizations and lawsuits? In this class we will discuss technical challenges in making large datasets useful and reflect on the ethical implications of doing so. In the final research paper you will explore a particular case of your choosing in detail.

    Here's the administrivia: <pdf>

    Much of the reading for this class will come from articles and excerpts from books, on-line sources, the news media, etc. However, we will also read some actual books. In particular, you should get the following three books (I have placed an order at Huntley bookstore):

  • The Numerati, by Stephen Baker.
  • The Filter Bubble: What the Internet is Hiding from You, by Eli Pariser.
  • Final Jeopardy: Man vs. Machine and the Quest to Know Everything, by Stephen Baker.
    In addition, I, as well as many (most?) of the other ID1 professors, will be referring to the following writing reference, which is therefore strongly recommended. There should be copies available at Huntley bookstore.

  • Pocket Reference for Writers (3rd Edition), Toby Fulwiler and Alan R. Hayakawa
  • Syllabus

    Any topics or assignments that are listed for dates in the future should be taken as (potentially very) tentative. Anything for a date at least 2 days in the past will accurately reflect what actually happened in the class.

    Reading assignments should be completed by 11 AM on the day they're listed (i.e., you should bring the readings to, and be prepared to discuss them in, class that day). Writing assignments are generally due by 10 PM the day before they're listed in the syllabus (i.e., in time for me to take a look at them before class).

    Week Date In class topic(s) Reading for class Writing due Named presenter(s)/responder(s) + topics
    1 (Tue) 8/30 *** no class, go to convocation ***
    (Thu) 9/1 introduction, administrivia, guesstimates,
    working with numbers
    2 (Tue) 9/6 • sources of data, types of data
    • do companies only care about money, is that ok, and why
    • spreadsheet of companies with votes <pdf>
    thinking about thesis, motive
    NSF Task Force on Grand Challenges (chapter 2, pages 5-24 (or pages 29-48 in pdf))
    World's data will grow by 50X in next decade, IDC study predicts (get a sense for scale)
    • The Numerati (intro, chapter 1)
    2-3 companies with description of data and purpose (entered here) who is Baker? [John]
    (Thu) 9/8 peer review • Clarke: 9 billion names of god
    • Winerip: In Pennsylvania, suspicious erasing on state exams at 89 schools
    Deal releases findings of Atlanta school probe
    • The Numerati (chapters 2-3)
    draft of essay #1
    3 (Tue) 9/13 statistics, algorithms, data mining • Borges: "The Library of Babel" (sakai)
    • Watts and Strogatz: Collective dynamics of 'small-world' networks (just get a sense of what the article is about)
    Fraud detection & prevention
    • Schneier: Why data mining won't stop terror
    • The Numerati (chapters 4-5)
    final draft of essay #1 who is Schneier? [Nick] who is Borges? [Liana] who is Clarke? [George]
    (Thu) 9/15 information retrieval, data discovery (Char Booth and Sam Kone), class thoughts on The Last Question<pdf> • Asimov: The Last Question
    • Brin and Page: The Anatomy of a Large-Scale Hypertextual Web Search Engine
    • Goldman's Google sued over rankings
    • The Numerati (keep reading)
    who is Asimov? [Alex]
    4 (Tue) 9/20 information retrieval, collaborative filtering, thesis • Greene: The $1 Million Netflix Challenge
    • Bell et al.: The Million Dollar Programming Prize
    Netflix awards $1 Million Netflix Prize and Announces Second $1 Million Challenge
    • Flynn's Amazon says technology, not ideology, skewed results
    • O'Brien's "you're sooooooooo predictable" (sakai)
    • Singer's Designers make data much easier to digest
    • Wikipedia page on Tag cloud
    prep for essay #2 who are Brin/Page? [Bill]
    (Thu) 9/22 visualization, motivation, so what? • The Numerati (to end) first draft of essay #2
    5 (Tue) 9/27 visualization, Baker vs. Pariser, Google personalization • Filter Bubble (intro)
    • Friedman's Data Visualization: modern approaches
    • Friedman's Data Visualization and Infographics
    peer reviews for essay #2 who is Eli Pariser? [Hannah]
    (Thu) 9/29 perspectives on data conglomerates, filtering news • Filter Bubble (chapters 1-2)
    6 (Tue) 10/4 impact of filtering, grading • Filter Bubble (chapter 3) second draft of essay #2
    (Thu) 10/6 impact - privacy • Filter Bubble (chapters 4-5)
    • Tossell's Facial-recognition technology needs limits, privacy advocates warn
    • Wortham's London police use Flickr to identify looters
    • Boutin's You are what you search
    • Angwin's Latest in web tracking: stealthy 'supercookies'
    7 (Tue) 10/11 impact - permanence, working with sources • Borges: "Funes the Memorious" (sakai)
    • Rosen's The Web Means the End of Forgetting
    • Richmond's How to Fix (or Kill) Web Data About You
    • Filter Bubble (chapters 6-7)
    who is Andrew Feldmar? [Catherine]
    (Thu) 10/13 impact - human • Thompson's A Head for Detail
    • Sparrow, Liu, and Wegner article "Google Effects on Memory", click on "Full Text (PDF)" towards bottom of this page
    • Filter Bubble (to end)
    final draft of essay #2 who is Gordon Bell? [Kenny]
    8 (Tue) 10/18
    *** No class - fall recess *** (start working on research paper!)
    (Thu) 10/20 Library session: in Keck 2 room in Honnold/Mudd prep for essay #3
    9 (Tue) 10/25 impact - environmental prep 2 for essay #3
    (Thu) 10/27 5-minute overviews + discussion pdf schedule
    10 (Tue) 11/1 5-minute overviews + discussion pdf schedule
    (Thu) 11/3 5-minute overviews + discussion pdf schedule
    11 (Tue) 11/8 last 5-minute overview
    applications in education
    • Morris's Mining student data could save lives
    • Linden on Massive scale data mining for education
    • Final Jeopardy (intro, chapters 1-2)
    [Alex, Trey]
    (Thu) 11/10 applications in science, sustainability
    paper workshop
    • CRA blog: US, China collaborations in computing and sustainability
    • Final Jeopardy (chapters 3-4)
    first draft of essay #3
    12 (Tue) 11/15 applications in law • Markoff's Armies of expensive lawyers, replaced by cheaper software
    • Brozan's Divorce lawyers' new friend: social networks
    • Final Jeopardy (chapters 5-7)
    (Thu) 11/17 applications in health • Savage's remaking American medicine
    • Ginsberg et al.'s Detecting influenza epidemics using search engine query data
    • Final Jeopardy (chapters 8-10)
    who is Henrietta Lacks? [TBD]
    13 (Tue) 11/22 applications
    artificial intelligence
    • Final Jeopardy (to end) second draft of essay #3
    (Thu) 11/24
    *** No class - Thanksgiving ***
    14 (Tue) 11/29
    Final Jeopardy,
    peer reviews for essay #3
    (Thu) 12/1 wrapping up and looking ahead
    15 (Tue) 12/6 4-minute lightning talks essay #4, final draft of essay #3 due by noon on Friday 12/9

    Writing Assignments

    1. Essay #1 - How much data? <.pdf>
    2. Essay #2 - Processing data <.pdf>
    3. Research paper <.pdf>
    4. Non-traditional piece


    "Computers do not solve problems, they execute solutions"
    --Laurent Gasser