Big Data (http://bit.ly/bigdataslides)

Target Stores

  • Problem:
    • Get customers to do all their shopping at Target
    • People shop for different things in different places
    • However, new parents have no time to shop around and can be convinced
    • Birth records are public, so new parents are barraged with promotions at that point
    • Goal: find out about the pregnancy and start marketing before the baby is born!
  • Statistical analysis found that shoppers who had signed up for the “Baby Register” bought:
    • More lotion
    • More calcium supplements
    • More cotton balls and soap
    • etc.
    • So: Begin marketing to them early. However: Unintended side effects
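
The kind of comparison described above can be sketched as a "lift" calculation: how much more often registry shoppers buy an item than everyone else does. This is a minimal illustration, not Target's actual method; the function names, shopper IDs, and baskets are all made up for the example.

```python
def purchase_rates(baskets, shoppers):
    """Return {item: fraction of `shoppers` who bought it at least once}."""
    shoppers = list(shoppers)
    if not shoppers:
        return {}
    counts = {}
    for shopper in shoppers:
        # Count each item at most once per shopper.
        for item in set(baskets.get(shopper, [])):
            counts[item] = counts.get(item, 0) + 1
    return {item: n / len(shoppers) for item, n in counts.items()}


def lift(baskets, registry):
    """Return {item: registry purchase rate / everyone-else purchase rate}."""
    others = [s for s in baskets if s not in registry]
    reg_rates = purchase_rates(baskets, registry)
    other_rates = purchase_rates(baskets, others)
    return {
        # Items nobody else bought get an infinite lift.
        item: rate / other_rates[item] if item in other_rates else float("inf")
        for item, rate in reg_rates.items()
    }


# Hypothetical toy data: shopper ID -> items bought.
baskets = {
    "a": ["lotion", "cotton balls", "bread"],
    "b": ["lotion", "calcium", "bread"],
    "c": ["bread", "soap"],
    "d": ["bread"],
    "e": ["lotion", "bread"],
    "f": ["bread", "chips"],
}
registry = {"a", "b"}  # hypothetical shoppers on the baby registry

lifts = lift(baskets, registry)
for item, value in sorted(lifts.items(), key=lambda kv: -kv[1]):
    print(item, value)
```

In this toy data lotion has a lift of 4.0 (every registry shopper buys it, but only one in four of the others), while bread has a lift of 1.0 and so carries no signal.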

What is “Big Data”

  • As usual, a new term has many varied definitions
  • Huge amounts of data being collected because of advances in technology
    • Online transactions
    • Electronic Records
    • Online behavior
    • Instrumentation and automatic intake
  • Advances in Computation and Statistics
    • Process the data in real time
    • New kinds of databases (beyond tabular “SQL” databases)
  • Sources of the big data
    • Science (Genomics, Weather, …)
    • Business (Amazon, Google, …)
    • Government (Medicare, Obamacare, …)
  • Databases: Column databases, NoSQL databases
  • MapReduce: Computational architecture for mass scaling
  • Cloud Computing
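
To make the MapReduce idea concrete, here is a single-machine sketch of its three stages (map, shuffle, reduce) applied to word counting. It only illustrates the programming model; a real framework would run the map and reduce steps in parallel across many machines. The function names are chosen for this example and do not come from any particular library.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run map over every document, shuffle, then reduce each group."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = map_reduce(["big data is big", "data is everywhere"])
print(counts)
```

Because each map call sees only one document and each reduce call sees only one key, both stages can be split across machines without coordination, which is what makes the architecture scale.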

Challenges

  • “Big Data Hubris” - Just because you have a lot of data does not mean you have the right answers
  • “Bad Data” - Google Flu Trends failed badly at prediction. Searching for “Flu Remedy” does not mean I have the flu
  • “Correlation vs. Causation” - Deep statistical insight and humility required
  • “Gaming the System” - Grading of student performance based on analysis of essays
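
The “Correlation vs. Causation” point can be illustrated with a small simulation. In this sketch a hidden factor (called `season` here) drives both flu-remedy searches and flu cases, so the two are strongly correlated even though neither causes the other. All numbers are synthetic and the variable names are assumptions made for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hidden common cause: how "wintry" each simulated week is (0 to 1).
season = [rng.random() for _ in range(1000)]

# Both quantities depend on the season plus independent noise.
searches = [s + rng.gauss(0, 0.1) for s in season]
cases = [s + rng.gauss(0, 0.1) for s in season]

# Raw correlation looks impressive.
r_raw = pearson(searches, cases)

# Remove the common cause and only the independent noise remains.
r_resid = pearson(
    [x - s for x, s in zip(searches, season)],
    [y - s for y, s in zip(cases, season)],
)

print(f"raw correlation:            {r_raw:.2f}")
print(f"after removing the season:  {r_resid:.2f}")
```

The raw correlation is high, but once the shared seasonal factor is subtracted out it falls to roughly zero: the searches were never a direct measurement of flu cases.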