People, Data, and Systems / Fall 2021

Updates

  • 11/28 -- New Lecture is up: Lecture 18 - Discrimination in Platforms [slides]
  • 11/21 -- New Lecture is up: Lecture 17 - Limits to prediction [slides] [recording]
  • 11/14 -- New Lecture is up: Lecture 16 - Differential Privacy [slides] [recording]
  • 11/07 -- New Lecture is up: Lecture 15 - Experimentation conclusion [slides] [recording]
  • 11/02 -- New Lecture is up: Lecture 14 - Experimentation in marketplaces [slides] [recording]
  • 10/31 -- New Lecture is up: Lecture 13 - Experimentation -- Peeking and Interference [slides] [recording]
  • 10/29 -- New Assignment released: [Homework #4 - Experimentation]

Course Description

Data science is often both about people (data on behavior) and for people (deployed models that influence behavior). Whether for online marketplaces, transportation, governmental, urban, or other socio-technical systems, effective data science in such settings requires dealing with user incentives and strategic behavior, networked and decentralized decision-making, and feedback loops between deployed models and future data. This course is about all the ways introductory statistics, data science, and machine learning fail when deployed in such systems, and how to build effective systems nonetheless.

Important links

Course topics

  • Data collection (~3 weeks)
    • Data constructs, surveys, ratings, polling, and implicit data exhausts
    • Challenges and biases: censoring, strategic reporting, social desirability, ratings inflation, privacy, etc.
    • Technical solutions to challenges: stratification, weighting, post-processing
    • Non-technical solutions and case studies
  • Recommendations (~2 weeks)
    • Collaborative filtering and personalized recommendations; individual vs. demographic-based recommendations
    • Recommendations in practice: capacity constraints, matching, 2-sided fairness, and other challenges (such as limited and missing data)
  • Algorithmic pricing (~2-3 weeks)
    • Basics of posted price mechanisms, algorithmic pricing
    • Personalized and dynamic pricing in practice (online marketplaces, supply/labor side wages, and roadway congestion pricing)
    • Fairness, ethics, and limitations
  • Experimentation (~2-3 weeks)
    • A/B testing basics
    • Experimentation in practice: networks, interference, clustering, experimentation over time, switchbacks, 2-sided experimentation, trade-offs across experiments
    • Ethics and communication of experiments
    • Introduction to causal inference without experiments
  • Miscellaneous (~2-3 weeks): Exact topics based on student interest
    • Algorithmic explainability and transparency
    • Performance drift, strategic reactions to your model, data feedback loops
    • Human-in-the-loop machine learning
    • Fairness audits and interventions
    • Differential privacy

Instructor

Teaching Assistant

Zhi Liu