People, Data, and Systems / Fall 2021


Course Description

Data science is often both about people (data on their behavior) and for people (deployed models that influence their behavior). Whether in online marketplaces, transportation, government, urban, or other socio-technical systems, effective data science requires dealing with user incentives and strategic behavior, networked and decentralized decision-making, and feedback loops between deployed models and future data. This course covers the ways in which introductory statistics, data science, and machine learning fail when deployed in such systems, and how to build effective systems nonetheless.

Course topics

  • Data collection (~3 weeks)
    • Data constructs, surveys, ratings, polling, and implicit data exhaust
    • Challenges and biases: censoring, strategic reporting, social desirability, ratings inflation, privacy, etc.
    • Technical solutions to challenges: stratification, weighting, post-processing
    • Non-technical solutions and case studies
  • Recommendations (~2 weeks)
    • Collaborative filtering and personalized recommendations; individual vs. demographic-based recommendations
    • Recommendations in practice: capacity constraints, matching, 2-sided fairness, and other challenges (such as limited and missing data)
  • Algorithmic pricing (~2-3 weeks)
    • Basics of posted price mechanisms, algorithmic pricing
    • Personalized and dynamic pricing in practice (online marketplaces, supply-side and labor-side wages, and roadway congestion pricing)
    • Fairness, ethics, and limitations
  • Experimentation (~2-3 weeks)
    • A/B testing basics
    • Experimentation in practice: networks, interference, clustering, experimentation over time, switchbacks, 2-sided experimentation, trade-offs across experiments
    • Ethics and communication of experiments
    • Introduction to causal inference without experiments
  • Miscellaneous (~2-3 weeks): exact topics will depend on student interest
    • Algorithmic explainability and transparency
    • Performance drift, strategic reactions to your model, data feedback loops
    • Human-in-the-loop machine learning
    • Fairness audits and interventions
    • Differential privacy


Teaching Assistant

Zhi Liu