People, Data, and Systems / Fall 2021

Updates

12/04 -- New Lecture is up: Lecture 20 - Course conclusion [slides] [recording]
12/01 -- New Lecture is up: Lecture 19 - Digital marketing and AI for political campaigns
11/28 -- New Lecture is up: Lecture 18 - Discrimination in Platforms [slides] [recording]
11/21 -- New Lecture is up: Lecture 17 - Limits to prediction [slides] [recording]
11/14 -- New Lecture is up: Lecture 16 - Differential Privacy [slides] [recording]
11/07 -- New Lecture is up: Lecture 15 - Experimentation conclusion [slides] [recording]
11/02 -- New Lecture is up: Lecture 14 - Experimentation in marketplaces [slides] [recording]

Course Description

Data science is often both about people (data on behavior) and for people (deployed models that influence behavior). Whether for online marketplaces, transportation, government, urban, or other socio-technical systems, effective data science in such settings requires dealing with user incentives and strategic behavior, networked and decentralized decision-making, and feedback loops between deployed models and future data. This course is about the ways introductory statistics, data science, and machine learning fail when deployed in such systems, and how to build effective systems nonetheless.

Important links

Course website
Canvas
Ed Discussion – Primary communication tool
Gradescope – Place to turn in all assignments
YouTube – Lecture recordings

Course topics

Data collection (~3 weeks)
- Data constructs, surveys, ratings, polling, and implicit data exhaust
- Challenges and biases: censoring, strategic reporting, social desirability, ratings inflation, privacy, etc.
- Technical solutions to challenges: stratification, weighting, post-processing
- Non-technical solutions and case studies
Recommendations (~2 weeks)
- Collaborative filtering and personalized recommendations; individual vs. demographic-based recommendations
- Recommendations in practice: capacity constraints, matching, 2-sided fairness, and other challenges (such as limited and missing data)
Algorithmic pricing (~2-3 weeks)
- Basics of posted price mechanisms, algorithmic pricing
- Personalized and dynamic pricing in practice (online marketplaces, supply/labor side wages, and roadway congestion pricing)
- Fairness, ethics, and limitations
Experimentation (~2-3 weeks)
- A/B testing basics
- Experimentation in practice: networks, interference, clustering, experimentation over time, switchbacks, 2-sided experimentation, trade-offs across experiments
- Ethics and communication of experiments
- Introduction to causal inference without experiments
Miscellaneous (~2-3 weeks): Exact topics based on student interest
- Algorithmic explainability and transparency
- Performance drift, strategic reactions to your model, data feedback loops
- Human-in-the-loop machine learning
- Fairness audits and interventions
- Differential privacy

Instructor

Nikhil Garg

Teaching Assistant

Zhi Liu