People, Data, and Systems / Fall 2021
Updates
- 12/04 -- New Lecture is up: Lecture 20 - Course conclusion [slides] [recording]
- 12/01 -- New Lecture is up: Lecture 19 - Digital marketing and AI for political campaigns
- 11/28 -- New Lecture is up: Lecture 18 - Discrimination in Platforms [slides] [recording]
- 11/21 -- New Lecture is up: Lecture 17 - Limits to prediction [slides] [recording]
- 11/14 -- New Lecture is up: Lecture 16 - Differential Privacy [slides] [recording]
- 11/07 -- New Lecture is up: Lecture 15 - Experimentation conclusion [slides] [recording]
- 11/02 -- New Lecture is up: Lecture 14 - Experimentation in marketplaces [slides] [recording]
Course Description
Data science is often both about people (data on their behavior) and for people (deployed models that influence behavior). Whether for online marketplaces, transportation, governmental, urban, or other socio-technical systems, effective data science in such settings requires dealing with user incentives and strategic behavior, networked and decentralized decision-making, and feedback loops between deployed models and future data. This course is about all the ways introductory statistics, data science, and machine learning fail when deployed in such systems, and how to build effective systems nonetheless.
Important links
- Course website
- Canvas
- Ed Discussion – Primary communication tool
- Gradescope – Place to turn in all assignments
- YouTube – Lecture recordings
Course topics
- Data collection (~3 weeks)
  - Data constructs, surveys, ratings, polling, and implicit data exhaust
  - Challenges and biases: censoring, strategic reporting, social desirability, ratings inflation, privacy, etc.
  - Technical solutions to challenges: stratification, weighting, post-processing
  - Non-technical solutions and case studies
- Recommendations (~2 weeks)
  - Collaborative filtering and personalized recommendations; individual vs. demographic-based recommendations
  - Recommendations in practice: capacity constraints, matching, 2-sided fairness, and other challenges (such as limited and missing data)
- Algorithmic pricing (~2-3 weeks)
  - Basics of posted price mechanisms, algorithmic pricing
  - Personalized and dynamic pricing in practice (online marketplaces, supply/labor side wages, and roadway congestion pricing)
  - Fairness, ethics, and limitations
- Experimentation (~2-3 weeks)
  - A/B testing basics
  - Experimentation in practice: networks, interference, clustering, experimentation over time, switchbacks, 2-sided experimentation, trade-offs across experiments
  - Ethics and communication of experiments
  - Introduction to causal inference without experiments
- Miscellaneous (~2-3 weeks): exact topics based on student interest
  - Algorithmic explainability and transparency
  - Performance drift, strategic reactions to your model, data feedback loops
  - Human-in-the-loop machine learning
  - Fairness audits and interventions
  - Differential privacy
Instructor
Teaching Assistant
Zhi Liu