About this Course
Data, Inference, and Decisions
This course develops the probabilistic foundations of inference in data science. It builds a comprehensive view of the decision-making and modeling life cycle in data science, including its human, social, and ethical implications. Topics include frequentist and Bayesian decision-making, permutation testing, false discovery rate, probabilistic interpretations of models, Bayesian hierarchical models, basics of experimental design, confidence intervals, causal inference, robustness, Thompson sampling, optimal control, Q-learning, differential privacy, fairness in classification, recommendation systems, and an introduction to machine learning tools, including decision trees, neural networks, and ensemble methods.
This class is listed as STAT 102.
Announcements
All course announcements will be made on Piazza.
Meeting Times
The course structure this semester differs from that of previous semesters as we adapt to remote learning. For changes to grading, see the grading tab.
Lecture: Each lecture will be released as a playlist of short, 3- to 15-minute videos totaling about an hour (total length will vary from lecture to lecture), linked from the main course page. We ask that you watch each lecture’s videos before the corresponding discussion session. We’ll have a Piazza thread for each lecture where you can ask questions.
Discussion session:
- When: Tuesdays and Thursdays from 10:10 AM to 11:00 AM, or an alternate time slot (to be announced)
- Where: Remote instruction over Zoom. See Piazza posts for corresponding Zoom links.
- What: Discussion of the corresponding lecture’s contents (see main page for schedule). Discussion sessions will help you bridge the big ideas in lecture videos with the problem-solving skills you’ll need for the homework.
- Every lecture will come with a short worksheet that is due before the start of the discussion session (10:10 AM Tuesdays and Thursdays). We’d like you to attempt each problem and provide either your solution or a short explanation of where you got stuck. You can fill in your answers either in the provided notebook or on paper; then upload your responses to the corresponding Gradescope assignment.
- Attendance is highly encouraged but not mandatory. These sessions will be recorded and released the following day. An alternate discussion time slot will be arranged for students with conflicts or time zone issues with the default time.
Lab: There will be no lab meetings; we ask that you complete lab assignments on your own time and use office hours to get help from course staff.
Office Hours Schedule
Please see Piazza posts for corresponding Zoom links.
For official holidays see the academic calendar.
Prerequisites
While we are working to make this class widely accessible, we currently require the following (or equivalent) prerequisites:
- Principles and Techniques of Data Science: DS100 covers important computational and statistical skills that will be necessary for DS102.
- Probability: Probability and Random Processes EECS126, or Concepts of Probability STAT134, or Probability for Data Science STAT140, or Probability and Risk Analysis for Engineers IEOR172. EECS126 and STAT140 are preferred. These courses cover the probabilistic tools that will form the underpinning for the concepts covered in DS102.
- Math: Linear Algebra & Differential Equations Math54, or Linear Algebra MATH110, or both Designing Information Devices and Systems I EE16A and Designing Information Devices and Systems II EE16B, or Linear Algebra for Data Science Stat89a, or Introduction to Mathematical Physics PHYSICS89. We will need some basic concepts like linear operators, eigenvectors, derivatives, and integrals to enable statistical inference and derive new prediction algorithms.
Main Instructors
See Piazza posts for Zoom OH links.
OH: TBA
OH: TBA
TAs
See Piazza posts for corresponding Zoom OH links.
OH: TBA
OH: TBA
OH: TBA
OH: TBA