Resources
Here is a collection of resources that will help you learn more about various concepts and skills covered in the class. Learning by reading is a key part of being a well-rounded data scientist. We will not assign mandatory reading but instead encourage you to look at these and other materials. If you find something helpful, post it on EdStem, and consider contributing it to the course website.
Exam Resources
Semester  Midterm (1)  Midterm 2  Final 

Spring 2023  Exam (Solutions)  Exam (Solutions)  
Fall 2022  Exam (Solutions)  Exam (Solutions)  
Spring 2022  Exam (Solutions)  Exam (Solutions)  
Fall 2021  Exam (Solutions)  Exam (Solutions)  
Spring 2021  Exam (Solutions)  Exam (Solutions)  
Fall 2020  Exam (Solutions)  Exam (Solutions)  
Spring 2020  Exam (Solutions)  
Fall 2019  Exam (Solutions)  Exam (Solutions) 
Web References
In this class we will be using several key libraries. Here are their documentation pages:
 Python:
 Python Tutorial: Teach yourself Python. This is a pretty comprehensive tutorial.
 Python + NumPy Tutorial: This tutorial provides a great overview of a lot of the functionality we will be using in Data 102.
 Python 101: A notebook demonstrating a lot of Python functionality with some (minimal) explanation.
 Plotting:
 matplotlib.pyplot tutorial: This short tutorial provides an overview of the basic plotting utilities we will be using.
 seaborn: The Seaborn library has some nice additional visualization functions that we may use occasionally.
 Pandas:
 The Pandas Cookbook: This provides a nice overview of some of the basic Pandas functions. However, it is slightly out of date.
 Learn Pandas: A set of lessons providing an overview of the Pandas library.
 Python for Data Science: Another set of notebooks demonstrating Pandas functionality.
 Python for Data Analysis (Available as eBook for Berkeley students). This book provides a good reference for the Pandas library.
Textbooks from Previous Data Science Courses
Data 102 builds on material taught in previous data science courses. You may find the textbooks from those courses helpful:
 Data 8
 Data 100
 Data 140: even if you took one of the other probability prerequisite courses, this book can be a helpful reference.
Reading Resources

Data 102 Textbook: Because data science is a relatively new and rapidly evolving discipline, there is no single ideal textbook for this subject. The instructors are developing this online textbook for the course and will update it as the semester progresses.
You can also find useful reading among the following collection of books, all of which are free:
 Patterns, Predictions, and Actions: This book is a great introduction to many of the topics we cover in this course, as well as several other important topics in advanced machine learning and data science. The following chapters are particularly relevant to this class:
 Chapter 2 covers decision theory.
 Chapter 8 covers datasets: even though we won’t be talking about this much in Data 102, this is an extremely important topic to know about for doing real-world data science.
 Chapters 9 and 10 cover causal inference.
 Chapter 12 covers reinforcement learning.

Statistical Rethinking: This popular online graduate course is an excellent introduction to thinking about statistics through a Bayesian and causal lens. While it goes into much more detail than Data 102 does, lectures 2, 5, and 8 are all closely related to what we cover.
 All of Statistics: This book is a great, broad introduction to mathematical statistics. It begins with probability concepts (e.g. Bayes’ theorem), works through many statistical inference topics (e.g. hypothesis testing, decision theory, and the bootstrap), and also includes statistical modeling (e.g. regression and causal inference). The textbook as a whole covers many more ideas from statistics than will be used in or needed for this course, but students may still find it useful to reference specific topics within it to supplement ideas covered in lecture or review ideas from previous courses. For example:
 Chapters 1-3 review some background ideas about probability and random variables
 Chapter 12 discusses the statistical decision theory framework
 Section 9.3 reviews maximum likelihood estimation, while the first few sections of chapter 11 review the core idea behind Bayesian inference
 Sections 10.2, 10.6, and 10.7 cover p-values, the likelihood ratio test, and multiple testing ideas
 Chapter 13 covers linear and logistic regression
 Chapters 7-8 review empirical distributions and the bootstrap
 Chapter 16 covers causal inference
 Computer Age Statistical Inference: This book takes a fairly modern view of statistics, often examining the influence of computation on the field. Keep in mind that the book was written for master's students. As such, it covers many topics beyond the scope of this course, but it nevertheless provides useful, high-level discussion of some course topics for students looking for additional information. For example:
 Chapters 2 and 3 do an excellent job of comparing and contrasting frequentist and Bayesian inference, with illustrative examples
 Chapter 4 discusses maximum likelihood estimation
 Chapter 15 provides additional details about multiple hypothesis testing and false discovery rate control
 There is also one section each on logistic regression, the EM algorithm, the bootstrap, conjugate priors, and Gibbs sampling

Causal Inference: The Mixtape. This book is a great introduction and reference for all things causal inference.

Introduction to Statistical Learning (Free online PDF). This book is a great reference for the machine learning material and some of the statistics material in the class.

Data Science from Scratch (Available as eBook for Berkeley students). This more applied book covers many of the topics in this class using Python, but it doesn’t go into sufficient depth for some of the more mathematical material.

Doing Data Science (Available as eBook for Berkeley students). This book provides a unique case-study view of data science but uses R and not Python.
 Matrix Cookbook: This “cookbook” is a handy collection of facts about linear algebra and matrices.