Syllabus

About the Course

This course develops the probabilistic foundations of inference in data science. It builds a comprehensive view of the decision-making and modeling life cycle in data science, including its human, social, and ethical implications. Topics include: frequentist and Bayesian decision-making, permutation testing, false discovery rate, probabilistic interpretations of models, Bayesian hierarchical models, basics of experimental design, confidence intervals, causal inference, robustness, Thompson sampling, optimal control, Q-learning, differential privacy, fairness in classification, recommendation systems and an introduction to machine learning tools including decision trees, neural networks and ensemble methods.

This class is listed as Data 102.

Prerequisites

We currently require the following (or equivalent) prerequisites:

  1. Principles and Techniques of Data Science: Data 100 covers important computational and statistical skills that will be necessary for Data 102.

  2. Probability: Data 140, EECS 126, STAT 134, IEOR 172, or Math 106. Data 140 and EECS 126 are preferred. These courses cover the probabilistic tools that will form the underpinning for the concepts covered in Data 102.

  3. Math: Math 54, Math 56, Math 110, both EE 16A and EE 16B, STAT 89a, or Physics 89. We will need some basic concepts like linear operators, eigenvectors, derivatives, and integrals to enable statistical inference and derive new prediction algorithms.

Please consult the Resources page for additional resources for reviewing prerequisite material.

Course Components

Lectures and Textbook

Lectures will be held in-person Tuesdays and Thursdays from 12:30 - 2 PM in 105 Stanley, and the companion to the lectures is the Data 102 textbook. Here’s what you need to know about the relationship between the two:

  • Both cover the same material, but sometimes will provide complementary perspectives.
  • Each lecture page will contain a listing of the corresponding textbook sections.
  • Lectures will be interactive, with several discussion questions and understanding checks that give you an opportunity to talk with your fellow students and solidify your understanding.
  • Most textbook sections have videos included with them, which are similar to, but not the same as, the corresponding lecture content.
  • Lecture recordings will not be made available (except as specified in arrangement with the DSP office).

The approach most students find helpful is to attend lecture and read the corresponding textbook sections, in whichever order works best for your own learning. Some students are able to follow all the material using only one or the other, but this is a less common pathway to success.

Laptop use during lecture is discouraged. If you have circumstances that require or would benefit from the use of a laptop to support your learning during class (including but not limited to accommodations), you should fill out the exemption form.

Discussion

Discussion section will be held on Wednesdays, led by your GSIs. These sections will cover important problem-solving skills that bridge the concepts in lectures with the skills you’ll need to apply the ideas on the homework and beyond. Each week, discussion worksheets and answers (without full explanations) will be posted online the following day.

There will be 10 discussion sections, held almost every week: the sections on 10/7 and 11/26 will be cancelled, and the sections on 11/19 and 12/3 will be replaced with office hours to help you prepare for Midterm 2 and the project, respectively.

Discussion attendance will be opt-in mandatory: this means that at the start of the semester (and again at week 7), you choose one of two options:

  • Option 1 (discussion attendance, recommended): if you choose this option, you are committing to attend discussion section every week, and work with a group of 3 of your peers (i.e., groups of 4) on discussion questions. For each discussion section you attend, you will earn 1 discussion point, and your discussion grade will be computed as (number of discussion points) / 8: in other words, you may miss up to two discussion sections with no penalty.

  • Option 2 (no attendance): if you choose this option, you are choosing to not attend discussion section. You are welcome to access the worksheets and answers (without full solutions) posted online each week. If you choose option 2, you will receive full credit (i.e., 8 free discussion points). If you choose Option 2 for half the semester (see below for information about switching), you’ll receive full credit for that half (i.e., 4 free discussion points).

While course staff strongly recommend choosing option 1, we encourage you to choose the option that will ultimately best suit your learning. In other data courses that have similar policies, choosing Option 1 is positively correlated with earning a higher grade (despite the additional attendance requirement). We will distribute a form during the first week that all students must fill out to choose between the options.

You will have the option to switch between Option 1 and Option 2 after Midterm 1 grades have been returned, in week 7.
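As a rough illustration of the discussion grade described above, the sketch below computes (number of discussion points) / 8. The function name and arguments are hypothetical. Capping the result at full credit is an assumption: the policy states that two missed sections carry no penalty, but it does not say what happens to points beyond 8.

```python
def discussion_grade(sections_attended: int, free_points: int = 0) -> float:
    """Fraction of discussion credit earned (illustrative sketch only).

    sections_attended: sections attended while on Option 1 (1 point each).
    free_points: points granted for time spent on Option 2
                 (8 for the full semester, 4 for half the semester).
    """
    if sections_attended < 0 or free_points < 0:
        raise ValueError("counts must be non-negative")
    points = sections_attended + free_points
    # Assumption: credit is capped at 8 points, i.e. 100%.
    return min(points, 8) / 8


print(discussion_grade(6))                  # Option 1, attended 6 sections
print(discussion_grade(0, free_points=8))   # Option 2 all semester
print(discussion_grade(3, free_points=4))   # switched options at week 7
```

Under these assumptions, a student who attends 6 sections on Option 1 earns 6/8 of the discussion credit, while a student on Option 2 for the whole semester earns full credit.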

Lab

Labs will be released every Friday evening as Jupyter notebooks, and due on Wednesdays at 5PM. Labs are a chance to get hands-on practice with the material in a more guided setting. Lab notebooks usually cover material from the past week’s lecture, and give you a chance to implement and code up the more abstract ideas from lecture.

Lab-specific office hours will be held on Mondays by GSIs. These provide a good opportunity to work on lab assignments with your GSI. You may use these as a drop-in session to get help on specific questions, or as a section to work through the lab while in the room. During these office hours, questions on the lab assignment will be prioritized over any other questions, and other office hours will prioritize non-lab questions.

When completing and submitting the lab, you may work individually or in pairs. You may pair up with any enrolled or waitlisted student for each lab: you do not have to work with the same person every week.

Homeworks

Homework assignments are released every other week on Fridays and due two Fridays after. These assignments are designed to help students develop an in-depth understanding of both the theoretical and practical aspects of ideas presented in lectures. They contain both math and coding tasks, as well as critical reflection questions that require you to explain your answers and put them into context.

  • HW1 through HW5 must be submitted to Gradescope/Pensieve by their posted deadlines.
  • Each assignment will include detailed instructions on how to submit your work for grading. It is the student’s responsibility to read these carefully and ensure that their work is submitted correctly. Assignment accommodations will not be granted in cases where students have mis-submitted their work (for example, by submitting to the wrong portal, submitting only part of an assignment, or forgetting to select pages).
  • HW6 will be a discussion-based activity on Ed.
  • The primary forms of support for homeworks are homework parties, office hours, and Ed.
  • HW1-4 will be worth twice as much (i.e., 3% of your overall grade) as HW5-6 (i.e., 1.5% of your overall grade).

Vitamins

Vitamins are weekly short Gradescope/Pensieve assignments to check that you are keeping up with lectures. They will be released on Thursdays after lecture and due on Sundays.

Exams

There will be two midterms in this class and a project-related quiz:

  • Midterm I on October 7th, 8-10PM
  • Midterm II on November 20th, 8-10PM
  • Project quiz on Friday, December 19th, 9-11AM

Note that the project quiz will only have questions about your final project, and is not a comprehensive final exam.

All exams must be taken in-person. You must sit the midterms at the specified time: if you have a conflict, please contact course staff ASAP at data102@berkeley.edu. We will not accept any conflicts after the drop deadline.

Final Project

At the end of the semester, you will apply the knowledge you learned in this class to a real-world dataset to complete a final project. You will be working in groups of 4. During the normal final exam time (Friday 12/19 from 9-11AM), you will take a quiz with questions about your project.

More details will be announced on Ed closer to the end of term.

Grading Policies

Grading Scheme

Grades will be assigned using the following weighted components:

Category        Percentage   Details
Vitamins        5%           Drop 2 lowest scores
Homeworks       15%          No drop; 5 slip days
Labs            12%          Drop 2 lowest scores
Discussion      3%           Drop 2 absences
Midterm 1       25%
Midterm 2       25%
Final project   15%

Grading Criteria

  • Homework will be graded on completion and correctness. No assignment may be dropped, but we have a slip day policy (see below).
  • Lab assignments will be graded on completion and correctness, but all test cases for autograded questions will be public. Your two lowest lab scores will be dropped.
  • When submitting assignments on Gradescope/Pensieve, you must match each page to the corresponding question on Gradescope/Pensieve. If you fail to do so, you may not receive credit for your work!
  • A grading rubric and more details regarding the final project will be released later in the semester.
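To make the weighting in the grading scheme concrete, here is a minimal sketch of how the components could combine into an overall percentage. The category weights and per-homework weights come from this syllabus. The function names, the representation of each score as a fraction between 0 and 1, and the use of a simple average after drops are assumptions for illustration, not the official grade calculation.

```python
# Category weights (percent of the overall grade), from the grading scheme.
WEIGHTS = {
    "vitamins": 5.0,
    "homeworks": 15.0,
    "labs": 12.0,
    "discussion": 3.0,
    "midterm_1": 25.0,
    "midterm_2": 25.0,
    "final_project": 15.0,
}

# Per-assignment homework weights: HW1-4 are 3% each, HW5-6 are 1.5% each.
HOMEWORK_WEIGHTS = [3.0, 3.0, 3.0, 3.0, 1.5, 1.5]


def mean_after_drops(scores, n_drop):
    """Average of scores (each in [0, 1]) after dropping the n_drop lowest."""
    kept = sorted(scores)[n_drop:]
    if not kept:
        raise ValueError("need more scores than drops")
    return sum(kept) / len(kept)


def homework_score(hw_scores):
    """Weighted homework average in [0, 1], for scores on HW1 through HW6."""
    if len(hw_scores) != len(HOMEWORK_WEIGHTS):
        raise ValueError("expected one score per homework")
    total = sum(w * s for w, s in zip(HOMEWORK_WEIGHTS, hw_scores))
    return total / sum(HOMEWORK_WEIGHTS)


def overall_percentage(category_scores):
    """Weighted sum of category scores (each in [0, 1]), as a percentage."""
    if set(category_scores) != set(WEIGHTS):
        raise ValueError("category_scores must have exactly the syllabus categories")
    return sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS)
```

For example, vitamins and labs would use `mean_after_drops(scores, 2)` for their category scores, and homework would use `homework_score`. The weights sum to 100, and the per-homework weights sum to the 15% homework category.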

Regrade Requests

  • After each assignment is graded, course staff will post the deadline for regrade requests for that assignment on Ed.
  • To ensure that our grading team is not overworked, regrade requests for each assignment must be submitted before the deadline (except in cases of emergencies).
  • Note: When you submit a regrade request, we will take a fresh look at the question, so it is possible that you will receive a grade lower than what you originally received.

Slip Days

Each student gets an extension budget of 5 total slip days. You can use the extension on homework assignments only (not lab assignments, vitamins, or the final project) during the semester. Some important notes on slip days:

  • Do not plan to use your slip days: we’re providing them for unforeseen circumstances.
  • Slip days are self-serve: we’ll apply them to your assignments automatically.
  • Slip days are full days, not hours. We round up, so if you are 1 hour late, then 1 slip day will be used. (Why? We’d rather you get some sleep and make an attempt to finish the assignment the next day instead of staying up to micromanage hours.)
  • For HW5 (and HW5 only), you may use at most 2 slip days.
  • Slip days can only be used for HW1 through HW5, and cannot be used on HW6 (since HW6 is a discussion activity).
  • After you have used your slip-time budget, any assignment handed in late will be marked off 20% per day late (rounded up to the nearest integer number of days).
  • No assignment will be accepted more than 5 days late.
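The slip day rules above can be sketched as a small calculation. This is an illustration under stated assumptions, not the course's actual bookkeeping: the function and constant names are hypothetical, slip days are assumed to cover as many late days as the remaining budget allows, the 20% penalty is assumed to apply only to days not covered by slip days, and the 5-day cutoff is assumed to count total days late. It applies to HW1 through HW5 only, since HW6 does not accept slip days.

```python
import math

SLIP_BUDGET = 5         # total slip days per student
MAX_DAYS_LATE = 5       # nothing is accepted more than 5 days late
PENALTY_PER_DAY = 0.20  # deduction per late day not covered by slip days
HW5_SLIP_CAP = 2        # HW5 may use at most 2 slip days


def days_late(hours_late: float) -> int:
    """Round lateness up to whole days; on-time submissions are 0 days late."""
    return max(0, math.ceil(hours_late / 24))


def apply_slip_days(hours_late: float, slip_days_left: int, is_hw5: bool = False):
    """Return (accepted, slip_days_used, score_multiplier) for one homework."""
    late = days_late(hours_late)
    if late > MAX_DAYS_LATE:
        return False, 0, 0.0
    cap = HW5_SLIP_CAP if is_hw5 else SLIP_BUDGET
    used = min(late, slip_days_left, cap)
    # Assumption: only days not covered by slip days are penalized.
    penalized_days = late - used
    multiplier = max(0.0, 1.0 - PENALTY_PER_DAY * penalized_days)
    return True, used, multiplier


print(apply_slip_days(1, slip_days_left=5))    # 1 hour late uses 1 full slip day
print(apply_slip_days(49, slip_days_left=1))   # 3 days late with 1 slip day left
```

Because lateness is rounded up, a submission that is 1 hour late and one that is 23 hours late both consume a full slip day.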

Extenuating Circumstances

We recognize that our students come from varied backgrounds and have widely varying experiences. If you encounter extenuating circumstances at any time in the semester, please do not hesitate to let us know by filling out this extension request form. The sooner we are made aware, the more options we have available to us to help you.

For any circumstances that cannot be resolved via slip days and drops, please contact us at data102@berkeley.edu. Within two business days, a member of course staff will reach out to you and provide a space for conversation, as well as to arrange course/grading accommodations as necessary.

We recognize that at times, it can be difficult to manage your course performance — particularly in such a huge course, and particularly at Berkeley’s high standards. Sometimes emergencies just come up (personal health emergency, family emergency, etc.). This policy is meant to lower the barrier to reaching out to us, as well as build your independence in managing your academic career long-term. So please do not hesitate to reach out.

Note that extenuating circumstances do not extend to the following:

  • Logistical oversight, such as Datahub/Gradescope/Pensieve tests not passing, submitting only one portion of the homework, forgetting to save your notebook before exporting, submitting to the wrong assignment portal, or not properly tagging pages on Gradescope/Pensieve. It is the student’s responsibility to identify and resolve these issues in advance of the deadlines.
  • Workload-related issues. It is the student’s responsibility to manage their other coursework and extracurricular commitments. We will not grant accommodations for these cases; instead, please use drops or slip days to cushion these issues.
  • Requests made after the assignment deadlines. Please make sure to submit a request before the assignment is due.

Finally, simply submitting a request does not guarantee you will receive an extension. Even if your work is incomplete, please submit before the deadline so you can receive credit for the work you did complete.

DSP Accommodations

If you are registered with the Disabled Students’ Program (DSP), you can expect to receive an email from us during the first week of classes confirming your accommodations. Otherwise, email data102@berkeley.edu. DSP students who receive approved assignment accommodations will have a 2-day extension on homeworks and a 1-day extension on labs and vitamins. Please note that any extension, plus slip days, cannot exceed 5 days.

You are responsible for reasonable communication with course staff. If you make a request close to the deadline, we cannot guarantee that you will receive a response before the deadline.

LLM and Generative AI Policy

LLMs like ChatGPT, Gemini, Claude, etc. are tools, and are permitted to be used in this class with some important restrictions. More important than the policies are the reasons we have them!

Why do we have these policies?

The goal of this course is to give you skills to apply sophisticated data science techniques and reasoning in real-world scenarios. Towards this end, we’ve designed homework and lab questions with a lot of guidance and structure to help you apply the skills you learn. This guidance and structure is meant to make it easy to learn things for the first time, but it also has the side effect of making things very easy for an LLM to answer.

In real-world scenarios, while using an LLM can provide a lot of help and support in solving problems, the current generation can’t identify the right questions to ask: this is something you can only learn through practice and experience. In particular, the process of getting stuck, identifying gaps in your knowledge, and getting yourself un-stuck is a critical part of the learning process. If you use an LLM to answer these questions for you, you’re robbing yourself of that experience, and also of the depth of learning.

We encourage you to reach out to course staff, either on Ed or in office hours, as your first point of contact when you get stuck. Why? Because course staff are trained to respond to your questions in a way that helps you learn, while LLMs are (literally) trained to be sycophantic and tell you what you want to hear [1].

The Actual Policy

Using LLMs in Data 102 is subject to the following policies:

  1. You may use LLMs to answer conceptual questions (e.g., “Why is a Beta(7, 9) prior stronger than a Beta(2, 3) prior?”) with no restrictions.
  2. You may use LLMs to explain any solution already provided (e.g., understanding already-released homework, lab, or vitamin solutions, discussion worksheets, or past exams), but we strongly recommend you attempt to understand it yourself first.
  3. You may not use LLMs to answer homework or lab questions before turning in an assignment.
  4. If you use any LLM to answer conceptual questions or otherwise for an assignment, you must submit a full transcript of all prompts and responses.
  5. Patterns of repeated policy violations are subject to 0s on offending assignments after a warning.

Course staff are much more interested in teaching than enforcing the policy above: please make good use of your time and ours, and don’t make us have to enforce it!

There can be a fine line between items (1) and (3). Here are a few examples of acceptable and unacceptable use:

Allowed usage examples

  • What is the difference between discrete and continuous random variables?
  • What’s the numpy or scipy library function that generates beta random variables?
  • What information do I need to use Bayes’ rule?
  • When should I use Poisson vs negative binomial GLMs?

Banned usage examples

  • Give me code to do an A/B test in Python given a CSV with columns Age, Income, and Education.
  • We have a waiting time T ~ Exponential(\(\lambda\)) and want to test \(\lambda = c\) vs \(\lambda = 2c\). Compute LR(T) explicitly in terms of c.
  • Describe what the parameters of the Pareto distribution mean in the context of the Pareto-uniform conjugate pair.

Collaboration and Academic Integrity

Data science is a collaborative activity. While you may talk with others about the homework, we ask that you write your solutions individually. If you do discuss the assignments with others, please include their names at the top of your notebook. Keep in mind that content from the homeworks and labs will likely be covered on both of the midterms. We will be following the campus policy on Academic Honesty, so be sure you are familiar with it.

We expect you, as a member of the Berkeley community, to follow the Berkeley Honor Code:

“As a member of the UC Berkeley community, I act with honesty, integrity, and respect for others.”

Waitlist

If you are on the waitlist, you should complete and submit all assignments as if enrolled: we will not offer any makeup assignments or extensions for waitlisted students.

For all other enrollment related issues, please reach out to the Data Science advisors, as instructors and staff do not manage enrollment into the class.

Community Resources

Device Lending Options

Students can access device lending options through the Student Technology Equity Program (STEP).

Data Science Student Climate

Data Science Undergraduate Studies faculty and staff are committed to creating a community where every person feels respected, included, and supported. We recognize that incidents may happen, sometimes unintentionally, that run counter to this goal. There are many things we can do to try to improve the climate for students, but we need to understand where the challenges lie. If you experience a remark, or disrespectful treatment, or if you feel you are being ignored, excluded or marginalized in a course or program-related activity, please speak up. Consider talking to your instructor, but you are also welcome to contact Executive Director Christina Teller at cpteller@berkeley.edu or report an incident anonymously through this online form.

Community Standards

Ed is a formal, academic space. We must demonstrate appropriate respect, consideration, and compassion for others. Please be friendly and thoughtful; our community draws from a wide spectrum of valuable experiences. For further reading, please reference Berkeley’s Principles of Community and the Berkeley Campus Code of Student Conduct.