This is the repo of all work done during the DS100 course at UC Berkeley, Fall 2018, including 12 labs, 7 homeworks, and 3 projects.
HW0: Introductions (Setup, Prerequisites, and Classification)
HW1: Food Safety (Cleaning and Exploring Data with Pandas)
HW2: Bike Sharing (EDA and Visualization)
HW3: Loss Minimization (Modeling, Estimation, and Gradient Descent)
HW4: Spam/Ham Classification (Feature Engineering, Logistic Regression, Cross-Validation)
HW5: Hypothesis Testing: Does The Hot Hand Effect Exist?
HW6: Scalable Data Processing Using Ray
Lab01: Getting familiar with JupyterHub and an introduction to matplotlib, a Python visualization library
Lab02: Pandas Overview
Lab03: Data Cleaning and Visualization
Lab04: Plotting practice, data transformations, and kernel density estimators (using World Bank data containing various statistics for countries and territories around the world)
Lab05: Modeling and Estimation
Lab06: Multiple Linear Regression and Feature Engineering
Lab07: Feature Engineering & Cross-Validation
Lab09: Logistic Regression
Lab10: Using the Bootstrap to Estimate Mean and Variance
Lab11: SQL, FEC Data, and Small Donors
Lab12: Introduction to dataCommons
Project1: Trump, Twitter, and Text (working with the Twitter API to analyze Donald Trump's tweets)
Project2: NYC Taxi Rides (The Data Science Lifecycle)

Project2A:
- Part1: Data Wrangling
- Part2: EDA, Visualization, and Feature Engineering

Project2B:
- Part3: NYC Accidents Data
- Part4: Feature Engineering and Model Fitting