DAT5 Course Repository

Course materials for General Assembly's Data Science course in Washington, DC (3/18/15 - 6/3/15).

Instructors: Kevin Markham and Brandon Burroughs

Monday                                         | Wednesday
                                               | 3/18: Introduction and Python
3/23: Git and Command Line                     | 3/25: Exploratory Data Analysis
3/30: Visualization and APIs                   | 4/1: Machine Learning and KNN
4/6: Bias-Variance and Model Evaluation        | 4/8: Kaggle Titanic (Part 1)
4/13: Web Scraping, Tidy Data, Reproducibility | 4/15: Linear Regression
4/20: Logistic Regression and Confusion Matrix | 4/22: ROC and Cross-Validation
4/27: Project Presentation #1                  | 4/29: Kaggle Titanic (Part 2)
5/4: Naive Bayes                               | 5/6: Natural Language Processing
5/11: Decision Trees                           | 5/13: Ensembles
5/18: Clustering and Regularization            | 5/20: Advanced scikit-learn
5/25: No Class                                 | 5/27: Databases and SQL
6/1: Course Review                             | 6/3: Project Presentation #2

Key Project Dates

  • 3/30: Deadline for discussing your project idea(s) with an instructor
  • 4/6: Project question and dataset (write-up)
  • 4/27: Project presentation #1 (slides, code, visualizations)
  • 5/18: First draft due (draft of project paper, code, visualizations)
  • 5/25: Peer review due
  • 6/3: Project presentation #2 (project paper, slides, code, visualizations, data, data dictionary)

Key Project Links

Logistics

  • Office hours will take place every Saturday and Sunday.
  • Homework will be assigned every Wednesday and due on Monday, and you'll receive feedback by Wednesday.
  • Our primary tool for out-of-class communication will be a private chat room through Slack.

Submission Forms

Before the Course Begins

Python Resources


Class 1: Introduction and Python

  • Introduction to General Assembly
  • Course overview (slides)
  • Brief tour of Slack
  • Checking the setup of your laptop
  • Python lesson with airline safety data (code)

Homework:

Optional:

  • If we discovered any setup issues with your laptop, please resolve them before Monday.
  • If you're not feeling comfortable in Python, keep practicing using the resources above!

Class 2: Git and Command Line

  • Any questions about the course project?
  • Command line (slides)
  • Git and GitHub (slides)

Homework:

Optional:

Resources:


Class 3: Pandas

Homework:

Optional:


Class 4: Visualization and APIs

Homework:

Optional:

  • Watch Look at Your Data (18 minutes) for an excellent example of why visualization is useful for understanding your data.

Resources:


Class 5: Data Science Workflow, Machine Learning, KNN

Homework:

Optional:

Resources:


Class 6: Bias-Variance Tradeoff and Model Evaluation

  • Brief introduction to the IPython Notebook
  • Exploring the bias-variance tradeoff (notebook)
  • Discussion of the assigned reading on the bias-variance tradeoff
  • Model evaluation procedures (notebook)

Resources:

  • If you would like to learn the IPython Notebook, the official Notebook tutorials are useful.
  • To get started with Seaborn for visualization, the official website has a series of tutorials and an example gallery.
  • Hastie and Tibshirani have an excellent video (12 minutes, starting at 2:34) that covers training error versus testing error, the bias-variance tradeoff, and train/test split (which they call the "validation set approach").
  • Caltech's Learning From Data course includes a fantastic video (15 minutes) that may help you to visualize bias and variance.
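
To make the difference between training error and testing error concrete, here is a minimal sketch of a train/test split (the "validation set approach") using a hand-written K-nearest neighbors classifier. It is not taken from the class notebooks: the synthetic two-class data, the helper names (`make_data`, `train_test_split`, `knn_predict`, `accuracy`), the 70/30 split, and the values of K are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter


def make_data(n, seed):
    """Generate n labeled 2-D points from two overlapping classes (synthetic)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.5 * label  # class 0 is centered at (0, 0), class 1 at (1.5, 1.5)
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    return data


def train_test_split(data, test_fraction, seed):
    """Shuffle a copy of the data and hold out a fraction of it for testing."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, point, k):
    """Predict a label by majority vote among the k nearest training points."""
    def squared_distance(row):
        (x, y), _ = row
        return (x - point[0]) ** 2 + (y - point[1]) ** 2

    nearest = sorted(train, key=squared_distance)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, rows, k):
    """Fraction of rows whose label is predicted correctly from the training set."""
    correct = sum(knn_predict(train, point, k) == label for point, label in rows)
    return correct / len(rows)


data = make_data(200, seed=1)
train, test = train_test_split(data, test_fraction=0.3, seed=2)

# Odd values of k avoid tied votes when there are only two classes.
for k in (1, 5, 25):
    print(k, accuracy(train, train, k), accuracy(train, test, k))
```

With K=1, every training point is its own nearest neighbor, so training accuracy is perfect even though the classes overlap. That is why training accuracy alone is a misleading estimate of how a model will perform on new data, and why the held-out testing accuracy is the number to compare across values of K. A small K gives a flexible, high-variance model; a large K gives a smoother, higher-bias one.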

Class 7: Kaggle Titanic (Part 1)

  • Guest instructor: Josiah Davis
  • Participate in Kaggle's Titanic competition
    • Work in pairs, but the goal is for every person to make at least one submission by the end of the class period!

Homework:

  • Option 1 is to do the Glass identification homework. This is a good option if you are still getting comfortable with what we have learned so far, and prefer a very structured assignment. (solution)
  • Option 2 is to keep working on the Titanic competition, and see if you can make some additional progress! This is a good option if you are feeling comfortable with the material and want to learn a bit more on your own.
  • In either case, please submit your code as usual, and include lots of code comments!

Class 8: Web Scraping, Tidy Data, Reproducibility

Resources:
