forked from justmarkham/DAT7

General Assembly's Data Science course in Washington, DC


wjofarre/DAT7-1

 
 


DAT7 Course Repository

Course materials for General Assembly's Data Science course in Washington, DC (6/1/15 - 8/12/15).

Instructor: Kevin Markham

Monday                                            Wednesday
6/1: Introduction to Data Science                 6/3: Command Line and Version Control
6/8: Data Reading and Cleaning                    6/10: Exploratory Data Analysis
6/15: Visualization                               6/17: Machine Learning
6/22: Getting Data (Project Discussion Deadline)  6/24: K-Nearest Neighbors (Project Question and Dataset Due)
6/29: Model Evaluation Part 1                     7/1: Linear Regression
7/6: Logistic Regression                          7/8: Model Evaluation Part 2
7/13: First Project Presentation                  7/15: Naive Bayes and Text Data
7/20: Natural Language Processing                 7/22: Kaggle Competition
7/27: Decision Trees (Draft Paper Due)            7/29: Ensembling
8/3: Clustering (Peer Review Due)                 8/5: Course Review
8/10: Final Project Presentation                  8/12: Final Project Presentation

Python Resources

Submission Forms


Class 1: Introduction to Data Science

Homework:

Resources:


Class 2: Command Line and Version Control

  • Command line exercise (code)
  • Git and GitHub (slides)
  • Intermediate command line
  • Wrap up: Course schedule, office hours

Homework:

  • Complete the homework exercise listed in the command line introduction. Create a Markdown document that includes your answers and the code you used to arrive at those answers. Add this file to a GitHub repo that you'll use for all of your coursework, and submit a link to your repo using the homework submission form.
  • Review the code from the beginner and intermediate Python workshops. If you don't feel comfortable with any of the content (up through the "dictionaries" section), you should spend some time this weekend practicing Python. Here are my recommended resources:
    • If you like learning from a book, Python for Informatics has useful chapters on strings, lists, and dictionaries.
    • If you prefer interactive exercises, try these lessons from Codecademy: "Python Lists and Dictionaries" and "A Day at the Supermarket".
    • If you have more time, try these much longer lessons from DataQuest: "Find the US city with the lowest crime rate" and "Discover weather patterns in LA".
    • If you've already mastered these topics and want more of a challenge, try solving the second Python Challenge and send me your code in Slack.
  • If there are specific Python topics you want me to cover next week, send me a Slack message.
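For a quick self-check on the list and dictionary material, here is a minimal practice sketch in the spirit of the "lowest crime rate" exercise. The city names and rates are invented placeholders, not data from DataQuest or any other source.

```python
# Hypothetical practice data: a list of dictionaries, one per city.
# These values are invented for illustration and do not come from a real dataset.
cities = [
    {"city": "Alpha City", "crime_rate": 4.2},
    {"city": "Beta Town", "crime_rate": 3.1},
    {"city": "Gamma Falls", "crime_rate": 5.7},
]

# min() with a key function returns the dictionary with the smallest crime_rate.
safest = min(cities, key=lambda row: row["crime_rate"])

# A list comprehension pulls one field out of every dictionary.
names = [row["city"] for row in cities]

# A dictionary comprehension builds a lookup table from city name to crime rate.
rate_by_city = {row["city"]: row["crime_rate"] for row in cities}

print(safest["city"])               # prints Beta Town
print(names)                        # prints ['Alpha City', 'Beta Town', 'Gamma Falls']
print(rate_by_city["Gamma Falls"])  # prints 5.7
```

Using `min()` with a `key` avoids writing a manual loop that tracks the current best row.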

Git and Markdown Resources:

  • Pro Git is an excellent book for learning Git. Read the first two chapters to gain a deeper understanding of version control and basic commands.
  • If you want to practice a lot of Git (and learn many more commands), Git Immersion looks promising.
  • If you want to understand how to contribute on GitHub, you first have to understand forks and pull requests.
  • GitRef is my favorite reference guide for Git commands, and Git quick reference for beginners is a shorter guide with commands grouped by workflow.
  • Markdown Cheatsheet provides a thorough set of Markdown examples with concise explanations. GitHub's Mastering Markdown is a simpler and more attractive guide, but is less comprehensive.

Command Line Resources:

  • If you want to go much deeper into the command line, Data Science at the Command Line is a great book. The companion website provides installation instructions for a "data science toolbox" (a virtual machine with many more command line tools), as well as a long reference guide to popular command line tools.
  • If you want to do more at the command line with CSV files, try out csvkit, which can be installed via pip.

Class 3: Data Reading and Cleaning

  • Git and GitHub assorted tips (slides)
  • Review command line homework (solution)
  • Python:
    • Spyder interface
    • Review of list comprehensions
    • Lesson on file reading with airline safety data (code, data, article)
    • Data cleaning exercise
    • Walkthrough of homework with Chipotle order data (code, data, article)
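The file-reading lesson and the list comprehension review fit together. The minimal sketch below assumes a comma-separated file with a header row. The column names and values are hypothetical stand-ins, not the airline safety data itself, and `io.StringIO` replaces a real file so the example runs on its own.

```python
import csv
import io

# Hypothetical stand-in for a CSV file on disk. With a real file, you would
# open it with open(filename, newline="") instead of using io.StringIO.
raw = io.StringIO(
    "airline,incidents_1985_1999,incidents_2000_2014\n"
    "Airline A,10,4\n"
    "Airline B,2,3\n"
    "Airline C,7,1\n"
)

reader = csv.reader(raw)  # yields each row as a list of strings
header = next(reader)     # the first row holds the column names
data = list(reader)       # the remaining rows, still as strings

# List comprehensions extract and convert columns in a single expression.
airlines = [row[0] for row in data]
total_incidents = [int(row[1]) + int(row[2]) for row in data]

# Adding an "if" clause filters rows: airlines whose incident count dropped.
improved = [row[0] for row in data if int(row[2]) < int(row[1])]

print(header)
print(total_incidents)  # prints [14, 5, 8]
print(improved)         # prints ['Airline A', 'Airline C']
```

Note that every value read by `csv.reader` is a string, so numeric columns must be converted with `int()` or `float()` before doing arithmetic.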

Homework:

  • Complete the homework assignment with the Chipotle data, and add a commented Python script to your GitHub repo. If you are unable to complete a part, try writing some pseudocode instead! You have until Monday to complete this assignment.
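If you get stuck, it can help to see the general pattern of reading a tab-separated file and aggregating it with dictionaries. This sketch is not a solution to the assignment. The columns (`order_id`, `quantity`, `item_name`, `item_price`) and rows are hypothetical, and it assumes `item_price` holds the line total as text such as `$2.39`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-separated order data: a header row plus one line per item.
raw = io.StringIO(
    "order_id\tquantity\titem_name\titem_price\n"
    "1\t1\tChips\t$2.39\n"
    "1\t2\tCanned Soda\t$2.18\n"
    "2\t1\tBurrito\t$8.75\n"
    "3\t1\tChips\t$2.39\n"
)

# delimiter="\t" tells the csv module to split on tabs instead of commas.
rows = list(csv.DictReader(raw, delimiter="\t"))

# Sum the quantity ordered for each item name.
item_counts = defaultdict(int)
for row in rows:
    item_counts[row["item_name"]] += int(row["quantity"])

# Assumes item_price is the total for that line (not a unit price).
# Strip the dollar sign before converting to float, then total each order.
order_totals = defaultdict(float)
for row in rows:
    order_totals[row["order_id"]] += float(row["item_price"].strip().lstrip("$"))

average_order = sum(order_totals.values()) / len(order_totals)

print(dict(item_counts))       # prints {'Chips': 2, 'Canned Soda': 2, 'Burrito': 1}
print(round(average_order, 2)) # prints 5.24
```

`defaultdict` removes the need to check whether a key already exists before adding to it.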

Resources:

  • PEP 8 is Python's "classic" style guide, and is worth a read if you want to write readable code that is consistent with the rest of the Python community.

Class 4: Exploratory Data Analysis

Homework:

Resources:


Class 5: Visualization

  • Part 2 of Exploratory Data Analysis with Pandas (code)
  • Visualization with Pandas and Matplotlib (code)

Homework:

Pandas Resources:

Visualization Resources:


Class 6: Machine Learning

Homework:

  • Your deadline for discussing your project ideas with an instructor is Monday (6/22), and your project question and dataset are due Wednesday (6/24).

Resources:


Class 7: Getting Data

Homework:

API Resources:

Web Scraping Resources:


Class 8: K-Nearest Neighbors

Homework:

KNN Resources:

Reproducibility Resources:

Other Resources:

  • If you would like to learn the IPython Notebook, the official Notebook tutorials are useful.
  • To get started with Seaborn for visualization, the official website has a series of tutorials and an example gallery.


Languages

  • Python 98.6%
  • HTML 1.4%