shorveer/DAT8 (forked from justmarkham/DAT8)

General Assembly's Data Science course in Washington, DC


DAT8 Course Repository

Course materials for General Assembly's Data Science course in Washington, DC (8/18/15 - 10/29/15).

Instructor: Kevin Markham

Tuesday | Thursday
8/18: Introduction to Data Science | 8/20: Command Line and Version Control
8/25: Data Reading and Cleaning | 8/27: Exploratory Data Analysis
9/1: Visualization (Project Discussion Deadline) | 9/3: Machine Learning (Project Question and Dataset Due)
9/8: Getting Data | 9/10: K-Nearest Neighbors
9/15: Basic Model Evaluation | 9/17: Linear Regression
9/22: First Project Presentation | 9/24: Logistic Regression
9/29: Advanced Model Evaluation | 10/1: Naive Bayes and Text Data
10/6: Natural Language Processing | 10/8: Kaggle Competition (Draft Paper Due)
10/13: Decision Trees | 10/15: Ensembling
10/20: Regularization and Clustering (Peer Review Due) | 10/22: Course Review
10/27: Final Project Presentation | 10/29: Final Project Presentation

Before the Course Begins

  • Install Git.
  • Create an account on the GitHub website.
    • It is not necessary to download "GitHub for Windows" or "GitHub for Mac".
  • Install the Anaconda distribution of Python 2.7.
    • If you choose not to use Anaconda, here is a list of the Python packages you will need to install during the course.
  • We would like to check the setup of your laptop before the course begins:
    • You can have your laptop checked before the intermediate Python workshop on Tuesday 8/11 (5:30pm-6:30pm), at the 15th & K Starbucks on Saturday 8/15 (1pm-3pm), or before class on Tuesday 8/18 (5:30pm-6:30pm).
    • Alternatively, you can walk through the setup checklist yourself.
  • Once you receive an email invitation from Slack, join our "DAT8 team" and add your photo.
  • Practice Python using the resources below.
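The setup checklist itself is not reproduced here, but one part of it (confirming that Python and the needed packages are available) can be sketched as a short script. This is a minimal sketch under stated assumptions: it targets Python 3 (the course used Python 2.7, where `importlib.util.find_spec` does not exist), and the names in `PACKAGES` are a hypothetical sample rather than the course's actual package list.

```python
import importlib.util
import sys

# Hypothetical sample of packages to check; substitute the course's real list.
# Note that scikit-learn is imported under the name "sklearn".
PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_setup(packages=PACKAGES):
    """Map each package name to True if Python can locate it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_setup().items():
        print(f"{name:<12} {'OK' if found else 'MISSING'}")
```

`find_spec` only locates a top-level package without importing it, so the check stays fast even for large libraries.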

Python Resources

Submission Forms


Class 1: Introduction to Data Science

Homework:

  • Work through GA's friendly command line tutorial using Terminal (Linux/Mac) or Git Bash (Windows).
  • Read through this command line reference, and complete the pre-class exercise at the bottom. (There's nothing you need to submit once you're done.)
  • Watch videos 1 through 8 (21 minutes) of Introduction to Git and GitHub.
  • If your laptop has any setup issues, please work with us to resolve them by Thursday.
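Several tools from the command line tutorial, such as `head`, `wc -l`, and `grep`, have short Python counterparts, which may help connect the command line work to the Python work later in the course. A rough sketch: the function names are hypothetical, each one mimics only the basic behavior of its command (no flags), and the `grep` version uses Python regular expressions, which are closer to `grep -E` than to plain `grep`.

```python
import itertools
import re
import sys


def head(lines, n=10):
    """Return the first n lines without trailing newlines, like `head -n`."""
    return [line.rstrip("\n") for line in itertools.islice(lines, n)]


def count_lines(lines):
    """Count lines that end in a newline, which is what `wc -l` reports."""
    return sum(1 for line in lines if line.endswith("\n"))


def grep(pattern, lines):
    """Return lines containing a regex match, roughly like `grep -E`."""
    regex = re.compile(pattern)
    return [line.rstrip("\n") for line in lines if regex.search(line)]


if __name__ == "__main__":
    # Usage (hypothetical script name): python tools.py some_file.txt
    with open(sys.argv[1]) as f:
        lines = f.readlines()  # a list, so it can be scanned more than once
    print(head(lines, 5))
    print(count_lines(lines))
    print(grep("data", lines))
```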

Resources:


Class 2: Command Line and Version Control

  • Review the command line pre-class exercise (code)
  • Git and GitHub (slides)
  • Intermediate command line
  • Wrap up: Course schedule, office hours

Homework:

  • Complete the homework exercise listed in the command line introduction:
    • Create a Markdown file that includes your answers and the code you used to arrive at those answers.
    • Add this file to a GitHub repo that you'll use for all of your coursework.
    • Submit a link to your repo using the homework submission form.
  • Review the code from the beginner and intermediate Python workshops. If you don't feel comfortable with any of the content (excluding the "requests" and "APIs" sections), you should spend some time this weekend practicing Python:
    • If you like learning from a book, Python for Informatics has useful chapters on strings, lists, and dictionaries.
    • If you prefer interactive exercises, try these lessons from Codecademy: "Python Lists and Dictionaries" and "A Day at the Supermarket".
    • If you have more time, try missions 2 and 3 from DataQuest's Learning Python course.
    • If you've already mastered these topics and want more of a challenge, try solving Python Challenge number 1 (decoding a message) and send me your code in Slack.
  • To help you think about your own project, watch What is machine learning, and how does it work? (10 minutes) and browse through some more example student projects.
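The string, list, and dictionary topics above often come together in small exercises like the sketch below: a shopping-bill calculator and a word counter. The item names, prices (in cents), and stock levels are made up for illustration and are not taken from any of the linked lessons.

```python
# Made-up example data: prices in cents and units currently in stock.
prices = {"banana": 25, "apple": 50, "milk": 300}
stock = {"banana": 2, "apple": 0, "milk": 1}


def compute_bill(shopping_list, prices, stock):
    """Sum prices of in-stock items, reducing stock by one per item sold."""
    total = 0
    for item in shopping_list:
        if stock.get(item, 0) > 0:
            total += prices[item]
            stock[item] -= 1
    return total


def word_counts(text):
    """Count how often each lowercased, whitespace-separated word appears."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


print(compute_bill(["banana", "banana", "banana", "apple", "milk"], prices, stock))  # 350
print(word_counts("The cat and the hat"))  # {'the': 2, 'cat': 1, 'and': 1, 'hat': 1}
```

The third banana and the apple are skipped because their stock has run out, so the bill is 25 + 25 + 300 = 350 cents.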

Git and Markdown Resources:

  • Pro Git is an excellent book for learning Git. Read the first two chapters to gain a deeper understanding of version control and basic commands.
  • If you want to practice a lot of Git (and learn many more commands), Git Immersion looks promising.
  • If you want to understand how to contribute on GitHub, you first have to understand forks and pull requests.
  • GitRef is my favorite reference guide for Git commands, and Git quick reference for beginners is a shorter guide with commands grouped by workflow.
  • Markdown Cheatsheet provides a thorough set of Markdown examples with concise explanations. GitHub's Mastering Markdown is a simpler and more attractive guide, but is less comprehensive.
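Because the homework asks for a Markdown file containing both answers and the code behind them, it may help to see the three pieces such a file combines: a heading, prose, and a fenced code block. The sketch below assembles one answer section in Python; the function name, question text, and file names are hypothetical.

```python
FENCE = "`" * 3  # built this way so this example's own code fence stays intact


def markdown_answer(question, answer, code, lang="bash"):
    """Format one homework answer as Markdown: heading, prose, fenced code."""
    return "\n".join([
        f"## {question}",
        "",
        answer,
        "",
        f"{FENCE}{lang}",
        code,
        FENCE,
        "",
    ])


if __name__ == "__main__":
    section = markdown_answer(
        "How many lines are in the file?",  # hypothetical question
        "I used `wc -l` to count them.",
        "wc -l data.txt",  # hypothetical file name
    )
    with open("homework.md", "w") as f:
        f.write(section)
```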

Command Line Resources:

  • If you want to go much deeper into the command line, Data Science at the Command Line is a great book. The companion website provides installation instructions for a "data science toolbox" (a virtual machine with many more command line tools), as well as a long reference guide to popular command line tools.
  • If you want to do more at the command line with CSV files, try out csvkit, which can be installed via pip.
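For comparison, Python's standard `csv` module can do a simplified version of one csvkit task: keeping only the named columns, as `csvcut -c` does. A minimal sketch: the function name, script name, and sample columns are hypothetical, and it does not replicate csvkit's other options or features.

```python
import csv
import io
import sys


def cut_columns(lines, columns):
    """Keep only the named columns of CSV input, similar to `csvcut -c`.

    `lines` is any iterable of CSV text lines, such as an open file.
    Returns CSV text; the csv module writes CRLF line endings by default.
    """
    reader = csv.DictReader(lines)
    out = io.StringIO()
    # extrasaction="ignore" drops any column not listed in `columns`.
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Usage (hypothetical script name): python cut.py name,city < data.csv
    sys.stdout.write(cut_columns(sys.stdin, sys.argv[1].split(",")))
```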
