Workshop at General Assembly (Washington, DC) on October 20, 2014.
Instructor: Kevin Markham
- Why Python? (10 min.)
  - Characteristics of Python
  - Python vs. R
  - Why Anaconda?
- Just Enough Python Basics (45 min.)
  - Python interpreter (aka "Python shell"), IPython shell
  - Running a simple script (code)
  - Spyder IDE
  - Exploring data types, lists, functions (code)
- Getting Data (20 min.)
  - Public datasets in structured formats
  - Accessing APIs (code, documentation)
  - Scraping websites (code, pages)
- Looking at Data (5 min.)
  - Data from FiveThirtyEight (GitHub repository)
  - Alcohol consumption (article, modified data)
- Pandas for Data Exploration (70 min.)
  - Exploring alcohol data: examining, summarizing, filtering, sorting, handling missing values, split-apply-combine (code)
  - If time permits, also explore movie ratings data (description): joins, plotting
- Brief Tour of Other Modules for Data Science (5 min.)
- Recommended Resources for Self-Learning (10 min.)
  - Basic Python: Codecademy, Google's Python Class, Python Tutor (to visualize code execution)
  - Pandas: tutorial, book: "Python for Data Analysis" (includes numpy and basic Python)
  - Web scraping: tutorial
  - Command line: tutorial
  - Git and GitHub: video series
  - Machine learning: book and videos: "An Introduction to Statistical Learning", scikit-learn tutorials, Data Science as a Sport (video), Kaggle Titanic competition
  - Data science in general: ebook: "Analyzing the Analyzers"
  - Data-focused newsletters: Center for Data Innovation, O'Reilly Data Newsletter, Data Community DC
  - Full-fledged courses: Data Science Specialization (9 short courses by JHU in R), Machine Learning (1 course by Andrew Ng in Matlab/Octave), Learning from Data (1 course, programming language not specified)
- General Assembly's Data Science Course (5 min.)
- Ask Me Anything
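
The Pandas exploration steps named in the agenda (examining, summarizing, filtering, sorting, handling missing values, split-apply-combine) can be sketched roughly as follows. This is only a minimal illustration on a tiny made-up table, not the workshop's code or its alcohol dataset; the `drinks` DataFrame, its column names, and all values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data for illustration only (NOT the workshop's dataset).
drinks = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "continent": ["EU", "EU", "AS", "AS", None],  # one missing value
    "beer_servings": [200, 100, 50, 10, 80],
})

# Examining: peek at the first rows and the column types.
print(drinks.head())
print(drinks.dtypes)

# Summarizing: mean of a single column.
mean_beer = drinks["beer_servings"].mean()

# Filtering: keep rows that satisfy a boolean condition.
heavy = drinks[drinks["beer_servings"] > 75]

# Sorting: order rows by a column, largest first.
ranked = drinks.sort_values("beer_servings", ascending=False)

# Handling missing values: count them, then fill with a placeholder.
n_missing = drinks["continent"].isnull().sum()
filled = drinks.fillna({"continent": "unknown"})

# Split-apply-combine: split by continent, apply mean, combine into a Series.
# Rows whose group key is missing are dropped by groupby by default.
by_continent = drinks.groupby("continent")["beer_servings"].mean()
print(by_continent)
```

Grouping on the original `drinks` rather than on `filled` leaves the row with the missing continent out of the group means; grouping on `filled` instead would report it under the "unknown" placeholder.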