Novice
Anyone who is interested in working with data and hasn't used IPython Notebook & pandas together before.
The goal is to show people how Python can be used as a practical and fun tool for working with data, as an alternative to R. After this talk, attendees will have a good idea of the power of IPython Notebook and pandas. They'll also be able to use both for some simple data analysis, because the slides double as a practice sheet for playing around with the data on their own.
I'll walk you through Python's best tools for getting a grip on data: IPython Notebook and pandas. I'll show you how to read in data, clean it up, graph it, and draw some conclusions, using some open data about the number of cyclists on Montréal's bike paths as an example.
Using the example of some cyclist sensor data from Montréal, I'll explain how to
- clean up data (fix date formatting issues, remove null values, ...)
- graph the data
- scrape weather data from the weather office website and look at the relationship between temperature & cyclists
- aggregate the data to find out how many people bike on weekdays vs weekends
- take the project further (for example, build a model using scikit-learn)
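As a rough sketch of what the cleaning and weekday/weekend steps could look like in pandas (this is not the talk's actual notebook): the inline CSV, the `Berri` column name, and all the counts below are made-up stand-ins for the real Montréal data.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real cyclist counter data.
# One row has no count, to mimic a day where the sensor recorded nothing.
RAW_CSV = """Date;Berri
03/01/2013;100
04/01/2013;120
05/01/2013;
06/01/2013;40
07/01/2013;110
"""

counts = pd.read_csv(io.StringIO(RAW_CSV), sep=";")

# The dates are day-first (03/01/2013 is 3 January), so state the format
# explicitly instead of letting pandas guess.
counts["Date"] = pd.to_datetime(counts["Date"], format="%d/%m/%Y")
counts = counts.set_index("Date")

# Remove the rows with null counts.
counts = counts.dropna()

# dayofweek runs from Monday=0 to Sunday=6, so 5 and 6 are the weekend.
is_weekend = pd.Series(counts.index.dayofweek >= 5, index=counts.index)
day_type = is_weekend.map({True: "weekend", False: "weekday"})

# Average number of cyclists per day, split by weekday vs weekend.
mean_by_day_type = counts["Berri"].groupby(day_type).mean()
print(mean_by_day_type)
```

Grouping by a derived label rather than by the raw day number keeps the result down to the two rows the question is actually about.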
Here's an approximate outline.
- Who am I? Why do I use IPython & pandas? (2 minutes)
- What is IPython Notebook? Short demo. (5 minutes)
- What is pandas? What are its advantages over straight numpy? (5 minutes)
- Installation tips (use anaconda!) & how to start the notebook. (1 minute)
- Importing data into a dataframe. What's a dataframe? Plotting the data. (3 minutes)
- Indexing and slicing dataframes (2 minutes)
- Using groupby & aggregate to get weekday counts (3 minutes)
- Resampling weather data (2 minutes)
- More slicing to zoom in on unpopular days (2 minutes)
- Questions (5 minutes)
Total: 30 minutes
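To give a flavour of the resampling and "unpopular days" items in the outline, here is a minimal sketch. It assumes the weather data arrives as several temperature readings per day; every number, the column names `cyclists` and `mean_temp`, and the cutoff of 100 cyclists are invented for illustration.

```python
import pandas as pd

# Hypothetical temperature readings, one every 6 hours over two days.
reading_times = pd.date_range(
    "2013-01-03", periods=8, freq=pd.Timedelta(hours=6)
)
temps = pd.Series(
    [-10, -8, -4, -6, 0, 2, 6, 4], index=reading_times, dtype=float
)

# Resample the readings down to one mean temperature per day.
daily_temp = temps.resample("D").mean()

# Hypothetical daily cyclist counts for the same two days.
cyclists = pd.Series(
    [50, 400], index=pd.to_datetime(["2013-01-03", "2013-01-04"])
)

# Building a dataframe from two series aligns them on their date index.
combined = pd.DataFrame({"cyclists": cyclists, "mean_temp": daily_temp})

# Boolean slicing to zoom in on unpopular days (the cutoff is arbitrary).
unpopular = combined[combined["cyclists"] < 100]
print(unpopular)
```

Once both series share a daily index, looking at the relationship between temperature and cyclists is a matter of comparing two columns of the same dataframe.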
I gave this talk at PyCon Canada in August and it was very well received -- people told me that it showed them how to do things they didn't know were possible, and that it was really accessible for Python beginners. I've also given versions of this talk at Montréal Python twice.
I'm also planning to submit a 1-hour tutorial on IPython Notebook & pandas to PyData in NYC in November, so provided that it gets accepted, I'll have even more practice talking about these tools.
This talk was accepted! =)