forked from datasciencemasters/go
Commit d66b849 (1 parent: eba3391). Showing 1 changed file with 104 additions and 0 deletions.
@@ -0,0 +1,104 @@
### The Open-Source Masters

I couldn't wait to go back to grad school. Literally. So I designed my own grad school and spent 5 months learning & hacking in great delight!

### My Background ([linkedin](http://bit.ly/clarecorthell))

I'm a Stanford-educated Engineer, previously a Front-End Developer and UX Designer on early-stage products. I'm always in hot pursuit of deeper insight into social questions!

### Goals & Motivations of the Open Source M.S.

Data Science is an ideal marriage of my technical capacities, social research inquiries, and my geekish-freakish love of statistics.

### Next Steps?

I'm now a Data Scientist with an incredible team at [Mattermark](http://www.mattermark.com)!

***

## The Data Science Curriculum / April-August 2013

* **Intro to Data Science** [UW / Coursera](https://www.coursera.org/course/datasci)
    * *Topics:* Python NLP on Twitter API, Distributed Computing Paradigm, MapReduce/Hadoop & Pig Script, SQL/NoSQL, Relational Algebra, Experiment design, Statistics, Graphs, Amazon EC2, Visualization.

### Math
* Linear Algebra / Levandosky [Stanford / Book](http://www.amazon.com/Linear-Algebra-Steven-Levandosky/dp/0536667470/ref=sr_1_1?ie=UTF8&qid=1376546498&sr=8-1&keywords=linear+algebra+levandosky#)
* Statistics [Stats in a Nutshell / Book](http://shop.oreilly.com/product/9780596510497.do)
* Problem-Solving Heuristics "How To Solve It" [Polya / Book](http://en.wikipedia.org/wiki/How_to_Solve_It)

### Computing
* **Algorithms**
    * Algorithms: Design & Analysis I [Stanford / Coursera](https://www.coursera.org/course/algo)
    * Algorithm Design [Kleinberg & Tardos / Book](http://www.amazon.com/Algorithm-Design-Jon-Kleinberg/dp/0321295358/ref=sr_1_1?ie=UTF8&qid=1376702127&sr=8-1&keywords=kleinberg+algorithms)

* **Databases**
    * Introduction to Databases [Stanford / Coursera](https://www.coursera.org/course/db)

* **Data Mining**
    * Mining of Massive Datasets [Stanford / Book](http://i.stanford.edu/~ullman/mmds.html)
    * Mining the Social Web [O'Reilly / Book](http://shop.oreilly.com/product/0636920010203.do)
    * Introduction to Information Retrieval [Stanford / Book](http://nlp.stanford.edu/IR-book/information-retrieval-book.html)

* **Machine Learning**
    * Machine Learning / Ng [Stanford / Coursera](https://www.coursera.org/course/ml)
    * Programming Collective Intelligence [O'Reilly / Book](http://shop.oreilly.com/product/9780596529321.do)
    * Statistics [The Elements of Statistical Learning / Book](http://www-stat.stanford.edu/~tibs/ElemStatLearn/) (*in progress*)

* **Probabilistic Graphical Models**
    * Probabilistic Programming and Bayesian Methods for Hackers [GitHub / Tutorials](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers)
    * PGMs / Koller [Stanford / Coursera](https://www.coursera.org/course/pgm) (*in progress*)

* **Natural Language Processing**
    * NLP with Python [O'Reilly / Book](http://shop.oreilly.com/product/9780596516499.do)

* **Analysis**
    * Python for Data Analysis [O'Reilly / Book](http://www.kqzyfj.com/click-7040302-11260198?url=http%3A%2F%2Fshop.oreilly.com%2Fproduct%2F0636920023784.do&cjsku=0636920023784)
    * Big Data Analysis with Twitter [UC Berkeley / Lectures](http://blogs.ischool.berkeley.edu/i290-abdt-s12/)
    * Social and Economic Networks: Models and Analysis [Stanford / Coursera](https://www.coursera.org/course/networksonline)
    * Information Visualization ["Envisioning Information" Tufte / Book](http://www.amazon.com/Envisioning-Information-Edward-R-Tufte/dp/0961392118/ref=sr_1_8?ie=UTF8&qid=1376709039&sr=8-8&keywords=information+design)

* **Python** (Learning)
    * New To Python: [Learn Python the Hard Way](http://learnpythonthehardway.org/), [Google's Python Class](http://code.google.com/edu/languages/google-python-class/)

* **Python** (Libraries)
    * Basic Packages [Python, virtualenv, NumPy, SciPy, matplotlib and IPython](http://www.lowindata.com/2013/installing-scientific-python-on-mac-os-x/)
    * Bayesian Inference [pymc](https://github.com/pymc-devs/pymc)
    * Labeled data structure objects, statistical functions, etc. [pandas](https://github.com/pydata/pandas) (See: Python for Data Analysis)
    * Python wrapper for the Twitter API [twython](https://github.com/ryanmcgrath/twython)
    * Tools for Data Mining & Analysis [scikit-learn](http://scikit-learn.org/stable/)
    * Network Modeling & Viz [networkx](http://networkx.github.io/)
    * Natural Language Toolkit [NLTK](http://nltk.org/)

### Projects
* Coursework
    * Sentiment analysis, trending topics, and friendship mapping with the Twitter API
    * Joins and Matrix Manipulation in MapReduce (AWS EC2)
    * In-database text analysis (SQL)
    * Sentiment analysis of movie tweets (Python)

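
For a flavor of the MapReduce join coursework listed above, here is a minimal sketch of a reduce-side inner join in plain Python, with no Hadoop involved. It is not the actual course assignment: the table names `users` and `tweets`, the sample rows, and the function names `map_phase`, `reduce_phase`, and `run_join` are all hypothetical, and the "shuffle" step is simulated with an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical sample data: (table_name, row) pairs.
# The first field of each row is assumed to be the join key.
RECORDS = [
    ("users", (1, "ann")),
    ("users", (2, "bob")),
    ("tweets", (1, "hello")),
    ("tweets", (1, "again")),
    ("tweets", (3, "orphan")),
]


def map_phase(records):
    """Mapper: emit (join_key, (table_name, row)) for every input row."""
    for table, row in records:
        yield row[0], (table, row)


def reduce_phase(tagged_rows):
    """Reducer: pair every users row with every tweets row sharing one key."""
    users = [row for table, row in tagged_rows if table == "users"]
    tweets = [row for table, row in tagged_rows if table == "tweets"]
    for user in users:
        for tweet in tweets:
            yield user + tweet[1:]  # drop the duplicate key column


def run_join(records):
    """Group mapper output by key (a stand-in for the shuffle), then reduce."""
    groups = defaultdict(list)
    for key, value in map_phase(records):
        groups[key].append(value)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_phase(groups[key]))
    return joined


if __name__ == "__main__":
    for row in run_join(RECORDS):
        print(row)
```

Keys that appear in only one table produce no output here, so this behaves like an inner join; an outer join would need the reducer to emit unmatched rows as well.
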
***
### A Note on Tools

This degree is brought to you by: "THE INTERNET".

Information is more democratized^ now than at any point in history. Given a little initiative and interest, you can tailor and excel in an education of your own design. The connective web made me what I am today, growing from the child obsessed with [Number Munchers](http://en.wikipedia.org/wiki/Munchers#Number_Munchers) to an adult jaw-dropping over [DBSCAN](http://en.wikipedia.org/wiki/DBSCAN).

The most valuable resources I used were:
* [Coursera](http://coursera.org)
* [Khan Academy](https://www.khanacademy.org/math/probability/random-variables-topic/random_variables_prob_dist/v/term-life-insurance-and-death-probability)
* [Wolfram Alpha](http://www.wolframalpha.com/input/?i=torus)
* [Wikipedia](http://en.wikipedia.org/wiki/List_of_cognitive_biases)
* [Quora](http://www.quora.com/Programming-Challenges-1/What-are-some-good-toy-problems-in-data-science)
* **Kindle .mobis** (carrying textbooks is so 90s.)
    * PopSci Read: [The Signal and The Noise](http://www.amazon.com/Signal-Noise-Predictions-Fail-but-ebook/dp/B007V65R54/ref=tmm_kin_swatch_0?_encoding=UTF8&sr=8-1&qid=1376699450) Nate Silver
* **Friends & Family** (Impossible without their support! Special Thanks to N.S.)

*^ given internet access - an issue near and dear to me.*

***

### I "Forked" this into the [Open Source Data Science Masters](http://datasciencemasters.org) Curriculum.

[Follow me on Twitter @clarecorthell](http://twitter.com/clarecorthell) |