###Open Source Data Science
Transcript for L. George (snapshot as of June 20th, 2014)
###Data Science / Analytics Coursework
Data Analysis: Applied statistics course using R. Coursera/Johns Hopkins U. Completed 3/22/13.
Computing for Data Analysis: Using R for effective data analysis. Coursera/Johns Hopkins U. Completed 4/17/13.
Web Intelligence and Big Data: Search, indexing, sentiment analysis, MapReduce, classification and clustering algorithms, Bayesian inference, and feature selection. Tools: Python, SQLite. Coursera/IIT Delhi. Completed 6/6/13.
Introduction to Data Science: SQL/NoSQL, Hadoop, MapReduce, statistical modeling and machine learning, sentiment analysis (via Twitter API), visualization. Tools: Python, SQLite, Tableau. Coursera/U. of Washington. Completed 6/29/13.
Zipfian Academy data science training program (not open source): Q1-Q2, 2014.
- Worked with a range of analytic algorithms and approaches, including supervised and unsupervised machine learning, recommender systems, natural language processing, A/B testing, and methods for large-scale data storage and retrieval. Tools: Python, MySQL.
- Implemented WorkVibes, a summarization tool for company reviews. WorkVibes curates rich, distinctive content from review corpora, using part-of-speech tagging and aggregate TF-IDF weights to identify relevant opinions across large numbers of reviews. The data pipeline includes data acquisition from Glassdoor.com, HTML parsing, and the use of MySQL for storing reviews.
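
As a rough illustration of the aggregate TF-IDF idea mentioned above, here is a minimal sketch in plain Python. It is not the WorkVibes code: the function names are hypothetical, "aggregate" is assumed to mean a simple sum of per-review TF-IDF weights, and the part-of-speech tagging step is left out entirely.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a review and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def aggregate_tfidf(reviews):
    """Sum each term's per-review TF-IDF weight across all reviews.

    Assumption: "aggregate" means a plain sum; the original tool may differ.
    """
    docs = [tokenize(review) for review in reviews]
    n_docs = len(docs)
    # Document frequency: number of reviews that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    totals = Counter()
    for doc in docs:
        if not doc:
            continue  # skip reviews with no usable tokens
        counts = Counter(doc)
        for term, count in counts.items():
            tf = count / len(doc)              # term frequency within this review
            idf = math.log(n_docs / df[term])  # zero for terms found in every review
            totals[term] += tf * idf
    return totals


def top_terms(reviews, k=5):
    """Return the k terms with the highest aggregate weight."""
    return [term for term, _ in aggregate_tfidf(reviews).most_common(k)]


if __name__ == "__main__":
    sample = [
        "the pay is good",
        "the hours are long",
        "the team is great",
    ]
    print(top_terms(sample, k=3))
```

Terms that occur in every review get a weight of zero, so the ranking favors words that are distinctive to a subset of reviews.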
Machine Learning: Explored a range of machine learning approaches from regression to neural networks, anomaly detection, and machine learning at scale. Coursera/Stanford. Completed 6/4/14.
###Computing
####Software Engineering
Introduction to Systematic Program Design: Modeling information and structuring programs in a systematic way. Coursera/U. of British Columbia. Completed 9/11/13.
####Database
MySQL Crash Course: Overview of MySQL. Book/Forta. Completed 12/30/13.
###Future steps
Natural Language Processing. Coursera/Stanford.
Probabilistic Graphical Models. Coursera/Stanford.
Introduction to Databases. Coursera/Stanford.
Social/personality psychologist with computer science background: LinkedIn
- Coursera
- O'Reilly books: Python for Data Analysis, Natural Language Processing with Python, and Doing Data Science
- Think Python book, A. Downey
- Stack Overflow
- Quora