###Open Source Data Science

Transcript for L. George (snapshot as of June 20th, 2014)

###Data Science / Analytics Coursework

Data Analysis: Applied statistics course using R. Coursera/Johns Hopkins U. Completed 3/22/13.

Computing for Data Analysis: Using R for effective data analysis. Coursera/Johns Hopkins U. Completed 4/17/13.

Web Intelligence and Big Data: Search, indexing, sentiment analysis, MapReduce, classification and clustering algorithms, Bayesian inference, and feature selection. Tools: Python, SQLite. Coursera/IIT Delhi. Completed 6/6/13.

Introduction to Data Science: SQL/NoSQL, Hadoop, MapReduce, statistical modeling and machine learning, sentiment analysis (via Twitter API), visualization. Tools: Python, SQLite, Tableau. Coursera/U. of Washington. Completed 6/29/13.

Zipfian Academy data science training program (not open source): Q1-Q2, 2014.

  • Worked with a diverse range of analytic algorithms and approaches, including supervised and unsupervised machine learning, recommender systems, natural language processing, A/B testing, and methods for large-scale data storage and retrieval. Tools: Python, MySQL.
  • Implemented WorkVibes, a summarization tool for company reviews. WorkVibes curates rich, distinctive content from review corpora, using part-of-speech tagging and aggregate TF-IDF weights to identify relevant opinions across large numbers of reviews. The data pipeline includes data acquisition from Glassdoor.com, HTML parsing, and the use of MySQL for storing reviews.
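
As an illustration of the aggregate TF-IDF idea mentioned for WorkVibes, here is a minimal sketch in plain Python. It is not the WorkVibes code: the function name `aggregate_tfidf`, the whitespace tokenizer, and the three sample reviews are all assumptions made up for this example, and the part-of-speech tagging step is left out because it would need an external library.

```python
import math
from collections import Counter


def aggregate_tfidf(reviews):
    """Sum each term's TF-IDF weight across all reviews (hypothetical helper)."""
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    docs = [review.lower().split() for review in reviews]
    n_docs = len(docs)
    # Document frequency: the number of reviews that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    totals = Counter()
    for doc in docs:
        if not doc:
            continue  # skip empty reviews to avoid dividing by zero
        for term, count in Counter(doc).items():
            tf = count / len(doc)                    # term frequency in this review
            idf = math.log(n_docs / doc_freq[term])  # rarer terms get a larger weight
            totals[term] += tf * idf
    return totals


# Made-up reviews, for illustration only.
reviews = [
    "great culture great benefits",
    "great pay long hours",
    "great team poor management",
]
weights = aggregate_tfidf(reviews)
top_terms = [term for term, _ in weights.most_common(3)]
```

In this toy corpus "great" appears in every review, so its inverse document frequency is log(1) = 0 and it drops out of the ranking, while terms that occur in only one review keep a positive weight. That is the sense in which aggregate TF-IDF favors distinctive content over words common to all reviews.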

Machine Learning: Explored a range of machine learning approaches from regression to neural networks, anomaly detection, and machine learning at scale. Coursera/Stanford. Completed 6/4/14.

###Computing

####Software Engineering

Introduction to Systematic Program Design: Modeling information and structuring programs in a systematic way. Coursera/U. of British Columbia. Completed 9/11/13.

###Database

MySQL Crash Course: Overview of MySQL. Book/Forta. Completed 12/30/13.

###Future steps

Natural Language Processing. Coursera/Stanford.

Probabilistic Graphical Models. Coursera/Stanford.

Introduction to Databases. Coursera/Stanford.

###Background

Social/personality psychologist with computer science background: LinkedIn

Favorite resources:

  • Coursera
  • O'Reilly books: Python for Data Analysis, Natural Language Processing with Python, and Doing Data Science
  • Think Python (book), A. Downey
  • Stack Overflow
  • Quora

Latest version of this transcript