If you have any questions, please email me at: [email protected]
The course meets for two sessions per week, each lasting two hours. Each session consists of a forty-five-minute lecture followed by an hour and fifteen minutes of lab time, during which students work on real-world projects under the instructor's guidance. Students are expected to work on the projects at home; during lab time, the instructor helps them troubleshoot errors and refine their projects.
- Session 1: Introduction, Anaconda setup, Jupyter Notebook, getting familiar with pandas, NumPy, SciPy, and the SQLite database
- Session 2: Statistics, probability, hypothesis testing, t-test, p-value
Project: database, statistics, probability
- Session 1: Probability distributions, chi-squared, Bernoulli, normal, Central Limit Theorem
- Session 2: Visualizations (matplotlib, seaborn), testing loans data, A/B test, RFC experiment
Project: probability plotting, analysis report (using statistical models)
- Session 1: Acquiring data in JSON format, downloading and cleaning Citi Bike data
- Session 2: Work on project, store and analyze an hour of Citi Bike data
Project: citibike
- Session 1: Acquiring weather data from an API, store and profile data
- Session 2: Work on project, analyze temperature data
Project: Temperature
- Session 1: HTML and CSS for web scraping, scraping data from the United Nations
- Session 2: Work on project, store and profile scraped UN data, compare GDP to educational attainment
Project: education
- Session 1: Overview of Linear Regression, cleaning and plotting data
- Session 2: Work on project, Linear Regression Analysis
Project: Linear Regression
- Session 1: Overview of Logistic Regression, data cleaning
- Session 2: Work on project, Logistic Regression Analysis
Project: Logistic Regression
- Session 1: Overview of Multivariate Analysis and Time Series
- Session 2: Work on project, Multivariate Analysis and Time Series
Project: Multivariate Analysis, Time Series
- Session 1: Overfitting and Cross-Validation, Decision Tree and Random Forest
- Session 2: Work on project, data cleaning, Random Forest Analysis
Project: Random Forest
- Session 1: Bayes, data cleaning, Bayes Analysis
Project: Bayes
- Session 2: K-Nearest Neighbors, Clustering, data cleaning
Project: knn, kmeans
- Session 1: Support Vector Machine
Project: SVM
- Session 2: Principal Component Analysis, Linear Discriminant Analysis
Project: PCA, LDA
The final two weeks are dedicated to the Capstone project. Each student will propose their own project and work on it independently under the instructor's guidance.