A repository for a collection of graduate school assignments.
Lending Club: This assignment from August 2019 analyzed Lending Club data from 2012-2017, with a specific focus on the periods 2012-2014 and 2015-2017 (before and after Lending Club’s IPO).
R_Webscraping_and_RShiny: The culmination of two separate projects from October and November 2019. Using the R package rvest, I web-scraped hiking trail data for New Hampshire and Maine. I also conducted machine learning on the hiking trails, first programming Principal Component Analysis and then K-Means clustering in Python. Then, using the R packages shiny, shinythemes, plotly, ggplot2, and leaflet, I designed an RShiny app to explore the hiking clusters and their characteristics in order to find new trails.
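As a rough illustration of the PCA-then-K-Means step described above, here is a minimal sketch using NumPy only. It is not the code from the project: the trail features, the number of components, and the number of clusters are made-up assumptions, and the helper names `pca` and `kmeans` are hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the top principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical trail features: length (miles), elevation gain (ft), difficulty (1-5).
trails = np.array([
    [1.2, 300.0, 1.0],
    [1.5, 350.0, 1.0],
    [2.0, 500.0, 2.0],
    [8.5, 3200.0, 5.0],
    [9.0, 3500.0, 5.0],
    [7.8, 2900.0, 4.0],
])

# Standardize so elevation gain does not dominate the components.
scaled = (trails - trails.mean(axis=0)) / trails.std(axis=0)
components = pca(scaled, n_components=2)
labels, centroids = kmeans(components, k=2)
```

The cluster labels produced this way are the kind of output an app could use to group similar trails; in practice a library such as scikit-learn would likely replace both helpers.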
Amazon Customer Reviews: A case analysis of Amazon reviews from December 2019. The feature space was built from stemmed reviews, using NLTK to create document-term matrices. From there, dimensionality was reduced in three ways (PCA, Sparse PCA, UMAP) to prepare the data for machine learning algorithms (KNN, Random Forest, Gradient Boosting, XGBoost) in order to predict star rating based on word occurrences.
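To make the idea of a document-term matrix built from stemmed reviews concrete, here is a small standard-library sketch. It is an illustration under assumptions, not the project's code: `crude_stem` is a hypothetical, very rough suffix stripper standing in for a real NLTK stemmer, and the reviews are invented.

```python
import re
from collections import Counter


def crude_stem(word):
    """Very rough suffix stripper; a stand-in for a real stemmer such as NLTK's."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def document_term_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    tokenized = [
        [crude_stem(tok) for tok in re.findall(r"[a-z]+", doc.lower())]
        for doc in documents
    ]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    counts = [Counter(doc) for doc in tokenized]
    # Counter returns 0 for terms that do not occur in a document.
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical reviews, made up for illustration.
reviews = [
    "Loved it, works great",
    "Stopped working after two days",
    "Great product, loved the colors",
]
vocabulary, dtm = document_term_matrix(reviews)
```

Each row of `dtm` is one review and each column one stemmed term, so "works" and "working" share the single column "work". A matrix of this shape is what a dimension-reduction step and a classifier would then consume.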
Skin Cancer MNIST HAM 10000: The culmination of my master's degree, from May 2020. The dataset features 10,015 dermatoscopic images, some of which show skin cancer and some of which are benign. The folder contains a Jupyter Notebook that outlines and builds a Convolutional Neural Network in an attempt to accurately predict 7 classes.