Portfolio


Check out my Data Science Internship Experience here


Highlights of my projects

Airbnb Price Prediction

Using scikit-learn, I modeled an Airbnb dataset to estimate the price of a listing for guests from features such as neighborhood, zip code, and apartment type.

The Jupyter notebook in this repo contains the code to run Exploratory Data Analysis and Regression estimators on the Inside Airbnb listings dataset for Denver.

The target variable is the price of the listing.

Using this dataset, I tried to answer questions such as:

  • What are the most important characteristics of a listing in Denver, and how do they influence the price?
  • Which neighborhoods in Denver have the highest rental prices?
  • What distinguishes hosts that have Superhost status? Do all Superhosts actually meet the criteria that Airbnb has set for them?
  • Does reducing the dimensionality of the dataset lead to loss in information?
Raw Data available here
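
As a rough illustration of the neighborhood question above, here is a minimal sketch that ranks neighborhoods by median listing price. It is not the notebook's code: the function name `median_price_by_neighborhood`, the `(neighborhood, price)` row layout, and the toy rows are all assumptions made for this example, and the median is my choice because it is less sensitive to a few very expensive listings.

```python
from collections import defaultdict
from statistics import median


def median_price_by_neighborhood(listings):
    """Return (neighborhood, median price) pairs, most expensive first.

    `listings` is an iterable of (neighborhood, price) tuples. This layout
    is an assumption for the sketch, not the Inside Airbnb schema.
    """
    prices = defaultdict(list)
    for neighborhood, price in listings:
        prices[neighborhood].append(price)
    medians = {name: median(values) for name, values in prices.items()}
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)


# Made-up rows with placeholder neighborhood names, for illustration only.
toy_listings = [
    ("A", 100.0), ("A", 140.0),
    ("B", 250.0), ("B", 90.0), ("B", 300.0),
    ("C", 80.0),
]
ranking = median_price_by_neighborhood(toy_listings)
```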


Home Depot Product Search Relevance

The challenge is to predict the relevance of search results on homedepot.com. More than 73% of the products in the dataset were unique items, which made the model harder to train. The dataset required text cleaning and feature extraction.

I used natural language processing (NLTK) to derive word stems for the product title, description, and search terms. I then created features based on cosine distance, shared words, edit distance, and the length of the search query, product title, and description. I used scikit-learn models to predict the relevance scores and evaluated them by RMSE.
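
The kinds of features listed above can be sketched with the standard library alone. This is an illustrative sketch, not the project's code: the function names are hypothetical, tokenization is a plain whitespace split, and the NLTK stemming step is left out so the example has no third-party dependencies.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (stemming is omitted in this sketch)."""
    return text.lower().split()


def shared_word_count(query, text):
    """Number of distinct query words that also appear in the text."""
    return len(set(tokenize(query)) & set(tokenize(text)))


def cosine_distance(query, text):
    """1 minus the cosine similarity of the two bag-of-words count vectors."""
    a, b = Counter(tokenize(query)), Counter(tokenize(text))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    # Treat an empty string as maximally distant from everything.
    return 1.0 - dot / norm if norm else 1.0


def edit_distance(s, t):
    """Levenshtein distance, computed one row of the DP table at a time."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def relevance_features(query, title):
    """Build one feature row for a (search query, product title) pair."""
    return {
        "query_len": len(tokenize(query)),
        "title_len": len(tokenize(title)),
        "shared_words": shared_word_count(query, title),
        "cosine_dist": cosine_distance(query, title),
        "edit_dist": edit_distance(query.lower(), title.lower()),
    }
```

A row such as `relevance_features("angle bracket", "Steel Angle Bracket 4 in")` could then be fed, together with the same features computed against the description, to a regressor.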


Google Analytics Customer Revenue (R)

Dataset: Google Analytics data from the Google Merchandise Store website. It is visit-level data with fields such as userid, time, geo_info, pageviews, hits, referrer, and ad_click, among others. Link for the competition and dataset (here)

Objective: Predict the total purchase amount each user made across their visits in the test set.

The customer traffic dataset was analyzed and pre-processed in RStudio to predict the natural log of the sum of all transactions per user. I used ensemble learning techniques from H2O.ai (an open-source ML and AI platform) to train and run on the processed data and achieve a considerably lower error rate.
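
The per-user target described above can be sketched as follows. The project itself was done in R; this Python version is only an illustration. Two things here are assumptions rather than details from the write-up: the `+ 1` inside the log (so users with no purchases get a target of 0 instead of an undefined value) and the use of RMSE as the error measure. The function names and the `(user_id, revenue)` row layout are hypothetical.

```python
import math
from collections import defaultdict


def log_revenue_per_user(visits):
    """Sum revenue over each user's visits, then take ln(1 + total).

    `visits` is an iterable of (user_id, revenue) pairs. The +1 is an
    assumption so that zero-revenue users map to a target of 0.0.
    """
    totals = defaultdict(float)
    for user_id, revenue in visits:
        totals[user_id] += revenue
    return {user: math.log1p(total) for user, total in totals.items()}


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Made-up visits for illustration only.
targets = log_revenue_per_user([("u1", 10.0), ("u1", 5.0), ("u2", 0.0)])
```

Aggregating before taking the log matters: the log of a user's summed revenue is not the same as the sum of per-visit logs.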

Please note that this is my code for the competition before its relaunch in early November. (A data leak was identified in late October, so everything about the competition was modified, including the rules, dataset, and prediction objectives.)

Overall, it was a good learning experience. I have been working with Google Analytics data at work of late to understand user behavior, so it was great to have a chance to try predicting sales from that web data.


Tableau Projects

Check out my (Tableau Public) for some of my Tableau projects.


GitHub Repositories



Page template forked from evanca