alexhuang1117/Data-Science-Portfolio

Data Science Portfolio

This repository collects projects I have worked on or am currently working on, and it is updated regularly. The projects are written in either R (R Markdown) or Python (Jupyter Notebook). The goal is to use data science and statistical modelling techniques to find something interesting in the data. A typical project consists of finding and cleaning data, analysis, visualization, and a conclusion.

Projects:

  • Plotted Bitcoin prices against S&P 500 prices and performed a Granger causality test.
  • Fitted an ARIMA model to Bitcoin prices to forecast Bitcoin's range of movement.
  • Keywords(R, Time Series, Causality)
  • In this project, I tried to predict the winners of the US (2016) and UK (2017) elections as the voting results of each region became available.
  • The polling data serves as the prior information, and the model is updated as each region's results come out.
  • Monte Carlo simulation is used to simulate the winner of the election.
  • The results are compared with exchange rate fluctuations to see how the financial markets kept up with the results.
  • Keywords(Python, Linear Regression, Monte Carlo Simulation, Twitter API)
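
The election approach above (polls as a prior, updated as regions report, then simulated) can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the function name, the `poll_sd` value, the region data, and the update rule (shifting undeclared regions by the average polling miss seen so far) are all assumptions made for the example.

```python
import random


def simulate_win_probability(regions, declared, n_sims=10000, poll_sd=0.03, seed=0):
    """Estimate the probability that candidate A wins.

    regions:  {name: (votes, polled_margin_for_A)}  (hypothetical input format)
    declared: {name: actual_margin_for_A} for regions that have already reported
    poll_sd:  assumed standard deviation of the polling error per region
    """
    rng = random.Random(seed)

    # Assumed update rule: shift the polls in undeclared regions by the
    # average polling miss observed in the declared regions.
    if declared:
        shift = sum(declared[r] - regions[r][1] for r in declared) / len(declared)
    else:
        shift = 0.0

    total_votes = sum(votes for votes, _ in regions.values())
    wins = 0
    for _ in range(n_sims):
        votes_a = 0
        for name, (votes, polled) in regions.items():
            if name in declared:
                margin = declared[name]  # result is known, no randomness
            else:
                margin = rng.gauss(polled + shift, poll_sd)  # simulated result
            if margin > 0:
                votes_a += votes  # winner-takes-all within a region
        if 2 * votes_a > total_votes:
            wins += 1
    return wins / n_sims


# Hypothetical regions: (votes, polled margin for candidate A)
example_regions = {"North": (10, 0.02), "South": (8, -0.01), "East": (6, 0.00)}

p_before = simulate_win_probability(example_regions, declared={})
p_after = simulate_win_probability(example_regions, declared={"North": 0.08})
print(f"P(A wins) before any results: {p_before:.2f}")
print(f"P(A wins) after North reports: {p_after:.2f}")
```

Calling the function again each time a region reports gives a win probability over the course of election night, which is the kind of series that could be lined up against exchange rate movements.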
  • Fitted power-law and log-normal distributions to US baby names since 1960 and compared the fits.
  • Used bootstrapping to estimate the distribution of the power-law parameters.
  • Crawled Twitter to find 20,000 random users and fitted power-law distributions to users' friend counts and follower counts.
  • Keywords(R, Power-law, Bootstrapping, Log-normal)
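
The bootstrapping step in the power-law project can be illustrated with a small sketch. The project itself is written in R; the Python below is only an illustration under stated assumptions: it uses the standard continuous maximum-likelihood estimate of the exponent, treats `xmin` as fixed and known, resamples only the tail, and runs on synthetic data rather than the baby-name or Twitter data. The function names are hypothetical.

```python
import math
import random


def fit_alpha(data, xmin):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / xmin)) over x >= xmin."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def bootstrap_alpha(data, xmin, n_boot=1000, seed=0):
    """Percentile bootstrap interval (95%) for alpha, with xmin held fixed (assumption)."""
    rng = random.Random(seed)
    tail = [x for x in data if x >= xmin]
    estimates = []
    for _ in range(n_boot):
        # Resample the tail with replacement and refit the exponent
        resample = rng.choices(tail, k=len(tail))
        estimates.append(fit_alpha(resample, xmin))
    estimates.sort()
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return lower, upper


def sample_power_law(n, alpha, xmin, seed=0):
    """Draw synthetic power-law samples by inverse transform sampling."""
    rng = random.Random(seed)
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


# Synthetic data with a known exponent, used only to exercise the functions
synthetic = sample_power_law(2000, alpha=2.5, xmin=1.0, seed=1)
alpha_hat = fit_alpha(synthetic, xmin=1.0)
lower, upper = bootstrap_alpha(synthetic, xmin=1.0)
print(f"alpha estimate: {alpha_hat:.3f}, bootstrap interval: ({lower:.3f}, {upper:.3f})")
```

For count data such as name frequencies or follower counts, a discrete estimator and a data-driven choice of `xmin` would be more appropriate; the continuous form is used here only to keep the sketch short.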
  • Fitted polynomial linear regression to wine quality against the wines' chemical properties.
  • Used ridge and lasso regularization to tackle overfitting and compared the results.
  • Used cross-validation to select the optimal regularization strength.
  • Keywords(Python, Linear Regression, Ridge and Lasso Regularization, Cross Validation)
  • Parsed a few GB of tweets to select all English-language tweets from the UK.
  • Used the 'qdap' package to analyze the emotion of the tweets.
  • Plotted the emotions over the day and over the week and analysed the interesting results.
  • Keywords(R, Twitter API, Time Series, Sentiment Analysis)