This is a repository of projects I have worked on or am currently working on. It is constantly updated. The projects are written in either R (R Markdown) or Python (Jupyter Notebook). The goal is to use data science and statistical modelling techniques to uncover something interesting. A typical project consists of finding and cleaning data, analysis, visualization, and conclusions.
- Plotted Bitcoin prices against S&P 500 prices and performed a Granger causality test.
- Fitted an ARIMA model to Bitcoin prices to forecast the range of Bitcoin price movements.
- Keywords(R, Time Series, Causality)
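The project itself was done in R (where packages such as `lmtest` provide `grangertest()`). As a language-neutral illustration, the following is a minimal Python sketch of what a Granger causality test computes: it compares a restricted regression of `y` on its own `p` lags with an unrestricted one that also includes `p` lags of `x`, and forms an F statistic from the two residual sums of squares. The function names, the lag order and the synthetic data are assumptions for illustration only. Raw prices are usually non-stationary, so in practice the test is normally applied to returns or differenced series; the synthetic series here are already stationary.

```python
import numpy as np

def lagged(series, p):
    """Columns are the series lagged by 1..p, aligned with series[p:]."""
    n = len(series)
    return np.column_stack([series[p - k : n - k] for k in range(1, p + 1)])

def granger_f_test(y, x, p):
    """F statistic for H0: lags of x add no predictive power for y beyond y's own lags."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[p:]
    ones = np.ones((len(target), 1))
    x_restricted = np.hstack([ones, lagged(y, p)])            # intercept + own lags
    x_unrestricted = np.hstack([x_restricted, lagged(x, p)])  # + lags of x
    rss = []
    for design in (x_restricted, x_unrestricted):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        rss.append(float(resid @ resid))
    rss_r, rss_u = rss
    df_num = p                                          # number of restrictions
    df_den = len(target) - x_unrestricted.shape[1]      # residual degrees of freedom
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den)

# Hypothetical data: y depends on yesterday's x, but x does not depend on y
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.normal(scale=0.5)

f_xy = granger_f_test(y, x, p=2)  # x's lags help predict y, so F should be large
f_yx = granger_f_test(x, y, p=2)  # no dependence in this direction, so F should be small
print(f_xy, f_yx)
```

A p-value can be obtained by comparing the statistic with an F(p, df_den) distribution, for example via `scipy.stats.f.sf`.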
- In this project, I tried to predict the winners of the US (2016) and UK (2017) elections as the results from each region became available.
- Polling data serves as the prior information, and the model is updated as each region's results come in.
- Monte Carlo simulation is used to simulate the election outcome and estimate each candidate's chance of winning.
- The predictions are then compared with exchange-rate fluctuations to see how closely the financial markets kept pace with the results.
- Keywords(Python, Linear Regression, Monte Carlo Simulation, Twitter API)
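The election model's Monte Carlo step can be sketched as follows, under simplifying assumptions that are not taken from the project: each undeclared region is won independently with a fixed, polling-derived probability, declared regions are fixed at their actual result, and the winner is whoever holds a majority of seats. The region names, seat counts and probabilities are hypothetical.

```python
import random

def simulate_win_probability(regions, declared, n_sims=10_000, seed=0):
    """Estimate P(candidate A wins a majority of seats) by Monte Carlo.

    regions:  dict name -> (seats, p_a_wins), p_a_wins taken from polling (hypothetical)
    declared: dict name -> True/False for regions whose result is already known
    """
    rng = random.Random(seed)
    total = sum(seats for seats, _ in regions.values())
    majority = total // 2 + 1
    # Seats A has already won in declared regions are fixed, not simulated
    fixed = sum(regions[name][0] for name, a_won in declared.items() if a_won)
    pending = [(s, p) for name, (s, p) in regions.items() if name not in declared]
    wins = 0
    for _ in range(n_sims):
        # Each pending region goes to A with its own probability
        seats_a = fixed + sum(s for s, p in pending if rng.random() < p)
        if seats_a >= majority:
            wins += 1
    return wins / n_sims

# Hypothetical map: 29 seats in total, so a majority is 15
regions = {"North": (10, 0.6), "South": (8, 0.4), "East": (6, 0.5), "West": (5, 0.7)}
before = simulate_win_probability(regions, declared={})
after_north = simulate_win_probability(regions, declared={"North": True})
print(before, after_north)
```

Re-running the simulation each time a region declares is what makes the estimate update through the night; in this toy map, A winning the North raises the estimated probability substantially.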
- Fitted power-law and log-normal distributions to the frequencies of US baby names since 1960 and compared the fits.
- Used bootstrapping to estimate the sampling distribution of the power-law parameters.
- Crawled Twitter to collect 20,000 random users and fitted power-law distributions to their friend and follower counts.
- Keywords(R, Power-law, Bootstrapping, Log-normal)
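The bootstrapping step can be illustrated with a small Python sketch (the project itself used R). It assumes a continuous power law with a known, fixed lower cutoff `xmin` and uses the standard maximum-likelihood estimate of the exponent, alpha = 1 + n / sum(ln(x_i / xmin)). The data are synthetic, drawn by inverse-transform sampling; a fuller analysis would typically also re-estimate `xmin` inside each bootstrap replicate and use a discrete estimator for integer counts.

```python
import math
import random

def powerlaw_alpha_mle(data, xmin):
    """Continuous power-law MLE for the exponent, using only observations >= xmin."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def bootstrap_alpha(data, xmin, n_boot=1000, seed=0):
    """Resample the data with replacement and re-estimate alpha on each resample."""
    rng = random.Random(seed)
    n = len(data)
    return [
        powerlaw_alpha_mle([data[rng.randrange(n)] for _ in range(n)], xmin)
        for _ in range(n_boot)
    ]

# Synthetic sample from a power law with alpha = 2.5 and xmin = 1:
# x = xmin * (1 - u) ** (-1 / (alpha - 1)) for u uniform on [0, 1)
rng = random.Random(42)
sample = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(2000)]

alpha_hat = powerlaw_alpha_mle(sample, xmin=1.0)
alphas = sorted(bootstrap_alpha(sample, xmin=1.0, n_boot=500))
# Percentile bootstrap interval from the 2.5% and 97.5% quantiles
ci = (alphas[int(0.025 * len(alphas))], alphas[int(0.975 * len(alphas))])
print(alpha_hat, ci)
```

The spread of `alphas` is the bootstrap estimate of the sampling distribution of the exponent, from which a percentile interval like `ci` can be read off directly.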
- Fitted a polynomial linear regression of wine quality on the wines' chemical properties.
- Used ridge and lasso regularization to tackle overfitting and compared the results.
- Used cross-validation to select the optimal regularization strength.
- Keywords(Python, Linear Regression, Ridge and Lasso Regularization, Cross Validation)
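A minimal NumPy sketch of choosing the ridge penalty by k-fold cross-validation is shown below. It is not the project's code: the polynomial expansion uses per-column powers only (no interaction terms), the features are standardized inside each fit with training-fold statistics, and the data are synthetic stand-ins for the chemical properties. Lasso has no closed-form solution and is usually fitted by coordinate descent (as scikit-learn's `Lasso` and `LassoCV` do), so only ridge is sketched here.

```python
import numpy as np

def poly_features(X, degree):
    """Stack element-wise powers 1..degree of each column (no interaction terms)."""
    return np.hstack([X ** d for d in range(1, degree + 1)])

def ridge_fit(X, y, lam):
    """Closed-form ridge on standardized features; the intercept is not penalized."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0)
    Z = (X - x_mean) / x_std
    yc = y - y.mean()
    w_z = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ yc)
    w = w_z / x_std                      # map coefficients back to original units
    b = y.mean() - x_mean @ w
    return w, b

def cv_mse(X, y, lam, k=5, seed=0):
    """Average held-out mean squared error over k folds for a given penalty."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w, b = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((X[test] @ w + b - y[test]) ** 2))
    return float(np.mean(errors))

# Hypothetical data: 3 "chemical" features, quality depends on x0 and x1 squared
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=200)

Xp = poly_features(X, degree=3)
lams = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
scores = {lam: cv_mse(Xp, y, lam) for lam in lams}
best = min(scores, key=scores.get)
print(best, scores[best])
```

Very large penalties shrink the coefficients towards zero and underfit, while very small ones let the higher-degree terms chase noise; cross-validation picks the value in between with the lowest held-out error.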
- Parsed a few GB of tweets to select all English-language tweets posted in the UK.
- Used the 'qdap' package to analyze the emotions expressed in the tweets.
- Plotted how these emotions vary over the course of the day and the week, and analysed the notable patterns.
- Keywords(R, Twitter API, Time Series, Sentiment Analysis)
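The "emotions over the day and over the week" step amounts to grouping per-tweet scores by time bucket and averaging. The Python sketch below assumes each tweet has already been reduced to a `(timestamp, score)` pair by an earlier scoring step (in the project this was done in R with 'qdap'); the tweets and scores shown are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def mean_by(tweets, key):
    """Average score per group, where key maps a timestamp to a group label."""
    sums, counts = defaultdict(float), defaultdict(int)
    for ts, score in tweets:
        group = key(ts)
        sums[group] += score
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in sorted(sums)}

# Hypothetical (timestamp, sentiment score) pairs
tweets = [
    (datetime(2017, 3, 6, 8, 15), 0.2),    # Monday morning
    (datetime(2017, 3, 6, 8, 45), -0.4),   # Monday morning
    (datetime(2017, 3, 11, 21, 5), 0.6),   # Saturday evening
    (datetime(2017, 3, 12, 21, 30), 0.4),  # Sunday evening
]

by_hour = mean_by(tweets, key=lambda ts: ts.hour)          # 0..23
by_weekday = mean_by(tweets, key=lambda ts: ts.weekday())  # 0 = Monday, 6 = Sunday
print(by_hour, by_weekday)
```

`datetime.weekday()` is used instead of a formatted day name so that the grouping does not depend on the system locale; the resulting dictionaries can be plotted directly as hourly and weekly profiles.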