Predicting Short-term IPO Returns

For our final project, we developed models to predict the underpricing of Initial Public Offerings (IPOs). Using historical IPO data, we trained several machine learning models to classify each offering as underpriced or overpriced. Our best model, a Random Forest, achieved an overall accuracy of about 76%. All of our development was done in Python using Jupyter notebooks.

Installation

Libraries

The libraries we used include:

  • scikit-learn: Used to train and evaluate our selected models and to analyze the results.
  • pandas: Used for general data handling, especially importing and combining data from CSV files.
  • numpy: Used for general data formatting and handling.
  • PyTorch: Used to create a binary classification neural network.
  • seaborn: Used for visualization.
  • IPython: Development environment.

Dependencies

To install all of our project dependencies, run:

pip install -r requirements.txt

Data Collection

For data collection and formatting, we used pandas and .csv files. All of our data can be found in the data folder, and each source that needed cleaning has its own IPython notebook in the cleaning_scripts folder. The following websites and applications are where we sourced our data for the project.

Features

We train all our models using the following features:

  • Sales - 1 Yr Growth
  • Profit Margin
  • Return on Assets
  • Offer Size (M)
  • Shares Outstanding (M)
  • Offer Price
  • Market Cap at Offer (M)
  • Cash Flow per Share
  • Instit Owner (% Shares Out)
  • Instit Owner (Shares Held)
  • Real GDP Per Capita
  • OECD Composite Leading Indicator
  • Interest Rate
  • Seasonally Adjusted Unemployment Rate
  • CPI Growth Rate
  • Industry Sector
  • Industry Group
  • Industry Subgroup
  • Underpriced (target label for classification)

These features were selected based on those used in previous research and on the data that was publicly available to us. Please reference our research paper for a definition of each feature.
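As a rough illustration of how a table with these columns could be turned into model inputs, the sketch below one-hot encodes a categorical industry column and separates the `Underpriced` label. The tiny inline CSV, its values, and the helper name `split_features_and_label` are invented for illustration; the preprocessing in the project notebooks may differ.

```python
import io

import pandas as pd

# Hypothetical three-row sample; the values are invented for illustration only.
SAMPLE_CSV = """Offer Price,Offer Size (M),Industry Sector,Underpriced
18.0,250.0,Technology,1
12.5,90.0,Financial,0
22.0,400.0,Technology,1
"""

CATEGORICAL_COLUMNS = ["Industry Sector"]
LABEL_COLUMN = "Underpriced"


def split_features_and_label(df):
    """Return (X, y): one-hot encoded features and the binary label."""
    y = df[LABEL_COLUMN].astype(int)
    # Drop the label, then expand each categorical column into indicator columns.
    X = pd.get_dummies(df.drop(columns=[LABEL_COLUMN]), columns=CATEGORICAL_COLUMNS)
    return X, y


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
X, y = split_features_and_label(df)
print(X.columns.tolist())
print(y.tolist())
```

One-hot encoding is only one option for the industry columns; tree-based models can also work with integer-coded categories.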

Models

We implemented four machine learning models identified by previous research in the field. We used the sklearn library to implement the random forest, gradient boosting classifier, and support vector machine, and the pytorch library to implement a neural network. All of our models can be found in the models folder.

  • Random Forest:
    • Notebook: random_forest_scikit.ipynb
    • Overall Accuracy: 76%
      • Underpriced Accuracy: 92.8%
      • Overpriced Accuracy: 16.6%
    • Implemented using the sklearn class RandomForestClassifier
  • Gradient Boosting Classifier:
    • Notebook: gradient_boosting.ipynb
    • Overall Accuracy: 75.3%
      • Underpriced Accuracy: 94.9%
      • Overpriced Accuracy: 12.1%
    • Implemented using the sklearn class GradientBoostingClassifier
  • Support Vector Machine:
    • Notebook: svm.ipynb
    • Overall Accuracy: 73.9%
      • Underpriced Accuracy: 100%
      • Overpriced Accuracy: 0%
    • Implemented using the sklearn svm module
  • Neural Network:
    • Notebook: neural_network.ipynb
    • Overall Accuracy: 70.2%
      • Underpriced Accuracy: 88.8%
      • Overpriced Accuracy: 16.2%
    • Implemented using the pytorch library
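
Assuming the per-class figures above are the share of truly underpriced (or overpriced) IPOs that a model labels correctly, they can be computed as in the sketch below. The helper `per_class_accuracy` and the labels are made up for illustration. The example uses a classifier that always predicts "underpriced": it scores 100% on one class and 0% on the other, and its overall accuracy equals the share of underpriced IPOs in the test set.

```python
def per_class_accuracy(y_true, y_pred):
    """Map each true class to the fraction of its samples predicted correctly."""
    totals, correct = {}, {}
    for truth, pred in zip(y_true, y_pred):
        totals[truth] = totals.get(truth, 0) + 1
        correct[truth] = correct.get(truth, 0) + (truth == pred)
    return {label: correct[label] / totals[label] for label in totals}


# Made-up test set: 1 = underpriced, 0 = overpriced (3 of 4 are underpriced).
y_true = [1, 1, 1, 0]
# A degenerate classifier that always predicts the majority class.
y_pred = [1, 1, 1, 1]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
by_class = per_class_accuracy(y_true, y_pred)
print(overall)   # 0.75
print(by_class)  # {1: 1.0, 0: 0.0}
```

Read this way, the SVM's 100% / 0% split is what a model produces when it predicts every IPO as underpriced, so overall accuracy alone can overstate how well a model separates the two classes.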

Random Forest Model Configuration

To reach 76% accuracy with the random forest model, we first analyzed several of the model's parameters. Specifically, we examined the results of every combination of the parameters listed below:

  • n_estimators - The number of trees in the forest
  • criterion - The function used to measure the quality of a split
  • max_depth - The maximum allowed depth of each tree
  • max_features - The number of features to consider when looking for the best split

After analysis, the max_depth of the trees turned out to be the determining factor in a model's accuracy. The full process can be found in test_random_forest_model_config.ipynb.
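
An exhaustive search over these four parameters can be sketched with scikit-learn's `GridSearchCV`. The grid values, the synthetic data from `make_classification`, and the 3-fold cross-validation are placeholders chosen for illustration, not the settings used in test_random_forest_model_config.ipynb.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the IPO feature matrix (assumption: the real data differs).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Placeholder grid; every combination of these values is evaluated.
param_grid = {
    "n_estimators": [10, 50],
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 5, None],
    "max_features": ["sqrt", "log2"],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X, y)

print(len(search.cv_results_["params"]))  # 2 * 2 * 3 * 2 = 24 combinations
print(search.best_params_)
```

Grouping `search.cv_results_` by a single parameter, such as `max_depth`, is one way to see which setting drives most of the variation in accuracy.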

Final Project Paper

For more information, see our project paper. It goes into much greater detail about our problem space, algorithms, methods, and results.

References

Research Papers

FRED Data