For our final project, we developed models to predict the underpricing of IPOs (Initial Public Offerings). By collecting historical IPO data, we were able to train multiple machine learning models to classify the data. Our best model, a Random Forest classifier, achieved an accuracy of about 76%. All of our development was done in Python using Jupyter notebooks.
The libraries we used include:
- scikit-learn: used for training and classifying with our selected models, as well as for analyzing the results.
- pandas: used for general data handling, especially importing and combining data from .csv files.
- numpy: used for general data formatting and handling.
- pytorch: used to create a binary classification neural network.
- seaborn: visualization.
- ipython: development environment.
To install all of our project dependencies, run:

```shell
pip install -r requirements.txt
```
For data collection and formatting we used pandas and .csv files. All of our data can be found in the data folder, and each source that needed to be cleaned has an independent IPython notebook in the cleaning_scripts folder. We sourced our data from the following websites and applications.
- IPOScoop Data:
  - Source: IPOScoop.com
  - Notebook: scoop-data.ipynb
  - We converted the .xls file into a .csv file in the cleaning notebook.
  - IPOs from 2000 to 2020.
- Ticker Data:
  - Source: Ticker.com
  - Notebook: ticker-data.ipynb
  - Scraped directly from the website using the pandas read_html function.
  - Concatenated data from 2012 to 2022.
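The scraping step can be sketched as follows. The table markup and column names here are invented for illustration (the real page layout may differ), and the actual notebook reads the live page rather than a literal HTML string:

```python
from io import StringIO

import pandas as pd

# Tiny stand-in for one year's IPO table on the source page;
# the real column names and markup may differ.
html = """
<table>
  <tr><th>Symbol</th><th>Offer Price</th><th>First Day Close</th></tr>
  <tr><td>ABCD</td><td>15.00</td><td>18.20</td></tr>
  <tr><td>WXYZ</td><td>22.00</td><td>21.10</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> found
tables = pd.read_html(StringIO(html))
ipo_df = tables[0]

# Concatenating one DataFrame per year mirrors combining 2012-2022
combined = pd.concat([ipo_df, ipo_df], ignore_index=True)
print(combined.shape)  # (4, 3)
```

In practice each year's page is read into its own DataFrame and the results are stacked with `pd.concat` before cleaning.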
- Bloomberg Data:
  - Source: Bloomberg Terminal
  - Notebook: bloomberg-sector-cleaning.ipynb
  - Downloaded data from a terminal in the business library.
- FRED Data:
  - Source: St. Louis Federal Reserve Economic Data (FRED) database.
  - Used to source macroeconomic data on a specific industry and sector.
We trained all of our models using the following features:
- Sales - 1 Yr Growth
- Profit Margin
- Return on Assets
- Offer Size (M)
- Shares Outstanding (M)
- Offer Price
- Market Cap at Offer (M)
- Cash Flow per Share
- Instit Owner (% Shares Out)
- Instit Owner (Shares Held)
- Real GDP Per Capita
- OECD Composite Leading Indicator
- Interest Rate
- Seasonally Adjusted Unemployment Rate
- CPI Growth Rate
- Industry Sector
- Industry Group
- Industry Subgroup
- Underpriced (Classifying Feature)
These features were selected based on the features used in previous research, along with the data that was publicly available to us. Please reference our research paper for a definition of each feature.
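A rough sketch of how a feature matrix and label vector might be split out of the merged dataset with pandas; the rows and the abbreviated column set here are made up for illustration (the real data lives in the data folder):

```python
import pandas as pd

# Toy stand-in for the merged IPO dataset; real rows come from data/*.csv
df = pd.DataFrame({
    "Offer Price": [15.0, 22.0, 9.5],
    "Offer Size (M)": [120.0, 340.0, 55.0],
    "Profit Margin": [0.12, -0.03, 0.08],
    "Industry Sector": ["Tech", "Health", "Tech"],
    "Underpriced": [1, 0, 1],  # classifying feature
})

# One-hot encode the categorical industry column so numeric models can use it
features = pd.get_dummies(df.drop(columns=["Underpriced"]),
                          columns=["Industry Sector"])
labels = df["Underpriced"]
print(features.shape)  # (3, 5)
```

One-hot encoding is one common way to feed the categorical Industry Sector/Group/Subgroup columns to scikit-learn models.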
We implemented four machine-learning models that were identified by previous research done in the field. We utilized the sklearn library to implement the random forest, gradient boosting classifier, and support vector machine, and the pytorch library to implement a neural network. All of our models can be found in the models folder.
- Random Forest:
  - Notebook: random_forest_scikit.ipynb
  - Overall Accuracy: 76%
  - Underpriced Accuracy: 92.8%
  - Overpriced Accuracy: 16.6%
  - Implemented using the sklearn class RandomForestClassifier
- Gradient Boosting Classifier:
  - Notebook: gradient_boosting.ipynb
  - Overall Accuracy: 75.3%
  - Underpriced Accuracy: 94.9%
  - Overpriced Accuracy: 12.1%
  - Implemented using the sklearn class GradientBoostingClassifier
- Support Vector Machine:
  - Notebook: svm.ipynb
  - Overall Accuracy: 73.9%
  - Underpriced Accuracy: 100%
  - Overpriced Accuracy: 0%
  - Implemented using the sklearn svm module
- Neural Network:
  - Notebook: neural_network.ipynb
  - Overall Accuracy: 70.2%
  - Underpriced Accuracy: 88.8%
  - Overpriced Accuracy: 16.2%
  - Implemented using the pytorch library
To achieve 76% accuracy with the random forest model, we first analyzed several of the model's parameters. Specifically, we examined the results of every combination of the parameters listed below:
- n_estimators - The number of trees in the forest
- criterion - The function used to measure the quality of a split
- max_depth - The maximum allowed depth of each tree
- max_features - The number of features to consider when looking for the best split
After this analysis, the max_depth of the trees turned out to be the determining factor in the model's accuracy. The full process can be found in test_random_forest_model_config.ipynb.
For more information view our project paper. It goes into much greater detail about our problem space, algorithms, methods, and results.
- "Predicting IPO underperformance using machine learning" by Rachit Agrawal
- "Textual Information and IPO Underpricing: A Machine Learning Approach" by Apostolos Katsafados et al.
- "A Neural Network Model to Predict Initial Return of Chinese SMEs Stock Market Initial Public Offerings" by Dan Meng
- "Proceedings of the 2021 3rd International Conference on Economic Management and Cultural Industry" by Kelai Wang