Implement Random Tree Learner, Bag Learner and Insane Learner. Evaluate all the learners implemented in this project as well as those in Part 1.
This is Part 2 in a four-part series of Machine Learning Algorithms for Trading:
- Part 1: Implement the Linear Regression Learner and Decision Tree Learner and generate data that works better for one learner than the other.
- Part 2 (this repository): Implement Random Tree Learner, Bag Learner and Insane Learner. Evaluate all the learners implemented in Parts 1 and 2.
- Part 3: Implement the Q-Learning and Dyna-Q solutions to the reinforcement learning problem.
- Part 4: Implement a learning trading agent using Q-learning.
Code in this project can be grouped into two categories:
1) Implement decision tree learner, random tree learner, bag learner and insane learner
- DTLearner.py
  - The Decision Tree Learner is based on J.R. Quinlan's paper. Other than addEvidence and query, this learner also has:
    - __build_tree: A private function called by addEvidence. It builds the decision tree recursively by choosing the best feature to split on and the splitting value. The best feature is the one with the highest absolute correlation with dataY. If all features have the same absolute correlation, the first feature is chosen. The splitting value is the median of the data along the best feature.
    - __tree_search(self, point, row): A private function called by query. It recursively searches the decision tree matrix and returns a predicted value for a given query point.
    - get_learner_info: Prints the tree as a pandas DataFrame if verbose is set to True.
- RTLearner.py
  - The Random Tree Learner is based on A. Cutler's algorithm. It has a similar API to DTLearner, but differs in how __build_tree works:
    - The feature to split on is chosen randomly.
    - For the chosen feature, the splitting value is the mean of the feature values from two randomly chosen rows.
- BagLearner.py
  - Code that implements Bootstrap Aggregating as a Python class named BagLearner. BagLearner can accept any learner (e.g., RTLearner, LinRegLearner, etc.) as input and use it to generate a learner ensemble.
- InsaneLearner.py
  - An InsaneLearner should contain 20 BagLearner instances, where each instance is composed of 20 instances of LinRegLearner or another learner.
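The two split rules described for DTLearner and RTLearner can be sketched side by side. This is a minimal illustration, not the code in DTLearner.py or RTLearner.py: the standalone function names, the four-column tree matrix layout, the LEAF marker, the leaf_size parameter, and the fallback to a leaf when a split separates nothing are all assumptions made for the example.

```python
import numpy as np

LEAF = -1  # assumed marker in the feature column that flags a leaf row


def dt_split(data_x, data_y, rng):
    """DTLearner rule: highest |correlation| with Y, split at the median."""
    with np.errstate(invalid="ignore", divide="ignore"):
        corrs = [abs(np.corrcoef(data_x[:, i], data_y)[0, 1])
                 for i in range(data_x.shape[1])]
    # A constant column yields NaN; treat it as zero correlation.
    # argmax returns the first maximum, so ties go to the first feature.
    feat = int(np.argmax(np.nan_to_num(corrs)))
    return feat, float(np.median(data_x[:, feat]))


def rt_split(data_x, data_y, rng):
    """RTLearner rule: random feature, mean of its values in two random rows."""
    feat = int(rng.integers(data_x.shape[1]))
    r1, r2 = rng.integers(data_x.shape[0], size=2)
    return feat, float((data_x[r1, feat] + data_x[r2, feat]) / 2.0)


def build_tree(data_x, data_y, split_rule, rng, leaf_size=1):
    """Build rows of [feature, split value, left offset, right offset]."""
    if data_x.shape[0] <= leaf_size or np.all(data_y == data_y[0]):
        return np.array([[LEAF, data_y.mean(), np.nan, np.nan]])
    feat, split_val = split_rule(data_x, data_y, rng)
    left = data_x[:, feat] <= split_val
    if left.all() or not left.any():
        # Assumed fallback: the split separates nothing, so stop with a leaf.
        return np.array([[LEAF, data_y.mean(), np.nan, np.nan]])
    left_tree = build_tree(data_x[left], data_y[left], split_rule, rng, leaf_size)
    right_tree = build_tree(data_x[~left], data_y[~left], split_rule, rng, leaf_size)
    # Offsets are relative to the current row: the left subtree starts on the
    # next row, the right subtree starts after the whole left subtree.
    root = np.array([[feat, split_val, 1, left_tree.shape[0] + 1]])
    return np.vstack((root, left_tree, right_tree))


def tree_search(tree, point, row=0):
    """Walk the tree matrix recursively and return the leaf value."""
    feat, split_val, left, right = tree[row]
    if feat == LEAF:
        return split_val
    step = left if point[int(feat)] <= split_val else right
    return tree_search(tree, point, row + int(step))


# Tiny made-up data set: feature 0 drives Y, feature 1 is uncorrelated with Y.
data_x = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 5.0], [4.0, 3.0]])
data_y = np.array([10.0, 10.0, 20.0, 20.0])
rng = np.random.default_rng(0)

dt_tree = build_tree(data_x, data_y, dt_split, rng)
print(tree_search(dt_tree, np.array([1.5, 4.0])),
      tree_search(dt_tree, np.array([3.5, 4.0])))  # 10.0 20.0
```

Passing rt_split instead of dt_split to build_tree gives the random variant with no other change, which mirrors the README's point that the two learners share an API and differ only in how the split is chosen.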
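The ensemble learners can be sketched in the same spirit. The addEvidence and query method names come from the learner API mentioned above; everything else here is an assumption for illustration, including the constructor parameters (learner, kwargs, bags, seed), the choice to average member predictions, and the small LinRegLearner stand-in that replaces the Part 1 class so the example is self-contained.

```python
import numpy as np


class LinRegLearner:
    """Minimal stand-in for the Part 1 learner: least-squares linear fit."""

    def addEvidence(self, dataX, dataY):
        # Append a column of ones so the last coefficient is the intercept.
        design = np.hstack((dataX, np.ones((dataX.shape[0], 1))))
        self.coefs = np.linalg.lstsq(design, dataY, rcond=None)[0]

    def query(self, points):
        return points @ self.coefs[:-1] + self.coefs[-1]


class BagLearner:
    """Bootstrap aggregating over any learner with addEvidence/query."""

    def __init__(self, learner, kwargs=None, bags=20, seed=None):
        self.learners = [learner(**(kwargs or {})) for _ in range(bags)]
        self.rng = np.random.default_rng(seed)

    def addEvidence(self, dataX, dataY):
        n = dataX.shape[0]
        for member in self.learners:
            # Each member trains on n rows drawn with replacement.
            rows = self.rng.integers(n, size=n)
            member.addEvidence(dataX[rows], dataY[rows])

    def query(self, points):
        # Assumed aggregation: the mean of the member predictions.
        return np.mean([m.query(points) for m in self.learners], axis=0)


class InsaneLearner:
    """20 BagLearner instances, each holding 20 base learners."""

    def __init__(self, learner=LinRegLearner, seed=0):
        self.bags = [BagLearner(learner, bags=20, seed=seed + i)
                     for i in range(20)]

    def addEvidence(self, dataX, dataY):
        for bag in self.bags:
            bag.addEvidence(dataX, dataY)

    def query(self, points):
        return np.mean([bag.query(points) for bag in self.bags], axis=0)


# Made-up, exactly linear data: every bootstrap fit recovers the same line.
rng = np.random.default_rng(42)
train_x = rng.normal(size=(30, 2))
train_y = 2.0 * train_x[:, 0] - train_x[:, 1] + 1.0
test_x = rng.normal(size=(5, 2))

insane = InsaneLearner()
insane.addEvidence(train_x, train_y)
predictions = insane.query(test_x)
```

Because BagLearner only relies on addEvidence and query, any learner exposing those two methods could be passed in place of the stand-in.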
2) Evaluate learners
- analyze_learners_util.py
  - Helper code to process, train, test and plot data.
- analyze_learners.ipynb
  - Uses helper functions from analyze_learners_util.py to evaluate different learners.
You need Python 2.7.x or 3.x, and the following packages: pandas, numpy, and scipy.
All the data files are in the Data subdirectory. However, only Istanbul.csv is used to analyze the learners in the notebook analyze_learners.ipynb. This data includes the returns of multiple worldwide indexes for a number of days in history. The overall objective is to predict the return of the MSCI Emerging Markets (EM) index on the basis of the other index returns. Y in this case is the last column to the right, and the X values are the remaining columns to the left (except the first column, which is the date).
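The column layout described above (date first, features in the middle, target last) can be separated into X and Y as in the sketch below. It is not taken from analyze_learners_util.py: the function name is hypothetical, and the in-memory sample, including its header names and values, is made up so the example runs without the data file.

```python
import io

import numpy as np


def split_features_target(csv_file):
    """Return (X, Y) from a CSV with a header row and a leading date column."""
    # The date strings cannot be parsed as floats and become NaN; that
    # column is dropped by the slicing below.
    data = np.genfromtxt(csv_file, delimiter=",", skip_header=1)
    data_x = data[:, 1:-1]  # every column between the date and the target
    data_y = data[:, -1]    # last column: the EM index return
    return data_x, data_y


# Made-up rows that only mimic the layout of the real file.
sample = io.StringIO(
    "date,ISE,SP,DAX,EM\n"
    "5-Jan-09,0.03,-0.004,0.002,0.01\n"
    "6-Jan-09,0.02,0.007,0.008,0.02\n"
    "7-Jan-09,-0.01,-0.03,-0.01,-0.02\n"
)

data_x, data_y = split_features_target(sample)
print(data_x.shape, data_y.shape)  # (3, 3) (3,)
```

The same function accepts a file path in place of the in-memory sample, since np.genfromtxt reads from either.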
To run any script file, use:
python <script.py>
To run any IPython Notebook, use:
jupyter notebook <notebook_name.ipynb>
Source: Part 3 of Machine Learning for Trading by Georgia Tech