Commit
The previous version used random sampling with duplicates (sampling with replacement) to select examples from the dataset. The algorithm calls for random sampling of m examples without duplicates, because duplicates make the results unstable: the effect of some examples is amplified while others are ignored.

Since the current code sets m to n_samples (all examples), random sampling serves no purpose: every example is used anyway. The order in which samples are processed doesn't matter either, so I removed the random sampling and let the code loop through all examples.

In the future I can add an argument m and use a proper random subset of examples, as described in the article, but for now I did it like this. The algorithm is now stable.
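
To illustrate the difference described above, here is a minimal sketch in Python. It is not the code from this commit; the function names and the `seed` parameter are hypothetical and exist only for illustration. It counts how often each example index is visited under each strategy.

```python
import random
from collections import Counter


def visits_with_replacement(n_samples, m, seed=0):
    """Old behaviour (assumed): draw m indices WITH duplicates.

    Some indices can be drawn several times and others never.
    """
    rng = random.Random(seed)
    return Counter(rng.choices(range(n_samples), k=m))


def visits_without_replacement(n_samples, m, seed=0):
    """Behaviour the algorithm calls for: m distinct indices (m <= n_samples)."""
    rng = random.Random(seed)
    return Counter(rng.sample(range(n_samples), k=m))


def visits_full_loop(n_samples):
    """Behaviour after this change: visit every index exactly once, in order."""
    return Counter(range(n_samples))


if __name__ == "__main__":
    n = 10
    # With replacement, the visit counts vary from run to run (seed to seed).
    print(sorted(visits_with_replacement(n, n).items()))
    # With m == n_samples, sampling without replacement is just a permutation,
    # so the visit counts match a plain loop over all examples.
    print(visits_without_replacement(n, n) == visits_full_loop(n))
```

When m equals n_samples and the processing order does not affect the result, the plain loop gives the same visit counts as sampling without replacement, which is why the sampling step could be dropped. A future m argument would correspond to calling the second function with m < n_samples.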