# BanditLib: a simple multi-armed bandit library
.-. .-') ('-. .-') _ _ .-') _ .-') _ .-. .-')
\ ( OO ) ( OO ).-. ( OO ) )( ( OO) ) ( OO) ) \ ( OO )
;-----.\ / . --. /,--./ ,--,' \ .'_ ,-.-') / '._ ,--. ,-.-') ;-----.\
| .-. | | \-. \ | \ | |\ ,`'--..._) | |OO)|'--...__)| |.-') | |OO)| .-. |
| '-' /_).-'-' | || \| | ) | | \ ' | | \'--. .--'| | OO ) | | \| '-' /_)
| .-. `. \| |_.' || . |/ | | ' | | |(_/ | | | |`-' | | |(_/| .-. `.
| | \ | | .-. || |\ | | | / : ,| |_.' | | (| '---.',| |_.'| | \ |
| '--' / | | | || | \ | | '--' /(_| | | | | |(_| | | '--' /
`------' `--' `--'`--' `--' `-------' `--' `--' `------' `--' `------'
1. About
2. Environment
3. Quick run
4. Misc
## 1. About

This is a C++ package for multi-armed bandit simulations. The package is designed to be:
- Simple: easy to understand and extend, though not optimized for speed.
- Independent: it requires no external libraries.
- Arms:
    - Bernoulli (binary) and normal reward distributions are implemented.
- Policies (a minimal sketch of an index policy follows this list):
    - DMED for binary rewards [1]
    - Epsilon-Greedy
    - KL-UCB [2]
    - MOSS [3]
    - Thompson sampling for binary rewards [4]
    - UCB [5]
    - UCB-V [6]
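As a flavor of what a policy implements, here is a minimal sketch of the arm selection rule of UCB [5], written against hypothetical state (empirical reward sums and pull counts). This is an illustration of the algorithm, not BanditLib's actual class interface:

```cpp
// Minimal UCB [5] arm selection: pick the arm maximizing
// empirical mean + sqrt(2 ln t / n_i). Hypothetical helper,
// not the library's actual interface.
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

std::size_t ucbChoice(const std::vector<double>& sumRewards,
                      const std::vector<int>& pulls, int t) {
  std::size_t best = 0;
  double bestIndex = -std::numeric_limits<double>::infinity();
  for (std::size_t i = 0; i < sumRewards.size(); ++i) {
    if (pulls[i] == 0) return i;  // play every arm once first
    double mean = sumRewards[i] / pulls[i];
    double bonus = std::sqrt(2.0 * std::log(static_cast<double>(t)) / pulls[i]);
    if (mean + bonus > bestIndex) {
      bestIndex = mean + bonus;
      best = i;
    }
  }
  return best;
}
```

The other index policies above mainly differ in the bonus term (e.g., variance-aware in UCB-V [6]), while Thompson sampling [4] replaces the index with a posterior draw.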
## 2. Environment

This program targets a Linux/GNU C++ environment; Windows and Mac OS X are untested. More specifically, it depends on:
- C++0x: a modern C++ compiler (preferably GNU C++ (g++))
- waf (included) [7]: build script
- cmdline.h (included) [8]: command line parser
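For reference, here is a minimal sketch of how cmdline.h [8] is typically used to parse an option such as -r. It follows the parser's documented interface, but the option set shown is illustrative; the options actually defined in this package's main may differ:

```cpp
// Sketch of command line parsing with cmdline.h [8]; the "runs"
// option here is illustrative, not necessarily the package's own.
#include <iostream>
#include "cmdline.h"

int main(int argc, char* argv[]) {
  cmdline::parser p;
  // name, short name, description, mandatory?, default value
  p.add<int>("runs", 'r', "number of simulation runs", false, 1);
  p.parse_check(argc, argv);  // prints usage and exits on error
  std::cout << "runs = " << p.get<int>("runs") << std::endl;
  return 0;
}
```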
## 3. Quick run

Type

    ./compile
    ./build/main -r 10

to run 10 simulation runs. The results of the runs are written to out/example1.txt.
This package also includes a simple plot tool (simpleplot.py), which depends on Python/Matplotlib. If your environment is g++/Python ready, try

    ./example.sh
## 4. Misc

The implementation of the beta distribution sampler is from [9]. The logo was generated using [10].
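For context, a Beta(a, b) variate can be drawn from two independent gamma variates: if X ~ Gamma(a, 1) and Y ~ Gamma(b, 1), then X / (X + Y) ~ Beta(a, b). Below is a minimal C++0x sketch of this standard construction; the function name is ours, and the packaged code from [9] may differ in details:

```cpp
// Beta(a, b) sampler via the gamma-ratio construction.
// Illustrative sketch; not necessarily identical to the packaged code.
#include <random>

double sampleBeta(double a, double b, std::mt19937& rng) {
  std::gamma_distribution<double> gammaA(a, 1.0);  // shape a, scale 1
  std::gamma_distribution<double> gammaB(b, 1.0);  // shape b, scale 1
  double x = gammaA(rng);
  double y = gammaB(rng);
  return x / (x + y);  // in (0, 1), distributed as Beta(a, b)
}
```

Such a sampler is exactly what Thompson sampling for binary rewards [4] needs: under a uniform prior, one Beta(successes + 1, failures + 1) draw per arm at every round.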
## References

[1] J. Honda and A. Takemura: An asymptotically optimal policy for finite support models in the multiarmed bandit problem. Machine Learning 85(3), 2011, pp. 361-391.
[2] Aurélien Garivier and Olivier Cappé: The KL-UCB algorithm for bounded stochastic bandits and beyond. COLT 2011, pp. 359-376.
[3] J.-Y. Audibert and S. Bubeck: Minimax policies for adversarial and stochastic bandits. Proceedings of the 22nd Annual Conference on Learning Theory (COLT), 2009.
[4] William R. Thompson: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3-4), 1933, pp. 285-294.
[5] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47, 2002, pp. 235-256.
[6] J.-Y. Audibert, R. Munos, and Cs. Szepesvári: Exploration-exploitation trade-off using variance estimates in multi-armed bandits. Theoretical Computer Science 410(19), 2009, pp. 1876-1902.
[7] Waf - The meta build system: https://code.google.com/p/waf/
[8] Hideyuki Tanaka: cmdline. https://github.com/tanakh/cmdline
[9] Joseph Mansfield: A comment on Stack Overflow. http://stackoverflow.com/questions/15165202/random-number-generator-with-beta-distribution
[10] Text to ASCII Art Maker: http://patorjk.com/software/taag/
## Author

Junpei Komiyama (junpei.komiyama atmark gmail.com)
This software is released under the MIT License, see LICENSE.txt.