/* This file is part of rl-lib
*
* Copyright (C) 2010, Supelec
*
* Author : Herve Frezza-Buet and Matthieu Geist
*
* Contributor :
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License (GPL) as published by the Free Software Foundation; either
* version 3 of the License, or any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public
* License along with this library; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* Contact : [email protected] [email protected]
*
*/
#pragma once
#include <gsl/gsl_vector.h>
#include <cmath>
#include <rlAlgo.hpp>
#include <rlEpisode.hpp>
#include <rlException.hpp>
// #include <rlKTD.hpp>
// #include <rlLSTD.hpp>
// #include <rlMLP.hpp>
// #include <rlOffPAPI.hpp>
#include <rlPolicy.hpp>
#include <rlQLearning.hpp>
// #include <rlSARSA.hpp>
#include <rlTD.hpp>
// #include <rlActorCritic.hpp>
#include <rlTypes.hpp>
#include <rl_range_control.hpp>
// #include <rl-garnet.hpp>
/**
* @example example-000-000-overview.cc
*/
/**
* @example example-000-001-simulator.cc
*/
/**
* @example example-000-002-learning.cc
*/
/**
* @example example-000-003-agents.cc
*/
/**
* @example example-001-001-cliff-walking-sarsa.cc
*/
/**
* @example example-001-002-cliff-walking-qlearning.cc
*/
/**
* @example example-002-001-boyan-lstd.cc
*/
/**
* @example example-002-002-pendulum-lspi.cc
*/
/**
* @example example-003-001-pendulum-ktdq.cc
*/
/**
* @example example-003-002-pendulum-mlp-ktdq.cc
*/
/**
* @example example-003-003-mountain-car-ktdsarsa.cc
*/
/**
* @example example-004-001-cliff-onestep.cc
*/
/**
* @example example-004-002-cliff-eligibility.cc
*/
/**
* @example example-defs-transition.hpp
*/
/**
* @example example-defs-tabular-cliff.hpp
*/
/**
* @example example-defs-cliff-experiments.hpp
*/
/**
* @example example-defs-pendulum-architecture.hpp
*/
/**
* @example example-defs-test-iteration.hpp
*/
/**
* @example example-defs-ktdq-experiments.hpp
*/
/**
* @example example-defs-mountain-car-architecture.hpp
*/
/**
* @mainpage
*
* @section Overview
*
* The rl library is not a framework into which you plug your own
* algorithms by complying with predefined interfaces. It is rather a
* set of tools that helps you design your work or experiment from
* scratch. The main function is yours, and you are responsible for
* scheduling everything from it and for creating every object that
* you need.
*
* In such a design, the library offers ready-to-use algorithms and
* types written as templates. Using a template is equivalent to
* asking some programmer to write code for you that is dedicated to
* your application.
*
* This documentation contains both a reference manual and a user
* manual. The reference manual is given by the Doxygen structure of
* class names, as usual. The user manual is a better way to get
* familiar with the library; here, it consists of the set of
* examples. You can start by reading them, <b>in the suggested
* order</b> (examples all have a number).
*
* The use of templates may be considered as adding programming
* complexity. The point is that this kind of genericity, based on a
* rewriting mechanism at compile time that writes code for you,
* makes the design close to the mathematics. The cost is that you
* spend time making sure that you fit the requirements when you use
* some rl object. If you don't, you will get a very complicated
* compilation error message. This is clearly the drawback of the use
* of templates. Nevertheless, once it compiles, the code you get is
* quite safe. Our philosophy is that fixing compilation errors is a
* finite process, as opposed to bug fixing.
*
* @section tailor Tailoring your code with typedefs
*
* As you will see when browsing the examples, they often contain a
* list of typedefs. This is a convenient way to cope with the quite
* complicated types generated by templates. For example:
*
* @code
typedef rl::problem::mountain_car::DefaultParam mcParam;
typedef rl::problem::mountain_car::Simulator<mcParam> Simulator;
typedef Simulator::action_type A;
* @endcode
* So when you write afterward:
* @code
A optimal_action;
* @endcode
* It is as if you had written:
* @code
rl::problem::mountain_car::Simulator<rl::problem::mountain_car::DefaultParam>::action_type optimal_action;
* @endcode
* This raises a problem when a compilation error occurs, since the
* error message displays the fully expanded version of your types.
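*
* The aliasing can be checked on a small, self-contained sketch. The
* demo namespace and its DefaultParam and Simulator types are
* hypothetical stand-ins written for this illustration; they are not
* the actual rl::problem::mountain_car types.
*
```cpp
#include <type_traits>

// Hypothetical stand-ins, NOT the real rl::problem::mountain_car types.
namespace demo {
  struct DefaultParam {};

  template<typename PARAM>
  class Simulator {
  public:
    typedef PARAM param_type;
    typedef int   action_type; // the nested type that user code refers to
  };
}

typedef demo::DefaultParam       mcParam;
typedef demo::Simulator<mcParam> Simulator;
typedef Simulator::action_type   A;

// A is exactly the same type as the fully spelled-out name: a typedef
// introduces an alias, not a new type.
static_assert(std::is_same<A, demo::Simulator<demo::DefaultParam>::action_type>::value,
              "a typedef is an alias, not a new type");

// Returns an action declared through the short alias.
inline A default_action(void) {return A(0);}
```
*
* Since A is only an alias, the compiler is free to print the long
* name in its diagnostics, which is what the remark above refers to.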
*
* @section concept The use of concepts
*
* There is no clean support for concepts in the version of C++ that
* we use. In order to help designers, we have made concepts explicit
* as classes that are only meant to be documented here and that
* are never used in the code. They are gathered in the rl::concept
* namespace. The convention is the following. If you need an rl
* template whose documentation is like this:
*
* @code
namespace rl {
  template<typename STUFF, typename SA_FOO, typename SA_BAR>
  class DummyAlgorithm {
  public:
    double computeResult(void);
  };
}
* @endcode
*
* You have to search the documentation for the concepts
* rl::concept::Stuff, rl::concept::sa::Foo and rl::concept::sa::Bar, as
* the names of the formal template parameters of DummyAlgorithm
* suggest. Let us suppose that you find this:
*
* @code
namespace rl {
  namespace concept {
    template<typename ANY>
    class Stuff {
    public:
      typedef ANY any_type;
      void interpret(any_type& a);
    };
    namespace sa {
      template<typename VALUE>
      class FooBase {
      public:
        typedef VALUE value_type;
        value_type get(void);
      };
      template<typename VALUE>
      class Foo : public FooBase<VALUE> {
      public:
        // value_type is a dependent name, so it is redeclared here.
        typedef typename FooBase<VALUE>::value_type value_type;
        void set(const value_type& v);
      };
      class Bar {
      public:
        static int size(void);
      };
    }
  }
}
* @endcode
*
* It does not mean at all that you have to inherit from the previous
* classes in order to provide type parameters to the DummyAlgorithm
* class. Rather, it means that you have to design a class that
* <b>conforms</b> to the concept classes. Let us make a class that
* fits all three rl::concept::Stuff, rl::concept::sa::Foo and
* rl::concept::sa::Bar concepts. You just have to copy-paste from the
* concept documentation.
*
* @code
class ThreeInOne {
public:
  // This fits rl::concept::Stuff<std::string>
  typedef std::string any_type;
  void interpret(any_type& a) {
    // your code here
  }

  // This fits rl::concept::sa::Foo<int>... and
  // rl::concept::sa::FooBase<int> since Foo inherits
  // from FooBase
  typedef int value_type;
  value_type get(void) {
    // your code here
  }
  void set(const value_type& v) {
    // your code here
  }

  // This fits rl::concept::sa::Bar
  static int size(void) {
    // your code here
  }
};
* @endcode
*
* Once this ThreeInOne class is defined, it can be used as a type
* parameter for the three slots in the DummyAlgorithm template, since
* it fits the three requirements.
*
* @code
typedef rl::DummyAlgorithm<ThreeInOne,ThreeInOne,ThreeInOne> MyAlgo;
...
MyAlgo algo;
double res = algo.computeResult();
* @endcode
*
* Fitting the concepts ensures that your code will compile. It
* also induces very strong type checking, which may be annoying at
* compile time if you do not fit the concepts perfectly, but which
* brings a lot of safety at run time.
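*
* The same mechanism can be tried on a toy scale with the sketch
* below. Accumulator, Counter and accumulate_twice are hypothetical
* names that do not exist in the library. The template only assumes
* that its parameter offers a value_type typedef and get/set methods,
* and no inheritance is involved.
*
```cpp
// Hypothetical template, not part of rl: FOO is expected to fit a
// Foo-like concept, i.e. to provide value_type, get() and set().
template<typename FOO>
class Accumulator {
private:
  FOO& foo;
public:
  Accumulator(FOO& f) : foo(f) {}
  // Adds v to the value stored in foo, using only get() and set().
  void add(const typename FOO::value_type& v) {foo.set(foo.get() + v);}
};

// A user class that fits the requirements without inheriting from
// any concept class.
class Counter {
private:
  int count;
public:
  typedef int value_type;
  Counter(void) : count(0) {}
  value_type get(void) const    {return count;}
  void set(const value_type& v) {count = v;}
};

// Uses Counter as the type parameter and returns the accumulated value.
inline int accumulate_twice(int v) {
  Counter c;
  Accumulator<Counter> acc(c);
  acc.add(v);
  acc.add(v);
  return c.get();
}
```
*
* If set is removed from Counter, the call to acc.add no longer
* compiles: the mismatch is reported at compile time, when the
* template is instantiated, rather than at run time.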
*
*
* @section functional An intensive use of C++11 function tools
*
* There are only a few concepts in the library (since version
* 3.00.00). They are used mainly for the definition of
* simulators. Fitting a concept often requires defining wrapper
* classes, in order to make pre-existing code elements compatible with
* the required concepts. This is why the rl design is rather based on
* functions, as the examples show. Lambda functions and bindings are
* widely used in the examples, since they provide a powerful and
* compact way to wrap things.
*
* This is an example of the use of bindings.
* @code
double q_param(const Param& theta, S s, A a) {
// compute q_\theta(s,a)
}
using namespace std::placeholders; // defines _1,_2,...
Param p;
// A Q-function takes two arguments, not three. We can get a
// Q-function by binding the first parameter of q_param to p.
auto q1 = std::bind(q_param,p,_1,_2); // q1(x,y) = q_param(p,x,y).
// The same can be done from a lambda function as well.
auto q2 = [&p](S s, A a) -> double {return q_param(p,s,a);};
S s0;
// If we want to get the best action from s0, we need an action
// iterator (got from an array here) and a binding.
std::array<A,NB_ACTIONS> actions = {{action1,action2,action3,...}};
auto a_q_pair = rl::argmax(std::bind(q1,s0,_1), // this is f(x) = q1(s0,x).
actions.begin(), actions.end());
auto best_a = a_q_pair.first;
* @endcode
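*
* For experimenting without the library, here is a self-contained
* variant of the snippet above, given as a sketch only. Param, S, A,
* the quadratic q_param and the three integer actions are arbitrary
* choices made for this illustration, and argmax_pair is a
* hypothetical helper that merely mimics the iterator-based call
* shown above; it is not rl::argmax.
*
```cpp
#include <array>
#include <functional>
#include <iterator>
#include <utility>

// Hypothetical types, for illustration only.
typedef double Param;
typedef double S;
typedef int    A;

// Toy parametrized Q-function: highest when a is closest to theta * s.
inline double q_param(const Param& theta, S s, A a) {
  double d = a - theta * s;
  return -d * d;
}

// Hypothetical helper (NOT rl::argmax): returns the (argument, value)
// pair that maximizes f over [begin, end). The range must not be empty.
template<typename F, typename ITERATOR>
std::pair<typename std::iterator_traits<ITERATOR>::value_type, double>
argmax_pair(const F& f, const ITERATOR& begin, const ITERATOR& end) {
  auto   best   = begin;
  double best_v = f(*best);
  for(auto it = std::next(begin); it != end; ++it) {
    double v = f(*it);
    if(v > best_v) {best = it; best_v = v;}
  }
  return {*best, best_v};
}

// Best action from s0, with the Q-function obtained by binding.
inline A best_action_bind(const Param& p, S s0) {
  using namespace std::placeholders;       // defines _1, _2, ...
  auto q1 = std::bind(q_param, p, _1, _2); // q1(s,a) = q_param(p,s,a)
  std::array<A,3> actions = {{-1, 0, 1}};
  return argmax_pair(std::bind(q1, s0, _1), // f(a) = q1(s0,a)
                     actions.begin(), actions.end()).first;
}

// The same computation, with lambda functions instead of bindings.
inline A best_action_lambda(const Param& p, S s0) {
  auto q2 = [&p](S s, A a) -> double {return q_param(p, s, a);};
  std::array<A,3> actions = {{-1, 0, 1}};
  return argmax_pair([&q2, s0](A a) -> double {return q2(s0, a);},
                     actions.begin(), actions.end()).first;
}
```
*
* Note that std::bind copies p into q1, whereas the lambda captures
* it by reference, so q2 must not outlive p.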
*
* Another use of functions is to provide accessors to internal data,
* or builders, so that algorithms can handle data without requiring
* some template fitting. This avoids the above-mentioned concept-based
* wrapping. Let us see an example that gives a taste of the rl
* library design.
*
* Let us suppose that the library provides an algorithm that sums the
* real parts of a collection of complex numbers. The summation
* algorithm would be written like this.
*
* @code
template<typename ITERATOR, typename GET_COMPLEX, typename GET_REAL>
double sum_real_parts(const ITERATOR& begin, const ITERATOR& end,
                      const GET_COMPLEX& get_complex,
                      const GET_REAL& get_real) {
  double sum = 0;
  for(auto it = begin; it != end; ++it)
    sum += get_real(get_complex(*it));
  return sum;
}
* @endcode
*
* The previous code does not expect the complex numbers to be stored
* in a vector, since general-purpose iterators are expected. Moreover,
* the content of the collection does not have to be a complex number
* directly, since get_complex is invoked to get it. Last, complex
* numbers are not required to fit some concept stating that c.re has
* to be a legal expression, since get_real does the job. Let us use
* our algorithm with some data.
* @code
typedef std::pair<double,double> Complex;
struct Data {
  Complex value;
  std::string name;
  int tag;
};
std::map<std::string,Data> database = .... ;
double sum = sum_real_parts(database.begin(),database.end(),
    // Each element of the map is a (key,Data) pair: gets the complex...
    [](const std::pair<const std::string,Data>& content) -> Complex {return content.second.value;},
    // ... and from it, its real part.
    [](const Complex& c) -> double {return c.first;});
* @endcode
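*
* The same template can be reused, unchanged, with another container
* and another representation of complex numbers. The sketch below
* illustrates this with a std::list of std::complex values; the
* template is repeated so that the block is self-contained, and
* sum_std_complex as well as the two stored values are hypothetical,
* chosen for this illustration only.
*
```cpp
#include <complex>
#include <list>

// Same algorithm as above: sums get_real(get_complex(x)) over a range.
template<typename ITERATOR, typename GET_COMPLEX, typename GET_REAL>
double sum_real_parts(const ITERATOR& begin, const ITERATOR& end,
                      const GET_COMPLEX& get_complex,
                      const GET_REAL& get_real) {
  double sum = 0;
  for(auto it = begin; it != end; ++it)
    sum += get_real(get_complex(*it));
  return sum;
}

// Hypothetical usage: the list directly stores std::complex values, so
// get_complex is the identity and get_real relies on real().
inline double sum_std_complex(void) {
  typedef std::complex<double> C;
  std::list<C> numbers;
  numbers.push_back(C(1.5, -2.0));
  numbers.push_back(C(2.5,  4.0));
  return sum_real_parts(numbers.begin(), numbers.end(),
                        [](const C& c) -> const C& {return c;},
                        [](const C& c) -> double   {return c.real();});
}
```
*
* Only the two accessors change from one data layout to the other;
* sum_real_parts itself is left untouched.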
*
* @section start Getting started
*
* You are now ready to read the examples following the order induced
* by the file names, and of course you can design your own
* experiments, drawing inspiration from the code in the examples. In
* order to compile your code, pkg-config support is available (Unix).
*
* @code
g++ -o example.bin file.cc `pkg-config --cflags --libs rl`
./example.bin
* @endcode
* or more generally
* @code
g++ -c file1.cc `pkg-config --cflags rl`
g++ -c file2.cc `pkg-config --cflags rl`
...
g++ -c fileN.cc `pkg-config --cflags rl`
g++ -o example.bin *.o `pkg-config --libs rl`
./example.bin
* @endcode
*/