Merge branch 'master' of github.com:openai/requests-for-research
ilyasu123 committed Aug 3, 2016
2 parents aa47617 + a7c1c58 commit 120bf72
Showing 1 changed file with 19 additions and 0 deletions: _requests_for_research/infinite-symbolic-generalization.html
@@ -0,0 +1,19 @@
---
title: 'Reduced-Information Program Learning'
summary: ''
difficulty: 2 # out of 3
---

<p>A difficult machine learning task is that of program, or algorithm, learning. An algorithm is a set of rules that maps a set of inputs to a set of outputs. Examples of simple algorithmic tasks are <a href="https://gym.openai.com/envs/RepeatCopy-v0">RepeatCopy</a> and <a href="https://gym.openai.com/envs/ReversedAddition-v0">ReversedAddition</a>. In theory, recurrent neural networks (RNNs) are Turing-complete, which means that they can model any computable function. In practice, however, it has proven difficult to <em>learn</em> algorithms.</p>
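
<p>As a concrete illustration (not part of the original request), the loop below drives one of these Gym algorithmic environments with a random policy, using the classic <code>gym</code> API; the exact layout of the action tuple is an assumption based on the 2016-era release.</p>

<pre><code>
# A minimal sketch of interacting with a Gym algorithmic task,
# using the classic gym API (make / reset / step).
import gym

env = gym.make("RepeatCopy-v0")
obs = env.reset()  # obs: the symbol currently under the read head

done = False
while not done:
    # Each action is assumed to be a tuple: (move the read head,
    # whether to write this step, which symbol to write).
    action = env.action_space.sample()  # random policy, for illustration
    obs, reward, done, info = env.step(action)
env.close()
</code></pre>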

<p>A recent success is the <a href="http://www-personal.umich.edu/~reedscot/iclr_project.html">Neural Programmer-Interpreter</a> (NPI), which uses a strong supervision signal in the form of execution traces. This contrasts with <em>program induction</em>, where programs must be inferred from input-output pairs alone. Using strong supervision, the NPI exhibits <em>strong generalisation</em> to long inputs, unlike the sequence-to-sequence RNN baseline. Another key to the NPI's success is its task-independent RNN core, which is able to exploit the compositionality (or hierarchy) of programs (in programming parlance, subroutines).</p>
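
<p>To make the distinction concrete, here is a sketch of the two supervision regimes for single-digit addition; the trace format below is a simplification for illustration, not NPI's exact encoding.</p>

<pre><code>
# Program induction: the learner observes only input-output pairs.
io_pair = {"input": (2, 5), "output": (7,)}

# Strong supervision (NPI-style): every intermediate call is observed,
# down to primitive actions such as writing a digit or moving pointers.
full_trace = [
    ("ADD",),              # top-level program
    ("ADD1",),             # subroutine: add the digits under the pointers
    ("WRITE", "out", 7),   # primitive: write 7 to the output row
    ("LSHIFT",),           # primitive: shift all pointers one column left
]
</code></pre>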

<p>The challenge is to achieve similar results with <strong>partial</strong> execution traces. For many problems we may not be able to produce detailed, full execution traces, but can still specify higher-level details. In the algorithmic learning context this is akin to pseudocode: supervision specifies the high-level routines, but at least one level of hierarchy separates those routines from the primitive actions/operations that implement them. The NPI can be taken as a baseline, with possible constructs such as program embeddings; the only restriction is the absence of full execution traces. However, other weak supervision may be provided, such as the input-output pairs used in program induction. The three tasks to be performed by the model with a single set of weights (but interchangeable <em>encoders</em>) are addition, sorting and canonicalizing 3D models. The former two tasks make use of a "scratch pad" and pointers, whilst the latter uses a set of <a href="http://ttic.uchicago.edu/~fidler/projects/CAD.html">CAD models</a>. The goal is to reach the same performance (per-sequence % accuracy) as the multi-task NPI, with 64 sequence samples per task.</p>
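
<p>A sketch of what partial supervision might look like for the same addition example, together with the per-sequence accuracy metric; both are illustrative assumptions, not a prescribed format.</p>

<pre><code>
# Partial trace: only the high-level routines are given; the primitive
# WRITE/LSHIFT calls that realise them must be discovered by the model.
partial_trace = [("ADD",), ("ADD1",)]

def per_sequence_accuracy(predictions, targets):
    """Fraction of sequences reproduced exactly, end to end: a sequence
    counts as correct only if every output element matches."""
    correct = sum(1 for p, t in zip(predictions, targets) if p == t)
    return correct / len(targets)
</code></pre>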

<p>Progressing from bubble sort (which the NPI learns to perform), a natural next step is to learn more complex sorting algorithms, such as quicksort. Whilst bubble sort consists mainly of local comparisons, quicksort involves comparisons across the length of the sequence as well as multiple levels of recursion.</p>
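
<p>The structural gap is visible in the code itself. In the annotated quicksort below (a standard Lomuto-partition implementation, not taken from the request), comparisons reach across the whole subarray and the routine calls itself at multiple depths, both of which bubble sort avoids.</p>

<pre><code>
def quicksort(a, lo=0, hi=None):
    """In-place quicksort with a Lomuto partition."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:                     # base case: 0- or 1-element subarray
        return a
    pivot, i = a[hi], lo
    for j in range(lo, hi):
        if pivot >= a[j]:            # comparison against a distant pivot,
            a[i], a[j] = a[j], a[i]  # not just an adjacent neighbour
            i += 1
    a[i], a[hi] = a[hi], a[i]
    quicksort(a, lo, i - 1)          # two recursive calls: the learner
    quicksort(a, i + 1, hi)          # must track call depth, unlike in
    return a                         # bubble sort's single flat loop
</code></pre>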

<hr />

<h3>Notes</h3>

<p>The compositionality or "subroutines" exhibited in algorithms has links to hierarchical reinforcement learning, where the reinforcement learning task can be subdivided into a series of smaller problems. Both identifying subtasks and learning when to execute subtask policies are ongoing areas of research. Without full execution traces it seems likely that reinforcement learning could be used to learn these algorithms, although the sample complexity will necessarily be higher.</p>
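
<p>In that trace-free setting, supervision collapses to something like the reward below, which scores only the emitted output. This is a sketch: it mirrors how Gym's algorithmic environments reward an episode, but the function itself is hypothetical.</p>

<pre><code>
def output_reward(written, target):
    """+1 per correct symbol; the episode ends on the first mistake."""
    reward, done = 0.0, False
    for w, t in zip(written, target):
        if w != t:
            done = True
            break
        reward += 1.0
    return reward, done
</code></pre>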
