Teaching students by iterative correction and evaluation

stevenzolo/CorrectiveTeaching

Corrective Teacher

Official implementation of the paper:

Teaching as Iterative Correction and Evaluation: A Bi-Level Reinforcement Learning Approach

A Novel Interactive Teaching Paradigm.

In this paper, we adopt a bi-level reinforcement learning framework to model the interaction among the teacher, the student, and the task environment.
  • Lower-level RL: the student agent interacts with the task environment, like a standard RL problem.
  • Higher-level RL: the teacher agent observes the lower-level interaction and offers instructions to improve the student’s policy.
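The two levels can be sketched as a single interaction loop. This is a minimal illustration, not the repository's implementation: `ChainEnv`, `TabularAgent`, and `run_episode` are hypothetical names, both agents use plain tabular Q-learning, and the teacher is assumed to be rewarded with the same task reward the student receives after following the instruction (the paper may define the teacher's reward differently).

```python
import random
from collections import defaultdict

ACTIONS = (-1, +1)  # hypothetical task: step left or right along a short chain


class ChainEnv:
    """Toy task environment (assumption): walk from state 0 to state n-1."""

    def __init__(self, n=5):
        self.n = n

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(max(self.s + a, 0), self.n - 1)
        done = self.s == self.n - 1
        return self.s, (0.0 if done else -1.0), done


class TabularAgent:
    """Epsilon-greedy Q-learning over arbitrary hashable observations."""

    def __init__(self, actions, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
        self.q = defaultdict(float)
        self.actions, self.alpha, self.gamma, self.eps = actions, alpha, gamma, eps
        self.rng = random.Random(seed)

    def act(self, obs):
        if self.rng.random() < self.eps:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(obs, a)])

    def update(self, obs, a, r, next_obs, done):
        future = 0.0 if done else max(self.q[(next_obs, b)] for b in self.actions)
        self.q[(obs, a)] += self.alpha * (r + self.gamma * future - self.q[(obs, a)])


def run_episode(env, student, teacher, max_steps=100):
    """Run one episode and return the number of steps taken."""
    s = env.reset()
    intended = student.act(s)                      # lower level: student proposes
    for step in range(1, max_steps + 1):
        instruction = teacher.act((s, intended))   # higher level: teacher instructs
        s2, r, done = env.step(instruction)        # student follows the instruction
        student.update(s, instruction, r, s2, done)
        next_intended = student.act(s2)
        # Assumed teacher reward: the task reward earned by the instruction.
        teacher.update((s, intended), instruction, r, (s2, next_intended), done)
        if done:
            return step
        s, intended = s2, next_intended
    return max_steps
```

A typical call would be `run_episode(ChainEnv(), TabularAgent(ACTIONS, seed=1), TabularAgent(ACTIONS, seed=2))`, repeated over many episodes so that both tables are updated.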

Depending on whether the teacher can provide timely suggestions to students during the interaction, two basic problem formulations are considered:

[Figure: two panels, Instant Teaching and Delayed Teaching]
  • Instant Teaching: In turn-based games (such as Go), students can report their intended actions to the teacher before acting, so the instruction can be adopted and evaluated instantly.
  • Delayed Teaching: If the teacher provides instruction after the student’s action is executed, for example in tennis training, the effectiveness can only be evaluated by delaying the student’s adoption of this instruction until the next occurrence of the same task state.
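The difference between the two formulations comes down to when an instruction is adopted. The sketch below illustrates only that timing; `instant_step` and `DelayedAdopter` are hypothetical names, and the policies are passed in as plain callables rather than learned agents.

```python
def instant_step(state, student_policy, teacher_policy):
    """Instant teaching: the student reports its intended action first,
    and the teacher's instruction is executed in the same step."""
    intended = student_policy(state)
    return teacher_policy(state, intended)


class DelayedAdopter:
    """Delayed teaching: the instruction arrives after the student has acted,
    so it is stored and adopted at the next visit to the same state."""

    def __init__(self):
        self.pending = {}  # state -> instruction waiting to be tried

    def choose(self, state, student_policy):
        """Return (action, adopted), where adopted marks a stored instruction."""
        if state in self.pending:
            return self.pending.pop(state), True
        return student_policy(state), False

    def receive(self, state, executed_action, teacher_policy):
        """Store the teacher's feedback on an action that was already executed."""
        self.pending[state] = teacher_policy(state, executed_action)


# Usage with fixed toy policies (assumptions for illustration only).
always_left = lambda state: "Left"
always_right = lambda state, action: "Right"

adopter = DelayedAdopter()
first, adopted_first = adopter.choose("s0", always_left)    # student's own action
adopter.receive("s0", first, always_right)                  # feedback arrives late
second, adopted_second = adopter.choose("s0", always_left)  # instruction adopted now
```

In the delayed case the instruction can only be evaluated once `choose` returns it with `adopted` set to `True`, which happens on the next occurrence of the same state.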

Experiments in Windy Gridworld

The Gridworld game takes place in a bounded area divided into unit squares. The agent’s objective is to navigate from the start square to the goal square using the actions {Up, Down, Left, Right}. A move that would cross the boundary leaves the agent’s position unchanged. Unknown wind forces acting on each column perturb the agent’s movement and make the game harder. An optimal path on the designed map, marked with a blue line, serves as a reference.
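The rules above can be sketched as a small environment class. This is an illustration under stated assumptions, not the map used in the experiments: the grid size, start, goal, and wind strengths are the classic textbook values, the wind is assumed to push the agent upward by the strength of the column it moves from, and the reward of -1 per step is also an assumption.

```python
from dataclasses import dataclass

# (row offset, column offset) for each action; row 0 is the top of the grid.
ACTIONS = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1)}


@dataclass
class WindyGridworld:
    rows: int = 7
    cols: int = 10
    start: tuple = (3, 0)
    goal: tuple = (3, 7)
    # Assumed upward wind strength per column (not the repository's map).
    wind: tuple = (0, 0, 0, 1, 1, 1, 2, 2, 1, 0)

    def reset(self):
        self.pos = self.start
        return self.pos

    def step(self, action):
        r, c = self.pos
        dr, dc = ACTIONS[action]
        nr, nc = r + dr, c + dc
        if not (0 <= nr < self.rows and 0 <= nc < self.cols):
            nr, nc = r, c  # moving beyond the boundary changes nothing
        # Wind of the departure column pushes the agent up, clipped to the grid.
        nr = max(0, nr - self.wind[c])
        self.pos = (nr, nc)
        done = self.pos == self.goal
        return self.pos, -1.0, done  # assumed reward: -1 per step
```

For example, after `env = WindyGridworld()` and `env.reset()`, calling `env.step("Left")` leaves the agent at the start square because the move would leave the grid.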

Comparison between the corrective teacher and elite-player teachers.

The results show that the corrective teacher outperforms elite-player teachers in both scenarios and facilitates student learning more effectively.

[Figure: two panels, Instant Scenario and Delayed Scenario]

Teach students with varying initial skill levels

Even when students are initialized with varying skill levels, the proposed corrective teacher helps them learn more efficiently than self-study.

[Figure: two panels, Instant Scenario and Delayed Scenario]
