Corrective Teacher

Official implementation of the paper:

Teaching as Iterative Correction and Evaluation: A Bi-Level Reinforcement Learning Approach

A Novel Interactive Teaching Paradigm.

In this paper, we adopt a bi-level reinforcement learning framework to model the interaction among the teachers, the students, and the task environment.

Lower-level RL: the student agent interacts with the task environment, like a standard RL problem.
Higher-level RL: the teacher agent observes the lower-level interaction and offers instructions to improve the student’s policy.
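
The two levels above can be sketched as a single interaction loop. This is a minimal illustration under stated assumptions, not the code in this repository: `ChainEnv`, `QAgent`, and `run_episode` are hypothetical names, both agents are plain tabular Q-learners, and the teacher is assumed to receive the task reward that its instruction earned.

```python
import random


class ChainEnv:
    """Toy task environment (hypothetical): walk right from cell 0 to cell n - 1."""

    def __init__(self, n=5):
        self.n = n
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; the walls clip the move.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n - 1)
        done = self.state == self.n - 1
        return self.state, (0.0 if done else -1.0), done


class QAgent:
    """Tabular epsilon-greedy Q-learner, reused here for both levels."""

    def __init__(self, n_actions=2, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
        self.n_actions, self.alpha, self.gamma, self.eps = n_actions, alpha, gamma, eps
        self.q = {}
        self.rng = random.Random(seed)

    def values(self, obs):
        return self.q.setdefault(obs, [0.0] * self.n_actions)

    def act(self, obs):
        if self.rng.random() < self.eps:
            return self.rng.randrange(self.n_actions)
        v = self.values(obs)
        return v.index(max(v))

    def update(self, obs, action, reward, next_obs, done):
        bootstrap = 0.0 if done else self.gamma * max(self.values(next_obs))
        v = self.values(obs)
        v[action] += self.alpha * (reward + bootstrap - v[action])


def run_episode(env, student, teacher, max_steps=50):
    state, total = env.reset(), 0.0
    intended = student.act(state)  # lower level: the student proposes an action
    for _ in range(max_steps):
        # Higher level: the teacher observes (state, intended action) and instructs.
        instruction = teacher.act((state, intended))
        next_state, reward, done = env.step(instruction)
        next_intended = student.act(next_state)
        # Assumption: the teacher's reward is the task reward of its instruction.
        teacher.update((state, intended), instruction, reward,
                       (next_state, next_intended), done)
        # The student learns from the corrected action that was executed.
        student.update(state, instruction, reward, next_state, done)
        state, intended, total = next_state, next_intended, total + reward
        if done:
            break
    return total


env, student, teacher = ChainEnv(), QAgent(seed=0), QAgent(seed=1)
returns = [run_episode(env, student, teacher) for _ in range(200)]
```

The teacher's observation pairs the task state with the student's intended action, which is one simple way to let the higher level "observe the lower-level interaction"; the paper's actual observation and reward design may differ.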

Depending on whether the teacher can provide timely suggestions to students during the interaction, two basic problem formulations are considered:

[Figure panels: Instant Teaching | Delayed Teaching]

Instant Teaching: In turn-based games (such as Go), students can report their intended actions to the teacher before acting, so the instruction can be adopted and evaluated instantly.
Delayed Teaching: If the teacher provides an instruction only after the student’s action has been executed, as in tennis training, its effectiveness can only be evaluated once the student adopts it at the next occurrence of the same task state.
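
The difference between the two formulations is only in when an instruction takes effect. The sketch below illustrates that timing with no learning involved; `instant_step`, `DelayedAdvice`, and `always_right` are hypothetical names introduced for illustration and do not come from this repository.

```python
def instant_step(state, intended_action, teacher_policy):
    """Instant teaching: the instruction replaces the intended action right away."""
    return teacher_policy(state, intended_action)


class DelayedAdvice:
    """Delayed teaching: advice is stored and adopted on the next visit to the state."""

    def __init__(self):
        self.pending = {}  # state -> instruction waiting to be adopted

    def choose(self, state, intended_action):
        # Adopt (and consume) advice stored on an earlier visit, if there is any.
        return self.pending.pop(state, intended_action)

    def record(self, state, executed_action, teacher_policy):
        # The teacher comments only after the action has already been executed.
        self.pending[state] = teacher_policy(state, executed_action)


def always_right(state, action):
    """Stand-in teacher that always recommends moving right."""
    return "Right"


# Instant: the student's intended "Left" is corrected before it is executed.
instant_action = instant_step("s0", "Left", always_right)

# Delayed: the first visit executes "Left"; the advice is adopted on the second visit.
advice = DelayedAdvice()
first_visit = advice.choose("s0", "Left")
advice.record("s0", first_visit, always_right)
second_visit = advice.choose("s0", "Left")
```

In the delayed case the outcome of `second_visit` is what would be used to evaluate the instruction, since that is the first time the advice is actually acted on.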

Experiments in Windy Gridworld

The Gridworld game is played on a bounded area divided into unit squares. The agent’s objective is to navigate from the start square to the goal square using the actions {Up, Down, Left, Right}. Attempts to move beyond the boundary leave the agent’s position unchanged. The game is made harder by wind forces in each column, whose strength is unknown to the agent and which shift the agent’s movement. An optimal path on the designed map, marked with a blue line, serves as a reference.
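
The dynamics described above can be sketched as a small environment class. This is an illustration only: the grid size, start, goal, and wind strengths are borrowed from the classic textbook Windy Gridworld rather than from this repository's map, the wind is assumed to push upward, and the reward of -1 per step is also an assumption. `WindyGridworld` is a hypothetical name.

```python
ACTIONS = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1)}


class WindyGridworld:
    """Grid with per-column wind; rows are indexed from the top, so 'up' lowers the row."""

    def __init__(self, rows=7, cols=10, start=(3, 0), goal=(3, 7),
                 wind=(0, 0, 0, 1, 1, 1, 2, 2, 1, 0)):
        self.rows, self.cols = rows, cols
        self.start, self.goal = start, goal
        self.wind = wind  # assumed upward push per column, hidden from the agent
        self.pos = start

    def reset(self):
        self.pos = self.start
        return self.pos

    def step(self, action):
        d_row, d_col = ACTIONS[action]
        row, col = self.pos
        # The wind of the column the agent is leaving shifts it upward.
        row = row + d_row - self.wind[col]
        col = col + d_col
        # Clipping keeps the agent inside the boundary.
        row = min(max(row, 0), self.rows - 1)
        col = min(max(col, 0), self.cols - 1)
        self.pos = (row, col)
        done = self.pos == self.goal
        return self.pos, -1.0, done  # assumed reward: -1 per step


env = WindyGridworld()
trace = [env.reset()]
for move in ["Left", "Right", "Right", "Right", "Right"]:
    position, reward, done = env.step(move)
    trace.append(position)
```

With these assumed values, the initial `Left` is blocked by the boundary, and the final `Right` is taken from a windy column, so the agent also drifts one row upward.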

Comparison between the corrective teacher and elite-player teachers.

Results show that the corrective teacher outperforms elite-player teachers in both scenarios and better facilitates student learning.

[Figure panels: Instant Scenario | Delayed Scenario]

Teach students with varying initial skill levels

Even for students initialized with varying skill levels, the proposed corrective teacher helps them learn more efficiently than self-study.

[Figure panels: Instant Scenario | Delayed Scenario]

Repository: stevenzolo/CorrectiveTeaching
