This repo is for the Stackelberg meta-learning project. The underlying application is a UAV guiding UGVs.
Note: This repo is reorganized for better readability in April 2024. The old version is archived in the main-old
branch.
- Python 3.9 or higher
- PyTorch 1.12.1 or higher
- Create a Python virtual environment with Python 3.9 or higher and source it:
  ```bash
  $ python3.9 -m venv <your-virtual-env-name>
  $ source /path-to-venv/bin/activate
  ```
- Use `pip` to install the related packages:
  ```bash
  (your-venv)$ pip install -e .
  ```
  To use the plotting functions, install with
  ```bash
  (your-venv)$ pip install -e ".[visual]"
  ```
- Go to the `experiments/` directory and run the different training scripts, e.g.,
  ```bash
  (your-venv)$ python train_meta.py
  ```
  Note: `generate_data.py` should be run first, before any training script.
- `sg_meta/`: algorithm implementations.
  - `data/`: environment settings and learning hyperparameters.
  - `model.py`: definition of the best-response NN model.
  - `agent.py`: implementations of the leader and follower classes.
  - `meta.py`: sampling and meta-learning algorithms.
  - `utils.py`: miscellaneous utilities.
- `data/`: data directory for saving generated data and learned models.
- `experiments/`: Python scripts for running the experiments.
  - `generate_data.py`: generates the training data.
  - `train_meta.py`: meta-learning algorithm implementation.
  - `ave_param.py`: averaging over the parameter space.
  - `ave_output.py`: averaging over the output space.
  - `receding_horizon.py`: receding-horizon planning.
  - `zero_guidance.py`: computes the follower's trajectory without the leader's guidance.
  - `plot_things.py`: plotting scripts.
- `tests/`: test Python scripts.
Note: Meta training and adaptation are performed on the CPU because we manually implement the gradient updates in each training iteration; a GPU implementation is less efficient for this workload.
In the `Leader` class, we specify some functions:
- `compute_opt_traj`: solve the parameterized trajectory optimization problem
- `initx`: generate an initial guess for the trajectory optimization problem
- `oc_opt`: use an optimization solver to obtain the trajectory
- `pmp_opt`: use the PMP conditions to refine the trajectory
- `obj_oc`: objective of the control cost
- `grad_obj_oc`: gradient of the control cost
In the `Meta` class, we specify some functions:
- `sample_task_theta`: sample BR data for task `theta`
- `sample_task_theta_traj`: sample BR data for task `theta` near the trajectory
- `sample_task_theta_uniform`: randomly sample BR data for task `theta`
- `update_model`: update the meta model
- `update_model_theta`: update the intermediate (task-adapted) model
- `train_brnet`: train a separate `brnet` for each follower; designed for individual learning
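The update functions above suggest a standard two-level training loop. Below is a minimal sketch of that structure, assuming an MSE best-response loss, manual gradient steps, and a Reptile-style meta update; the repo's actual update rule in `meta.py` may differ:

```python
import copy
import torch
import torch.nn as nn

def inner_update(model, x, br, lr=1e-2):
    """Analogue of update_model_theta: one manual gradient step on task BR data."""
    adapted = copy.deepcopy(model)
    loss = nn.functional.mse_loss(adapted(x), br)
    loss.backward()
    with torch.no_grad():
        for p in adapted.parameters():
            p -= lr * p.grad          # manual gradient update (no optimizer object)
    adapted.zero_grad()
    return adapted

def meta_update(meta_model, adapted_models, beta=0.1):
    """Analogue of update_model: move meta parameters toward the task-adapted ones."""
    with torch.no_grad():
        for p_meta, *p_tasks in zip(meta_model.parameters(),
                                    *(m.parameters() for m in adapted_models)):
            avg = sum(p_tasks) / len(p_tasks)
            p_meta += beta * (avg - p_meta)
```

Because every step is an explicit in-place tensor update, this loop runs naturally on the CPU, which matches the note above about not using a GPU.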
To save space, we use a `state_dict` to pass a neural network instead of the full model object.
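Passing a `state_dict` transfers only the named parameter tensors; the receiving side rebuilds the architecture and loads the weights. A minimal sketch of the pattern (the network below is illustrative, not the repo's actual `brnet` architecture from `sg_meta/model.py`):

```python
import torch
import torch.nn as nn

def make_brnet():
    # Placeholder architecture; the real model is defined in sg_meta/model.py.
    return nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))

sender = make_brnet()
params = sender.state_dict()        # an OrderedDict of named tensors, not the module

receiver = make_brnet()             # receiver rebuilds the same architecture
receiver.load_state_dict(params)    # ...and loads the weights
```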
Each obstacle is specified by a 6-dim vector: `[xc, yc, rc, norm, x_scale, y_scale]`.
- If `norm=1`, `x_scale` and `y_scale` scale the unit width/height `rc`.
- If `norm=2`, `x_scale` and `y_scale` scale the radius `rc`.
- If `norm=-1`, `x_scale` and `y_scale` scale the unit edge length `rc`.

This scaling notation is convenient for plotting. The math representation is scaled by `1/x_scale` and `1/y_scale`, respectively.
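Under this convention, a point-in-obstacle test can be sketched as follows. The helper is hypothetical: it assumes `norm=-1` encodes the infinity norm (a rectangular obstacle) and that the math representation divides the centered coordinates by `x_scale` and `y_scale`:

```python
def inside_obstacle(p, obs):
    """Return True if 2-D point p lies inside obstacle [xc, yc, rc, norm, x_scale, y_scale]."""
    xc, yc, rc, norm, xs, ys = obs
    dx = (p[0] - xc) / xs   # math representation: scale by 1/x_scale
    dy = (p[1] - yc) / ys   # math representation: scale by 1/y_scale
    if norm == -1:          # assumption: -1 encodes the infinity norm
        return max(abs(dx), abs(dy)) <= rc
    return (abs(dx) ** norm + abs(dy) ** norm) ** (1.0 / norm) <= rc
```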
- BR data are organized into a numpy array: `D[i, :] = [x, a, br]`.
- Trajectories are stored in 2-D numpy arrays with axis 0 as the time index: `x_traj[t, :] = x_t`.
- Type-related quantities are stored in lists; `br_list[i]` is the adapted meta model (an NN) for the type-`i` follower.
- Trajectories have different time dimensions:
  - the state trajectory `x_traj` has dimension T+1: `x_0, ..., x_T`;
  - the control input trajectories `a_traj` and `b_traj` have dimension T: `a_0, ..., a_{T-1}`;
  - the costate trajectory `lam_traj` has dimension T: `lam_1, ..., lam_T`.
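The array shapes implied by these conventions can be sketched as follows (the horizon and dimensions below are illustrative values, not the repo's settings):

```python
import numpy as np

T, nx, na = 10, 4, 2             # horizon and state/input dimensions (illustrative)
x_traj = np.zeros((T + 1, nx))   # states x_0, ..., x_T
a_traj = np.zeros((T, na))       # leader inputs a_0, ..., a_{T-1}
lam_traj = np.zeros((T, nx))     # costates lam_1, ..., lam_T

x_t = x_traj[3, :]               # axis 0 indexes time
```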
- Meta-learning and adaptation hyperparameters are in `sg_meta/data/parameters.json`.
- For `Param-Ave` and `Output-Ave` training, hyperparameters are defined in the scripts themselves; they can differ from the meta-learning hyperparameters.