This directory contains a TensorFlow implementation of an Implicit Gradient Transport (IGT) optimizer using an anytime average. For details, see the paper *Reducing the variance in online optimization by transporting past gradients* (NeurIPS 2019).
The optimizer relies on gradient extrapolation: the gradient is not computed at the current parameter values. In this implementation, the TensorFlow variables hold the shifted parameters, while the true parameters are kept in associated optimizer slots. This is an important distinction, especially when considering whether a learning curve reflects the true or the shifted parameters.
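To make the shifted/true distinction concrete, here is a minimal NumPy sketch of an IGT-style update on a 1-D quadratic. The averaging weights `c = 1 / (t + 1)` and the matching extrapolation factor are one straightforward reading of the paper's full anytime average; the actual optimizer differs in details (tail fraction, TF slot mechanics), so treat this as an illustration rather than the repository's implementation.

```python
# Illustrative sketch only: full anytime average (c_t = 1 / (t + 1)),
# not the repository's optimizer, which also supports a tail fraction
# and stores the true parameters in TF slots.
import numpy as np

def grad(x):
    # Gradient of f(x) = 0.5 * x**2.
    return x

lr = 0.1
true_param = np.array(5.0)   # what the learning curves should track
prev_param = true_param.copy()
velocity = np.zeros_like(true_param)

for t in range(100):
    c = 1.0 / (t + 1)
    # Extrapolated ("shifted") point: where the gradient is evaluated,
    # and what the TF variables in this repo actually hold.
    shifted_param = true_param + ((1.0 - c) / c) * (true_param - prev_param)
    velocity = (1.0 - c) * velocity + c * grad(shifted_param)
    prev_param, true_param = true_param, true_param - lr * velocity

print('true parameter after training:', true_param)
```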
The experimental framework is centered on a fork of the Cloud TPU resnet code from May 2019.
`resnet_main.py` is the main executable. Important flags are:

*   `mode`, which offers a special `eval_igt` mode for evaluating an IGT model
    at the true parameters (vs the shifted ones). This value should be used in
    conjunction with the `igt_eval_mode` and `igt_eval_set` flags.
*   `optimizer`, for setting the optimizer.
*   `igt_optimizer`, for setting the optimizer to use in conjunction with IGT.
*   `tail_fraction`, for setting IGT's anytime average data window.
*   `lr_decay` and `lr_decay_step_fraction`, for configuring the learning rate
    decay schedule.
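For illustration, an evaluation run at the true parameters might be launched as follows. Only the flag names and the `eval_igt` mode come from this README; the flag values and the bucket paths are hypothetical placeholders, not values documented by the code.

```shell
# Hypothetical invocation: flag values and paths are placeholders.
python resnet_main.py \
  --mode=eval_igt \
  --igt_eval_mode=true_param \
  --igt_eval_set=test \
  --optimizer=igt \
  --igt_optimizer=momentum \
  --tail_fraction=0.5 \
  --data_dir=gs://your-bucket/imagenet \
  --model_dir=gs://your-bucket/igt_model
```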
`dump_metrics_to_csv.py` converts the learning curves from their TensorFlow
summary format to an easier-to-consume CSV format.
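As a rough sketch of what such a conversion involves (this is not `dump_metrics_to_csv.py`'s actual interface), scalar summaries can be read from an event file and written out as CSV along these lines, assuming TF 1.x-style event files:

```python
# Sketch of summary-to-CSV conversion; not the script's real interface.
import csv
import tensorflow as tf

def events_to_csv(events_path, csv_path):
    # Collect (step, tag, value) triples for every scalar summary.
    rows = []
    for event in tf.compat.v1.train.summary_iterator(events_path):
        for value in event.summary.value:
            if value.HasField('simple_value'):
                rows.append((event.step, value.tag, value.simple_value))
    with open(csv_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['step', 'tag', 'value'])
        writer.writerows(rows)

# Hypothetical file names:
# events_to_csv('events.out.tfevents.1234', 'learning_curves.csv')
```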
If you use this code in a publication, please cite the original paper:

@inproceedings{arnold2019reducing,
  title = {Reducing the variance in online optimization by transporting past
           gradients},
  author = {Sébastien Arnold and Pierre-Antoine Manzagol and Reza Harikandeh
            and Ioannis Mitliagkas and Nicolas Le Roux},
  booktitle = {NeurIPS},
  year = {2019}
}