This directory contains source code for evaluating federated learning with different optimizers on various models and tasks. The code was originally developed for a paper, "Adaptive Federated Optimization" (arXiv link), but has since evolved into a general library for comparing and benchmarking federated optimization algorithms.
This library uses TensorFlow Federated. For a more general look at using TensorFlow Federated for research, see Using TFF for Federated Learning Research.
Some pip packages are required by this library, and may need to be installed. See the requirements file for details.
We also require Bazel in order to run the code. Please see the guide here for installation instructions.
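For example, the pip dependencies can typically be installed with a command like the following (this assumes the requirements file is named requirements.txt and lives in this directory):

```
pip install -r requirements.txt
```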
This directory is broken up into six task directories. Each task directory contains task-specific libraries (such as libraries for loading the correct dataset), as well as libraries for performing federated training. These are in the task folder.

A single binary for running these tasks can be found at trainer.py. This binary will, according to absl flags, run any of the six task-specific federated training libraries.
Suppose we wish to train a convolutional network on EMNIST for purposes of character recognition (emnist_character), using federated optimization. Various aspects of the federated training procedure can be customized via absl flags. For example, from this directory one could run:
```
bazel run :trainer -- --task=emnist_character --total_rounds=100 \
  --client_optimizer=sgd --client_learning_rate=0.1 --client_batch_size=20 \
  --server_optimizer=sgd --server_learning_rate=1.0 --clients_per_round=10 \
  --client_epochs_per_round=1 --experiment_name=emnist_fedavg_experiment
```
This will run 100 communication rounds of federated training, using SGD on both the client and server, with learning rates of 0.1 and 1.0 respectively. The experiment uses 10 clients in each round, and performs 1 training epoch on each client's dataset. Each client will use a batch size of 20. The experiment_name flag is used when writing metrics.
To try using Adam at the server, we could instead set --server_optimizer=adam. Other parameters that can be set include the batch size on the clients, the momentum parameters for various optimizers, and the number of total communication rounds.
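For example, a variant of the command above using Adam on the server might look like the following sketch; the server learning rate of 0.01 and the experiment name are illustrative choices, not tuned recommendations:

```
bazel run :trainer -- --task=emnist_character --total_rounds=100 \
  --client_optimizer=sgd --client_learning_rate=0.1 --client_batch_size=20 \
  --server_optimizer=adam --server_learning_rate=0.01 --clients_per_round=10 \
  --client_epochs_per_round=1 --experiment_name=emnist_fedadam_experiment
```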
Below we give a summary of the datasets, tasks, and models used in this directory.
Task Name | Dataset | Model | Task Summary |
---|---|---|---|
cifar100_image | CIFAR-100 | ResNet-18 (with GroupNorm layers) | Image classification |
emnist_autoencoder | EMNIST | Bottleneck network | Autoencoder |
emnist_character | EMNIST | CNN (with dropout) | Character recognition |
shakespeare_character | Shakespeare | RNN with 2 LSTM layers | Next-character prediction |
stackoverflow_word | Stack Overflow | RNN with 1 LSTM layer | Next-word prediction |
stackoverflow_tag | Stack Overflow | Logistic regression classifier | Tag prediction |
In our work, we compare 5 primary server optimization methods: FedAvg, FedAvgM, FedAdagrad, FedAdam, and FedYogi. The first two use SGD on the server (with FedAvgM using server momentum) and the last three use an adaptive optimizer on the server. All five use client SGD.
To configure these optimizers, use the following flags:
- FedAvg: --server_optimizer=sgd
- FedAvgM: --server_optimizer=sgd --server_sgd_momentum={momentum value}
- FedAdagrad: --server_optimizer=adagrad
- FedAdam: --server_optimizer=adam
- FedYogi: --server_optimizer=yogi
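For instance, a FedAvgM run would add server momentum to the SGD server optimizer; the momentum value of 0.9 below is purely illustrative:

```
--server_optimizer=sgd --server_sgd_momentum=0.9
```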
For adaptive optimizers, one should also set the numerical stability constant epsilon (tau in Algorithm 2 of Adaptive Federated Optimization). This parameter can be set using the flag server_{adaptive optimizer}_epsilon. We recommend a starting value of 0.001, which worked well across tasks and optimizers. For a more in-depth discussion, see Hyperparameters and Tuning.
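For example, following the server_{adaptive optimizer}_epsilon pattern, a FedAdagrad run could start from the recommended value (the exact flag spelling below is our reading of that pattern):

```
--server_optimizer=adagrad --server_adagrad_epsilon=0.001
```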
For FedAdagrad and FedYogi, we use implementations of Adagrad and Yogi that allow one to select the initial_accumulator_value (see the Keras documentation on Adagrad). For all experiments, we used initial accumulator values of 0 (the value fixed in the Keras implementation of Adam). While this can be tuned, we recommend focusing on tuning learning rates, momentum parameters, and epsilon values before tuning this value.
The client learning rate (client_learning_rate) and server learning rate (server_learning_rate) can be vital for good performance on a task, as can optimizer-specific hyperparameters. By default, we create flags for each optimizer based on its placement (client or server) and the Keras argument name. For example, if we set --client_optimizer=sgd, then there will be a flag client_sgd_momentum corresponding to the momentum argument in the Keras SGD optimizer. In general, we have flags of the form {placement}_{optimizer}_{arg name}.
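As a sketch of this naming convention, client-side SGD momentum and a server-side Adam hyperparameter would be set as below; client_sgd_momentum is documented above, while server_adam_beta_1 is our inference from the Keras Adam argument name beta_1:

```
--client_optimizer=sgd --client_sgd_momentum=0.9 \
--server_optimizer=adam --server_adam_beta_1=0.9
```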
In addition to the optimizer-specific hyperparameters, there are other parameters that can be configured via flags, including the batch size (client_batch_size), the number of participating clients per round (clients_per_round), and the number of client epochs (client_epochs_per_round). We also have a client_datasets_random_seed flag that seeds a pseudo-random function used to sample clients. All results in Adaptive Federated Optimization used a fixed seed. Changing this may change convergence behavior, as the sampling order of clients is important in communication-limited settings.
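For example, these flags could be combined as follows; all values are illustrative, and the seed shown is an arbitrary choice for reproducibility rather than the one used in the paper:

```
--clients_per_round=10 --client_epochs_per_round=1 --client_batch_size=20 \
--client_datasets_random_seed=42
```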
For more details on hyperparameters and tuning, see Hyperparameters and Tuning.