This repo contains the data splits from the following paper.
Federated Visual Classification with Real-World Data Distribution
Tzu-Ming Harry Hsu, Hang Qi, Matthew Brown
https://arxiv.org/abs/2003.08082
Dataset | # users | # classes | # examples | Download split files |
---|---|---|---|---|
Landmarks-User-160k | 1,062 | 2,028 | 164,172 | Download |
iNaturalist-User-120k | 9,275 | 1,203 | 120,300 | Download |
iNaturalist-Geo | 11 to 3,606 | 1,203 | 120,300 | Download |
CIFAR-10 | 100 | 10 | 50,000 | Download |
CIFAR-100 | 100 | 100 | 50,000 | Download |
The train and test splits are provided in different files: train splits in `federated_train*.csv` and test splits in `test.csv`.
The csv files contain the following columns:

- `user_id`: Images belonging to the same "user" for federated learning are assigned the same id. This can be a numerical id, a geo cell identifier, or the author name of the images. The test splits do not have this column.
- `image_id`: The image id in the source dataset. Images should be retrieved from the source datasets.
- `class`: The class label used in our federated visual classification paper.
- `label`: The original class label in the source dataset (iNaturalist only).
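As a minimal sketch of how a train split maps onto federated clients, the snippet below groups rows by `user_id` using only the Python standard library. It assumes each file has a header row with the column names listed above; the function name `load_federated_split` and the toy rows are hypothetical and not part of this repo.

```python
import csv
import io
from collections import defaultdict


# Hypothetical helper for illustration; not part of this repo.
def load_federated_split(csv_file):
    """Group a federated train split by user.

    Assumes a header row containing user_id, image_id, and class; any other
    column (e.g. label) is ignored. Returns a dict mapping each user_id to a
    list of (image_id, class) pairs, with class kept as a string.
    """
    users = defaultdict(list)
    for row in csv.DictReader(csv_file):
        users[row["user_id"]].append((row["image_id"], row["class"]))
    return dict(users)


# Made-up toy rows in the described column layout (not real split data).
sample = io.StringIO(
    "user_id,image_id,class\n"
    "u1,img_a,3\n"
    "u1,img_b,7\n"
    "u2,img_c,3\n"
)
clients = load_federated_split(sample)
print({user: len(examples) for user, examples in clients.items()})
```

Reading by column name with `csv.DictReader` avoids depending on column order. With a real split file, pass a handle from `open(path, newline="")` instead of the `io.StringIO` object.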
We provide a simple tool for parsing the csv files and outputting general statistics about the datasets.
```shell
# Example usage for inspecting the CIFAR-10 alpha-0 split.
$ python inspect_splits.py --dataset=cifar \
    --train_file=cifar10/federated_train_alpha_0.00.csv \
    --test_file=test.csv

# For detailed instructions.
$ python inspect_splits.py --help
```
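The exact output of the tool is not described here, so the following is only a hypothetical sketch of the kind of statistics one might compute, not the implementation of `inspect_splits.py`. It counts users, classes, and examples (the quantities in the table above) plus the range of examples per user, assuming header rows with the columns described earlier and a test file without `user_id`; `split_statistics` and the toy data are illustrative.

```python
import csv
import io


# Hypothetical helper for illustration; not the implementation of inspect_splits.py.
def split_statistics(train_csv, test_csv):
    """Summarize a federated train split and its test split.

    Assumes both files have a header row; the train file has user_id,
    image_id, and class columns, while the test file has no user_id column.
    """
    train_rows = list(csv.DictReader(train_csv))
    test_rows = list(csv.DictReader(test_csv))

    # Count training examples per user (federated client).
    per_user = {}
    for row in train_rows:
        per_user[row["user_id"]] = per_user.get(row["user_id"], 0) + 1

    # Classes seen in either split.
    classes = {row["class"] for row in train_rows} | {row["class"] for row in test_rows}

    return {
        "num_users": len(per_user),
        "num_classes": len(classes),
        "num_train_examples": len(train_rows),
        "num_test_examples": len(test_rows),
        "min_examples_per_user": min(per_user.values(), default=0),
        "max_examples_per_user": max(per_user.values(), default=0),
    }


# Made-up toy files in the described layout (not real split data).
train = io.StringIO("user_id,image_id,class\nu1,a,0\nu1,b,1\nu2,c,1\n")
test = io.StringIO("image_id,class\nd,0\ne,2\n")
stats = split_statistics(train, test)
print(stats)
```

The per-user minimum and maximum give a quick sense of how unbalanced the client sizes are, which the aggregate counts in the table do not show.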
We do not distribute images in this repo. Images should be downloaded from the source datasets linked below. These datasets and images may have different licenses and terms of use. We do not own their copyright.
Please cite the following publication if you intend to use these datasets.
```bibtex
@inproceedings{hsu2020federated,
    author = {Tzu-Ming Harry Hsu and Hang Qi and Matthew Brown},
    title = {{Federated Visual Classification with Real-World Data Distribution}},
    year = {2020}
}
```