forked from yxgeee/FD-GAN
Commit a1ca28a (0 parents): 37 changed files with 2,654 additions and 0 deletions.
.gitignore (new file, 15 lines):
.DS_Store
datasets/
checkpoints/
scripts/
torch.egg-info/
*/**/__pycache__
*/*.pyc
*/**/*.pyc
*/**/**/*.pyc
*/**/**/**/*.pyc
*/**/**/**/**/*.pyc
*/*.so*
*/**/*.so*
*/**/*.dylib*
*~
README.md (new file, 120 lines):
![Python 3](https://img.shields.io/badge/python-3-green.svg) ![Pytorch 0.3](https://img.shields.io/badge/pytorch-0.3-blue.svg)
# FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification

<p align="center"><img src='framework.jpg' width="600px"></p>

[[Paper]](https://arxiv.org/abs/1810.02936)

[Yixiao Ge](mailto:[email protected])\*, [Zhuowan Li](mailto:[email protected])\*, [Haiyu Zhao](mailto:[email protected]), [Guojun Yin](mailto:[email protected]), [Shuai Yi](mailto:[email protected]), [Xiaogang Wang](mailto:[email protected]), and [Hongsheng Li](mailto:[email protected])
Neural Information Processing Systems (**NIPS**), 2018 (* equal contribution)

Pytorch implementation for our NIPS 2018 work. With the proposed siamese structure, we are able to learn **identity-related** and **pose-unrelated** representations.

## Prerequisites
- Python 3
- [Pytorch](https://pytorch.org/) (We run the code with version 0.3.1; lower versions may also work.)

## Getting Started

### Installation
- Install dependencies (e.g., [visdom](https://github.com/facebookresearch/visdom) and [dominate](https://github.com/Knio/dominate)). You can install all the dependencies with:
```
pip install scipy pillow torchvision sklearn h5py dominate visdom
```
- Clone this repo:
```
git clone https://github.com/yxgeee/FD-GAN
cd FD-GAN/
```

### Datasets
We conduct experiments on the [Market1501](https://drive.google.com/open?id=1LS5_bMqv-37F14FVuziK63gz0wPyb0Hh), [DukeMTMC](https://drive.google.com/open?id=1Ujtm-Cq7lpyslBkG-rSBjkP1KVntrgSL), and [CUHK03](https://drive.google.com/open?id=1R7oCwyMHYIxpRVsYm7-2REmFopP9TSXL) datasets. We need pose landmarks for each dataset during training, so we generate the pose files with [Realtime Multi-Person Pose Estimation](https://github.com/tensorboy/pytorch_Realtime_Multi-Person_Pose_Estimation). The raw datasets have been preprocessed by the code in [open-reid](https://github.com/Cysu/open-reid).
Download the prepared datasets by following the steps below:
- Create directories for the datasets:
```
mkdir datasets
cd datasets/
```
- Download the datasets through the links above, and `unzip` them in the same root path, as sketched below.
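For example, a minimal sketch run from inside the `datasets/` directory (the archive names are placeholders for whatever the downloads are saved as; what matters is that the extracted folders are named `market1501`, `dukemtmc`, and `cuhk03`, matching the names passed to `-d`):
```
unzip market1501.zip   # -> datasets/market1501/
unzip dukemtmc.zip     # -> datasets/dukemtmc/
unzip cuhk03.zip       # -> datasets/cuhk03/
```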

## Usage
As mentioned in the original [paper](https://arxiv.org/abs/1810.02936), there are three stages for training our proposed framework.

### Stage I: reID baseline pretraining
We use a Siamese baseline structure based on `ResNet-50`. You can train the model with the following commands:
```
python baseline.py -b 256 -j 4 -d market1501 -a resnet50 --combine-trainval \
    --lr 0.01 --epochs 100 --step-size 40 --eval-step 5 \
    --logs-dir /path/to/save/checkpoints/
```
You can train it on specified GPUs by setting `CUDA_VISIBLE_DEVICES`, and change the dataset name `[market1501|dukemtmc|cuhk03]` after `-d` to train models on different datasets.
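For example, to run the same training on DukeMTMC using the first two GPUs (a sketch that only recombines the flags documented above; the GPU ids and checkpoint directory are placeholders):
```
CUDA_VISIBLE_DEVICES=0,1 python baseline.py -b 256 -j 4 -d dukemtmc -a resnet50 --combine-trainval \
    --lr 0.01 --epochs 100 --step-size 40 --eval-step 5 \
    --logs-dir /path/to/save/checkpoints/
```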
Or you can directly download the pretrained baseline models through the links below:
- [Market1501_baseline_model](https://drive.google.com/open?id=1oNLf-gazgfN0EqkdIOKtcJSBx22BuO1-)
- [DukeMTMC_baseline_model](https://drive.google.com/open?id=1iVXIaXT6WQzKuLD3eDcBZB-3aNeZ6Ivf)
- [CUHK03_baseline_model](https://drive.google.com/open?id=1jubhvKl_Ny9b89wbX0-u2GhPEeXMLaUQ)

<a name="stageI"></a>And **test** them with follow commands, | ||
``` | ||
python baseline.py -b 256 -d market1501 -a resnet50 --evaluate --resume /path/of/model_best.pth.tar | ||
``` | ||
|
||
### Stage II: FD-GAN pretraining
We need to pretrain FD-GAN with the image encoder part (*E* in the original paper and *net_E* in the code) fixed first. You can train the model with the following commands:
```
python train.py --display-port 6006 --display-id 1 \
    --stage 1 -d market1501 --name /directory/name/of/saving/checkpoints/ \
    --pose-aug gauss -b 256 -j 4 --niter 50 --niter-decay 50 --lr 0.001 --save-step 10 \
    --lambda-recon 100.0 --lambda-veri 0.0 --lambda-sp 10.0 --smooth-label \
    --netE-pretrain /path/of/model_best.pth.tar
```
You can train it on specified GPUs by setting `CUDA_VISIBLE_DEVICES`. The main arguments are (see the example after this list):
- `--display-port`: display port of [visdom](https://github.com/facebookresearch/visdom), e.g., you can visualize the results at `localhost:6006`.
- `--display-id`: set `0` to disable [visdom](https://github.com/facebookresearch/visdom).
- `--stage`: set `1` for Stage II, and set `2` for Stage III.
- `--pose-aug`: choose from `[no|erase|gauss]` to apply augmentations to the pose maps.
- `--smooth-label`: whether to smooth the labels for the GAN loss.
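For instance, a variant of the Stage II command above that disables visdom and uses erasing-based pose augmentation on DukeMTMC (a sketch that only recombines the documented flags; the GPU ids and paths are placeholders):
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --display-id 0 \
    --stage 1 -d dukemtmc --name /directory/name/of/saving/checkpoints/ \
    --pose-aug erase -b 256 -j 4 --niter 50 --niter-decay 50 --lr 0.001 --save-step 10 \
    --lambda-recon 100.0 --lambda-veri 0.0 --lambda-sp 10.0 --smooth-label \
    --netE-pretrain /path/of/model_best.pth.tar
```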

Other arguments can be viewed in [options.py](https://github.com/yxgeee/FD-GAN/blob/master/fdgan/options.py).
You can also directly download the models for Stage II:
- [Market1501_stageII_model](https://drive.google.com/open?id=1kIBuPzz-Ig70dE3rU-5-kyo3nGJP01NS)
- [DukeMTMC_stageII_model](https://drive.google.com/open?id=1dD1cbg2jo5qhPbkMbsRYACRcVMrm28-o)
- [CUHK03_stageII_model](https://drive.google.com/open?id=1552oDot-vgA27b-mCspJAuzaOl685koz)

Each directory contains four models, one for each of the separate nets (*net_E*, *net_G*, *net_Di*, and *net_Dp*).

### Stage III: Global finetuning
Finetune the whole framework by optimizing all parts. You can train the model with the following commands:
```
python train.py --display-port 6006 --display-id 1 \
    --stage 2 -d market1501 --name /directory/name/of/saving/checkpoints/ \
    --pose-aug gauss -b 256 -j 4 --niter 25 --niter-decay 25 --lr 0.0001 --save-step 10 --eval-step 5 \
    --lambda-recon 100.0 --lambda-veri 10.0 --lambda-sp 10.0 --smooth-label \
    --netE-pretrain /path/of/100_net_E.pth --netG-pretrain /path/of/100_net_G.pth \
    --netDi-pretrain /path/of/100_net_Di.pth --netDp-pretrain /path/of/100_net_Dp.pth
```
You can train it on specified GPUs by setting `CUDA_VISIBLE_DEVICES`.
We trained this model with a batch size of 256. If your hardware cannot support this setting, you may decrease the batch size (the performance may also drop).
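For example, a hypothetical reduced-memory run of the same command, changing only the batch size (a sketch; `-b 128` is an arbitrary smaller value and results are not guaranteed to match the paper):
```
python train.py --display-port 6006 --display-id 1 \
    --stage 2 -d market1501 --name /directory/name/of/saving/checkpoints/ \
    --pose-aug gauss -b 128 -j 4 --niter 25 --niter-decay 25 --lr 0.0001 --save-step 10 --eval-step 5 \
    --lambda-recon 100.0 --lambda-veri 10.0 --lambda-sp 10.0 --smooth-label \
    --netE-pretrain /path/of/100_net_E.pth --netG-pretrain /path/of/100_net_G.pth \
    --netDi-pretrain /path/of/100_net_Di.pth --netDp-pretrain /path/of/100_net_Dp.pth
```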
Or you can directly download our final models:
- [Market1501_stageIII_model](https://drive.google.com/open?id=1w8xqopW0icA3VIxZyelI9k-Fb8rRCME7)
- [DukeMTMC_stageIII_model](https://drive.google.com/open?id=1axBHUcI7JmPbw8Y_mSpMKWIY9FUfFKMI)
- [CUHK03_stageIII_model](https://drive.google.com/open?id=1q6HkDlDUIV9YNUwAggy-HI9zYQjt7Ihk)

Then **test** `best_net_E.pth` in the same way as described in [Stage I](#stageI).
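Concretely, this amounts to rerunning the Stage I evaluation command with the new checkpoint (the path is a placeholder):
```
python baseline.py -b 256 -d market1501 -a resnet50 --evaluate --resume /path/of/best_net_E.pth
```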

## TODO
- Scripts for generating pose landmarks.
- Generating specified images.

## Citation
Please cite our paper if you find the code useful for your research.
```
@inproceedings{ge2018fdgan,
  title={FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification},
  author={Ge, Yixiao and Li, Zhuowan and Zhao, Haiyu and Yin, Guojun and Wang, Xiaogang and Li, Hongsheng},
  booktitle={Advances in Neural Information Processing Systems},
  year={2018}
}
```

## Acknowledgements
Our code is inspired by [pytorch-CycleGAN-and-pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix) and [open-reid](https://github.com/Cysu/open-reid).
baseline.py (new file, 200 lines):
from __future__ import print_function, absolute_import
import argparse
import os.path as osp

import numpy as np
import os, sys
from bisect import bisect_right
import torch
from torch import nn
from torch.nn import functional as F
from torch.autograd import Variable
from torch.backends import cudnn
from torch.utils.data import DataLoader

from reid import datasets
from reid import models
from reid.utils.data import transforms as T
from reid.utils.data.preprocessor import Preprocessor
from reid.utils.logging import Logger
from reid.utils.serialization import load_checkpoint, save_checkpoint, copy_state_dict

from reid.utils.data.sampler import RandomPairSampler
from reid.models.embedding import EltwiseSubEmbed
from reid.models.multi_branch import SiameseNet
from reid.evaluators import CascadeEvaluator
from reid.trainers import SiameseTrainer

def get_data(name, split_id, data_dir, height, width, batch_size, workers,
             combine_trainval, np_ratio):
    root = osp.join(data_dir, name)

    dataset = datasets.create(name, root, split_id=split_id)

    normalizer = T.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])

    train_set = dataset.trainval if combine_trainval else dataset.train

    train_transformer = T.Compose([
        T.RandomSizedRectCrop(height, width),
        T.RandomSizedEarser(),
        T.RandomHorizontalFlip(),
        T.ToTensor(),
        normalizer,
    ])

    test_transformer = T.Compose([
        T.RectScale(height, width),
        T.ToTensor(),
        normalizer,
    ])

    train_loader = DataLoader(
        Preprocessor(train_set, root=dataset.images_dir,
                     transform=train_transformer),
        sampler=RandomPairSampler(train_set, neg_pos_ratio=np_ratio),
        batch_size=batch_size, num_workers=workers, pin_memory=False)

    val_loader = DataLoader(
        Preprocessor(dataset.val, root=dataset.images_dir,
                     transform=test_transformer),
        batch_size=batch_size, num_workers=workers,
        shuffle=False, pin_memory=False)

    test_loader = DataLoader(
        Preprocessor(list(set(dataset.query) | set(dataset.gallery)),
                     root=dataset.images_dir, transform=test_transformer),
        batch_size=batch_size, num_workers=workers,
        shuffle=False, pin_memory=False)

    return dataset, train_loader, val_loader, test_loader


def main(args):
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)
    cudnn.benchmark = True

    # Redirect print to both console and log file
    if not args.evaluate:
        sys.stdout = Logger(osp.join(args.logs_dir, 'log.txt'))
    else:
        log_dir = osp.dirname(args.resume)
        sys.stdout = Logger(osp.join(log_dir, 'log_test.txt'))
    # print("==========\nArgs:{}\n==========".format(args))

    # Create data loaders
    if args.height is None or args.width is None:
        args.height, args.width = (256, 128)
    dataset, train_loader, val_loader, test_loader = \
        get_data(args.dataset, args.split, args.data_dir, args.height,
                 args.width, args.batch_size, args.workers,
                 args.combine_trainval, args.np_ratio)

    # Create model
    base_model = models.create(args.arch, cut_at_pooling=True)
    embed_model = EltwiseSubEmbed(use_batch_norm=True, use_classifier=True,
                                  num_features=2048, num_classes=2)
    model = SiameseNet(base_model, embed_model)
    model = nn.DataParallel(model).cuda()

    # Evaluator
    evaluator = CascadeEvaluator(
        torch.nn.DataParallel(base_model).cuda(),
        embed_model,
        embed_dist_fn=lambda x: F.softmax(Variable(x), dim=1).data[:, 0])

    # Load from checkpoint
    best_mAP = 0
    if args.resume:
        checkpoint = load_checkpoint(args.resume)
        if 'state_dict' in checkpoint.keys():
            checkpoint = checkpoint['state_dict']
        model.load_state_dict(checkpoint)

        print("Test the loaded model:")
        top1, mAP = evaluator.evaluate(test_loader, dataset.query, dataset.gallery, rerank_topk=100, dataset=args.dataset)
        best_mAP = mAP

    if args.evaluate:
        return

    # Criterion
    criterion = nn.CrossEntropyLoss().cuda()
    # Optimizer
    param_groups = [
        {'params': model.module.base_model.parameters(), 'lr_mult': 1.0},
        {'params': model.module.embed_model.parameters(), 'lr_mult': 10.0}]
    optimizer = torch.optim.SGD(param_groups, args.lr, momentum=args.momentum,
                                weight_decay=args.weight_decay)
    # Trainer
    trainer = SiameseTrainer(model, criterion)

    # Schedule learning rate
    def adjust_lr(epoch):
        lr = args.lr * (0.1 ** (epoch // args.step_size))
        for g in optimizer.param_groups:
            g['lr'] = lr * g.get('lr_mult', 1)

    # Start training
    for epoch in range(0, args.epochs):
        adjust_lr(epoch)
        trainer.train(epoch, train_loader, optimizer, base_lr=args.lr)

        if epoch % args.eval_step == 0:
            mAP = evaluator.evaluate(val_loader, dataset.val, dataset.val, top1=False)
            is_best = mAP > best_mAP
            best_mAP = max(mAP, best_mAP)
            save_checkpoint({
                'state_dict': model.state_dict()
            }, is_best, fpath=osp.join(args.logs_dir, 'checkpoint.pth.tar'))

            print('\n * Finished epoch {:3d} mAP: {:5.1%} best: {:5.1%}{}\n'.
                  format(epoch, mAP, best_mAP, ' *' if is_best else ''))

    # Final test
    print('Test with best model:')
    checkpoint = load_checkpoint(osp.join(args.logs_dir, 'model_best.pth.tar'))
    model.load_state_dict(checkpoint['state_dict'])
    evaluator.evaluate(test_loader, dataset.query, dataset.gallery, dataset=args.dataset)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Siamese reID baseline")
    # data
    parser.add_argument('-d', '--dataset', type=str, default='market1501',
                        choices=datasets.names())
    parser.add_argument('-b', '--batch-size', type=int, default=256)
    parser.add_argument('-j', '--workers', type=int, default=4)
    parser.add_argument('--split', type=int, default=0)
    parser.add_argument('--height', type=int,
                        help="input height, default: 256 for resnet")
    parser.add_argument('--width', type=int,
                        help="input width, default: 128 for resnet")
    parser.add_argument('--combine-trainval', action='store_true',
                        help="train and val sets together for training, "
                             "val set alone for validation")
    # model
    parser.add_argument('-a', '--arch', type=str, default='resnet50',
                        choices=models.names())
    # optimizer
    parser.add_argument('--lr', type=float, default=0.01, help="learning rate")
    parser.add_argument('--np-ratio', type=int, default=3)
    parser.add_argument('--momentum', type=float, default=0.9)
    parser.add_argument('--weight-decay', type=float, default=5e-4)
    parser.add_argument('--step-size', type=int, default=40)
    # training configs
    parser.add_argument('--resume', type=str, default='', metavar='PATH')
    parser.add_argument('--evaluate', action='store_true',
                        help="evaluation only")
    parser.add_argument('--epochs', type=int, default=50)
    parser.add_argument('--eval-step', type=int, default=20, help="evaluation step")
    parser.add_argument('--seed', type=int, default=1)
    # misc
    working_dir = osp.dirname(osp.abspath(__file__))
    parser.add_argument('--data-dir', type=str, metavar='PATH',
                        default=osp.join(working_dir, 'datasets'))
    parser.add_argument('--logs-dir', type=str, metavar='PATH',
                        default=osp.join(working_dir, 'checkpoints'))
    main(parser.parse_args())
Empty file.
New file (32 lines), defining the GAN loss:
from __future__ import absolute_import
import os, sys
import functools
import random

import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.nn import functional as F
from torch.nn import init

class GANLoss(nn.Module):
    def __init__(self, smooth=False):
        super(GANLoss, self).__init__()
        self.smooth = smooth

    def get_target_tensor(self, input, target_is_real):
        real_label = 1.0
        fake_label = 0.0
        if self.smooth:
            real_label = random.uniform(0.7, 1.0)
            fake_label = random.uniform(0.0, 0.3)
        if target_is_real:
            target_tensor = torch.ones_like(input).fill_(real_label)
        else:
            target_tensor = torch.zeros_like(input).fill_(fake_label)
        return target_tensor

    def __call__(self, input, target_is_real):
        target_tensor = self.get_target_tensor(input, target_is_real)
        input = F.sigmoid(input)
        return F.binary_cross_entropy(input, target_tensor)
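For reference, a hypothetical usage sketch (not part of the committed file), assuming the `GANLoss` class above is in scope. Here `pred_real` and `pred_fake` stand for raw, pre-sigmoid discriminator outputs, and `smooth=True` corresponds to the `--smooth-label` option in the README:
```python
import torch
from torch.autograd import Variable

# Illustrative discriminator outputs: raw scores for a batch of real and fake samples.
pred_real = Variable(torch.randn(8, 1))
pred_fake = Variable(torch.randn(8, 1))

criterion_GAN = GANLoss(smooth=True)  # smoothed real/fake labels
loss_D = criterion_GAN(pred_real, True) + criterion_GAN(pred_fake, False)
```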