Reproduce the Potsdam results #39

Open
iseong83 opened this issue Jul 7, 2022 · 9 comments
iseong83 commented Jul 7, 2022

Could you help me reproduce the results on the Potsdam dataset? I trained STEGO with the same configuration used for potsdam_test.ckpt and then evaluated the model with eval_segmentation.py, but the clustering Accuracy and mIoU are low.
Using potsdam_test.ckpt, I got

'final/linear/mIoU': 74.83345866203308, 
'final/linear/Accuracy': 85.84609031677246,
'final/cluster/mIoU': 62.565261125564575, 
'final/cluster/Accuracy': 77.03110575675964

but using my checkpoint, I got

'final/linear/mIoU': 74.89467859268188, 
'final/linear/Accuracy': 85.89659333229065, 
'final/cluster/mIoU': 47.732433676719666, 
'final/cluster/Accuracy': 64.23421502113342

The results with the linear probe look good, but the cluster probe results do not. Could you help figure out what could be causing the difference?
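For context on why the cluster numbers can move while the linear probe stays stable: unsupervised cluster metrics are typically scored only after cluster IDs have been matched to the ground-truth classes (e.g. via the Hungarian algorithm), so a training run that lands in a worse clustering configuration shows up in these metrics first. Below is a minimal sketch of that matching step, assuming flat integer arrays `preds` and `labels`; the exact bookkeeping inside eval_segmentation.py may differ.

```python
# Minimal sketch (not from this thread) of Hungarian-matched cluster metrics.
# preds / labels are assumed to be flat integer arrays with values in [0, n_classes).
import numpy as np
from scipy.optimize import linear_sum_assignment


def cluster_metrics(preds, labels, n_classes):
    # Confusion matrix: rows = predicted cluster ID, cols = ground-truth class.
    conf = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(conf, (preds, labels), 1)

    # Hungarian matching: relabel clusters to maximize agreement with the labels.
    rows, cols = linear_sum_assignment(conf, maximize=True)
    remap = np.zeros(n_classes, dtype=np.int64)
    remap[rows] = cols
    matched = remap[preds]

    # Score the matched predictions.
    accuracy = float((matched == labels).mean())
    ious = []
    for c in range(n_classes):
        inter = np.sum((matched == c) & (labels == c))
        union = np.sum((matched == c) | (labels == c))
        if union > 0:
            ious.append(inter / union)
    return accuracy, float(np.mean(ious))
```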

Here is the configuration I used to train STEGO:

output_root: ../
pytorch_data_dir: /home/bv/datasets/external_datasets
experiment_name: exp1
log_dir: potsdam
azureml_logging: true
submitting_to_aml: false
num_workers: 24
max_steps: 5000
batch_size: 16
num_neighbors: 7
dataset_name: potsdam
dir_dataset_name: null
dir_dataset_n_classes: 5
has_labels: false
crop_type: null
crop_ratio: 0.5
res: 224
loader_crop_type: center
extra_clusters: 0
use_true_labels: false
use_recalibrator: false
model_type: vit_small
arch: dino
use_fit_model: false
dino_feat_type: feat
projection_type: nonlinear
dino_patch_size: 8
granularity: 1
continuous: true
dim: 70
dropout: true
zero_clamp: true
lr: 0.0005
pretrained_weights: null
use_salience: false
stabalize: false
stop_at_zero: true
pointwise: true
feature_samples: 11
neg_samples: 5
aug_alignment_weight: 0.0
correspondence_weight: 1.0
neg_inter_weight: 0.63
pos_inter_weight: 0.25
pos_intra_weight: 0.67
neg_inter_shift: 0.76
pos_inter_shift: 0.02
pos_intra_shift: 0.08
rec_weight: 0.0
repulsion_weight: 0.0
crf_weight: 0.0
alpha: 0.5
beta: 0.15
gamma: 0.05
w1: 10.0
w2: 3.0
shift: 0.0
crf_samples: 1000
color_space: rgb
reset_probe_steps: null
n_images: 5
scalar_log_freq: 10
checkpoint_freq: 50
val_freq: 100
hist_freq: 100
full_name: potsdam/potsdam_exp1
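One way to track down a configuration mismatch (a sketch added for illustration, not part of the original report): PyTorch Lightning checkpoints usually store the hyperparameters they were trained with under a `hyper_parameters` key, so you can diff a local YAML like the one above against whatever is baked into the released potsdam_test.ckpt. The paths and key names below are assumptions based on standard Lightning and OmegaConf behaviour.

```python
# Sketch: diff a local training config against the hyperparameters stored in a
# released Lightning checkpoint. Paths and the "hyper_parameters" key are assumptions.
import torch
from omegaconf import OmegaConf

ckpt = torch.load("../saved_models/potsdam_test.ckpt", map_location="cpu")
released = ckpt.get("hyper_parameters", {})  # present if save_hyperparameters() was used

mine = OmegaConf.to_container(OmegaConf.load("configs/train_config.yml"), resolve=True)

# Print every field that differs between the released checkpoint and the local config.
for key in sorted(set(released) | set(mine)):
    a, b = released.get(key, "<missing>"), mine.get(key, "<missing>")
    if a != b:
        print(f"{key}: checkpoint={a!r}  mine={b!r}")
```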
@BradNeuberg

For the record, I'm seeing exactly the same problem: I can reproduce the STEGO results with the model that's already trained, but when I train it myself I get a lower accuracy for the cluster probe than the paper reports.


BradNeuberg commented Jul 21, 2022

I attempted to train on cocostuff to get a successful training run and see what the graphs looked like (#23 (comment)). Even with that, though, I could not successfully tune the Potsdam hyperparameters.

I decided to turn to a Bayesian hyperparameter optimizer, SigOpt. I ran it for about 100 trials, tuning the various positive and negative shift/weight hyperparameters and optimizing only cluster mIoU. Strictly speaking I should have optimized linear accuracy/mIoU and cluster accuracy/mIoU jointly, but for simplicity I chose cluster mIoU alone. It came up with these hyperparameter values for the Potsdam dataset:

Parameters:
neg_inter_shift: 0.9981259810906995
neg_inter_weight: 0.19914806514497108
pos_inter_shift: 0.17863135533504992
pos_inter_weight: 0.6098772723430869
pos_intra_shift: 0.003232418118101617
pos_intra_weight: 1
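For readers asking how such a tuning loop might look (this is a hedged reconstruction, not the script actually used in this thread): with SigOpt's classic Python client you create an experiment over the shift/weight parameters, then repeatedly fetch a suggestion, run a short STEGO training job with those values, and report cluster mIoU back as an observation. The helper `train_and_eval_cluster_miou` is a hypothetical placeholder for that training-and-evaluation step.

```python
# Hedged sketch of a SigOpt tuning loop over STEGO's contrastive-loss hyperparameters.
# train_and_eval_cluster_miou is a hypothetical helper: launch a short training run
# with the suggested values and return the final cluster mIoU.
from sigopt import Connection

conn = Connection(client_token="YOUR_SIGOPT_TOKEN")

experiment = conn.experiments().create(
    name="stego-potsdam-cluster-miou",
    parameters=[
        dict(name="neg_inter_shift",  type="double", bounds=dict(min=0.0, max=1.0)),
        dict(name="neg_inter_weight", type="double", bounds=dict(min=0.0, max=1.0)),
        dict(name="pos_inter_shift",  type="double", bounds=dict(min=0.0, max=1.0)),
        dict(name="pos_inter_weight", type="double", bounds=dict(min=0.0, max=1.0)),
        dict(name="pos_intra_shift",  type="double", bounds=dict(min=0.0, max=1.0)),
        dict(name="pos_intra_weight", type="double", bounds=dict(min=0.0, max=1.0)),
    ],
    metrics=[dict(name="cluster_miou", objective="maximize")],
    observation_budget=100,
)

for _ in range(experiment.observation_budget):
    suggestion = conn.experiments(experiment.id).suggestions().create()
    params = suggestion.assignments              # dict-like mapping: parameter name -> value
    miou = train_and_eval_cluster_miou(params)   # hypothetical: short run -> cluster mIoU
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id,
        values=[dict(name="cluster_miou", value=miou)],
    )
```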

Unfortunately, even with this, I still could not replicate the Potsdam results listed in the paper:

[screenshot of training results]

At this point, I think something more fundamental is broken somewhere in STEGO with respect to Potsdam, perhaps a bug in the dataset code or elsewhere.

@mhamilton723 (Owner)

Thanks for replicating this, @BradNeuberg. This might be related to the specifics of your distributed training setup. How many workers do you use, and are you using the same batch size? These models were trained on a single GPU, so that may have affected training.

@BradNeuberg

I am using Google Cloud, with the machine type being an n1-standard-8 with 8 CPU cores and a V100 GPU. Since I have 8 CPU cores, I could potentially set num_workers to 8; however, I consistently get out-of-memory errors at about epoch 22 if I do that, so I've set num_workers to 1, which gets rid of the errors. My batch size is 32. I'm only using a single machine and a single GPU for training.
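As an aside (a minimal sketch, not from this thread), the knobs being discussed map directly onto PyTorch's DataLoader: each worker is a separate process with its own copy of the dataset object and its own prefetch queue, so host-RAM usage grows with num_workers. The dummy TensorDataset below stands in for STEGO's Potsdam dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the Potsdam dataset; any map-style Dataset behaves the same here.
train_dataset = TensorDataset(torch.randn(256, 3, 224, 224))

loader = DataLoader(
    train_dataset,
    batch_size=32,            # per-GPU batch size mentioned above
    num_workers=1,            # each worker is a separate process; lower this if host RAM runs out
    pin_memory=True,          # faster host-to-GPU copies on a single V100
    persistent_workers=True,  # keep workers alive across epochs (requires num_workers > 0)
    prefetch_factor=2,        # batches pre-fetched per worker
)
```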

@tanveer6715

Hi @BradNeuberg,

Could you show an example of how you used the Bayesian hyperparameter optimizer SigOpt to tune the hyperparameters of the STEGO model?

@Cemm23333

How should we deal with the Potsdam reproduction problem?

@Cemm23333

@mhamilton723, could you share the hyperparameters for Potsdam?

@axkoenig

Hi folks,
congrats on the great paper! To add to the discussion, I'd like to share that we are publishing a follow-up study on STEGO at the CVPR 2023 Workshops, which also looks into the issues you describe. Figure 4 might be interesting to you! :)
Cheers, Alex

@22by7-raikar

@mhamilton723, could you share the hyperparameters for Potsdam?

@Cemm23333 you can find them here: https://arxiv.org/abs/2304.07314
