This is the source code for our HKU Master's degree graduation project. The project segments archaeological sites in remote sensing images, a task that poses several challenges beyond traditional object segmentation, including a large number of unlabeled images, a weakly supervised dataset, and domain shift. The dataset can be downloaded at https://drive.google.com/drive/folders/1YyODJlOZvifnA3vXRuQgWSIMeumnnMbt?usp=sharing.
To reproduce the results, you may need PyTorch 2.0 with GPU support. We used Python 3.10; see requirements.txt, which was generated automatically by the pipreqs library, to set up your Python environment. Once the environment is ready, edit config.py, which holds the project configuration, to suit your setup, including the data directories and batch size.
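
Before training, it can help to confirm that the interpreter and PyTorch install match the setup described above. The following is a minimal sketch, not part of the project: `check_environment` is a hypothetical helper, and it only reports what it finds rather than enforcing anything.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 10)):
    """Report whether Python and PyTorch look compatible with the README's setup.

    Hypothetical helper for illustration; it is not shipped with the project.
    """
    report = {
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": False,
    }
    # Only import torch when it is actually installed, so the check
    # still runs in a fresh environment.
    if report["torch_installed"]:
        import torch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False`, training will likely fall back to the CPU or fail, depending on how the scripts select their device.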
To retrain the model, call the train() functions; the calls may be commented out, in which case remove the leading #. Similarly, to see the results of our trained model, call the predict() function and make sure the pre-trained weights we provide have been saved in the checkpoints folder. The results are then shown in visdom. The checkpoints are available here: https://drive.google.com/drive/folders/1Yz7m9PZk_9Xx4niU5vM77JqvP_Xbqp0R?usp=sharing
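
Since predict() depends on the downloaded weights being in the checkpoints folder, a quick check before running it can save a failed run. This is a minimal sketch under stated assumptions: `find_checkpoints` is a hypothetical helper, and the `.pth` and `.pt` suffixes are a guess at how the weight files are named, so adjust them to match the files you downloaded.

```python
from pathlib import Path


def find_checkpoints(folder="checkpoints", suffixes=(".pth", ".pt")):
    """Return the weight files found in `folder`, sorted by name.

    The suffixes are an assumption about the file naming, not taken
    from the project.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        path for path in root.iterdir()
        if path.is_file() and path.suffix in suffixes
    )


if __name__ == "__main__":
    found = find_checkpoints()
    if found:
        for path in found:
            print(path)
    else:
        print("No weight files found; download them into the checkpoints folder first.")
```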
The project folder includes two open-source libraries: DINOv2 by Meta AI in the dinov2_source folder and Transformers by Hugging Face in the segformer_source folder. The DINOv2 code is unmodified and was used to build the traditional ViT-based segmentation model, which has since been abandoned in favor of SegFormer as the base segmentation model. From the Transformers library, only the SegFormer code is used, and it has been modified to support prompt tuning and domain prompting. The added functionality can be called through the models\SegFormer class we wrote. In addition, the pretrained SegFormer model by NVIDIA is used in our model for transfer learning, as the encoder and as the initial weights.
- main_segformer stores the main functions for our research. It is the most important part of the code and contains its entry points.
- checkpoints stores the final models from our training. Because they are large, we provide a Google Drive link to download them: https://drive.google.com/drive/folders/1Yz7m9PZk_9Xx4niU5vM77JqvP_Xbqp0R?usp=sharing
- main_other stores the main functions for CNN-based segmentation, conversion of JSON mask files to PNG mask files, and feature matching comparisons between SuperGlue and DINOv2. The feature matching comparison was used in the intern report but is no longer used in the final project.
- models stores the model classes we designed for our segmentation task, built on other base models, including SegFormer from the Transformers library by Hugging Face.
- figures stores the figures produced during hyperparameter tuning and the training loss plots.
- segformer_source is the source code of the Transformers library by Hugging Face. Only the SegFormer code is used, and it has been modified to support prompt tuning and domain prompting. The added functionality can be called through the models\SegFormer class we wrote.
- dinov2_source (abandoned) is the source code of DINOv2 by Meta AI, which is no longer used in our final research.
- main_vit_based (abandoned) contains the main functions for traditional ViT-based segmentation, which is no longer used in the final report and has been replaced by SegFormer.
- baseline_segmentation.py trains the CNN baseline models on our labeled dataset. The segmentation_models_pytorch library provides the model collection.
- segFormer_main.py trains a SegFormer directly on our labeled dataset, giving our ViT-based baseline result.
- segFormer_autoencoder_main.py trains a SegFormer-based autoencoder on both the unlabeled and the labeled images. The trained weights can then be used in our segmentation task via segFormer_transfer_learning_main.py.
- segFormer_transfer_learning_main.py performs prompt tuning on the pretrained weights.
- segFormer_semi_teacherstudent_main.py contains our proposed teacher-student structure, which supports both trustworthy pseudo-label learning and weakly supervised learning.
- segFormer_fewshot_learning.py implements our few-shot domain prompting idea.
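
To give a rough idea of what learning from pseudo-labels can look like in a teacher-student setup, the sketch below keeps a teacher prediction only where the teacher is confident. This README does not say how the project decides that a pseudo-label is trustworthy, so confidence thresholding is an assumption used purely for illustration, not the method in segFormer_semi_teacherstudent_main.py. The names `filter_pseudo_labels` and `IGNORE_INDEX` and the threshold value are hypothetical.

```python
IGNORE_INDEX = 255  # assumed marker for pixels excluded from the loss


def filter_pseudo_labels(pixel_probs, threshold=0.9):
    """Turn teacher probabilities into pseudo-labels, dropping uncertain pixels.

    pixel_probs: a list with one entry per pixel, each entry being a list
    of class probabilities predicted by the teacher.
    Returns one label per pixel: the most likely class when its probability
    reaches `threshold`, otherwise IGNORE_INDEX.
    """
    labels = []
    for probs in pixel_probs:
        best_class = max(range(len(probs)), key=probs.__getitem__)
        if probs[best_class] >= threshold:
            labels.append(best_class)
        else:
            labels.append(IGNORE_INDEX)
    return labels


if __name__ == "__main__":
    teacher_output = [[0.95, 0.05], [0.60, 0.40], [0.10, 0.90]]
    print(filter_pseudo_labels(teacher_output))
```

In this sketch the student would then be trained only on the pixels that kept a class label, with the ignored pixels contributing nothing to the loss.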