SimpleNet: A Simple Network for Image Anomaly Detection and Localization
This repository contains a PyTorch implementation of SimpleNet that can also use a CLIP backbone (ViT-B/32) as the feature extractor.

The code for the CLIP backbone can be found in `common.py`, where the class `CLIPFeatureExtractor` contains the logic of the feature-extractor component. In short, it returns a list of features obtained after the input has passed through the layers specified on the command line. For example, if the `-le` parameter is set to 2 and 5, the CLIP feature extractor returns a list of two feature tensors produced by layers 2 and 5, respectively.
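The exact interface is defined in `common.py`; as a minimal sketch of the underlying idea (the class name `SimpleCLIPExtractor` and its interface below are illustrative, not the repo's API), intermediate features can be collected with forward hooks on the blocks of CLIP's image encoder:

```python
import clip
import torch

class SimpleCLIPExtractor:
    """Sketch of a CLIP feature extractor: collect the outputs of
    selected transformer blocks of the ViT-B/32 image encoder."""

    def __init__(self, layer_indices, device="cpu"):
        self.model, self.preprocess = clip.load("ViT-B/32", device=device)
        self.layer_indices = layer_indices
        self.features = {}
        # Register a forward hook on each requested residual attention block.
        for idx in layer_indices:
            block = self.model.visual.transformer.resblocks[idx]
            block.register_forward_hook(self._make_hook(idx))

    def _make_hook(self, idx):
        def hook(module, inputs, output):
            self.features[idx] = output.detach()
        return hook

    @torch.no_grad()
    def __call__(self, images):
        self.features = {}
        self.model.encode_image(images)
        # One feature tensor per requested layer, in the requested order.
        return [self.features[idx] for idx in self.layer_indices]

# Usage: with layers 2 and 5, the call returns a list of two tensors.
extractor = SimpleCLIPExtractor(layer_indices=[2, 5])
features = extractor(torch.randn(1, 3, 224, 224))
```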
The setup can be done by running the following commands in a terminal:

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
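If the CLIP backbone is used, the `clip` package must also be importable. In case it is not already pinned in `requirements.txt` (an assumption about the environment, not something this repo documents), OpenAI's reference implementation can be installed with `pip install git+https://github.com/openai/CLIP.git`.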
Edit `run.sh` to set the dataset class and dataset path.

Download the dataset from here. The dataset folders/files follow the original structure.

Please specify the dataset path (line 1) and the log folder (line 10) in `run.sh` before running.
`run.sh` provides the configuration to train models on the MVTec AD dataset:

```bash
bash run.sh
```
To run the code with a CLIP backbone, change the bash script `run.sh` to use the following parameters:

```bash
-b clip \
-le 2 \
-le 3 \
--pretrain_embed_dimension 768 \
--target_embed_dimension 768 \
--patchsize 10 \
--meta_epochs 40 \
```
Note that the layer indices (the `-le` parameters) must be integers from 0 to 11, because ViT-B/32 has 12 attention blocks.
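As a quick sanity check, the block count can be read directly from OpenAI's `clip` package (assuming it is installed as described above):

```python
import clip

# Load the ViT-B/32 model and count the residual attention blocks
# in its image encoder: valid -le indices are 0..len-1.
model, _ = clip.load("ViT-B/32", device="cpu")
print(len(model.visual.transformer.resblocks))  # 12
```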
One could perform hyperparameter tuning to find the best values of the parameters.
```bibtex
@inproceedings{liu2023simplenet,
  title={SimpleNet: A Simple Network for Image Anomaly Detection and Localization},
  author={Liu, Zhikang and Zhou, Yiming and Xu, Yuansheng and Wang, Zilei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={20402--20411},
  year={2023}
}
```
Thanks to PatchCore for the great inspiration.

All code within the repo is under the MIT license.