We present HuRef, a human-readable fingerprinting technique for LLMs that uniquely identifies the base model without interfering with its training or exposing model parameters.
For more details, please refer to our paper!
To set up the environment, follow these steps:
conda create -n huref python=3.8
conda activate huref
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
pip install -r requirements.txt
To extract invariant terms from an LLM, we need a frequency-sorted token list. You can either use our pre-sorted token list or generate your own by running sort_tokens_frequency.py
.
The encoder maps the invariant terms to a Gaussian feature vector. You can download our trained encoder from this link, or you can train the encoder from scratch by running encoder_train.py
with the default settings.
Once everything is ready, generate the fingerprint using the following command:
python main.py --model_path your_model_name_or_path --sorted_tokens_path your_sorted_token_list_path --invariant_terms_saved_path your_invariant_terms_folder_path --encoder_path your_encoder_path --feature_extract_method Mean_pooling_or_CNN
You can choose either Mean pooling (a simpler method) or CNN for feature extraction. If you select Mean pooling, the encoder preparation step can be skipped. We have listed the LLMs' Hugging Face repository paths in models_hf_repo_path.txt
. You can choose one as your your_model_name_or_path
to generate the fingerprint.
To calculate the PCS between LLaMA-family models (reproducing the results in Table 1), simply run:
python Parameters_Cos_Sim.py
To calculate ICS between LLMs, run:
python Invariant_terms_Cos_Sim.py --invariant_terms_saved_path your_invariant_terms_folder_path --ics_calcu_models llama_family_models_or_offspring_models_or_independent_models
You can set the --ics_calcu_models
argument to one of the following options to reproduce the results in specific tables:
- llama_family_models (Table 3)
- offspring_models (Table 2)
- independent_models (Table 7)
Note: You need to first extract invariant terms for these models using code in main.py
before calculating ICS.
The code for the StyleGAN2 component in this project was adapted and modified from pytorch-pretrained-gans.
If you have any questions or encounter issues, feel free to open an issue or reach out to Boyi Zeng at Shanghai Jiao Tong University.
If you find this work useful, please consider citing our paper:
@inproceedings{zeng2023huref,
title={Huref: Human-readable fingerprint for large language models},
author={Zeng, Boyi and Wang, Lizheng and Hu, Yuncong and Xu, Yi and Zhou, Chenghu and Wang, Xinbing and Yu, Yu and Lin, Zhouhan},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems}
}
.