Watermarking is a prominent technique for tracing text provenance. It works by covertly embedding watermark signals into an object (image, audio, or text) so that ownership of the object can be tracked. In this paper, we propose a novel lexical substitution (LS) method based on a paraphraser and use it to design our watermarking method.

The code uses the following dependencies, evaluation metrics, and datasets:
- Python>=3.6
- torch>=1.7.1
- transformers==4.9.2
- fairseq==0.10.2
- ParaLS (the proposed novel lexical substitution (LS) method)
- BLEURT (bleurt-large-512)
- BARTScore
- BERTScore
- three novels (Wuthering Heights, Dracula, and Pride and Prejudice)
- others (WikiText-2, IMDB, and NgNews)
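
The version pins above can be checked before running the scripts. The following is a minimal sketch and is not part of the repository: the helper names `parse_version` and `satisfies` are hypothetical, and it only handles the simple `>=` and `==` specifiers with numeric versions that appear in this list.

```python
import re

# Requirement strings copied from the list above.
REQUIREMENTS = ["torch>=1.7.1", "transformers==4.9.2", "fairseq==0.10.2"]


def parse_version(text):
    """Turn '1.7.1' (or '1.7.1+cu110') into (1, 7, 1), dropping any non-numeric suffix."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("no numeric version found in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(installed, requirement):
    """Return True if an installed version string meets a 'name>=x' or 'name==x' spec."""
    _, op, wanted = re.match(r"([A-Za-z0-9_.-]+)(>=|==)(.+)", requirement).groups()
    have, want = parse_version(installed), parse_version(wanted)
    # Pad the shorter tuple with zeros so that '1.7' compares equal to '1.7.0'.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have >= want if op == ">=" else have == want
```

The installed version string can be read from the package itself, for example `torch.__version__`, and passed as the first argument.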
# You can find the set of candidate replacement words with this command, which specifies the sentence and the target word to be replaced.
python LSPara_Multi_with_bart_until_target_no_suffix.py en2en 'A good subject to start.' 'good' 2 20
python run_watermark_no_substitutable.py
python calculate_others.py
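
To illustrate how lexical substitution can carry a watermark in general, the sketch below hides one bit at a target word by picking, from the original word and its candidate substitutes, the first one whose hash parity matches the bit. This is a simplified encoding idea for illustration only and is not claimed to be the scheme implemented in `run_watermark_no_substitutable.py`. The candidate list is hard-coded and hypothetical (in practice it would come from a substitution method such as ParaLS), and `word_bit`, `embed_bit`, and `extract_bit` are names invented for this example.

```python
import hashlib


def word_bit(word):
    """Map a word to a pseudo-random bit: the lowest bit of its SHA-256 digest."""
    return hashlib.sha256(word.lower().encode("utf-8")).digest()[-1] & 1


def embed_bit(tokens, index, candidates, bit):
    """Return a copy of `tokens` where position `index` encodes `bit`.

    The original word is tried first, then each candidate in order.
    Returns None if no option encodes the requested bit.
    """
    for option in [tokens[index]] + list(candidates):
        if word_bit(option) == bit:
            return tokens[:index] + [option] + tokens[index + 1:]
    return None


def extract_bit(tokens, index):
    """Read back the bit encoded by the word at `index`."""
    return word_bit(tokens[index])


# Example sentence from the command above; the candidates are assumed, not model output.
tokens = "A good subject to start .".split()
candidates = ["great", "fine", "nice", "suitable"]

for bit in (0, 1):
    marked = embed_bit(tokens, 1, candidates, bit)
    if marked is not None:
        print(bit, " ".join(marked), extract_bit(marked, 1))
```

In this toy version the extractor is told which position carries the bit. Ranking candidates by how well they preserve meaning is left to the substitution method.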
@article{qiang2022watermark,
title={Natural Language Watermarking via Paraphraser-based Lexical Substitution},
author={Jipeng Qiang and Shiyu Zhu and Yun Li and Yi Zhu and Yunhao Yuan and Xindong Wu},
journal={Artificial Intelligence},
year={2022},
}