Natural Language Watermarking via Paraphraser-based Lexical Substitution

Watermarking is a prominent technique for tracing text provenance. It covertly embeds a watermark signal into an object (image, audio, or text) so that ownership of the object can be tracked. In this paper, we propose a novel lexical substitution (LS) method based on a paraphraser and use it to design our watermarking method.
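To make the idea concrete, here is a minimal sketch of how a bit string can be hidden through lexical substitution. It is a toy illustration, not the method implemented in this repository: the synonym table `SYNONYMS` and the functions `embed` and `extract` are hypothetical, and the candidate sets are hard-coded rather than produced by ParaLS. Each two-word synonym group carries one bit, given by the index of the chosen word in the alphabetically sorted group.

```python
from typing import Dict, List

# Hypothetical toy synonym table (assumption); in practice the candidates
# would come from a lexical substitution model such as ParaLS.
SYNONYMS: Dict[str, List[str]] = {
    "good": ["good", "great"],
    "start": ["begin", "start"],
}


def _build_lookup(table: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map every word in a synonym group to its sorted two-word group."""
    lookup: Dict[str, List[str]] = {}
    for candidates in table.values():
        ordered = sorted(set(candidates))
        if len(ordered) != 2:
            raise ValueError("this sketch assumes exactly two words per group")
        for word in ordered:
            lookup[word] = ordered
    return lookup


def embed(tokens: List[str], bits: List[int]) -> List[str]:
    """Replace substitutable tokens so that each one encodes one bit."""
    lookup = _build_lookup(SYNONYMS)
    out: List[str] = []
    used = 0
    for tok in tokens:
        group = lookup.get(tok)
        if group is not None and used < len(bits):
            out.append(group[bits[used]])  # index 0 or 1 selects the word
            used += 1
        else:
            out.append(tok)
    return out


def extract(tokens: List[str], n_bits: int) -> List[int]:
    """Read the bits back from the positions of words in their groups."""
    lookup = _build_lookup(SYNONYMS)
    bits: List[int] = []
    for tok in tokens:
        group = lookup.get(tok)
        if group is not None and len(bits) < n_bits:
            bits.append(group.index(tok))
    return bits


if __name__ == "__main__":
    original = "a good subject to start".split()
    marked = embed(original, [1, 0])
    print(" ".join(marked))
    print(extract(marked, 2))
```

The sketch only works because both sides share the same fixed table. A context-dependent substitution method has the harder job of regenerating the same candidate set from the watermarked sentence at extraction time.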

Dependencies

Python>=3.6
torch>=1.7.1
transformers==4.9.2
fairseq==0.10.2

Pre-trained models

ParaLS (the proposed novel lexical substitution (LS) method)
BLEURT (bleurt-large-512)
BARTScore
BERTScore

Dataset

three novels (Wuthering Heights, Dracula, and Pride and Prejudice)
others (WikiText-2, IMDB, and AgNews)

Obtain synonym set through ParaLS

# Find the set of candidate replacement words with this command, passing the sentence and the target word to be replaced.
python LSPara_Multi_with_bart_until_target_no_suffix.py en2en 'A good subject to start.' 'good' 2 20
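When the script needs to be called for many sentences, the command line above can be assembled programmatically. The following is a small sketch under assumptions: `build_command` is a hypothetical helper, and the trailing arguments `2` and `20` are copied verbatim from the example command without interpreting their meaning.

```python
from typing import List, Sequence

SCRIPT = "LSPara_Multi_with_bart_until_target_no_suffix.py"


def build_command(
    sentence: str,
    target: str,
    extra: Sequence[str] = ("2", "20"),  # copied from the README example as-is
) -> List[str]:
    """Build the argument list for one sentence / target-word pair."""
    if target not in sentence:
        raise ValueError("target word %r does not occur in the sentence" % target)
    return ["python", SCRIPT, "en2en", sentence, target, *extra]


if __name__ == "__main__":
    cmd = build_command("A good subject to start.", "good")
    print(cmd)
    # To actually run it (requires the repository and its models):
    # import subprocess
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with sentences that contain apostrophes or other special characters.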

Embed and Extract watermarks

python run_watermark_no_substitutable.py

Calculate Payload and Recoverability by counting the number of watermarks

python calculate_others.py
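As a rough illustration of the two metrics, the sketch below assumes that payload is the number of embedded watermark bits per word and that recoverability is the fraction of embedded bits read back correctly. These definitions and the function names are assumptions made for this example; `calculate_others.py` may compute the values differently.

```python
from typing import Sequence


def payload(num_bits_embedded: int, num_words: int) -> float:
    """Average number of watermark bits carried per word (assumed definition)."""
    if num_words <= 0:
        raise ValueError("num_words must be positive")
    return num_bits_embedded / num_words


def recoverability(embedded: Sequence[int], extracted: Sequence[int]) -> float:
    """Fraction of embedded bits recovered correctly (assumed definition)."""
    if not embedded:
        raise ValueError("embedded bit sequence must not be empty")
    # Bits missing from `extracted` count as errors because zip stops early.
    matches = sum(1 for e, x in zip(embedded, extracted) if e == x)
    return matches / len(embedded)


if __name__ == "__main__":
    print(payload(2, 5))
    print(recoverability([1, 0, 1, 1], [1, 0, 0, 1]))
```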

Citation

@article{qiang2022watermark,
    title={Natural Language Watermarking via Paraphraser-based Lexical Substitution},
    author={Jipeng Qiang and Shiyu Zhu and Yun Li and Yi Zhu and Yunhao Yuan and Xindong Wu},
    journal={Artificial Intelligence},
    year={2022},
}

zsy200075/Para-NLW
