zsy200075/Para-NLW

Natural Language Watermarking via Paraphraser-based Lexical Substitution

Watermarking is a prominent technique for tracing text provenance. It works by covertly embedding a watermark signal into an object (image, audio, or text) so that ownership of the object can later be traced. In this paper, we propose a novel lexical substitution (LS) method based on a paraphraser and use it to design our watermarking method.
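As a rough illustration of how lexical substitution can carry a watermark, the toy sketch below hides one bit per substitutable word by choosing which member of a synonym group appears in the text. It is not the method implemented in this repository: the synonym table `SYNONYM_GROUPS` and the helpers `embed` and `extract` are hypothetical names, and the hand-written table stands in for the candidates that the paraphraser-based LS step would produce.

```python
from typing import Dict, List, Sequence

# Hypothetical, hand-written synonym groups (an assumption for illustration).
# The position of a word inside its group encodes the bit: index 0 -> 0, index 1 -> 1.
SYNONYM_GROUPS: List[List[str]] = [["good", "great"], ["start", "begin"]]

# Reverse lookup: word -> the group it belongs to.
GROUP_OF: Dict[str, List[str]] = {w: g for g in SYNONYM_GROUPS for w in g}


def embed(tokens: Sequence[str], bits: Sequence[int]) -> List[str]:
    """Replace each substitutable token with the group member selected by the next bit."""
    remaining = list(bits)
    out: List[str] = []
    for tok in tokens:
        group = GROUP_OF.get(tok)
        if group is not None and remaining:
            out.append(group[remaining.pop(0)])
        else:
            # Not substitutable, or no bits left to embed: keep the token unchanged.
            out.append(tok)
    return out


def extract(tokens: Sequence[str], num_bits: int) -> List[int]:
    """Read the bits back from the positions of substitutable tokens in their groups."""
    bits = [GROUP_OF[t].index(t) for t in tokens if t in GROUP_OF]
    return bits[:num_bits]


tokens = "a good subject to start".split()
marked = embed(tokens, [1, 0])
print(" ".join(marked))    # a great subject to start
print(extract(marked, 2))  # [1, 0]
```

A fixed table like this ignores context, which is the gap a context-aware substitution method is meant to close: candidates have to be generated for the specific sentence, and the extractor has to be able to regenerate the same candidates from the watermarked text.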

Dependencies

  • Python>=3.6
  • torch>=1.7.1
  • transformers==4.9.2
  • fairseq==0.10.2

Pre-trained model

Dataset

  • three novels (Wuthering Heights, Dracula, and Pride and Prejudice)
  • others (WikiText-2, IMDB, and AgNews)

Obtain synonym set through ParaLS

# Find the set of candidate replacement words by passing the sentence and the target word to be replaced.
python LSPara_Multi_with_bart_until_target_no_suffix.py en2en 'A good subject to start.' 'good' 2 20

Embed and Extract watermarks

python run_watermark_no_substitutable.py

Calculate Payload and Recoverability by counting the number of watermarks

python calculate_others.py
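The two metrics can be sketched as simple ratios. This is a minimal sketch under assumed definitions, not the logic of `calculate_others.py`: payload is taken to be the number of embedded bits per word, and recoverability the fraction of embedded bits that are extracted correctly. The function names and signatures are hypothetical.

```python
from typing import Sequence


def payload(num_bits: int, num_words: int) -> float:
    """Assumed definition: embedded watermark bits per word of text."""
    if num_words <= 0:
        raise ValueError("num_words must be positive")
    return num_bits / num_words


def recoverability(embedded: Sequence[int], extracted: Sequence[int]) -> float:
    """Assumed definition: fraction of embedded bits recovered at the same position.

    Bits missing from `extracted` count as not recovered.
    """
    if not embedded:
        raise ValueError("embedded must contain at least one bit")
    matches = sum(1 for e, x in zip(embedded, extracted) if e == x)
    return matches / len(embedded)


print(payload(2, 8))                                 # 0.25
print(recoverability([1, 0, 1, 1], [1, 0, 0, 1]))    # 0.75
```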

Citation

@article{qiang2022watermark,
    title={Natural Language Watermarking via Paraphraser-based Lexical Substitution},
    author={Jipeng Qiang and Shiyu Zhu and Yun Li and Yi Zhu and Yunhao Yuan and Xindong Wu},
    journal={Artificial Intelligence},
    year={2022},
}
