Towards Robustness of Deep Program Processing Models -- Detection, Estimation and Enhancement
See the paper here.
Watch the video here.
- torch == 1.8.0
- dgl == 0.7.2
- transformers == 3.3
This is the recommended environment. Other versions may be compatible.
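To quickly check whether an environment matches the recommended versions, a minimal stdlib-only sketch like the following may help (the exact-match policy on minor versions is an assumption; other versions may still work, as noted above):

```python
# Minimal environment check using only the standard library. Compares the
# installed versions (if any) against the recommended ones listed above.
from importlib import metadata

RECOMMENDED = {"torch": "1.8.0", "dgl": "0.7.2", "transformers": "3.3"}

def check_versions(recommended):
    """Return {package: status string} for each recommended package."""
    report = {}
    for pkg, want in recommended.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            report[pkg] = f"missing (recommended {want})"
        else:
            # Treat e.g. 1.8.0 and 1.8.0.post1-style patch releases as matching.
            ok = have == want or have.startswith(want + ".")
            report[pkg] = have if ok else f"{have} (recommended {want})"
    return report

if __name__ == "__main__":
    for pkg, status in check_versions(RECOMMENDED).items():
        print(f"{pkg}: {status}")
```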
Use the pre-processed datasets

- Download the already pre-processed datasets -- OJ, OJClone, and CodeChef.
- Put the contents of OJ, OJClone, and CodeChef into the corresponding directories `data`, `data_clone`, and `data_defect`, respectively.
- The pre-processed datasets for GRU, LSTM, ASTNN, LSCNN, TBCNN, CodeBERT, and CDLH are now all included in these directories.
Pre-process the datasets by yourself

- Download the raw datasets, i.e., `oj.tar.gz` from OJ and `codechef.zip` from CodeChef.
- Put `oj.tar.gz` in the `data` directory and `codechef.zip` in `data_defect`.
- Run the following commands to build the OJ dataset for the DL models. The dataset format of CodeChef is almost identical to OJ, so the code can be reused.
> cd preprocess-lstm
> python3 main.py
> cd ../preprocess-astnn
> python3 pipeline.py
> cd ../preprocess-tbcnn
> python3 main.py
> cd ..
- Copy `oj.pkl.gz`, `oj_uid.pkl.gz`, and `oj_inspos.pkl.gz` from the `data` directory into `data_clone`.
- Run the following commands to build the OJClone dataset for the DL models.
> cd preprecess_clone-lstm
> python3 main.py
> cd ../preprocess_clone-astnn
> python3 main.py
> cd ../preprocess_clone-tbcnn
> python3 main.py
> cd ..
- Everything is ready now.
The source code directories are named according to the dataset and the model: `code`, `code_clone`, and `code_defect` refer to OJ, OJClone, and CodeChef, respectively.
The source code files to train each model (i.e., GRU, LSTM, ASTNN, LSCNN, TBCNN, CodeBERT, and CDLH) on each dataset (i.e., OJ, OJClone, and CodeChef) are included in the corresponding directories. For instance, `code_defect-codebert` refers to CodeBERT for CodeChef. Note that the GRU and LSTM models are both in the `lstm` directories.
E.g., run the following commands to train a GRU model on OJ.
> cd code-lstm
> python3 lstm_train.py -gpu 0 -model GRU -lr 1e-3 -save_dir MODEL/SAVE/PATH --data ../data/oj.pkl.gz
> cd ..
Run the following commands to train an LSTM model on CodeChef.
> cd code_defect-lstm
> python3 lstm_train.py -gpu 0 -model LSTM -lr 1e-3 -save_dir MODEL/SAVE/PATH --data ../data_defect/oj.pkl.gz
> cd ..
Run the following commands to train a CDLH model on OJClone.
> cd code_clone-cdlh
> python3 train.py --save_dir MODEL/SAVE/PATH
> cd ..
Run `python3 attacker.py` in each directory to attack the DL models.
E.g., run the following commands to attack the CodeBERT model on OJ.
> cd code-codebert
> python3 attacker.py --model_dir FINETUNED/CODEBERT/MODEL/PATH
> cd ..
The correspondence between the attacking algorithms and the `Attacker` classes is shown in the following table.

Attacking Algorithm | Class Name |
---|---|
I-CARROT | `Attacker` |
S-CARROT | `InsAttacker` |
I-RW | `AttackerRandom` |
S-RW | `InsAttackerRandom` |

One may switch among the attacking algorithms (I-CARROT, S-CARROT, I-RW, and S-RW) by employing the corresponding attacker class in the code.
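The table above is essentially a name-to-class dispatch. A small sketch of how such a selection could be wired up (the class names come from the table; the lookup helper itself is illustrative, not part of the repository):

```python
# Map each attacking algorithm to its attacker class name, mirroring the
# table above. The dispatch helper is a hypothetical convenience wrapper.
ATTACKERS = {
    "I-CARROT": "Attacker",
    "S-CARROT": "InsAttacker",
    "I-RW": "AttackerRandom",
    "S-RW": "InsAttackerRandom",
}

def attacker_class_name(algorithm: str) -> str:
    """Return the attacker class name for a given attacking algorithm."""
    try:
        return ATTACKERS[algorithm]
    except KeyError:
        raise ValueError(f"unknown attacking algorithm: {algorithm}") from None
```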
After the adversarial attack, logging files are produced. Run the following command to compute the robustness of the DL model.
> python3 compute_robustness.py -I PATH/TO/ICARROT/LOG -S PATH/TO/SCARROT/LOG
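Conceptually, the robustness estimate counts a sample as robust only if neither the I-CARROT nor the S-CARROT attack succeeds on it. A hedged sketch of that computation (the one-`id,success-flag`-per-line log format here is an assumption for illustration; `compute_robustness.py` defines the actual format):

```python
# Sketch: estimate robustness as the fraction of samples on which NEITHER
# the I-CARROT nor the S-CARROT attack succeeds. The "id,flag" log format
# is assumed for illustration only.
def load_success(lines):
    """Map sample id -> True if the attack succeeded on that sample."""
    result = {}
    for line in lines:
        sid, flag = line.strip().split(",")
        result[sid] = flag == "1"
    return result

def robustness(i_log_lines, s_log_lines):
    """Fraction of shared samples surviving both attacks."""
    i_succ = load_success(i_log_lines)
    s_succ = load_success(s_log_lines)
    ids = i_succ.keys() & s_succ.keys()
    robust = sum(1 for sid in ids if not i_succ[sid] and not s_succ[sid])
    return robust / len(ids)
```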
Take LSTM on OJClone as an example.
- Run the following commands to create the adversarial example training set.
> cd code_clone-lstm
> python3 attacker4training.py
> cd ..
- Run the following commands to adversarially train the model.
> cd code_clone-lstm
> python3 lstm_train.py --adv_train_path PATH/TO/ADVERSARIAL/EXAMPLE/SET --OTHER_ARGUMENTS
> cd ..
- Go back to step 1 to iteratively update the adversarial example set upon the current training set.
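The two-step loop above alternates between generating adversarial examples against the current model and retraining on them. A minimal sketch of that control flow, where `generate_adv` and `train` are stand-ins for `attacker4training.py` and `lstm_train.py` (both names here are placeholders, not repository APIs):

```python
# Hypothetical sketch of the iterative adversarial-training loop described
# above. `generate_adv` and `train` are caller-supplied stand-ins for the
# attacker4training.py and lstm_train.py steps.
def adversarial_training(train, generate_adv, model, rounds=3):
    for _ in range(rounds):
        adv_set = generate_adv(model)   # step 1: attack the current model
        model = train(model, adv_set)   # step 2: retrain on the adv set
    return model
```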
@article{zhang2022towards,
title={Towards Robustness of Deep Program Processing Models--Detection, Estimation and Enhancement},
author={Zhang, Huangzhao and Fu, Zhiyi and Li, Ge and Ma, Lei and Zhao, Zhehao and Yang, Hua’an and Sun, Yizhe and Liu, Yang and Jin, Zhi},
journal={ACM Transactions on Software Engineering and Methodology},
year={2022},
publisher={ACM New York, NY}
}