Towards Robustness of Deep Program Processing Models -- Detection, Estimation and Enhancement
See the paper here.
Watch the video here.
- torch == 1.8.0
- dgl == 0.7.2
- transformers == 3.3
This is the recommended environment. Other versions may be compatible.
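To quickly check whether an environment matches the recommended versions, a minimal stdlib-only sketch like the following may help (the exact-match policy on minor versions is an assumption; other versions may still work, as noted above):

```python
# Minimal environment check using only the standard library. Compares the
# installed versions (if any) against the recommended ones listed above.
from importlib import metadata

RECOMMENDED = {"torch": "1.8.0", "dgl": "0.7.2", "transformers": "3.3"}

def check_versions(recommended):
    """Return {package: status string} for each recommended package."""
    report = {}
    for pkg, want in recommended.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            report[pkg] = f"missing (recommended {want})"
        else:
            # Treat e.g. 1.8.0 and 1.8.0.post1-style patch releases as matching.
            ok = have == want or have.startswith(want + ".")
            report[pkg] = have if ok else f"{have} (recommended {want})"
    return report

if __name__ == "__main__":
    for pkg, status in check_versions(RECOMMENDED).items():
        print(f"{pkg}: {status}")
```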
Use the pre-processed datasets

- Download the already pre-processed datasets -- OJ, OJClone, and CodeChef.
- Put the contents of OJ, OJClone, and CodeChef into the corresponding directories `data`, `data_clone`, and `data_defect`, respectively.
- The pre-processed datasets for GRU, LSTM, ASTNN, LSCNN, TBCNN, CodeBERT, and CDLH are now all included in these directories.
Pre-process the datasets by yourself

- Download the raw datasets, i.e., `oj.tar.gz` from OJ and `codechef.zip` from CodeChef.
- Put `oj.tar.gz` in the `data` directory and `codechef.zip` in `data_defect`.
- Run the following commands to build the OJ dataset for the DL models. The dataset format of CodeChef is almost identical to OJ, so the code can be reused.
> cd preprocess-lstm
> python3 main.py
> cd ../preprocess-astnn
> python3 pipeline.py
> cd ../preprocess-tbcnn
> python3 main.py
> cd ..
- Copy `oj.pkl.gz`, `oj_uid.pkl.gz`, and `oj_inspos.pkl.gz` from the `data` directory into `data_clone`.
- Run the following commands to build the OJClone dataset for the DL models.
> cd preprecess_clone-lstm
> python3 main.py
> cd ../preprocess_clone-astnn
> python3 main.py
> cd ../preprocess_clone-tbcnn
> python3 main.py
> cd ..
- Everything is ready now.
The source code directories are named according to the dataset and the model: `code`, `code_clone`, and `code_defect` refer to OJ, OJClone, and CodeChef, respectively.
The source code files to train each model (i.e., GRU, LSTM, ASTNN, LSCNN, TBCNN, CodeBERT, and CDLH) on each dataset (i.e., OJ, OJClone, and CodeChef) are included in the corresponding directories. For instance, `code_defect-codebert` refers to CodeBERT for CodeChef. Note that the GRU and LSTM models are both in the `lstm` directories.
E.g., run the following commands to train a GRU model on OJ.
> cd code-lstm
> python3 lstm_train.py -gpu 0 -model GRU -lr 1e-3 -save_dir MODEL/SAVE/PATH --data ../data/oj.pkl.gz
> cd ..
Run the following commands to train an LSTM model on CodeChef.
> cd code_defect-lstm
> python3 lstm_train.py -gpu 0 -model LSTM -lr 1e-3 -save_dir MODEL/SAVE/PATH --data ../data_defect/oj.pkl.gz
> cd ..
Run the following commands to train a CDLH model on OJClone.
> cd code_clone-cdlh
> python3 train.py --save_dir MODEL/SAVE/PATH
> cd ..
Run `python3 attacker.py` in each directory to attack the DL models.
E.g., run the following commands to attack the CodeBERT model on OJ.
> cd code-codebert
> python3 attacker.py --model_dir FINETUNED/CODEBERT/MODEL/PATH
> cd ..
The correspondence between the attacking algorithms and the `Attacker` classes is shown in the following table.

Attacking Algorithm | Class Name |
---|---|
I-CARROT | `Attacker` |
S-CARROT | `InsAttacker` |
I-RW | `AttackerRandom` |
S-RW | `InsAttackerRandom` |

One may switch among the attacking algorithms (I-CARROT, S-CARROT, I-RW, and S-RW) by employing the corresponding attacker class in the code.
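The table above is essentially a name-to-class dispatch. A small sketch of how such a selection could be wired up (the class names come from the table; the lookup helper itself is illustrative, not part of the repository):

```python
# Map each attacking algorithm to its attacker class name, mirroring the
# table above. The dispatch helper is a hypothetical convenience wrapper.
ATTACKERS = {
    "I-CARROT": "Attacker",
    "S-CARROT": "InsAttacker",
    "I-RW": "AttackerRandom",
    "S-RW": "InsAttackerRandom",
}

def attacker_class_name(algorithm: str) -> str:
    """Return the attacker class name for a given attacking algorithm."""
    try:
        return ATTACKERS[algorithm]
    except KeyError:
        raise ValueError(f"unknown attacking algorithm: {algorithm}") from None
```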
After the adversarial attack, logging files are produced. Run the following command to compute the robustness of the DL model.
> python3 compute_robustness.py -I PATH/TO/ICARROT/LOG -S PATH/TO/SCARROT/LOG
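Conceptually, the robustness estimate counts a sample as robust only if neither the I-CARROT nor the S-CARROT attack succeeds on it. A hedged sketch of that computation (the one-`id,success-flag`-per-line log format here is an assumption for illustration; `compute_robustness.py` defines the actual format):

```python
# Sketch: estimate robustness as the fraction of samples on which NEITHER
# the I-CARROT nor the S-CARROT attack succeeds. The "id,flag" log format
# is assumed for illustration only.
def load_success(lines):
    """Map sample id -> True if the attack succeeded on that sample."""
    result = {}
    for line in lines:
        sid, flag = line.strip().split(",")
        result[sid] = flag == "1"
    return result

def robustness(i_log_lines, s_log_lines):
    """Fraction of shared samples surviving both attacks."""
    i_succ = load_success(i_log_lines)
    s_succ = load_success(s_log_lines)
    ids = i_succ.keys() & s_succ.keys()
    robust = sum(1 for sid in ids if not i_succ[sid] and not s_succ[sid])
    return robust / len(ids)
```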
Take LSTM on OJClone as an example.
- Run the following commands to create the adversarial example training set.
> cd code_clone-lstm
> python3 attacker4training.py
> cd ..
- Run the following commands to adversarially train the model.
> cd code_clone-lstm
> python3 lstm_train.py --adv_train_path PATH/TO/ADVERSARIAL/EXAMPLE/SET --OTHER_ARGUMENTS
> cd ..
- Go back to step 1 to iteratively update the adversarial example set upon the current training set.
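The two-step loop above alternates between generating adversarial examples against the current model and retraining on them. A minimal sketch of that control flow, where `generate_adv` and `train` are stand-ins for `attacker4training.py` and `lstm_train.py` (both names here are placeholders, not repository APIs):

```python
# Hypothetical sketch of the iterative adversarial-training loop described
# above. `generate_adv` and `train` are caller-supplied stand-ins for the
# attacker4training.py and lstm_train.py steps.
def adversarial_training(train, generate_adv, model, rounds=3):
    for _ in range(rounds):
        adv_set = generate_adv(model)   # step 1: attack the current model
        model = train(model, adv_set)   # step 2: retrain on the adv set
    return model
```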
@article{zhang2022towards,
title={Towards Robustness of Deep Program Processing Models--Detection, Estimation and Enhancement},
author={Zhang, Huangzhao and Fu, Zhiyi and Li, Ge and Ma, Lei and Zhao, Zhehao and Yang, Hua’an and Sun, Yizhe and Liu, Yang and Jin, Zhi},
journal={ACM Transactions on Software Engineering and Methodology},
year={2022},
publisher={ACM New York, NY}
}