
Vulnerability Dataset Denoising (Empirical)

This repository contains all the code used in the ISSTA'23 paper "Understanding and Tackling Label Errors in Deep Learning-based Vulnerability Detection (Experience Paper)".

Our study reveals persistent label error issues in existing datasets used for source code vulnerability detection tasks. We highlight the necessity of constructing high-quality datasets collected with reliable techniques. Here we offer our implementations of the models described in our paper, including DeepWukong, SySeVR, and VulDeePecker, together with the two corresponding denoising methods, confident learning (CL) and differential training (DT). The datasets we use are also listed below.
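As a rough illustration of the confident-learning idea behind CL, the following Python sketch flags likely label errors from a model's out-of-sample predicted probabilities. It is a minimal sketch under our own assumptions (function and variable names are ours), not the implementation in confident_learning.py.

import numpy as np

# Illustrative confident-learning step: an example is flagged as a potential
# label error when another class is predicted above that class's average
# self-confidence threshold.
def find_label_issues(pred_probs, noisy_labels):
    k = pred_probs.shape[1]
    # Per-class threshold: mean predicted probability of class j over examples labeled j.
    thresholds = np.array([pred_probs[noisy_labels == j, j].mean() for j in range(k)])
    above = pred_probs >= thresholds                  # "confidently" belongs to class j
    masked = np.where(above, pred_probs, -np.inf)
    candidate = masked.argmax(axis=1)                 # most probable confident class
    return above.any(axis=1) & (candidate != noisy_labels)

# Toy binary example (0 = non-vulnerable, 1 = vulnerable); the second sample is flagged.
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.85, 0.15], [0.3, 0.7]])
labels = np.array([0, 0, 0, 1])
print(find_label_issues(probs, labels))   # [False  True False False]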

Usage Guidance

Folder Description:

configs:

configuration files for the deep learning models. In this work, we only use deepwukong.yaml, silver.yaml, and vuldeepecker.yaml.

models:

code files for deep learning models.

prepare_data:

utility files that prepare data for the FFmpeg+Qemu dataset.

tools:

utility files for program slicing.

utils:

commonly used functions.

confident_learning.py:

entry point for confident learning (CL).

differential_training.py:

entry point for differential training (DT).

dwk_train.py:

entry point for training DeepWukong (example invocations of the training and denoising scripts are sketched after this list).

sys_train.py:

entry point for training SySeVR.

vdp_train.py:

entry point for training VulDeePecker.

sard_crawl.py:

code for crawling the SARD dataset.
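The invocations below are a minimal sketch of how the entry-point scripts above might be run; the exact command-line arguments depend on each script's argument parser and the config files listed above, so treat them as assumptions rather than documented commands.

python dwk_train.py              # train DeepWukong (assumed to read configs/deepwukong.yaml)
python sys_train.py              # train SySeVR
python vdp_train.py              # train VulDeePecker
python confident_learning.py     # denoise a dataset with confident learning (CL)
python differential_training.py  # denoise a dataset with differential training (DT)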

Datasets:

SARD:

You can crawl vulnerability data from the official SARD website with the following script:

python sard_crawl.py

Qemu+FFmpeg:

You can download it via this link.

Citation

Xu Nie, Ningke Li, Kailong Wang, Shangguang Wang, Xiapu Luo, and Haoyu Wang. 2023. Understanding and Tackling Label Errors in Deep Learning-based Vulnerability Detection. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’23), July 17–21, 2023, Seattle, WA, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3597926.3598037
