A Keras implementation of the paper Deep Code Search.
Tested on Ubuntu 16.04 with:
- Python 3.6
- Keras 2.3.1 or newer
- TensorFlow 2.0.0 or Theano 0.8.0~0.9.1
- models.py: Neural network models for code/desc representation and similarity measure.
- main.py: The main entry for code search, including four sub-tasks:
  - Train: train the code/desc representation models;
  - Eval: evaluate the learnt code/desc representation models;
  - Code Embedding: encode code into vectors and store them to a file;
  - Search: search relevant code for a given query.
- configs.py: Configurations for models defined in models.py. Each function defines the hyperparameters for the corresponding model.
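As a rough illustration of the "one function per model" convention for configurations, a config function might look like the sketch below. The function name, dictionary keys, and values are hypothetical placeholders rather than the repository's actual settings; only reload is a setting this README names.

```python
def config_example_model():
    """Hypothetical config function: hyperparameters for one model."""
    return {
        "data_params": {
            "data_dir": "./data/",  # assumed location of the dataset files
        },
        "training_params": {
            "batch_size": 128,      # placeholder value
            "reload": 0,            # checkpoint number to load; 0 is assumed to mean "none"
        },
        "model_params": {
            "n_hidden": 400,        # placeholder value
        },
    }


# Assumed lookup table from model name to its config function.
CONFIGS = {"example_model": config_example_model}


def get_config(model_name):
    """Return a fresh hyperparameter dict for the named model."""
    return CONFIGS[model_name]()


conf = get_config("example_model")
print(conf["training_params"]["reload"])
```

Returning a fresh dictionary from a function keeps each model's settings isolated, so changing one model's hyperparameters cannot leak into another's.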
The /data folder provides a small dummy dataset for quick deployment.
To train and test our model:
- Download and unzip the real dataset from Google Drive, or from Baidu Pan for users in China.
- Replace each file in the /data folder with the corresponding real file.
To train the model, edit the hyper-parameters and settings in configs.py, then run:
    python main.py --mode train
To embed the code base, first set reload in configs.py to the number of the optimal checkpoint, e.g., 500. Then run:
    python main.py --mode repr_code
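The code-embedding step encodes code into vectors and stores them in a file. A minimal sketch of the storage side is shown here, assuming the vectors are held in a NumPy array and written with np.save. The random array is only a stand-in for the trained encoder's output, and the file name and on-disk format are assumptions, not necessarily what main.py produces.

```python
import os
import tempfile

import numpy as np


def l2_normalize(vecs, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, eps)


# Placeholder for encoder output: 1000 code snippets, 100-dimensional vectors.
rng = np.random.default_rng(0)
code_vecs = rng.standard_normal((1000, 100)).astype(np.float32)

# Normalize once at embedding time, then store the matrix on disk.
code_vecs = l2_normalize(code_vecs)
out_path = os.path.join(tempfile.mkdtemp(), "code_vecs.npy")  # hypothetical file name
np.save(out_path, code_vecs)

# Later (e.g. at search time) the vectors can be loaded back unchanged.
loaded = np.load(out_path)
print(loaded.shape)
```

Normalizing before saving is a design choice in this sketch: it moves the cost out of the search path, where only a matrix-vector product remains.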
To search, first set reload in configs.py to the number of the optimal checkpoint, e.g., 500. Then run:
    python main.py --mode search
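Conceptually, searching means embedding the query and ranking the stored code vectors by similarity. The sketch below assumes cosine similarity (the measure described in the Deep Code Search paper) and a brute-force scan over all vectors. The helper top_k_cosine and the toy vectors are illustrative only and are not the repository's API.

```python
import numpy as np


def top_k_cosine(query_vec, code_vecs, k=3):
    """Return indices and scores of the k code vectors most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    c = code_vecs / np.linalg.norm(code_vecs, axis=1, keepdims=True)
    sims = c @ q                    # cosine similarity of every code vector to the query
    k = min(k, len(sims))
    top = np.argsort(-sims)[:k]     # indices sorted by descending similarity
    return top, sims[top]


# Toy 2-dimensional "code vectors"; real ones would come from the trained model.
code_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
query_vec = np.array([1.0, 0.1])

idx, scores = top_k_cosine(query_vec, code_vecs, k=2)
print(idx, scores)
```

A full sort is fine for a small code base; for millions of vectors, np.argpartition or an approximate nearest-neighbour index would be the usual replacement.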
An online demo of the tool was hosted at http://211.249.63.55:81/ (currently unavailable).
If you find it useful and would like to cite it, the following would be appropriate:
@inproceedings{gu2018deepcs,
title={Deep Code Search},
author={Gu, Xiaodong and Zhang, Hongyu and Kim, Sunghun},
booktitle={Proceedings of the 2018 40th International Conference on Software Engineering (ICSE 2018)},
year={2018},
organization={ACM}
}