We uploaded the dataset used in our empirical study to Zenodo. Please download and unzip the dataset.zip file from Zenodo; after unzipping, you should see the following directory structure:
PLTranslationEmpirical
├── dataset
│   ├── codenet
│   ├── avatar
│   ├── evalplus
│   └── real-life-cli
├── ...
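If you prefer scripting the setup, a minimal Python sketch using only the standard library is shown below. The Zenodo URL is a placeholder (the record id is not reproduced here) and must be replaced with the actual download link from the record page.

```python
import urllib.request
import zipfile

# Placeholder: replace with the actual Zenodo download link for dataset.zip
DATASET_URL = "https://zenodo.org/record/<record-id>/files/dataset.zip"

# Download the archive and extract it in place; the result should match
# the directory structure shown above.
urllib.request.urlretrieve(DATASET_URL, "dataset.zip")
with zipfile.ZipFile("dataset.zip") as archive:
    archive.extractall(".")
```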
- GPT-4 Vanilla Prompt:
$SOURCE_CODE # Translate the above $SOURCE_LANG code to $TARGET_LANG. Print only the $TARGET_LANG code and end with the comment "End of Code".
- CodeGeeX Vanilla Prompt:
code translation $SOURCE_LANG: $SOURCE_CODE $TARGET_LANG:
- StarCoder, CodeGen, Llama-2, TheBloke-Airoboros, and TheBloke-Vicuna Vanilla Prompt:
$SOURCE_LANG Code: $SOURCE_CODE Translate the above $SOURCE_LANG code to $TARGET_LANG. $TARGET_LANG Code:
- GPT-4 Fix Prompt when effect is COMPILE ERROR or RUNTIME ERROR and dataset is Evalplus:
You were asked to translate the following $SOURCE_LANG code to $TARGET_LANG: $SOURCE_CODE Your response was the following $TARGET_LANG code: $TRANSLATED_CODE Executing your generated code gives the following error because it is syntactically incorrect: $STDERR Can you re-generate your response and translate the above $SOURCE_LANG code to $TARGET_LANG. Print only the $TARGET_LANG code inside ```$TARGET_LANG{response}``` and do not add any other natural language description in your output, and do not change the method signature from incorrect translation. Make sure your generated code is syntactically correct.
- GPT-4 Fix Prompt when effect is COMPILE ERROR or RUNTIME ERROR and dataset is CodeNet or AVATAR:
You were asked to translate the following $SOURCE_LANG code to $TARGET_LANG: $SOURCE_CODE Your response was the following $TARGET_LANG code: $TRANSLATED_CODE Executing your generated code gives the following error because it is syntactically incorrect: $STDERR Can you re-generate your response and translate the above $SOURCE_LANG code to $TARGET_LANG. Print only the $TARGET_LANG code inside ```$TARGET_LANG{response}``` and do not add any other natural language description in your output. Make sure your generated code is syntactically correct.
- GPT-4 Fix Prompt when effect is INCORRECT OUTPUT and dataset is Evalplus:
You were asked to translate the following $SOURCE_LANG code to $TARGET_LANG: $SOURCE_CODE Your response was the following $TARGET_LANG code: $TRANSLATED_CODE Executing your generated code gives the following test failure: $STDERR Can you re-generate your response and translate the above $SOURCE_LANG code to $TARGET_LANG. Print only the $TARGET_LANG code inside ```$TARGET_LANG{response}``` and do not add any other natural language description in your output, and do not change the method signature from incorrect translation. Make sure your generated code is syntactically correct.
- GPT-4 Fix Prompt when effect is INCORRECT OUTPUT and dataset is CodeNet or AVATAR:
You were asked to translate the following $SOURCE_LANG code to $TARGET_LANG: $SOURCE_CODE Your response was the following $TARGET_LANG code: $TRANSLATED_CODE Executing your generated code gives the following output: $GENERATED_OUTPUT instead of the following expected output: $EXPECTED_OUTPUT Can you re-generate your response and translate the above $SOURCE_LANG code to $TARGET_LANG. Print only the $TARGET_LANG code inside ```$TARGET_LANG{response}``` and do not add any other natural language description in your output. Make sure your generated code is syntactically correct. Your generated $TARGET_LANG code should take the following input and generate the expected output: Input: $TEST_INPUT Expected Output: $EXPECTED_OUTPUT
- StarCoder, CodeGen, Llama-2 Fix Prompt when effect is COMPILE ERROR or RUNTIME ERROR and dataset is Evalplus:
You were asked to translate the following $SOURCE_LANG code to $TARGET_LANG: $SOURCE_CODE Your response was the following $TARGET_LANG code: $TRANSLATED_CODE Executing your generated code gives the following error because it is syntactically incorrect: $STDERR Can you re-generate your response and translate the above $SOURCE_LANG code to $TARGET_LANG. Do not add any natural language description in your response, and do not change the method signature from incorrect translation. $TARGET_LANG Code:
- StarCoder, CodeGen, Llama-2 Fix Prompt when effect is COMPILE ERROR or RUNTIME ERROR and dataset is CodeNet or AVATAR:
You were asked to translate the following $SOURCE_LANG code to $TARGET_LANG: $SOURCE_CODE Your response was the following $TARGET_LANG code: $TRANSLATED_CODE Executing your generated code gives the following error because it is syntactically incorrect: $STDERR Can you re-generate your response and translate the above $SOURCE_LANG code to $TARGET_LANG. Do not add any natural language description in your response. $TARGET_LANG Code:
- StarCoder, CodeGen, Llama-2 Fix Prompt when effect is INCORRECT OUTPUT and dataset is Evalplus:
You were asked to translate the following $SOURCE_LANG code to $TARGET_LANG: $SOURCE_CODE Your response was the following $TARGET_LANG code: $TRANSLATED_CODE Executing your generated code gives the following test failure: $STDERR Can you re-generate your response and translate the above $SOURCE_LANG code to $TARGET_LANG. Do not add any natural language description in your output, and do not change the method signature from incorrect translation. $TARGET_LANG Code:
- StarCoder, CodeGen, Llama-2 Fix Prompt when effect is INCORRECT OUTPUT and dataset is CodeNet or AVATAR:
You were asked to translate the following $SOURCE_LANG code to $TARGET_LANG: $SOURCE_CODE Your response was the following $TARGET_LANG code: $TRANSLATED_CODE Executing your generated code gives the following output: $GENERATED_OUTPUT instead of the following expected output: $EXPECTED_OUTPUT Can you re-generate your response and translate the above $SOURCE_LANG code to $TARGET_LANG. Do not add any natural language description in your response. Your generated $TARGET_LANG code should take the following input and generate the expected output: Input: $TEST_INPUT Expected Output: $EXPECTED_OUTPUT $TARGET_LANG Code:
Note 1: For StarCoder, the prompt is encapsulated inside the special tokens <fim_prefix> and <fim_suffix><fim_middle>.
Note 2: We treat the Non-terminating Execution (NTE) effect as a RUNTIME ERROR and replace the STDERR with the custom feedback "the program enters infinite loop".
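Since the placeholders above use $NAME syntax, they can be filled with Python's string.Template. The sketch below is illustrative only (the repository's own prompt-construction code may differ); it instantiates the vanilla prompt of the open-source models for a toy Python snippet and applies the StarCoder wrapping from Note 1.

```python
from string import Template

# Vanilla prompt template for StarCoder/CodeGen/Llama-2/TheBloke-* (copied from above)
VANILLA = Template(
    "$SOURCE_LANG Code: $SOURCE_CODE "
    "Translate the above $SOURCE_LANG code to $TARGET_LANG. $TARGET_LANG Code:"
)

prompt = VANILLA.substitute(
    SOURCE_LANG="Python",
    TARGET_LANG="Java",
    SOURCE_CODE="print(sum(int(x) for x in input().split()))",  # toy example
)

# Note 1: for StarCoder the prompt is wrapped in fill-in-the-middle special tokens
starcoder_prompt = f"<fim_prefix>{prompt}<fim_suffix><fim_middle>"
print(starcoder_prompt)
```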
We provide bash scripts for reproducing our results in this work. First, we discuss the translation script. To run translation with a model and dataset, you first need to create a .env file in the repository and add the following:
OPENAI_API_KEY=<your openai api key>
LLAMA2_AUTH_TOKEN=<your llama2 auth token from huggingface>
STARCODER_AUTH_TOKEN=<your starcoder auth token from huggingface>
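Before launching a long run, it can help to confirm that the tokens are actually picked up. The sketch below assumes the python-dotenv package; how the repository itself loads the .env file may differ.

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

# Read the .env file from the current working directory
load_dotenv()

for key in ("OPENAI_API_KEY", "LLAMA2_AUTH_TOKEN", "STARCODER_AUTH_TOKEN"):
    # Only report presence; never print the secret itself
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")
```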
- Translation with GPT-4: You can run the following command to translate all Python -> Java code snippets in the codenet dataset with GPT-4, using top-k sampling with k=50, top-p sampling with p=0.95, and temperature=0.7:
bash scripts/translate.sh GPT-4 codenet Python Java 50 0.95 0.7
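For intuition, the sketch below shows roughly what such a GPT-4 request could look like with the openai Python client, using the vanilla prompt and the sampling settings from the command above. It is an illustrative assumption, not the repository's implementation; note that the OpenAI chat API exposes temperature and top_p but has no top-k parameter.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

source_code = "print(sum(int(x) for x in input().split()))"  # toy example
prompt = (
    f"{source_code} # Translate the above Python code to Java. "
    'Print only the Java code and end with the comment "End of Code".'
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.7,  # matches the 0.7 passed to translate.sh
    top_p=0.95,       # matches the 0.95 passed to translate.sh
)
print(response.choices[0].message.content)
```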
- Translation with CodeGeeX: Prior to running the script, you need to clone the CodeGeeX repository from here and use the instructions from their artifacts to download their model weights. After cloning it inside PLTranslationEmpirical and downloading the model weights, your directory structure should look like the following:
PLTranslationEmpirical
├── dataset
│   ├── codenet
│   ├── avatar
│   ├── evalplus
│   └── real-life-cli
├── CodeGeeX
│   ├── codegeex
│   ├── codegeex_13b.pt # this file is the model weight
│   ├── ...
├── ...
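A quick, optional sanity check before running the script is to confirm that the weight file is where the tree above expects it (run from the repository root). This snippet is just a convenience, not part of the artifact; adjust the path if your weights live elsewhere.

```python
from pathlib import Path

# Expected location of the CodeGeeX weights, mirroring the tree above
weights = Path("CodeGeeX") / "codegeex_13b.pt"

if weights.is_file():
    print(f"Found CodeGeeX weights ({weights.stat().st_size / 1e9:.1f} GB)")
else:
    raise SystemExit(f"Missing {weights}; download it per the CodeGeeX artifact instructions.")
```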
You can run the following command to translate all Python -> Java code snippets in the codenet dataset with CodeGeeX, using top-k sampling with k=0, top-p sampling with p=0.95, and temperature=0.2 on GPU gpu_id=0:
bash scripts/translate.sh CodeGeeX codenet Python Java 0 0.95 0.2 0
- For all other models (StarCoder, CodeGen, LLaMa, TB-Airoboros, TB-Vicuna), you can execute the following command to translate all Python -> Java code snippets in the codenet dataset with StarCoder|CodeGen|LLaMa|TB-Airoboros|TB-Vicuna, using top-k sampling with k=0, top-p sampling with p=0.95, and temperature=0.2 on GPU gpu_id=0:
bash scripts/translate.sh StarCoder codenet Python Java 0 0.95 0.2 0
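To sweep several open-source models or language pairs in one go, a small driver like the hypothetical one below can invoke the script repeatedly; the positional arguments (model, dataset, source language, target language, top-k, top-p, temperature, GPU id) follow the commands above.

```python
import subprocess

MODELS = ["StarCoder", "CodeGen", "LLaMa", "TB-Airoboros", "TB-Vicuna"]
PAIRS = [("Python", "Java"), ("Java", "Python")]  # extend as needed

for model in MODELS:
    for src, tgt in PAIRS:
        # model, dataset, source lang, target lang, top-k, top-p, temperature, gpu_id
        cmd = ["bash", "scripts/translate.sh", model, "codenet",
               src, tgt, "0", "0.95", "0.2", "0"]
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```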