KodCode Pipeline

This directory contains the pipeline for generating the KodCode dataset.

Step 1: Generate Questions

Step 1.1: Generate Questions from Code

To generate synthetic questions, first place the seed questions, snippets, or docs in the ../seeds folder.

Then, we can run the following command to generate questions. Available modes are leetcode, algorithm, data_structure, package, apps, codeforces, code_contests, taco, and docs.

python step1.1_gen_questions.py --total_prompts [total_prompts] --mode [mode]
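As an illustration of the two flags above, here is a minimal sketch of a command-line parser that accepts only the listed modes. It assumes `--total_prompts` is an integer and that both flags are required; the actual argument handling in step1.1_gen_questions.py may differ, and `build_parser` is a hypothetical name.

```python
import argparse

# Modes listed in this README; the real script may accept others.
MODES = [
    "leetcode", "algorithm", "data_structure", "package", "apps",
    "codeforces", "code_contests", "taco", "docs",
]


def build_parser():
    """Build a parser mirroring the documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Generate question prompts from seeds (illustrative sketch)."
    )
    # Assumption: the total number of prompts is a required integer.
    parser.add_argument("--total_prompts", type=int, required=True)
    # Restrict --mode to the documented values so typos fail early.
    parser.add_argument("--mode", choices=MODES, required=True)
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--total_prompts", "100", "--mode", "leetcode"])
print(args.total_prompts, args.mode)
```

Restricting `--mode` with `choices` makes an unsupported mode fail at parse time rather than partway through generation.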

Step 1.2: Obtain GPT-4o Completion

We then call the GPT-4o API to generate an instruction for each question prompt. The script for this step is step1.2_completion.sh.

Step 1.3: Filter Out Questions and Perform Deduplication

Run the following command to filter and deduplicate the generated questions.

python step1.3_proccess_and_sanitize.py --input_file [file_name]
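To make the idea of filtering and deduplication concrete, here is a minimal sketch. It is not the logic of step1.3_proccess_and_sanitize.py: the length threshold, the normalization rule, and the function names are all assumptions chosen for illustration, and the real script may apply different or stricter criteria.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def filter_and_dedup(questions, min_chars=20):
    """Drop very short questions and exact duplicates after normalization.

    `min_chars` is an illustrative threshold, not a value from the pipeline.
    """
    seen = set()
    kept = []
    for question in questions:
        key = normalize(question)
        if len(key) < min_chars:
            continue  # too short to be a meaningful question
        if key in seen:
            continue  # duplicate of an earlier question
        seen.add(key)
        kept.append(question)
    return kept


questions = [
    "Write a function that reverses a string.",
    "write a function  that reverses a string.",
    "Too short",
    "Implement a stack with push and pop operations.",
]
print(filter_and_dedup(questions))
```

Exact matching after normalization only catches near-identical text; semantic near-duplicates would need a similarity-based method instead.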

Step 2: Generate Solutions and Tests

Step 2.1: Obtain GPT-4o Completion

After you get the filtered instructions, you can run the following command to generate solutions and tests.

bash step2.1_completion.sh [file_name]

Step 2.2: Process Responses and Generate Unit Tests

This step generates unit tests for each solution. The input folder contains the trials of solutions and tests. In our experiments, we use 10 trials for each solution.

python step2.2_gen_unit_tests.py --input_folder [folder_name]

A folder whose name starts with unit_test_ will be generated, containing the unit tests for each solution.

Step 2.3: Run All Tests

This step will run all the tests and generate the results.

bash step2.3_run_all_tests.sh [unit_test_folder_name]
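The sketch below shows one way to run every test file in a folder and record pass or fail. It assumes each test is a standalone Python file that exits with a non-zero code on failure, and it runs each file in a fresh interpreter with a timeout; step2.3_run_all_tests.sh may use a different runner or isolation strategy, and the function names here are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def run_test_file(path, timeout=10):
    """Run one test file in a fresh interpreter; True if it exits with code 0."""
    try:
        result = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # treat a hanging test as a failure
    return result.returncode == 0


def run_all_tests(folder, timeout=10):
    """Map each *.py file name in `folder` to its pass/fail result."""
    return {
        path.name: run_test_file(path, timeout)
        for path in sorted(Path(folder).glob("*.py"))
    }
```

Running each file in its own process keeps a crash or infinite loop in one generated solution from affecting the others.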

Step 2.4: Generate Verified Triplets

This step will generate verified triplets for each solution.

python step2.4_gen_verified_triplets.py --unit_test_folder [unit_test_folder_name]

After this step, you will get the verified question-solution-test triplets.
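As a rough illustration of what "verified" means here, the sketch below keeps, for each question, the first trial whose tests passed. The record fields (`question_id`, `question`, `solution`, `test`, `passed`) and the first-passing selection rule are assumptions for this example, not the schema or logic of step2.4_gen_verified_triplets.py.

```python
def select_verified_triplets(trials):
    """Keep the first passing (question, solution, test) per question_id.

    `trials` is a list of dicts with hypothetical keys:
    question_id, question, solution, test, passed.
    """
    verified = {}
    for trial in trials:
        if trial["passed"] and trial["question_id"] not in verified:
            verified[trial["question_id"]] = {
                "question": trial["question"],
                "solution": trial["solution"],
                "test": trial["test"],
            }
    return verified


trials = [
    {"question_id": "q1", "question": "Add two numbers.",
     "solution": "def add(a, b): return a - b",
     "test": "assert add(1, 2) == 3", "passed": False},
    {"question_id": "q1", "question": "Add two numbers.",
     "solution": "def add(a, b): return a + b",
     "test": "assert add(1, 2) == 3", "passed": True},
    {"question_id": "q2", "question": "Negate a number.",
     "solution": "def neg(x): return x",
     "test": "assert neg(1) == -1", "passed": False},
]
print(select_verified_triplets(trials))
```

Questions with no passing trial, such as `q2` above, are simply left out of the result.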
