This repo contains the code and data used in our paper *Grammar Prompting for Domain-Specific Language Generation with Large Language Models*.
## Basic Setup
```bash
conda create --name grammar-prompting
conda activate grammar-prompting
pip install -e .
```
In the prompting scripts, you can use the following strings to specify which LLM to use: `azure/code-davinci-002`, `azure/gpt-35-turbo-0301`, `openai/gpt-4`, or `google/models/text-bison-001`. You should also provide the corresponding API keys in the scripts. Our original experiments were run via the GPT APIs provided by Azure, through which you can still access Codex (WARNING: super expensive).
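Each engine string is a provider prefix followed by a model name. As a quick illustration (not code from the repo):

```python
# Illustrative: each engine string has the form "<provider>/<model-name>".
engine = "azure/gpt-35-turbo-0301"
provider, model = engine.split("/", 1)
print(provider, model)  # -> azure gpt-35-turbo-0301
```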
By default, the scoring function for constrained decoding is based on Sentence-BERT. If your setup has access to Codex, you can comment out these lines to activate Codex-based scoring instead.
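For intuition, the Sentence-BERT scorer can be sketched as follows. This is a minimal illustration using the `sentence-transformers` package; the model choice and function name are ours, not the repo's:

```python
# A minimal sketch of Sentence-BERT-based candidate scoring, not the repo's code.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

def score_candidates(reference: str, candidates: list[str]) -> list[float]:
    """Rank grammar-valid candidate programs by embedding similarity to a reference."""
    ref_emb = model.encode(reference, convert_to_tensor=True)
    cand_embs = model.encode(candidates, convert_to_tensor=True)
    return util.cos_sim(ref_emb, cand_embs)[0].tolist()
```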
For the molecule generation experiments, the following additional setup is required.
```bash
pip install rdkit
pip install -e third_party/retro_star/retro_star/packages/mlp_retrosyn/
pip install -e third_party/retro_star/retro_star/packages/rdchiral/
pip install -e third_party/fuseprop/
pip install -e third_party/GCN
```
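You can sanity-check the RDKit dependency with a quick SMILES round trip (an illustrative snippet, not part of the repo):

```python
# Quick check that RDKit imports and can round-trip a SMILES string.
from rdkit import Chem

mol = Chem.MolFromSmiles("CCO")  # ethanol
print(Chem.MolToSmiles(mol))     # prints the canonical SMILES, "CCO"
```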
- `neural_lark` contains the code for data handling, prompting, and constrained generation.
- `minEarley` contains the code for Earley-based parsing; the parser code is adapted from Lark (see the sketch after this list).
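To illustrate the kind of grammar-based validation this enables, here is a small example using Lark's own Earley parser; the toy grammar is ours, and `minEarley`'s actual API differs:

```python
# Illustrative: validate a generated program against a DSL grammar using
# Lark's Earley parser. The toy grammar below is ours, not from the repo.
from lark import Lark

grammar = r"""
    start: "answer" "(" expr ")"
    expr: NAME | NAME "(" expr ")"
    NAME: /[a-z_]+/
"""

parser = Lark(grammar, parser="earley")
tree = parser.parse("answer(capital(state))")  # raises on grammar violations
print(tree.pretty())
```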
```bash
run_geo_std_icl.sh        # standard prompting on GeoQuery
run_geo_cot_icl.sh        # grammar prompting on GeoQuery
run_overnight_std_icl.sh  # standard prompting on Overnight-Block
run_overnight_cot_icl.sh  # grammar prompting on Overnight-Block
run_smc_std_icl.sh        # standard prompting on SMC
run_smc_cot_icl.sh        # grammar prompting on SMC
```
Here is the link to the LLM cache, which you can download and place under the root directory. With the cache in place, the scripts above should reproduce the results in Table 3.
```bash
run_molgen_std_icl.sh  # standard prompting on molecule generation
run_molgen_cot_icl.sh  # grammar prompting on molecule generation
```
The results reported in the paper were obtained with `azure/gpt-35-turbo-0301`, though the current scripts use `openai/gpt-4` for the setup without Azure access.