FormatSpread or: How I learned to start worrying about prompt formatting


Prerequisites: download natural-instructions and instruction-induction.

A dump of all package versions used is stored in pip_list.txt; many of those packages may not be necessary to run this code.

Executing FormatSpread and full evaluation of all candidates

main.py evaluates N+1 formats (N generated formats plus the original) for a given task. The formats are generated once and then reused in all subsequent calls with the same --num_formats_to_analyze for that task, which ensures that all models are evaluated under the same set of formats. Evaluation (--evaluation_metric) may be done through string matching (exact_prefix_matching) or ranking between valid options (probability_ranking).
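For intuition, the sketch below shows what the two metrics compute; the function and variable names are illustrative and not taken from this codebase:

# Illustrative sketch of the two evaluation metrics (not the repository's
# actual implementation).

def exact_prefix_matching(generated: str, gold: str) -> bool:
    # Correct if the model's generation starts with the gold answer,
    # after normalizing whitespace and case.
    return generated.strip().lower().startswith(gold.strip().lower())

def probability_ranking(option_logprobs: dict, gold: str) -> bool:
    # Correct if the gold option receives the highest log-probability
    # among all valid answer options.
    return max(option_logprobs, key=option_logprobs.get) == gold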

Example command for fully evaluating task158 from SuperNatural Instructions on 10 formats (9 generated + the original) using 1-shot GPT-3.5. The dataset will be subsampled to 1000 examples:

python main.py \
    --task_filename task158_ \
    --dataset_name natural-instructions \
    --num_formats_to_analyze 9 \
    --batch_size_llm 2 \
    --num_samples 1000 \
    --model_name "gpt-3.5-turbo" \
    --n_shot 1 \
    --evaluation_metric exact_prefix_matching \
    --evaluation_type full

Example command for evaluating 5-shot LLaMA-2-70B on task158 from SuperNatural Instructions, using bitsandbytes 4-bit quantization (--use_4bit) so the model fits on a single A100. This command runs FormatSpread (with 320 formats, B=20, E=40000) and uses ranking between multiple-choice options as the accuracy metric.

Setting --num_formats_to_analyze 499 ensures that the same set of 500 selected formats, as defined in data/holistic_random_sample_task158_nodes_499_textdisabled.json, is reused; users may want to share this set across runs.

python main.py \
    --task_filename task158_ \
    --dataset_name natural-instructions \
    --num_formats_to_analyze 499 \
    --batch_size_llm 2 \
    --num_samples 1000 \
    --model_name "meta-llama/Llama-2-70b-hf" \
    --use_4bit \
    --n_shot 5 \
    --evaluation_metric probability_ranking \
    --evaluation_type format_spread \
    --num_formats_format_spread 320 \
    --batch_size_format_spread 20 \
    --budget_format_spread 40000
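For intuition on the --num_formats_format_spread, --batch_size_format_spread, and --budget_format_spread flags: FormatSpread does not score all candidate formats on every sample, but spends an evaluation budget E in batches of B samples on the most informative candidates. The sketch below is a deliberately simplified, greedy illustration of that budgeted loop; it is not the bandit-style procedure implemented in this repository, and all names are illustrative.

# Toy illustration of a budgeted search over prompt formats.
# The actual FormatSpread procedure is a bandit-style search; here we
# simply keep spending the budget on the currently best- and
# worst-looking formats to estimate the performance spread.

def estimate_spread(formats, evaluate_batch, batch_size=20, budget=40_000):
    # evaluate_batch(fmt, n) is assumed to return the number of correct
    # predictions on n freshly drawn examples for format fmt.
    correct = {f: 0 for f in formats}
    seen = {f: 0 for f in formats}

    # Spend one batch on every format to get an initial estimate.
    for f in formats:
        correct[f] += evaluate_batch(f, batch_size)
        seen[f] += batch_size
    spent = batch_size * len(formats)

    # Then refine only the extremes until the budget is exhausted.
    while spent + 2 * batch_size <= budget:
        acc = {f: correct[f] / seen[f] for f in formats}
        best, worst = max(acc, key=acc.get), min(acc, key=acc.get)
        for f in (best, worst):
            correct[f] += evaluate_batch(f, batch_size)
            seen[f] += batch_size
            spent += batch_size

    acc = {f: correct[f] / seen[f] for f in formats}
    return max(acc.values()) - min(acc.values())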

Adding new tasks

Our scripts are written to parse SuperNatural Instructions (--dataset_name natural-instructions) and Instruction Induction (--dataset_name instruction-induction) tasks.

If you want to run your own task, make the appropriate changes in data_loading.py, using Instruction Induction as guidance. For examples of commonly used formats, see the functions _one_text_field() and _two_text_field().
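As a rough illustration (the helper name, field names, and return format below are hypothetical, not the actual API of data_loading.py), a new task loader typically needs to produce an instruction, a list of input/output pairs, and the set of valid answer options when probability_ranking is used:

import json

# Hypothetical example of a custom task loader; the real helpers in
# data_loading.py (e.g. _one_text_field, _two_text_field) may expect
# different names and return types.
def load_my_custom_task(path):
    with open(path) as f:
        records = json.load(f)
    # A "one text field" style task: a single input passage and a gold answer.
    samples = [{"input": r["passage"], "output": r["answer"]} for r in records]
    instruction = "Answer the question about the passage."
    # Valid options are only needed for probability_ranking-style evaluation.
    valid_options = sorted({s["output"] for s in samples})
    return instruction, samples, valid_options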

SuperNatural Instructions has some very complex formats that require extra parameters. We will add a tutorial (and simplify the code) for them in the near future. Feel free to submit an issue if it is unclear how to parse a specific format!

Evaluating on other models

Modify _load_model() accordingly. The current model-loading function is extremely hacky :)
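For instance, a minimal loader for a Hugging Face causal LM could look like the sketch below (assuming transformers, torch, and bitsandbytes for 4-bit loading; the actual _load_model() signature and return values may differ):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal sketch of a model loader for Hugging Face causal LMs; the real
# _load_model() in this repository may differ.
def load_model(model_name: str, use_4bit: bool = False):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",
        torch_dtype=torch.float16,
        load_in_4bit=use_4bit,  # requires bitsandbytes
    )
    model.eval()
    return model, tokenizer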

Paper Citation

If you found the paper or the datasets helpful, consider citing:

@article{sclar2023quantifying,
  title={Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting},
  author={Sclar, Melanie and Choi, Yejin and Tsvetkov, Yulia and Suhr, Alane},
  journal={arXiv preprint arXiv:2310.11324},
  year={2023}
}
