This repository contains the starter code for the LLM-Merging competition.
- Please do not specify any device_id in the code, because the device_id may not be valid in our setup. If you need to specify a device_id in your own setup, one solution is to use environment variables, e.g.:
```
export CUDA_VISIBLE_DEVICES=0
```
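Inside your code, one way to stay device-agnostic is to derive the device at run time instead of hardcoding an index. A minimal sketch, assuming PyTorch is available:
```python
import torch

# Uses whichever GPU CUDA_VISIBLE_DEVICES exposes, falling back to CPU;
# no device index is hardcoded in the code itself.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
example = torch.zeros(3, device=device)
```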
- Please do not specify any filepaths, because they may not be the same in our setup. If you need to specify the HuggingFace cache, one solution is to use environment variables, e.g.:
```
export HUGGINGFACE_HUB_CACHE=/tmp/
```
and then access this path in Python via
```python
import os

path = os.environ["HUGGINGFACE_HUB_CACHE"]
```
- When running `tar` on this repo (`LLM-Merging`) to submit it, please ensure the directory is still called `LLM-Merging` and has not been renamed; a renamed directory can cause issues when evaluating your submission.
The library was tested on CUDA 12.1 on an A6000.
```
conda env create -f environment.yml --name llm-merging
conda activate llm-merging
export PYTHONPATH=`pwd`
```
Authentication tokens are required for certain models, such as Llama 2, which require users to agree to specific terms. You can find your authentication token at https://huggingface.co/settings/tokens.
```
export HF_AUTH_TOKEN=""
```
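The exported token can then be read from the environment and passed explicitly to Hugging Face loaders. A minimal sketch, assuming the `transformers` library; the model id is just an example:
```python
import os

from transformers import AutoModelForCausalLM

# Pass the token exported above; `token` is the keyword in recent
# transformers releases (older releases used `use_auth_token`).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    token=os.environ["HF_AUTH_TOKEN"],
)
```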
Do not modify any files other than the new file you create and the files below, in the manner described. Doing so may be grounds for invalidating your submission. If you need to change code in other ways, feel free to open a pull request.
- To add a new merging method, create a new file in `llm_merging/merging`. This file should extend `llm_merging/merging/Merges` and implement the `__init__()` and `merge()` functions. See `llm_merging/merging/FlanT5Avg.py`, `llm_merging/merging/LlamaAvg.py`, and `llm_merging/merging/TinyLlamaAvg.py` for examples (a sketch appears after this list).
- Add the new merging method to the dictionary returned by `all_merge_handlers()` in `llm_merging/main.py`.
- Add the new module to `llm_merging/merging/__init__.py`.
- Add any additional required libraries to `setup.py`.
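As referenced above, here is a minimal sketch of a new merging file. Only the stated constraints (extend `Merges`, implement `__init__()` and `merge()`) come from this repo; the constructor signature and the `_load_state_dicts` helper are hypothetical, so mirror the real structure in `LlamaAvg.py`:
```python
# llm_merging/merging/MyAvg.py -- illustrative sketch, not the repo's API.
import torch

from llm_merging.merging.Merges import Merges


class MyAvg(Merges):
    def __init__(self, name):
        # Hypothetical: match the real Merges.__init__ signature.
        super().__init__(name)

    def merge(self):
        # Element-wise average of the parameters of several checkpoints.
        # `_load_state_dicts` is a hypothetical helper that would return a
        # list of state dicts, one per model being merged.
        state_dicts = self._load_state_dicts()
        merged = {}
        for key in state_dicts[0]:
            stacked = torch.stack([sd[key].float() for sd in state_dicts])
            merged[key] = stacked.mean(dim=0)
        return merged
```
The method would then be registered by adding an entry such as `"my_avg": MyAvg` to the dictionary returned by `all_merge_handlers()` in `llm_merging/main.py`.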
To test the merging method, run:
```
python llm_merging/setup.py install
python llm_merging/main.py -m {merging_method}
```
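For example, to run the provided Llama averaging baseline (assuming `llama_avg` is the key registered for `LlamaAvg.py`, as suggested by the results reported next):
```
python llm_merging/main.py -m llama_avg
```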
The validation dataset (consisting of CosmosQA and XSum) is included mainly to ensure that the merging method (with evaluation on those datasets) runs under the 1-hour time limit. Our results on `llama_avg` are
```
{"cosmos_qa": {"accuracy": 0.234}, "xsum": {"rouge1": 0.123, "rouge2": 0.023, "rougeL": 0.093, "rougeLsum": 0.102}}
```
and the run takes about 25 minutes on our A6000.
You must submit the output file on Kaggle and the model files via the instructions below.
First, generate the output file using the input dataset file found in `data/test.csv`. Name your output file `submission.csv`.
To submit to Kaggle, go to our Kaggle competition site, click *Submit Prediction*, and upload your `submission.csv`.
Next, tar this repo for submission:
```
tar -cvf {merging_method}.tar LLM-Merging
```
Submit the tar file using this form.
The leaderboard being used is on our Kaggle competition site.
The leaderboard's standings are not final. The final results of the competition will be calculated after the competition concludes. At that point, we will release the inputs for our final held-out evaluation, and you will have a week to run your model code on this input. The input will be in the same format as the `test.csv` file in this competition. You will then be responsible for submitting this final output file to us. For all top placers, we will verify that the code submitted via the form before the close of the competition does indeed yield your final submission CSV.
The old leaderboard of the submitted solutions can be found here.