These instructions are mostly copied from DRAGIN's repository.
This fork of DRAGIN generates substitutes of sentences and produces data from them in order to decide whether or not to activate retrieval. The file that differs most from the original repository is generate.py; it contains separate functions to compute values such as probability, entropy, and attention. I also adapted main.py to add parameters.
- The default substitutes in the config files are negations. For now, they can be changed to "ditto" and "substitution" (see the fewshot examples).
- The intolerance parameter can be set with the "-i" command-line option; it defaults to 1.0 (see the example below).
- Feel free to ask questions; if I have time, I may update the config files and add the above parameters there.
- TODO: compare usefulness with DRAGINUS versus without RAG.
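For example, a run with a non-default intolerance value looks like this (0.8 is an arbitrary illustrative value; see the run instructions further down for the config file):
```
python main.py -c path_to_config_file -i 0.8
```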
conda create -n dragin python=3.9
conda activate dragin
pip install torch==2.1.1
pip install -r requirements.txt
python -m spacy download en_core_web_sm
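A quick sanity check that the environment is usable before moving on (prints the torch version and loads the spaCy model you just downloaded):
```
python -c "import torch, spacy; print(torch.__version__); spacy.load('en_core_web_sm')"
```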
Download the Wikipedia dump from the DPR repository using the following command:
mkdir -p data/dpr
wget -O data/dpr/psgs_w100.tsv.gz https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz
pushd data/dpr
gzip -d psgs_w100.tsv.gz
popd
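The decompressed dump is a tab-separated file with id, passage text, and title columns; a quick peek confirms it extracted correctly:
```
head -n 2 data/dpr/psgs_w100.tsv
```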
Use Elasticsearch to index the Wikipedia dump:
cd data
wget -O elasticsearch-7.17.9.tar.gz https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.9-linux-x86_64.tar.gz # download Elasticsearch
tar zxvf elasticsearch-7.17.9.tar.gz
rm elasticsearch-7.17.9.tar.gz
cd elasticsearch-7.17.9
nohup bin/elasticsearch & # run Elasticsearch in background
cd ../..
python prep_elastic.py --data_path data/dpr/psgs_w100.tsv --index_name wiki # build index
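Indexing the full dump takes a while; once prep_elastic.py finishes, you can verify the index with Elasticsearch's count API (assuming the default port 9200):
```
curl 'http://localhost:9200/wiki/_count'
# should report roughly 21 million documents for the full psgs_w100 split
```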
For 2WikiMultihopQA:
Download the 2WikiMultihopQA dataset from its repository https://www.dropbox.com/s/ms2m13252h6xubs/data_ids_april7.zip?e=1. Unzip it and move the folder to data/2wikimultihopqa.
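If you prefer to stay on the command line, something like the following should work (assuming the Dropbox link honors the dl=1 direct-download flag):
```
mkdir -p data/2wikimultihopqa
wget -O data/2wikimultihopqa.zip "https://www.dropbox.com/s/ms2m13252h6xubs/data_ids_april7.zip?dl=1"
unzip data/2wikimultihopqa.zip -d data/2wikimultihopqa
rm data/2wikimultihopqa.zip
```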
For StrategyQA:
wget -O data/strategyqa_dataset.zip https://storage.googleapis.com/ai2i/strategyqa/data/strategyqa_dataset.zip
mkdir -p data/strategyqa
unzip data/strategyqa_dataset.zip -d data/strategyqa
rm data/strategyqa_dataset.zip
For HotpotQA:
mkdir -p data/hotpotqa
wget -O data/hotpotqa/hotpotqa-dev.json http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_dev_distractor_v1.json
For IIRC:
wget -O data/iirc.tgz https://iirc-dataset.s3.us-west-2.amazonaws.com/iirc_train_dev.tgz
tar -xzvf data/iirc.tgz
mv iirc_train_dev/ data/iirc
rm data/iirc.tgz
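After these steps, the data directory should look roughly like this (exact filenames inside the dataset folders may vary):
```
data/
├── dpr/psgs_w100.tsv
├── elasticsearch-7.17.9/
├── 2wikimultihopqa/
├── strategyqa/
├── hotpotqa/hotpotqa-dev.json
└── iirc/
```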
The parameters that can be selected in the config file config.json are as follows:
Parameter | Meaning | Example/Options |
---|---|---|
`model_name_or_path` | Hugging Face model. | `meta-llama/Llama-2-13b-chat` |
`method` | Way to generate answers. | `non-retrieval`, `single-retrieval`, `token`, `fix-sentence-retrieval`, `fix-length-retrieval`, `attn_entropy` |
`dataset` | Dataset. | `2wikimultihopqa`, `hotpotqa`, `iirc`, `strategyqa` |
`data_path` | Folder where the data is located. If you used the commands above to download the data, this is `../data/{dataset}`. | `../data/2wikimultihopqa` |
`fewshot` | Number of few-shot examples. | `6` |
`sample` | Number of questions sampled from the dataset; `-1` means use the entire dataset. | `1000` |
`shuffle` | Whether to shuffle the dataset. Without this parameter, the dataset is not shuffled. | `true`, `false` (without) |
`generate_max_length` | Maximum generated length per question. | `64` |
`query_formulation` | Way to formulate the retrieval query. | main: `direct`, `real_words`; other options: `current_wo_wrong`, `current`, `forward_all`, `last_n_tokens`, `last_sentence` |
`retrieve_keep_top_k` | Number of tokens kept when formulating a retrieval query. | `35` |
`output_dir` | Results are stored in a numerically named subfolder of the output folder you give here. If the folder you give does not exist, it is created. | `../result/2wikimultihopqa_llama2_13b` |
`retriever` | Type of retriever. | `BM25`, `SGPT` |
`retrieve_topk` | Number of retrieved documents kept. | `3` |
`hallucination_threshold` | Threshold at which a word is judged to be incorrect. | `1.2` |
`check_real_words` | Whether only content words participate in the threshold judgment. Without this parameter, all words are considered. | `true`, `false` (without) |
`use_counter` | Whether to use counters to track the number of generations, retrievals, questions, tokens generated, and sentences generated. Without this parameter, nothing is counted. | `true`, `false` (without) |
If you are using BM25 as the retriever, you should also include the following parameter:
Parameter | Meaning | Example |
---|---|---|
`es_index_name` | Name of the index in Elasticsearch. | `wiki` |
If you are using SGPT as the retriever, you should also include the following parameters:
Parameter | Meaning | Example |
---|---|---|
`sgpt_model_name_or_path` | SGPT model. | `Muennighoff/SGPT-1.3B-weightedmean-msmarco-specb-bitfit` |
`sgpt_encode_file_path` | Folder where the SGPT encoding results are saved. | `../sgpt/encode_result` |
`passage_file` | Path to the Wikipedia dump. | `../data/dpr/psgs_w100.tsv` |
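For reference, an SGPT-based config would replace the BM25 retriever settings with a fragment like this (the values are just the examples from the table above, not tuned settings):
```json
"retriever": "SGPT",
"sgpt_model_name_or_path": "Muennighoff/SGPT-1.3B-weightedmean-msmarco-specb-bitfit",
"sgpt_encode_file_path": "../sgpt/encode_result",
"passage_file": "../data/dpr/psgs_w100.tsv"
```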
Here is the config file for using our approach to generate answers to the first 1000 questions of 2WikiMultihopQA with the Llama-2-13b-chat model.
{
"model_name_or_path": "meta-llama/Llama-2-13b-chat",
"method": "attn_entropy",
"dataset": "2wikimultihopqa",
"data_path": "../data/2wikimultihopqa",
"generate_max_length": 64,
"query_formulation": "real_words",
"retrieve_keep_top_k": 40,
"output_dir": "../result/2wikimultihopqa_llama2_13b",
"retriever": "BM25",
"retrieve_topk": 3,
"hallucination_threshold": 1.2,
"fewshot": 6,
"sample": 1000,
"shuffle": false,
"check_real_words": true,
"es_index_name": "34051_wiki",
"use_counter": true
}
The config files of the main experiments in the paper are all in the config/ directory.
When you have prepared the configuration file, run the following command in the src directory:
python main.py -c path_to_config_file
When the program finishes, you will find a folder with a numeric name inside your designated output directory. The number reflects the sequential order of runs, which keeps multiple executions organized. During the run, the program also prints the folder where the current run's results will be saved.
Assuming the results of your run are saved in result/2wikimultihopqa_llama2_13b/1, run the following command in the src directory to evaluate:
python evaluate.py --dir result/2wikimultihopqa_llama2_13b/1
After the evaluation program has finished running, the results folder will contain the following files:
result/
└── 2wikimultihopqa_llama2_13b/
└── 1/
├── config.json # the configuration file you use when running
├── details.txt # Evaluation details
├── output.txt # Original output file, which will contain statistical results if use_counter is set to true
└── result.tsv # Evaluation results
The elements in output.txt are as follows:
{
"qid": "question id",
"prediction": "origin outputs",
"retrieve_count": 0,
"generate_count": 1,
"hallucinated_count": 0,
"token_count": 64,
"sentence_count": 5
}
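With use_counter enabled, these per-question counters can be aggregated with a short script (a minimal sketch, assuming output.txt holds one JSON object per line and the result path from above):
```python
import json

totals = {"retrieve_count": 0, "generate_count": 0, "hallucinated_count": 0,
          "token_count": 0, "sentence_count": 0}
n = 0
with open("result/2wikimultihopqa_llama2_13b/1/output.txt") as f:
    for line in f:
        record = json.loads(line)
        n += 1
        for key in totals:
            totals[key] += record.get(key, 0)

# report the average of each counter per question
for key, total in totals.items():
    print(f"{key}: {total / n:.2f} per question")
```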
The elements in details.txt are as follows:
{
"qid": "question id",
"final_pred": "the output used for evaluation after extraction",
"EM": "EM result",
"F1": "F1 result"
}