Automated Evaluation Pipeline for Consistency in Summarization


Evaluation Pipeline

This repository contains the code for the Evaluation Pipeline for Consistency in Summarization.

The evaluation leverages the ability of large language models (LLMs) to spot inconsistencies in generated summaries through a Question-Answering (QA) system.

The experiment is run on a dataset of 1000 documents and their corresponding summaries from Vodafone Italy's Customer Service. For confidentiality reasons, the dataset is not included in the repository.

Methodology

Two models are employed in the evaluation: PaLM 2 Text and Gemini 1.5 Pro. Three runs are performed: the first uses PaLM 2 Text, the second uses Gemini 1.5 Pro, and the third uses Gemini 1.5 Pro again with a revised, more optimized pipeline.
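The repository's model wrappers live in models.ipynb and are not reproduced here. As an illustration only, both models are available through Google's Vertex AI Python SDK, so the calls may resemble the following sketch; the project ID, location, and prompts are placeholders, not values from the repository:

```python
import vertexai
from vertexai.language_models import TextGenerationModel  # PaLM 2 Text (legacy API)
from vertexai.generative_models import GenerativeModel    # Gemini 1.5 Pro

# Placeholder project settings -- replace with real values.
vertexai.init(project="your-gcp-project", location="us-central1")

# Run 1: PaLM 2 Text ("text-bison" is the PaLM 2 Text model on Vertex AI).
palm = TextGenerationModel.from_pretrained("text-bison")
palm_answer = palm.predict(
    "Answer the question using only the text below.\n\nText: ...\nQuestion: ...",
    temperature=0.0,  # deterministic answers suit consistency checking
).text

# Runs 2-3: Gemini 1.5 Pro.
gemini = GenerativeModel("gemini-1.5-pro")
gemini_answer = gemini.generate_content(
    "Answer the question using only the text below.\n\nText: ...\nQuestion: ..."
).text
```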

The pipeline is as follows (a minimal sketch of the flow is given after the list):

  1. The input document and the generated summary are fed to the QA system.
  2. The QA system generates a set of questions from the input document.
  3. The QA system generates a set of questions from the generated summary.
  4. The QA system answers the document-derived questions using the generated summary.
  5. The QA system answers the summary-derived questions using the input document.
  6. For the document-derived questions, the QA system compares the answers obtained from the document with those obtained from the summary.
  7. For the summary-derived questions, the QA system compares the answers obtained from the document with those obtained from the summary.
  8. The QA system compares the answers across the two question sets.
  9. Consistency scores are computed from these answer comparisons.
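The actual implementation is in Evaluation_Pipeline.ipynb and utils.py; the following is only a minimal sketch of the flow above, assuming a generic text-in, text-out model call. All function names are hypothetical, and the final score, the fraction of question pairs with matching answers, is one plausible scoring choice rather than the repository's exact metric:

```python
from typing import Callable, List

LLM = Callable[[str], str]  # any text-in, text-out model call (e.g. Gemini 1.5 Pro)

def generate_questions(llm: LLM, text: str, n: int = 5) -> List[str]:
    # Steps 2-3: ask the model for factual questions about a text.
    prompt = f"Write {n} factual questions answerable from this text:\n\n{text}"
    return [q.strip() for q in llm(prompt).splitlines() if q.strip()]

def answer(llm: LLM, question: str, context: str) -> str:
    # Steps 4-5: answer a question using only the given context.
    return llm(f"Using only this text:\n\n{context}\n\nAnswer briefly: {question}")

def answers_match(llm: LLM, a: str, b: str) -> bool:
    # Steps 6-8: let the model judge whether two answers state the same fact.
    verdict = llm(f"Do these answers state the same fact? Reply yes or no.\nA: {a}\nB: {b}")
    return verdict.strip().lower().startswith("yes")

def consistency_score(llm: LLM, document: str, summary: str) -> float:
    # Step 9: fraction of questions whose document-based and summary-based
    # answers agree; 1.0 means no inconsistency was detected.
    questions = generate_questions(llm, document) + generate_questions(llm, summary)
    results = [
        answers_match(llm, answer(llm, q, document), answer(llm, q, summary))
        for q in questions
    ]
    return sum(results) / len(results) if results else 0.0
```

With a wrapper such as `lambda p: gemini.generate_content(p).text`, `consistency_score` returns a value in [0, 1]. The third run's "more optimized pipeline" could, for example, batch the question-generation and answering calls, but the repository does not document the exact change here.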

The Evaluation Pipeline is implemented in Python; the flow of the pipeline is shown below:

(Figure: Flow of the Pipeline; see LLM_evals_flow.png)

Structure of the Repository

The repository is structured as follows:

📁 Evaluation Pipeline
├── Evaluation_Pipeline.ipynb
├── global_variables.py
├── models.ipynb
├── LLM_evals_flow.png
└── utils.py
