First, a motivating example:
Suppose your original query to ChatGPT is "Generate a cover letter for a research internship position at *[insert research institution here]*; my name is *Siyan Li*, and here is my CV: *[insert well-formatted CV text here]*." The italicized parts represent Personally Identifiable Information (PII). We have limited control over how our data is used once the servers hosting ChatGPT gain access to it. Therefore, a good way to be privacy-conscious is to prevent your PII from being exposed to ChatGPT in the first place.
Ideally, we want a system we use to prompt a cloud-based LLM such that:
- You receive high-quality responses from the system, which interacts with this cloud-based LLM
- As little of your PII is leaked to this cloud-based LLM as possible
So, we built PAPILLON.
What is PAPILLON?
PAPILLON is a semi-local framework in which trusted but weaker models (e.g., locally hosted Llama-3 models) use untrusted but more powerful models as tools in order to preserve user privacy at inference time.
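To make the division of labor concrete, here is a minimal DSPy sketch of a PAPILLON-style pipeline. This is an illustration rather than the shipped module: the signature fields, class names, and wiring are placeholders.

```python
import dspy

class CraftRedactedPrompt(dspy.Signature):
    """Rewrite the user's query so it contains no PII but still elicits a useful answer."""
    user_query: str = dspy.InputField()
    redacted_prompt: str = dspy.OutputField()

class ComposeFinalResponse(dspy.Signature):
    """Combine the untrusted model's output with the original query into the final answer."""
    user_query: str = dspy.InputField()
    untrusted_response: str = dspy.InputField()
    final_response: str = dspy.OutputField()

class PapillonSketch(dspy.Module):
    def __init__(self, untrusted_lm):
        super().__init__()
        # These predictors run on the trusted local model
        # (assumes dspy.configure(lm=<trusted local model>) has been called).
        self.redact = dspy.ChainOfThought(CraftRedactedPrompt)
        self.compose = dspy.ChainOfThought(ComposeFinalResponse)
        self.untrusted_lm = untrusted_lm  # e.g., a cloud API model

    def forward(self, user_query: str):
        # Step 1: the trusted model strips PII from the query.
        redacted = self.redact(user_query=user_query).redacted_prompt
        # Step 2: only the redacted prompt ever reaches the untrusted model.
        untrusted_response = self.untrusted_lm(redacted)[0]
        # Step 3: the trusted model stitches the answer back together with full context.
        return self.compose(user_query=user_query, untrusted_response=untrusted_response)
```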
We have an end-to-end tutorial notebook for defining and optimizing your own PAPILLON module using the newest version of the PUPA dataset. Please click on the Colab badge below, or refer to the `papillon_tutorial.ipynb` file.
We now also have a USER INTERFACE! Please navigate to the `papillon_ui/` folder. Here is a video tutorial for the UI!
Please refer to the `papillon_v1.0` branch for the original version of our code and data to reproduce the results.
We are working on making PAPILLON a PyPI package. Until then, you will unfortunately need to clone the repository first.
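For example (the URL placeholder follows the same convention as the other commands in this README, and the directory name is assumed):

```bash
git clone <PAPILLON_REPO_URL>   # the URL of this repository
cd PAPILLON
```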
To create a conda environment to run PAPILLON in, run the following commands:
conda env create -f environment.yml
conda activate papillon
Provide your OpenAI API Key:
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
To use the DSPy-optimized PAPILLON pipeline, you would need to do the following:
- Host your trusted model of choice on a server
- Supply private user prompts
- Run the provided PAPILLON pipeline, or optimize new PAPILLON pipelines on new data (see the configuration sketch below)
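For the first and third steps, here is a minimal configuration sketch, assuming DSPy 2.5+ and an OpenAI-compatible local server; the model name, port, and key value are placeholders:

```python
import dspy

# Point DSPy at the locally hosted trusted model (an OpenAI-compatible server,
# e.g., the SGLang command shown below). Port and model name are placeholders.
trusted_lm = dspy.LM(
    "openai/meta-llama/Llama-3.1-8B-Instruct",
    api_base="http://localhost:7501/v1",
    api_key="local",  # local servers typically ignore the key, but the client requires one
)
dspy.configure(lm=trusted_lm)

# The untrusted cloud model picks up the OPENAI_API_KEY exported above.
untrusted_lm = dspy.LM("openai/gpt-4o-mini")
```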
We will build a Flask server and UI for PAPILLON for easier use in the future. For now, you have to enter private user queries manually or read them from a CSV file.
You can interact with PAPILLON pipelines directly using the `papillon/run_papillon_interactive.py` script.
We currently have optimized PAPILLON prompts for the following models:
- Llama-3.2-1B-Instruct
- Llama-3.2-3B-Instruct
- Llama-3-8B-Instruct
- Llama-3.1-8B-Instruct
- Mistral-7B-Instruct
- Mistral-Small
There are multiple options for hosting these models. For Llama-3.2, the current official method of hosting is via vLLM. You can also host the 1B and 3B models on Ollama. The other models can be hosted through SGLang. Here, we use SGLang as an example:
python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port <PORT_NUMBER>
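Alternatively, for the 1B and 3B models, Ollama works roughly as follows (the model tag is illustrative; Ollama exposes an OpenAI-compatible endpoint on port 11434 by default):

```bash
ollama pull llama3.2:3b   # or llama3.2:1b
ollama serve              # OpenAI-compatible endpoint at http://localhost:11434/v1
```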
The interactive script displays a terminal prompt that lets you type in user queries manually, and then prints out the corresponding PAPILLON-synthesized privacy-preserving prompt and the final PAPILLON response:
cd papillon
python3 run_papillon_interactive.py --port <PORT_NUMBER> --model_name <MODEL_NAME>
You may use PUPA data, or your own data formatted according to the PUPA format (see the `pupa` directory), to optimize PAPILLON pipelines with different local and API-based model ensembles:
cd papillon
python3 run_dspy_optimization_llama.py --port <PORT_NUMBER> --prompt_output "output.json" --data_file "../pupa/PUPA_New.csv"
Evaluating PAPILLON will print out the average quality and leakage scores according to the LLM judge defined in `papillon/llm_judge.py`:
cd papillon
python3 evaluate_papillon.py --port <PORT_NUMBER> --model_name <MODEL_NAME>   # e.g., meta-llama/Llama-3.1-8B-Instruct
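For intuition, here is an illustrative DSPy sketch of what such quality and leakage judges can look like. The signature names and fields are placeholders; the actual definitions live in `papillon/llm_judge.py` and may differ:

```python
import dspy

class JudgeQuality(dspy.Signature):
    """Is response A at least as helpful as response B for the user's query?"""
    user_query: str = dspy.InputField()
    response_a: str = dspy.InputField()
    response_b: str = dspy.InputField()
    judgment: bool = dspy.OutputField()

class JudgeLeakage(dspy.Signature):
    """How many of the listed PII units appear in the prompt sent to the untrusted model?"""
    pii_units: str = dspy.InputField(desc="newline-separated PII strings from the user query")
    untrusted_prompt: str = dspy.InputField()
    leaked_count: int = dspy.OutputField()

quality_judge = dspy.ChainOfThought(JudgeQuality)
leakage_judge = dspy.ChainOfThought(JudgeLeakage)
```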
You can find PUPA on Huggingface. You can also see the `pupa` directory for the raw CSV files of the PUPA-TNB and PUPA-New datasets.
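To peek at the raw data, a minimal sketch using pandas (run from the repository root; the column names are whatever the PUPA CSVs define):

```python
import pandas as pd

# Load the raw CSV shipped in the repo and inspect its schema.
df = pd.read_csv("pupa/PUPA_New.csv")
print(df.columns.tolist())
print(df.head())
```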
If you have new user-assistant interaction data containing private information and you want to convert it to the PUPA data format, you can use the code in the `pupa` directory to scaffold this process.
- Pin the version of DSPy from the original code base for optimization and inference, for reproducibility; currently, PAPILLON is compatible with the newest version of DSPy for inference (interactive mode).
- Complete PUPA data processing code.
- Add PUPA to Huggingface.
- Build a Flask server and simple UI for PAPILLON.
- Make PAPILLON installable via PyPI.
Here is the original paper.
If you use PAPILLON in your work, please consider citing it:
@article{siyan2024papillon,
title={PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles},
author={Siyan, Li and Raghuram, Vethavikashini Chithrra and Khattab, Omar and Hirschberg, Julia and Yu, Zhou},
journal={arXiv preprint arXiv:2410.17127},
year={2024}
}