
Marcus-Friis/research-project


Leveraging LLMs to create semantically enriched citation networks

research-project

By Mads Høgenhaug, Marcus Friis, Morten Pedersen.

This repo contains all the code and data used to produce the accompanying paper. The created Cit-Hep-Ph-Aug dataset can be found in:

The base dataset that we augment is the arXiv HEP-PH (high energy physics phenomenology) citation graph.
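Citation graphs like HEP-PH are commonly distributed as plain-text edge lists. The sketch below shows one way to load such a file into an adjacency map. It assumes a SNAP-style format in which each non-comment line holds a `citing cited` pair of paper IDs separated by whitespace and comment lines start with `#`. The function names and the file name in the usage note are hypothetical and not taken from this repo.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def read_edge_list(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Parse 'citing cited' pairs, skipping blank lines and '#' comments."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        citing, cited = line.split()[:2]
        edges.append((int(citing), int(cited)))
    return edges


def build_adjacency(edges: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Map each citing paper ID to the set of paper IDs it cites."""
    adjacency: Dict[int, Set[int]] = defaultdict(set)
    for citing, cited in edges:
        adjacency[citing].add(cited)
    return dict(adjacency)
```

For example, `with open('Cit-HepPh.txt') as f: edges = read_edge_list(f)` would give an ordered list of edges, which is convenient when edges are later addressed by index.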

Abstract embeddings

Abstract embeddings are created with Llama 2 13B. Specifically, we use the quantized openbuddy-llama2-13b-v11.1.Q5_K_M model and interact with it through llama-cpp-python.

Embeddings are created by embedder.py, which iterates through the abstracts in arxiv.csv, embeds a prompt for each one, and stores the results in embeds.json. The core of the script is the embedding step, which works as follows:

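```python
from llama_cpp import Llama

# Keyword arguments passed to the Llama constructor
llama_kwargs = {
    'model_path': 'YOUR_MODEL_PATH',  # path to the local model file
    'embedding': True,  # enable embedding mode
    # ... further options (e.g. n_ctx, n_gpu_layers) go here
}
llama = Llama(**llama_kwargs)

prompt = 'SAMPLE PROMPT'
embedding = llama.embed(prompt)  # embedding vector for the prompt
```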
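The loop around this embedding call can be sketched as below. This is a minimal illustration, not the actual embedder.py: the column names `id` and `abstract`, the prompt template, and the function name `embed_abstracts` are all assumptions. The embedding function is passed in as a parameter so the sketch runs without a model; in practice it would be `llama.embed`.

```python
import csv
import json
from typing import Callable, Dict, List

# Hypothetical prompt template; the prompt actually used by embedder.py is not shown here.
PROMPT_TEMPLATE = 'Abstract: {abstract}'


def embed_abstracts(
    csv_path: str,
    json_path: str,
    embed: Callable[[str], List[float]],
    id_column: str = 'id',  # assumed column name
    abstract_column: str = 'abstract',  # assumed column name
) -> Dict[str, List[float]]:
    """Embed one prompt per abstract in csv_path and write {id: vector} to json_path."""
    embeddings: Dict[str, List[float]] = {}
    with open(csv_path, newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f):
            prompt = PROMPT_TEMPLATE.format(abstract=row[abstract_column])
            embeddings[row[id_column]] = embed(prompt)
    with open(json_path, 'w', encoding='utf-8') as f:
        json.dump(embeddings, f)
    return embeddings
```

With the model from the previous snippet, this might be called as `embed_abstracts('arxiv.csv', 'embeds.json', llama.embed)`. Writing all embeddings in one `json.dump` at the end keeps the sketch simple; for long runs on a cluster, saving partial results periodically would be safer.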

We ran this code on the IT University of Copenhagen's HPC cluster, which provided the compute needed to run the 13B-parameter model.

Edge labeling

Edges were labeled with ChatGPT through its API. The code is in chatgpt.py. To support batches of prompts, the requests are sent asynchronously. The Chad class handles these asynchronous requests, with logic for handling timeouts, retries, and other relevant errors. To use it, insert your API key in config.ini. When running the script, you specify through the CLI which edges to label. For further modifications, either edit the script or import the Chad class into another script. Run it as follows:

```bash
cd src
python chatgpt.py start_index end_index
```

Here, start_index and end_index specify the range of edges to label.
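The asynchronous pattern described above (concurrent requests with timeouts and retries) can be sketched with `asyncio`. This is a hypothetical stand-in, not the Chad class: the `request` coroutine is injected so no specific ChatGPT client call is assumed, the retried exceptions are placeholders for whatever errors the real API raises, and the `[openai]`/`api_key` names in config.ini are assumptions.

```python
import argparse
import asyncio
import configparser
from typing import Any, Awaitable, Callable, Dict, List, Optional, Tuple

Edge = Tuple[int, int]


async def label_edge(
    edge: Edge,
    request: Callable[[Edge], Awaitable[str]],
    semaphore: asyncio.Semaphore,
    timeout: float = 30.0,
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> Optional[str]:
    """Request a label for one edge, retrying on timeouts and connection errors."""
    for attempt in range(max_retries + 1):
        try:
            async with semaphore:  # cap the number of in-flight requests
                return await asyncio.wait_for(request(edge), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError):
            if attempt == max_retries:
                return None  # give up on this edge
            # Exponential backoff; the semaphore is already released here
            await asyncio.sleep(base_delay * 2 ** attempt)
    return None


async def label_edges(
    edges: List[Edge],
    request: Callable[[Edge], Awaitable[str]],
    max_concurrency: int = 10,
    **kwargs: Any,
) -> Dict[Edge, Optional[str]]:
    """Label all edges concurrently; failed edges map to None."""
    semaphore = asyncio.Semaphore(max_concurrency)
    labels = await asyncio.gather(
        *(label_edge(e, request, semaphore, **kwargs) for e in edges)
    )
    return dict(zip(edges, labels))


def read_api_key(path: str = 'config.ini') -> str:
    """Read the API key from an INI file (section and key names are assumed)."""
    config = configparser.ConfigParser()
    config.read(path)
    return config['openai']['api_key']


def parse_range(argv: Optional[List[str]] = None) -> Tuple[int, int]:
    """Parse the start_index and end_index CLI arguments."""
    parser = argparse.ArgumentParser(description='Label a range of edges.')
    parser.add_argument('start_index', type=int)
    parser.add_argument('end_index', type=int)
    args = parser.parse_args(argv)
    return args.start_index, args.end_index
```

A driver would then call something like `asyncio.run(label_edges(edges[start:end], request))`, where treating the range as a half-open slice is itself an assumption. Capping concurrency with a semaphore and backing off exponentially between retries are common ways to stay within API rate limits.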

Producing the results

All results are produced by scripts located in src.
