Skip to content
forked from pat-jj/KARE

Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval

Notifications You must be signed in to change notification settings

Data-Designer/KARE

 
 

Repository files navigation

Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval

1. Prepare EHR data

cd ehr_prepare
python ehr_data_prepare.py
python sample_prepare.py

(Baseline Models Evaluation)

```bash
baselines/baseline_play.ipynb
```

2. Knowledge Graph (KG) Construction

General Preparation:

cd kg_construct
python query_data_prepare.py

Extract KG from PubMed:

Preparation:

cd kg_construct/pubmed_index
python download_pubmed.py
python embed_pubmed.py
python convert_dat.py

Construct KG:

cd kg_construct
python pubmed_source.py

Extract KG from UMLS:

Our processed UMLS KG: Google Drive

cd kg_construct
python umls_source.py

Extract KG from LLM:

cd kg_construct
python llm_source.py

Semantic Clustering:

After combining all the KGs into kg_raw.txt under "graph" folder (in project root), run:

cd kg_construct
python refine_kg.py

3. KG Community Detection and Indexing (Summarization)

cd kg_index
python structure_partition_leiden.py

4. Patient Context Construction and Augmentation

cd patient_context
python base_context.py
python get_emb.py
python sim_patient_ret_faiss.py
python augment_context.py

5. Reasoning Chain Generation

cd prediction
python data_prepare.py
python split_task.py

6. LLM Fine-tuning

Please follow https://github.com/huggingface/alignment-handbook to build the environment for fine-tuning. Start the fine-tuning for the specific task (mortality/readmission):

sh finetune/sft_{task}.sh

7. Prediction & Evaluation

# For the prediction
cd prediction
cd llm_inference
python generate.py

# For the evaluation
cd prediction
python eval.py

* A Cost-Effective/Naive Approach (Skipping Step 1-3) to Validate Our Results


This approach will directly retrieve the knowledge summaries from an LLM, and use them to construct the input and output for LLM fine-tuning. However, the result would not be as good as the original method, but can still be used to validate the philosophy underlying our method.

(This approach is suitable for those who do not want to spend money on building their own context-aware and concept-specific KG.)

**Major Advantange** over our method: 
(1) Much lower cost than our original implementation.
(2) No need to tune the hyperparameters for the context augmentation.

**Major Disadvantage**: 
(1) Relatively lower performance as it only uses the knowledge from LLM. 
(2) For real-world application, you will need to prepare the knowledge from the **same LLM** for the every new sample during the inference -> higher cost if the application is long-term.
cd prediction
python dp_new.py

Others

To call LLM APIs in this work, you need to

Enter AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION in apis_example/claude_api.py to call Claude APIs.

Enter your OpenAI API key in apis_example/openai.key to call OpenAI APIs.

Then, you need to rename apis_example to apis, and put it under each folder where you need to call APIs.


Cite KARE

@misc{jiang2024reasoningenhancedhealthcarepredictionsknowledge,
      title={Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval}, 
      author={Pengcheng Jiang and Cao Xiao and Minhao Jiang and Parminder Bhatia and Taha Kass-Hout and Jimeng Sun and Jiawei Han},
      year={2024},
      eprint={2410.04585},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.04585}, 
}

Thank you for your interest in our work!

About

Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 98.8%
  • Jupyter Notebook 1.1%
  • Shell 0.1%