
MEDS-torch


Description

This repository provides a flexible suite for machine learning over Electronic Health Records (EHR), built on PyTorch, PyTorch Lightning, and Hydra for configuration management. The project ingests tensorized data from the MEDS_transforms repository, a robust system for transforming EHR data into ML-ready sequence data. By combining a variety of tokenization strategies and sequence model architectures, the framework supports developing and testing models on that data.

Key features include:

  • Configurable ML Pipeline: Utilize Hydra to dynamically adjust configurations and seamlessly integrate with PyTorch Lightning for scalable training across multiple environments.
  • Advanced Tokenization Techniques: Explore different approaches to embedding EHR data in tokens that sequence models can reason over.
  • Supervised Models: Support for supervised training on arbitrary tasks defined on MEDS format data.
  • Transfer Learning: Pretrain via contrastive learning, forecasting, and other pretraining methods, then fine-tune on supervised tasks.

The goal of this project is to push the boundaries of what's possible in healthcare machine learning by providing flexible, robust, and scalable sequence modeling tools that accommodate a wide range of research and operational needs. Whether you're conducting academic research or developing clinical applications with MEDS format EHR data, this repository offers the tools and flexibility to develop deep sequence models.

Installation

Pip

PyPI

pip install meds-torch

git

# clone project
git clone git@github.com:Oufattole/meds-torch.git
cd meds-torch

# [OPTIONAL] create conda environment
conda create -n meds-torch python=3.12
conda activate meds-torch

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -e .

How to run

Train model with default configuration

# train on CPU
python -m meds_torch.train trainer=cpu

# train on GPU
python -m meds_torch.train trainer=gpu

Train model with chosen experiment configuration from configs/experiment/

python -m meds_torch.train experiment=experiment_name.yaml

You can override any parameter from the command line like this:

python -m meds_torch.train trainer.max_epochs=20 data.batch_size=64
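To make the override syntax concrete, here is a minimal standard-library sketch of how dotted `key=value` arguments can be mapped onto a nested configuration. It is an illustration only, not Hydra's implementation; the `apply_overrides` and `parse_value` helpers and the default values are hypothetical, and the real defaults live in the repository's config files.

```python
from copy import deepcopy
from typing import Any


def parse_value(raw: str) -> Any:
    """Interpret a command-line value as bool, int, float, or plain string."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of `config` with each `dotted.key=value` override applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the nested dicts, creating missing levels on the way.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return result


# Hypothetical defaults, chosen only for this example.
defaults = {
    "trainer": {"max_epochs": 10, "accelerator": "cpu"},
    "data": {"batch_size": 32},
}
merged = apply_overrides(defaults, ["trainer.max_epochs=20", "data.batch_size=64"])
print(merged)
```

Each override replaces only the key it names, so untouched settings such as `trainer.accelerator` keep their default values.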

📌  Introduction

Why you might want to use it:

✅ Support different tokenization methods for EHR data

  • Triplet
  • Everything Is Text
  • Everything Is a Code
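As a rough illustration of the Triplet idea, the sketch below turns a subject's events into (time delta, code index, numeric value) triplets using only the standard library. This is one common reading of triplet tokenization, assumed here rather than taken from this repository's code; the `Triplet` type, the `to_triplets` helper, and the example events are all hypothetical.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Triplet(NamedTuple):
    time_delta_days: float   # time since the subject's previous event
    code_index: int          # integer id assigned to the medical code
    value: Optional[float]   # numeric value, or None when the event has none


def to_triplets(events, vocab):
    """Convert time-sorted (time, code, numeric_value) rows into triplets.

    `vocab` maps code strings to integer ids and is extended in place
    whenever a new code is seen.
    """
    triplets = []
    previous_time = None
    for time, code, numeric_value in events:
        if previous_time is None:
            delta = 0.0
        else:
            delta = (time - previous_time).total_seconds() / 86400
        code_index = vocab.setdefault(code, len(vocab))
        triplets.append(Triplet(delta, code_index, numeric_value))
        previous_time = time
    return triplets


# Made-up events for a single subject, for illustration only.
events = [
    (datetime(2024, 1, 1, 8, 0), "HR", 72.0),
    (datetime(2024, 1, 1, 20, 0), "DX//I10", None),
    (datetime(2024, 1, 3, 20, 0), "HR", 80.0),
]
vocab = {}
triplets = to_triplets(events, vocab)
for triplet in triplets:
    print(triplet)
```

In a model, each of the three fields would typically be embedded separately and the embeddings combined into one token per event; that step is omitted here.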

✅ MEDS data Supervised Learning and Transfer Learning Support

  • Randomly initialize a model and train it in a supervised manner on your MEDS format medical data.
  • General Contrastive Window Pretraining
  • Random EBCL Example
  • OCP Example
  • STraTS Value Forecasting

✅ Ease of Use and Reusability
A collection of useful EHR sequence modeling tools, configs, and code snippets. You can use this repo as a reference for developing your own models. Additionally, you can easily add new models, datasets, tasks, and experiments, and train on different accelerators, such as multi-GPU.

Loggers

By default, the wandb logger is installed with the repo. To use a different logger, install it with one of the commands below:

pip install neptune-client
pip install mlflow
pip install comet-ml
pip install "aim>=3.16.2"  # quoted so the shell does not treat > as a redirect; no lower than 3.16.2, see https://github.com/aimhubio/aim/issues/2550

Development Help

To run the tests on 8 parallel workers (the -n flag is provided by the pytest-xdist plugin), run:

pytest -n 8
