This repository provides a flexible suite for advanced machine learning over Electronic Health Records (EHR) using PyTorch, PyTorch Lightning, and Hydra for configuration management. The project ingests tensorized data from the MEDS_transforms repository, a robust system for transforming EHR data into ML-ready sequence data. By supporting a variety of tokenization strategies and sequence model architectures, the framework facilitates the development and testing of deep sequence models on EHR data.
Key features include:
- Configurable ML Pipeline: Utilize Hydra to dynamically adjust configurations and seamlessly integrate with PyTorch Lightning for scalable training across multiple environments.
- Advanced Tokenization Techniques: Explore different approaches to embedding EHR data in tokens that sequence models can reason over.
- Supervised Models: Support for supervised training on arbitrary tasks defined on MEDS format data.
- Transfer Learning: Pretrain via contrastive learning, forecasting, and other pre-training methods, then fine-tune on supervised tasks.
The goal of this project is to push the boundaries of what's possible in healthcare machine learning by providing flexible, robust, and scalable sequence modeling tools that accommodate a wide range of research and operational needs. Whether you're conducting academic research or developing clinical applications with MEDS format EHR data, this repository offers the tools and flexibility to develop deep sequence models.
PyPI
pip install meds-torch
git
# clone project
git clone git@github.com:Oufattole/meds-torch.git
cd meds-torch
# [OPTIONAL] create conda environment
conda create -n meds-torch python=3.12
conda activate meds-torch
# install pytorch according to instructions
# https://pytorch.org/get-started/
# install requirements
pip install -e .
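After either install route, a quick check that the package is visible to the current Python environment can save some debugging later. The snippet below is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and it assumes the distribution is registered under the name `meds-torch` used in the pip command above.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "meds-torch" is assumed to be the distribution name (taken from the pip command).
    found = installed_version("meds-torch")
    if found is None:
        print("meds-torch is not installed in this environment")
    else:
        print(f"meds-torch version: {found}")
```

If the check reports that the package is missing, the most likely cause is that the conda environment created above is not the active one.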
Train model with default configuration
# train on CPU
python -m meds_torch.train trainer=cpu
# train on GPU
python -m meds_torch.train trainer=gpu
Train model with chosen experiment configuration from configs/experiment/
python -m meds_torch.train experiment=experiment_name.yaml
You can override any parameter from the command line like this:
python -m meds_torch.train trainer.max_epochs=20 data.batch_size=64
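As a rough mental model of the override syntax above, each `key.subkey=value` argument addresses one entry in a nested configuration. The sketch below illustrates that idea in plain Python; it is not Hydra's implementation, the helper names are hypothetical, and the default values (10 epochs, batch size 32) are made up for the example. One known difference: this sketch silently creates missing keys, whereas Hydra requires a `+` prefix to add a key that is not already in the config.

```python
from __future__ import annotations

import ast
from copy import deepcopy


def parse_value(text: str):
    """Turn '20' into an int and '1e-3' into a float; leave other text as a string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of config with dotted 'a.b=value' overrides applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Assumption of this sketch: missing intermediate keys are created.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return result


# Hypothetical defaults, for illustration only.
defaults = {"trainer": {"max_epochs": 10}, "data": {"batch_size": 32}}
updated = apply_overrides(defaults, ["trainer.max_epochs=20", "data.batch_size=64"])
print(updated)
```

The original `defaults` dictionary is left untouched, which mirrors the way command-line overrides change a single run without editing the config files on disk.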
Why you might want to use it:
✅ Support different tokenization methods for EHR data
- Triplet
- Everything Is Text
- Everything Is a Code
✅ Supervised Learning and Transfer Learning Support for MEDS data
- Randomly initialize a model and train it in a supervised manner on your MEDS format medical data.
- General Contrastive Window Pretraining
- Random EBCL Example
- OCP Example
- STraTS Value Forecasting
✅ Ease of Use and Reusability
A collection of useful EHR sequence modeling tools, configs, and code snippets. You can use this repo as a reference for developing your own models. Additionally, you can easily add new models, datasets, tasks, and experiments, and train on different accelerators, such as multi-GPU setups.
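To make the triplet tokenization option listed above more concrete, the sketch below turns a short list of timestamped observations into (time, code, value) triplets, the general idea behind triplet-style embeddings. It is an illustration under stated assumptions, not the repository's implementation: the `Triplet` class, the `to_triplets` helper, and the example codes are all hypothetical, and real MEDS data would be read from the tensorized files rather than built by hand.

```python
from __future__ import annotations

from datetime import datetime
from typing import NamedTuple


class Triplet(NamedTuple):
    time_delta_days: float  # time since the first event, in days
    code_index: int         # integer id of the event code
    numeric_value: float    # 0.0 when the event carries no value
    value_present: bool     # distinguishes a real 0.0 from a missing value


def to_triplets(events: list[tuple], vocab: dict[str, int]) -> list[Triplet]:
    """Convert time-sorted (timestamp, code, value or None) events into triplets."""
    if not events:
        return []
    start = events[0][0]
    triplets = []
    for timestamp, code, value in events:
        days = (timestamp - start).total_seconds() / 86400
        # Assign the next free index to codes that have not been seen yet.
        index = vocab.setdefault(code, len(vocab))
        has_value = value is not None
        triplets.append(Triplet(days, index, float(value) if has_value else 0.0, has_value))
    return triplets


# Hypothetical events for one patient; the codes are invented for the example.
events = [
    (datetime(2024, 1, 1, 8, 0), "HR", 82.0),
    (datetime(2024, 1, 1, 20, 0), "LAB//glucose", 5.4),
    (datetime(2024, 1, 2, 8, 0), "DX//I10", None),
]
vocab: dict[str, int] = {}
triplets = to_triplets(events, vocab)
for triplet in triplets:
    print(triplet)
```

In a model, each of the three components would typically be embedded separately and combined into one token vector; the explicit `value_present` flag is one possible way to handle codes that have no numeric value.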
By default, the wandb logger is installed with the repo. To use a different logger, install it with one of the commands below:
pip install neptune-client
pip install mlflow
pip install comet-ml
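pip install "aim>=3.16.2" # no lower than 3.16.2, see https://github.com/aimhubio/aim/issues/2550 (quoted so the shell does not treat > as a redirect)
To run tests on 8 parallel workers (the -n option requires the pytest-xdist plugin), run: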
pytest -n 8