Transformer-based deep learning integrates multi-omic data with cancer pathways (Cai et al., 2023).
DeePathNet is a transformer-based deep learning tool that integrates multi-omic data to improve predictions for cancer subtyping and drug response. It combines pathway-level information with deep learning to enhance precision in oncology research.
- Multi-omic data integration using a transformer architecture
- Support for pathway-level feature importance analysis
- Pre-trained models for cancer type classification and drug response prediction
- Cross-validation and independent test scripts for model evaluation
For compatibility, use Python 3.8 and PyTorch 1.10. Set up the environment as follows:
- Install Anaconda
  - Follow the Anaconda installation guide for your operating system.
- Create a Virtual Environment
  - Once Anaconda is installed, create and activate a virtual environment:
    conda create -n deepathnet_env python=3.8 anaconda
    conda activate deepathnet_env
- Install PyTorch
  - Install the appropriate version of PyTorch based on your hardware:
    - For CUDA-enabled systems:
      pip install torch==1.10.0 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
    - For CPU-only systems:
      pip install torch==1.10.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
- Install Additional Dependencies
  - DeePathNet requires several Python packages. Create a requirements.txt file with the following contents:
    torch==1.10.0
    torchvision
    torchaudio
    numpy
    pandas
    scikit-learn
    matplotlib
    seaborn
    json5
    scipy
    tqdm
  - Install all dependencies:
    pip install -r requirements.txt
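After installation, a quick version check can confirm that the environment matches the recommended Python 3.8 and PyTorch 1.10. The following is a minimal sketch, not part of the DeePathNet repository; the helper names `version_matches` and `installed_torch_version` are hypothetical and introduced here for illustration only.

```python
import sys
from typing import Optional


def version_matches(found: str, wanted: str) -> bool:
    """Return True if `found` starts with the release numbers in `wanted`.

    Local build tags such as "+cu113" are ignored, so "1.10.0+cu113"
    matches "1.10", while "1.1.0" does not.
    """
    release = found.split("+", 1)[0].split(".")
    wanted_parts = wanted.split(".")
    return release[: len(wanted_parts)] == wanted_parts


def installed_torch_version() -> Optional[str]:
    """Return torch.__version__, or None if PyTorch cannot be imported."""
    try:
        import torch
    except ImportError:
        return None
    return torch.__version__


def report() -> None:
    """Print whether the running interpreter and PyTorch match the README."""
    python_version = "{}.{}".format(*sys.version_info[:2])
    status = "ok" if version_matches(python_version, "3.8") else "differs from 3.8"
    print("Python", python_version, status)

    torch_version = installed_torch_version()
    if torch_version is None:
        print("PyTorch is not installed")
    else:
        status = "ok" if version_matches(torch_version, "1.10") else "differs from 1.10"
        print("PyTorch", torch_version, status)
```

Calling `report()` inside the activated `deepathnet_env` environment prints one line per component. A mismatch is only a warning sign; other versions may still work but are not the ones recommended here.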
DeePathNet provides pre-trained models and test datasets to facilitate model evaluation:
- Download Pre-trained Models and Test Data
  - Access pre-trained models and test data from the Figshare repository.
  - Save the files to local directories, such as models/ for models and data/ for test data.
- Configure Paths for Models and Test Data
  - Update the paths in the configuration file, such as:
    {
      "model": "DeePathNet",
      "pretrained_model_path": "models/deepathnet_pretrained.pth",
      "test_data_path": "data/test_data.csv",
      "output_dir": "results/",
      ...
    }
  - Ensure that pretrained_model_path and test_data_path point to the local files.
- Load and Run Pre-trained DeePathNet Model
  - DeePathNet can be run using the deepathnet_independent_test.py script, which loads a pre-trained model and performs inference:
    python scripts/deepathnet_independent_test.py configs/sanger_train_ccle_test_gdsc/mutation_cnv_rna_prot/deepathnet_mutation_cnv_rna_prot.json
  - This command loads the pre-trained model and runs inference on the specified test dataset. Results are saved to the output directory specified in the configuration file.
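Before launching a run, it can help to confirm that the paths in the configuration file actually exist. The sketch below is an illustration rather than part of DeePathNet: it assumes the configuration is strict JSON and uses the `pretrained_model_path` and `test_data_path` keys from the example configuration, while the function names `missing_paths` and `check_config_file` are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List, Sequence

# Keys taken from the example configuration in this README.
PATH_KEYS = ("pretrained_model_path", "test_data_path")


def missing_paths(config: Dict[str, object], keys: Sequence[str] = PATH_KEYS) -> List[str]:
    """Return the keys that are absent or point to a non-existent path."""
    missing = []
    for key in keys:
        value = config.get(key)
        if not isinstance(value, str) or not Path(value).exists():
            missing.append(key)
    return missing


def check_config_file(config_file: str) -> List[str]:
    """Load a JSON configuration file and list its unresolved path keys."""
    with open(config_file, "r", encoding="utf-8") as handle:
        config = json.load(handle)
    return missing_paths(config)
```

An empty list from `check_config_file(...)` means both files were found. Relative paths are resolved against the current working directory, so run the check from the same directory as the DeePathNet scripts. If the repository's configuration files use JSON5 features such as comments, the `json5` package listed in the requirements can be swapped in for `json`.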
The following examples show how to generate predictions with DeePathNet for several tasks:
- Predict Drug Response
  - To predict drug response (IC50 values), run:
    python scripts/deepathnet_independent_test.py configs/sanger_train_ccle_test_gdsc/mutation_cnv_rna_prot/deepathnet_mutation_cnv_rna_prot.json
  - This script reads the configuration, loads the pre-trained model, and performs inference on the test dataset.
- Classify Cancer Types
  - For cancer type classification:
    python scripts/deepathnet_cv.py configs/tcga_all_cancer_types/mutation_cnv_rna/deepathnet_mutation_cnv_rna.json
  - This script performs cross-validation using the dataset specified in the configuration file.
- Breast Cancer Subtyping
  - For breast cancer subtyping:
    python scripts/deepathnet_independent_test.py configs/tcga_train_cptac_test_brca/cnv_rna/deepathnet_cnv_rna.json
- The predictions generated by DeePathNet (e.g., IC50 values or cancer subtypes) are saved in the output directory defined in the configuration file.
- The output includes performance metrics, predictions, and optional feature importance scores.
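The three commands above can also be run in sequence from a small Python driver. This is a convenience sketch only: the script and configuration paths are copied from the commands above, while `TASKS`, `build_command`, and `run_tasks` are hypothetical names that do not exist in the repository.

```python
import subprocess
import sys
from typing import Iterable, List, Tuple

# (script, config) pairs copied from the commands in this README.
TASKS: List[Tuple[str, str]] = [
    (
        "scripts/deepathnet_independent_test.py",
        "configs/sanger_train_ccle_test_gdsc/mutation_cnv_rna_prot/"
        "deepathnet_mutation_cnv_rna_prot.json",
    ),
    (
        "scripts/deepathnet_cv.py",
        "configs/tcga_all_cancer_types/mutation_cnv_rna/"
        "deepathnet_mutation_cnv_rna.json",
    ),
    (
        "scripts/deepathnet_independent_test.py",
        "configs/tcga_train_cptac_test_brca/cnv_rna/deepathnet_cnv_rna.json",
    ),
]


def build_command(script: str, config: str) -> List[str]:
    """Build the argument list for one run, using the current interpreter."""
    return [sys.executable, script, config]


def run_tasks(tasks: Iterable[Tuple[str, str]]) -> None:
    """Run each task in order; stop at the first one that exits non-zero."""
    for script, config in tasks:
        subprocess.run(build_command(script, config), check=True)
```

Calling `run_tasks(TASKS)` from the repository root, inside the activated environment, runs the tasks one after another. Using `sys.executable` keeps every run on the same interpreter that launched the driver, and `check=True` raises `subprocess.CalledProcessError` as soon as one script fails.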
To compare DeePathNet with baseline models like moCluster and mixOmics, use the provided scripts:
- moCluster Baseline Comparison
python scripts/baseline_ec_cv.py configs/sanger_gdsc_intersection_noprot/mutation_cnv_rna/moCluster_rf_allgenes_drug_mutation_cnv_rna.json
- Cancer Type Baseline Comparison
python scripts/cancer_type_baseline_23cancertypes.py
DeePathNet supports pathway-level and gene-level feature importance analysis:
- Pathway-level Feature Importance
  python scripts/transformer_explantion_cancer_type.py configs/tcga_brca_subtypes/mutation_cnv_rna/deepathnet_allgenes_mutation_cnv_rna.json
- Gene-level Feature Importance
  python scripts/transformer_shap_cancer_type.py configs/tcga_brca_subtypes/mutation_cnv_rna/deepathnet_allgenes_mutation_cnv_rna.json
The input files for DeePathNet should have samples as rows and features as columns. Each feature name consists of the gene name and the omic data type, separated by an underscore (e.g., GeneA_RNA). For example:
Sample | GeneA_RNA | GeneA_PROT | GeneB_RNA | GeneB_PROT |
---|---|---|---|---|
Cell_lineA | 10 | 8 | 2 | 3 |
Cell_lineB | 15 | 12 | 1 | 2 |
Cell_lineC | 5 | 3 | 10 | 8 |
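The naming convention can be checked programmatically before training or inference. The sketch below uses only the standard library and makes two assumptions that the text above does not confirm: the input is a comma-separated file whose first column holds the sample name, and the omic type is whatever follows the last underscore. The helper names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# The example table above, written as CSV (assumed file format).
EXAMPLE_CSV = """Sample,GeneA_RNA,GeneA_PROT,GeneB_RNA,GeneB_PROT
Cell_lineA,10,8,2,3
Cell_lineB,15,12,1,2
Cell_lineC,5,3,10,8
"""


def split_feature(name: str) -> Tuple[str, str]:
    """Split "GeneA_RNA" into ("GeneA", "RNA") at the last underscore."""
    gene, sep, omic = name.rpartition("_")
    if not sep or not gene or not omic:
        raise ValueError(f"feature name {name!r} is not of the form GENE_OMIC")
    return gene, omic


def group_by_gene(columns: Iterable[str]) -> Dict[str, List[str]]:
    """Map each gene to the omic types found for it, in column order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for column in columns:
        gene, omic = split_feature(column)
        grouped[gene].append(omic)
    return dict(grouped)


def read_header(text: str) -> Tuple[str, List[str]]:
    """Return the sample column name and the list of feature columns."""
    header = next(csv.reader(io.StringIO(text)))
    return header[0], header[1:]
```

For the example table, `group_by_gene(read_header(EXAMPLE_CSV)[1])` maps `GeneA` and `GeneB` each to `["RNA", "PROT"]`. A column that lacks an underscore raises `ValueError`, which surfaces malformed feature names early.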
The output includes predictions such as:
- Drug response (IC50 values)
- Cancer types/subtypes
- Feature importance scores for interpretability
We recommend using the specified Python and PyTorch versions for compatibility. If issues arise, please open a ticket in the Issues tab with details about your setup, the steps you followed, and error logs.
For more information, please contact the study authors via the associated publication.