Clone the repository and install the dependencies:
git clone [email protected]:jsun9003/PSPHunter.git
cd PSPHunter
# install pixi
curl -fsSL https://pixi.sh/install.sh | bash
# set up the pixi environment
pixi install
Then run the prediction on your FASTA file:
pixi run predict -i fasta.fa
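If no input file is at hand, the sketch below writes a minimal single-record FASTA file and checks its basic shape before the prediction task is called. The `is_fasta` helper, the file name `example.fa`, the record name, and the sequence are hypothetical placeholders for illustration; none of them are part of PSPHunter.

```shell
# Hypothetical helper: succeed only if the file is non-empty, starts with a
# '>' header line, and has at least one non-header line containing letters.
is_fasta() {
    [ -s "$1" ] || return 1
    head -n 1 "$1" | grep -q '^>' || return 1
    grep -v '^>' "$1" | grep -q '[A-Za-z]'
}

# Write a placeholder record (arbitrary sequence, for illustration only).
printf '>example_protein\nMKTAYIAKQRQISFVKSHFSRQ\n' > example.fa

if is_fasta example.fa; then
    echo "example.fa looks like FASTA; next: pixi run predict -i example.fa"
else
    echo "example.fa does not look like a FASTA file" >&2
fi
```

The check is deliberately loose: it only guards against passing an empty or obviously wrong file, and it does not validate the amino acid alphabet.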
Dissecting the functions and regulatory mechanisms of intracellular phase separation is fundamental to understanding transcriptional control, cell fate transition, and disease development. However, the driving residues, which impact phase separation the most and are therefore key to the functional study of protein phase separation, remain largely unknown. We developed PSPHunter, a machine learning method for predicting driving residues in phase-separating proteins. Validation through in vivo and in vitro methods, including FRAP and saturation measurements, confirms PSPHunter's accuracy. Applying PSPHunter, we demonstrate that truncating just 6 driving residues in SOX2 and GATA3 significantly disrupts their phase separation properties. Furthermore, PSPHunter identified nearly 80% of the phase-separating proteins associated with diseases. Remarkably, frequently mutated pathological residues (glycine and proline) tend to localize within driving residues, exerting a significant influence on phase separation. PSPHunter thus emerges as a crucial tool to uncover driving residues, facilitating insights into the phase separation mechanisms governing transcriptional control, cell fate transitions, and disease development.
--------------------------
- psiblast for PSSM, https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
- HH-suite for remote homology detection, https://github.com/soedinglab/hh-suite
- SPINE-D-local for intrinsically disordered region (IDR) detection, http://zhouyq-lab.szbl.ac.cn/download/
- SNBRFinder for DNA and RNA binding prediction, http://ibi.hzau.edu.cn/SNBRFinder
- GPS 5.0 for PTM prediction, https://gps.biocuckoo.cn/
- HH-suite for HMM (Steinegger M. et al. BMC Bioinformatics, 2019)
- scikit-learn (Pedregosa, F. et al. Journal of Machine Learning Research, 2011)
- Phase separation proteins used to construct PSPHunter are in the ./datasets folder.
- Trained models (the sequence-based model, the word2vec-based model, and the merged model) are stored in the ./train/ directory.
- Code for generating all features is located in scripts/featureExtraction, encompassing both sequence and functional features. The merged output can be used for model training.
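Before running feature extraction, it can help to confirm that the external tools listed above are installed. The following is a minimal sketch: `missing_tools` is a hypothetical helper, and the executable names are assumptions (`psiblast` ships with BLAST+ and `hhblits` with HH-suite; the executables that the feature-extraction scripts actually call may differ).

```shell
# Hypothetical helper: print each named executable that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed executable names; adjust to whatever the scripts invoke.
missing=$(missing_tools psiblast hhblits perl)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "psiblast, hhblits and perl are available"
fi
```

SPINE-D-local, SNBRFinder, and GPS 5.0 are left out of the check because their executable names are not given here.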
We demonstrate the usage of PSPHunter with its word2vec sub-model (the complete model is stored in the 'trained model' folder).
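Each example below starts from the repository root.
Predict the phase-separation probability of a protein:
cd Test
perl ../scripts/Standalone/predict_proteinProb.pl -i seq.fasta
Predict the driving regions:
cd Test
perl ../scripts/Standalone/predict_DrivingRegion.pl -i seq.fasta -o outfile
Predict the effect of mutations:
cd Test
perl ../scripts/Standalone/predict_MutationEffect.pl -i seq.fasta -o outfile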
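To process several sequences, the standalone scripts can be wrapped in a loop. The sketch below assumes one protein per `.fasta` file in a directory of your choosing and uses only the `-i` and `-o` options shown above. The `run_batch` function and the `.driving.out` output naming are hypothetical, and whether the script accepts input paths outside the current directory has not been verified here. By default it prints the commands instead of executing them.

```shell
# Hypothetical batch wrapper around predict_DrivingRegion.pl.
# Usage: run_batch DIR [run]   (without "run", commands are only printed)
run_batch() {
    dir=$1
    mode=${2:-dry}
    for fa in "$dir"/*.fasta; do
        [ -e "$fa" ] || continue           # skip when the glob matched nothing
        out="${fa%.fasta}.driving.out"     # hypothetical output file name
        if [ "$mode" = "run" ]; then
            perl ../scripts/Standalone/predict_DrivingRegion.pl -i "$fa" -o "$out"
        else
            echo "perl ../scripts/Standalone/predict_DrivingRegion.pl -i $fa -o $out"
        fi
    done
}
```

Called from the Test directory as `run_batch inputs`, it lists the commands for every `.fasta` file in a hypothetical `inputs` directory; `run_batch inputs run` executes them. Writing one output file per input avoids the overwriting that would occur if every run used the same `outfile` name.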
We have developed a user-friendly website, accessible at http://psphunter.stemcellding.org/, to facilitate the use of PSPHunter. The platform predicts phase-separating proteins and their driving regions using only protein sequences as input. It also assesses the impact of mutations on phase separation, so users can promptly identify mutations that disrupt normal phase separation.
Please cite our paper:
Sun, J., Qu, J., Zhao, C. et al. Precise prediction of phase-separation key residues by machine learning. Nature Communications 15, 2662 (2024). https://doi.org/10.1038/s41467-024-46901-9 (IF: 16.6)
Please contact [email protected] or raise an issue in the GitHub repo with any questions about installation or usage.