Architecture of the WD3QNE algorithm: the real-time treatment process of the WD3QNE agent. Patients' vital signs and drug levels are monitored and recorded in real time and used to construct a continuous state space and a discrete action space. The agent makes decisions based on states, actions, and clinicians' experience, and patients are treated dynamically.
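The sketch below illustrates this decision loop in a minimal, hedged form: a dueling Q-network maps a continuous patient state to Q-values over discrete treatment actions, and the clinician's chosen action is blended in as a simple expertise bonus. It is not the repository's implementation; the state dimension, the 5x5 fluid/vasopressor action grid, and the `expert_weight` blending rule are assumptions used only for illustration.

```python
# Minimal sketch of a WD3QNE-style decision step (NOT the authors' implementation).
# Assumptions for illustration: STATE_DIM, N_ACTIONS, and the expertise bonus rule.
import torch
import torch.nn as nn

STATE_DIM = 37   # assumed number of vital-sign / lab features per time step
N_ACTIONS = 25   # assumed 5x5 grid of discretized IV-fluid and vasopressor doses

class DuelingQNetwork(nn.Module):
    """Dueling Q-network: separate state-value and advantage streams."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v = self.value(h)
        a = self.advantage(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=-1, keepdim=True)

def select_action(q_net: DuelingQNetwork, state: torch.Tensor,
                  clinician_action: int, expert_weight: float = 0.5) -> int:
    """Pick a discrete treatment action, giving a hypothetical bonus to the
    clinician's chosen action as a stand-in for the paper's human-expertise term."""
    with torch.no_grad():
        q = q_net(state)
    scores = q.clone()
    scores[clinician_action] += expert_weight * q.max()  # hypothetical expertise bonus
    return int(scores.argmax().item())

if __name__ == "__main__":
    net = DuelingQNetwork(STATE_DIM, N_ACTIONS)
    s = torch.randn(STATE_DIM)  # one patient state observed in real time
    a = select_action(net, s, clinician_action=12)
    print("recommended discrete action index:", a)
```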
If you use code or concepts available in this repository, we would be grateful if you would cite the paper:
GB/T 7714: Wu X D, Li R C, He Z, et al. A value-based deep reinforcement learning model with human expertise in optimal treatment of sepsis[J]. npj Digital Medicine, 2023, 6(1): 15.
MLA: Wu, XiaoDan, et al. "A value-based deep reinforcement learning model with human expertise in optimal treatment of sepsis." npj Digital Medicine 6.1 (2023): 15.
APA: Wu, X., Li, R., He, Z., Yu, T., & Cheng, C. (2023). A value-based deep reinforcement learning model with human expertise in optimal treatment of sepsis. npj Digital Medicine, 6(1), 15.
@article{wu2023value,
  title={A value-based deep reinforcement learning model with human expertise in optimal treatment of sepsis},
  author={Wu, XiaoDan and Li, RuiChang and He, Zhen and Yu, TianZhi and Cheng, ChangQing},
  journal={npj Digital Medicine},
  volume={6},
  number={1},
  pages={15},
  year={2023},
  publisher={Nature Publishing Group UK London}
}
The open-source MIMIC-III data used in this study can be retrieved from https://physionet.org/content/mimiciii/1.4/. The open-source eICU Collaborative Research Database (eICU-CRD) is available at http://eicu-crd.mit.edu/.