
The official repository for the CBM paper "Deep Reinforcement Learning Enables Better Bias Control in Benchmark for Virtual Screening".


taoshen99/MUBDsyn


MUBD-DecoyMaker 3.0: Making Maximal Unbiased Benchmarking Data Sets with Deep Reinforcement Learning

Introduction

MUBD-DecoyMaker 3.0 is a new computational tool for making Maximal Unbiased Benchmarking Data Sets (MUBD) for in silico screening. Compared with the two earlier versions, i.e. MUBD-DECOYMAKER (the Pipeline Pilot-based version, or MUBD-DecoyMaker 1.0) and MUBD-DecoyMaker 2.0, MUBD-DecoyMaker 3.0 has two noteworthy features:

  1. Virtual molecules generated by a recurrent neural network (RNN)-based molecular generator with reinforcement learning (RL), instead of chemical library molecules, constitute the unbiased decoy set (UDS) component of MUBD.

  2. The criteria (or rules) for an ideal decoy defined in the earlier versions are integrated into a new scoring function that RL uses to fine-tune the generator.

The sections below describe how to set up and run MUBD-DecoyMaker 3.0.

Figure from manuscript

Requirements

REINVENT is used to generate the virtual decoys of MUBD 3.0, so users must first install the conda environment reinvent.v3.2. Note that this repository contains modified versions of the reinvent_chemistry and reinvent_scoring packages, which include the scoring functions specific to MUBD:

  1. Clone this repository and navigate to it (i.e. change into the cloned directory).
  2. Merge the modifications into the original reinvent_chemistry and reinvent_scoring packages:
$ cp -r reinvent_chemistry/ reinvent_scoring/ ~/anaconda3/envs/reinvent.v3.2/lib/python3.7/site-packages

  3. Create a conda environment called MUBD3.0 (for preprocessing and postprocessing):

$ conda env create -f MUBD3.0.yml
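
The `cp` destination in step 2 hard-codes `~/anaconda3/envs/reinvent.v3.2/lib/python3.7/site-packages`, which only holds when Anaconda is installed under the home directory. The following is a minimal sketch, assuming it is run with the interpreter of the activated reinvent.v3.2 environment, that prints the site-packages directory actually in use so the copy target can be checked. The helper name `site_packages_dir` is hypothetical and not part of this repository.

```python
import sysconfig
from pathlib import Path


def site_packages_dir() -> Path:
    """Return the pure-Python site-packages directory of the running interpreter."""
    return Path(sysconfig.get_paths()["purelib"])


if __name__ == "__main__":
    # Run inside the activated reinvent.v3.2 environment and compare the
    # printed path with the destination used in the cp command of step 2.
    print(site_packages_dir())
```

If the printed path differs from the one in the `cp` command, use the printed path as the destination instead.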

Usage

ACM agonists are used as a test case to demonstrate how to build the MUBD-ACM-AGO data set with MUBD-DecoyMaker 3.0. All the test files are included in the resources directory.

Get unbiased ligand set (ULS)

Run get_ligands.py to process the raw ligand set. This script takes the raw ligands as SMILES (raw_actives.smi) as input and outputs the unbiased ligand set Diverse_ligands.csv. Four additional property profiles are also written: Diverse_ligands_PS.csv, Diverse_ligands_PS_maxmin.csv, Diverse_ligands_sims_maxmin.txt and Diverse_ligands_len.txt. Please use the --cure option to preprocess the SMILES if no curation has been performed beforehand. (Open questions: 1. Is "cure" a typo for "cura", or is "cure" used deliberately so that users understand the option as "curing" unprepared molecules? 2. The contents of the curation should be listed here: what does it include?)

$ conda activate MUBD3.0
(MUBD3.0) $ python get_ligands.py
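
Before running get_ligands.py it can help to look at what the input file contains. The sketch below is not part of this repository; it assumes the conventional `.smi` layout of one SMILES per line, optionally followed by whitespace and a molecule name, which may not match raw_actives.smi exactly. The function names `read_smi` and `duplicate_smiles` are hypothetical. Duplicates are detected by exact string comparison only, so two different SMILES strings for the same molecule are not caught.

```python
from collections import Counter


def read_smi(text):
    """Parse .smi text into (smiles, name) pairs; name is '' when absent."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # split on the first run of whitespace
        records.append((parts[0], parts[1] if len(parts) > 1 else ""))
    return records


def duplicate_smiles(records):
    """Return SMILES strings that occur more than once (exact string match)."""
    counts = Counter(smiles for smiles, _ in records)
    return sorted(s for s, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Assumed location: adjust the path to wherever raw_actives.smi lives.
    with open("raw_actives.smi") as handle:
        ligands = read_smi(handle.read())
    print(len(ligands), "records;", len(duplicate_smiles(ligands)), "duplicated SMILES strings")
```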

Generate virtual decoys

mk_config.py writes out the configuration for MUBD3.0 virtual decoy generation. To set up the configuration for each ligand automatically and then proceed to the next ligand, we provide gen_decoys.sh. Please replace the placeholders </path/to/REINVENT> and </path/to/MUBD3.0> in the scripts with your own directories.

$ mkdir output
$ chmod +x ./gen_decoys.sh
$ conda activate reinvent.v3.2
(reinvent.v3.2) $ ./gen_decoys.sh
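
The placeholder replacement described above can be done by hand in an editor. As an illustration only, the sketch below shows the same substitution in Python and a check that no placeholder is left behind. The function names and the example paths `/opt/REINVENT` and `/home/user/MUBD3.0` are hypothetical, and the sample script lines are made up for the demonstration; they are not copied from gen_decoys.sh or mk_config.py.

```python
PLACEHOLDERS = ("</path/to/REINVENT>", "</path/to/MUBD3.0>")


def fill_placeholders(text, reinvent_dir, mubd_dir):
    """Substitute the two placeholder paths in the text of a script."""
    return (text.replace("</path/to/REINVENT>", reinvent_dir)
                .replace("</path/to/MUBD3.0>", mubd_dir))


def remaining_placeholders(text):
    """List any placeholders that are still present in the text."""
    return [p for p in PLACEHOLDERS if p in text]


if __name__ == "__main__":
    # Made-up script content, used only to demonstrate the substitution.
    sample = "cd </path/to/REINVENT>\nOUT=</path/to/MUBD3.0>/output\n"
    filled = fill_placeholders(sample, "/opt/REINVENT", "/home/user/MUBD3.0")
    print(filled)
    print("left over:", remaining_placeholders(filled))
```

Running `remaining_placeholders` on the text of each edited script before launching gen_decoys.sh is a cheap way to catch a path that was missed.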

Get unbiased decoy set (UDS)

After decoy generation, the potential decoy set for each ligand_$idx is stored in output/ligand_$idx/results/scaffold_memory.csv. Decoy refinement, which includes SMILES curation and molecular clustering, is then performed to obtain the unbiased decoy set Final_decoys.csv. We provide process_decoys.sh to run agglomerative_clustering.py and pool_decoys.py automatically.

$ chmod +x ./process_decoys.sh
$ conda activate MUBD3.0
(MUBD3.0) $ ./process_decoys.sh
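
Since refinement depends on one scaffold_memory.csv per ligand, it may be worth confirming that every generation run produced its file before starting process_decoys.sh. The sketch below is an assumption-laden helper, not part of this repository: it relies only on the output/ligand_$idx/results/scaffold_memory.csv layout described above, and the function name `find_decoy_sets` is hypothetical.

```python
from pathlib import Path


def find_decoy_sets(output_dir):
    """Map each ligand_* directory name to its scaffold_memory.csv path, or None if missing."""
    root = Path(output_dir)
    if not root.is_dir():
        return {}
    found = {}
    # Names are sorted as strings, so ligand_10 comes before ligand_2.
    for ligand_dir in sorted(root.glob("ligand_*")):
        if ligand_dir.is_dir():
            csv_path = ligand_dir / "results" / "scaffold_memory.csv"
            found[ligand_dir.name] = csv_path if csv_path.is_file() else None
    return found


if __name__ == "__main__":
    for name, path in find_decoy_sets("output").items():
        print(name, "ok" if path is not None else "MISSING scaffold_memory.csv")
```

A ligand reported as missing usually means its generation run did not finish, so it should be rerun before the decoys are pooled.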

Validation

Basic validation is conducted based on four metrics. Please go through the notebook basic_validation.ipynb for more details.

$ conda activate MUBD3.0
(MUBD3.0) $ jupyter notebook
