To install EvoMol, run the following commands in your terminal.
$ git clone https://github.com/jules-leguy/EvoMol.git # Clone EvoMol
$ cd EvoMol # Move into EvoMol directory
$ conda env create -f evomol_env.yml # Create conda environment
$ conda activate evomolenv # Activate environment
$ python -m pip install . # Install EvoMol
Launching a QED optimization for 500 steps. Note that the evomolenv conda environment must be activated whenever you use EvoMol.
from evomol import run_model
run_model({
    "obj_function": "qed",
    "optimization_parameters": {
        "max_steps": 500
    },
    "io_parameters": {
        "model_path": "examples/1_qed"
    },
})
To run a model, you need to pass a dictionary describing the run to the run_model function. This dictionary can have up to 4 entries that are described in this section.
Default values are given in parentheses.
The "obj_function" attribute can take the following values.
- Implemented functions (see article): "qed", "plogp", "norm_plogp", "sascore", "norm_sascore", "clscore", "homo", "lumo".
- A custom function evaluating a SMILES.
- A dictionary describing a multiobjective function containing the following entries.
  - "type": "linear_combination" (linear combination of the properties) or "product_sigm_lin" (product of the properties after passing a linear function and a sigmoid function).
  - "functions": list of functions (string keys describing implemented functions or custom functions).
  - Specific to the linear combination.
    - "coef": list of coefficients.
  - Specific to the product of sigmoid/linear functions.
    - "a": list of a coefficients for the ax+b linear function definition.
    - "b": list of b coefficients for the ax+b linear function definition.
    - "lambda": list of λ coefficients for the sigmoid function definition.
- "guacamol" for taking the goal directed GuacaMol benchmarks.
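As a minimal sketch of the custom-function and multiobjective options, the following builds a linear combination of the implemented "qed" function and a toy custom function. It assumes that a custom function receives a SMILES string and returns a number; the function nitrogen_oxygen_count, the coefficient values and the model path are illustrative and not part of EvoMol.

```python
def nitrogen_oxygen_count(smiles: str) -> float:
    """Toy custom objective (hypothetical): count N and O atom symbols in a SMILES.

    Assumes EvoMol calls custom objectives with a SMILES string and
    expects a numeric score back.
    """
    return float(sum(ch in "NOno" for ch in smiles))


# Linear combination: 1.0 * QED - 0.1 * (number of N/O atoms).
# The coefficient values are illustrative only.
linear_objective = {
    "type": "linear_combination",
    "functions": ["qed", nitrogen_oxygen_count],
    "coef": [1.0, -0.1],
}

config = {
    "obj_function": linear_objective,
    "optimization_parameters": {"max_steps": 500},
    # Hypothetical output folder
    "io_parameters": {"model_path": "examples/custom_linear_combination"},
}

# To launch the run inside the evomolenv environment:
# from evomol import run_model
# run_model(config)
```

The "functions" and "coef" lists are kept at the same length so that each function has its own coefficient.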
The "action_space_parameters" attribute can be set with a dictionary containing the following entries.
- "atoms": text list of available heavy atoms ("C,N,O,F,P,S,Cl,Br").
- "max_heavy_atoms": maximum molecular size in terms of number of heavy atoms (38).
- "substitution": whether to use substitute atom type action (True).
- "cut_insert": whether to use cut atom and insert carbon atom actions (True).
- "move_group": whether to use move group action (True).
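As an illustration of these entries, the sketch below restricts the search to smaller molecules made of C, N and O and disables the move group action. The chosen values and the model path are arbitrary examples, not recommendations from the EvoMol authors.

```python
# Restrict the action space: fewer atom types, smaller molecules, no move group action.
action_space_parameters = {
    "atoms": "C,N,O",        # comma-separated text list, same format as the default
    "max_heavy_atoms": 20,   # default is 38
    "substitution": True,
    "cut_insert": True,
    "move_group": False,
}

config = {
    "obj_function": "qed",
    "action_space_parameters": action_space_parameters,
    # Hypothetical output folder
    "io_parameters": {"model_path": "examples/small_cno_molecules"},
}

# To launch the run inside the evomolenv environment:
# from evomol import run_model
# run_model(config)
```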
The "optimization_parameters" attribute can be set with a dictionary containing the following entries.
- "pop_max_size": maximum population size (1000).
- "max_steps": number of steps to be run (1500).
- "k_to_replace": number of individuals replaced at each step (2).
- "problem_type": whether it is a maximization ("max") or minimization ("min") problem.
- "mutation_max_depth": maximum number of successive actions on the molecular graph during a single mutation (2).
- "mutation_find_improver_tries": maximum number of mutations to find an improver (50).
- "guacamol_init_top_100": whether to initialize the population with the 100 best scoring individuals of the GuacaMol ChEMBL subset in case of taking the GuacaMol benchmarks (True). The list of SMILES must be given as initial population.
- "mutable_init_pop": if True, the individuals of the initial population can be freely mutated. If False, they can be branched but their atoms and bonds cannot be modified (True).
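For example, a minimization run could be described as follows. This is a sketch only: the choice of "sascore" as the function to minimize, the parameter values and the model path are assumptions made for illustration.

```python
# Minimization run with a smaller population and deeper mutations.
optimization_parameters = {
    "problem_type": "min",        # minimize the objective instead of maximizing it
    "pop_max_size": 500,          # default is 1000
    "max_steps": 200,             # default is 1500
    "k_to_replace": 5,            # default is 2
    "mutation_max_depth": 3,      # default is 2
    "mutation_find_improver_tries": 50,
}

config = {
    "obj_function": "sascore",
    "optimization_parameters": optimization_parameters,
    # Hypothetical output folder
    "io_parameters": {"model_path": "examples/sascore_minimization"},
}

# To launch the run inside the evomolenv environment:
# from evomol import run_model
# run_model(config)
```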
The "io_parameters" attribute can be set with a dictionary containing the following entries.
- "model_path": path where to save model's output data ("EvoMol_model").
- "smiles_list_init_path": path where to find the SMILES list describing the initial population (None: initialization of the population with a single methane molecule).
- "record_history": whether to save exploration tree data. Must be set to True to further draw the exploration tree (False).
- "save_n_steps": frequency (steps) of saving the data (100).
- "print_n_steps": frequency (steps) of printing current population statistics (1).
- "dft_working_dir": path where to save DFT optimization related files ("/tmp").
- "dft_cache_files": list of json files containing a cache of previously computed HOMO or LUMO values ([]).
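The sketch below writes a small initial population file and points "smiles_list_init_path" to it. It assumes that the file contains one SMILES per line, which is not stated in this document; the helper write_initial_population and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def write_initial_population(smiles_list, path):
    """Write one SMILES per line (assumed file format) and return the file path."""
    path = Path(path)
    path.write_text("\n".join(smiles_list) + "\n")
    return path


# Temporary working folder used only for this illustration
work_dir = Path(tempfile.mkdtemp())
init_path = write_initial_population(
    ["O=C(C)Oc1ccccc1C(=O)O", "c1ccccc1"],
    work_dir / "init_pop.smi",
)

io_parameters = {
    "model_path": str(work_dir / "model"),
    "smiles_list_init_path": str(init_path),
    "record_history": True,   # needed to draw the exploration tree afterwards
    "save_n_steps": 50,       # default is 100
    "print_n_steps": 10,      # default is 1
}

# To launch the run inside the evomolenv environment:
# from evomol import run_model
# run_model({"obj_function": "qed", "io_parameters": io_parameters})
```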
Performing a QED optimization run of 500 steps, while recording the exploration data.
from evomol import run_model
model_path = "examples/2_large_tree"
run_model({
    "obj_function": "qed",
    "optimization_parameters": {
        "max_steps": 500
    },
    "io_parameters": {
        "model_path": model_path,
        "record_history": True
    }
})
Plotting the exploration tree with solutions colored according to their score. Nodes represent solutions. Edges represent mutations that lead to an improvement in the population.
from evomol.plot_exploration import exploration_graph
exploration_graph(model_path=model_path, layout="neato")
Performing the experiment of mutating a fixed core of acetylsalicylic acid to increase its QED value.
from evomol import run_model
model_path = "examples/3_detailed_tree"
run_model({
    "obj_function": "qed",
    "optimization_parameters": {
        "max_steps": 10,
        "pop_max_size": 10,
        "k_to_replace": 2,
        "mutable_init_pop": False
    },
    "io_parameters": {
        "model_path": model_path,
        "record_history": True,
        "smiles_list_init_path": "examples/acetylsalicylic_acid.smi"
    }
})
Plotting the exploration tree including molecular drawings, scores and action types performed during mutations. Also plotting a table of molecular drawings.
from evomol.plot_exploration import exploration_graph
exploration_graph(model_path=model_path, layout="dot", draw_actions=True, plot_images=True, draw_scores=True,
                  root_node="O=C(C)Oc1ccccc1C(=O)O", legend_scores_keys_strat=["total"], mol_size=0.3,
                  legend_offset=(-0.007, -0.05), figsize=(20, 20/1.5), legends_font_size=13)
As the CLscore depends on prior data to be computed, EvoMol needs to be given the data location.
To do so, the $SHINGLE_LIBS environment variable must be set to the location of the shingle_libs folder that can be downloaded here.
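If you prefer to set the variable from Python rather than from the shell, a possible sketch is given below. The helper set_shingle_libs is hypothetical, and it is an assumption that setting the variable before importing EvoMol is sufficient for the current process.

```python
import os
from pathlib import Path


def set_shingle_libs(folder):
    """Point $SHINGLE_LIBS at an existing shingle_libs folder (hypothetical helper)."""
    folder = Path(folder).expanduser().resolve()
    if not folder.is_dir():
        raise FileNotFoundError(f"shingle_libs folder not found: {folder}")
    os.environ["SHINGLE_LIBS"] = str(folder)
    return folder


# Example with a placeholder location; call this before importing evomol:
# set_shingle_libs("~/data/shingle_libs")
```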
To perform DFT and Molecular Mechanics computation (necessary for HOMO and LUMO optimization), you need to bind Gaussian09 and OpenBabel with EvoMol.
To do so, the $OPT_LIBS variable must point to a folder containing:
- run.sh: a script launching a DFT optimization with Gaussian09 of the input filepath given as parameter.
- obabel/openbabel-2.4.1: directory containing an installation of OpenBabel 2.4.1. Make sure to also set OpenBabel's $BABEL_DATADIR environment variable to $OPT_LIBS/obabel/openbabel-2.4.1/data.
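The expected layout can be verified before starting a HOMO or LUMO optimization. The following is a sketch of such a check; check_opt_libs is a hypothetical helper that is not part of EvoMol and only tests the paths and variables named above.

```python
import os
from pathlib import Path


def check_opt_libs(env=None):
    """Return a list of problems found in the $OPT_LIBS setup (empty list if none)."""
    env = os.environ if env is None else env
    opt_libs = env.get("OPT_LIBS")
    if not opt_libs:
        return ["OPT_LIBS is not set"]

    problems = []
    root = Path(opt_libs)

    # Script launching the Gaussian09 DFT optimization
    if not (root / "run.sh").is_file():
        problems.append(f"missing script: {root / 'run.sh'}")

    # OpenBabel 2.4.1 installation folder
    obabel_dir = root / "obabel" / "openbabel-2.4.1"
    if not obabel_dir.is_dir():
        problems.append(f"missing directory: {obabel_dir}")

    # BABEL_DATADIR must point to the data folder of that installation
    expected_datadir = obabel_dir / "data"
    babel_datadir = env.get("BABEL_DATADIR")
    if babel_datadir is None or Path(babel_datadir) != expected_datadir:
        problems.append(f"BABEL_DATADIR should be {expected_datadir}")

    return problems


if __name__ == "__main__":
    for problem in check_opt_libs():
        print(problem)
```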
To use EvoMol for GuacaMol goal directed benchmarks optimization using the best scoring molecules from their subset of ChEMBL as initial population, you need to:
- Download the ChEMBL subset.
- Give the path of the data using the "smiles_list_init_path" attribute.
- Ensure that the "guacamol_init_top_100" attribute is set to True.
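Put together, these steps could lead to a run description such as the one sketched below. The ChEMBL subset path and the model path are placeholders to be replaced with your own locations, and any further benchmark selection is not covered here.

```python
# Placeholder path: replace with the location of the downloaded ChEMBL subset.
chembl_subset_path = "path/to/chembl_subset.smiles"

config = {
    "obj_function": "guacamol",
    "optimization_parameters": {
        # Start from the 100 best scoring molecules of the ChEMBL subset
        "guacamol_init_top_100": True,
    },
    "io_parameters": {
        "model_path": "examples/guacamol_run",   # hypothetical output folder
        "smiles_list_init_path": chembl_subset_path,
    },
}

# To launch the run inside the evomolenv environment:
# from evomol import run_model
# run_model(config)
```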