Evolutionary algorithm for molecular properties optimization

jyryu3161/EvoMol

EvoMol

Install

To install EvoMol, run the following commands in your terminal.

$ git clone https://github.com/jules-leguy/EvoMol.git     # Clone EvoMol
$ cd EvoMol                                               # Move into EvoMol directory
$ conda env create -f evomol_env.yml                      # Create conda environment
$ conda activate evomolenv                                # Activate environment
$ python -m pip install .                                 # Install EvoMol

Quickstart

The following launches a QED optimization for 500 steps. The evomolenv conda environment must be active whenever you use EvoMol.

from evomol import run_model
run_model({
    "obj_function": "qed",
    "optimization_parameters": {
        "max_steps": 500
    },
    "io_parameters": {
        "model_path": "examples/1_qed"
    },
})

Model parametrization

To run a model, you need to pass a dictionary describing the run to the run_model function. This dictionary can have up to 4 entries that are described in this section.

Default values are given in parentheses.

Objective function

The "obj_function" attribute can take the following values.

  • Implemented functions (see article) : "qed", "plogp", "norm_plogp", "sascore", "norm_sascore", "clscore", "homo", "lumo".
  • A custom function evaluating a SMILES.
  • A dictionary describing a multiobjective function containing the following entries.
    • "type" : "linear_combination" (linear combination of the properties) or "product_sigm_lin" (product of the properties after passing a linear function and a sigmoid function).
    • "functions" : list of functions (string keys describing implemented functions or custom functions).
    • Specific to the linear combination.
      • "coef" : list of coefficients.
    • Specific to the product of sigmoid/linear functions
      • "a" list of a coefficients for the ax+b linear function definition.
      • "b" list of b coefficients for the ax+b linear function definition.
      • "lambda" list of λ coefficients for the sigmoid function definition.
  • "guacamol" to run the goal-directed GuacaMol benchmarks.
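
As an illustration, the sketch below builds two multiobjective descriptions and shows, in plain Python, how the two aggregation types could combine property values. The helper names (nitrogen_count, linear_combination, product_sigm_lin), the coefficient values, and the logistic form of the sigmoid are assumptions made for this example; the description above does not spell out the exact formula EvoMol uses.

```python
import math


def nitrogen_count(smiles):
    """Toy custom objective (hypothetical): counts N/n characters in a SMILES string."""
    return smiles.count("N") + smiles.count("n")


# Linear combination of two implemented properties (coefficients are illustrative).
linear_objective = {
    "type": "linear_combination",
    "functions": ["qed", "norm_sascore"],
    "coef": [1.0, 0.5],
}

# Product of sigmoid/linear transforms, mixing an implemented and a custom function.
sigm_objective = {
    "type": "product_sigm_lin",
    "functions": ["qed", nitrogen_count],
    "a": [1.0, 1.0],
    "b": [0.0, -2.0],
    "lambda": [10.0, 1.0],
}


def linear_combination(values, coef):
    """Weighted sum of the property values."""
    return sum(c * v for c, v in zip(coef, values))


def product_sigm_lin(values, a, b, lam):
    """Product over properties of sigmoid(a*x + b).

    Assumption: the sigmoid is the logistic 1 / (1 + exp(-lambda * z)).
    """
    result = 1.0
    for x, a_i, b_i, lam_i in zip(values, a, b, lam):
        z = a_i * x + b_i
        result *= 1.0 / (1.0 + math.exp(-lam_i * z))
    return result
```

Either dictionary would be passed as the "obj_function" entry of the dictionary given to run_model.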

Search space

The "action_space_parameters" attribute can be set with a dictionary containing the following entries.

  • "atoms" : text list of available heavy atoms ("C,N,O,F,P,S,Cl,Br").
  • "max_heavy_atoms": maximum molecular size in terms of number of heavy atoms (38).
  • "substitution": whether to use substitute atom type action (True).
  • "cut_insert": whether to use cut atom and insert carbon atom actions (True).
  • "move_group": whether to use move group action (True).
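
For instance, a restricted search space could be described as in the following sketch. The chosen atom set and size limit are illustrative values, not recommendations; only the keys come from the list above.

```python
# Atoms are given as a single comma-separated string, as in the default value.
allowed_atoms = ["C", "N", "O"]

action_space_parameters = {
    "atoms": ",".join(allowed_atoms),  # "C,N,O"
    "max_heavy_atoms": 20,             # smaller than the default of 38
    "substitution": True,
    "cut_insert": True,
    "move_group": False,               # disable the move group action
}
```

This dictionary would be passed as the "action_space_parameters" entry of the dictionary given to run_model.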

Optimization parameters

The "optimization_parameters" attribute can be set with a dictionary containing the following entries.

  • "pop_max_size" : maximum population size (1000).
  • "max_steps" : number of steps to be run (1500).
  • "k_to_replace" : number of individuals replaced at each step (2).
  • "problem_type" : whether it is a maximization ("max") or minimization ("min") problem.
  • "mutation_max_depth" : maximum number of successive actions on the molecular graph during a single mutation (2).
  • "mutation_find_improver_tries" : maximum number of mutations to find an improver (50).
  • "guacamol_init_top_100" : whether to initialize the population with the 100 best-scoring individuals of the GuacaMol ChEMBL subset when running the GuacaMol benchmarks (True). In that case, the list of SMILES must be given as the initial population.
  • "mutable_init_pop" : if True, the individuals of the initial population can be freely mutated. If False, they can be branched but their atoms and bonds cannot be modified (True).
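
As a sketch, the parameters for a short minimization run could look as follows. The values are illustrative, and the unknown_keys helper is a hypothetical convenience for catching misspelled keys; it is not part of EvoMol.

```python
# Keys documented in the list above.
DOCUMENTED_KEYS = {
    "pop_max_size",
    "max_steps",
    "k_to_replace",
    "problem_type",
    "mutation_max_depth",
    "mutation_find_improver_tries",
    "guacamol_init_top_100",
    "mutable_init_pop",
}

optimization_parameters = {
    "problem_type": "min",   # minimize the objective instead of maximizing it
    "max_steps": 200,
    "pop_max_size": 100,
    "k_to_replace": 2,
    "mutation_max_depth": 2,
}


def unknown_keys(params):
    """Return the keys of params that are not in the documented list."""
    return sorted(set(params) - DOCUMENTED_KEYS)


# An empty list means every key matches a documented parameter name.
print(unknown_keys(optimization_parameters))
```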

Input/Output parameters

The "io_parameters" attribute can be set with a dictionary containing the following entries.

  • "model_path" : path where to save model's output data ("EvoMol_model").
  • "smiles_list_init_path" : path where to find the SMILES list describing the initial population (None: initialization of the population with a single methane molecule).
  • "record_history" : whether to save exploration tree data. Must be set to True to further draw the exploration tree (False).
  • "save_n_steps" : frequency (steps) of saving the data (100).
  • "print_n_steps" : frequency (steps) of printing current population statistics (1).
  • "dft_working_dir" : path where to save DFT optimization related files ("/tmp").
  • "dft_cache_files" : list of json files containing a cache of previously computed HOMO or LUMO values ([]).
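
The sketch below prepares an initial population file and the matching "io_parameters" dictionary. It assumes the SMILES list file contains one SMILES per line, which the description above does not state explicitly; the directory and file names are hypothetical.

```python
import os
import tempfile

# Initial population: acetylsalicylic acid and benzene.
initial_smiles = ["O=C(C)Oc1ccccc1C(=O)O", "c1ccccc1"]

# Temporary working directory used only for this example.
work_dir = tempfile.mkdtemp(prefix="evomol_example_")
smiles_path = os.path.join(work_dir, "initial_population.smi")

# Assumption: one SMILES per line.
with open(smiles_path, "w") as handle:
    handle.write("\n".join(initial_smiles) + "\n")

io_parameters = {
    "model_path": os.path.join(work_dir, "model"),
    "smiles_list_init_path": smiles_path,
    "record_history": True,   # needed to draw the exploration tree later
    "save_n_steps": 50,
    "print_n_steps": 10,
}
```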

Drawing exploration trees

Large exploration tree

Performing a QED optimization run of 500 steps, while recording the exploration data.

from evomol import run_model

model_path = "examples/2_large_tree"

run_model({
    "obj_function": "qed",
    "optimization_parameters": {
        "max_steps": 500
    },
    "io_parameters": {
        "model_path": model_path,
        "record_history": True
    }
})

Plotting the exploration tree with solutions colored according to their score. Nodes represent solutions. Edges represent mutations that lead to an improvement in the population.

from evomol.plot_exploration import exploration_graph
exploration_graph(model_path=model_path, layout="neato")

[Figure: large exploration tree]

Detailed exploration tree

Performing the experiment of mutating a fixed core of acetylsalicylic acid to increase its QED value.

from evomol import run_model

model_path = "examples/3_detailed_tree"

run_model({
    "obj_function": "qed",
    "optimization_parameters": {
        "max_steps": 10,
        "pop_max_size": 10,
        "k_to_replace": 2,
        "mutable_init_pop": False
    },
    "io_parameters": {
        "model_path": model_path,
        "record_history": True,
        "smiles_list_init_path": "examples/acetylsalicylic_acid.smi"
    }
})

Plotting the exploration tree including molecular drawings, scores and action types performed during mutations. Also plotting a table of molecular drawings.

from evomol.plot_exploration import exploration_graph

exploration_graph(model_path=model_path, layout="dot", draw_actions=True, plot_images=True, draw_scores=True,
                  root_node="O=C(C)Oc1ccccc1C(=O)O", legend_scores_keys_strat=["total"], mol_size=0.3,
                  legend_offset=(-0.007, -0.05), figsize=(20, 20/1.5), legends_font_size=13)

[Figure: detailed exploration tree]

[Figure: detailed molecular drawings table]

Environment variables and data requirements

CLscore

Because computing the CLscore depends on prior data, EvoMol needs to be told where that data is located. To do so, the $SHINGLE_LIBS environment variable must be set to the location of the shingle_libs folder that can be downloaded here.
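
The variable is normally exported in the shell before launching Python. As an alternative sketch, it can be set from within a script, assuming EvoMol reads it from the process environment when the CLscore is computed. The folder location below is hypothetical.

```python
import os

# Hypothetical location of the downloaded shingle_libs folder.
shingle_libs_dir = os.path.expanduser("~/data/shingle_libs")

# Make the location visible to code running in this process.
os.environ["SHINGLE_LIBS"] = shingle_libs_dir

# Warn early if the folder is missing rather than failing during a run.
if not os.path.isdir(shingle_libs_dir):
    print("Warning: SHINGLE_LIBS points to a missing folder:", shingle_libs_dir)
```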

DFT and Molecular Mechanics optimization

To run the DFT and Molecular Mechanics computations required for HOMO and LUMO optimization, EvoMol must be able to call Gaussian09 and OpenBabel.

To do so, the $OPT_LIBS environment variable must point to a folder containing:

  • run.sh : a script launching a DFT optimization with Gaussian09 of the input filepath given as parameter.
  • obabel/openbabel-2.4.1 : directory containing an installation of OpenBabel 2.4.1. Make sure to also set OpenBabel's $BABEL_DATADIR environment variable to $OPT_LIBS/obabel/openbabel-2.4.1/data.
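
The layout described above can be checked before starting a long run. The following sketch uses two hypothetical helpers, missing_opt_libs_entries and babel_datadir, which are not part of EvoMol; they only test for the file and directory named in the list.

```python
import os


def missing_opt_libs_entries(opt_libs_dir):
    """Return the expected entries that are absent from the OPT_LIBS folder."""
    expected = {
        "run.sh": os.path.isfile,
        os.path.join("obabel", "openbabel-2.4.1"): os.path.isdir,
    }
    return [
        relative_path
        for relative_path, exists in expected.items()
        if not exists(os.path.join(opt_libs_dir, relative_path))
    ]


def babel_datadir(opt_libs_dir):
    """Path that $BABEL_DATADIR should be set to, following the text above."""
    return os.path.join(opt_libs_dir, "obabel", "openbabel-2.4.1", "data")


opt_libs = os.environ.get("OPT_LIBS")
if opt_libs is None:
    print("OPT_LIBS is not set")
else:
    print("Missing entries:", missing_opt_libs_entries(opt_libs))
    print("BABEL_DATADIR should be:", babel_datadir(opt_libs))
```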

GuacaMol initial population

To run the GuacaMol goal-directed benchmarks with the best-scoring molecules from the GuacaMol ChEMBL subset as the initial population, you need to:

  • Download the ChEMBL subset.
  • Give the path of the data using the "smiles_list_init_path" attribute.
  • Ensure that the "guacamol_init_top_100" attribute is set to True.
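
Put together, the three steps could translate into a run description like the sketch below. Both paths are hypothetical placeholders, and the "guacamol" objective key is taken from the objective function list above.

```python
guacamol_run = {
    "obj_function": "guacamol",
    "optimization_parameters": {
        # Start from the 100 best-scoring molecules of the ChEMBL subset.
        "guacamol_init_top_100": True,
    },
    "io_parameters": {
        "model_path": "examples/guacamol_run",                    # hypothetical output folder
        "smiles_list_init_path": "path/to/chembl_subset.smiles",  # hypothetical path to the download
    },
}
```

The dictionary would then be passed to run_model as in the Quickstart.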
