This repository contains tools and routines to calculate distances between synthesis routes and to cluster them.
This repository is mainly intended for developers and researchers. If you want a fully functional tool that is easy to use, please consider looking into the AiZynthFinder project.
Before you begin, ensure you have met the following requirements:
- Linux, Windows or macOS platforms are supported, as long as the dependencies are supported on these platforms.
- You have installed Anaconda or Miniconda with Python 3.8 to 3.9.
The tool has been developed on a Linux platform, but the software has been tested on Windows 10 and macOS Catalina.
Set up your Python environment and then run:
pip install route_distances
Alternatively, to install from source, first clone the repository using Git.
Then execute the following commands in the root of the repository:
conda env create -f conda-env.yml
conda activate routes-env
poetry install
The route_distances package is now installed in editable mode.
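A quick way to confirm that the installation succeeded is to check whether the package can be found on the import path. The snippet below is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of route_distances.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name is importable."""
    # find_spec returns None when no such top-level package exists.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Run this inside the activated environment (e.g. routes-env).
    print("route_distances installed:", is_installed("route_distances"))
```

If this prints False, the environment that is active is probably not the one the package was installed into.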
The package installs the cluster_aizynth_output tool, which is used to calculate distances and clusters for AiZynthFinder output:
cluster_aizynth_output --files finder_output1.hdf5 finder_output2.hdf5 --output finder_distances.hdf5 --nclusters 0 --model ted
This will perform tree edit distance (TED) calculations and add two columns to the output file: distance_matrix, with the distances, and cluster_labels, with the cluster labels for each route.
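Once the distance matrix and cluster labels for a target have been loaded, they can be explored with a few lines of plain Python. The sketch below assumes that the distance matrix is a square nested list and that cluster_labels holds one integer per route; how the values are stored in the file is not shown here, and the toy numbers are invented purely for illustration. The helper names group_by_cluster and medoid are hypothetical and not part of route_distances.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_cluster(labels: Sequence[int]) -> Dict[int, List[int]]:
    """Map each cluster label to the indices of the routes it contains."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for route_index, label in enumerate(labels):
        groups[int(label)].append(route_index)
    return dict(groups)


def medoid(distances: Sequence[Sequence[float]], members: Sequence[int]) -> int:
    """Return the member with the smallest summed distance to the other members."""
    return min(members, key=lambda i: sum(distances[i][j] for j in members))


# Invented toy data: five routes, symmetric distances, two clusters.
distance_matrix = [
    [0.0, 1.0, 4.0, 8.0, 9.0],
    [1.0, 0.0, 2.0, 7.0, 8.0],
    [4.0, 2.0, 0.0, 6.0, 7.0],
    [8.0, 7.0, 6.0, 0.0, 3.0],
    [9.0, 8.0, 7.0, 3.0, 0.0],
]
cluster_labels = [0, 0, 0, 1, 1]

groups = group_by_cluster(cluster_labels)
for label, members in sorted(groups.items()):
    representative = medoid(distance_matrix, members)
    print(f"cluster {label}: routes {members}, representative route {representative}")
```

Picking the medoid is only one way to choose a representative route per cluster; it is used here because it needs nothing beyond the distance matrix itself.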
An ML model for fast predictions can be found here: https://zenodo.org/record/4925903.
This can be used with the cluster_aizynth_output tool:
cluster_aizynth_output --files finder_output1.hdf5 finder_output2.hdf5 --output finder_distances.hdf5 --nclusters 0 --model chembl_10k_route_distance_model.ckpt
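When the tool is run for many sets of files, it can be convenient to assemble the command line in a script. The following is a minimal sketch that uses only the flags shown in the commands above (--files, --output, --nclusters, --model); the function name build_cluster_command is hypothetical and not part of route_distances, and the file names are the placeholders from the examples.

```python
import shlex
from typing import List, Sequence


def build_cluster_command(
    files: Sequence[str],
    output: str,
    model: str = "ted",
    nclusters: int = 0,
) -> List[str]:
    """Assemble the argument list for the cluster_aizynth_output tool.

    `model` is either "ted" or the path to a model checkpoint file.
    """
    if not files:
        raise ValueError("at least one input file is required")
    return [
        "cluster_aizynth_output",
        "--files", *files,
        "--output", output,
        "--nclusters", str(nclusters),
        "--model", model,
    ]


command = build_cluster_command(
    ["finder_output1.hdf5", "finder_output2.hdf5"],
    "finder_distances.hdf5",
    model="chembl_10k_route_distance_model.ckpt",
)
# Print the command as it would be typed in a shell.
print(shlex.join(command))

# To actually run it (requires the tool and the input files to exist):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.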
For further details, please consult the documentation.
The tests use the pytest package, which is installed by poetry.
Run the tests using:
pytest -v
The documentation is generated by Sphinx from hand-written tutorials and docstrings.
The HTML documentation can be generated by
invoke build-docs
We welcome contributions, in the form of issues or pull requests.
If you have a question or want to report a bug, please submit an issue.
To contribute with code to the project, follow these steps:
- Fork this repository.
- Create a branch: git checkout -b <branch_name>
- Make your changes and commit them: git commit -m '<commit_message>'
- Push to the remote branch: git push
- Create the pull request.
Please use the black package for formatting, and follow the PEP 8 style guide.
- Samuel Genheden
The contributors have limited time for support questions, but please do not hesitate to submit an issue (see above).
The software is licensed under the MIT license (see LICENSE file), and is free and provided as-is.
- Genheden S, Engkvist O, Bjerrum E (2021) Clustering of synthetic routes using tree edit distance. J. Chem. Inf. Model. 61:3899–3907 https://doi.org/10.1021/acs.jcim.1c00232
- Genheden S, Engkvist O, Bjerrum E (2022) Fast prediction of distances between synthetic routes with deep learning. Mach. Learn. Sci. Technol. 3:015018 https://doi.org/10.1088/2632-2153/ac4a91