molll: Data Driven Estimation of Molecular Log-Likelihood using Fingerprint Key Counting

This software provides models for estimating the likelihood that a molecule belongs to a specific dataset, based on simple fingerprint key counting. The models, AtomLL and MolLL, are designed for outlier detection and class membership assignment, and have potential applications in molecular generation and optimization. PropLL is also included; it uses scikit-learn kernel density estimates on RDKit-derived, user-selectable properties.
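As rough intuition for what "fingerprint key counting" can mean, here is a minimal toy sketch in plain Python. It is not the scoring used by AtomLL or MolLL (see the preprint for that). The string keys, the function names `fit_key_counts` and `log_likelihood`, and the pseudocount smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def fit_key_counts(dataset):
    """Count in how many items of the dataset each key occurs."""
    counts = Counter()
    for keys in dataset:
        counts.update(set(keys))
    return counts, len(dataset)


def log_likelihood(keys, counts, n_items, pseudocount=1.0):
    """Sum the logs of smoothed key frequencies; unseen keys score lowest."""
    total = 0.0
    for key in set(keys):
        # Smoothing (an assumption of this sketch) keeps unseen keys finite
        freq = (counts.get(key, 0) + pseudocount) / (n_items + 2 * pseudocount)
        total += math.log(freq)
    return total


# Hypothetical string keys standing in for real fingerprint keys
reference = [
    {"aromatic_c", "amide"},
    {"aromatic_c", "ether"},
    {"aromatic_c", "amide", "ether"},
]
counts, n_items = fit_key_counts(reference)
typical = log_likelihood({"aromatic_c", "amide"}, counts, n_items)
unusual = log_likelihood({"aromatic_c", "azide"}, counts, n_items)
```

In this toy setup, an item containing a key never seen in the reference set (`unusual`) gets a lower score than one built only from common keys (`typical`).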

Installation

Currently, clone the repository and install from its main directory:

pip install .

Or install directly from the repository without cloning:

pip install git+https://github.com/EBjerrum/molll.git

Usage

The code works on lists of RDKit Mol objects:

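from molll import MolLL

molll = MolLL()
# Analyze a reference dataset (a list of RDKit Mol objects)
molll.analyze_dataset(mols_list)
# Calculate log-likelihoods for a list of Mol objects (the same set or another one)
molll.calculate_lls(other_or_same_mols)
# Or for a single Mol object
molll.calculate_ll(single_mol)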
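For outlier detection, one possible way to use the scores is to compare them against a cut-off taken from the reference dataset's own scores. The sketch below works on plain lists of floats and makes several assumptions: that `calculate_lls` yields one number per molecule, that higher means more typical of the dataset, and that a low quantile is a reasonable threshold. The helper `flag_outliers` and the numbers are hypothetical and not part of molll.

```python
def flag_outliers(reference_lls, query_lls, quantile=0.05):
    """Flag query scores below the chosen quantile of the reference scores."""
    ordered = sorted(reference_lls)
    # Index of the cut-off score, clamped so it is always a valid position
    idx = min(len(ordered) - 1, max(0, int(quantile * len(ordered))))
    cutoff = ordered[idx]
    return cutoff, [ll < cutoff for ll in query_lls]


# Made-up log-likelihoods for illustration only
reference_lls = [-12.0, -10.5, -11.2, -9.8, -30.0, -10.1, -11.7, -10.9, -12.4, -11.0]
query_lls = [-10.0, -45.0, -12.1]
cutoff, flags = flag_outliers(reference_lls, query_lls, quantile=0.1)
```

The right cut-off depends on the dataset and the application, so the quantile here should be treated as a tunable parameter.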

For convenience, some classes with precomputed data are available, currently based on the LibInvent training data:

from molll import LibInventMolLLr1

# Instantiate the class that ships with precomputed data
molll = LibInventMolLLr1()
molll.calculate_lls(mols_list)

Models can be saved to and loaded from a text-based (JSON) format:

molll.save("MySaveFile.json")

molll_clone = MolLL()
molll_clone.load("MySaveFile.json")

Additional Reading

There's a preprint on ChemRxiv with some example usages: https://doi.org/10.26434/chemrxiv-2024-hzddj
