Fixed:
- Fix minor typos in tutorials
Authors:
- Hadrien Mary
- michelml
Added:
- Add configurations for dev containers based on the micromamba Docker image. More information about dev containers is available at https://docs.github.com/en/codespaces/setting-up-your-project-for-codespaces/introduction-to-dev-containers.
- Support for two additional forcefields: MMFF94s with and without the electrostatic component.
- Energies are output along with the delta energy to the lowest-energy conformer.
Changed:
- API of dm.conformers.generate() to support the choice of forcefield. In addition, the ewindow and eratio flags were added to reject high-energy conformers, either on an absolute scale or as a ratio to rotatable bonds.
- Revamped all the datamol tutorials and added new tutorials. Huge thanks to @Valence-jonnyhsu for leading the refactoring of the datamol tutorials.
- Improve documentation for dm.standardize_mol()
- Various docstring and typing improvements.
- Embed the cdk2.sdf and solubility_*.sdf files within the datamol package to prevent issues with the RDKit config dir.
- Enable strict mode on the documentation to prevent any issues and inconsistencies with the types and docstrings of datamol.
- Refactor the micromamba CI to use the latest version and simplify it.
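The energy-based rejection mentioned in the list above can be illustrated with a minimal sketch. It does not call datamol: filter_by_energy is a hypothetical helper, and the meaning given here to ewindow (maximum delta energy to the lowest conformer) and eratio (maximum delta energy per rotatable bond) is an assumption that may differ from what dm.conformers.generate() actually does.

```python
from typing import List, Optional, Tuple


def filter_by_energy(
    energies: List[float],
    n_rotatable_bonds: int,
    ewindow: Optional[float] = None,
    eratio: Optional[float] = None,
) -> List[Tuple[int, float, float]]:
    """Return (index, energy, delta_energy) for the conformers that are kept.

    Hypothetical helper: the threshold semantics are assumptions,
    not datamol's implementation.
    """
    if not energies:
        return []

    lowest = min(energies)
    kept = []
    for idx, energy in enumerate(energies):
        # Delta energy relative to the lowest-energy conformer.
        delta = energy - lowest
        # Absolute-scale rejection.
        if ewindow is not None and delta > ewindow:
            continue
        # Rejection relative to the number of rotatable bonds
        # (guard against division by zero for rigid molecules).
        if eratio is not None and delta / max(n_rotatable_bonds, 1) > eratio:
            continue
        kept.append((idx, energy, delta))
    return kept


kept = filter_by_energy(
    [12.0, 10.0, 31.0, 14.5], n_rotatable_bonds=3, ewindow=5.0, eratio=1.0
)
print(kept)
```

With these made-up energies, the third conformer is rejected by the window and the fourth by the ratio, while the first two are kept together with their delta energies.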
Removed:
- Remove the unused and unmaintained dm.actions and dm.reactions modules.
- Remove copy args from add_hs and remove_hs (RDKit already returns copies).
Fixed:
- Error in ECFP fingerprints that computed FCFP instead of ECFP.
Authors:
- Emmanuel Noutahi
- Hadrien Mary
- Matt
Added:
- New possibilities for ambiguous matching of molecules in the function reorder_mol_from_template
Changed:
- Replaced allow_ambiguous_hs_only by the option "hs_only" for the ambiguous_match_mode parameter
- ambiguous_match_mode is now a string, no longer a bool.
Deprecated:
- allow_ambiguous_hs_only is no longer supported, without a deprecation warning since the feature is brand new.
- Same for ambiguous_match_mode being a bool.
Authors:
- DomInvivo
- Hadrien Mary
Added:
- datamol.graph.match_molecular_graphs, with unit-tests
- datamol.graph.reorder_mol_from_template, with unit-tests
Changed:
- Typing in datamol.graph.py, changed rdkit.Chem.rdchem.Mol to dm.Mol
Authors:
- DomInvivo
- Emmanuel Noutahi
Fixed:
- Bug in dm.conformers.generate() when multiple conformers had equal energies.
- Fix the documentation.
Authors:
- Cas
- Hadrien Mary
Added:
- Add dm.read_molblock() and dm.to_molblock() functions.
- Add dm.to_xlsx() function.
Fixed:
- Fix the API doc.
Authors:
- Hadrien Mary
Changed:
- Add joblib_batch_size in dm.parallelized_with_batches() to control the joblib batch size (which is different from the dm.parallelized_with_batches batch size).
- Various small improvements for unit tests.
Authors:
- Hadrien Mary
Added:
- Add dm.parallelized_with_batches() to parallelize a workload with a function that takes a batch of inputs.
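The batching idea behind this entry can be sketched with the standard library only. This is not datamol's implementation: make_batches, apply_with_batches and square_batch are hypothetical names, and a thread pool stands in for whatever backend the real function uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def make_batches(items: Sequence, batch_size: int) -> List[Sequence]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]


def apply_with_batches(
    fn: Callable[[Sequence], list],
    items: Sequence,
    batch_size: int,
    max_workers: int = 2,
) -> list:
    """Call fn once per batch in a thread pool and flatten the results in order."""
    batches = make_batches(items, batch_size)
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order the batches were submitted.
        for batch_result in pool.map(fn, batches):
            results.extend(batch_result)
    return results


def square_batch(batch: Sequence[int]) -> List[int]:
    # The worker function receives a whole batch, not a single item.
    return [x * x for x in batch]


batches = make_batches(list(range(7)), batch_size=3)
squares = apply_with_batches(square_batch, list(range(7)), batch_size=3)
print(batches, squares)
```

Passing a whole batch to the function amortizes per-call overhead, which is useful when the function is cheap per item or can vectorize its work.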
Authors:
- Hadrien Mary
Changed:
- Don't import sascorer by default but only during the call to dm.descriptors.sas(mol).
Authors:
- Hadrien Mary
Changed:
- Use micromamba during CI.
- Add CI tests for RDKit=2022.03.
- Adapt a test to new rdkit version.
Fixed:
- typing for what is returned by dm.align.template_align
Authors:
- Hadrien Mary
- michelml
Changed:
- allow_r_groups option in dm.align.auto_align_many
Removed:
- should_align
Authors:
- Hadrien Mary
- michelml
Added:
- A new dm.align module with various functions to align a list of molecules. Use dm.align.template_align to align a molecule to a template and dm.align.auto_align_many to automatically partition and align a list of molecules.
- New descriptors: formal_charge
- New descriptors: refractivity
- New descriptors: n_rigid_bonds
- New descriptors: n_stereo_centers
- New descriptors: n_charged_atoms
- Add dm.clear_props to clear all the properties of a mol.
- Add a new dataset in addition to freesolv based on RDKit CDK2 at dm.cdk2().
- Add dm.strip_mol_to_core to remove all R groups from a molecule.
- Add dm.UNSPECIFIED_BOND
- dm.compute_ring_system to extract the ring systems from a molecule.
Changed:
- Improve typing.
- Improve relative imports coverage.
- Adapt dm.to_image to use the align module.
Removed:
- Remove a lot of # type: ignore as those can be error prone (hopefully the tests are here!)
Authors:
- Hadrien Mary
Added:
- Add dm.conformers.keep_conformers to keep only one or multiple conformers from a molecule.
Changed:
- Change the conformer generation arguments to use useRandomCoords=True by default.
- Start using explicit Optional instead of implicit Optional for typing.
- Start using relative imports instead of absolute ones.
- When conformers are not minimized, sort them by energy (can be turned to False).
Removed:
- Remove fallback_to_random_coords argument from generate_conformers.
Authors:
- Hadrien Mary
Added:
- Support for selfies<2.0.0 in tests
Changed:
- Behaviour of all inchi functions to return None with a warning instead of silently returning an empty string
- Order of str evaluation in the conversion functions: isinstance(str) is now evaluated before is None.
Fixed:
- Bug in unique_id that made it fall back to 'd41d8cd98f00b204e9800998ecf8427e' on unsupported inputs. None is now returned instead.
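The hexadecimal value quoted in this entry is the MD5 digest of an empty byte string, which suggests that an empty identifier was being hashed. That reading is an assumption, and hash_identifier below is a hypothetical helper rather than datamol's code; the sketch only shows the digest and one way to guard against it.

```python
import hashlib
from typing import Optional

# MD5 digest of zero bytes of input.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def hash_identifier(identifier: Optional[str]) -> Optional[str]:
    """Hash a textual identifier, returning None instead of hashing nothing."""
    if not identifier:
        # Covers both None and the empty string.
        return None
    return hashlib.md5(identifier.encode("utf-8")).hexdigest()


print(EMPTY_MD5)
print(hash_identifier(""))
```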
Authors:
- Emmanuel Noutahi
Changed:
- Add remove_hs flag in dm.read_sdf().
Authors:
- Hadrien Mary
Added:
- Add dm.descriptors.n_aromatic_atoms
- Add dm.descriptors.n_aromatic_atoms_proportion
- Add dm.predictors.esol
- Add dm.predictors.esol_from_data
Changed:
- Make descriptors a folder (backward compatible).
- Rename any_descriptor to any_rdkit_descriptor to be more explicit.
Authors:
- Hadrien Mary
Added:
- Add dm.conformers.align_conformers() to align the conformers of a list of molecules.
Changed:
- New lower bound rdkit version to >=2021.09. See #81 for details.
Authors:
- Hadrien Mary
Fixed:
- Catch too long integer values in set_mol_props and switch to SetDoubleProp instead of SetIntProp
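A minimal sketch of the dispatch this fix describes, written without RDKit so it can run anywhere. choose_setter is a hypothetical helper that only returns the name of the RDKit setter that would be used; the signed 32-bit bound is an assumption about what counts as "too long" and may not match the exact check in set_mol_props.

```python
from typing import Union

# Assumed limits of a C int as used by SetIntProp (assumption: signed 32-bit).
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def choose_setter(value: Union[bool, int, float, str]) -> str:
    """Return the name of the RDKit property setter suited to value."""
    # bool is a subclass of int in Python, so it must be tested first.
    if isinstance(value, bool):
        return "SetBoolProp"
    if isinstance(value, int):
        if INT32_MIN <= value <= INT32_MAX:
            return "SetIntProp"
        # Too large for an int property: fall back to a double.
        return "SetDoubleProp"
    if isinstance(value, float):
        return "SetDoubleProp"
    return "SetProp"


print(choose_setter(42), choose_setter(2**40))
```

Storing a very large integer as a double can lose precision, so this trade-off favors not failing over exact round-tripping.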
Authors:
- Hadrien Mary
Changed:
- Expose the clean_it flag when enumerating stereoisomers.
Authors:
- Hadrien Mary
- Julien Horwood
Added:
- Parameters to customize or ignore failures when running the conformer generation.
Changed:
- When the conformer embedding fails, it will now optionally fall back to using random coordinates.
Authors:
- Hadrien Mary
- Julien Horwood
Added:
- Add a new total arg in dm.parallelized() (only useful when the progress is set to True)
Changed:
- Prevent tqdm_kwargs collision in dm.parallelized().
Authors:
- Hadrien Mary
Added:
- Add dm.to_inchi_non_standard() and dm.to_inchikey_non_standard() to generate InChi values that are sensitive to tautomerism as well as undefined stereoisomerism.
- Add dm.unique_id to generate unique molecule identifiers based on dm.to_inchikey_non_standard
Changed:
- Add use_non_standard_inchikey flag argument to dm.same_mol.
Authors:
- Hadrien Mary
Added:
- Add dm.utils.fs.copy_dir() to recursively copy directories across filesystems + tests.
- Add dm.utils.fs.mkdir + tests.
- Add a new dm.descriptors module with compute_many_descriptors and batch_compute_many_descriptors + tests.
- Add dm.viz.match_substructure to highlight one or more substructures in a list of molecules + tests. Note that the current function does not show different colors per match and submatch because of a limitation in MolsToGridImage. We plan to address this in a future version of datamol.
- Add a new mcs module backed by rdkit.Chem.rdFMCS with find_mcs function + tests.
- Add a new function dm.viz.utils.align_2d_coordinates to align 2d coordinates of molecules using either a given pattern or MCS.
- Add dm.canonical_tautomer to canonicalize tautomers.
- Add dm.remove_stereochemistry().
- Add a bond_line_width arg to to_image.
- Add dm.atom_list_to_bond()
- Add enable flag to dm.without_rdkit_log()
- Add a tutorial about the filesystem module.
- Add a tutorial about the viz module (still incomplete).
- Add dm.substructure_matching_bonds to perform a standard substructure match but also return the matching bonds instead of only the matching atoms.
- Add new dm.isomers module + move relevant functions from dm.mol to dm.isomers
- Add dm.add_hs and dm.remove_hs to add and remove hydrogens from molecules.
Changed:
- Set fsspec minimum version to >=2021.9.
- Improve dm.utils.to_image to make it more robust (it no longer fails on certain molecules due to incorrect aromaticity) and propagate more drawing options to RDKit, such as legend_fontsize and others.
- Add a new align argument in dm.to_image() to align the 2d coordinates of the molecules.
- In dm.to_image, use_svg is now set to True by default.
- Change the default mol_size from 200 to 300 in to_image.
- Link datamol.utils.fs to datamol.fs.
- Change default chunk_size in copy_file from 2048 to 1024 * 1024 (1MB).
- Support parallel chunked distances computation in dm.similarity.cdist
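The effect of the larger default chunk_size listed above can be seen in a small chunked-copy sketch. copy_stream is a hypothetical stand-in operating on in-memory streams; it is not the copy_file implementation, which works on filesystem paths.

```python
import io
from typing import BinaryIO


def copy_stream(
    source: BinaryIO, destination: BinaryIO, chunk_size: int = 1024 * 1024
) -> int:
    """Copy source to destination chunk by chunk and return the number of chunks."""
    n_chunks = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            # An empty read means the end of the stream was reached.
            break
        destination.write(chunk)
        n_chunks += 1
    return n_chunks


payload = b"x" * (3 * 1024 * 1024)  # 3 MiB of data

dest_small = io.BytesIO()
n_small = copy_stream(io.BytesIO(payload), dest_small, chunk_size=2048)

dest_large = io.BytesIO()
n_large = copy_stream(io.BytesIO(payload), dest_large, chunk_size=1024 * 1024)

print(n_small, n_large)
```

Fewer, larger reads usually mean fewer round trips, which tends to matter most on remote filesystems.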
Authors:
- Hadrien Mary
Changed:
- The default git branch is now main
- appdirs is now a hard dependency.
- Change CI to use rdkit [2021.03, 2021.09] and add the info to the readme and doc.
Fixed:
- Test related to SELFIES to make it work with the latest 2.0 version.
- dm.to_mol accepts a mol as input but the specified type was only str.
Authors:
- Hadrien Mary
Fixed:
- Force the input value(s) of dm.molar.log_to_molar to be floats since powers of integers are not allowed.
Authors:
- Hadrien Mary
Removed:
- py.typed file that seems unused, besides confusing static analysis tools.
Authors:
- Hadrien Mary
Added:
- to_smarts for exporting molecule objects as SMARTS
- from_smarts for reading molecule from SMARTS string
Changed:
- Allow exporting smiles in kekule representation
- to_smarts is properly renamed into smiles_as_smarts
Authors:
- Emmanuel Noutahi
Removed:
- Revert batch_size fix to use default joblib instead
Fixed:
- Issue #58: sequence bug in parallel.
Authors:
- Emmanuel Noutahi
Added:
- Add a new function to measure execution time dm.utils.perf.watch_duration.
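A timing helper of this kind is commonly written as a context manager. The sketch below is a generic version built on the standard library; timed is a hypothetical name and its interface is an assumption, not the signature of dm.utils.perf.watch_duration.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed() -> Iterator[Dict[str, float]]:
    """Yield a dict that receives the elapsed time, in seconds, on exit."""
    result: Dict[str, float] = {}
    start = time.perf_counter()
    try:
        yield result
    finally:
        # Recorded even if the body raises.
        result["duration"] = time.perf_counter() - start


with timed() as timer:
    total = sum(range(100_000))

print(f"computed {total} in {timer['duration']:.6f} s")
```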
Changed:
- Add a batch_size option to dm.utils.parallelized. The default behaviour batch_size=None is unchanged and so 100% backward compatible.
Authors:
- Hadrien Mary
Changed:
- get_protocol is more general
Fixed:
- Bug in fs.glob due to protocol being a list
Authors:
- Emmanuel Noutahi
Added:
- Add missing appdirs dependency
Fixed:
- Propagate tqdm_kwargs for parallel (was only done for sequential)
Authors:
- Hadrien Mary
Added:
- Add tqdm_kwargs to dm.utils.JobRunner()
- Add tqdm_kwargs to dm.utils.parallelized()
Changed:
- Propagate job_kwargs to dm.utils.parallelized()
Authors:
- Hadrien Mary
Added:
- Add a DOI so datamol can get properly cited.
- Better doc about compat and CI
- Add a datamol Mol type: dm.Mol identical to Chem.rdchem.Mol
Changed:
- Bump test coverage from 70% to 80%.
Authors:
- DeepSource Bot
- Hadrien Mary
- deepsource-autofix[bot]
Added:
- More tests for the dm.similarity modules + check against RDKit equivalent methods.
- dm.same_mol(mol1, mol2) to check whether 2 molecules are the same based on their InChiKey.
Changed:
- use scipy in dm.similarity.pdist().
- Raise an error when a molecule is invalid in dm.similarity.pdist/cdist.
Deprecated:
- dm.similarity.pdist() now returns only the dist matrix without the valid_idx vector.
Fixed:
- A bug returning an inconsistent dist matrix with dm.similarity.pdist().
Authors:
- Hadrien Mary
Changed:
- A better and manually curated API documentation.
Authors:
- Hadrien Mary
Added:
- Add support for more fingerprint types.
- Two utility functions for molar concentration conversion: dm.molar_to_log() and dm.log_to_molar().
- Add the dm.utils.fs module to work with any type of paths (remote or local).
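The molar concentration conversion listed above boils down to a negative base-10 logarithm and its inverse. The sketch below reuses the function names from the entry but with simplified, assumed signatures (values in mol/L, no unit handling); it is not datamol's implementation.

```python
import math


def molar_to_log(value_molar: float) -> float:
    """Convert a concentration in mol/L to its negative log10 (for example IC50 to pIC50)."""
    return -math.log10(value_molar)


def log_to_molar(value_log: float) -> float:
    """Convert a negative log10 value back to a concentration in mol/L."""
    # Cast to float so that integer inputs are handled uniformly.
    return 10.0 ** (-float(value_log))


# 1 micromolar expressed in mol/L.
p_value = molar_to_log(1e-6)
molar = log_to_molar(6)
print(p_value, molar)
```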
Authors:
- Hadrien Mary
Added:
- Add a sanitize flag to from_df.
- Automatically detect the mol column in from_df.
- Add add_hs arg to sanitize_mol.
Changed:
- Allow a single molecule as input to dm.to_sdf instead of a list of mols.
- Preserve mol properties and the first conformer in dm.sanitize_mol.
- Display a warning message when input mol has multiple conformers in dm.sanitize_mol.
Fixed:
- Remove call to sanitize_mol in read_sdf, instead use sanitize=True from RDKit.
- Remove the mol column from the mol properties in from_df. It also fixes to_sdf.
Authors:
- Hadrien Mary
Changed:
- Propagate sanitize and strict_parsing to dm.read_sdf.
Authors:
- Hadrien Mary
- Ishan Kumar
- michelml
Fixed:
- Fix Google Analytics again, hopefully for the last time.
Authors:
- Hadrien Mary
Changed:
- Add s3fs and gcsfs as hard dep
Authors:
- Hadrien Mary
Authors:
- Hadrien Mary
- michelml
Authors:
- Hadrien Mary
Changed:
- New logo.
Authors:
- Hadrien Mary
Fixed:
- Fixed typo in readme
Authors:
- Emmanuel Noutahi
- Hadrien Mary
Authors:
- Hadrien Mary
Added:
- dm.copy_mol
- dm.set_mol_props
- dm.copy_mol_props
- dm.conformers.get_coords
- dm.conformers.center_of_mass
- dm.conformers.translate
- dm.enumerate_stereoisomers
- dm.enumerate_tautomers
- dm.atom_indices_to_mol
Changed:
- rdkit fp to numpy array conversion is purely numpy-based now (x4 faster).
- Cleaning of various docstrings (removing explicit types).
- Clean various types.
- Allow dm.to_image instead of dm.viz.to_image
- Add atom indices drawing option to dm.to_image
- Allow to_smiles to fail (the default is not to fail but to return None, as before).
- Add CXSmiles bool flag to to_smiles.
- Rename utils.paths to utils.fs
- Integrate pandatools into dm.to_df.
- Build a mol column from smiles in read_csv and read_excel
- Rename dm.sanitize_best to dm.sanitize_first
Fixed:
- Scaffold tests for new rdkit version
- Conformer cluster tests for new rdkit version
Authors:
- Hadrien Mary
- Therence1
- michelml
- mike
Fixed:
- tqdm progress bar now updates on completion of a job and not on submission
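The difference between updating on submission and on completion can be illustrated with concurrent.futures. This is a generic sketch, not datamol's joblib-based code: run_with_progress is a hypothetical helper and a plain list stands in for the progress bar.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Sequence, Tuple


def run_with_progress(
    fn: Callable, items: Sequence, max_workers: int = 2
) -> Tuple[list, List[int]]:
    """Run fn over items and record a progress tick each time a job finishes."""
    progress: List[int] = []  # stand-in for a progress bar
    results: list = [None] * len(items)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submitting a job does not advance the progress.
        futures = {pool.submit(fn, item): idx for idx, item in enumerate(items)}
        for future in as_completed(futures):
            # The tick happens only once the job has actually completed.
            results[futures[future]] = future.result()
            progress.append(len(progress) + 1)
    return results, progress


results, progress = run_with_progress(lambda x: x * 2, [1, 2, 3])
print(results, progress)
```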
Authors:
- Emmanuel Noutahi
Changed:
- Make ipywidgets an optional dep.
Authors:
- Hadrien Mary
Changed:
- Propagate more options to dm.reorder_atoms.
Authors:
- Hadrien Mary
Added:
- dm.pick_centroids for picking a set of centroid molecules using various algorithms.
- dm.assign_to_centroids for clustering molecules based on precomputed centroids.
Changed:
- Make add_hs optional in conformers.generate and remove the hydrogens afterwards when add_hs is True. Explicit hydrogens will be lost.
Fixed:
- Docstring of dm.pick_diverse
Authors:
- Emmanuel Noutahi
- Hadrien Mary
Added:
- Added outfile to viz.to_image
Changed:
- Replace ete3 by networkx due to GPL licensing.
- Fix some typos in docs.
Fixed:
- Null pointer exception during conformers generation.
Authors:
- Emmanuel Noutahi
- Hadrien Mary
- Honoré Hounwanou
- michelml
Added:
- Add a test to monitor datamol import duration.
Changed:
- Add rms cutoff option during conformers generation.
- Refactor conformer cluster function.
Authors:
- Hadrien Mary
Added:
- Include stub files for rdkit generated using stubgen from mypy.
Authors:
- Hadrien Mary
Added:
- Add to_smi and from_smi in the IO module.
- Support filelike object in io module.
- Add kekulization to to_mol
Changed:
- Switch tests of the IO module to regular functions.
Deprecated:
- In the IO module, use urlpath instead of file_uri to follow fsspec conventions.
Fixed:
- Fix bug in read_excel where sheet_name wasn't being used.
Authors:
- Emmanuel Noutahi
- Hadrien Mary
Changed:
- Constrain rdkit to 2020.09 to get rdBase.LogStatus()
Authors:
- Hadrien Mary
Changed:
- Better rdkit log disable/enable.
Authors:
- Hadrien Mary
Added:
- Tests that execute the notebooks.
Fixed:
- Force rdkit >=2020.03.6 to avoid thread-related bug in rdMolStandardize
Authors:
- Hadrien Mary
Added:
- Add cdist function to compute the Tanimoto similarity between two lists of molecules.
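The pairwise computation this entry refers to can be sketched in pure Python. Fingerprints are represented here as sets of on-bit indices, which is an assumption made to keep the example free of RDKit; tanimoto and cdist_similarity are hypothetical helpers, not the datamol function.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity: shared on-bits divided by the union of on-bits."""
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch when both fingerprints are empty.
        return 0.0
    return len(a & b) / union


def cdist_similarity(fps_a: List[Set[int]], fps_b: List[Set[int]]) -> List[List[float]]:
    """Return a len(fps_a) x len(fps_b) matrix of Tanimoto similarities."""
    return [[tanimoto(a, b) for b in fps_b] for a in fps_a]


fps_a = [{1, 2, 3}, {4, 5}]
fps_b = [{1, 2, 3}, {2, 3, 4}, {6}]
matrix = cdist_similarity(fps_a, fps_b)
print(matrix)
```

Unlike a pairwise computation within one list, the result is rectangular: one row per molecule in the first list and one column per molecule in the second.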
Fixed:
- Fix a bug in dm.from_df when the dataframe has a size of zero.
Authors:
- Hadrien Mary
Added:
- Add all the common sanitize functions.
- Add the 2_Preprocessing_Molecules notebook.
- Add fragment module.
- Add scaffold module.
- Add cluster module.
- Add assemble module.
- Add actions module.
- Add reactions module.
- Add dm.viz.circle_grid function
- Add doc with mkdocs
Authors:
- Hadrien Mary
Authors:
- Hadrien Mary
Authors:
Added:
- first release!
Authors: