DeepChem contains an extensive collection of featurizers. If you haven't run into this terminology before, a "featurizer" is a chunk of code which transforms raw input data into a processed form suitable for machine learning. Machine learning methods often need data to be pre-chewed before they can process it. Think of this like a mama penguin chewing up food so the baby penguin can digest it easily.
Now if you've watched a few introductory deep learning lectures, you might ask, why do we need something like a featurizer? Isn't part of the promise of deep learning that we can learn patterns directly from raw data?
Unfortunately, it turns out that deep learning techniques need
featurizers just like normal machine learning methods do. Arguably,
they are less dependent on sophisticated featurizers and more capable
of learning sophisticated patterns from simpler data. Nevertheless,
deep learning systems can't simply chew up raw files. For this reason,
DeepChem provides an extensive collection of featurization methods,
which we will review on this page.
The ``dc.feat.Featurizer`` class is the abstract parent class for all featurizers.

.. autoclass:: deepchem.feat.Featurizer
  :members:

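
The abstract-parent pattern can be illustrated with a small, self-contained sketch. The names ``SketchFeaturizer`` and ``LengthFeaturizer`` are hypothetical and are not part of DeepChem. The sketch only assumes that a parent class loops over datapoints while each subclass defines how a single datapoint becomes a numeric vector.

.. code-block:: python

    from abc import ABC, abstractmethod
    from typing import Any, Iterable, List


    class SketchFeaturizer(ABC):
        """Hypothetical stand-in for an abstract featurizer parent class."""

        def featurize(self, datapoints: Iterable[Any]) -> List[List[float]]:
            # Apply the per-datapoint transformation to every input, in order.
            return [self._featurize(point) for point in datapoints]

        @abstractmethod
        def _featurize(self, datapoint: Any) -> List[float]:
            """Turn one raw datapoint into a fixed-length numeric vector."""


    class LengthFeaturizer(SketchFeaturizer):
        """Toy featurizer: describes a string by its length and digit count."""

        def _featurize(self, datapoint: str) -> List[float]:
            n_digits = sum(ch.isdigit() for ch in datapoint)
            return [float(len(datapoint)), float(n_digits)]


    # Two SMILES strings stand in for "raw input data".
    features = LengthFeaturizer().featurize(["CCO", "c1ccccc1"])

Here ``features`` holds one two-element vector per input string. Real featurizers compute chemically meaningful quantities, but the division of labor between parent and subclass could look similar.
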
Molecular Featurizers are those that work with datasets of molecules.

.. autoclass:: deepchem.feat.MolecularFeaturizer
  :members:

Here are some constants that are used by the graph convolutional featurizers for molecules.

.. autoclass:: deepchem.feat.graph_features.GraphConvConstants
  :members:
  :undoc-members:

There are a number of helper methods used by the graph convolutional classes which we document here.
.. autofunction:: deepchem.feat.graph_features.one_of_k_encoding
.. autofunction:: deepchem.feat.graph_features.one_of_k_encoding_unk
.. autofunction:: deepchem.feat.graph_features.get_intervals
.. autofunction:: deepchem.feat.graph_features.safe_index
.. autofunction:: deepchem.feat.graph_features.get_feature_list
.. autofunction:: deepchem.feat.graph_features.features_to_id
.. autofunction:: deepchem.feat.graph_features.id_to_features
.. autofunction:: deepchem.feat.graph_features.atom_to_id
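
To give a feel for what one-of-k (one-hot) encoding helpers of this kind do, here is a minimal sketch. It assumes, as the names above suggest, a strict variant that rejects values outside the allowable set and an "unknown" variant that maps unseen values onto the last entry. The function names ``one_hot`` and ``one_hot_with_unknown`` are hypothetical; the generated documentation above remains the reference for the actual behavior.

.. code-block:: python

    from typing import Hashable, List, Sequence


    def one_hot(value: Hashable, allowable_set: Sequence[Hashable]) -> List[bool]:
        """Strict one-hot encoding: reject values outside the allowable set."""
        if value not in allowable_set:
            raise ValueError(f"{value!r} is not in {list(allowable_set)!r}")
        return [value == candidate for candidate in allowable_set]


    def one_hot_with_unknown(value: Hashable,
                             allowable_set: Sequence[Hashable]) -> List[bool]:
        """Lenient variant (assumed): unseen values map onto the last entry."""
        if value not in allowable_set:
            value = allowable_set[-1]
        return [value == candidate for candidate in allowable_set]


    elements = ["C", "N", "O", "Unknown"]
    carbon = one_hot("C", elements)
    selenium = one_hot_with_unknown("Se", elements)

In this sketch ``carbon`` has a single ``True`` in the first position, while ``selenium`` falls through to the final "Unknown" slot.
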
This function helps compute distances between atoms from a given base atom.
.. autofunction:: deepchem.feat.graph_features.find_distance
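
One common way to measure distance from a base atom in a molecular graph is to count bonds along the shortest path, which a breadth-first search computes directly. The sketch below assumes that notion of distance and an adjacency-list input; the name ``bond_distances`` is hypothetical, and the documented function above may differ in both its inputs and its output encoding.

.. code-block:: python

    from collections import deque
    from typing import Dict, List


    def bond_distances(base_atom: int, adjacency: List[List[int]]) -> Dict[int, int]:
        """Return the number of bonds from base_atom to each reachable atom."""
        distances = {base_atom: 0}
        queue = deque([base_atom])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                # The first visit in a breadth-first search is the shortest path.
                if neighbor not in distances:
                    distances[neighbor] = distances[atom] + 1
                    queue.append(neighbor)
        return distances


    # Heavy atoms of ethanol (C-C-O): atom 0 bonds to 1, atom 1 bonds to 0 and 2.
    ethanol = [[1], [0, 2], [1]]
    distances = bond_distances(0, ethanol)
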
This function computes the per-atom feature vectors used by graph convolutional featurizers.
.. autofunction:: deepchem.feat.graph_features.atom_features
This function computes the bond features used by graph convolutional featurizers.
.. autofunction:: deepchem.feat.graph_features.bond_features
This function computes atom-atom features (for atom pairs which may not have bonds between them).
.. autofunction:: deepchem.feat.graph_features.pair_features

.. autoclass:: deepchem.feat.ConvMolFeaturizer
  :members:

.. autoclass:: deepchem.feat.WeaveFeaturizer
  :members:

.. autoclass:: deepchem.feat.CircularFingerprint
  :members:

.. autoclass:: deepchem.feat.RDKitDescriptors
  :members:

.. autoclass:: deepchem.feat.CoulombMatrix
  :members:

.. autoclass:: deepchem.feat.CoulombMatrixEig
  :members:

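
For readers unfamiliar with Coulomb matrices, the sketch below builds one from the usual textbook definition: diagonal entries are ``0.5 * Z_i ** 2.4`` and off-diagonal entries are ``Z_i * Z_j / |R_i - R_j|``, where ``Z`` is the nuclear charge and ``R`` the atomic position. This is only an illustration of the formula. The function name ``coulomb_matrix`` is hypothetical, and details the classes above may handle, such as padding, sorting, conformer generation, or unit conventions, are deliberately ignored.

.. code-block:: python

    import math
    from typing import List, Sequence


    def coulomb_matrix(charges: Sequence[float],
                       coords: Sequence[Sequence[float]]) -> List[List[float]]:
        """Build a dense Coulomb matrix from nuclear charges and 3D positions."""
        n = len(charges)
        matrix = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    # Diagonal term: a fit to the energy of the isolated atom.
                    matrix[i][j] = 0.5 * charges[i] ** 2.4
                else:
                    # Off-diagonal term: Coulomb repulsion between two nuclei.
                    separation = math.dist(coords[i], coords[j])
                    matrix[i][j] = charges[i] * charges[j] / separation
        return matrix


    # Toy diatomic: two atoms with Z = 1 placed 2.0 length units apart.
    matrix = coulomb_matrix([1.0, 1.0], [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)])

The result is symmetric by construction. An eigenvalue-based variant would take the spectrum of this matrix as the feature vector instead of the matrix itself.
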

.. autoclass:: deepchem.feat.AtomicCoordinates
  :members:

.. autoclass:: deepchem.feat.AdjacencyFingerprint
  :members:

.. autoclass:: deepchem.feat.SmilesToSeq
  :members:

.. autoclass:: deepchem.feat.SmilesToImage
  :members:

The ``dc.feat.ComplexFeaturizer`` class is the abstract parent class for all featurizers that work with three-dimensional molecular complexes.

.. autoclass:: deepchem.feat.ComplexFeaturizer
  :members:

.. autoclass:: deepchem.feat.RdkitGridFeaturizer
  :members:

.. autoclass:: deepchem.feat.NeighborListComplexAtomicCoordinates
  :members:

Material Structure Featurizers are those that work with datasets of crystals with periodic boundary conditions. For inorganic crystal structures, these featurizers operate on ``pymatgen.Structure`` objects, which include a lattice and 3D coordinates that specify a periodic crystal structure. They should be applied on systems that have periodic boundary conditions. Structure featurizers are not designed to work with molecules.

.. autoclass:: deepchem.feat.MaterialStructureFeaturizer
  :members:

.. autoclass:: deepchem.feat.SineCoulombMatrix
  :members:

.. autoclass:: deepchem.feat.StructureGraphFeaturizer
  :members:

Material Composition Featurizers are those that work with datasets of crystal compositions with periodic boundary conditions. For inorganic crystal structures, these featurizers operate on chemical compositions (e.g. "MoS2"). They should be applied on systems that have periodic boundary conditions. Composition featurizers are not designed to work with molecules.
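
To make the idea of operating on a composition string concrete, the sketch below turns a flat formula such as "MoS2" into element counts and atomic fractions. It is a simplified illustration only: the helpers ``parse_formula`` and ``atomic_fractions`` are hypothetical, they do not handle parentheses or fractional stoichiometry, and they are not how DeepChem or pymatgen parse compositions.

.. code-block:: python

    import re
    from typing import Dict

    # An element symbol (capital letter, optional lowercase) and optional count.
    _TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


    def parse_formula(formula: str) -> Dict[str, int]:
        """Count atoms of each element in a flat formula such as "MoS2"."""
        counts: Dict[str, int] = {}
        for symbol, number in _TOKEN.findall(formula):
            # A missing count means a single atom of that element.
            counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
        return counts


    def atomic_fractions(formula: str) -> Dict[str, float]:
        """Normalize element counts so that the fractions sum to one."""
        counts = parse_formula(formula)
        total = sum(counts.values())
        return {symbol: count / total for symbol, count in counts.items()}


    counts = parse_formula("MoS2")
    fractions = atomic_fractions("MoS2")

A composition featurizer could then combine such fractions with tabulated elemental properties to produce a fixed-length vector.
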

.. autoclass:: deepchem.feat.MaterialCompositionFeaturizer
  :members:

.. autoclass:: deepchem.feat.ElementPropertyFingerprint
  :members:

.. autoclass:: deepchem.feat.BindingPocketFeaturizer
  :members:

.. autoclass:: deepchem.feat.UserDefinedFeaturizer
  :members:

.. autoclass:: deepchem.feat.BPSymmetryFunctionInput
  :members:

.. autoclass:: deepchem.feat.OneHotFeaturizer
  :members:

.. autoclass:: deepchem.feat.RawFeaturizer
  :members: