Feature Engine is a Python library that contains several transformers to engineer features for use in machine learning models. The transformers follow scikit-learn-like functionality: they first learn the imputing or encoding parameters from the training set, and then transform the dataset (a minimal sketch of this pattern follows the list below). Currently the transformers include functionality for:
- Missing value imputation
- Categorical variable encoding
- Outlier removal
- Discretisation
- Numerical variable transformation
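As a quick, hedged sketch of this learn-from-the-training-set pattern, here is what imputation could look like with the MeanMedianImputer listed further down. The import path, the imputation_method argument and the toy data are assumptions for illustration, and the calling convention mirrors the usage example later in this README:

import pandas as pd
from sklearn.model_selection import train_test_split
from feature_engine.missing_data_imputers import MeanMedianImputer  # assumed import path

# toy data: 'Age' contains missing values
data = pd.DataFrame({'Age': [22, None, 35, None, 58, 41],
                     'Fare': [7.3, 8.1, 53.1, 8.0, 80.0, 12.5]})
X_train, X_test = train_test_split(data, test_size=0.3, random_state=0)

imputer = MeanMedianImputer(imputation_method='median')  # 'imputation_method' is an assumed argument name
imputer.fit(X_train, variables=['Age'])   # learns the median of 'Age' from the training set only
X_train = imputer.transform(X_train)      # the learned median fills the missing values
X_test = imputer.transform(X_test)        # the same median is applied to the test set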
Documentation: http://feature-engine.readthedocs.io
Missing value imputation:
- MeanMedianImputer
- RandomSampleImputer
- EndTailImputer
- AddNaNBinaryImputer
- CategoricalVariableImputer
- FrequentCategoryImputer
- ArbitraryNumberImputer
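For instance, a minimal sketch of AddNaNBinaryImputer, which adds a binary indicator for missing values. The import path and the toy data are assumptions, and the calling convention mirrors the usage example further down:

import pandas as pd
from feature_engine.missing_data_imputers import AddNaNBinaryImputer  # assumed import path

data = pd.DataFrame({'Age': [22, None, 35, None, 58]})

nan_flagger = AddNaNBinaryImputer()
nan_flagger.fit(data, variables=['Age'])     # records which variables to flag
data_flagged = nan_flagger.transform(data)   # adds a binary column marking where 'Age' was missing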
Categorical variable encoding:
- CountFrequencyCategoricalEncoder
- OrdinalCategoricalEncoder
- MeanCategoricalEncoder
- WoERatioCategoricalEncoder
- OneHotCategoricalEncoder
- RareLabelCategoricalEncoder
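As an illustration, a sketch of CountFrequencyCategoricalEncoder, which replaces each category by how often it appears in the training data. The encoding_method argument and the toy data are assumptions for illustration:

import pandas as pd
from feature_engine.categorical_encoders import CountFrequencyCategoricalEncoder

data = pd.DataFrame({'Cabin': ['A', 'A', 'B', 'C', 'A', 'B']})

count_encoder = CountFrequencyCategoricalEncoder(encoding_method='count')  # 'encoding_method' is an assumed argument name
count_encoder.fit(data, variables=['Cabin'])  # learns the count of each category
data_encoded = count_encoder.transform(data)  # 'A' -> 3, 'B' -> 2, 'C' -> 1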
Outlier removal:
- Windsorizer
- ArbitraryOutlierCapper
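For example, a hedged sketch of the Windsorizer, which caps extreme values at boundaries learned from the data. The import path, the toy data and the distribution, tail and fold argument names are assumptions:

import pandas as pd
from feature_engine.outlier_removers import Windsorizer  # assumed import path

data = pd.DataFrame({'Fare': [7.3, 8.1, 9.0, 10.2, 12.5, 500.0]})

capper = Windsorizer(distribution='gaussian', tail='right', fold=3)  # argument names are assumptions
capper.fit(data, variables=['Fare'])   # learns the upper boundary from the data
data_capped = capper.transform(data)   # the extreme 500.0 is capped at that boundary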
Discretisation:
- EqualFrequencyDiscretiser
- EqualWidthDiscretiser
- DecisionTreeDiscretiser
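For example, a sketch of the EqualFrequencyDiscretiser, which sorts a numerical variable into bins containing roughly the same number of observations. The import path, the q argument and the toy data are assumptions:

import pandas as pd
from feature_engine.discretisers import EqualFrequencyDiscretiser  # assumed import path

data = pd.DataFrame({'Age': [18, 22, 25, 31, 35, 42, 50, 58, 63, 71]})

discretiser = EqualFrequencyDiscretiser(q=5)  # 'q' (number of quantile bins) is an assumed argument name
discretiser.fit(data, variables=['Age'])      # learns the quantile boundaries
data_binned = discretiser.transform(data)     # 'Age' now holds the bin number instead of the raw value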
Numerical variable transformation:
- LogTransformer
- ReciprocalTransformer
- ExponentialTransformer
- BoxCoxTransformer
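For example, a sketch of the LogTransformer, which applies a logarithmic transformation to the selected variables. The import path and the toy data are assumptions:

import pandas as pd
from feature_engine.variable_transformers import LogTransformer  # assumed import path

data = pd.DataFrame({'Fare': [7.3, 8.1, 53.1, 80.0, 512.3]})

log_tf = LogTransformer()
log_tf.fit(data, variables=['Fare'])     # records the variables to transform
data_logged = log_tf.transform(data)     # 'Fare' is replaced by its log-transformed values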
To install the package, run:
pip install feature_engine
or clone the repository:
git clone https://github.com/solegalli/feature_engine.git
A usage example, where data is a pandas DataFrame containing the variables to transform:

from feature_engine.categorical_encoders import RareLabelCategoricalEncoder

# group the infrequent labels of 'Cabin' and 'Age' together;
# the frequent categories are learned from the data during fit
rare_encoder = RareLabelCategoricalEncoder(tol=0.05, n_categories=5)
rare_encoder.fit(data, variables=['Cabin', 'Age'])
data_encoded = rare_encoder.transform(data)
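Because the frequent categories are learned during fit, the same fitted encoder can be reused to transform new data consistently. In this brief sketch, test_data is a hypothetical DataFrame with the same columns as data:

# the frequent categories were learned from 'data' during fit, so rare or
# unseen labels in new data ('test_data' is hypothetical) are grouped consistently
test_encoded = rare_encoder.transform(test_data)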
More usage examples are available in the Jupyter notebooks in the examples folder, with directions on how to use this package and its multiple transformers.
License: BSD 3-Clause
Author: Soledad Galli - Initial work - Feature Engineering Online Course
Most of the engineering and encoding functionality is inspired by this series of articles from the 2009 KDD competition
To learn more about the rationale, functionality, pros and cons of each imputer, encoder and transformer, refer to the Feature Engineering Online Course
For a summary of the methods, check this presentation