Feature-engine is a Python library with multiple transformers to engineer features for use in machine learning models. Feature-engine's transformers follow the Scikit-learn API, with fit() and transform() methods that first learn the transformation parameters from the data and then transform the data.
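The fit/transform split described above can be illustrated without the library. The class below is a hypothetical median imputer written in plain Python, with rows as dictionaries rather than a pandas DataFrame. Its name, the `medians_` attribute and the row format are illustrative assumptions, not Feature-engine's API. fit() learns one median per variable from the training rows, and transform() reuses those learned values on new data.

```python
from statistics import median


class MedianImputerSketch:
    """Hypothetical imputer that only illustrates the fit/transform pattern."""

    def fit(self, rows, variables):
        # Learn one parameter per variable: the median of its non-missing values.
        self.medians_ = {
            var: median(row[var] for row in rows if row[var] is not None)
            for var in variables
        }
        return self

    def transform(self, rows):
        # Fill missing values with the medians learned in fit(); the data seen
        # here (e.g. a test set) is never used to recompute the medians.
        return [
            {
                key: self.medians_[key] if value is None and key in self.medians_ else value
                for key, value in row.items()
            }
            for row in rows
        ]


train = [{"age": 20}, {"age": 30}, {"age": None}, {"age": 50}]
test = [{"age": None}, {"age": 41}]

imputer = MedianImputerSketch().fit(train, variables=["age"])
print(imputer.medians_)          # {'age': 30}
print(imputer.transform(test))   # [{'age': 30}, {'age': 41}]
```

The missing value in `test` is filled with 30, the median learned from `train`, not a statistic computed on `test` itself; this is what keeps the transformation consistent between training and scoring.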
Feature Engineering for Machine Learning, Online Course. Python Feature Engineering Cookbook
- Documentation: http://feature-engine.readthedocs.io
- Home page: https://www.trainindata.com/feature-engine
Feature-engine includes transformers for:
- Missing data imputation
- Categorical variable encoding
- Outlier capping and removal
- Discretisation
- Numerical variable transformation
Imputation:
- MeanMedianImputer
- RandomSampleImputer
- EndTailImputer
- AddNaNBinaryImputer
- CategoricalVariableImputer
- FrequentCategoryImputer
- ArbitraryNumberImputer

Encoding:
- CountFrequencyCategoricalEncoder
- OrdinalCategoricalEncoder
- MeanCategoricalEncoder
- WoERatioCategoricalEncoder
- OneHotCategoricalEncoder
- RareLabelCategoricalEncoder

Outlier handling:
- Winsorizer
- ArbitraryOutlierCapper
- OutlierTrimmer

Discretisation:
- EqualFrequencyDiscretiser
- EqualWidthDiscretiser
- DecisionTreeDiscretiser

Variable transformation:
- LogTransformer
- ReciprocalTransformer
- PowerTransformer
- BoxCoxTransformer
- YeoJohnsonTransformer
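All of the transformers listed above learn something in fit() and apply it in transform(). As one example, the idea behind equal-width discretisation can be sketched in plain Python: fit learns equally spaced bin edges between the minimum and maximum of the training values, and transform maps each value to the index of its bin. This is a hypothetical illustration of the technique, not the code of EqualWidthDiscretiser; the function names, the left-closed intervals and the clipping of out-of-range values into the first or last bin are assumptions of this sketch.

```python
from bisect import bisect_right


def fit_equal_width(values, bins):
    # Learn bins + 1 equally spaced edges spanning the training range.
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def transform_equal_width(values, edges):
    # Only the interior edges matter for the lookup; a value equal to an
    # interior edge goes to the upper bin, and values below the first or above
    # the last training edge fall into the first or last bin.
    interior = edges[1:-1]
    return [bisect_right(interior, v) for v in values]


train = [0, 10, 20, 30, 40, 50, 60, 70, 80, 100]
edges = fit_equal_width(train, bins=4)
print(edges)                                                # [0.0, 25.0, 50.0, 75.0, 100.0]
print(transform_equal_width([5, 25, 99, 150, -3], edges))   # [0, 1, 3, 3, 0]
```

Because the edges are learned once from the training data, a test value such as 150 that lies outside the training range still receives a valid bin index instead of an error.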
Install from PyPI:
pip install feature_engine
or clone the repository:
git clone https://github.com/solegalli/feature_engine.git
>>> import pandas as pd
>>> from feature_engine.categorical_encoders import RareLabelCategoricalEncoder
>>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
>>> data = pd.DataFrame(data)
>>> data['var_A'].value_counts()
A 10
B 10
C 2
D 1
Name: var_A, dtype: int64
>>> rare_encoder = RareLabelCategoricalEncoder(tol=0.10, n_categories=3)
>>> data_encoded = rare_encoder.fit_transform(data)
>>> data_encoded['var_A'].value_counts()
A 10
B 10
Rare 3
Name: var_A, dtype: int64
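The grouping in this example can be reproduced without the library. The functions below are a hypothetical plain-Python re-implementation of the idea, not the encoder's source: categories whose relative frequency is below tol are replaced by "Rare", and nothing is grouped when the variable has n_categories or fewer distinct values. The function names, the default values and the exact comparison rules (frequency >= tol is kept, grouping only when there are more than n_categories categories) are assumptions.

```python
from collections import Counter


def fit_rare_labels(values, tol=0.05, n_categories=10):
    # Learn which categories are frequent enough to keep.
    counts = Counter(values)
    if len(counts) <= n_categories:
        # Too few categories: keep them all, nothing is grouped.
        return set(counts)
    total = len(values)
    return {cat for cat, n in counts.items() if n / total >= tol}


def transform_rare_labels(values, frequent, replace_with="Rare"):
    # Any category not learned as frequent in fit is replaced.
    return [v if v in frequent else replace_with for v in values]


var_a = ["A"] * 10 + ["B"] * 10 + ["C"] * 2 + ["D"] * 1
frequent = fit_rare_labels(var_a, tol=0.10, n_categories=3)
encoded = transform_rare_labels(var_a, frequent)
print(sorted(frequent))   # ['A', 'B']
print(Counter(encoded))   # Counter({'A': 10, 'B': 10, 'Rare': 3})
```

With 23 observations, C and D have relative frequencies of about 0.087 and 0.043, both below tol=0.10, and the variable has 4 categories, more than n_categories=3, so both are grouped into "Rare".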
See more usage examples in the Jupyter notebooks in the example folder of this repository, or in the documentation: http://feature-engine.readthedocs.io
- Clone the repo and cd into it
- Run: pip install tox
- Run: tox
- If the tests pass, your local setup is complete.
PRs are welcome! Please make sure the CI tests pass on your branch.
BSD 3-Clause
- Soledad Galli - Initial work - Feature Engineering for Machine Learning, Online Course.
Much of the engineering and encoding functionality is inspired by this series of articles from the 2009 KDD competition.
To learn more about the rationale, functionality, pros and cons of each imputer, encoder and transformer, refer to the Feature Engineering for Machine Learning, Online Course.
For a summary of the methods, check this presentation and this article.
To stay informed about the latest releases, sign up at trainindata.