dirty_cat

dirty_cat is a Python module for machine learning on dirty categorical variables.

dirty_cat's SuperVectorizer automatically turns pandas DataFrames into numerical arrays suitable for learning.

For a detailed description of the problem of encoding dirty categorical data, see "Similarity encoding for learning with dirty categorical variables" [1] and "Encoding high-cardinality string categorical variables" [2].
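The core idea behind similarity encoding [1] is to represent each category by how similar its string is to a set of reference ("prototype") categories, instead of by an orthogonal one-hot column, so that variants and typos of the same entity land close together. The following is a minimal pure-Python sketch of that idea, assuming a Jaccard-style overlap of space-padded character 3-grams as the string similarity. The function names are illustrative only and are not part of dirty_cat's API.

```python
from typing import List, Set


def char_ngrams(s: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of s, padded with one space on each side."""
    padded = f" {s} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the n-gram sets of a and b: shared n-grams / all distinct n-grams."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    union = ga | gb
    if not union:
        # Neither string has any n-gram: only treat them as similar if they are equal.
        return 1.0 if a == b else 0.0
    return len(ga & gb) / len(union)


def similarity_encode(values: List[str], prototypes: List[str], n: int = 3) -> List[List[float]]:
    """Encode each value as its vector of similarities to the prototype categories."""
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]


if __name__ == "__main__":
    # "Londres" is an unseen variant; it still gets a non-zero score for "London".
    for row in similarity_encode(["London", "Londres", "Paris"], ["London", "Paris"]):
        print(row)
```

With the prototypes "London" and "Paris", the unseen variant "Londres" shares 3 of 10 distinct 3-grams with "London" and is encoded as [0.3, 0.0], whereas a one-hot encoder would give it a column of its own (or no signal at all at prediction time).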

Installation

Dependencies

dirty_cat requires:

Python (>= 3.8)
NumPy (>= 1.17.3)
SciPy (>= 1.4.0)
scikit-learn (>= 0.22.0)
pandas (>= 1.1.5)

User installation

If you already have a working installation of NumPy and SciPy, the easiest way to install dirty_cat is with pip:

pip install -U --user dirty_cat

Other implementations

Spark ML: https://github.com/rakutentech/spark-dirty-cat

References

[1]	Patricio Cerda, Gaël Varoquaux, Balázs Kégl. Similarity encoding for learning with dirty categorical variables. 2018. Machine Learning journal, Springer.

[2]	Patricio Cerda, Gaël Varoquaux. Encoding high-cardinality string categorical variables. 2020. IEEE Transactions on Knowledge & Data Engineering.
