datacleanbot

Automated data cleaning tool. The goal of datacleanbot is: given an arbitrary parsed raw dataset representing a supervised learning problem, automatically identify potential issues and report the results and recommendations to the end user in an effective way.

Install

$ pip install datacleanbot

QuickStart

Install OpenML (version 0.9.0):

OpenML is used to easily import datasets and share models and experiments.

$ pip install openml==0.9.0

On Windows, you need a C++ compiler installed.

Acquire data from OpenML:

>>> import numpy as np
>>> import openml as oml
>>> data = oml.datasets.get_dataset(dataset_id)  # dataset_id: OpenML dataset ID (an integer)
>>> X, y, categorical_indicator, features = data.get_data(target=data.default_target_attribute, dataset_format='array')
>>> # Append the target y as the last column of the feature matrix X
>>> Xy = np.concatenate((X, y.reshape((y.shape[0], 1))), axis=1)
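
To make the layout of Xy concrete without downloading anything, here is a minimal sketch of the same concatenation step on toy arrays. The values are made up and only stand in for what data.get_data returns; the point is that the target ends up as the last column of Xy.

```python
import numpy as np

# Toy stand-ins for the arrays returned by data.get_data (values are made up).
X = np.array([[5.1, 3.5],
              [4.9, np.nan],   # a missing value, encoded as NaN
              [6.2, 2.9]])
y = np.array([0, 0, 1])

# Reshape y from shape (3,) to (3, 1) so it can be appended as a column.
Xy = np.concatenate((X, y.reshape((y.shape[0], 1))), axis=1)

print(Xy.shape)   # (3, 3)
print(Xy[:, -1])  # target column: [0. 0. 1.]
```

Because the result is a single float array, integer labels in y are stored as floats in Xy.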

Autoclean data with datacleanbot:

>>> import datacleanbot.dataclean as dc
>>> Xy = dc.autoclean(Xy, data.name, features)

Description

datacleanbot is equipped with the following capabilities:

- Present an overview report of the given dataset
  - The most important features
  - Statistical information (e.g., mean, max, min)
  - Data types of features
- Clean common data problems in the raw dataset
  - Duplicated records
  - Inconsistent column names
  - Missing values
  - Outliers

The two aspects datacleanbot meaningfully automates are marked in bold.
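
To give a feel for the kinds of problems listed above, here is a small standard-library sketch that flags each of them on a toy table. It is not datacleanbot's implementation: the function names are hypothetical, the data is made up, and the 1.5 × IQR rule for outliers is just one common heuristic chosen for illustration.

```python
import math
import statistics

# Hypothetical toy table: rows of (feature_1, feature_2, label).
rows = [
    (5.1, 3.5, 0.0),
    (4.9, math.nan, 0.0),  # missing value in the second column
    (5.1, 3.5, 0.0),       # exact duplicate of the first row
    (6.2, 2.9, 1.0),
    (5.0, 3.4, 0.0),
    (5.4, 3.0, 1.0),
    (25.0, 3.1, 1.0),      # suspiciously large first feature
]

def find_duplicates(rows):
    """Return indices of rows that repeat an earlier row."""
    seen, dupes = set(), []
    for i, row in enumerate(rows):
        if row in seen:
            dupes.append(i)
        seen.add(row)
    return dupes

def normalize_name(name):
    """Lowercase a column name and join its words with underscores."""
    return "_".join(name.strip().lower().split())

def count_missing(rows):
    """Count NaN entries in each column."""
    return [sum(math.isnan(v) for v in col) for col in zip(*rows)]

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], ignoring NaN."""
    clean = [v for v in values if not math.isnan(v)]
    q1, _, q3 = statistics.quantiles(clean, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in clean if v < low or v > high]

first_column = [row[0] for row in rows]
print(find_duplicates(rows))             # [2]
print(normalize_name(" Sepal Length "))  # sepal_length
print(count_missing(rows))               # [0, 1, 0]
print(iqr_outliers(first_column))        # [25.0]
```

A real tool also has to decide what to do once a problem is found (drop, impute, or keep), which is where the recommendations reported to the user come in.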

User's Guide

The user's guide can be found at datacleanbot.
