conda create -n auto_ad python=3.11
conda activate auto_ad
cd AutoAD
pip install -r requirements.txt
Also Checkout the Examples in examples.ipynb
.
import pandas as pd
import numpy as np
from sklearn import set_config
set_config(transform_output="pandas")
X_train = pd.DataFrame({
'ID': [1, 2, 3, 4],
'Name': ['Alice', 'Alice', 'Alice', "Alice"],
'Rank': ['A','B','C','D'],
'Age': [25, 30, 35, 40],
'Salary': [50000.00, 60000.50, 75000.75, 8_000],
'Hire Date': pd.to_datetime(['2020-01-15', '2019-05-22', '2018-08-30', '2021-04-12']),
'Is Manager': [False, True, False, ""]
})
X_test = pd.DataFrame({
'ID': [1, 2, 3, 4],
'Name': ['Alice', 'Alice', 'Alice', "Bob"],
'Rank': ['A','B','C','D'],
'Age': [25, 30, 35, np.nan],
'Salary': [50000.00, 60000.50, 75000.75, 8_000_000],
'Hire Date': pd.to_datetime(['2020-01-15', '2019-05-22', '2018-08-30', '2021-04-12']),
'Is Manager': [False, True, False, ""]
})
########################################
import pdb
from autoad.autoad import AutoAD
from pyod.models.iforest import IForest
pipeline_ad = AutoAD()
pipeline_ad.fit(X=X_train, clf_ad=IForest())
X_transformed = pipeline_ad.transform(X=X_test)
X_transformed
Both methods (MOD Z-Value and Tukey Method) are resilient against outliers, ensuring that the position measurement will not be biased. They also support multivariate anomaly detection algorithms in identifying univariate anomalies.
Newest research shows similar results for encoding nominal columns with significantly fewer dimensions.
- (John T. Hancock and Taghi M. Khoshgoftaar. "Survey on categorical data for neural networks." In: Journal of Big Data 7.1 (2020), pp. 1–41.), Tables 2, 4
- (Diogo Seca and João Mendes-Moreira. "Benchmark of Encoders of Nominal Features for Regression." In: World Conference on Information Systems and Technologies. 2021, pp. 146–155.), P. 151
- I used sklearn's Pipeline and Transformer concept to create this preprocessing pipeline