Adegorical

Adegorical is a Python package for performing advanced transformations on categorical data. It is particularly useful in regression analysis but can be applied to other machine learning techniques (at your own peril).

Getting Started

ad.get_categorical returns the same kind of data structure it is given:

Pandas series input returns a pandas dataframe
Numpy column input returns a numpy array
Python list input returns a list of lists

import adegorical as ad

encoding_types = ad.help()
print(encoding_types)
# ['dummy', 'binary', 'simple_contrast', 'simple_regression', 'backward_difference_contrast', 'forward_difference_contrast', 'simple_helmert']

The encoding methods in this package build on the work in UCLA's Advance Categorical Variable Encoding page and a presentation by Harris Holly. Unfortunately, UCLA has removed the page from its website. An archived version can be found in this repository.

Dummy

Dummy encoding is the standard approach to categorical variable encoding. It produces N-1 columns, where N is the number of unique categories. The omitted category is the reference value, and its rows are all zeros.

import pandas as pd

colors = ['yellow', 'red', 'green', 'wenge', 'orange', 'red', 'yellow', 'blue', 'magenta', 'wenge']
df = pd.DataFrame({'colors': colors})

# Encode the 'colors' column as N-1 dummy columns
categorical_frame = ad.get_categorical(df['colors'],
                                       encoding='dummy',
                                       column_name=None)

yellow_dummy	wenge_dummy	red_dummy	green_dummy	magenta_dummy	blue_dummy
1	0	0	0	0	0
0	0	1	0	0	0
0	0	0	1	0	0
0	1	0	0	0	0
0	0	0	0	0	0
0	0	1	0	0	0
1	0	0	0	0	0
0	0	0	0	0	1
0	0	0	0	1	0
0	1	0	0	0	0
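
To make the N-1 rule concrete, here is a minimal pure-Python sketch of dummy encoding on a plain list. The helper name dummy_encode and its parameters are hypothetical and are not part of adegorical. The reference is passed explicitly as 'orange' because the all-zero row in the table above suggests orange was the reference there, which is an inference rather than documented behavior. Columns follow order of first appearance, so their order may differ from the package's output.

```python
def dummy_encode(values, reference, suffix="dummy"):
    """Return (column_names, rows) with one 0/1 column per non-reference category.

    Hypothetical helper for illustration only; not adegorical's implementation.
    """
    if reference not in values:
        raise ValueError(f"reference {reference!r} does not occur in values")
    # Unique categories in order of first appearance, with the reference dropped,
    # which leaves N-1 columns for N unique categories.
    levels = [v for v in dict.fromkeys(values) if v != reference]
    names = [f"{level}_{suffix}" for level in levels]
    # The reference matches no column, so its rows are all zeros.
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return names, rows


colors = ['yellow', 'red', 'green', 'wenge', 'orange',
          'red', 'yellow', 'blue', 'magenta', 'wenge']
names, rows = dummy_encode(colors, reference='orange')

print(names)
for row in rows:
    print(row)
```

Seven unique colors give six columns, and each non-reference row contains exactly one 1.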

Binary

Read in sequence, the columns form a binary representation of each category. The number of columns equals the length of the binary string for the number of unique categories (for example, 7 unique categories is 111 in binary, so 3 columns are expected).

colors = ['yellow', 'red', 'green', 'wenge', 'orange', 'red', 'yellow', 'blue', 'magenta', 'wenge']
df = pd.DataFrame({'colors': colors})

# Binary-encode the 'colors' column; columns are named binary_1, binary_2, ...
categorical_frame = ad.get_categorical(df['colors'],
                                       encoding='binary',
                                       reference='red',
                                       column_name='binary')

binary_1	binary_2	binary_3
0	0	0
1	1	0
0	1	1
0	0	1
0	1	0
1	1	0
0	0	0
1	0	1
1	0	0
0	0	1
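
The column-count rule for binary encoding can be sketched in pure Python as follows. The helper binary_encode is hypothetical and not part of adegorical. How the package assigns codes to categories (and what role reference plays for this encoding) is not documented here, so this sketch simply numbers categories in order of first appearance. The individual bit patterns will therefore not match the table above, only the number of columns will.

```python
def binary_encode(values, prefix="binary"):
    """Return (column_names, rows) where each row is a fixed-width binary code.

    Hypothetical helper for illustration only; the code assignment is an assumption.
    """
    # Unique categories in order of first appearance.
    levels = list(dict.fromkeys(values))
    # Width = length of the binary string of the number of unique categories.
    width = len(format(len(levels), "b"))
    # Assign each category its index, zero-padded to the common width.
    codes = {level: format(i, f"0{width}b") for i, level in enumerate(levels)}
    names = [f"{prefix}_{k + 1}" for k in range(width)]
    rows = [[int(bit) for bit in codes[v]] for v in values]
    return names, rows


colors = ['yellow', 'red', 'green', 'wenge', 'orange',
          'red', 'yellow', 'blue', 'magenta', 'wenge']
names, rows = binary_encode(colors)

print(names)
for row in rows:
    print(row)
```

With 7 unique colors the width is 3, so three columns are enough to give every category a distinct code.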

Simple Contrast

Instead of all zeros for the reference value, as with dummy variables, every entry in the reference row becomes -1. N-1 columns are expected.

colors = ['yellow', 'red', 'green', 'wenge', 'orange', 'red', 'yellow', 'blue', 'magenta', 'wenge']
df = pd.DataFrame({'colors': colors})

# Simple contrast encoding with 'red' as the reference value
categorical_frame = ad.get_categorical(df['colors'],
                                       encoding='simple_contrast',
                                       reference='red',
                                       column_name='simple_contrast')

yellow_simple_contrast	wenge_simple_contrast	orange_simple_contrast	green_simple_contrast	magenta_simple_contrast	blue_simple_contrast
1	0	0	0	0	0
-1	-1	-1	-1	-1	-1
0	0	0	1	0	0
0	1	0	0	0	0
0	0	1	0	0	0
-1	-1	-1	-1	-1	-1
1	0	0	0	0	0
0	0	0	0	0	1
0	0	0	0	1	0
0	1	0	0	0	0
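
The difference from dummy encoding is confined to the reference rows, as the following pure-Python sketch shows. The helper simple_contrast_encode is hypothetical and not part of adegorical. It orders columns by first appearance, so the column order may differ from the table above even though the values per category agree.

```python
def simple_contrast_encode(values, reference, suffix="simple_contrast"):
    """Return (column_names, rows); reference rows are filled with -1.

    Hypothetical helper for illustration only; not adegorical's implementation.
    """
    if reference not in values:
        raise ValueError(f"reference {reference!r} does not occur in values")
    # N-1 columns: every unique category except the reference.
    levels = [v for v in dict.fromkeys(values) if v != reference]
    names = [f"{level}_{suffix}" for level in levels]
    rows = []
    for v in values:
        if v == reference:
            # Reference rows are -1 in every column instead of all zeros.
            rows.append([-1] * len(levels))
        else:
            rows.append([1 if v == level else 0 for level in levels])
    return names, rows


colors = ['yellow', 'red', 'green', 'wenge', 'orange',
          'red', 'yellow', 'blue', 'magenta', 'wenge']
names, rows = simple_contrast_encode(colors, reference='red')

print(names)
for row in rows:
    print(row)
```

Both 'red' rows come out as six -1 entries, while every other row is identical to what dummy encoding with the same reference would give.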

Todo

Enhancements

Encoding Methods

Forward Difference Regression
Backward Difference Regression
Simple Helmert Regression
Reverse Helmert
Polynomial
Regression Polynomial
Deviation
Deviation Regression

Performance

Manipulate data in its native format rather than converting to lists and back (e.g. for pandas input, transform via optimized pandas methods)

Miscellaneous

Redo the column naming convention for binary encoding. Results are a combination of columns, so having a "blue" column doesn't make much sense

Readme

Simple Regression
Backward Difference Contrast
Forward Difference Contrast
Simple Helmert
Remaining encoding methods listed under Encoding Methods in this todo

joshuabragge/adegorical
