Skip to content
forked from nicodv/kmodes

Implementation of the k-modes clustering algorithm for categorical data, and several of its variations.

License

Notifications You must be signed in to change notification settings

raleighgee/kmodes

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

98 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Test Status Test Coverage Code Health

kmodes

Description

Python implementations of the k-modes and k-prototypes clustering algorithms. Relies on numpy for a lot of the heavy lifting.

k-modes is used for clustering categorical variables. It defines clusters based on the number of matching categories between data points. (This is in contrast to the more well-known k-means algorithm, which clusters numerical data based on Euclidean distance.) The k-prototypes algorithm combines k-modes and k-means and is able to cluster mixed numerical / categorical data.

Implemented are:

  • k-modes [1][2]
  • k-modes with initialization based on density [3]
  • k-prototypes [1]

The code is modeled after the k-means module in scikit-learn and has the same familiar interface.

Usage examples of both k-modes ('soybean.py') and k-prototypes ('stocks.py') are included.

I would love to have more people play around with this and give me feedback on my implementation.

Enjoy!

Usage

```python import numpy as np from kmodes import kmodes

# random categorical data data = np.random.choice(20, (100, 10))

km = kmodes.KModes(n_clusters=4, init='Huang', n_init=5, verbose=1) km.fit_predict(data) ```

References

[1] Huang, Z.: Clustering large data sets with mixed numeric and categorical values, Proceedings of the First Pacific Asia Knowledge Discovery and Data Mining Conference, Singapore, pp. 21-34, 1997.

[2] Huang, Z.: Extensions to the k-modes algorithm for clustering large data sets with categorical values, Data Mining and Knowledge Discovery 2(3), pp. 283-304, 1998.

[3] Cao, F., Liang, J, Bai, L.: A new initialization method for categorical data clustering, Expert Systems with Applications 36(7), pp. 10223-10228., 2009.

About

Implementation of the k-modes clustering algorithm for categorical data, and several of its variations.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%