tcampbell90/SparkConsensusClustering

Credit to Konur Unyelioglu, whose code and explanation of consensus clustering helped me apply this project in real life. See his original article and his git repo for consensus clustering.

SparkConsensusClustering

This project uses the Apache Spark MLlib clustering library to explore data sets with unsupervised machine learning techniques. The scripts determine the optimal number of clusters, compare the performance of three clustering algorithms via consensus clustering, and finally use a trick to examine which features matter most in determining cluster membership, by reading the Random Forest feature importance vector.
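To make the consensus idea concrete, here is a minimal sketch in plain Python rather than Spark. It is not the code from this repo. It assumes each of the three algorithms has already produced one cluster label per data point, and it builds a co-association matrix, a common ingredient of consensus clustering: the fraction of clusterings that place each pair of points in the same cluster. The function name `consensus_matrix` and the sample labels are hypothetical, chosen for illustration only.

```python
from itertools import combinations


def consensus_matrix(labelings):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    clusterings that put points i and j in the same cluster.

    `labelings` is a list of label lists, one per clustering run
    (hypothetical input; in practice these could be the predictions
    of three fitted clustering models over the same rows).
    """
    n = len(labelings[0])
    if any(len(labels) != n for labels in labelings):
        raise ValueError("every labeling must cover the same points")

    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # a point always co-clusters with itself

    for i, j in combinations(range(n), 2):
        # Cluster ids are arbitrary per run, so only compare labels
        # within the same run, never across runs.
        agree = sum(1 for labels in labelings if labels[i] == labels[j])
        matrix[i][j] = matrix[j][i] = agree / len(labelings)
    return matrix


# Made-up labels for five points from three clustering runs.
labelings = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
matrix = consensus_matrix(labelings)
for row in matrix:
    print(["%.2f" % value for value in row])
```

Entries near 1.0 mark pairs that the runs agree belong together, entries near 0.0 mark pairs they agree to separate, and values in between flag points whose assignment depends on the algorithm. Because the matrix compares labels only within a run, it is unaffected by each algorithm numbering its clusters differently.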
