Practical K-Means Clustering in Python

The Jupyter notebooks in this directory follow the code examples in Real Python's Practical K-Means Clustering in Python article. The article is structured such that there are two main sections with code. The first section works with synthetic data. The second section starts when the TCGA cancer gene expression dataset is introduced.

Getting Started

Follow the instructions below to get up and running with a Jupyter notebook and all the code from the article.

Install Dependencies

These notebooks have dependencies. One way to install these dependencies is to use the Anaconda Python distribution.

(base) $ conda install jupyter matplotlib numpy pandas seaborn scikit-learn
(base) $ conda install -c conda-forge kneed

You can also install all the requirements using pip and the requirements.txt file included in this directory.

$ python3 -m pip install -r requirements.txt

Synthetic Data Notebook

Open the notebook that accompanies the sections of the article that work with synthetic data:

(base) $ jupyter notebook practical-kmeans-synthetic.ipynb

Cancer Gene Expression Data Notebook

Open the notebook that accompanies the sections of the article that work with TCGA cancer gene expression data:

(base) $ jupyter notebook practical-kmeans-cancer-gene-expression.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Practical K-Means Clustering in Python

Getting Started

Install Dependencies

Synthetic Data Notebook

Cancer Gene Expression Data Notebook

Files

README.md

Latest commit

History

README.md

File metadata and controls

Practical K-Means Clustering in Python

Getting Started

Install Dependencies

Synthetic Data Notebook

Cancer Gene Expression Data Notebook