The Jupyter notebooks in this directory follow the code examples in Real Python's "Practical K-Means Clustering in Python" article. The article has two main sections with code: the first works with synthetic data, and the second begins where the TCGA cancer gene expression dataset is introduced.
Follow the instructions below to get up and running with a Jupyter notebook and all the code from the article.
These notebooks have dependencies. One way to install them is with the Anaconda Python distribution:
(base) $ conda install jupyter matplotlib numpy pandas seaborn scikit-learn
(base) $ conda install -c conda-forge kneed
You can also install all the requirements using pip and the requirements.txt file included in this directory:
$ python3 -m pip install -r requirements.txt
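After installing with either method, it can help to confirm that the packages are visible to the Python interpreter you plan to run Jupyter with. The following is a minimal sketch, not part of the article: the helper name `missing_modules` and the `REQUIRED_MODULES` list are hypothetical, and the list assumes the import names that correspond to the conda packages above (scikit-learn is imported as `sklearn`, and `notebook` is assumed to be the package behind the `jupyter notebook` command).

```python
from importlib.util import find_spec

# Assumed import names for the packages installed above.
# Note: scikit-learn is imported as "sklearn"; "notebook" is an assumption
# about which package backs the `jupyter notebook` command.
REQUIRED_MODULES = [
    "notebook",
    "matplotlib",
    "numpy",
    "pandas",
    "seaborn",
    "sklearn",
    "kneed",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If anything is reported missing, rerun the install commands above in the same environment that this script was run in.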
Open the notebook that accompanies the sections of the article that work with synthetic data:
(base) $ jupyter notebook practical-kmeans-synthetic.ipynb
Open the notebook that accompanies the sections of the article that work with TCGA cancer gene expression data:
(base) $ jupyter notebook practical-kmeans-cancer-gene-expression.ipynb