Computational prediction of patients diagnosis and feature selection applied to a mesothelioma dataset

davidechicco/mesothelioma-diagnosis-predictions


Installation

To run the scripts, you need to have installed:

  • R (version 3.3.2)
  • R packages rgl, clusterSim and randomForest
  • Python 3
  • Python package xlsx2csv
  • git (version 1.8.3.1)
  • Torch (version 7)
  • LuaRocks (version 2.3.0)
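Before following the per-system instructions, it can help to see which of these tools are already on the machine. The snippet below is a minimal sketch, not part of the repository: it assumes the tools are exposed under the executable names used elsewhere in this README (`Rscript`, `python3`, `xlsx2csv`, `git`, `th`, `luarocks`), and the helper name `missing_tools` is hypothetical. It does not check versions or R packages.

```python
import shutil

# Executable names as they appear in the commands of this README (assumed).
REQUIRED_TOOLS = ["Rscript", "python3", "xlsx2csv", "git", "th", "luarocks"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on the PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools were found on PATH.")
```

Any tool reported as missing can then be installed with the instructions for your operating system.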

You need root privileges, an internet connection, and at least 1 GB of free disk space. Below are the instructions to install all the needed programs and dependencies on Linux CentOS, Linux Ubuntu, and Mac OS. Our scripts were originally developed on a Linux Ubuntu computer.

Dependency installation for Linux Ubuntu

Here are the instructions to install all the programs and libraries needed by our scripts on a Linux Ubuntu computer, from a shell terminal. We tested these instructions in February 2017 on a Dell Latitude 3540 laptop running the Linux Ubuntu 16.10 operating system with a 64-bit kernel. If you are using another operating system version, some instructions might be slightly different.

(Optional) First of all, update:
sudo apt-get update

Install R and its rgl, clusterSim, randomForest packages:
sudo apt-get -y install r-base-core
sudo apt-get -y install r-cran-rgl
sudo Rscript -e 'install.packages(c("rgl", "clusterSim", "randomForest"), repos="https://cran.rstudio.com")'

Install xlsx2csv and git:
sudo apt-get -y install xlsx2csv
sudo apt-get -y install git

Install Torch and luarocks:
# in a terminal, run the commands WITHOUT sudo
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh

source ~/.bashrc
cd ~

sudo apt-get -y install luarocks
sudo luarocks install csv

Dependency installation for Linux CentOS

Here are the instructions to install all the programs and libraries needed by our scripts on a Linux CentOS computer, from a shell terminal. We tested these instructions in February 2017 on a Dell Latitude 3540 laptop running the Linux Ubuntu 16.10 operating system with a 64-bit kernel. If you are using another operating system version, some instructions might be slightly different.

(Optional) First of all, update:
sudo yum -y update

Install R, its dependencies, and its rgl, clusterSim, randomForest packages:
sudo yum -y install R
sudo yum -y install mesa-libGL
sudo yum -y install mesa-libGL-devel
sudo yum -y install mesa-libGLU
sudo yum -y install mesa-libGLU-devel
sudo yum -y install libpng-devel
sudo Rscript -e 'install.packages(c("rgl", "clusterSim", "randomForest"), repos="https://cran.rstudio.com")'

Install Python, its dependencies, and its packages pip and xlsx2csv:
sudo yum -y install python
sudo yum -y install epel-release
sudo yum -y install python-pip
sudo pip install xlsx2csv

Install Torch and luarocks:
sudo yum -y install git
# in a terminal, run the commands WITHOUT sudo
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh

source ~/.bashrc
cd ~

sudo yum -y install luarocks
sudo luarocks install csv

Dependency installation for Mac OS

Here are the instructions to install all the programs and libraries needed by our scripts on a Mac computer, from a shell terminal. We tested these instructions in March 2017 on an Apple computer running macOS 10.12.2 Sierra. If you are using another operating system version, some instructions might be slightly different.

(Optional) First of all, update:
sudo softwareupdate -iva

Manually download and install XQuartz from https://www.xquartz.org

Install R and its packages:
brew install r
sudo Rscript -e 'install.packages(c("rgl", "clusterSim", "randomForest"), repos="https://cran.rstudio.com")'

Install rudix:
curl -O https://raw.githubusercontent.com/rudix-mac/rpm/2016.12.13/rudix.py
sudo python rudix.py install rudix

Install the development tools (such as gcc):
xcode-select --install

Install xlsx2csv:
sudo easy_install xlsx2csv

Install Torch and luarocks:
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps
./install.sh
cd ~

brew install lua
source ~/.profile
sudo luarocks install csv

Dataset preparation

Move to the project main directory, then run the preparation script, which downloads the mesothelioma dataset file, normalizes the columns, and removes the "diagnosis method" feature (which is a duplicate of the target feature "class of diagnosis"):

cd /mesothelioma-diagnosis-predictions/

./script_prepare_dataset.sh
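To make the two data transformations concrete, here is a minimal sketch of dropping a column and normalizing the remaining ones. It is not the code of `script_prepare_dataset.sh`: it assumes min-max scaling to the range [0, 1] (the script's actual normalization may differ), and the toy table, its values, and the helper names `drop_column` and `min_max_normalize` are made up for illustration.

```python
import csv
import io


def drop_column(header, rows, name):
    """Return (header, rows) without the column called `name`."""
    idx = header.index(name)
    new_header = header[:idx] + header[idx + 1:]
    new_rows = [row[:idx] + row[idx + 1:] for row in rows]
    return new_header, new_rows


def min_max_normalize(rows):
    """Scale every column of a numeric table to the range [0, 1]."""
    scaled_columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column has zero span; map all of its values to 0.0.
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]


# Toy table with made-up values (assumption: all features are numeric).
SAMPLE = """age,diagnosis method,class of diagnosis
40,1,1
60,0,0
50,1,1
"""

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)
rows = [[float(value) for value in line] for line in reader]

header, rows = drop_column(header, rows, "diagnosis method")
rows = min_max_normalize(rows)
print(header)
print(rows)
```

Removing "diagnosis method" before training matters because, as a duplicate of the target, it would let any classifier predict the diagnosis trivially.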

Execution for all (Linux Ubuntu, Linux CentOS, and Mac)

Diagnosis prediction

To run the Torch software of the perceptron-based artificial neural network:
th mesothelioma_ann_script_val.lua

To run the Python 3 software of the probabilistic neural network:
python3 pnn_mesothelioma_initial_py3.py

To run the R software of the random forest classifier:
Rscript random_forests_class.r

To run the R software of the CART classifier:
Rscript cart.r

To run the R software of the one rule classifier:
Rscript oner_class.r
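The five prediction programs above can also be launched one after another from a small driver. The following is only a sketch under assumptions: it presumes you are in the project main directory with the dataset already prepared, and the names `COMMANDS` and `run_all` are hypothetical and not part of the repository. As written it only prints the commands; call `run_all(COMMANDS)` to actually execute them.

```python
import shlex
import subprocess

# Command lines copied from the instructions in this README.
COMMANDS = [
    ["th", "mesothelioma_ann_script_val.lua"],
    ["python3", "pnn_mesothelioma_initial_py3.py"],
    ["Rscript", "random_forests_class.r"],
    ["Rscript", "cart.r"],
    ["Rscript", "oner_class.r"],
]


def run_all(commands, runner=subprocess.run):
    """Run each command in order and return {command line: exit code}.

    The exit code is None when the executable cannot be found.
    """
    results = {}
    for cmd in commands:
        try:
            code = runner(cmd).returncode
        except FileNotFoundError:
            code = None  # interpreter not installed or not on the PATH
        results[shlex.join(cmd)] = code
    return results


# Dry run: show what would be executed without starting anything.
for cmd in COMMANDS:
    print(shlex.join(cmd))
```

A non-zero or missing exit code in the returned dictionary points to the program that needs attention, without stopping the remaining runs.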

Feature selection

To run the random forest R code for feature selection:
Rscript random_forests.r Mesothelioma_data_set_COL_NORM.csv

Reference

More information about this project can be found in this paper:

Davide Chicco and Cristina Rovelli, "Computational prediction of diagnosis and feature selection on mesothelioma patient health records", PLoS ONE 14(1): e0208737, 2019. https://doi.org/10.1371/journal.pone.0208737

License

All the software code is licensed under the GNU General Public License, version 2 (GPLv2).
The mesothelioma dataset is publicly available on the website of the University of California Irvine Machine Learning Repository, under its copyright license.

Contacts

This software was developed by Davide Chicco at the Princess Margaret Cancer Centre and at the Peter Munk Cardiac Centre (Toronto, Ontario, Canada).
For questions or help, please write to davidechicco(AT)davidechicco.it
