- Build the Docker image:
  docker build -t pyspark:latest -f Dockerfile .
- Start the container in interactive mode:
  docker run -p 8888:8888 -it pyspark /bin/bash
- Run the notebook inside the container:
  jupyter notebook
- More details can be found at: https://github.com/jupyter/docker-stacks
Info about Flint: https://github.com/twosigma/flint/tree/master/python (follow the instructions there to install it)
Info about MNE: https://mne.tools/stable/overview/cookbook.html
Use conda or anaconda:
- Install anaconda.
- In the anaconda prompt, navigate to the LearnFromSleep project.
- Create the environment:
  conda env create -f environment.yaml
- Activate the environment:
  conda activate pysleep
The PyCharm IDE provides handy features to accelerate code development.
For the deep neural network models, running on a GPU is recommended.
To check whether a GPU is being used:
import tensorflow as tf
tf.config.list_physical_devices('GPU')
If the output is [], no GPU is being used. If a GPU exists and is visible to TensorFlow, the output will look something like this:
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
High level code organization:
.
├── data
├── Dockerfile
├── docs
├── environment.yaml
├── playground
├── __pycache__
├── pysleep
├── README.md
├── saved_models
└── setup.py
- data: raw data and processed data.
- docs: documentation and references.
- pysleep: Python package developed for this project.
- saved_models: trained models, saved for reuse.
- setup.py: installation script for the pysleep package.
- Dockerfile: Dockerfile for creating an isolated dev environment, if you choose Option 1 (Docker) to set up the dev environment.
- environment.yaml: configuration file for conda, if you choose Option 2 (Conda) to set up the dev environment.
- playground: scripts for EDA, running models, plotting, etc.
- README.md: quick introduction.
A more detailed view of code structure (files may change):
.
├── data
│ ├── physionet_sleep
│ ├── sleep-edf-database-expanded-1.0.0
│ └── sleep-edf-database-expanded-1.0.0.zip
├── Dockerfile
├── docs
│ ├── CSE6250_project_2020Fall.pdf
│ └── Team38_LearningFromSleepData.pdf
├── environment.yaml
├── models.py
├── playground
│ ├── baseline_model.py
│ ├── eda.py
│ ├── experimnents.py
│ ├── explore_models.py
│ ├── RandomForest.ipynb
│ ├── LoadData.ipynb
│ ├── prepare_train_test_dataset.py
│ └── try_data_loader.py
├── pysleep
│ ├── data.py
│ ├── dhedfreader.py
│ ├── __init__.py
│ ├── models.py
│ ├── prepare_physionet.py
│ └── __pycache__
├── README.md
├── reference
│ └── deepsleepnet-master
├── saved_models
│ ├── base_dnn_model.h5
│ └── dnn_model2.h5
└── setup.py
The entry point is the playground folder.
- RandomForest.ipynb contains initial exploration using random forest and nearest neighbors.
- sleep_project_optimized.py contains model fine-tuning and further feature engineering.
- The deeplearnibg folder contains explorations with CNN and LSTM models.
- PhysioNet Online data viewer: https://archive.physionet.org/cgi-bin/atm/ATM
Papers using this dataset:
- Huy Phan, Fernando Andreotti, Navin Cooray, Oliver Y. Chén, and Maarten De Vos. Joint Classification and Prediction CNN Framework for Automatic Sleep Stage Classification. IEEE Transactions on Biomedical Engineering, vol. 66, no. 5, pp. 1285-1296, 2019. https://github.com/pquochuy/MultitaskSleepNet
Both EDF and EDF+ formats are free and can be viewed using free software such as:
- Polyman (for MS-Windows only; for details, please follow the link)
- EDFbrowser (for Linux, Mac OS X, and MS-Windows; at www.teuniz.net)
- LightWAVE and the PhysioBank ATM, platform-independent web applications from PhysioNet
- WAVE and other applications for Linux, Mac OS X, and MS-Windows in the WFDB Software Package, also from PhysioNet

Another dataset:
- ISRUC_Sleep: https://sleeptight.isr.uc.pt/ISRUC_Sleep/
Transform EEG data into numpy array or pandas DataFrame: https://github.com/Zhao-Kuangshi/sleep-edf-converter/blob/master/annotation_convertor.py
Note: I have trouble processing this file:
sleep-edf-database-expanded-1.0.0\sleep-cassette\SC4362F0-PSG.edf
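One way to keep a single unreadable recording from stopping a batch conversion is to catch the error per file and collect the failures for later inspection. The sketch below assumes the PSG files follow the `*-PSG.edf` naming seen above; `load_recordings` and its `loader` argument are hypothetical names, not part of this project.

```python
from pathlib import Path


def load_recordings(root, loader, pattern="*-PSG.edf"):
    """Apply `loader` to every matching file under `root`.

    `loader` is any callable that takes a Path and returns the loaded
    recording (hypothetical; supply your own EDF reader).
    Returns (loaded, failed): a dict of results keyed by file name, and
    a list of (file name, error description) for files that raised.
    """
    loaded = {}
    failed = []
    for path in sorted(Path(root).rglob(pattern)):
        try:
            loaded[path.name] = loader(path)
        except Exception as exc:  # deliberately broad: log and move on
            failed.append((path.name, repr(exc)))
    return loaded, failed
```

With MNE installed, `loader` could be something like `lambda p: mne.io.read_raw_edf(p)`; the `failed` list then shows which files need a closer look.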
https://paperswithcode.com/task/sleep-stage-detection
https://github.com/SuperBruceJia/EEG-DL
https://github.com/akaraspt/deepsleepnet
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0216456
https://github.com/MousaviSajad/SleepEEGNet
This Medium article presents a simple CNN model that is easy to reproduce:
https://towardsdatascience.com/sleep-stage-classification-from-single-channel-eeg-using-convolutional-neural-networks-5c710d92d38e
https://github.com/CVxTz/EEG_classification
A single sleep epoch (30 s at 100 Hz = 3000 data points) needs to be encoded into a vector/tensor. For a sequence of sleep epochs (for example, 8 hours of sleep represented as consecutive 30 s epochs), the model outputs a sequence of categories, one per epoch.
Use a CNN, SVM, or other classification model to encode single sleep epochs, and an LSTM for sequence classification.
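The epoch arithmetic above can be sketched in a few lines. This is a minimal illustration in plain Python, assuming a single-channel signal held as a flat sequence of samples; the function names are made up for the example, and with NumPy the same split is usually a single reshape.

```python
SAMPLING_RATE_HZ = 100   # sampling rate quoted above
EPOCH_SECONDS = 30       # length of one sleep epoch
SAMPLES_PER_EPOCH = SAMPLING_RATE_HZ * EPOCH_SECONDS  # 30 * 100 = 3000


def split_into_epochs(signal, samples_per_epoch=SAMPLES_PER_EPOCH):
    """Cut a 1-D signal into consecutive, non-overlapping epochs.

    Trailing samples that do not fill a whole epoch are dropped.
    """
    n_epochs = len(signal) // samples_per_epoch
    return [
        signal[i * samples_per_epoch:(i + 1) * samples_per_epoch]
        for i in range(n_epochs)
    ]


def epochs_in_recording(hours, epoch_seconds=EPOCH_SECONDS):
    """Number of whole epochs in a recording of the given length."""
    return int(hours * 3600) // epoch_seconds
```

For an 8-hour night, `epochs_in_recording(8)` gives 960 epochs, so a sequence model would see 960 inputs of 3000 samples each and emit 960 stage labels.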
To launch TensorBoard, run from the command line:
tensorboard --logdir playground\logs\fit
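A common convention is to write each training run into its own timestamped subdirectory of the log folder, so TensorBoard can compare runs side by side. Whether the scripts in this project do that is an assumption; `make_run_log_dir` is a hypothetical helper, and only the base path comes from the command above.

```python
from datetime import datetime
from pathlib import Path


def make_run_log_dir(base="playground/logs/fit", now=None):
    """Return a per-run log directory such as .../fit/20201101-093000.

    `base` mirrors the --logdir used above; `now` can be injected
    to make the result reproducible.
    """
    now = now or datetime.now()
    return Path(base) / now.strftime("%Y%m%d-%H%M%S")


# With Keras, the path would typically be handed to the TensorBoard callback:
#   tf.keras.callbacks.TensorBoard(log_dir=str(make_run_log_dir()))
```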
- Find dataset
- Data ETL (raw dataset to numpy arrays)
  - raw dataset to numpy arrays
  - data + labels
  - train, val, test split
- Feature engineering
  - automatic FE? (e.g. use a CNN to extract features)
  - handcrafted features
  - normalization?
  - data augmentation? (e.g. add noise)
- Data imbalance
  - SMOTE
  - oversampling
- Explore model architecture
  - conventional ML models: SVM, tree-based models
  - deep networks: CNN, RNN, LSTM
- Visualize
- Write report or paper if time permits. Google Doc link: https://docs.google.com/document/d/1fRVr9LfeSHq5dsV8gXI2MyE2kPCZxkOKsmP9IM1i88U/edit#
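Two of the steps above, the train/val/test split and oversampling for class imbalance, can be sketched with the standard library alone. This is an illustration under assumptions, not the project's implementation: the function names and split fractions are made up, and the oversampling shown is plain random duplication, not SMOTE (which synthesizes new samples by interpolation).

```python
import random
from collections import defaultdict


def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle `items` and cut them into train, val and test lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until every class matches the largest."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    target = max(len(group) for group in by_label.values())
    out_samples, out_labels = [], []
    for label, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels
```

Passing recording or subject IDs as `items`, rather than individual epochs, keeps epochs from one night out of both train and test. Oversampling is best applied to the training portion only, so duplicated samples never leak into validation or test data.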