GitHub - shahcompbio/scvis at fe42d424714c6b65c9f6e22981749439f43202fb


scvis is a Python package for dimension reduction of high-dimensional biological data, especially single-cell RNA-sequencing data.


# License

scvis is free for academic/non-profit use.


# Versions

## 0.1.0


# Installation

To install scvis, first make sure the libraries listed below are installed.
Then install scvis from the terminal: python setup.py install

Dependencies:

tensorflow >= 1.1
PyYAML >= 3.11
matplotlib >= 1.5.1
numpy >= 1.11.1
pandas >= 0.19.1


# How to use

After installing scvis, you can use the scvis command.

1. The 'train' command learns a probabilistic parametric mapping (adjust the file paths to match where the files are on your system):
scvis train --data_matrix_file ./data/bipolar_pca100.tsv \
    --out_dir ./output/bipolar \
    --data_label_file ./data/bipolar_label.tsv \
    --verbose_interval 50

--data_matrix_file: a high-dimensional data matrix in tab-delimited format, with the first row as the column names. Each row represents a data point, e.g., the expression profile of a cell.
--out_dir: path for output files
--data_label_file: a one-column file (with a column header) giving the cluster label of each data point; it is used only for coloring the scatter plots
--verbose_interval: the mini-batch interval at which running information is shown
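
To make the two input formats concrete, here is a small sketch that writes a toy data matrix and a matching label file using only the Python standard library. It is an illustration under assumptions, not a description of how the bundled data files were produced: the column names (feature_0, ...), the label header ("label"), and the absence of a row-name column are all choices made for this example.

```python
import csv
import os
import random
import tempfile


def write_toy_inputs(matrix_path, label_path, n_points=6, n_features=4, seed=0):
    """Write a tab-delimited matrix (header + one row per data point)
    and a one-column label file (header + one label per data point)."""
    rng = random.Random(seed)
    header = ["feature_%d" % j for j in range(n_features)]  # hypothetical names
    with open(matrix_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        for _ in range(n_points):
            writer.writerow(["%.4f" % rng.gauss(0.0, 1.0) for _ in range(n_features)])
    with open(label_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["label"])  # hypothetical header name
        for i in range(n_points):
            writer.writerow(["cluster_%d" % (i % 2)])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        matrix_file = os.path.join(tmp, "toy_matrix.tsv")
        label_file = os.path.join(tmp, "toy_label.tsv")
        write_toy_inputs(matrix_file, label_file)
        with open(matrix_file) as handle:
            print(handle.read())
```

The label file has exactly as many rows after the header as the matrix has data points, in the same order.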

A trained model is saved in the folder ./output/bipolar/model/
In addition to the model file, the low-dimensional embedding and the log-likelihoods are written to two files in ./output/bipolar,
and are shown as two scatter plots, colored by the given label information and by the log-likelihoods respectively (the log-likelihood files are named *_log_likelihood.tsv and *_log_likelihood.png).
The different components of the objective function are also saved to a file (*_obj.tsv) and shown in a graph (*_obj.png).
To plot intermediate embeddings during optimization, set the flag: --show_plot=True
By default, the data in data_matrix_file is normalized by its maximum absolute value (--normalize=True).
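
As an illustration of what normalizing by the maximum absolute value means, the sketch below divides every entry by the single largest absolute value in the matrix. This is an assumption about the scheme (one global scale rather than one per column) and is not taken from the scvis source; the function name max_abs_normalize is made up for this example.

```python
def max_abs_normalize(matrix):
    """Divide every entry by the largest absolute value in the matrix.

    Returns the scaled matrix and the scale that was used.
    """
    scale = max(abs(value) for row in matrix for value in row)
    if scale == 0:
        # An all-zero matrix has nothing to rescale.
        return [list(row) for row in matrix], 1.0
    return [[value / scale for value in row] for row in matrix], scale


if __name__ == "__main__":
    scaled, scale = max_abs_normalize([[1.0, -4.0], [2.0, 0.5]])
    print(scale)   # 4.0
    print(scaled)  # [[0.25, -1.0], [0.5, 0.125]]
```

After scaling, all values lie in the range [-1, 1].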


2. After learning a probabilistic parametric mapping, the 'map' command can be used to embed new data into an existing embedding:
scvis map --data_matrix_file ./data/retina_pca100_bipolar_normalized.tsv \
    --out_dir ./output/retina \
    --pretrained_model_file ./output/bipolar/model/xxx.ckpt

--data_matrix_file: a high-dimensional data matrix in tab-delimited format, with the first row as the column names
--out_dir: path for output files
--pretrained_model_file: a pre-trained scvis model produced by 'scvis train', where 'xxx' should be replaced by the checkpoint file prefix in the model folder.

As with the 'train' command, this command outputs the likelihood files and the low-dimensional embedding files, but not the model files or the objective function trace file and plots.
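
Once a run has finished, the output files can be collected by name pattern. The sketch below is a convenience helper, not part of scvis. It relies only on the *_log_likelihood and *_obj suffixes mentioned in this README and assumes that 'map' uses the same log-likelihood naming as 'train'. The function name find_outputs is made up, and since the embedding file names are not given here, they are not matched.

```python
from pathlib import Path


def find_outputs(out_dir):
    """Group files in out_dir by the suffixes documented in this README."""
    out = Path(out_dir)
    return {
        "log_likelihood_tsv": sorted(out.glob("*_log_likelihood.tsv")),
        "log_likelihood_png": sorted(out.glob("*_log_likelihood.png")),
        # Objective traces are expected only after 'train', not after 'map'.
        "objective_tsv": sorted(out.glob("*_obj.tsv")),
        "objective_png": sorted(out.glob("*_obj.png")),
    }


if __name__ == "__main__":
    for kind, paths in find_outputs("./output/bipolar").items():
        print(kind, [p.name for p in paths])
```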

The data matrix files passed to 'train' and 'map' should be normalized in the same way.
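
One way to read this requirement is to compute the scale factor once from the training matrix and reuse it for the data to be mapped, instead of rescaling each file by its own maximum. The sketch below shows that idea in plain Python. It is an assumption about a reasonable workflow, not a description of what scvis does internally, and fit_scale and apply_scale are hypothetical helper names.

```python
def fit_scale(train_matrix):
    """Largest absolute value in the training matrix (1.0 if all zeros)."""
    return max(abs(v) for row in train_matrix for v in row) or 1.0


def apply_scale(matrix, scale):
    """Divide every entry by a previously fitted scale."""
    return [[v / scale for v in row] for row in matrix]


if __name__ == "__main__":
    train = [[2.0, -8.0]]
    new = [[4.0, 16.0]]
    scale = fit_scale(train)          # 8.0
    print(apply_scale(train, scale))  # [[0.25, -1.0]]
    print(apply_scale(new, scale))    # [[0.5, 2.0]]
```

With a shared scale, new data may fall outside [-1, 1], but both matrices stay in the same units.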

For both train and map, you can supply your own config file with the flag: --config_file
The default config file is in scvis/config/model_config.yaml
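
When scvis is driven from a script, the command line can be assembled programmatically. The sketch below builds the argument list for 'scvis train' or 'scvis map' using only flags that appear in this README. The helper build_scvis_command and the path ./my_config.yaml are hypothetical, and the actual call is left commented out so the example does not require scvis to be installed.

```python
import shlex


def build_scvis_command(mode, data_matrix_file, out_dir, **options):
    """Return an argv list such as ['scvis', 'train', '--data_matrix_file', ...].

    Extra keyword arguments become '--name value' pairs; None values are skipped.
    """
    if mode not in ("train", "map"):
        raise ValueError("mode must be 'train' or 'map'")
    argv = ["scvis", mode,
            "--data_matrix_file", data_matrix_file,
            "--out_dir", out_dir]
    for flag, value in options.items():
        if value is not None:
            argv += ["--" + flag, str(value)]
    return argv


if __name__ == "__main__":
    argv = build_scvis_command(
        "train",
        "./data/bipolar_pca100.tsv",
        "./output/bipolar",
        data_label_file="./data/bipolar_label.tsv",
        verbose_interval=50,
        config_file="./my_config.yaml",  # hypothetical custom config path
    )
    print(shlex.join(argv))
    # To actually run it (requires scvis on the PATH):
    # import subprocess
    # subprocess.run(argv, check=True)
```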