nJSD is a Python package that uses an entropy-based measure to calculate the distance between two biological networks instantiated with gene-expression profiles. It was designed to measure intratumor heterogeneity (ITH) from bulk RNA-sequencing data. The transcriptome-based ITH (tITH) of a tumor state is calculated by considering both the normal state and an ideally heterogeneous state.
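As background, the Jensen-Shannon divergence (JSD) between two discrete probability distributions P and Q is the average Kullback-Leibler divergence of each distribution to their midpoint M = (P + Q) / 2. The sketch below is the generic textbook definition only, not nJSD's network-based computation, which is described in the paper cited at the end. The base-2 logarithm (which bounds JSD between 0 and 1) and the helper names are assumptions of this sketch.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def to_distribution(values):
    """Normalize non-negative values (e.g. expression levels) so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


print(js_divergence([0.5, 0.5], [0.5, 0.5]))  # identical distributions -> 0.0
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # disjoint distributions -> 1.0
```

Unlike the KL divergence, JSD is symmetric and always finite, which makes it convenient for comparing two states in either direction.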
pip install njsd
nJSD can be invoked from the command line as follows:
usage: njsd [-h] -n NETWORK -r REF -q QUERY -o OUTPUT [-t GENESET]
Calculate network-based Jensen-Shannon Divergence.
optional arguments:
  -h, --help            show this help message and exit
  -n NETWORK, --network NETWORK
                        Pre-defined network
  -r REF, --ref REF     Reference gene expression profile
  -q QUERY, --query QUERY
                        Query gene expression profile
  -o OUTPUT, --output OUTPUT
                        Output file.
  -t GENESET, --geneset GENESET
                        Gene set list
Note that the -t GENESET option is optional. If -t is specified, gene set-specific nJSD and tITH values are computed. Otherwise, njsd computes the transcriptome-wide nJSD between the two expression profiles and the tITH of the query gene expression profile.
The network file, given with the -n/--network option, must be formatted as below, where each line specifies an edge in the network. njsd simply skips the first line as a header, so you may name the columns in any human-friendly way:
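To inspect a network file before passing it to njsd, you can load it yourself. The sketch below uses a hypothetical helper, read_network, that is not part of njsd; it assumes whitespace-separated (e.g. tab-delimited) columns and treats each edge as undirected, which is an assumption rather than documented njsd behavior.

```python
from itertools import islice


def read_network(lines):
    """Hypothetical helper: build an adjacency map from edge-list lines.

    The first line is skipped as a header, mirroring the format described above.
    Edges are treated as undirected (an assumption of this sketch).
    """
    adjacency = {}
    for line in islice(lines, 1, None):
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        a, b = fields[0], fields[1]
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


example = [
    "GeneA\tGeneB",  # header line, skipped
    "GeneSymbol1\tGeneSymbol2",
    "GeneSymbol1\tGeneSymbol3",
    "GeneSymbol1\tGeneSymbol4",
]
network = read_network(example)
print(sorted(network["GeneSymbol1"]))  # ['GeneSymbol2', 'GeneSymbol3', 'GeneSymbol4']
```

Because read_network accepts any iterable of lines, an open file object (for example open("example/Toy.network")) can be passed directly.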
GeneA GeneB # Header
GeneSymbol1 GeneSymbol2
GeneSymbol1 GeneSymbol3
GeneSymbol1 GeneSymbol4
...
The gene expression profile file must follow the format below. Again, the header line is ignored. Because njsd automatically log2-transforms expression values by taking log2(expression + 1), we recommend giving njsd expression values that have not been log-transformed, such as untransformed FPKM, RPKM or TPM values.
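The transformation described above can be illustrated with a short sketch. read_profile is a hypothetical helper, not njsd's own parser; it assumes whitespace-separated columns and skips the first line as a header.

```python
import math
from itertools import islice


def read_profile(lines):
    """Hypothetical helper: parse 'GeneSymbol Value' lines and apply log2(value + 1).

    The first line is treated as a header and skipped.
    """
    profile = {}
    for line in islice(lines, 1, None):
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        gene, value = fields[0], float(fields[1])
        profile[gene] = math.log2(value + 1.0)
    return profile


print(read_profile(["GeneSymbol\tExpressionValue", "GeneA\t0", "GeneB\t3", "GeneC\t7"]))
# {'GeneA': 0.0, 'GeneB': 2.0, 'GeneC': 3.0}
```

This also shows why already log-transformed input is a problem: a TPM of 7 stored as 3.0 (its log2(7 + 1)) would be transformed a second time into log2(3.0 + 1) = 2.0.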
GeneSymbol ExpressionValue # Header
GeneA 10
GeneB 20
GeneC 30
...
The gene set list file must follow the format below. Please be warned that this file must not have a header. The first column gives the name of each gene set (or group), and the following columns list the members of that group.
Group1Name GeneA GeneB GeneC ...
Group2Name GeneD GeneE GeneF ...
Group3Name GeneA GeneG GeneH ...
...
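A minimal sketch of reading this header-less format is shown below. read_genesets is a hypothetical helper, not part of njsd, and assumes whitespace-separated columns. Note that every line is treated as data, which is why a header row would be misread as a gene set.

```python
def read_genesets(lines):
    """Hypothetical helper: parse header-less 'SetName Gene1 Gene2 ...' lines."""
    genesets = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        genesets[fields[0]] = fields[1:]
    return genesets


sets = read_genesets([
    "Group1Name\tGeneA\tGeneB\tGeneC",
    "Group2Name\tGeneD\tGeneE\tGeneF",
    "Group3Name\tGeneA\tGeneG\tGeneH",
])
# A gene may belong to several sets; GeneA appears in Group1Name and Group3Name.
print([name for name, genes in sets.items() if "GeneA" in genes])
# ['Group1Name', 'Group3Name']
```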
When the genes in the reference gene expression profile (GEP) differ from those in the query GEP file or the gene set list file, the differences are dumped into a file named "dumpgene+date".
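To check in advance which gene symbols will not match, you can compare the identifiers yourself. The sketch below uses a hypothetical helper, gene_mismatches, which is not part of njsd; the exact contents and layout of njsd's own dump file are not described here and may differ from this report.

```python
def gene_mismatches(reference_genes, query_genes, geneset_genes=()):
    """Hypothetical helper: report gene symbols that do not match across inputs.

    Returns genes found only in the reference profile, and genes found in the
    query profile or gene set list but absent from the reference profile.
    """
    reference = set(reference_genes)
    others = set(query_genes) | set(geneset_genes)
    return {
        "only_in_reference": sorted(reference - others),
        "missing_from_reference": sorted(others - reference),
    }


report = gene_mismatches(
    reference_genes=["GeneA", "GeneB", "GeneC"],
    query_genes=["GeneA", "GeneB", "GeneD"],
    geneset_genes=["GeneA", "GeneE"],
)
print(report)
# {'only_in_reference': ['GeneC'], 'missing_from_reference': ['GeneD', 'GeneE']}
```

Mismatches like these often come from inconsistent gene identifiers (for example a misspelled symbol or a different naming scheme), so it can be worth harmonizing symbols before running njsd.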
Toy data are provided in the example directory.
example:
python nJSD.py whole -n example/Toy.network -r example/Toy.profile1 -i example/Toy.profile2
result:
example/Toy.profile2 [Ref. -> Query: 0.003935] [Query -> stateH: 0.006820] <tITH: 0.365869>
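The printed values are consistent with tITH being the reference-to-query divergence divided by the sum of the reference-to-query and query-to-stateH divergences, i.e. tITH = [Ref. -> Query] / ([Ref. -> Query] + [Query -> stateH]). This relationship is inferred from the numbers shown here; for the formal definition, consult the paper cited at the end. The sketch below only reproduces the arithmetic, and returning 0.0 when both divergences are zero is an assumption.

```python
def tith(ref_to_query, query_to_stateh):
    """tITH as the share of the Ref -> Query divergence in the total divergence."""
    total = ref_to_query + query_to_stateh
    # Guard against division by zero (assumed behavior, not taken from njsd).
    return ref_to_query / total if total > 0 else 0.0


# Values from the transcriptome-wide example above.
print(round(tith(0.003935, 0.006820), 4))  # 0.3659, matching <tITH: 0.365869>
```

The small difference in the last digits comes from the divergences being printed with only six decimal places.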
example:
python nJSD.py geneset -n example/Toy.network -r example/Toy.profile1 -i example/Toy.profile2 -t example/Toy.geneset
result:
example/Toy.profile2 1st_pwy [Ref. -> Query: 0.007822] [Query -> stateH: 0.009385] <tITH: 0.454582>
example/Toy.profile2 3rd_pwy [Ref. -> Query: 0.005215] [Query -> stateH: 0.007102] <tITH: 0.423379>
example/Toy.profile2 2nd_pwy [Ref. -> Query: 0.000000] [Query -> stateH: 0.004261] <tITH: 0.000000>
example/Toy.profile2 4th_pwy [Ref. -> Query: 0.007909] [Query -> stateH: 0.004261] <tITH: 0.649850>
example/Toy.profile2 5th_pwy [Ref. -> Query: 0.004470] [Query -> stateH: 0.012175] <tITH: 0.268536>
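If the gene set results need further processing, for example ranking gene sets by tITH, the printed lines can be parsed. The sketch below assumes the lines look exactly like the gene set example above (the transcriptome-wide line, which has no gene set column, is skipped); the file written with -o may be formatted differently, so adjust the pattern as needed. parse_results is a hypothetical helper.

```python
import re

# Pattern for one gene set result line, assuming the format shown in the example.
RESULT_RE = re.compile(
    r"^(?P<sample>\S+)\s+(?P<geneset>\S+)\s+"
    r"\[Ref\. -> Query: (?P<ref_query>[\d.]+)\]\s+"
    r"\[Query -> stateH: (?P<query_h>[\d.]+)\]\s+"
    r"<tITH: (?P<tith>[\d.]+)>"
)


def parse_results(lines):
    """Hypothetical helper: turn gene set result lines into dictionaries."""
    rows = []
    for line in lines:
        m = RESULT_RE.match(line.strip())
        if m:  # lines that do not match (e.g. transcriptome-wide output) are skipped
            rows.append({
                "sample": m["sample"],
                "geneset": m["geneset"],
                "ref_to_query": float(m["ref_query"]),
                "query_to_stateh": float(m["query_h"]),
                "tith": float(m["tith"]),
            })
    return rows


output = [
    "example/Toy.profile2 1st_pwy [Ref. -> Query: 0.007822] [Query -> stateH: 0.009385] <tITH: 0.454582>",
    "example/Toy.profile2 3rd_pwy [Ref. -> Query: 0.005215] [Query -> stateH: 0.007102] <tITH: 0.423379>",
    "example/Toy.profile2 2nd_pwy [Ref. -> Query: 0.000000] [Query -> stateH: 0.004261] <tITH: 0.000000>",
    "example/Toy.profile2 4th_pwy [Ref. -> Query: 0.007909] [Query -> stateH: 0.004261] <tITH: 0.649850>",
    "example/Toy.profile2 5th_pwy [Ref. -> Query: 0.004470] [Query -> stateH: 0.012175] <tITH: 0.268536>",
]
ranked = sorted(parse_results(output), key=lambda row: row["tith"], reverse=True)
print([row["geneset"] for row in ranked])
# ['4th_pwy', '1st_pwy', '3rd_pwy', '5th_pwy', '2nd_pwy']
```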
Y. Park, S. Lim, J. Nam, S. Kim, Measuring intratumor heterogeneity by network entropy using RNA-seq data, Scientific Reports (2016)