Merge branch 'master' of https://github.com/MigleSur/GenAPI
Migle Survilaite committed Oct 25, 2018
2 parents efd9b1a + 5d612a1 commit a72767d
Showing 1 changed file with 28 additions and 4 deletions.
32 changes: 28 additions & 4 deletions README.md
@@ -1,17 +1,23 @@
# GenAPI

GenAPI is a program for generating gene presence/absence tables for a series of closely related bacterial genomes from annotated FASTA files. [prokka](https://github.com/tseemann/prokka) can be used for genome annotation.
GenAPI is a program for generating gene presence/absence tables for a series of closely related bacterial genomes from annotated GFF files. [prokka](https://github.com/tseemann/prokka) can be used for genome annotation.

### Purpose

Initially the program was written as an alternative to [Roary](http://sanger-pathogens.github.io/Roary/). It was used to analyze bacterial isolates taken from the same source over time, so the isolates were very closely related. The program performed well even when the differences between samples were minor and managed to identify them.
Initially the program was written as an alternative to [Roary](http://sanger-pathogens.github.io/Roary/) for incomplete, very closely related bacterial genomes. It was used to analyze bacterial isolates taken from the same source over time, so the isolates were very closely related. The program performed well even when the differences between samples were minor and managed to identify them.

### Versions of software it was tested against

Before running the program, make sure that the following programs are installed and added to the path: <br/>
[BLAST >=2.6.0+](https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download) <br/>
[CD-HIT >=4.6.1](http://weizhongli-lab.org/cd-hit/) <br/>
[Bedtools >=2.26](http://bedtools.readthedocs.io/en/latest/) <br/>
Requirements for optional visualizations: <br/>
Heatmaps: <br/>
[R >=3.2.5](https://www.r-project.org/) <br/>
[pheatmap >=1.0.10](https://cran.r-project.org/web/packages/pheatmap/pheatmap.pdf) <br/>
Phylogenetic tree: <br/>
[RAxML](https://sco.h-its.org/exelixis/web/software/raxml/index.html) <br/>

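As a rough aid for the requirements above, the following sketch checks whether the tools can be found on the `PATH`. The executable names (`blastn`, `cd-hit-est`, `bedtools`, `Rscript`, `raxmlHPC`) are assumptions and are not taken from GenAPI itself; adjust them to the names used by your installation. The R package pheatmap is not checked here.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED = ["blastn", "cd-hit-est", "bedtools"]
OPTIONAL = ["Rscript", "raxmlHPC"]  # only needed for heatmaps and trees


def missing_tools(names, which=shutil.which):
    """Return the names that cannot be found on the PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_tools(names)
        status = "all found" if not missing else "missing: " + ", ".join(missing)
        print(f"{label} tools: {status}")
```

This only confirms that an executable with that name exists; it does not verify the versions listed above.
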
### Installation

@@ -37,7 +43,7 @@ genapi [options] [-n <analysis name>]

### Input

Annotated contig/scaffold GFF files of the chosen samples have to be placed in the directory from which the program is run.
Annotated contig/scaffold/genome GFF files of the chosen samples have to be placed in the directory from which the program is run. GenAPI uses the unique IDs from the GFF files as annotations, so it is easy to trace back a gene of interest.

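To illustrate what tracing a gene back through its ID could look like, here is a minimal sketch that lists the `ID` attribute of each CDS feature in a GFF3 file. It is not GenAPI's own parser; the function name `gff_gene_ids` and the restriction to CDS features are assumptions made for the example.

```python
def gff_gene_ids(lines):
    """Yield (seqid, ID) for each CDS feature found in GFF3 text lines."""
    for line in lines:
        if line.startswith("##FASTA"):
            break  # Prokka appends the sequences after this directive
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # assumption: only CDS features are of interest
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if "ID" in attrs:
            yield cols[0], attrs["ID"]
```

For a file on disk this could be called as `list(gff_gene_ids(open("sample.gff")))`, where `sample.gff` is a placeholder file name.
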
### Usage

@@ -63,10 +69,25 @@ Usage: genapi [options] [--name <analysis name>]
-l, --geneLen Minimum gene length. Shorter than the threshold genes are
excluded from the analysis.
Default: 150
-t, --tree Create a phylogenetic tree using gene presence-absence
matrix. Requires RAxML to be installed.
Default: False
-m, --matrix Create a gene presence-absence matrix visualization. Requires
Rscript and pheatmap library to be installed.
Default: False
-v, --version Print the tool version.
-h, --help Print this message.
```
The first minimum alignment length threshold and the minimum identity threshold are used as a pair. The same goes for the second pair. Changing these arguments is not advised unless there is a strong reason to do so.
The first minimum alignment length threshold and the first minimum identity threshold are used as a pair. The same goes for the second pair. Changing these arguments is not advised unless there is a strong reason to do so.

The minimum length for a gene to be included in the gene presence-absence matrix is 150 bp. It can be changed, but this is not recommended unless a lower threshold is known to be required.

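The length rule can be sketched as follows, assuming the gene length is computed from 1-based, inclusive GFF start and end coordinates. The helper names are hypothetical, and GenAPI's internal implementation may differ.

```python
MIN_GENE_LEN = 150  # default of -l/--geneLen according to the usage text


def feature_length(start, end):
    """Length in bp of a feature with 1-based, inclusive coordinates."""
    return end - start + 1


def keep_gene(start, end, min_len=MIN_GENE_LEN):
    """Return True unless the gene is shorter than the threshold."""
    return feature_length(start, end) >= min_len
```

With the default threshold, a gene spanning positions 1 to 150 is kept, while one spanning positions 1 to 149 is excluded.
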
### Disclaimer

GenAPI does not take gene duplicates into account, so if the number of copies of the same gene varies between samples, this will not be detected. GenAPI was developed to identify novel gene acquisition and complete gene deletion, which is why multiple gene copies are not considered.

GenAPI does not take partial deletions within genes into account. It was developed to identify deletions and acquisitions of whole genes, not deletions and acquisitions within genes. There are many excellent tools for detecting deletions and acquisitions within genes.

### Output

@@ -77,6 +98,9 @@ Output file name | Description
clustered_genes_[name].ffn | Pan-genome nucleotide sequences file
gene_presence_absence_[name].txt | Tab-separated gene presence/absence table file
sample_gene_stats | Each sample's best blast alignment statistics for all the pan-genome genes
phylogenetic_analysis | Phylogenetic tree output from RAxML (optional)
heatmap_plot_all_genes_[name].png | Heatmap visualization for all the pan-genome genes (optional)
heatmap_plot_variable_genes_[name].png | Heatmap visualization for lost or acquired genes (optional)

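The tab-separated presence/absence table can be post-processed with a few lines of Python. The sketch below assumes a layout that is not confirmed by this README: a header row with a gene column followed by one column per sample, and cells holding `1` (present) or `0` (absent). Check the actual file before relying on it.

```python
import csv


def variable_genes(handle):
    """Map each gene that is not uniform across samples to the samples lacking it.

    Assumed layout: first column is the gene ID, remaining columns are
    samples, and each cell is "1" (present) or "0" (absent).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    result = {}
    for row in reader:
        if not row:
            continue  # skip empty lines
        gene, calls = row[0], row[1:]
        if len(set(calls)) > 1:  # present in some samples, absent in others
            result[gene] = [s for s, c in zip(samples, calls) if c == "0"]
    return result
```

For example, `variable_genes(open("gene_presence_absence_myrun.txt", newline=""))` would process the table of an analysis named `myrun` (a placeholder name).
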
### Author
