ExScope - Visualizing Read Counts at Genomic Regions

ExScope is a Python-based bioinformatics tool for visualizing read counts at specific genomic regions and Ensembl transcript IDs. It counts raw reads in the user-specified region and plots them. This is useful for identifying copy number variations (CNVs) and for analyzing gene expression in a specific region of a chromosome.

Features

Read Count Visualization: Generates plots showing read counts across specified genomic regions.
Normalization: Normalizes read counts for accurate comparisons (planned; not yet implemented).
Ensembl Transcript ID Support: Allows focused analysis on specific Ensembl transcript IDs.

Installation

To install ExScope, follow these steps:

Clone the repository:

git clone https://github.com/Taimoor-Khan-bt/exscope.git
cd exscope

Install exscope with pip:
```
pip install .
```
Alternatively, install it in editable (development) mode, so that changes to the source take effect without reinstalling:
```
pip install -e .
```

This will automatically install all dependencies specified in the setup.py file.

Prerequisites

ExScope requires the following tools and libraries:

Python 3.6+
pysam (for reading BAM files)
matplotlib (for plotting)
pandas (for data manipulation)
scipy (for clustering and dendrogram generation)
argparse (for command-line argument parsing; part of the Python standard library)

The third-party libraries above are installed by pip during the installation process.

Input Files

1. BAM File

You will need a BAM file containing aligned sequencing reads as input.

2. GFF3 or GTF File

To provide exon annotations, you will need a GFF3 or GTF file. These files can be downloaded from Ensembl, UCSC Genome Browser, or other genomic databases. Ensure the file matches the reference genome used in your analysis.

Ensembl GFF3 files: Ensembl FTP
UCSC GTF files: UCSC Genome Browser
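As an illustration of what "exon annotations" means here, the following is a minimal sketch of pulling the exons of one transcript out of GFF3 text. It is not ExScope's own parser: the function name `exons_for_transcript` and the `ENST_DEMO` records are hypothetical, and it assumes the Ensembl convention of writing `Parent=transcript:<ID>` on exon lines.

```python
def exons_for_transcript(gff3_lines, transcript_id):
    """Return sorted (seqid, start, end) tuples for the exons of one transcript."""
    exons = []
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # GFF3 records have 9 tab-separated columns; keep exons only
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        # Assumed Ensembl style: Parent=transcript:ENST...; drop the optional prefix.
        parents = [p.split(":", 1)[-1] for p in attrs.get("Parent", "").split(",")]
        if transcript_id in parents:
            exons.append((cols[0], int(cols[3]), int(cols[4])))
    return sorted(exons)


# Made-up records for demonstration only.
example = [
    "##gff-version 3",
    "1\thavana\tmRNA\t1000\t5000\t.\t+\t.\tID=transcript:ENST_DEMO;Parent=gene:ENSG_DEMO",
    "1\thavana\texon\t1000\t1200\t.\t+\t.\tParent=transcript:ENST_DEMO",
    "1\thavana\texon\t4800\t5000\t.\t+\t.\tParent=transcript:ENST_DEMO",
    "1\thavana\texon\t2000\t2100\t.\t+\t.\tParent=transcript:ENST_OTHER",
]
print(exons_for_transcript(example, "ENST_DEMO"))
```

GTF files use a different attribute syntax (`transcript_id "..."`), so this sketch would need a separate attribute parser for them.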

3. Genomic Region and Transcript ID

Specify the genomic region in the format chr:start-end and provide the Ensembl transcript ID you wish to analyze.
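To make the chr:start-end format concrete, here is a small sketch of how such a string could be validated and split. The helper `parse_region` is hypothetical and not part of ExScope; whether ExScope treats the coordinates as 0-based or 1-based is not stated in this README, so the sketch only returns the integers as written.

```python
import re

# chromosome name, a colon, then two non-negative integers separated by a hyphen
REGION_RE = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")


def parse_region(region):
    """Split 'chr:start-end' into (chrom, start, end) with integer coordinates."""
    match = REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"expected chr:start-end, got {region!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start {start} is greater than end {end}")
    return chrom, start, end


print(parse_region("chr1:1000000-1050000"))
```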

Usage

Once installed, you can run ExScope from the command line. Basic usage:

exscope -b /path/to/your.bam -g /path/to/annotations.gff3 -r chr1:1000000-1050000 -tid ENST00000367770 -o /path/to/output_dir

Command-Line Arguments

-b, --bam: Path to the input BAM file (required).
-g, --gff3: Path to the GFF3 (or GTF) file for exon annotations (required).
-r, --region: Genomic region in the format chr:start-end (required).
-tid, --transcript_id: Ensembl transcript ID for the region (required).
-o, --output_dir: Output directory for saving results (required).
--plot_file: Name of the output plot file (optional; default: read_counts_plot.png).
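The argument list above maps naturally onto `argparse`. The following is a sketch of an equivalent parser, reconstructed only from the flags documented here; the help strings and the `build_parser` name are assumptions, and ExScope's actual parser may differ.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented ExScope flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="exscope", description="Visualize read counts at a genomic region."
    )
    parser.add_argument("-b", "--bam", required=True, help="input BAM file")
    parser.add_argument("-g", "--gff3", required=True, help="annotation file")
    parser.add_argument("-r", "--region", required=True, help="chr:start-end")
    parser.add_argument("-tid", "--transcript_id", required=True,
                        help="Ensembl transcript ID")
    parser.add_argument("-o", "--output_dir", required=True,
                        help="directory for results")
    parser.add_argument("--plot_file", default="read_counts_plot.png",
                        help="name of the output plot file")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args([
    "-b", "sample.bam", "-g", "annotations.gff3",
    "-r", "chr1:150000-160000", "-tid", "ENST00000367770", "-o", "results/",
])
print(args.region, args.transcript_id, args.plot_file)
```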

Example

exscope -b sample.bam -g Homo_sapiens.GRCh38.104.gff3 -r chr1:150000-160000 -tid ENST00000367770 -o results/

This command will:

Extract read counts from the BAM file for the specified region.
Normalize the read counts (once normalization is available; see Features).
Generate a stacked area plot visualizing read counts across exons.
Save the output plot and read counts to the specified output directory.
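The first step above, counting reads per position, can be sketched without any BAM-specific code. This README does not say how ExScope computes its counts, so the function below is only one plausible approach: it takes hypothetical read intervals (inclusive start and end, as would be derived from aligned reads) and counts how many cover each position in the region.

```python
def per_position_counts(reads, start, end):
    """Count reads covering each position in the inclusive range [start, end].

    `reads` is an iterable of (read_start, read_end) pairs, both inclusive.
    """
    length = end - start + 1
    diff = [0] * (length + 1)  # difference array: +1 where a read begins, -1 after it ends
    for read_start, read_end in reads:
        lo = max(read_start, start)
        hi = min(read_end, end)
        if lo > hi:
            continue  # read lies entirely outside the region
        diff[lo - start] += 1
        diff[hi - start + 1] -= 1
    counts, running = [], 0
    for delta in diff[:length]:
        running += delta  # prefix sum turns the differences into depths
        counts.append(running)
    return counts


# Three made-up reads, counted over positions 101..108.
reads = [(100, 104), (102, 108), (107, 120)]
print(per_position_counts(reads, 101, 108))  # [1, 2, 2, 2, 1, 1, 2, 2]
```

The difference-array approach touches each read once and each position once, which keeps it linear even for deep coverage.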

Output

ExScope generates the following outputs:

Read Counts Plot: A PNG file visualizing read counts across the specified region.
Read Counts Text File: A text file listing read counts per position.

GFF3/GTF File Download

If you don’t have a GFF3 or GTF file, you can download one for your species and genome build from:

Ensembl FTP
UCSC Genome Browser

Ensure the file matches your reference genome.

Troubleshooting

If you encounter any issues, consider the following:

Ensure that all input files (BAM, GFF3/GTF) are correctly formatted and correspond to the same reference genome.
Verify that the Ensembl transcript ID matches the specified genomic region.
Check the log output for warnings or errors.
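The first two checks can be scripted. One common cause of reference mismatches is chromosome naming: UCSC-style files use names such as `chr1`, while Ensembl files use `1`. The sketch below uses hypothetical helpers (`strip_chr`, `overlaps`) to test whether an exon interval falls within a region regardless of that prefix; it does not handle other naming differences such as `chrM` versus `MT`.

```python
def strip_chr(name):
    """Drop a leading 'chr' so that 'chr1' and '1' compare equal."""
    return name[3:] if name.lower().startswith("chr") else name


def overlaps(region, exon):
    """True if two (chrom, start, end) intervals share at least one position."""
    r_chrom, r_start, r_end = region
    e_chrom, e_start, e_end = exon
    return (
        strip_chr(r_chrom) == strip_chr(e_chrom)
        and r_start <= e_end
        and e_start <= r_end
    )


# Made-up coordinates: a UCSC-style region against Ensembl-style exons.
region = ("chr1", 150000, 160000)
print(overlaps(region, ("1", 159000, 161000)))  # True: same chromosome, intervals intersect
print(overlaps(region, ("2", 159000, 161000)))  # False: different chromosome
```

If none of a transcript's exons overlap the requested region, the transcript ID and region most likely do not belong together.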

For further assistance, raise an issue on the GitHub repository.

Contributing

Contributions are welcome! Please fork the repository and submit a pull request. For major changes, open an issue to discuss your proposed modifications.

License

This project is licensed under the MIT License - see the LICENSE file for details.
