ExScope is a Python-based bioinformatics tool for visualizing read counts over specific genomic regions and Ensembl transcripts. It counts raw reads in a user-specified region and generates plots. This is useful for identifying copy number variations (CNVs) and for examining gene expression in a specific region of a chromosome.
- Read Count Visualization: Generates plots showing read counts across specified genomic regions.
- Normalization: Normalizes read counts for accurate comparisons (not yet implemented).
- Ensembl Transcript ID Support: Allows focused analysis on specific Ensembl transcript IDs.
To install ExScope, follow these steps:
- Clone the repository:
  git clone https://github.com/Taimoor-Khan-bt/exscope.git
  cd exscope
- Install exscope as a module with pip:
  pip install .
- Install exscope as a command-line tool with pip:
  pip install -e .
This will automatically install all dependencies specified in the setup.py file.
ExScope requires the following tools and libraries:
- Python 3.6+
- pysam (for reading BAM files)
- matplotlib (for plotting)
- pandas (for data manipulation)
- scipy (for clustering and dendrogram generation)
- argparse (for command-line argument parsing; included in the Python standard library)
These dependencies will be installed during the installation process.
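If you want to confirm that the environment is ready after installation, a small check like the following can help. This is only a sketch: the package names are taken from the list above, and the helper names `missing_packages` and `python_ok` are hypothetical, not part of ExScope.

```python
import importlib.util
import sys

# Third-party packages named in the dependency list; argparse is omitted
# because it ships with the Python standard library.
REQUIRED = ["pysam", "matplotlib", "pandas", "scipy"]


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 6)):
    """Check the running interpreter against the minimum supported version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Anything reported as missing can be installed with pip before running the tool.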
You will need a BAM file containing aligned sequencing reads as input.
To provide exon annotations, you will need a GFF3 or GTF file. These files can be downloaded from Ensembl, UCSC Genome Browser, or other genomic databases. Ensure the file matches the reference genome used in your analysis.
- Ensembl GFF3 files: Ensembl FTP
- UCSC GTF files: UCSC Genome Browser
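To get a feel for what the annotation file contributes, here is a minimal sketch of pulling the exons of one transcript out of GFF3 text. It is not ExScope's own parser. It assumes the Ensembl GFF3 convention in which exon rows carry an attribute such as `Parent=transcript:ENST...`, it does not handle GTF, and the function name `exons_for_transcript`, the transcript IDs, and the coordinates in the example are made up for illustration.

```python
def exons_for_transcript(gff3_lines, transcript_id):
    """Collect (seqid, start, end) for exon rows whose Parent names the transcript."""
    exons = []
    for line in gff3_lines:
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # GFF3 rows have exactly nine tab-separated columns.
        if len(fields) != 9 or fields[2] != "exon":
            continue
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        # Assumption: Parent may look like "transcript:ENST..."; drop the prefix.
        parents = [p.split(":", 1)[-1] for p in attrs.get("Parent", "").split(",")]
        if transcript_id in parents:
            exons.append((fields[0], int(fields[3]), int(fields[4])))
    return sorted(exons)


# Made-up rows for illustration only.
SAMPLE = [
    "##gff-version 3",
    "1\thavana\texon\t100\t200\t.\t+\t.\tParent=transcript:ENST00000000001;Name=e1",
    "1\thavana\texon\t300\t400\t.\t+\t.\tParent=transcript:ENST00000000001",
    "1\thavana\texon\t500\t600\t.\t+\t.\tParent=transcript:ENST00000000002",
    "1\thavana\tCDS\t120\t180\t.\t+\t0\tParent=transcript:ENST00000000001",
]

print(exons_for_transcript(SAMPLE, "ENST00000000001"))
# [('1', 100, 200), ('1', 300, 400)]
```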
Specify the genomic region in the format chr:start-end and provide the Ensembl transcript ID you wish to analyze.
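The region format can be validated before a run. The sketch below shows one way to split a `chr:start-end` string into its parts. The function name `parse_region` is hypothetical, and the sketch makes no claim about whether ExScope treats coordinates as 0-based or 1-based. Contig names that themselves contain a colon are not supported.

```python
import re

# chromosome name, a colon, then two unsigned integers separated by a hyphen
REGION_RE = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")


def parse_region(region):
    """Split 'chr:start-end' into (chrom, start, end), raising ValueError if malformed."""
    match = REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"Region must look like chr:start-end, got {region!r}")
    chrom = match.group(1)
    start = int(match.group(2))
    end = int(match.group(3))
    if start > end:
        raise ValueError(f"Region start {start} is greater than end {end}")
    return chrom, start, end


print(parse_region("chr1:1000000-1050000"))
# ('chr1', 1000000, 1050000)
```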
Once installed, you can run ExScope from the command line. Basic usage:
exscope -b /path/to/your.bam -g /path/to/annotations.gff3 -r chr1:1000000-1050000 -tid ENST00000367770 -o /path/to/output_dir
- -b, --bam: Path to the input BAM file (required).
- -g, --gff3: Path to the GFF3 file for exon annotations (required).
- -r, --region: Genomic region in the format chr:start-end (required).
- -tid, --transcript_id: Ensembl transcript ID for the region (required).
- -o, --output_dir: Output directory for saving results (required).
- --plot_file: Optional, specify the name of the output plot file (default: read_counts_plot.png).
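For readers who want to see how an option set like this maps onto argparse, here is an illustrative parser that mirrors the flags listed above. It is a reconstruction from this README, not ExScope's actual source, and the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options documented in this README."""
    parser = argparse.ArgumentParser(
        prog="exscope",
        description="Visualize read counts for a genomic region and transcript.",
    )
    parser.add_argument("-b", "--bam", required=True,
                        help="Path to the input BAM file")
    parser.add_argument("-g", "--gff3", required=True,
                        help="Path to the GFF3 file for exon annotations")
    parser.add_argument("-r", "--region", required=True,
                        help="Genomic region in the format chr:start-end")
    parser.add_argument("-tid", "--transcript_id", required=True,
                        help="Ensembl transcript ID for the region")
    parser.add_argument("-o", "--output_dir", required=True,
                        help="Output directory for saving results")
    parser.add_argument("--plot_file", default="read_counts_plot.png",
                        help="Name of the output plot file")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args([
    "-b", "sample.bam",
    "-g", "annotations.gff3",
    "-r", "chr1:150000-160000",
    "-tid", "ENST00000367770",
    "-o", "results/",
])
print(args.transcript_id, args.plot_file)
# ENST00000367770 read_counts_plot.png
```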
exscope -b sample.bam -g Homo_sapiens.GRCh38.104.gff3 -r chr1:150000-160000 -tid ENST00000367770 -o results/
This command will:
- Extract read counts from the BAM file for the specified region.
- Normalize the read counts.
- Generate a stacked area plot visualizing read counts across exons.
- Save the output plot and read counts to the specified output directory.
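The counting step can be pictured with a small standard-library sketch. It assumes reads are already available as closed `(start, end)` intervals; in a real run they would come from the BAM file through a library such as pysam. The function names and the toy intervals are hypothetical, and this is not necessarily how ExScope computes its counts.

```python
def per_position_counts(reads, start, end):
    """Count reads overlapping each position in the closed interval [start, end]."""
    counts = [0] * (end - start + 1)
    for read_start, read_end in reads:
        # Clip each read to the region before counting.
        lo = max(read_start, start)
        hi = min(read_end, end)
        for pos in range(lo, hi + 1):
            counts[pos - start] += 1
    return counts


def per_exon_totals(counts, region_start, exons):
    """Sum per-position counts inside each exon, clipped to the region."""
    region_end = region_start + len(counts) - 1
    totals = []
    for exon_start, exon_end in exons:
        lo = max(exon_start, region_start)
        hi = min(exon_end, region_end)
        if lo > hi:
            totals.append(0)  # exon lies entirely outside the region
        else:
            totals.append(sum(counts[lo - region_start:hi - region_start + 1]))
    return totals


# Toy data: three reads and two exons inside the region 101-110.
reads = [(101, 105), (103, 108), (107, 110)]
counts = per_position_counts(reads, 101, 110)
print(counts)
# [1, 1, 2, 2, 2, 1, 2, 2, 1, 1]
print(per_exon_totals(counts, 101, [(102, 104), (107, 109)]))
# [5, 5]
```

Per-position counts of this kind are what a stacked area plot would draw, with one layer per sample or exon.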
ExScope generates the following outputs:
- Read Counts Plot: A PNG file visualizing read counts across the specified region.
- Read Counts Text File: A text file listing read counts per position.
If you don’t have a GFF3 or GTF file, you can download one for your species and genome build from:
- Ensembl FTP: Ensembl FTP
- UCSC Genome Browser: UCSC Genome Browser
Ensure the file matches your reference genome.
If you encounter any issues, consider the following:
- Ensure that all input files (BAM, GFF3/GTF) are correctly formatted and correspond to the same reference genome.
- Verify that the Ensembl transcript ID matches the specified genomic region.
- Check the log output for warnings or errors.
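One frequent way for a BAM file and an annotation file to disagree is chromosome naming: UCSC-style files use names like `chr1`, while Ensembl files use `1`. The sketch below compares two lists of sequence names and flags that case. The function names are hypothetical, and the name lists are assumed to have been collected already (for example from the BAM header and the first column of the GFF3 file).

```python
def contig_overlap(bam_contigs, annotation_contigs):
    """Report which sequence names are shared and which appear in only one file."""
    bam, ann = set(bam_contigs), set(annotation_contigs)
    return {
        "shared": sorted(bam & ann),
        "bam_only": sorted(bam - ann),
        "annotation_only": sorted(ann - bam),
    }


def _strip_chr(name):
    """Drop a leading 'chr' so 'chr1' and '1' compare equal."""
    return name[3:] if name.startswith("chr") else name


def differ_only_by_chr_prefix(bam_contigs, annotation_contigs):
    """True when no names match as-is but some match once 'chr' is ignored."""
    if set(bam_contigs) & set(annotation_contigs):
        return False
    stripped_bam = {_strip_chr(c) for c in bam_contigs}
    stripped_ann = {_strip_chr(c) for c in annotation_contigs}
    return bool(stripped_bam & stripped_ann)


print(contig_overlap(["chr1", "chr2"], ["1", "2"]))
# {'shared': [], 'bam_only': ['chr1', 'chr2'], 'annotation_only': ['1', '2']}
print(differ_only_by_chr_prefix(["chr1", "chr2"], ["1", "2"]))
# True
```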
For further assistance, raise an issue on the GitHub repository.
Contributions are welcome! Please fork the repository and submit a pull request. For major changes, open an issue to discuss your proposed modifications.
This project is licensed under the MIT License - see the LICENSE file for details.