Skip to content

TrackPlotR is a useful tool for generating visualization track plots using various sequencing data.

License

Notifications You must be signed in to change notification settings

luoyiiii/TrackPlotR

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

36 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TrackPlotR

When analyzing Epigenome sequencing data, a common task is to visualize the signal intensity detected at different position of the genome, the distribution and structure of various cis-elements/genes in the corresponding genome region, and flexibly combine them together. UCSC Genome Browser and WashU Epigenome Browser are the greatest and most famous online tools that help to do the visualization task. However, exporting track plots generated by these online tools in a lossless and personalized manner can often be exhausting. Based on ggplot2, a powerful data visualization toolkit in R language, TrackPlotR is designed to generate attractive and high-quality genomic track plots using the most commonly employed data formats in a concise, elegant and highly customizable manner.

Installation

TrackPlotR can be installed from GitHub:

devtools::install_github('yimingsun12138/TrackPlotR')

Requirement

TrackPlotR was designed with the aim of retaining high scalability while minimizing dependency on other packages as much as possible. The packages listed below are necessary for the installation of TrackPlotR:

Document

TrackPlotR currently provides functions for visualizing signal coverage, genome features and transcripts. Meanwhile, TrackPlotR also provides build-in data to help get started quickly.

library(ggplot2)
library(rtracklayer)
library(TrackPlotR)
library(patchwork)

#build-in example data
example_path <- system.file('extdata',package = 'TrackPlotR')
list.files(example_path)
## [1] "Homo_sapiens.GRCh38.105.subset.gtf" "S1_ATAC_coverage.bw"               
## [3] "S1_ATAC_peak.bed"                   "S2_ATAC_coverage.bw"               
## [5] "S2_ATAC_peak.bed"                   "S3_ATAC_coverage.bw"               
## [7] "S3_ATAC_peak.bed"                   "S4_ATAC_coverage.bw"

The build-in data includes Tn5 insertion site coverage for 4 samples (ATAC_coverage.bw), ATAC-seq peak region for 3 samples, and a human gene annotation file (Ensembl release 105).

Coverage track plot

First load the BigWig file into R using load_BigWig and assign a sample name for each file:

coverage_info <- paste(example_path,c('S1_ATAC_coverage.bw','S2_ATAC_coverage.bw','S3_ATAC_coverage.bw','S4_ATAC_coverage.bw'),sep = '/')
coverage_info <- load_BigWig(file_path = coverage_info,file_name = c('S1','S2','S3','S4'))

Plot the coverage signal in the specified genome region:

region <- as('chr3:58510000-58690000','GRanges')
p1 <- coverage_vis_region(coverage_table = coverage_info,region = region,y_lim = 0.03,sample_order = c('S1','S2','S3','S4'),col_pal = c("#D51F26","#272E6A","#208A42","#89288F"))
print(p1 + theme(aspect.ratio = 0.1))

coverage plot region

Sometimes we are interested in the chromatin accessibility landscape around a certain gene, and TrackPlotR provides a quick method for the visualization task:

human_anno <- paste(example_path,'Homo_sapiens.GRCh38.105.subset.gtf',sep = '/')
human_anno <- rtracklayer::import(con = human_anno,format = 'gtf')
p1 <- coverage_vis_gene(coverage_table = coverage_info,gene_anno = human_anno,column_name = 'gene_name',gene = 'FAM107A',up_extend = 50000,down_extend = 50000,y_lim = 0.03,sample_order = c('S1','S2','S3','S4'),col_pal = c("#D51F26","#272E6A","#208A42","#89288F"))
print(p1 + theme(aspect.ratio = 0.1))

coverage plot gene

Feature track plot

TrackPlotR also provides methods for visualizing cis-elements (features) distribution in the specified genome region or around a certain gene. These cis-elements can be peak list, enhancers, human accelerated regions (HARs) and so on, and should be provided in the bed or GRanges/GRangesList format.

#load peak files and assign sample name
peak_info <- paste(example_path,c('S1_ATAC_peak.bed','S2_ATAC_peak.bed','S3_ATAC_peak.bed'),sep = '/')
names(peak_info) <- c('S1','S2','S3')
#feature plot in the specified genome region
p2 <- feature_vis_region(features = peak_info,region = region,col_pal = c("#D51F26","#272E6A","#208A42"))
print(p2 + theme(aspect.ratio = 0.2))

feature plot region

#feature plot around a certain gene
p2 <- feature_vis_gene(features = peak_info,gene_anno = human_anno,column_name = 'gene_name',gene = 'FAM107A',up_extend = 50000,down_extend = 50000,col_pal = c("#D51F26","#272E6A","#208A42"))
print(p2 + theme(aspect.ratio = 0.2))

feature plot gene

Transcript track plot

TrackPlotR also provides methods for visualizing transcripts/genes structure and distribution in the specified genome region or around a certain gene. Gene annotation file can be provided in the GTF or GRanges format and must contain 3 columns: type, gene_id and transcript_id.

#gene plot in the specified genome region
p3 <- transcript_vis_region(gene_anno = human_anno,region = region,display_by = 'gene',show_name = 'gene_id')
print(p3 + theme(aspect.ratio = 0.2))

transcript plot region

#plot a certain gene only
p3 <- gene_track_vis(gene_anno = human_anno,column_name = 'gene_name',gene = 'FAM107A',up_extend = 50000,down_extend = 50000,show_only = TRUE)
print(p3 + theme(aspect.ratio = 0.2))

transcript plot gene

In addition to be displayed by gene, the gene annotation file can also be displayed on a transcript basis.

#display by transcript
p3 <- transcript_vis_region(gene_anno = human_anno,region = region,display_by = 'transcript',show_name = 'transcript_name')
print(p3 + theme(aspect.ratio = 0.5))

transcript plot by transcript

Linkage track plot

What are the interactive relationships between these cis-elements on genome? For example, the activation of enhancer to promoter, co-accessible networks among cis-regulatory elements, etc. TrackPlotR also provides methods for linkage track visualization in the specified genome region or around a certain gene.

#load linkage table and convert to GRanges object
linkage <- paste(example_path,c('peak2gene_linkage.csv'),sep = '/')
linkage <- read.csv(file = linkage,header = TRUE,row.names = 1)
linkage <- linkage_table_to_GRanges(linkage = linkage)
#linkage track plot in the specified genome region
p4 <- linkage_vis_region(linkage = linkage,region = region,color_by = 'Correlation')
print(p4 + theme(aspect.ratio = 0.15))

linkage plot region

#linkage plot around a certain gene
p4 <- linkage_vis_gene(linkage = linkage,gene_anno = human_anno,column_name = 'gene_name',gene = 'FAM107A',up_extend = 50000,down_extend = 50000,color_by = 'Correlation')
print(p4 + theme(aspect.ratio = 0.15))

linkage plot gene

Plotting style

Usually, we would like to observe different track plots together, so it is particularly important to set a uniform style for different track plots. Currently, TrackPlotR has only two build-in plotting styles, and more styles will be added in the future.

p1 <- coverage_vis_region(coverage_table = coverage_info,region = region,y_lim = 0.03,sample_order = c('S1','S2','S3','S4'),col_pal = c("#D51F26","#272E6A","#208A42","#89288F"),style = 'style_1')
p2 <- feature_vis_region(features = peak_info,region = region,col_pal = c("#D51F26","#272E6A","#208A42"),style = 'style_1')
p3 <- linkage_vis_region(linkage = linkage,region = region,color_by = 'Correlation',style = 'style_1')
p4 <- transcript_vis_region(gene_anno = human_anno,region = region,display_by = 'gene',show_name = 'gene_name',style = 'style_1')

Thanks to the high flexibility and scalability of ggplot2, we can make more personalized modifications to the images returned by TrackPlotR and stitch them together easily.

p1 <- p1 + theme(axis.ticks.x = element_blank(),axis.text.x = element_blank(),axis.title.x = element_blank())
p2 <- p2 + theme(axis.ticks.x = element_blank(),axis.text.x = element_blank(),axis.title.x = element_blank(),legend.position = 'none')
p3 <- p3 + theme(axis.ticks.x = element_blank(),axis.text.x = element_blank(),axis.title.x = element_blank())
print(p1 + p2 + p3 + p4 + plot_layout(ncol = 1,heights = c(0.5,0.1,0.1,0.15)))

style_1 plot

All borders are removed in 'style_2', making it suitable for visualization with greater width.

#visualize with style_2
p1 <- coverage_vis_region(coverage_table = coverage_info,region = region,y_lim = 0.03,sample_order = c('S1','S2','S3','S4'),col_pal = c("#D51F26","#272E6A","#208A42","#89288F"),style = 'style_2')
p2 <- feature_vis_region(features = peak_info,region = region,col_pal = c("#D51F26","#272E6A","#208A42"),style = 'style_2')
p3 <- linkage_vis_region(linkage = linkage,region = region,color_by = 'Correlation',style = 'style_2')
p4 <- transcript_vis_region(gene_anno = human_anno,region = region,display_by = 'gene',show_name = 'gene_name',style = 'style_2')

p1 <- p1 + theme(axis.title.x = element_blank())
p2 <- p2 + theme(axis.line.x = element_blank(),axis.title.x = element_blank())
p3 <- p3 + theme(axis.line.x = element_blank(),axis.title.x = element_blank())
print(p1 + p2 + p3 + p4 + plot_layout(ncol = 1,heights = c(0.5,0.1,0.1,0.15)))

style_2 plot

Highlight region

Sometimes, we want to emphasize our discoveries by highlighting specific genome regions, which can be conveniently achieved using the highlight_region function. By simply providing the highlight_region function with the track plot (as a ggplot object) and the genome region we are interested in, we can obtain a ggplot object with added background highlight.

#highlight region
p1 <- coverage_vis_region(coverage_table = coverage_info,region = region,y_lim = 0.03,sample_order = c('S1','S2','S3','S4'),col_pal = c("#D51F26","#272E6A","#208A42","#89288F"),style = 'style_1')
p2 <- feature_vis_region(features = peak_info,region = region,col_pal = c("#D51F26","#272E6A","#208A42"),style = 'style_1')
p3 <- linkage_vis_region(linkage = linkage,region = region,color_by = 'Correlation',style = 'style_1')
p4 <- transcript_vis_region(gene_anno = human_anno,region = region,display_by = 'gene',show_name = 'gene_name',style = 'style_1')

p1 <- highlight_region(gg_object = p1,start = 58622610,end = 58632610)
p2 <- highlight_region(gg_object = p2,start = 58622610,end = 58632610)
p3 <- highlight_region(gg_object = p3,start = 58622610,end = 58632610)
p4 <- highlight_region(gg_object = p4,start = 58622610,end = 58632610)

p1 <- p1 + theme(axis.ticks.x = element_blank(),axis.text.x = element_blank(),axis.title.x = element_blank())
p2 <- p2 + theme(axis.ticks.x = element_blank(),axis.text.x = element_blank(),axis.title.x = element_blank(),legend.position = 'none')
p3 <- p3 + theme(axis.ticks.x = element_blank(),axis.text.x = element_blank(),axis.title.x = element_blank())
print(p1 + p2 + p3 + p4 + plot_layout(ncol = 1,heights = c(0.5,0.1,0.1,0.15)))

highlight region plot

Display mode

Feature track plot and transcript track plot can be displayed with different row compression densities.

# full transcripts
p3 <- transcript_vis_region(gene_anno = human_anno,region = region,display_by = 'gene',show_name = 'gene_name',display_mode = 'full')
print(p3 + theme(aspect.ratio = 0.3))

full transcript plot

In the full mode, each transcript/gene will occupy a single row.

# squish transcripts
p3 <- transcript_vis_region(gene_anno = human_anno,region = region,display_by = 'gene',show_name = 'gene_name',display_mode = 'squish')
print(p3 + theme(aspect.ratio = 0.2))

squish transcript plot

In the squish mode, transcripts/genes that do not overlap with each other can occupy the same row.

# collapse transcripts
p3 <- transcript_vis_region(gene_anno = human_anno,region = region,display_by = 'gene',show_name = NULL,display_mode = 'collapse')
print(p3 + theme(aspect.ratio = 0.1))

collapse transcript plot

# collapse features
p2 <- feature_vis_region(features = peak_info,region = region,col_pal = c("#D51F26","#272E6A","#208A42"),collapse_features = TRUE,overlap_col = 'darkgrey')
print(p2 + theme(aspect.ratio = 0.1))

collapse feature plot

In the collapse mode, all transcripts/genes and features will be plotted in the same row.

Planning

  • Pile-Up plot using sequencing fragments
  • Hi-C support
  • Chromosome plot

About

TrackPlotR is a useful tool for generating visualization track plots using various sequencing data.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • R 100.0%