RNA sequencing (RNASeq) has become a method of choice for transcriptome profiling; however, analyzing the massive amount of data generated by large-scale RNASeq remains a challenge. Typically, RNASeq data analysis consists of (1) alignment of millions of short sequencing reads to a reference genome or de novo assembled transcriptome, including the identification of splicing events; (2) quantification of expression levels of genes, transcripts, and exons; (3) differential analysis of gene expression among different biological conditions; and (4) biological interpretation of the differentially expressed genes.
Various bioinformatics pipelines exist for quantifying the expression levels of genes/transcripts; they ultimately generate a matrix of read counts for genes/transcripts in the given samples. This count matrix is then used for differential expression analysis with R/Bioconductor packages.
This workshop session demonstrates the use of R/Bioconductor packages for differential expression analysis and visualization of transcriptomics data. Participants will learn how to import count data, pre-process and normalize the data, perform statistical analyses to find differentially expressed genes, and generate publication-ready figures to report the results.
The transcriptomics data analysis will be carried out using R (version 4.2.1) and Bioconductor (version 3.15) packages.
We will use RStudio, an integrated development environment (IDE) for R, to edit and execute scripts and to view plots.
First install R (version 4.2.1) and then RStudio. Once the installation is complete, launch RStudio.
Overview of RStudio IDE
Run RStudio and execute the following command in the R console to install the required CRAN packages. Tidyverse is a collection of standard R packages that are widely used for data transformation and visualization.
install.packages(c("tidyverse", "pheatmap"), dependencies = TRUE)
In the R console, execute the following commands to install Bioconductor, DESeq2, and related packages.
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version = "3.15")
BiocManager::install(c("DESeq2", "edgeR", "org.Hs.eg.db", "EnhancedVolcano"))
library(tidyverse)
library(pheatmap)
library(DESeq2)
library(edgeR)
library(org.Hs.eg.db)
library(EnhancedVolcano)
Bioinformatics pipelines designed for transcriptomics or RNASeq data analysis generate a matrix of read counts for genes/transcripts in the given samples. The rows of the matrix represent gene or transcript IDs, and the columns represent samples. Each value in the matrix is the number of reads aligned to the corresponding gene or transcript in the corresponding sample.
We will use the previously published "airway" data set, which was described in the following publication:
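As a small illustration of this layout, the sketch below builds a toy count matrix in base R. The gene names, sample names, and counts are made up for illustration and are not taken from the airway data.

```r
# Toy count matrix: rows are genes, columns are samples (all values are made up).
# matrix() fills column by column, so each line below is one sample.
counts <- matrix(
  c(10L, 0L, 250L,   # untreated_1
    12L, 1L, 300L,   # untreated_2
    30L, 0L, 120L,   # treated_1
    28L, 2L, 110L),  # treated_2
  nrow = 3,
  dimnames = list(
    c("geneA", "geneB", "geneC"),
    c("untreated_1", "untreated_2", "treated_1", "treated_2")
  )
)

# Reads assigned to geneA in sample treated_1
counts["geneA", "treated_1"]

# Total reads per sample (often called the library size)
colSums(counts)
```

A single cell is addressed by gene (row) and sample (column), and column sums give the total number of counted reads per sample.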
Himes BE, Jiang X, Wagner P, Hu R, Wang Q, et al. (2014) RNA-Seq Transcriptome Profiling Identifies CRISPLD2 as a Glucocorticoid Responsive Gene that Modulates Cytokine Function in Airway Smooth Muscle Cells. PLOS ONE 9(6): e99625. https://doi.org/10.1371/journal.pone.0099625
PMID: 24926665. GEO: GSE52778.
The authors used RNASeq to characterize the human airway smooth muscle (HASM) transcriptome at baseline and under treatment with the asthma drug dexamethasone (DEX). RNASeq data from untreated (n=4) and treated (n=4) HASM cell line samples were processed through the bioinformatics pipeline described in the paper to generate a read count matrix across genes and samples.
- The count matrix data set can be installed as the Bioconductor airway package.
BiocManager::install("airway")
library("airway")
data(airway)
dim(airway)
str(assay(airway))
colData(airway)
- Tab-delimited text files for the data sets are saved in the "./data" directory of this repository.
list.files("./data")
- The data set can also be downloaded as tab-delimited files from here (download both files from Google Drive: airway.tsv and airway_colData.tsv).
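Tab-delimited files like these can be read with base R. The sketch below first writes two small stand-in files to a temporary directory so that it runs on its own. The file names, the layout (gene IDs in the first column of the count file, sample IDs in the first column of the sample table), and the values are assumptions for illustration; check them against the actual files before reusing the code.

```r
# Write two small stand-in files so the sketch runs without the real data.
# File names, layout, and values here are assumptions for illustration only.
tmp_dir <- tempdir()
counts_file <- file.path(tmp_dir, "toy_counts.tsv")
coldata_file <- file.path(tmp_dir, "toy_colData.tsv")

toy_counts <- data.frame(
  gene_id = c("geneA", "geneB"),
  sample1 = c(5L, 0L),
  sample2 = c(7L, 3L)
)
toy_coldata <- data.frame(
  sample_id = c("sample1", "sample2"),
  dex = c("untrt", "trt")
)
write.table(toy_counts, counts_file, sep = "\t", quote = FALSE, row.names = FALSE)
write.table(toy_coldata, coldata_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the files back; the first column of each file becomes the row names.
count_mat <- as.matrix(read.delim(counts_file, row.names = 1, check.names = FALSE))
col_data <- read.delim(coldata_file, row.names = 1, stringsAsFactors = TRUE)

# Sample names must match between the two tables, in the same order.
stopifnot(identical(colnames(count_mat), rownames(col_data)))
```

For the workshop data, the two temporary paths would be replaced by the paths to the downloaded count and sample tables, provided those files follow the same layout.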
An R script with the commands for a stepwise differential expression analysis using the DESeq2 package is saved in the "./src" directory:
"./src/DESeq2.R"
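For orientation, a minimal DESeq2 run on the airway package data might look like the sketch below. It is not the content of the script above. The design formula (~ cell + dex), the use of "untrt" as the reference level, and the low-count threshold of 10 are assumptions chosen for illustration.

```r
library(DESeq2)
library(airway)

data(airway)

# Make untreated the reference level so fold changes read as treated vs untreated.
airway$dex <- relevel(airway$dex, ref = "untrt")

# Assumed design: adjust for cell line, test for the dexamethasone effect.
dds <- DESeqDataSet(airway, design = ~ cell + dex)

# Drop genes with almost no reads (the threshold of 10 is an arbitrary choice).
dds <- dds[rowSums(counts(dds)) >= 10, ]

# Estimate size factors and dispersions, then fit the model.
dds <- DESeq(dds)

# Results table for treated vs untreated samples.
res <- results(dds, contrast = c("dex", "trt", "untrt"))

# Show the genes with the smallest adjusted p-values and a summary of the results.
head(res[order(res$padj), ])
summary(res)
```

The results table holds one row per retained gene, including the log2 fold change and the adjusted p-value, which are the usual inputs for volcano plots and heatmaps.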