
pandurang-kolekar/transcriptomics_data_analysis_using_R


Transcriptomics data analysis using R/Bioconductor

Abstract

RNA sequencing (RNA-Seq) has become a method of choice for transcriptome profiling; however, analyzing the massive amount of data generated by large-scale RNA-Seq experiments remains a challenge. A typical RNA-Seq analysis consists of (1) aligning millions of short sequencing reads to a reference genome or a de novo assembled transcriptome, including the identification of splicing events; (2) quantifying expression levels of genes, transcripts, and exons; (3) testing for differential gene expression between biological conditions; and (4) biologically interpreting the differentially expressed genes.

Various bioinformatics pipelines exist for quantifying the expression levels of genes or transcripts; they ultimately produce a matrix of read counts for each gene or transcript in each sample. This count matrix is then used for differential expression analysis with R/Bioconductor packages.

This workshop session demonstrates the use of R/Bioconductor packages for differential expression analysis and visualization of transcriptomics data. Participants will learn how to import count data, pre-process and normalize it, perform statistical tests to identify differentially expressed genes, and generate publication-ready figures to report the results.
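The steps listed above map onto a handful of DESeq2 calls. The sketch below runs them on a small data set simulated with `makeExampleDESeqDataSet()` rather than on airway, so it only assumes that DESeq2 is installed. The filtering threshold (at least 10 reads per gene) is an illustrative choice, not necessarily the one used in the workshop script.

```r
library(DESeq2)

set.seed(42)
# Simulated data: 500 genes x 8 samples with a two-level "condition"
# factor (A, B); betaSD = 1 adds true fold changes to the genes
dds <- makeExampleDESeqDataSet(n = 500, m = 8, betaSD = 1)

# Pre-process: drop genes with fewer than 10 reads in total
dds <- dds[rowSums(counts(dds)) >= 10, ]

# Normalize and test: DESeq() estimates size factors and dispersions,
# then fits a negative binomial GLM for the design (~ condition)
dds <- DESeq(dds)
print(sizeFactors(dds))
norm_counts <- counts(dds, normalized = TRUE)   # raw counts / size factor

# Results for B vs A; padj holds Benjamini-Hochberg adjusted p-values
res <- results(dds, contrast = c("condition", "B", "A"), alpha = 0.05)
summary(res)
print(head(res[order(res$padj), ]))

# MA plot: log2 fold change against mean normalized count
plotMA(res)
```

The same calls apply to the airway data once it is loaded as a `DESeqDataSet`.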

Software Installation

The transcriptomics data analysis will be carried out using R (version 4.2.1) and Bioconductor (version 3.15) packages.

We will use RStudio, an integrated development environment (IDE) for R, to edit and execute scripts and to visualize plots.

First install R (version 4.2.1), then RStudio. Once the installation is complete, launch RStudio.

Overview of RStudio IDE

YouTube video


Install R packages

Launch RStudio and execute the following command in the R console to install the required CRAN packages. The tidyverse is a collection of R packages widely used for data transformation and visualization, and pheatmap draws clustered heatmaps.

install.packages(c("tidyverse", "pheatmap"), dependencies = TRUE)

Install Bioconductor

In RStudio, execute the following commands in the R console to install Bioconductor (version 3.15), DESeq2, and related packages.

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version = "3.15")
BiocManager::install(c("DESeq2", "edgeR", "org.Hs.eg.db", "EnhancedVolcano"))

Test installations

library(tidyverse)
library(pheatmap)
library(DESeq2)
library(edgeR)
library(org.Hs.eg.db)
library(EnhancedVolcano)
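Calling `library()` package by package stops at the first failure. As a sketch, the check below instead tests every workshop package with `requireNamespace()`, reports any that are missing, and prints the R and Bioconductor versions for comparison with 4.2.1 and 3.15.

```r
# Packages used in this workshop
pkgs <- c("tidyverse", "pheatmap", "DESeq2", "edgeR",
          "org.Hs.eg.db", "EnhancedVolcano")

# requireNamespace() returns TRUE/FALSE instead of raising an error,
# so all packages are checked even if one of them is missing
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

if (all(installed)) {
  message("All workshop packages are installed.")
} else {
  message("Missing: ", paste(pkgs[!installed], collapse = ", "))
}

# Compare with the versions used in the workshop (R 4.2.1, Bioconductor 3.15)
print(R.version.string)
if (requireNamespace("BiocManager", quietly = TRUE)) {
  print(BiocManager::version())
}
```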

Data Set

Bioinformatics pipelines designed for transcriptomics (RNA-Seq) data analysis generate a matrix of read counts for genes or transcripts across samples. The rows of the matrix represent gene or transcript IDs and the columns represent samples. Each value is the number of reads assigned to the corresponding gene or transcript in the corresponding sample.
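A toy example of this layout follows; the gene IDs, sample names, and counts are made up. It shows why raw counts are not directly comparable between samples: the column sums (library sizes) differ, which is what normalization corrects for. Genes with no reads in any sample are typically removed before testing.

```r
# Toy count matrix: rows are genes, columns are samples.
# Gene IDs, sample names and counts are made up for illustration.
count_mat <- matrix(
  c(10L, 0L, 250L, 0L,   # ctrl1
    12L, 3L, 198L, 0L,   # ctrl2
    40L, 1L, 230L, 0L,   # treat1
    38L, 0L, 260L, 0L),  # treat2
  nrow = 4,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("ctrl1", "ctrl2", "treat1", "treat2"))
)
print(count_mat)

# Library size: total reads counted in each sample (column sums)
print(colSums(count_mat))   # 260 213 271 298

# geneD has no reads in any sample and is filtered out here
print(count_mat[rowSums(count_mat) > 0, ])
```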

We will use the previously published "airway" data set, described in the following publication:

Himes BE, Jiang X, Wagner P, Hu R, Wang Q, et al. (2014) RNA-Seq Transcriptome Profiling Identifies CRISPLD2 as a Glucocorticoid Responsive Gene that Modulates Cytokine Function in Airway Smooth Muscle Cells. PLOS ONE 9(6): e99625. https://doi.org/10.1371/journal.pone.0099625

PMID: 24926665. GEO: GSE52778.

The authors used RNA-Seq to characterize the transcriptome of human airway smooth muscle (HASM) cells at baseline and after treatment with dexamethasone (DEX), a glucocorticoid used to treat asthma. RNA-Seq data from untreated (n=4) and DEX-treated (n=4) samples of HASM cell lines were processed through the bioinformatics pipeline described in the paper to generate a read count matrix across genes and samples.
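In the airway data the samples are paired: each of the four HASM cell lines contributes one untreated and one treated sample. The sketch below builds a sample table with that layout (the sample and cell-line names are hypothetical) and the corresponding design matrix. In DESeq2 the same model is written as `design = ~ cell + dex`, which estimates the treatment effect while accounting for baseline differences between cell lines; this is one common choice, and the workshop script may use a simpler `~ dex` design.

```r
# Hypothetical sample table mirroring the airway layout: 4 cell lines,
# each with one untreated and one dexamethasone-treated sample
sample_info <- data.frame(
  cell = factor(rep(paste0("line", 1:4), each = 2)),
  dex  = factor(rep(c("untrt", "trt"), times = 4), levels = c("untrt", "trt")),
  row.names = paste0("sample", 1:8)
)
print(sample_info)

# Design matrix for ~ cell + dex: an intercept, 3 cell-line offsets and
# one treatment effect ("dextrt") shared across all cell lines
design <- model.matrix(~ cell + dex, data = sample_info)
print(colnames(design))
```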

  • The count matrix data set is available as the Bioconductor airway package:
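BiocManager::install("airway")
library(SummarizedExperiment)  # provides assay() and colData()
library("airway")
data(airway)                   # loads a RangedSummarizedExperiment named `airway`
dim(airway)                    # number of genes x number of samples
str(assay(airway))             # the read count matrix
colData(airway)                # sample information (cell line, dex treatment, ...)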
  • Tab-delimited text files for the data set are saved in the "./data" directory of this repository:

    list.files("./data")
    
  • The data set can also be downloaded as tab-delimited files from Google Drive (download both files, airway.tsv and airway_colData.tsv).
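The hypothetical helper below builds a `DESeqDataSet` from the two tab-delimited files. It assumes a file layout that is not documented here: the first column of `airway.tsv` holds gene IDs with one column per sample, and the first column of `airway_colData.tsv` holds sample IDs, with a `dex` column whose levels include `untrt`. Adjust the `read.delim()` arguments if the files differ.

```r
library(DESeq2)

# Hypothetical helper (not part of this repository).
# ASSUMPTION: the first column of each file holds the row IDs.
read_airway_tsv <- function(counts_file, coldata_file, design = ~ dex) {
  count_mat <- as.matrix(read.delim(counts_file, row.names = 1,
                                    check.names = FALSE))
  sample_info <- read.delim(coldata_file, row.names = 1,
                            stringsAsFactors = TRUE, check.names = FALSE)

  # DESeq2 requires the count columns and sample-table rows to match
  if (!setequal(colnames(count_mat), rownames(sample_info))) {
    stop("Sample IDs differ between the count matrix and the sample table")
  }
  # Put the sample table in the same order as the count columns
  sample_info <- sample_info[colnames(count_mat), , drop = FALSE]

  # ASSUMPTION: untreated samples are labelled "untrt"; make them the reference
  if ("dex" %in% names(sample_info)) {
    sample_info$dex <- relevel(sample_info$dex, ref = "untrt")
  }
  DESeqDataSetFromMatrix(countData = count_mat, colData = sample_info,
                         design = design)
}

# Usage with the files in this repository's ./data directory:
# dds <- read_airway_tsv("./data/airway.tsv", "./data/airway_colData.tsv")
```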

Source Code for Hands-on session

An R script with commands for a step-wise differential expression analysis using the DESeq2 package is saved in the "./src" directory:

"./src/DESeq2.R"

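After testing, results are usually annotated and plotted. The sketch below uses the packages installed above and assumes the airway package with a paired `~ cell + dex` design, so it may differ from what the workshop script does. Gene symbols are mapped from the Ensembl IDs with `mapIds()`, and the figures are a PCA of variance-stabilized counts, a heatmap of the 20 genes with the smallest adjusted p-values, and a volcano plot.

```r
library(airway)
library(DESeq2)
library(pheatmap)
library(EnhancedVolcano)
library(org.Hs.eg.db)

# Build and fit the model (treated vs untreated, paired by cell line)
data(airway)
airway$dex <- relevel(airway$dex, ref = "untrt")
dds <- DESeqDataSet(airway, design = ~ cell + dex)
dds <- dds[rowSums(counts(dds)) >= 10, ]   # drop near-empty genes
dds <- DESeq(dds)
res <- results(dds, contrast = c("dex", "trt", "untrt"), alpha = 0.05)

# airway row names are Ensembl gene IDs; map them to gene symbols and
# fall back to the Ensembl ID where no symbol is found
symbols <- mapIds(org.Hs.eg.db, keys = rownames(res), column = "SYMBOL",
                  keytype = "ENSEMBL", multiVals = "first")
res$symbol <- ifelse(is.na(symbols), rownames(res), symbols)

# Variance-stabilizing transformation for sample-level plots
vsd <- vst(dds, blind = FALSE)
print(plotPCA(vsd, intgroup = "dex"))

# Heatmap of the 20 genes with the smallest adjusted p-values,
# centered on each gene's mean across samples
top <- head(order(res$padj), 20)
mat <- assay(vsd)[top, ]
mat <- mat - rowMeans(mat)
sample_ann <- as.data.frame(colData(vsd)[, c("cell", "dex")])
pheatmap(mat, annotation_col = sample_ann, labels_row = res$symbol[top])

# Volcano plot of all tested genes
print(EnhancedVolcano(res, lab = res$symbol,
                      x = "log2FoldChange", y = "pvalue"))
```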