---
title: "TCGAbiolinks: Searching, downloading and visualizing mutation files"
date: "`r BiocStyle::doc_date()`"
vignette: >
  %\VignetteIndexEntry{"5. Mutation data"}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_knit$set(progress = FALSE)
```
```{r message=FALSE, warning=FALSE, include=FALSE}
library(TCGAbiolinks)
library(SummarizedExperiment)
library(dplyr)
library(DT)
```
# Search and Download
**TCGAbiolinks** provides functions to download mutation data from GDC.
There are two ways to download the data:

1. Use `GDCquery_Maf` to download MAF files aligned against hg38.
2. Use `GDCquery`, `GDCdownload` and `GDCprepare` to download MAF files aligned against hg19.
## Mutation data (hg38)
This example downloads MAF (Mutation Annotation Format) files for the MuSE variant calling pipeline.
Pipeline options are: `muse`, `varscan2`, `somaticsniper`, `mutect`. For more information, see the
[GDC docs](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/).
```{r results = 'hide', echo=TRUE, message=FALSE, warning=FALSE,eval=F}
maf <- GDCquery_Maf("CHOL", pipelines = "muse")
```
```{r results = 'hide', echo=TRUE, message=FALSE, warning=FALSE,eval=T,include=F}
maf <- chol_maf@data
```
```{r echo = TRUE, message = FALSE, warning = FALSE}
# Show only the first 20 rows to make rendering faster
datatable(maf[1:20,],
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = FALSE)
```
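
Since `GDCquery_Maf` takes a single pipeline name, comparing callers means repeating the call once per pipeline. The sketch below shows one way to do that. The helper `fetch_all_pipelines` and its `fetch` argument are hypothetical names introduced here for illustration and are not part of TCGAbiolinks; the only package call assumed is `GDCquery_Maf(tumor, pipelines = ...)` as used above.

```r
# Hypothetical helper (not part of TCGAbiolinks): fetch one MAF per pipeline.
# `fetch` is an assumed hook so the download step can be swapped or stubbed;
# when left NULL it resolves to TCGAbiolinks::GDCquery_Maf at call time.
fetch_all_pipelines <- function(tumor,
                                pipelines = c("muse", "varscan2",
                                              "somaticsniper", "mutect"),
                                fetch = NULL) {
  if (is.null(fetch)) {
    fetch <- TCGAbiolinks::GDCquery_Maf
  }
  # One call per pipeline; each result is stored under its pipeline name
  mafs <- lapply(pipelines, function(p) fetch(tumor, pipelines = p))
  names(mafs) <- pipelines
  mafs
}

# Usage (needs network access to GDC, so it is left commented out):
# mafs <- fetch_all_pipelines("CHOL")
# sapply(mafs, nrow)
```

The result is a named list, so a single pipeline's table can be picked with, for example, `mafs$muse`.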
## Mutation data (hg19)
This example downloads MAF (Mutation Annotation Format) files aligned against hg19 (the old TCGA MAF files).
```{r results = 'hide', echo=TRUE, message=FALSE, warning=FALSE}
query.maf.hg19 <- GDCquery(project = "TCGA-CHOL",
                           data.category = "Simple nucleotide variation",
                           data.type = "Simple somatic mutation",
                           access = "open",
                           legacy = TRUE)
```
```{r echo = TRUE, message = FALSE, warning = FALSE}
# Check which MAF files are available
datatable(dplyr::select(getResults(query.maf.hg19), -contains("cases")),
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 10),
          rownames = FALSE)
```
```{r results = 'hide', echo=TRUE, message=FALSE, warning=FALSE,eval=FALSE}
query.maf.hg19 <- GDCquery(project = "TCGA-CHOL",
                           data.category = "Simple nucleotide variation",
                           data.type = "Simple somatic mutation",
                           access = "open",
                           file.type = "bcgsc.ca_CHOL.IlluminaHiSeq_DNASeq.1.somatic.maf",
                           legacy = TRUE)
GDCdownload(query.maf.hg19)
maf <- GDCprepare(query.maf.hg19)
```
```{r message=FALSE, warning=FALSE, include=FALSE}
# Assign to `maf` so the table below shows the hg19 data rather than the earlier hg38 object
maf <- bcgsc.ca_CHOL.IlluminaHiSeq_DNASeq.1.somatic.maf
```
```{r echo = TRUE, message = FALSE, warning = FALSE}
# Show only the first 20 rows to make rendering faster
datatable(maf[1:20,],
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = FALSE)
```
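
Assuming the loaded MAF behaves like an ordinary data frame, quick per-gene counts can be computed with base R before moving on to plotting. The sketch below uses a small invented table (`toy_maf`, with made-up values) so that it runs without any download. The three column names are standard MAF fields, and the object names are illustrative only.

```r
# Toy stand-in for a MAF table; the values are invented for illustration only.
toy_maf <- data.frame(
  Hugo_Symbol = c("TP53", "TP53", "KRAS", "IDH1", "KRAS"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                             "Missense_Mutation", "Missense_Mutation",
                             "Missense_Mutation"),
  Tumor_Sample_Barcode = c("S1", "S2", "S1", "S3", "S3"),
  stringsAsFactors = FALSE
)

# Number of mutation records per gene, most frequent first
gene_counts <- sort(table(toy_maf$Hugo_Symbol), decreasing = TRUE)

# Number of distinct samples carrying at least one mutation in each gene
samples_per_gene <- tapply(toy_maf$Tumor_Sample_Barcode,
                           toy_maf$Hugo_Symbol,
                           function(x) length(unique(x)))

# Cross-tabulation of gene by variant classification
class_by_gene <- table(toy_maf$Hugo_Symbol, toy_maf$Variant_Classification)
```

Replacing `toy_maf` with a real MAF table should give the same kind of summary, provided those columns are present.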
# Visualize the data
To visualize the data you can use the Bioconductor package [maftools](https://bioconductor.org/packages/release/bioc/html/maftools.html). For more information, please check its [vignette](https://bioconductor.org/packages/release/bioc/vignettes/maftools/inst/doc/maftools.html#rainfall-plots).
```{r results = "hide",echo = TRUE, message = FALSE, warning = FALSE, eval=FALSE}
library(maftools)
library(dplyr)
maf <- GDCquery_Maf("CHOL", pipelines = "muse") %>% read.maf
```
```{r message=FALSE, warning=FALSE, include=FALSE}
library(maftools)
library(dplyr)
maf <- chol_maf
```
```{r results = "hide",echo = TRUE, message = FALSE, warning = FALSE}
datatable(getSampleSummary(maf),
          filter = 'top',
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = FALSE)
plotmafSummary(maf = maf, rmOutlier = TRUE, addStat = 'median', dashboard = TRUE)
```
```{r echo = TRUE, message = FALSE, warning = FALSE}
oncoplot(maf = maf, top = 10, removeNonMutated = TRUE)
# Classify SNVs into transitions and transversions (synonymous variants included)
chol.titv <- titv(maf = maf, plot = FALSE, useSyn = TRUE)
# Plot the Ti/Tv summary
plotTiTv(res = chol.titv)
```
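
The Ti/Tv plot above groups single-nucleotide variants into transitions (purine to purine or pyrimidine to pyrimidine: A<->G, C<->T) and transversions (all other base changes). The following base-R sketch illustrates that classification rule only. It is not how maftools implements `titv`, and the function name `classify_titv` is hypothetical. With a real MAF, the inputs would typically come from the standard `Reference_Allele` and `Tumor_Seq_Allele2` columns.

```r
# Hypothetical helper: label each single-base change as transition ("Ti")
# or transversion ("Tv"). Anything that is not a single-base substitution
# (indels, identical alleles, unknown bases) is returned as NA.
classify_titv <- function(ref, alt) {
  stopifnot(length(ref) == length(alt))
  ref <- toupper(ref)
  alt <- toupper(alt)
  bases <- c("A", "C", "G", "T")
  valid <- ref %in% bases & alt %in% bases & ref != alt
  transitions <- c("A>G", "G>A", "C>T", "T>C")
  out <- ifelse(paste0(ref, ">", alt) %in% transitions, "Ti", "Tv")
  out[!valid] <- NA
  out
}

# Example with made-up alleles
calls <- classify_titv(ref = c("A", "C", "A", "G", "-"),
                       alt = c("G", "T", "C", "T", "A"))
table(calls, useNA = "ifany")
```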