#' \code{mia} Package.
#'
#' \code{mia} implements tools for microbiome analysis based on the
#' \code{SummarizedExperiment}, \code{SingleCellExperiment} and
#' \code{TreeSummarizedExperiment} infrastructure. Data wrangling and analysis
#' in the context of taxonomic data is the main scope. Additional functions for
#' common tasks are implemented such as community indices calculation and
#' summarization.
#'
#' @name mia-package
#' @seealso
#' \link[TreeSummarizedExperiment:TreeSummarizedExperiment-class]{TreeSummarizedExperiment}
"_PACKAGE"
NULL
#' @import methods
#' @import TreeSummarizedExperiment
#' @import DelayedArray
#' @import scater
#' @importFrom dplyr %>%
#' @importFrom rlang sym :=
NULL
#' mia datasets
#'
#' mia provides various datasets derived from independent experimental studies.
#' The datasets represent instances of the TreeSummarizedExperiment and
#' MultiAssayExperiment containers and can serve as tools to practice the
#' mia functionality.
#'
#' Currently, the following datasets are available:
#' \itemize{
#' \item{\code{\link{dmn_se}}: A SummarizedExperiment with 130 features and
#' 278 samples}
#' \item{\code{\link{enterotype}}: A TreeSummarizedExperiment with 553
#' features and 280 samples}
#' \item{\code{\link{esophagus}}: A TreeSummarizedExperiment with 58 features
#' and 3 samples}
#' \item{\code{\link{GlobalPatterns}}: A TreeSummarizedExperiment with 19216
#' features and 26 samples}
#' \item{\code{\link{HintikkaXOData}}: A MultiAssayExperiment with 3
#' experiments (microbiota, metabolites and biomarkers)}
#' \item{\code{\link{peerj13075}}: A TreeSummarizedExperiment with 674
#' features and 58 samples}
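#' \item{\code{\link{Tengeler2020}}: A TreeSummarizedExperiment with 151
#' features and 27 samples}
#' \item{\code{\link{Tito2024QMP}}: A TreeSummarizedExperiment with 676
#' features and 589 samples}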
#' }
#'
#' @name mia-datasets
#' @docType data
#' @keywords datasets
#'
#' @examples
#' # Load dataset from mia
#' library(mia)
#' data("GlobalPatterns", package = "mia")
#'
#' # In this case, the dataset is a TreeSE, so it is renamed as tse
#' tse <- GlobalPatterns
#'
#' # Print summary
#' tse
NULL
#' Global patterns of 16S rRNA diversity at a depth of millions of sequences per
#' sample.
#'
#' GlobalPatterns compared the microbial communities from 25 environmental
#' samples and three known "mock communities" at an average depth of 3.1
#' million reads per sample. The authors reproduced diversity patterns seen in
#' many other published studies, while investigating technical bias by applying
#' the same techniques to simulated microbial communities of known composition.
#' Special thanks are given to J. Gregory Caporaso for providing the
#' OTU-clustered data files for inclusion in the \pkg{phyloseq} package, from
#' which this data was converted to \code{TreeSummarizedExperiment}.
#'
#' @format A TreeSummarizedExperiment with 19216 features and 26 samples. The
#' rowData contains taxonomic information at Kingdom, Phylum, Class, Order,
#' Family, Genus and Species levels. The colData includes:
#'
#' \describe{
#' \item{X.SampleID}{Sample ID taken from the corresponding study}
#' \item{Primer}{primer used for sequencing}
#' \item{Final_Barcode}{final barcode (6 nucleotides)}
#' \item{Barcode_truncated_plus_T}{truncated barcode with an added thymine
#' (6 nucleotides)}
#' \item{Barcode_full_length}{complete barcode with a length of 11
#' nucleotides}
#' \item{SampleType}{sampling type by collection site (Soil, Feces, Skin,
#' Tongue, Freshwater, Creek Freshwater, Ocean, Estuary Sediment and Mock)}
#' \item{Description}{additional information (sampling location, environmental
#' factors and study type)}
#' }
#'
#' @name GlobalPatterns
#' @docType data
#' @keywords datasets
#' @usage data(GlobalPatterns)
#' @seealso \code{\link{mia-datasets}}
#' @author Caporaso, J. G., et al.
#' @references
#' Caporaso, J. G., et al. (2011).
#' Global patterns of 16S rRNA diversity at a depth of millions of sequences per
#' sample.
#' PNAS, 108, 4516-4522. \url{https://doi.org/10.1073/pnas.1000080107}
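#' @examples
#' # A minimal sketch, not part of the original documentation. It assumes that
#' # mia is installed and only inspects the structure described in the format
#' # section above.
#' library(mia)
#' data("GlobalPatterns", package = "mia")
#' tse <- GlobalPatterns
#'
#' # Features are stored as rows and samples as columns
#' dim(tse)
#'
#' # Taxonomic ranks available in rowData
#' colnames(rowData(tse))
#'
#' # Number of samples per collection site
#' table(colData(tse)$SampleType)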
NULL
#' Human gut microbiome dataset from 22 subjects based on shotgun DNA sequencing
#'
#' The enterotype data of the human gut microbiome includes taxonomic profiling
#' for 280 fecal samples from 22 subjects based on shotgun DNA sequencing. The
#' authors claimed that the data naturally clumps into three community-level
#' clusters, or "enterotypes", that are not immediately explained by sequencing
#' technology or demographic features of the subjects. In a later addendum from
#' 2014 the authors stated that enterotypes should not be seen as discrete
#' clusters, but as a way of stratifying samples to reduce complexity. It was
#' converted into a TreeSummarizedExperiment from the \pkg{phyloseq} package.
#'
#' @format A TreeSummarizedExperiment with 553 features and 280 samples. The
#' rowData contains taxonomic information at Genus level. The colData includes:
#'
#' \describe{
#' \item{Enterotype}{enterotype the sample belongs to (1, 2 and 3)}
#' \item{Sample_ID}{sample ID of samples from all studies}
#' \item{SeqTech}{sequencing technology}
#' \item{SampleID}{sample ID of complete samples}
#' \item{Project}{original project from which sample was obtained (gill06,
#' turnbaugh09, MetaHIT, MicroObes, MicroAge and kurokawa07)}
#' \item{Nationality}{participant's nationality (american, danish, spanish,
#' french, italian and japanese)}
#' \item{Gender}{participant's gender (F or M)}
#' \item{Age}{participant's age (0.25 -- 87)}
#' \item{ClinicalStatus}{participant's clinical status (healthy, obese, CD,
#' UC and elderly)}
#' }
#'
#' @name enterotype
#' @docType data
#' @keywords datasets
#' @usage data(enterotype)
#' @seealso \code{\link{mia-datasets}}
#' @author Arumugam, M., Raes, J., et al.
#' @references
#' Arumugam, M., et al. (2011). Enterotypes of the human gut microbiome.
#' Nature, 473(7346), 174-180. \url{https://doi.org/10.1038/nature09944}
#'
#' Arumugam, M., et al. (2014). Addendum: Enterotypes of the human gut
#' microbiome.
#' Nature 506, 516 (2014). \url{https://doi.org/10.1038/nature13075}
#'
#' @source \url{http://www.bork.embl.de/Docu/Arumugam_et_al_2011/downloads.html}
NULL
#' Human esophageal community from 3 individuals
#'
#' This small dataset from a human esophageal community includes 3 samples from
#' 3 human adults based on biopsies analysed with 16S rDNA PCR. The 16S rRNA
#' sequence processing is provided in the mothur wiki from the link below. It
#' was converted into a TreeSummarizedExperiment from the \pkg{phyloseq}
#' package.
#'
#' @format A TreeSummarizedExperiment with 58 features and 3 samples. The
#' rowData contains no taxonomic information. The colData is empty.
#'
#' @name esophagus
#' @docType data
#' @keywords datasets
#' @usage data(esophagus)
#' @seealso \code{\link{mia-datasets}}
#' @author Pei et al. \email{zhiheng.pei@@med.nyu.edu}.
#' @references
#' Pei, Z., Bini, E. J., Yang, L., Zhou, M., Francois, F., & Blaser, M. J.
#' (2004).
#' Bacterial biota in the human distal esophagus.
#' Proceedings of the National Academy of Sciences of the United States of
#' America, 101(12), 4250-4255.
#' \url{https://doi.org/10.1073/pnas.0306398101}
#'
#' McMurdie, P. J. & Holmes, S. (2013) \emph{phyloseq}: An R Package for
#' reproducible interactive analysis and graphics of microbiome census data.
#' PLoS ONE. 8(4):e61217.
#' \url{https://doi.org/10.1371/journal.pone.0061217}
#'
#' @source \url{http://www.mothur.org/wiki/Esophageal_community_analysis}
NULL
#' Twins' microbiome data from 278 individuals
#'
#' dmn_se is a dataset on twins' microbiome where samples are stratified by
#' their community composition through Dirichlet Multinomial Mixtures (DMM). It
#' was derived from the \pkg{DirichletMultinomial} package.
#'
#' @format A SummarizedExperiment with 130 features and 278 samples. The
#' rowData contains no taxonomic information. The colData includes:
#'
#' \describe{
#' \item{pheno}{participant's weight condition (Lean, Overwt and Obese)}
#' }
#'
#' @name dmn_se
#' @docType data
#' @aliases twins
#' @keywords datasets
#' @usage data(dmn_se)
#' @seealso
#' \code{\link{mia-datasets}}
#' \code{\link{calculateDMN}}
#' @author Turnbaugh, PJ et al.
#' @references
#' Holmes I, Harris K, Quince C (2012).
#' Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics.
#' PLoS ONE 7(2): e30126. \url{https://doi.org/10.1371/journal.pone.0030126}
#'
#' Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, et al. (2009).
#' A core gut microbiome in obese and lean twins. Nature 457: 480–484.
#' \url{https://doi.org/10.1038/nature07540}
NULL
#' Skin microbial profiles of 58 genetically unrelated individuals
#'
#' peerj13075 includes skin microbial profiles of 58 volunteers with multiple
#' factors. 16S rRNA sequencing of the V3-V4 regions was done to generate
#' millions of reads using the Illumina platform. A standard bioinformatic and
#' statistical analysis was done to explore skin bacterial diversity and its
#' association with age, diet and geographical location. The authors
#' investigated the significant association of skin microbiota with an
#' individual's geographical location.
#'
#' @format A TreeSummarizedExperiment with 674 features and 58 samples. The
#' rowData contains taxonomic information at kingdom, phylum, class, order,
#' family and genus level. The colData includes:
#'
#' \describe{
#' \item{Sample}{sample ID}
#' \item{Geographical_location}{city where participant lives (Ahmednagar,
#' Pune and Nashik)}
#' \item{Gender}{participant's gender (Male or Female)}
#' \item{Age}{participant's age group (Middle_age, Adult and Elderly)}
#' \item{Diet}{participant's diet (Veg or Mixed)}
#' }
#'
#' @name peerj13075
#' @docType data
#' @keywords datasets
#' @usage data(peerj13075)
#' @seealso \code{\link{mia-datasets}}
#' @author Potbhare, R., et al.
#' @references
#' Potbhare, R., RaviKumar, A., Munukka, E., Lahti, L., & Ashma, R. (2022).
#' Skin microbiota diversity among genetically unrelated individuals of Indian
#' origin.
#' PeerJ, 10, e13075. \url{https://doi.org/10.7717/peerj.13075}
#' Supplemental information includes the OTU table and taxonomy table, which
#' are publicly accessible from:
#' \url{https://www.doi.org/10.7717/peerj.13075/supp-1}
#' \url{https://www.doi.org/10.7717/peerj.13075/supp-2}
NULL
#' Multiomics dataset from 40 rat samples
#'
#' HintikkaXO is a multiomics dataset from a rat experiment studying the effect
#' of fat and prebiotics in diet. It contains high-throughput profiling data
#' from 40 rat samples, including 39 biomarkers, 38 metabolites (NMR), and 12706
#' OTUs from 318 species, measured from Cecum. This is a diet comparison study
#' with High/Low fat diet and xylo-oligosaccharide supplementation. Column
#' metadata is common for all experiments (microbiota, metabolites, biomarkers)
#' and is described below.
#'
#' @format A MultiAssayExperiment with 3 experiments (microbiota, metabolites
#' and biomarkers). rowData of the microbiota experiment contains taxonomic
#' information at Phylum, Class, Order, Family, Genus, Species and OTU levels.
#' The metabolites and biomarkers experiments contain 38 NMR metabolites and 39
#' biomarkers, respectively. The colData includes:
#'
#' \describe{
#' \item{Sample}{Sample ID (character)}
#' \item{Rat}{Rat ID (factor)}
#' \item{Site}{Site of measurement ("Cecum"); single value}
#' \item{Diet}{Diet group (factor; combination of the Fat and XOS fields)}
#' \item{Fat}{Fat in Diet (factor; Low/High)}
#' \item{XOS}{XOS Diet Supplement (numeric; 0/1)}
#' }
#'
#' @name HintikkaXOData
#' @docType data
#' @keywords datasets
#' @usage data(HintikkaXOData)
#' @seealso \code{\link{mia-datasets}}
#' @author Hintikka L et al.
#' @references
#' Hintikka L et al. (2021): Xylo-oligosaccharides in prevention of hepatic
#' steatosis and adipose tissue inflammation: associating taxonomic and
#' metabolomic patterns in fecal microbiota with biclustering. International
#' Journal of Environmental Research and Public Health 18(8):4049.
#' \url{https://doi.org/10.3390/ijerph18084049}
#'
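#' @examples
#' # A minimal sketch, not part of the original documentation. It assumes that
#' # mia is installed, which attaches MultiAssayExperiment as a dependency.
#' library(mia)
#' data("HintikkaXOData", package = "mia")
#' mae <- HintikkaXOData
#'
#' # List the experiments stored in the container
#' names(experiments(mae))
#'
#' # Extract a single experiment by position and check its dimensions
#' first_experiment <- mae[[1]]
#' dim(first_experiment)
#'
#' # Sample metadata shared by all experiments
#' colData(mae)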
NULL
#' Gut microbiota profiles of 27 individuals with ADHD and healthy controls
#'
#' Tengeler2020 includes gut microbiota profiles of 27 persons with and without
#' ADHD. A standard bioinformatic and statistical analysis was done to
#' demonstrate that altered microbial composition could be a driver of altered
#' brain structure and function and concomitant changes in the animals'
#' behavior. This was investigated by colonizing young, male, germ-free
#' C57BL/6JOlaHsd mice with microbiota from individuals with and without ADHD.
#'
#' @format A TreeSummarizedExperiment with 151 features and 27 samples. The
#' rowData contains taxonomic information at Kingdom, Phylum, Class, Order,
#' Family and Genus level. The colData includes:
#'
#' \describe{
#' \item{patient_status}{clinical status of the patient (ADHD or Control)}
#' \item{cohort}{cohort to which the patient belongs (Cohort_1, Cohort_2 and
#' Cohort_3)}
#' \item{patient_status_vs_cohort}{combination of patient_status and cohort}
#' \item{sample_name}{unique sample ID}
#' }
#'
#' @name Tengeler2020
#' @docType data
#' @keywords datasets
#' @usage data(Tengeler2020)
#' @seealso \code{\link{mia-datasets}}
#' @author A.C. Tengeler, et al.
#' @references
#' Tengeler, A.C., Dam, S.A., Wiesmann, M. et al.
#' Gut microbiota from persons with attention-deficit/hyperactivity disorder
#' affects the brain in mice.
#' Microbiome 8, 44 (2020). \url{https://doi.org/10.1186/s40168-020-00816-x}
#'
#' Supplemental information includes home-cage activity, methods, results and
#' imaging parameters, and is publicly accessible from:
#' \url{https://static-content.springer.com/esm/art%3A10.1186%2Fs40168-020-00816-x/MediaObjects/40168_2020_816_MOESM1_ESM.docx}
#' \url{https://static-content.springer.com/esm/art%3A10.1186%2Fs40168-020-00816-x/MediaObjects/40168_2020_816_MOESM2_ESM.docx}
#' \url{https://static-content.springer.com/esm/art%3A10.1186%2Fs40168-020-00816-x/MediaObjects/40168_2020_816_MOESM3_ESM.docx}
#'
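#' @examples
#' # A minimal sketch, not part of the original documentation. It assumes that
#' # mia is installed and uses only the colData fields listed above.
#' library(mia)
#' data("Tengeler2020", package = "mia")
#' tse <- Tengeler2020
#'
#' # Cross-tabulate patient status against cohort
#' table(colData(tse)$patient_status, colData(tse)$cohort)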
NULL
#' Fecal microbiota samples from 589 patients across different colorectal
#' cancer stages
#'
#' The study combined Quantitative Microbiome Profiling (QMP) with extensive
#' patient phenotyping from a group of 589 colorectal cancer (CRC) patients,
#' advanced adenoma (AA) patients, and healthy controls. By implementing
#' confounder control and quantitative profiling methods, the study was able to
#' reveal potentially misleading associations between microbial markers and
#' colorectal cancer development that were driven by other factors like
#' intestinal inflammation, rather than the cancer diagnosis itself.
#'
#' @format A TreeSummarizedExperiment with 676 features and 589 samples.
#' The rowData contains species-level taxonomic information. The colData
#' includes:
#'
#' \describe{
#' \item{sampleID}{(character) Sample ID from the corresponding study}
#' \item{diagnosis}{(factor) Diagnosis type, with possible values: "ADE"
#' (advanced adenoma), "CRC" (colorectal cancer), "CTL" (control)}
#' \item{colonoscopy}{(factor) Colonoscopy result, with possible values:
#' "FIT_Positive", "familial_risk_familial_CRC_FCC", "familial_risk_no",
#' "abdomil_complaints"}
#' }
#'
#' @name Tito2024QMP
#' @docType data
#' @keywords datasets
#' @usage data(Tito2024QMP)
#' @seealso \code{\link{mia-datasets}}
#' @author
#' Shadman Ishraq
#' @references
#' Raúl Y. Tito, Sara Verbandt, Marta Aguirre Vazquez, Leo Lahti, Chloe
#' Verspecht, Verónica Lloréns-Rico, Sara Vieira-Silva,
#' Janine Arts, Gwen Falony, Evelien Dekker, Joke Reumers, Sabine Tejpar &
#' Jeroen Raes (2024).
#' Microbiome confounders and quantitative profiling challenge predicted
#' microbial targets in colorectal cancer development.
#' Nature Medicine, 30, 1339-1348.
#' \url{https://doi.org/10.1038/s41591-024-02963-2}
#'
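#' @examples
#' # A minimal sketch, not part of the original documentation. It assumes that
#' # the installed version of mia ships the Tito2024QMP dataset.
#' library(mia)
#' data("Tito2024QMP", package = "mia")
#' tse <- Tito2024QMP
#'
#' # Number of samples per diagnosis group
#' table(colData(tse)$diagnosis)
#'
#' # Diagnosis group against colonoscopy result
#' table(colData(tse)$diagnosis, colData(tse)$colonoscopy)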
NULL