% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/model.R
\name{sleuth_results}
\alias{sleuth_results}
\title{Extract Wald or Likelihood Ratio test results from a sleuth object}
\usage{
sleuth_results(obj, test, test_type = "wt", which_model = "full",
rename_cols = TRUE, show_all = TRUE,
pval_aggregate = obj$pval_aggregate, ...)
}
\arguments{
\item{obj}{a \code{sleuth} object}
\item{test}{a character string denoting the test to extract. Possible tests can be found by using \code{models(obj)}.}
\item{test_type}{'wt' for Wald test or 'lrt' for Likelihood Ratio test.}
\item{which_model}{a character string denoting the model. If extracting a Wald test, use the model name.
Not used if extracting a likelihood ratio test.}
\item{rename_cols}{if \code{TRUE} will rename some columns to be shorter and
consistent with the vignette}
\item{show_all}{if \code{TRUE} will show all transcripts (not only the ones
passing filters). The transcripts that do not pass filters will have
\code{NA} values in most columns.}
\item{pval_aggregate}{if \code{TRUE} and both \code{target_mapping} and \code{aggregation_column} were provided
to \code{sleuth_prep}, use Lancaster's method to aggregate p-values by the \code{aggregation_column}.}
\item{...}{advanced options for sleuth_results. See details.}
}
\value{
If \code{pval_aggregate} is \code{FALSE}, returns a \code{data.frame} with the following columns:
\itemize{
\item \code{target_id}: transcript name, e.g. "ENST#####" (dependent on the transcriptome used in kallisto).
If \code{gene_mode} is TRUE, this will instead be the IDs specified by the \code{obj$gene_column} from \code{obj$target_mapping}.
\item \code{...}: if there is a target mapping data frame, all of the annotation columns are added from
\code{obj$target_mapping} before the other columns.
\item \code{pval}: p-value of the chosen model
\item \code{qval}: false discovery rate adjusted p-value, using Benjamini-Hochberg (see \code{\link{p.adjust}})
\item \code{test_stat} (LRT only): Chi-squared test statistic of the likelihood ratio test.
\item \code{rss} (LRT only): the residual sum of squares under the "null model".
\item \code{degrees_free} (LRT only): the degrees of freedom (equal to the difference between the two models).
\item \code{b} (Wald only): 'beta' value (effect size). Technically a biased estimator of the fold change.
\item \code{se_b} (Wald only): standard error of the beta.
\item \code{mean_obs}: mean of natural log counts of observations
\item \code{var_obs}: variance of observation
\item \code{tech_var}: technical variance of observation from the bootstraps (named 'sigma_q_sq' if rename_cols is \code{FALSE})
\item \code{sigma_sq}: raw estimator of the variance once the technical variance has been removed
\item \code{smooth_sigma_sq}: smooth regression fit for the shrinkage estimation
\item \code{final_sigma_sq}: max(sigma_sq, smooth_sigma_sq); used for covariance estimation of beta
(named 'smooth_sigma_sq_pmax' if rename_cols is \code{FALSE})
}
If \code{pval_aggregate} is \code{TRUE}, returns a \code{data.frame} with the following columns:
\itemize{
\item \code{target_id}: gene ID specified by \code{obj$gene_column}, e.g. "ENSG#####" (dependent on the transcriptome
used in kallisto).
\item \code{...}: all of the additional annotation columns (not \code{'target_id'} or \code{obj$gene_column}) are
added from \code{obj$target_mapping} before the other columns.
\item \code{num_aggregated_transcripts}: the number of transcripts aggregated for a given gene. These only include
filtered transcripts.
\item \code{sum_mean_obs_counts}: this is the sum of the mean observations across all filtered transcripts
within a gene. Note that the weighting function is applied before summing.
\item \code{pval}: the aggregated p-value calculated by the Lancaster method. See the aggregation package for details.
\item \code{qval}: adjusted p-values using the Benjamini-Hochberg method.
}
}
\description{
This function extracts Wald or Likelihood Ratio test results from a sleuth object.
}
\details{
The columns returned by this function depend on two factors: whether the test is a Wald test or a
Likelihood Ratio test, and whether \code{pval_aggregate} is \code{TRUE}.

The sleuth model is a measurement-error-in-the-response model. It attempts to separate the variation due to
the inference procedure used by kallisto from the variation due to the covariates, i.e. the biological and technical
factors of the experiment (represented by the columns in \code{obj$sample_to_covariates}). For the Wald test,
the 'b' column is the estimate of the selected coefficient. In the default setting, it is analogous to,
but not equivalent to, the fold change. The transformed values are on the natural-log scale, so
the estimated coefficient is also on the natural-log scale. This value takes into account the
'inferential variance' estimated from the kallisto bootstraps.
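
The natural-log scale of \code{b} can be illustrated with a short base R sketch. The value of \code{b} below is
made up, the variable names are hypothetical, and the result is only a rough analogue of a fold change, as noted above:
\preformatted{
b <- 0.7                  # hypothetical Wald 'b' on the natural-log scale
approx_fc <- exp(b)       # rough multiplicative change implied by b
log2_fc   <- b / log(2)   # the same coefficient expressed on the log2 scale
}
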
If the user wishes to get gene-level results from this function, there are two ways of doing so:
\itemize{
\item p-value aggregation mode: if the \code{pval_aggregate} argument is \code{TRUE}, this function will
aggregate the transcript-level p-values to the gene level using the Lancaster method. See below for advanced
options related to this mode. This is the recommended way to do gene-level aggregation. See the paper
\item count aggregation mode: This is the gene-level aggregation method introduced in sleuth version 0.28.1.
This mode is activated if \code{obj$gene_mode} is \code{TRUE}. In this mode, the modeling and testing were done
using aggregated counts (or TPMs), so the results are the same as the transcript-level results, except that the
target IDs are now gene IDs instead of transcript IDs.
}
An important note if \code{pval_aggregate} or the old \code{gene_mode} is \code{TRUE}: when combining the
gene annotations from \code{obj$target_mapping}, all of the columns except for the transcript ID,
\code{obj$target_mapping$target_id}, will be included. If there are transcript-level entries for any of the other
columns, this will result in duplicate rows in the results table (usually an undesirable result).
Here are advanced options for customizing the p-value aggregation procedure:
\itemize{
\item \code{weight_func}: if \code{pval_aggregate} is \code{TRUE}, then this is used to weight the p-values for
Lancaster's method. This function must take the observed means of the transcripts as the only defined argument.
The default is \code{identity}.
}
}
\examples{
models(sleuth_obj) # for this example, assume the formula is ~condition
                   # and that one of the fitted coefficients is conditionIP
results_table <- sleuth_results(sleuth_obj, 'conditionIP')
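
# The calls below are hedged sketches, not verified output. They assume
# that sleuth_obj has models named "full" and "reduced" and that
# sleuth_lrt(sleuth_obj, "reduced", "full") has already been run. The test
# name "reduced:full" is an assumption; tests(sleuth_obj) lists the names
# that are actually available.
lrt_table <- sleuth_results(sleuth_obj, 'reduced:full', test_type = 'lrt')

# Return only the transcripts that passed filtering
wald_filtered <- sleuth_results(sleuth_obj, 'conditionIP', show_all = FALSE)

# Gene-level p-value aggregation. This assumes target_mapping and
# aggregation_column were given to sleuth_prep. The log weighting is a
# hypothetical choice for illustration (the default is identity).
gene_table <- sleuth_results(sleuth_obj, 'conditionIP', pval_aggregate = TRUE,
  weight_func = function(x) log(x + 1))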
}
\seealso{
\code{\link{sleuth_wt}} and \code{\link{sleuth_lrt}} to compute tests, \code{\link{models}} to
view which models were fit, \code{\link{tests}} to view which tests were performed (and can be extracted)
}