% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/predict.R, R/predict_raw.R
\name{predict.model_fit}
\alias{predict.model_fit}
\alias{predict_raw.model_fit}
\alias{predict_raw}
\title{Model predictions}
\usage{
\method{predict}{model_fit}(object, new_data, type = NULL, opts = list(), ...)
\method{predict_raw}{model_fit}(object, new_data, opts = list(), ...)
predict_raw(object, ...)
}
\arguments{
\item{object}{An object of class \code{model_fit}.}
\item{new_data}{A rectangular data object, such as a data frame.}
\item{type}{A single character value or \code{NULL}. Possible values
are "numeric", "class", "prob", "conf_int", "pred_int", "quantile", "time",
"hazard", "survival", or "raw". When \code{NULL}, \code{predict()} will choose an
appropriate value based on the model's mode.}
\item{opts}{A list of optional arguments to the underlying
predict function that will be used when \code{type = "raw"}. The
list should not include options for the model object or the
new data being predicted.}
\item{...}{Arguments to the underlying model's prediction
function cannot be passed here (see \code{opts}). There are some
\code{parsnip}-related options that can be passed, depending on the
value of \code{type}. Possible arguments are:
\itemize{
\item \code{level}: for \code{type}s of "conf_int" and "pred_int", this
is the level of the intervals (e.g. the confidence level for
confidence intervals). Default value is 0.95.
\item \code{std_error}: add the standard error of fit or prediction (on
the scale of the linear predictors) for \code{type}s of "conf_int"
and "pred_int". Default value is \code{FALSE}.
\item \code{quantile}: the quantile(s) for quantile regression
(not implemented yet)
\item \code{time}: the time(s) for hazard and survival probability estimates.
}}
}
\value{
With the exception of \code{type = "raw"}, the results of
\code{predict.model_fit()} will be a tibble with as many rows as there
are rows in \code{new_data}, and the column names will be
predictable.
For numeric results with a single outcome, the tibble will have
a \code{.pred} column; for multivariate results, the columns are
named \code{.pred_Yname}.
For hard class predictions, the column is named \code{.pred_class}
and, when \code{type = "prob"}, the columns are \code{.pred_classlevel}.
\code{type = "conf_int"} and \code{type = "pred_int"} return tibbles with
columns \code{.pred_lower} and \code{.pred_upper} with an attribute for
the confidence level. In the case where intervals can be
produced for class probabilities (or other non-scalar outputs),
the columns will be named \code{.pred_lower_classlevel} and so on.
Quantile predictions return a tibble with a column \code{.pred}, which is
a list-column. Each list element contains a tibble with columns
\code{.pred} and \code{.quantile} (and perhaps other columns).
Using \code{type = "raw"} with \code{predict.model_fit()} will return
the unadulterated results of the prediction function.
For censored regression:
\itemize{
\item \code{type = "time"} produces a column \code{.pred_time}.
\item \code{type = "hazard"} results in a column \code{.pred_hazard}.
\item \code{type = "survival"} results in a column \code{.pred_survival}.
}
For the last two types, the result is a nested tibble: an overall
column called \code{.pred} holds sub-tibbles in the above format.
In the case of Spark-based models, since table columns cannot
contain dots, the same convention is used except 1) no dots
appear in names and 2) vectors are never returned but
type-specific prediction functions.
When the model fit failed and the error was captured, the
\code{predict()} function will return the same structure as above but
filled with missing values. This does not currently work for
multivariate models.
}
\description{
Apply a model to create different types of predictions.
\code{predict()} can be used for all types of models and uses the
"type" argument for more specificity.
}
\details{
If "type" is not supplied to \code{predict()}, then a choice
is made:
\itemize{
\item \code{type = "numeric"} for regression models,
\item \code{type = "class"} for classification, and
\item \code{type = "time"} for censored regression.
}
\code{predict()} is designed to provide a tidy result (see "Value"
section below) in a tibble output format.
\subsection{Interval predictions}{
When using \code{type = "conf_int"} and \code{type = "pred_int"}, the options
\code{level} and \code{std_error} can be used. The latter is a logical
indicating whether to add an extra column of standard error values
(if available).
}
\subsection{Censored regression predictions}{
For censored regression, a numeric vector for \code{time} is required when
survival or hazard probabilities are requested. Also, when
\code{type = "linear_pred"}, censored regression models will be formatted such
that the linear predictor \emph{increases} with time. This may be the opposite
sign from what the underlying model's \code{predict()} method produces.
}
}
\examples{
library(dplyr)

lm_model <-
  linear_reg() \%>\%
  set_engine("lm") \%>\%
  fit(mpg ~ ., data = mtcars \%>\% dplyr::slice(11:32))

pred_cars <-
  mtcars \%>\%
  dplyr::slice(1:10) \%>\%
  dplyr::select(-mpg)

predict(lm_model, pred_cars)

predict(
  lm_model,
  pred_cars,
  type = "conf_int",
  level = 0.90
)

predict(
  lm_model,
  pred_cars,
  type = "raw",
  opts = list(type = "terms")
)
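
# Hedged sketch (an addition, not from the original examples):
# std_error is one of the parsnip options accepted through the dots
# for interval types. Whether an extra standard error column is
# actually returned depends on the engine, so none is assumed here.
predict(
  lm_model,
  pred_cars,
  type = "conf_int",
  level = 0.90,
  std_error = TRUE
)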
}
\keyword{internal}