man/predict.model_fit.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/predict.R, R/predict_raw.R
\name{predict.model_fit}
\alias{predict.model_fit}
\alias{predict_raw.model_fit}
\alias{predict_raw}
\title{Model predictions}
\usage{
\method{predict}{model_fit}(object, new_data, type = NULL, opts = list(), ...)

\method{predict_raw}{model_fit}(object, new_data, opts = list(), ...)

predict_raw(object, ...)
}
\arguments{
\item{object}{An object of class \code{model_fit}}

\item{new_data}{A rectangular data object, such as a data frame.}

\item{type}{A single character value or \code{NULL}. Possible values
are "numeric", "class", "prob", "conf_int", "pred_int", "quantile", "time",
"hazard", "survival", or "raw". When \code{NULL}, \code{predict()} will choose an
appropriate value based on the model's mode.}

\item{opts}{A list of optional arguments to the underlying
predict function that will be used when \code{type = "raw"}. The
list should not include options for the model object or the
new data being predicted.}

\item{...}{Arguments to the underlying model's prediction
function cannot be passed here (see \code{opts}). There are some
\code{parsnip}-related options that can be passed, depending on the
value of \code{type}. Possible arguments are:
\itemize{
\item \code{level}: for \code{type}s of "conf_int" and "pred_int" this
is the parameter for the tail area of the intervals
(e.g. confidence level for confidence intervals).
Default value is 0.95.
\item \code{std_error}: add the standard error of fit or prediction (on
the scale of the linear predictors) for \code{type}s of "conf_int"
and "pred_int". Default value is \code{FALSE}.
\item \code{quantile}: the quantile(s) for quantile regression
(not implemented yet)
\item \code{time}: the time(s) for hazard and survival probability estimates.
}}
}
\value{
With the exception of \code{type = "raw"}, the result of
\code{predict.model_fit()} is a tibble with as many rows as there are
rows in \code{new_data}, and the column names are predictable.

For numeric results with a single outcome, the tibble has a \code{.pred}
column; for multivariate results, the columns are named \code{.pred_Yname}.

For hard class predictions, the column is named \code{.pred_class}
and, when \code{type = "prob"}, the columns are \code{.pred_classlevel}.

\code{type = "conf_int"} and \code{type = "pred_int"} return tibbles with
columns \code{.pred_lower} and \code{.pred_upper} with an attribute for
the confidence level. In the case where intervals can be
produced for class probabilities (or other non-scalar outputs),
the columns will be named \code{.pred_lower_classlevel} and so on.

Quantile predictions return a tibble with a column \code{.pred}, which is
a list-column. Each list element contains a tibble with columns
\code{.pred} and \code{.quantile} (and perhaps other columns).

Using \code{type = "raw"} with \code{predict.model_fit()} will return
the unadulterated results of the prediction function.

For censored regression:
\itemize{
\item \code{type = "time"} produces a column \code{.pred_time}.
\item \code{type = "hazard"} results in a column \code{.pred_hazard}.
\item \code{type = "survival"} results in a column \code{.pred_survival}.
}

For the last two types, the result is a nested tibble with an overall
column called \code{.pred} whose sub-tibbles follow the above format.

In the case of Spark-based models, since table columns cannot
contain dots, the same convention is used except 1) no dots
appear in names and 2) vectors are never returned but
type-specific prediction functions.

When the model fit failed and the error was captured, the
\code{predict()} function will return the same structure as above but
filled with missing values. This does not currently work for
multivariate models.
}
\description{
Apply a model to create different types of predictions.
\code{predict()} can be used for all types of models and uses the
\code{type} argument to select the kind of prediction.
}
\details{
If "type" is not supplied to \code{predict()}, then a choice
is made:
\itemize{
\item \code{type = "numeric"} for regression models,
\item \code{type = "class"} for classification, and
\item \code{type = "time"} for censored regression.
}

\code{predict()} is designed to provide a tidy result (see the "Value"
section below) in a tibble output format.
\subsection{Interval predictions}{

When using \code{type = "conf_int"} and \code{type = "pred_int"}, the options
\code{level} and \code{std_error} can be used. The latter is a logical for an
extra column of standard error values (if available).
}

\subsection{Censored regression predictions}{

For censored regression, a numeric vector for \code{time} is required when
survival or hazard probabilities are requested. Also, when
\code{type = "linear_pred"}, censored regression models will be formatted such
that the linear predictor \emph{increases} with time. This may have the opposite
sign from what the underlying model's \code{predict()} method produces.
}
}
\examples{
library(dplyr)

lm_model <-
  linear_reg() \%>\%
  set_engine("lm") \%>\%
  fit(mpg ~ ., data = mtcars \%>\% dplyr::slice(11:32))

pred_cars <-
  mtcars \%>\%
  dplyr::slice(1:10) \%>\%
  dplyr::select(-mpg)

predict(lm_model, pred_cars)
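
# When `type` is not supplied, a regression model defaults to
# `type = "numeric"`, so this explicit call should return the same
# tibble (a single `.pred` column) as the call above.
predict(lm_model, pred_cars, type = "numeric")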

predict(
  lm_model,
  pred_cars,
  type = "conf_int",
  level = 0.90
)
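
# A sketch of the interval options described under "Interval predictions":
# prediction intervals with a request for standard errors. Whether an
# extra standard error column is actually returned depends on the engine
# (it is only added "if available"), so treat this as illustrative.
predict(
  lm_model,
  pred_cars,
  type = "pred_int",
  level = 0.90,
  std_error = TRUE
)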

predict(
  lm_model,
  pred_cars,
  type = "raw",
  opts = list(type = "terms")
)
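
# A sketch for classification models. The names `cls_data` and `glm_model`
# and the choice of `am` as the outcome are illustrative assumptions,
# not part of the original example.
cls_data <- mtcars
cls_data$am <- factor(cls_data$am, labels = c("automatic", "manual"))

glm_model <-
  logistic_reg() \%>\%
  set_engine("glm") \%>\%
  fit(am ~ wt + hp, data = cls_data)

# With no `type`, a classification model defaults to `type = "class"`,
# which gives a single `.pred_class` column.
predict(glm_model, cls_data[1:5, ])

# `type = "prob"` gives one `.pred_classlevel` column per factor level.
predict(glm_model, cls_data[1:5, ], type = "prob")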
}
\keyword{internal}