Commit c13f310

Fixed few typos in glm vignette.
1 parent 4850459 commit c13f310

2 files changed

+2
-4
lines changed

docs/glm/GLM_Vignette.pdf

-144 Bytes
Binary file not shown.

docs/glm/GLM_Vignette.tex

+2 -4
@@ -54,7 +54,7 @@ \subsection{What is H2O?}
 \url{[email protected]} and \url{github.com/h2oai/h2o.git}
 
 \subsection{What is GLM?}
-Generalized linear models (GLM) are the workhorse for most predictive analysis use cases. GLM can be used for both regression and classification, it scales well to large datasets and is based on solid statistical background. It is a ganrelaztion of linear models, allowing for modeling of data with exponential distributions and for categorical data (classification). GLM models are fitted solving the maximum likelihood optimization problem.
+Generalized linear models (GLM) are the workhorse for most predictive analysis use cases. GLM can be used for both regression and classification, it scales well to large datasets and is based on solid statistical background. It is a generalization of linear models, allowing for modeling of data with exponential distributions and for categorical data (classification). GLM models are fitted by solving the maximum likelihood optimization problem.
 
 \subsection{GLM on H2O}
 H2O's GLM algorithm fits the generalized linear model with elastic net penalties. The model fitting computation is distributed, extremely fast, and scales extremely well for models with a limited number (~ low thousands) of predictors with non-zero coefficients. The algorithm can compute models for a single value of a penalty argument or the full regularization path, similar to glmnet package for R\cite{glmnet}.
@@ -254,8 +254,6 @@ \subsubsection{Logistic Regression (Binomial Family)}
 Deviance is -2 log likelihood:
 \[D = -2\sum\limits_{i=1}\limits^{N}{(y log(\hat{y}) + (1 - y)log(1-\hat{y}) )}\]
 
-\textbf{Decision threshold}
-
 \textbf{Example}\\
 Using the prostate data set, build a binomial model that classifies if there is penetration of the prostatic capsule (CAPSULE). Make sure the entries in the CAPSULE column are binary entries by using the \texttt{h2o.table()} function. Change the regression by setting the family to binomial.
 \begin{spverbatim}
@@ -490,7 +488,7 @@ \subsubsection{Lambda Search}
 
 H2O computes $\lambda$-models sequentially and in decreasing order, warm-starting the model for $\lambda_k$ with the solution for $\lambda_{k-1}$. By warm-starting the models, we get better performance: typically models for subsequent $\lambda$s are close to each other, so we need only a few iterations per $\lambda$ (typically 2 or 3). We also achieve greater numerical stability, since models with a higher penalty are easier to compute; so, we start with an easy problem and then keep making only small changes to it.
 
-\textbf{Note:} \textit{nlambda}, \textit{lambda.min.ratio} also specify the relative distance of any two lambdas in the sequence. This is important for the application of recursive strong rules, which are only effectove if the neighbouring lambdas are \textit{"close"} to each other. The default values are \textit{nlambda} = 100 and $\lambda_{min} = \lambda_{max} 1e^{-4}$, which gives us the ratio of. In order for strong rules to work, you should keep the ratio close to the default.
+\textbf{Note:} \textit{nlambda}, \textit{lambda.min.ratio} also specify the relative distance of any two lambdas in the sequence. This is important for the application of recursive strong rules, which are only effective if the neighbouring lambdas are \textit{"close"} to each other. The default values are \textit{nlambda} = 100 and $\lambda_{min} = \lambda_{max} 1e^{-4}$, which gives us the ratio of 0.912. In order for strong rules to work, you should keep the ratio close to the default.
 
 \subsubsection{Grid Search Over Lambdas}
 While automatic lambda search is the preferred method, it is also possible to do a grid search over lambda values by passing in vector of lambdas and disabling the lambda-search option. The behavior will be identical to lambda search, except H2O will use a user-supplied list of lambdas instead (still capped at $\lambda_{max}$).
