-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathgamlssML.Rd
136 lines (124 loc) · 7 KB
/
gamlssML.Rd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
\name{gamlssML}
\alias{gamlssML}
\alias{gamlssMLpred}
\title{Maximum Likelihood estimation of a simple GAMLSS model }
\description{
The function \code{gamlssML()} fits a \code{gamlss.family} distribution to single data set using a non linear maximisation algorithm in \code{R}.
This is relevant only when explanatory variables do not exist.
The function \code{gamlssMLpred()} is similar to \code{gamlssML()} but it saves the \emph{predictive} global deviance for the \code{newdata}. The new data in \code{gamlssMLpred()} can be given with the arguments \code{newdata} or defining the factor \code{rand}. \code{rand} should be a binary factor \code{rand} splitting the original data set into a training set (value 1) and a validation/test set (values 2), see
also \code{\link{gamlssVGD}}}
\usage{
gamlssML(formula, family = NO, weights = NULL, mu.start = NULL,
sigma.start = NULL, nu.start = NULL, tau.start = NULL,
mu.fix = FALSE, sigma.fix = FALSE, nu.fix = FALSE,
tau.fix = FALSE, data, start.from = NULL, ...)
gamlssMLpred(response = NULL, data = NULL, family = NO,
rand = NULL, newdata = NULL, ...)
}
%- maybe also 'usage' for other objects documented here.
\arguments{
\item{formula, response}{
a vector of data requiring the fit of a \code{gamlss.family} distribution or (only for the function \code{gamlssML}) a formula, for example, \code{y~1}, with no explanatory variables because they are ignored).
}
\item{family}{ \code{\link[gamlss.dist]{gamlss.family}} object, which is used to define the distribution and the link functions of the various parameters.
The distribution families supported by \code{gamlssML()} can be found in \code{\link[gamlss.dist]{gamlss.family}}}
\item{weights}{ a vector of weights.
Here weights can be used to weight out observations (like in \code{subset}) or for a weighted likelihood analysis where the contribution of the observations to the likelihood differs according to \code{weights}. The length of \code{weights} must be the same as the number of observations in the data. By default, the weight is set to one. To set weights to vector say \code{w} use \code{weights=w}
}
\item{mu.start}{ a scalar of initial values for the location parameter \code{mu} e.g. \code{mu.start=4}}
\item{sigma.start}{a scalar of initial values for the scale parameter \code{sigma} e.g. \code{sigma.start=1}}
\item{nu.start}{scalar of initial values for the parameter \code{nu} e.g. \code{nu.start=3} }
\item{tau.start}{scalar of initial values for the parameter \code{tau} e.g. \code{tau.start=3} }
\item{mu.fix}{whether the mu parameter should be kept fixed in the fitting processes e.g. \code{mu.fix=FALSE} }
\item{sigma.fix}{whether the sigma parameter should be kept fixed in the fitting processes e.g. \code{sigma.fix=FALSE}}
\item{nu.fix}{whether the nu parameter should be kept fixed in the fitting processes e.g. \code{nu.fix=FALSE} }
\item{tau.fix}{whether the tau parameter should be kept fixed in the fitting processes e.g. \code{tau.fix=FALSE}}
\item{data}{a data frame containing the variable \code{y}, e.g. \code{data=aids}. If this is missing, the variable should be on the search list.}
\item{start.from}{a gamlss object to start from the fitting or vector of length as many parameters in the distribution}
\item{rand}{For \code{gamlssMLpred()} a factor with values 1 (for fitting) and 2 (for predicting).}
\item{newdata}{The prediction data set (validation or test).}
\item{\dots}{for extra arguments}
}
\details{ The function \code{gamlssML()} fits a \code{gamlss.family} distribution to a single data set is using a non linear maximisation.
in fact it uses the internal function \code{MLE()} which is a copy of the \code{mle()} function of package \code{stat4}.
The function \code{gamlssML()} could be for large data faster than the equivalent \code{gamlss()} function which is designed for regression type of models.
The function \code{gamlssMLpred()} uses the function \code{gamlssML()} to fit the model but then uses \code{predict.gamlssML()} to predict for new data and saves the the prediction i) deviance increments, ii) global deviance iii) residuals.
}
\value{Returns a \code{gamlssML} object which behaves like a \code{gamlss} fitted objected}
\references{
Rigby, R. A. and Stasinopoulos D. M. (2005). Generalized additive models for location, scale and shape,(with discussion),
\emph{Appl. Statist.}, \bold{54}, part 3, pp 507-554.
Rigby, R. A., Stasinopoulos, D. M., Heller, G. Z., and De Bastiani, F. (2019)
\emph{Distributions for modeling location, scale, and shape: Using GAMLSS in R}, Chapman and Hall/CRC. An older version can be found in \url{https://www.gamlss.com/}.
Stasinopoulos D. M. Rigby R.A. (2007) Generalized additive models for location scale and shape (GAMLSS) in R.
\emph{Journal of Statistical Software}, Vol. \bold{23}, Issue 7, Dec 2007, \url{https://www.jstatsoft.org/v23/i07/}.
Stasinopoulos D. M., Rigby R.A., Heller G., Voudouris V., and De Bastiani F., (2017)
\emph{Flexible Regression and Smoothing: Using GAMLSS in R}, Chapman and Hall/CRC.
(see also \url{https://www.gamlss.com/}).
}
\author{
Mikis Stasinopoulos, Bob Rigby, Vlasis Voudouris and Majid Djennad
}
\seealso{
\code{\link[gamlss.dist]{gamlss.family}}, \code{\link{gamlss}} }
\examples{
#-------- negative binomial 1000 observations
y<- rNBI(1000)
system.time(m1<-gamlss(y~1, family=NBI))
system.time(m1a<-gamlss(y~1, family=NBI, trace=FALSE))
system.time(m11<-gamlssML(y, family=NBI))
AIC(m1,m1a,m11, k=0)
# neg. binomial n=10000
y<- rNBI(10000)
system.time(m1<-gamlss(y~1, family=NBI))
system.time(m1a<-gamlss(y~1, family=NBI, trace=FALSE))
system.time(m11<-gamlssML(y, family=NBI))
AIC(m1,m1a,m11, k=0)
# binomial type data
data(aep)
m1 <- gamlssML(aep$y, family=BB) # ok
m2 <- gamlssML(y, data=aep, family=BB) # ok
m3 <- gamlssML(y~1, data=aep, family=BB) # ok
m4 <- gamlssML(aep$y~1, family=BB) # ok
AIC(m1,m2,m3,m4)
\dontrun{
#-----------------------------------------------------------
# neg. binomial n=10000
y<- rNBI(10000)
rand <- sample(2, length(y), replace=TRUE, prob=c(0.6,0.4))
table(rand)
Y <- subset(y, rand==1)
YVal <- subset(y, rand==2)
length(Y)
length(YVal)
da1 <- data.frame(y=y)
dim(da1)
da2 <- data.frame(y=Y)
dim(da2)
danew <- data.frame(y=YVal)
# using gamlssVGD to fit the models
g1 <- gamlssVGD(y~1, rand=rand, family=NBI, data=da1)
g2 <- gamlssVGD(y~1, family=NBI, data=da2, newdata=dan)
AIC(g1,g2)
VGD(g1,g2)
# using gamlssMLpred to fit the models
p1 <- gamlssMLpred(y, rand=rand, family=NBI)
p2 <- gamlssMLpred(Y, family=NBI, newdata=YVal)
# AIC and VGD should produce identical results
AIC(p1,p2,g1,g2)
VGD(p1,p2, g1,g2)
# the fitted residuals
wp(p1, ylim.all=1)
# the prediction residuals
wp(resid=p1$residVal, ylim.all=.5)
#-------------------------------------------------------------
# chossing between distributions
p2<-gamlssMLpred(y, rand=rand, family=PO)
p3<-gamlssMLpred(y, rand=rand, family=PIG)
p4<-gamlssMLpred(y, rand=rand, family=BNB)
AIC(p1, p2, p3, p4)
VGD(p1, p2, p3, p4)
#--------------------------------------------------
}
}
\keyword{regression}