
Commit

June 2020 hand edits to entire text
proback committed Jul 5, 2020
1 parent 20c4f72 commit c24e663
Showing 12 changed files with 438 additions and 153 deletions.
42 changes: 21 additions & 21 deletions 01-Introduction.Rmd


49 changes: 40 additions & 9 deletions 02-Beyond-Most-Least-Squares.Rmd


38 changes: 19 additions & 19 deletions 03-Distribution-Theory.Rmd


60 changes: 30 additions & 30 deletions 04-Poisson-Regression.Rmd


6 changes: 3 additions & 3 deletions 05-Generalized-Linear-Models.Rmd
@@ -29,14 +29,14 @@ library(knitr)

## One parameter exponential families

Thus far, we have expanded our repertoire of models from linear least squares regression to include Poisson regression. But in the early 1970s, @Nelder1972 identified a broader class of models that generalizes the multiple linear regression we considered in the introductory chapter; these models are referred to as __generalized linear models (GLMs)__. All GLMs have similar forms for their likelihoods, MLEs, and variances. This makes it easier to find model estimates and their corresponding uncertainty. To determine whether a model based on a single parameter $\theta$ is a GLM, we consider the following properties.
Thus far, we have expanded our repertoire of models from linear least squares regression to include Poisson regression. But in the early 1970s, @Nelder1972 identified a broader class of models that generalizes the multiple linear regression we considered in the introductory chapter; these models are referred to as __generalized linear models (GLMs)__. \index{generalized linear models (GLMs)} All GLMs have similar forms for their likelihoods, MLEs, and variances. This makes it easier to find model estimates and their corresponding uncertainty. To determine whether a model based on a single parameter $\theta$ is a GLM, we consider the following properties.
When a probability formula can be written in the form below

\begin{equation}
f(y;\theta)=e^{[a(y)b(\theta)+c(\theta)+d(y)]}
(\#eq:1expForm)
\end{equation}
and if the support (the set of possible input values) does not depend upon $\theta$, it is said to have a __one-parameter exponential family form__. We demonstrate that the Poisson distribution is a member of the one-parameter exponential family by writing its probability mass function (pmf) in the form of Equation \@ref(eq:1expForm) and assessing its support.
and if the support (the set of possible input values) does not depend upon $\theta$, it is said to have a __one-parameter exponential family form__ \index{one-parameter exponential family}. We demonstrate that the Poisson distribution is a member of the one-parameter exponential family by writing its probability mass function (pmf) in the form of Equation \@ref(eq:1expForm) and assessing its support.

### One Parameter Exponential Family: Poisson

@@ -68,7 +68,7 @@ c(\lambda)&=-\lambda \\
d(y)&=-\log (y!)
(\#eq:diffunc)
\end{align*}
These functions have useful interpretations in statistical theory. We won't be going into this in detail, but we will note that the function $b(\lambda)$, or more generally $b(\theta)$, will be particularly helpful in GLMs. The function $b(\theta)$ is referred to as the __canonical link__. The canonical link is often a good choice to model as a linear function of the explanatory variables. That is, Poisson regression should be set up as $\log(\lambda)=\beta_0+\beta_1x_1+\beta_2x_2+\cdots$. In fact, there is a distinct advantage to modeling the canonical link as opposed to other functions of $\theta$, but it is also worth noting that other choices are possible, and at times preferred, depending upon the context of the application.
These functions have useful interpretations in statistical theory. We won't be going into this in detail, but we will note that the function $b(\lambda)$, or more generally $b(\theta)$, will be particularly helpful in GLMs. The function $b(\theta)$ is referred to as the __canonical link__ \index{canonical link}. The canonical link is often a good choice to model as a linear function of the explanatory variables. That is, Poisson regression should be set up as $\log(\lambda)=\beta_0+\beta_1x_1+\beta_2x_2+\cdots$. In fact, there is a distinct advantage to modeling the canonical link as opposed to other functions of $\theta$, but it is also worth noting that other choices are possible, and at times preferred, depending upon the context of the application.
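
As a quick illustration, the pieces of this decomposition can be checked numerically, and the canonical log link is exactly what R's `glm()` uses by default for the Poisson family. The sketch below uses the standard Poisson pieces $a(y)=y$, $b(\lambda)=\log(\lambda)$, $c(\lambda)=-\lambda$, and $d(y)=-\log(y!)$; the data are simulated and the object names are illustrative.

```r
# Sketch: confirm that exp(a(y)b(lambda) + c(lambda) + d(y)) reproduces the
# Poisson pmf, with a(y) = y, b(lambda) = log(lambda), c(lambda) = -lambda,
# and d(y) = -log(y!).
lambda <- 3
y <- 0:10
pmf_expform <- exp(y * log(lambda) - lambda - lgamma(y + 1))
all.equal(pmf_expform, dpois(y, lambda))    # TRUE

# glm() pairs each family with its canonical link by default; for the Poisson
# family that is the log link, i.e., log(lambda) = beta0 + beta1*x1 + ...
set.seed(1)
x <- runif(100)
counts <- rpois(100, lambda = exp(0.5 + 1.2 * x))
fit <- glm(counts ~ x, family = poisson)    # same as poisson(link = "log")
coef(fit)                                   # estimates on the log(lambda) scale
```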

There are other benefits of identifying a response as being from a one-parameter exponential family. For example, by creating a unifying theory for regression modeling, Nelder and Wedderburn made possible a common and efficient method for finding estimates of model parameters using iteratively reweighted least squares (IWLS). In addition, we can use the one-parameter exponential family form to determine the expected value and standard deviation of $Y$. With statistical theory you can show that

133 changes: 114 additions & 19 deletions 06-Logistic-Regression.Rmd


19 changes: 10 additions & 9 deletions 07-Correlated-Data.Rmd
@@ -92,9 +92,9 @@ kable(scenarioSimTab, booktabs=T, caption="Summary of simulations for Dams and P

## Recognizing correlation

Correlated data is encountered in nearly every field. In education, student scores from a particular teacher are typically more similar than scores of other students who have had a different teacher. During a study measuring depression indices weekly over the course of a month, we usually find that four measures for the same patient tend to be more similar than depression indices from other patients. In political polling, opinions from members of the same household are usually more similar than opinions of members from other randomly selected households. The structure of these data sets suggests inherent patterns of similarities or correlation among outcomes. This kind of correlation specifically concerns correlation of observations *within the same teacher or patient or household* and is referred to as __intraclass correlation__.
Correlated data is encountered in nearly every field. In education, student scores from a particular teacher are typically more similar than scores of other students who have had a different teacher. During a study measuring depression indices weekly over the course of a month, we usually find that four measures for the same patient tend to be more similar than depression indices from other patients. In political polling, opinions from members of the same household are usually more similar than opinions of members from other randomly selected households. The structure of these data sets suggests inherent patterns of similarities or correlation among outcomes. This kind of correlation specifically concerns correlation of observations *within the same teacher or patient or household* and is referred to as __intraclass correlation__ \index{intraclass correlation}.

Correlated data often takes on a multilevel structure. That is, population elements are grouped into aggregates, and we often have information on both the individual elements and the aggregated groups. For instance, students are grouped by teacher, weekly depression measures are grouped by patient, and survey respondents are grouped by household. In these cases, we refer to **levels** of measurement and observational units at each level. For example, students might represent **level one observational units** while teachers represent **level two observational units**, where **level one** is the most basic level of observation, and level one observations are aggregated to form **level two** observations. Then, if we are modeling a response such as test score, we may want to examine the effects of student characteristics such as sex and ethnicity, and teacher characteristics such as years of experience. Student characteristics would be considered **level one covariates**, while teacher characteristics would be **level two covariates**.
Correlated data often takes on a multilevel structure. That is, population elements are grouped into aggregates, and we often have information on both the individual elements and the aggregated groups. For instance, students are grouped by teacher, weekly depression measures are grouped by patient, and survey respondents are grouped by household. In these cases, we refer to **levels** \index{levels} of measurement and observational units at each level. For example, students might represent **level one observational units** while teachers represent **level two observational units**, where **level one** is the most basic level of observation, and level one observations are aggregated to form **level two** observations. Then, if we are modeling a response such as test score, we may want to examine the effects of student characteristics such as sex and ethnicity, and teacher characteristics such as years of experience. Student characteristics would be considered **level one covariates**, while teacher characteristics would be **level two covariates**.
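
To make intraclass correlation concrete, here is a minimal simulation sketch (the setting, values, and names are illustrative, not from the text): a shared teacher effect makes scores from students with the same teacher more alike than scores from students with different teachers.

```r
# Sketch (illustrative names): student test scores nested within teachers.
# The shared teacher effect induces intraclass correlation among students
# who have the same teacher.
set.seed(2)
n_teachers <- 30
n_students <- 20                                    # students per teacher
teacher_effect <- rnorm(n_teachers, mean = 0, sd = 4)
teacher <- rep(1:n_teachers, each = n_students)
score <- 70 + teacher_effect[teacher] +
  rnorm(n_teachers * n_students, mean = 0, sd = 8)
# Intraclass correlation = between-teacher variance / total variance;
# with these values the true ICC is 4^2 / (4^2 + 8^2) = 0.2.
```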


## Case Study: Dams and pups
@@ -103,9 +103,9 @@ A __teratogen__ is a substance or exposure that can result in harm to a developi
## Sources of Variability
Before we analyze data from our simulated experiment, let's step back and look at the big picture. Statistics is all about analyzing and explaining variability, so let's consider what sources of variability we have in the dams and pups example. There are several reasons why the counts of the number of defective pups might differ from dam to dam, and it is helpful to explicitly identify these reasons in order to determine how the dose levels affect the pups while also accommodating correlation.

__Dose Effect__ The dams and pups experiment is being carried out to determine whether different dose levels affect the development of defects differently. Of particular interest is determining whether a **dose-response** effect is present. A dose-response effect is evident when dams receiving higher dose levels produce higher proportions of pups with defects. Knowing defect rates at specific dose levels is typically of interest within this experiment and beyond. Publishing the defect rates for each dose level in a journal paper, for example, would be of interest to other teratologists. For that reason, we refer to dose level effects as __fixed effects__.
__Dose Effect__ The dams and pups experiment is being carried out to determine whether different dose levels affect the development of defects differently. Of particular interest is determining whether a **dose-response** \index{dose response} effect is present. A dose-response effect is evident when dams receiving higher dose levels produce higher proportions of pups with defects. Knowing defect rates at specific dose levels is typically of interest within this experiment and beyond. Publishing the defect rates for each dose level in a journal paper, for example, would be of interest to other teratologists. For that reason, we refer to dose level effects as __fixed effects__ \index{fixed effects}.

__Dams (Litter) Effect__ In many settings like this, there is a litter effect as well. For example, some dams may exhibit a propensity to produce pups with defects while others rarely produce litters with defective pups. That is, observations on pups within the same litter are likely to be similar or correlated. Unlike the dose effect, teratologists reading experiment results are not interested in the estimated probability of defect for each dam in the study, and we would not report these estimated probabilities in a paper. However, there may be interest in the *variability* in litter-specific defect probabilities; accounting for dam-to-dam variability reduces the amount of unexplained variability and leads to more precise estimates of fixed effects like dose. Often this kind of effect is modeled using the idea that randomly selected dams produce __random effects__. This provides one way in which to model correlated data, in this case the correlation between pups from the same dam. We elaborate on this idea throughout the remainder of the text.
__Dams (Litter) Effect__ In many settings like this, there is a litter effect as well. For example, some dams may exhibit a propensity to produce pups with defects while others rarely produce litters with defective pups. That is, observations on pups within the same litter are likely to be similar or correlated. Unlike the dose effect, teratologists reading experiment results are not interested in the estimated probability of defect for each dam in the study, and we would not report these estimated probabilities in a paper. However, there may be interest in the *variability* in litter-specific defect probabilities; accounting for dam-to-dam variability reduces the amount of unexplained variability and leads to more precise estimates of fixed effects like dose. Often this kind of effect is modeled using the idea that randomly selected dams produce __random effects__ \index{random effects}. This provides one way in which to model correlated data, in this case the correlation between pups from the same dam. We elaborate on this idea throughout the remainder of the text.

__Pup-to-pup variability__ The within-litter pup-to-pup differences reflect random, unexplained variation in the model.
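
To see how these three sources of variability fit together, here is one way data like this could be simulated (a sketch only; dose levels, litter sizes, and parameter values are illustrative assumptions, not those used in the text):

```r
# Sketch (illustrative values): 4 dose groups, 10 dams per group, 10 pups per
# litter. The dam-specific shift in defect probability is the random (litter)
# effect that makes pups within a litter correlated; dose is the fixed effect;
# the binomial draw supplies the remaining pup-to-pup variability.
set.seed(3)
dose    <- rep(c(0, 1, 2, 3), each = 10)                 # dose level for each dam
p_dose  <- plogis(-2 + 0.8 * dose)                       # dose-level mean defect probability
p_dam   <- plogis(qlogis(p_dose) + rnorm(40, sd = 0.7))  # dam-specific defect probabilities
defects <- rbinom(40, size = 10, prob = p_dam)           # defective pups per litter of 10
```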

@@ -220,7 +220,7 @@ exp(confint(fit_1a_quasi))
exp(confint(fit_1a_quasi)) / (1 + exp(confint(fit_1a_quasi)))  # converts the log-odds CI to the probability scale
```
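
For reference, a quasibinomial fit like `fit_1a_quasi` can be specified along the following lines; this is a sketch only, and the data frame and variable names here are assumptions rather than the ones used in the text.

```r
# Sketch (assumed names): intercept-only model for the number of defective
# pups in each litter, with a quasibinomial family so the variance may differ
# from the binomial variance by a dispersion factor phi.
fit_1a_quasi <- glm(cbind(defects, litter_size - defects) ~ 1,
                    family = quasibinomial, data = dams)
summary(fit_1a_quasi)$dispersion   # estimated dispersion parameter phi-hat
```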

However, we can account for potential overdispersion with a __quasibinomial model__, just as we did in Section \@ref(sec-logOverdispersion), in case the observed variance is larger than the variance under a binomial model. Quasibinomial regression yields the same estimate for $\beta_0$ as the binomial regression model ($\hat{\beta}_0 = 0.067$), but we now have an estimated overdispersion parameter $\widehat{\phi} = 0.894$. This gives us the following 95\% profile likelihood-based confidence intervals:
However, we can account for potential overdispersion \index{overdispersion} with a __quasibinomial model__ \index{quasibinomial}, just as we did in Section \@ref(sec-logOverdispersion), in case the observed variance is larger than the variance under a binomial model. Quasibinomial regression yields the same estimate for $\beta_0$ as the binomial regression model ($\hat{\beta}_0 = 0.067$), but we now have an estimated overdispersion parameter $\widehat{\phi} = 0.894$. This gives us the following 95\% profile likelihood-based confidence intervals:

\[
\begin{alignedat}{2}
@@ -528,7 +528,7 @@ __Transect effects__ For some of the factors previously mentioned such as sun ex
__Tree-to-tree variability within transects__ There is inherent variability in tree growth even when trees are subject to the same transect and treatment effects. This variability remains unexplained in our model, although we will attempt to explain some of it with covariates such as species.


Data sets with this kind of structure are often referred to as __multilevel data__, and the remaining chapters delve into models for multilevel data in gory detail. With a continuous response variable, we will actually add random effects for transects to a more traditional linear least squares regression model rather than estimate an overdispersion parameter as with a binary response. Either way, if observations are really correlated, proper accounting will lead to larger standard errors for model coefficients and larger (but more appropriate) p-values for testing the significance of those coefficients.
Data sets with this kind of structure are often referred to as __multilevel data__ \index{multilevel data}, and the remaining chapters delve into models for multilevel data in gory detail. With a continuous response variable, we will actually add random effects for transects to a more traditional linear least squares regression model rather than estimate an overdispersion parameter as with a binary response. Either way, if observations are really correlated, proper accounting will lead to larger standard errors for model coefficients and larger (but more appropriate) p-values for testing the significance of those coefficients.
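
One common way to add such random effects for a continuous response is with the lme4 package; the sketch below is illustrative, with assumed data frame and variable names (not those from the case study).

```r
# Sketch (assumed names): a random intercept for each transect accounts for
# correlation among trees measured along the same transect, while species
# remains an ordinary (fixed) covariate.
library(lme4)
fit_trees <- lmer(growth ~ species + (1 | transect), data = tree_growth)
summary(fit_trees)
```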

### Analysis preview: accounting for correlation within transect

@@ -601,9 +601,10 @@ f. *Teen Alcohol Use.* @Curran1997 collected data on 82 adolescents at three ti
- `peer` = a measure of peer alcohol use, taken when each subject was 14. This is the square root of the sum of two 6-point items about the proportion of friends who drink occasionally or regularly.
- `alcuse` = the primary response. Four items—(a) drank beer or wine, (b) drank hard liquor, (c) 5 or more drinks in a row, and (d) got drunk—were each scored on an 8-point scale, from 0="not at all" to 7="every day". Then `alcuse` is the square root of the sum of these four items.

Primary research questions included:
- do trajectories of alcohol use differ by parental alcoholism?
- do trajectories of alcohol use differ by peer alcohol use?
Primary research questions included:

- do trajectories of alcohol use differ by parental alcoholism?
- do trajectories of alcohol use differ by peer alcohol use?

2. __More dams and pups__ Describe how to generalize the pup and dam example by allowing for litters of different sizes.

