June 2020 hand edits to entire text

yawomkobara · Jul 12, 2020 · 872f4ed
1 parent ed8ee95
commit 872f4ed

Showing 7 changed files with 5,210 additions and 3,017 deletions.
diff --git a/09-Two-Level-Longitudinal-Data.Rmd b/09-Two-Level-Longitudinal-Data.Rmd
diff --git a/10-Multilevel-Data-With-More-Than-Two-Levels.Rmd b/10-Multilevel-Data-With-More-Than-Two-Levels.Rmd
@@ -476,24 +476,21 @@ We once again begin with the __unconditional means model__ \index{unconditional
 
 - Level One (timepoint within plant):
 
-\begin{equation}
+\begin{equation*}
 Y_{ijk} = a_{ij}+\epsilon_{ijk} \textrm{ where } \epsilon_{ijk}\sim N(0,\sigma^2)
-(\#eq:initunun)
-\end{equation}
+\end{equation*}
 
 - Level Two (plant within pot):
 
-\begin{equation}
+\begin{equation*}
 a_{ij} = a_{i}+u_{ij} \textrm{ where } u_{ij}\sim N(0,\sigma_{u}^{2})
-(\#eq:initunun2)
-\end{equation}
+\end{equation*}
 
 - Level Three (pot):
 
-\begin{equation}
+\begin{equation*}
 a_{i} = \alpha_{0}+\tilde{u}_{i} \textrm{ where } \tilde{u}_{i} \sim N(0,\sigma_{\tilde{u}}^{2})
-(\#eq:initunun3)
-\end{equation}
+\end{equation*}
 
 where the heights of plants from different pots are considered independent, but plants from the same pot are correlated as well as measurements at different times from the same plant.
 
@@ -512,10 +509,9 @@ Keeping track of all the model terms, especially with three subscripts, is not a
 
 The three-level unconditional means model can also be expressed as a composite model:
 
-\begin{equation}
+\begin{equation*}
 Y_{ijk}=\alpha_{0}+\tilde{u}_{i}+u_{ij}+\epsilon_{ijk}
-(\#eq:initununcomp)
-\end{equation}
+\end{equation*}
 and this composite model can be fit using statistical software:
 
 ```{r, comment=NA}
@@ -549,10 +545,9 @@ The three-level unconditional growth model (Model B) can be specified either usi
 
 - Level One (timepoint within plant):
 
-\begin{equation}
+\begin{equation*}
 Y_{ijk} = a_{ij}+b_{ij}\textrm{time}_{ijk}+\epsilon_{ijk}
-(\#eq:timewithplnt)
-\end{equation}
+\end{equation*}
 
 - Level Two (plant within pot):
 
@@ -570,11 +565,10 @@ b_{i} & = \beta_{0}+\tilde{v}_{i}
 
 or as a composite model:
 
-\begin{equation}
+\begin{equation*}
 Y_{ijk}=[\alpha_{0}+\beta_{0}\textrm{time}_{ijk}]+
 [\tilde{u}_{i}+{v}_{ij}+\epsilon_{ijk}+(\tilde{v}_{i}+{v}_{ij})\textrm{time}_{ijk}]
-(\#eq:compmodb)
-\end{equation}
+\end{equation*}
 
 where $\epsilon_{ijk}\sim N(0,\sigma^2)$,
 
@@ -779,9 +773,9 @@ However, when it is possible to remove boundary constraints through reasonable m
 
 - Level One (timepoint within plant):
 
-\begin{equation}
+\begin{equation*}
 Y_{ijk} = a_{ij}+b_{ij}\textrm{time}_{ijk}+\epsilon_{ijk}
-\end{equation}
+\end{equation*}
 
 - Level Two (plant within pot):
 
@@ -848,7 +842,7 @@ Under the parametric bootstrap, we must simulate data under the null hypothesis
 - Produce a histogram of likelihood ratio statistics to illustrate its behavior when the null hypothesis is true
 - Calculate a p-value by finding the proportion of times the bootstrapped test statistic is greater than our observed test statistic
 
-Let's see how new plant heights are generated under the parametric bootstrap. Consider, for instance, $i=1$ and $j=1,2$. That is, consider Plants \#11 and \#12. These plants are found in Pot \#1, which was randomly assigned to contain sterilized soil from a restored prairie (STP):
+Let's see how new plant heights are generated under the parametric bootstrap. Consider, for instance, $i=1$ and $j=1,2$. That is, consider Plants \#11 and \#12 as shown in Table \@ref(tab:10verb7). These plants are found in Pot \#1, which was randomly assigned to contain sterilized soil from a restored prairie (STP):
 
 ```{r, 10verb7, echo=FALSE, comment=NA}
 verb7 <- seedwd[1:2, c(2:11)]
@@ -1037,10 +1031,9 @@ For instance, consider Model C, where we must estimate a total of 15 parameters:
 
 - Level One (timepoint within plant):
 
-\begin{equation}
+\begin{equation*}
 Y_{ijk} = a_{ij}+b_{ij}\textrm{time}_{ijk}+\epsilon_{ijk}
-(\#eq:lev1timemodcp)
-\end{equation}
+\end{equation*}
 
 - Level Two (plant within pot):
 
@@ -1110,9 +1103,9 @@ By following the options above, our potential 30-parameter model (C_plus) can be
 
 - Level One:
 
-\begin{equation}
+\begin{equation*}
 Y_{ijk} = a_{ij}+b_{ij}\textrm{time}_{ijk}+\epsilon_{ijk}
-\end{equation}
+\end{equation*}
 
 - Level Two:
 
@@ -1147,9 +1140,9 @@ In Model C we considered the main effects of soil type and sterilization on lead
 
 - Level One:
 
-\begin{equation}
+\begin{equation*}
 Y_{ijk} = a_{ij}+b_{ij}\textrm{time}_{ijk}+\epsilon_{ijk}
-\end{equation}
+\end{equation*}
 
 - Level Two:
 
@@ -1193,9 +1186,9 @@ Our final model (Model F), with its constraints on Level Three error terms, can
 
 - Level One:
 
-\begin{equation}
+\begin{equation*}
 Y_{ijk} = a_{ij}+b_{ij}\textrm{time}_{ijk}+\epsilon_{ijk}
-\end{equation}
+\end{equation*}
 
 - Level Two:
 
@@ -1367,11 +1360,10 @@ As in Chapter \@ref(ch-lon), it is important to be aware of the covariance struc
 
 We will first consider Model B with $\tilde{v}_{i}$ at Level Three, and then we will evaluate the resulting covariance structure that results from removing $\tilde{v}_{i}$, thereby restricting $\sigma_{\tilde{v}}^{2}=\sigma_{\tilde{u}\tilde{v}}=0$.  The composite version of Model B has been previously expressed as:
 
-\begin{equation}
+\begin{equation*}
 Y_{ijk}=[\alpha_{0}+\beta_{0}\textrm{time}_{ijk}]+
 [\tilde{u}_{i}+u_{ij}+\epsilon_{ijk}+(\tilde{v}_{i}+v_{ij})\textrm{time}_{ijk}]
-(\#eq:modbcomp)
-\end{equation}
+\end{equation*}
 
 where $\epsilon_{ijk}\sim N(0,\sigma^2)$,
 
@@ -1567,7 +1559,7 @@ In Section \@ref(threelevel-paraboot) we sought to perform a significance test c
 
 11. In Model C, we initially addressed boundary constraints by removing the Level Three correlation between error terms from our multilevel model. What other model adjustments might we have considered?
 
-12. How does Figure \@ref(fig:paraboot) show that a likelihood ratio test using a chi-square distribution would be biased?
+12. How does Figure \@ref(fig:paraboot10) show that a likelihood ratio test using a chi-square distribution would be biased?
 
 13. In Section \@ref(sec:explodingvarcomps), a model with 52 parameters is described: (a) illustrate that the model does indeed contain 52 parameters; (b) explain how to minimize the total number of parameters using ideas from Section \@ref(sec:explodingvarcomps); (c) what assumptions have you made in your simplification in (b)?
 
@@ -1612,7 +1604,7 @@ kable(table4chp10, booktabs=T, escape=F,
 
 21. At the bottom of Table \@ref(tab:table4chp10), the percent of variance explained is given within and between neighborhoods. Explain what these values likely represent and how they were calculated.
 
-22. Table \@ref(tab:table5chp10) shows a portion of Table 4 from @Sampson1997. Describe the multilevel model that likely produced this table. State the primary result from this table in context. [Note that collective efficacy is a Level Three covariate in this table, summarized over an entire neighborhood.]  Estimates of neighborhood-level coefficients control for gender, marital status, homeownership, ethnicity, mobility, age, years in neighborhood, and SES of those interviewed. Model 1 accounts for 70.5\% of the variation between neighborhoods in perceived violence, whereas model 2 accounts for 77.8\% of the variation.
+22. Table \@ref(tab:table5chp10) shows a portion of Table 4 from @Sampson1997. Describe the multilevel model that likely produced this table. State the primary result from this table in context. [Note that collective efficacy is a Level Three covariate in this table, summarized over an entire neighborhood.]  Estimates of neighborhood-level coefficients control for gender, marital status, homeownership, ethnicity, mobility, age, years in neighborhood, and SES of those interviewed. Model 1 accounts for 70.5\% of the variation between neighborhoods in perceived violence, whereas Model 2 accounts for 77.8\% of the variation.
 
 ```{r, include=FALSE}
 Variable2 <- c("Concentrated disadvantage","Immigrant concentration","Residential stability","Collective efficacy")
@@ -1632,7 +1624,7 @@ kable(table5chp10, booktabs=T,
       caption="A portion of Table 4: Neighborhood correlates of perceived neighborhood violence from Sampson et al. (1997).") %>%
       add_header_above(c(" ","Social composition"=3, "Social comp and collective efficacy"=3)) %>%
       add_header_above(c(" ","Model 1"=3, "Model 2"=3)) %>%
-      row_spec(1, bold=T) 
+      kable_styling(latex_options = "scale_down")
 ```
 
 

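The parametric bootstrap steps in the Chapter 10 hunk above generate new plant heights from a fitted model and then compute a p-value as the proportion of bootstrapped likelihood ratio statistics that exceed the observed one. A minimal Python sketch of those two pieces follows. For simplicity it assumes the three-level unconditional means model $Y_{ijk}=\alpha_{0}+\tilde{u}_{i}+u_{ij}+\epsilon_{ijk}$ rather than the models actually compared in the chapter; the function names and all parameter values are hypothetical, and the standard deviations stand in for $\sigma_{\tilde{u}}$, $\sigma_{u}$, and $\sigma$. Refitting the full and reduced models to each simulated data set, which is what produces the bootstrapped statistics, is not shown.

```python
import random


def simulate_unconditional_means(alpha0, sd_pot, sd_plant, sd_error,
                                 n_pots, plants_per_pot, times_per_plant, rng):
    """Simulate heights from Y_ijk = alpha0 + u~_i + u_ij + eps_ijk.

    Standard deviations (not variances) are passed for each level.
    Returns a nested list: heights[i][j][k] = pot i, plant j, time k.
    """
    heights = []
    for _ in range(n_pots):
        u_pot = rng.gauss(0.0, sd_pot)  # Level Three: pot effect u~_i, shared by all plants in the pot
        pot = []
        for _ in range(plants_per_pot):
            u_plant = rng.gauss(0.0, sd_plant)  # Level Two: plant effect u_ij, shared across times
            plant = [alpha0 + u_pot + u_plant + rng.gauss(0.0, sd_error)  # Level One error eps_ijk
                     for _ in range(times_per_plant)]
            pot.append(plant)
        heights.append(pot)
    return heights


def bootstrap_p_value(observed_stat, boot_stats):
    """Proportion of bootstrapped statistics greater than the observed statistic."""
    if not boot_stats:
        raise ValueError("need at least one bootstrapped statistic")
    return sum(1 for s in boot_stats if s > observed_stat) / len(boot_stats)


rng = random.Random(2020)  # fixed seed so the simulation is reproducible
# Hypothetical parameter values, for illustration only
sim = simulate_unconditional_means(2.0, 0.3, 0.4, 0.5, n_pots=2,
                                   plants_per_pot=2, times_per_plant=4, rng=rng)
print(bootstrap_p_value(3.0, [0.5, 1.2, 3.5, 4.1, 2.9]))  # 0.4
```

With the hypothetical statistics in the last line, two of the five bootstrapped values exceed 3.0, so the p-value is 0.4.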
diff --git a/11-Generalized-Linear-Multilevel-Models.Rmd b/11-Generalized-Linear-Multilevel-Models.Rmd
@@ -561,7 +561,7 @@ At this point, you might imagine expanding model building efforts in a couple of
 
 ### A unified multilevel approach (the framework we'll use) {#unified-glmm}
 
-As in Chapters \@ref(ch-multilevelintro) and \@ref(ch-lon), we will write out a composite model after first expressing Level One and Level Two models.  That is, we will create Level One and Level Two models as in Section \@ref(twostage-glmm), but we will then combine those models into a composite model and estimate all model parameters simultaneously.  Once again $Y_{ij}$ is an indicator variable recording if the $j^{th}$ foul from Game $i$ was called on the home team (1) or the visiting team (0), and $p_{ij}$ is the true probability that the $j^{th}$ foul from Game $i$ was called on the home team.  Our Level One model with foul differential as the sole predictor is given by Equation \@ref(eq:lev1glmm):
+As in Chapters \@ref(ch-multilevelintro) and \@ref(ch-lon), we will write out a composite model after first expressing Level One and Level Two models.  That is, we will create Level One and Level Two models as in Section \@ref(twostage-glmm), but we will then combine those models into a composite model and estimate all model parameters simultaneously.  Once again $Y_{ij}$ is an indicator variable recording if the $j^{th}$ foul from Game $i$ was called on the home team (1) or the visiting team (0), and $p_{ij}$ is the true probability that the $j^{th}$ foul from Game $i$ was called on the home team.  Our Level One model with foul differential as the sole predictor is given by Equation \@ref(eq:lev1glmm) generalized to Game $i$:
 
   \[ \log\bigg(\frac{p_{ij}}{1-p_{ij}}\bigg)=a_i+b_i\mathrm{foul.diff}_{ij} \]
 
@@ -679,10 +679,9 @@ In the College Basketball Referees case study, our two primary Level Two covaria
 
 How will treating home and visiting teams as random effects change our multilevel model?  Another way we might view this situation is by considering that Game is not the only Level Two observational unit we might have selected.  What if we instead decided to focus on Home Team as the Level Two observational unit?  That is, what if we assumed that fouls called on the same home team across all games must be correlated?  In this case, we could redefine our Level One model from Equation \@ref(eq:lev1glmm).  Let $Y_{hj}$ be an indicator variable recording if the $j^{th}$ foul from Home Team $h$ was called on the home team (1) or the visiting team (0), and $p_{hj}$ be the true probability that the $j^{th}$ foul from Home Team $h$ was called on the home team.  Now, if we were to consider a simple model with foul differential as the sole predictor, we could model the probability of a foul on the home team for Home Team $h$ with the model:
 
-\begin{equation}
+\begin{equation*}
 \log\bigg(\frac{p_{hj}}{1-p_{hj}}\bigg)=a_h+b_h\mathrm{foul.diff}_{hj}
-(\#eq:lev1bglmm)
-\end{equation}
+\end{equation*}
 
 In this case, $e^{a_{h}}$ represents the odds that a foul is called on the home team when total fouls are equal between both teams in a game involving Home Team $h$, and $e^{b_{h}}$ represents the multiplicative change in the odds that a foul is called on the home team for every extra foul on the home team compared to the visitors in a game involving Home Team $h$.  After fitting logistic regression models for each of the 39 teams in our data set, we see in Figure \@ref(fig:gmu-histmat3) variability in fitted intercepts (mean=-0.15, sd=0.33) and slopes (mean=-0.22, sd=0.12) among the 39 teams, although much less variability than we observed from game-to-game.  Of course, each logistic regression model for a home team was based on about 10 times more foul calls than each model for a game, so observing less variability from team-to-team was not unexpected.
 
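The interpretation of $e^{a_{h}}$ and $e^{b_{h}}$ in the Chapter 11 hunk above can be checked numerically. The Python sketch below (hypothetical function name) plugs in the mean fitted intercept (-0.15) and slope (-0.22) reported across the 39 team-by-team logistic regressions, purely as illustrative values for a single Home Team $h$: when fouls are equal (foul differential 0) the odds are $e^{a_h}$, and each extra foul on the home team multiplies the odds by $e^{b_h}$.

```python
import math


def home_foul_odds(a_h, b_h, foul_diff):
    """Odds and probability that a foul is called on the home team under
    log(p / (1 - p)) = a_h + b_h * foul_diff for a single Home Team h."""
    odds = math.exp(a_h + b_h * foul_diff)
    return odds, odds / (1 + odds)


a_h, b_h = -0.15, -0.22  # illustrative: mean intercept and slope across the 39 team fits
odds0, p0 = home_foul_odds(a_h, b_h, 0)  # equal fouls on both teams: odds = e^{a_h}
odds1, _ = home_foul_odds(a_h, b_h, 1)   # one extra foul on the home team
print(round(odds0, 3), round(p0, 3))      # 0.861 0.463
print(round(odds1 / odds0, 3))            # 0.803, i.e. e^{b_h}
```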
@@ -744,10 +743,9 @@ We could include terms that vary by home or visiting team in other Level Two equ
 
 Our composite model then looks like:
 
-\begin{equation}
+\begin{equation*}
 \log\bigg(\frac{p_{i[gh]j}}{1-p_{i[gh]j}}\bigg) = [\alpha_{0}+\beta_{0}\mathrm{foul.diff}_{ij}]+[u_{i}+v_{h}+w_{g}].
-(\#eq:compmoda)
-\end{equation}
+\end{equation*}
 We will refer to this as Model A3, where we look at the effect of foul differential on the odds a foul is called on the home team, while accounting for three crossed random effects at Level Two (game, home team, and visiting team).  Parameter estimates for Model A3 are given below:
 
 ```{r, include=FALSE}
@@ -1337,7 +1335,7 @@ kable(table5chp11, booktabs=T,
 
     Perform exploratory analyses and then run multilevel models to examine significant determinants of successful challenges.  Write a short report comparing specific reasons for the challenge to the greater context in which a challenge was made.
 
-4. __Yelp restaurant reviews__  @Mohr2018 assembled a data set of Yelp restaurant reviews in Madison, WI, from 2005 through 2017 based on the Yelp Dataset Challenge on [Kaggle](https://www.kaggle.com/yelp-dataset/yelp-dataset).  Their data in `yelp.csv` contains almost 60,000 reviews on 888 restaurants from over 20,000 reviewers, and it contains a selection of variables on the reviewer (e.g., total reviews, average stars), the restaurant (e.g., neighborhood, average stars, category), and the review itself (e.g., stars, year, useful ratings, actual text).
+4. __Yelp restaurant reviews__  @Mohr2018 assembled a data set of Yelp restaurant reviews in Madison, WI, from 2005 through 2017 based on the Yelp Dataset Challenge on [Kaggle](https://www.kaggle.com/yelp-dataset/yelp-dataset).  The data in `yelp.csv` contains almost 60,000 reviews on 888 restaurants from over 20,000 reviewers, and it contains a selection of variables on the reviewer (e.g., total reviews, average stars), the restaurant (e.g., neighborhood, average stars, category), and the review itself (e.g., stars, year, useful ratings, actual text).
 
     There are various questions that could be pursued with this data.  Here are just a few ideas:
       - how can we model number of stars in the rating, or whether or not the rating was 5 stars or not?

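Because this commit switches many numbered `equation` environments to `equation*` and deletes their `(\#eq:...)` labels, any remaining `\@ref(eq:...)` that pointed at a deleted label would no longer resolve; Chapter 11, for instance, still cites `\@ref(eq:lev1glmm)`. A rough Python check is sketched below. The function name is hypothetical, and it simply pools label definitions and references across all chapter texts, since bookdown resolves cross-references across the whole book.

```python
import re
from pathlib import Path

# bookdown-style equation label definitions, e.g. (\#eq:lev1glmm)
DEF_RE = re.compile(r"\(\\#eq:([-/\w]+)\)")
# bookdown-style equation references, e.g. \@ref(eq:lev1glmm)
REF_RE = re.compile(r"\\@ref\(eq:([-/\w]+)\)")


def dangling_equation_refs(texts):
    """Return sorted equation labels that are referenced but never defined.

    `texts` is an iterable of chapter sources; labels are pooled across all
    of them because cross-references resolve across the whole book.
    """
    defined, referenced = set(), set()
    for text in texts:
        defined.update(DEF_RE.findall(text))
        referenced.update(REF_RE.findall(text))
    return sorted(referenced - defined)


# Example usage: scan every .Rmd file in the current directory.
chapters = sorted(Path(".").glob("*.Rmd"))
print(dangling_equation_refs(p.read_text(encoding="utf-8") for p in chapters))
```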
diff --git a/_bookdown.yml b/_bookdown.yml
@@ -2,9 +2,9 @@ book_filename: "bookdown-bysh"
 chapter_name: "Chapter "
 output_dir: docs
 rmd_files: ["index.Rmd", 
-  "04-Poisson-Regression.Rmd",
-  "05-Generalized-Linear-Models.Rmd",
-  "06-Logistic-Regression.Rmd",
-  "07-Correlated-Data.Rmd",
+  "08-Introduction-to-Multilevel-Models.Rmd",
+  "09-Two-Level-Longitudinal-Data.Rmd",
+  "10-Multilevel-Data-With-More-Than-Two-Levels.Rmd",
+  "11-Generalized-Linear-Multilevel-Models.Rmd",
   "99-References.Rmd"]
 clean: [packages.bib, bookdown.bbl]
diff --git a/bib/articles.bib b/bib/articles.bib
@@ -20,7 +20,7 @@ @article{Rodgers2001
 
 @article{Mathews2005,
   author = {T.J. Mathews and Brady E. Hamilton},
-  title = {Trend Analysis of the Sex Ratio at Birth in the United States},
+  title = {Trend Analysis of the Sex Ratio at Birth in the {United States}},
   journal={National Vital Statistics Reports},
   volume={53},
   number = {20},
@@ -63,7 +63,7 @@ @article{Cameron1986
 
 @article{Scotto1974,
   author = {Joseph Scotto and Alfred W. Kopf and Fredrick Urbach},
-  title = {Non-melanoma skin cancer among caucasians in four areas of the United States},
+  title = {Non-melanoma skin cancer among {Caucasians} in four areas of the {United States}},
   journal = {Cancer},
   volume = {34},
   number = {4},
@@ -96,7 +96,7 @@ @article{Nelder1972
 
 @article{Roskes2011,
   author = {Marieke Roskes and Daniel Sligte and Shaul Shalvi and Carsten K. W. De Dreu},
-  title = {The Right Side? Under Time Pressure, Approach Motivation Leads to Right-Oriented Bias},
+  title = {The Right Side? {Under} Time Pressure, Approach Motivation Leads to Right-Oriented Bias},
   journal = {Psychology Science},
   volume = {22},
   number = {11},
@@ -107,7 +107,7 @@ @article{Roskes2011
 
 @article{Martinsen2009,
 author = {Martinsen, M and Bratland-Sanda, S and Eriksson, A K and Sundgot-Borgen, J},
-title = {Dieting to win or to be thin? A study of dieting and disordered eating among adolescent elite athletes and non-athlete controls},
+title = {Dieting to win or to be thin? {A} study of dieting and disordered eating among adolescent elite athletes and non-athlete controls},
 journal = {British Journal of Sports Medicine},
 volume = {44},
 number = {1},
@@ -128,7 +128,7 @@ @article{Nafstad1999
 }
 
 @article{Gilovich1985,
-author = {Thomas Gilovich and Robert Vallone and and Amos Tversky},
+author = {Thomas Gilovich and Robert Vallone and Amos Tversky},
 title = {The Hot Hand in Basketball: On the Misperception of Random Sequences},
 journal = {Cognitive Psychology},
 volume = {17},
@@ -173,7 +173,7 @@ @article{Brown2004
 
 @article{Witte2007,
 author = {John Witte and David Weimer and Arnold Shober and Paul Schlomer},
-title = {The Performance of Charter Schools in Wisconsin},
+title = {The Performance of Charter Schools in {Wisconsin}},
 journal = {Journal of Policy Analysis and Management},
 volume = {26},
 number = {3},
@@ -250,7 +250,7 @@ @article{Hesterberg2015
 }
 
 @article{Poole1989,
-  title = {Mate guarding, reproductive success and female choice in African elephants},
+  title = {Mate guarding, reproductive success and female choice in {African} elephants},
   journal = {Animal Behaviour},
   volume = {37},
   pages = {842--849},
@@ -265,7 +265,7 @@ @article{Gelman2007
   year = {2007},
   month = {Sept},
   pages = {813--823},
-  title = {An Analysis of the NYPD's Stop-and-Frisk Policy in the Context of Claims of Racial Bias},
+  title = {An Analysis of the {NYPD's} Stop-and-Frisk Policy in the Context of Claims of Racial Bias},
   volume = {102}
 }
 
@@ -376,7 +376,7 @@ @article {Sampson1997
 
 @article{Anderson2009,
   author = {Kyle J. Anderson and David A. Pierce},
-  title = {Officiating bias: the effect of foul differential on foul calls in NCAA basketball},
+  title = {Officiating bias: the effect of foul differential on foul calls in {NCAA} basketball},
   volume = {27},
   number = {7},
   pages = {687-94},
@@ -387,7 +387,7 @@ @article{Anderson2009
 
 @Article{Noecker2012,
   author={Noecker, Cecilia A. and Roback, Paul},
-  title={New Insights on the Tendency of NCAA Basketball Officials to Even Out Foul Calls},
+  title={New Insights on the Tendency of {NCAA} Basketball Officials to Even Out Foul Calls},
   journal={Journal of Quantitative Analysis in Sports},
   year={2012},
   volume={8},
@@ -397,8 +397,7 @@ @Article{Noecker2012
 }
 
 @article{Randall2014,
-  title = {Exploring disparities in acute myocardial infarction events between Aboriginal
-           and non-Aboriginal Australians: Roles of age, gender,   geography and area-level disadvantage},
+  title = {Exploring disparities in acute myocardial infarction events between {Aboriginal and non-Aboriginal Australians}: Roles of age, gender, geography and area-level disadvantage},
   journal = {Health \& Place},
   volume = {28},
   pages = {58-66},
@@ -471,7 +470,7 @@ @article{Holst1988
 
 @article{Proudfoot2003,
   author = {J Proudfoot and D Goldberg and A Mann and B Everitt and I Marks and J A Gray},
-  journal = {psychological Medicine},
+  journal = {Psychological Medicine},
   title = {Computerized, Interactive, Multimedia Cognitive-Behavioural Program for Anxiety and Depression in General Practice}, 
   month = {Feb},
   year = {2003},
@@ -480,3 +479,14 @@ @article{Proudfoot2003
   number = {2},
   doi = {10.1017/s0033291702007225}
 }
+
+@article{Camill2004,
+  author = {Camill, Philip and McKone, Mark J. and Sturges, Sean T. and Severud, William J. and Ellis, Erin and Limmer, Jacob and Martin, Christopher B. and Navratil, Ryan T. and Purdie, Amy J. and Sandel, Brody S. and Talukder, Shano and Trout, Andrew},
+  title = {Community- and Ecosystem-level Changes in a Species-rich Tallgrass Prairie Restoration},
+  year = {2004},
+  journal = {Ecological Applications},
+  volume = {14},
+  number = {6},
+  pages = {1680-1694},
+  doi = {10.1890/03-5273},
+}
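Many of the `articles.bib` edits wrap proper nouns and acronyms in braces (`{United States}`, `{NCAA}`, `{Wisconsin}`) because bibliography styles that convert titles to sentence case lowercase any letter not protected by braces. The Python helper below (hypothetical name) flags title words after the first that still contain an uppercase letter outside braces. It only lists candidates; ordinary title-cased words may be fine to lowercase.

```python
import re

BRACED = re.compile(r"\{[^{}]*\}")  # innermost {...} groups, whose case BibTeX preserves


def unprotected_capitals(title):
    """List words after the first that contain an uppercase letter outside braces."""
    # Mask braced groups (including any spaces inside) so each forms one non-flagged token.
    masked = BRACED.sub(lambda m: "_" * len(m.group()), title)
    flagged = []
    for index, match in enumerate(re.finditer(r"\S+", masked)):
        if index == 0:
            continue  # sentence-case styles keep the first word's capital
        if any(ch.isupper() for ch in match.group()):
            flagged.append(title[match.start():match.end()])
    return flagged


print(unprotected_capitals(
    "Trend Analysis of the Sex Ratio at Birth in the {United States}"))
# ['Analysis', 'Sex', 'Ratio', 'Birth']
```

On the original `Mathews2005` title without braces, `United` and `States` would be flagged as well.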
diff --git a/docs/bookdown-bysh.pdf b/docs/bookdown-bysh.pdf