June 2020 hand edits to entire text
proback committed Jul 12, 2020
1 parent ed8ee95 commit 872f4ed
Showing 7 changed files with 5,210 additions and 3,017 deletions.
187 changes: 96 additions & 91 deletions 09-Two-Level-Longitudinal-Data.Rmd

Large diffs are not rendered by default.

64 changes: 28 additions & 36 deletions 10-Multilevel-Data-With-More-Than-Two-Levels.Rmd
@@ -476,24 +476,21 @@ We once again begin with the __unconditional means model__ \index{unconditional

- Level One (timepoint within plant):

\begin{equation}
\begin{equation*}
Y_{ijk} = a_{ij}+\epsilon_{ijk} \textrm{ where } \epsilon_{ijk}\sim N(0,\sigma^2)
(\#eq:initunun)
\end{equation}
\end{equation*}

- Level Two (plant within pot):

\begin{equation}
\begin{equation*}
a_{ij} = a_{i}+u_{ij} \textrm{ where } u_{ij}\sim N(0,\sigma_{u}^{2})
(\#eq:initunun2)
\end{equation}
\end{equation*}

- Level Three (pot):

\begin{equation}
\begin{equation*}
a_{i} = \alpha_{0}+\tilde{u}_{i} \textrm{ where } \tilde{u}_{i} \sim N(0,\sigma_{\tilde{u}}^{2})
(\#eq:initunun3)
\end{equation}
\end{equation*}

where the heights of plants from different pots are considered independent, but plants from the same pot are correlated as well as measurements at different times from the same plant.
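Substituting the Level Three equation into Level Two, and that result into Level One, yields the composite model. Assuming $\epsilon_{ijk}$, $u_{ij}$, and $\tilde{u}_{i}$ are mutually independent, the variance components translate directly into the correlations just described; the following is a standard derivation sketched under that assumption:

\begin{align*}
Y_{ijk} &= a_{ij}+\epsilon_{ijk} = a_{i}+u_{ij}+\epsilon_{ijk} = \alpha_{0}+\tilde{u}_{i}+u_{ij}+\epsilon_{ijk} \\
\operatorname{Var}(Y_{ijk}) &= \sigma_{\tilde{u}}^{2}+\sigma_{u}^{2}+\sigma^{2} \\
\operatorname{Corr}(Y_{ijk},Y_{ijk'}) &= \frac{\sigma_{\tilde{u}}^{2}+\sigma_{u}^{2}}{\sigma_{\tilde{u}}^{2}+\sigma_{u}^{2}+\sigma^{2}} \quad (k\neq k',\ \textrm{same plant}) \\
\operatorname{Corr}(Y_{ijk},Y_{ij'k'}) &= \frac{\sigma_{\tilde{u}}^{2}}{\sigma_{\tilde{u}}^{2}+\sigma_{u}^{2}+\sigma^{2}} \quad (j\neq j',\ \textrm{same pot})
\end{align*}

Heights from different pots share no error term, so their covariance is zero.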

@@ -512,10 +509,9 @@ Keeping track of all the model terms, especially with three subscripts, is not a

The three-level unconditional means model can also be expressed as a composite model:

\begin{equation}
\begin{equation*}
Y_{ijk}=\alpha_{0}+\tilde{u}_{i}+u_{ij}+\epsilon_{ijk}
(\#eq:initununcomp)
\end{equation}
\end{equation*}
and this composite model can be fit using statistical software:

```{r, comment=NA}
@@ -549,10 +545,9 @@ The three-level unconditional growth model (Model B) can be specified either usi

- Level One (timepoint within plant):

\begin{equation}
\begin{equation*}
Y_{ijk} = a_{ij}+b_{ij}\textrm{time}_{ijk}+\epsilon_{ijk}
(\#eq:timewithplnt)
\end{equation}
\end{equation*}

- Level Two (plant within pot):

@@ -570,11 +565,10 @@ b_{i} & = \beta_{0}+\tilde{v}_{i}

or as a composite model:

\begin{equation}
\begin{equation*}
Y_{ijk}=[\alpha_{0}+\beta_{0}\textrm{time}_{ijk}]+
[\tilde{u}_{i}+u_{ij}+\epsilon_{ijk}+(\tilde{v}_{i}+v_{ij})\textrm{time}_{ijk}]
(\#eq:compmodb)
\end{equation}
\end{equation*}

where $\epsilon_{ijk}\sim N(0,\sigma^2)$,

@@ -779,9 +773,9 @@ However, when it is possible to remove boundary constraints through reasonable m

- Level One (timepoint within plant):

\begin{equation}
\begin{equation*}
Y_{ijk} = a_{ij}+b_{ij}\textrm{time}_{ijk}+\epsilon_{ijk}
\end{equation}
\end{equation*}

- Level Two (plant within pot):

@@ -848,7 +842,7 @@ Under the parametric bootstrap, we must simulate data under the null hypothesis
- Produce a histogram of likelihood ratio statistics to illustrate its behavior when the null hypothesis is true
- Calculate a p-value by finding the proportion of times the bootstrapped test statistic is greater than our observed test statistic
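The steps above can be sketched in base R. As a deliberate simplification, ordinary linear models fit with `lm()` to simulated toy data stand in for the three-level models of this chapter, so the data frame `toy` and the helper `lrt_stat()` are hypothetical; the simulate, refit, and compare loop is the part that carries over:

```r
# Sketch of a parametric bootstrap likelihood ratio test.
# Assumption: lm() fits on simulated toy data stand in for the
# chapter's multilevel models; all object names here are hypothetical.
set.seed(2020)

# Hypothetical toy data in which y truly depends on x1 only
n <- 60
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 1 + 0.5 * toy$x1 + rnorm(n)

# Likelihood ratio statistic: 2 * (logLik(full) - logLik(reduced))
lrt_stat <- function(dat) {
  reduced <- lm(y ~ x1, data = dat)       # model under the null hypothesis
  full    <- lm(y ~ x1 + x2, data = dat)  # larger model
  as.numeric(2 * (logLik(full) - logLik(reduced)))
}
observed <- lrt_stat(toy)

# Simulate new responses from the fitted null model, refit both models,
# and record the likelihood ratio statistic each time
null_fit <- lm(y ~ x1, data = toy)
B <- 1000
sims <- simulate(null_fit, nsim = B)
boot_stats <- vapply(seq_len(B), function(b) {
  sim_data <- toy
  sim_data$y <- sims[[b]]
  lrt_stat(sim_data)
}, numeric(1))

# Histogram of the statistic when the null hypothesis is true
hist(boot_stats, main = "Parametric bootstrap null distribution",
     xlab = "Likelihood ratio statistic")
abline(v = observed, lty = 2)

# p-value: proportion of bootstrapped statistics greater than the observed one
p_value <- mean(boot_stats > observed)
p_value
```

Because the toy data are generated under the null, this p-value should behave like a draw from a uniform distribution across seeds.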

Let's see how new plant heights are generated under the parametric bootstrap. Consider, for instance, $i=1$ and $j=1,2$. That is, consider Plants \#11 and \#12. These plants are found in Pot \#1, which was randomly assigned to contain sterilized soil from a restored prairie (STP):
Let's see how new plant heights are generated under the parametric bootstrap. Consider, for instance, $i=1$ and $j=1,2$. That is, consider Plants \#11 and \#12 as shown in Table \@ref(tab:10verb7). These plants are found in Pot \#1, which was randomly assigned to contain sterilized soil from a restored prairie (STP):

```{r, 10verb7, echo=FALSE, comment=NA}
verb7 <- seedwd[1:2, c(2:11)]
@@ -1037,10 +1031,9 @@ For instance, consider Model C, where we must estimate a total of 15 parameters:

- Level One (timepoint within plant):

\begin{equation}
\begin{equation*}
Y_{ijk} = a_{ij}+b_{ij}\textrm{time}_{ijk}+\epsilon_{ijk}
(\#eq:lev1timemodcp)
\end{equation}
\end{equation*}

- Level Two (plant within pot):

@@ -1110,9 +1103,9 @@ By following the options above, our potential 30-parameter model (C_plus) can be

- Level One:

\begin{equation}
\begin{equation*}
Y_{ijk} = a_{ij}+b_{ij}\textrm{time}_{ijk}+\epsilon_{ijk}
\end{equation}
\end{equation*}

- Level Two:

@@ -1147,9 +1140,9 @@ In Model C we considered the main effects of soil type and sterilization on lead

- Level One:

\begin{equation}
\begin{equation*}
Y_{ijk} = a_{ij}+b_{ij}\textrm{time}_{ijk}+\epsilon_{ijk}
\end{equation}
\end{equation*}

- Level Two:

@@ -1193,9 +1186,9 @@ Our final model (Model F), with its constraints on Level Three error terms, can

- Level One:

\begin{equation}
\begin{equation*}
Y_{ijk} = a_{ij}+b_{ij}\textrm{time}_{ijk}+\epsilon_{ijk}
\end{equation}
\end{equation*}

- Level Two:

@@ -1367,11 +1360,10 @@ As in Chapter \@ref(ch-lon), it is important to be aware of the covariance struc

We will first consider Model B with $\tilde{v}_{i}$ at Level Three, and then we will evaluate the covariance structure that results from removing $\tilde{v}_{i}$, thereby restricting $\sigma_{\tilde{v}}^{2}=\sigma_{\tilde{u}\tilde{v}}=0$. The composite version of Model B has been previously expressed as:

\begin{equation}
\begin{equation*}
Y_{ijk}=[\alpha_{0}+\beta_{0}\textrm{time}_{ijk}]+
[\tilde{u}_{i}+u_{ij}+\epsilon_{ijk}+(\tilde{v}_{i}+v_{ij})\textrm{time}_{ijk}]
(\#eq:modbcomp)
\end{equation}
\end{equation*}

where $\epsilon_{ijk}\sim N(0,\sigma^2)$,

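Assuming the Level Two errors $(u_{ij}, v_{ij})$ have variances $\sigma_{u}^{2}$ and $\sigma_{v}^{2}$ with covariance $\sigma_{uv}$, the Level Three errors $(\tilde{u}_{i}, \tilde{v}_{i})$ have variances $\sigma_{\tilde{u}}^{2}$ and $\sigma_{\tilde{v}}^{2}$ with covariance $\sigma_{\tilde{u}\tilde{v}}$, and errors at different levels are independent of each other and of $\epsilon_{ijk}$, the composite Model B implies the following variance and covariance (the symbols $\sigma_{v}^{2}$ and $\sigma_{uv}$ are assumed notation):

\begin{align*}
\operatorname{Var}(Y_{ijk}) &= \sigma_{\tilde{u}}^{2}+\sigma_{u}^{2}+\sigma^{2}+2(\sigma_{\tilde{u}\tilde{v}}+\sigma_{uv})\textrm{time}_{ijk}+(\sigma_{\tilde{v}}^{2}+\sigma_{v}^{2})\textrm{time}_{ijk}^{2} \\
\operatorname{Cov}(Y_{ijk},Y_{ij'k'}) &= \sigma_{\tilde{u}}^{2}+\sigma_{\tilde{u}\tilde{v}}(\textrm{time}_{ijk}+\textrm{time}_{ij'k'})+\sigma_{\tilde{v}}^{2}\,\textrm{time}_{ijk}\,\textrm{time}_{ij'k'} \quad (j\neq j',\ \textrm{same pot})
\end{align*}

Setting $\sigma_{\tilde{v}}^{2}=\sigma_{\tilde{u}\tilde{v}}=0$ reduces the covariance between two different plants in the same pot to the constant $\sigma_{\tilde{u}}^{2}$, regardless of when the two heights were measured.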
@@ -1567,7 +1559,7 @@ In Section \@ref(threelevel-paraboot) we sought to perform a significance test c

11. In Model C, we initially addressed boundary constraints by removing the Level Three correlation between error terms from our multilevel model. What other model adjustments might we have considered?

12. How does Figure \@ref(fig:paraboot) show that a likelihood ratio test using a chi-square distribution would be biased?
12. How does Figure \@ref(fig:paraboot10) show that a likelihood ratio test using a chi-square distribution would be biased?

13. In Section \@ref(sec:explodingvarcomps), a model with 52 parameters is described: (a) illustrate that the model does indeed contain 52 parameters; (b) explain how to minimize the total number of parameters using ideas from Section \@ref(sec:explodingvarcomps); (c) what assumptions have you made in your simplification in (b)?

@@ -1612,7 +1604,7 @@ kable(table4chp10, booktabs=T, escape=F,

21. At the bottom of Table \@ref(tab:table4chp10), the percent of variance explained is given within and between neighborhoods. Explain what these values likely represent and how they were calculated.

22. Table \@ref(tab:table5chp10) shows a portion of Table 4 from @Sampson1997. Describe the multilevel model that likely produced this table. State the primary result from this table in context. [Note that collective efficacy is a Level Three covariate in this table, summarized over an entire neighborhood.] Estimates of neighborhood-level coefficients control for gender, marital status, homeownership, ethnicity, mobility, age, years in neighborhood, and SES of those interviewed. Model 1 accounts for 70.5\% of the variation between neighborhoods in perceived violence, whereas model 2 accounts for 77.8\% of the variation.
22. Table \@ref(tab:table5chp10) shows a portion of Table 4 from @Sampson1997. Describe the multilevel model that likely produced this table. State the primary result from this table in context. [Note that collective efficacy is a Level Three covariate in this table, summarized over an entire neighborhood.] Estimates of neighborhood-level coefficients control for gender, marital status, homeownership, ethnicity, mobility, age, years in neighborhood, and SES of those interviewed. Model 1 accounts for 70.5\% of the variation between neighborhoods in perceived violence, whereas Model 2 accounts for 77.8\% of the variation.

```{r, include=FALSE}
Variable2 <- c("Concentrated disadvantage","Immigrant concentration","Residential stability","Collective efficacy")
@@ -1632,7 +1624,7 @@ kable(table5chp10, booktabs=T,
caption="A portion of Table 4: Neighborhood correlates of perceived neighborhood violence from Sampson et al. (1997).") %>%
add_header_above(c(" ","Social composition"=3, "Social comp and collective efficacy"=3)) %>%
add_header_above(c(" ","Model 1"=3, "Model 2"=3)) %>%
row_spec(1, bold=T)
kable_styling(latex_options = "scale_down")
```


14 changes: 6 additions & 8 deletions 11-Generalized-Linear-Multilevel-Models.Rmd
@@ -561,7 +561,7 @@ At this point, you might imagine expanding model building efforts in a couple of

### A unified multilevel approach (the framework we'll use) {#unified-glmm}

As in Chapters \@ref(ch-multilevelintro) and \@ref(ch-lon), we will write out a composite model after first expressing Level One and Level Two models. That is, we will create Level One and Level Two models as in Section \@ref(twostage-glmm), but we will then combine those models into a composite model and estimate all model parameters simultaneously. Once again $Y_{ij}$ is an indicator variable recording if the $j^{th}$ foul from Game $i$ was called on the home team (1) or the visiting team (0), and $p_{ij}$ is the true probability that the $j^{th}$ foul from Game $i$ was called on the home team. Our Level One model with foul differential as the sole predictor is given by Equation \@ref(eq:lev1glmm):
As in Chapters \@ref(ch-multilevelintro) and \@ref(ch-lon), we will write out a composite model after first expressing Level One and Level Two models. That is, we will create Level One and Level Two models as in Section \@ref(twostage-glmm), but we will then combine those models into a composite model and estimate all model parameters simultaneously. Once again $Y_{ij}$ is an indicator variable recording if the $j^{th}$ foul from Game $i$ was called on the home team (1) or the visiting team (0), and $p_{ij}$ is the true probability that the $j^{th}$ foul from Game $i$ was called on the home team. Our Level One model with foul differential as the sole predictor is given by Equation \@ref(eq:lev1glmm) generalized to Game $i$:

\[ \log\bigg(\frac{p_{ij}}{1-p_{ij}}\bigg)=a_i+b_i\mathrm{foul.diff}_{ij} \]

@@ -679,10 +679,9 @@ In the College Basketball Referees case study, our two primary Level Two covaria

How will treating home and visiting teams as random effects change our multilevel model? Another way we might view this situation is by considering that Game is not the only Level Two observational unit we might have selected. What if we instead decided to focus on Home Team as the Level Two observational unit? That is, what if we assumed that fouls called on the same home team across all games must be correlated? In this case, we could redefine our Level One model from Equation \@ref(eq:lev1glmm). Let $Y_{hj}$ be an indicator variable recording if the $j^{th}$ foul from Home Team $h$ was called on the home team (1) or the visiting team (0), and $p_{hj}$ be the true probability that the $j^{th}$ foul from Home Team $h$ was called on the home team. Now, if we were to consider a simple model with foul differential as the sole predictor, we could model the probability of a foul on the home team for Home Team $h$ with the model:

\begin{equation}
\begin{equation*}
\log\bigg(\frac{p_{hj}}{1-p_{hj}}\bigg)=a_h+b_h\mathrm{foul.diff}_{hj}
(\#eq:lev1bglmm)
\end{equation}
\end{equation*}

In this case, $e^{a_{h}}$ represents the odds that a foul is called on the home team when total fouls are equal between both teams in a game involving Home Team $h$, and $e^{b_{h}}$ represents the multiplicative change in the odds that a foul is called on the home team for every extra foul on the home team compared to the visitors in a game involving Home Team $h$. After fitting logistic regression models for each of the 39 teams in our data set, we see in Figure \@ref(fig:gmu-histmat3) variability in fitted intercepts (mean=-0.15, sd=0.33) and slopes (mean=-0.22, sd=0.12) among the 39 teams, although much less variability than we observed from game-to-game. Of course, each logistic regression model for a home team was based on about 10 times more foul calls than each model for a game, so observing less variability from team-to-team was not unexpected.
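To make the interpretation of $e^{a_{h}}$ and $e^{b_{h}}$ concrete, the sketch below plugs the average fitted intercept (-0.15) and slope (-0.22) across the 39 home teams into the Level One model. These are illustrative values only, not estimates for any particular team, and the sign convention (foul differential = home-team fouls minus visiting-team fouls) is assumed from the wording above; `plogis()` is base R's inverse logit:

```r
# Illustrative values only: the average fitted intercept and slope
# across home teams quoted in the text, not estimates for any one team
a_h <- -0.15
b_h <- -0.22

exp(a_h)  # odds of a foul on the home team when foul.diff = 0 (about 0.861)
exp(b_h)  # multiplicative change in those odds per extra home-team foul (about 0.803)

# Implied probability that the next foul is called on the home team;
# foul_diff is assumed to be home-team fouls minus visiting-team fouls
foul_diff <- c(-2, 0, 2)
p_home <- plogis(a_h + b_h * foul_diff)  # plogis() is the inverse logit
round(p_home, 3)  # 0.572 0.463 0.357
```

With these values, each extra foul on the home team multiplies the odds of the next call going against the home team by roughly 0.8.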

@@ -744,10 +743,9 @@ We could include terms that vary by home or visiting team in other Level Two equ

Our composite model then looks like:

\begin{equation}
\begin{equation*}
\log\bigg(\frac{p_{i[gh]j}}{1-p_{i[gh]j}}\bigg) = [\alpha_{0}+\beta_{0}\mathrm{foul.diff}_{ij}]+[u_{i}+v_{h}+w_{g}].
(\#eq:compmoda)
\end{equation}
\end{equation*}
We will refer to this as Model A3, where we look at the effect of foul differential on the odds a foul is called on the home team, while accounting for three crossed random effects at Level Two (game, home team, and visiting team). Parameter estimates for Model A3 are given below:

```{r, include=FALSE}
@@ -1337,7 +1335,7 @@ kable(table5chp11, booktabs=T,

Perform exploratory analyses and then run multilevel models to examine significant determinants of successful challenges. Write a short report comparing specific reasons for the challenge to the greater context in which a challenge was made.

4. __Yelp restaurant reviews__ @Mohr2018 assembled a data set of Yelp restaurant reviews in Madison, WI, from 2005 through 2017 based on the Yelp Dataset Challenge on [Kaggle](https://www.kaggle.com/yelp-dataset/yelp-dataset). Their data in `yelp.csv` contains almost 60,000 reviews on 888 restaurants from over 20,000 reviewers, and it contains a selection of variables on the reviewer (e.g., total reviews, average stars), the restaurant (e.g., neighborhood, average stars, category), and the review itself (e.g., stars, year, useful ratings, actual text).
4. __Yelp restaurant reviews__ @Mohr2018 assembled a data set of Yelp restaurant reviews in Madison, WI, from 2005 through 2017 based on the Yelp Dataset Challenge on [Kaggle](https://www.kaggle.com/yelp-dataset/yelp-dataset). The data in `yelp.csv` contains almost 60,000 reviews on 888 restaurants from over 20,000 reviewers, and it contains a selection of variables on the reviewer (e.g., total reviews, average stars), the restaurant (e.g., neighborhood, average stars, category), and the review itself (e.g., stars, year, useful ratings, actual text).

There are various questions that could be pursued with this data. Here are just a few ideas:
- how can we model the number of stars in the rating, or whether or not the rating was 5 stars?
8 changes: 4 additions & 4 deletions _bookdown.yml
@@ -2,9 +2,9 @@ book_filename: "bookdown-bysh"
chapter_name: "Chapter "
output_dir: docs
rmd_files: ["index.Rmd",
"04-Poisson-Regression.Rmd",
"05-Generalized-Linear-Models.Rmd",
"06-Logistic-Regression.Rmd",
"07-Correlated-Data.Rmd",
"08-Introduction-to-Multilevel-Models.Rmd",
"09-Two-Level-Longitudinal-Data.Rmd",
"10-Multilevel-Data-With-More-Than-Two-Levels.Rmd",
"11-Generalized-Linear-Multilevel-Models.Rmd",
"99-References.Rmd"]
clean: [packages.bib, bookdown.bbl]
36 changes: 23 additions & 13 deletions bib/articles.bib
@@ -20,7 +20,7 @@ @article{Rodgers2001

@article{Mathews2005,
author = {T.J. Mathews and Brady E. Hamilton},
title = {Trend Analysis of the Sex Ratio at Birth in the United States},
title = {Trend Analysis of the Sex Ratio at Birth in the {United States}},
journal={National Vital Statistics Reports},
volume={53},
number = {20},
@@ -63,7 +63,7 @@ @article{Cameron1986

@article{Scotto1974,
author = {Joseph Scotto and Alfred W. Kopf and Fredrick Urbach},
title = {Non-melanoma skin cancer among caucasians in four areas of the United States},
title = {Non-melanoma skin cancer among caucasians in four areas of the {United States}},
journal = {Cancer},
volume = {34},
number = {4},
@@ -96,7 +96,7 @@ @article{Nelder1972

@article{Roskes2011,
author = {Marieke Roskes and Daniel Sligte and Shaul Shalvi and Carsten K. W. De Dreu},
title = {The Right Side? Under Time Pressure, Approach Motivation Leads to Right-Oriented Bias},
title = {The Right Side? {Under} Time Pressure, Approach Motivation Leads to Right-Oriented Bias},
journal = {Psychological Science},
volume = {22},
number = {11},
@@ -107,7 +107,7 @@ @article{Roskes2011

@article{Martinsen2009,
author = {Martinsen, M and Bratland-Sanda, S and Eriksson, A K and Sundgot-Borgen, J},
title = {Dieting to win or to be thin? A study of dieting and disordered eating among adolescent elite athletes and non-athlete controls},
title = {Dieting to win or to be thin? {A} study of dieting and disordered eating among adolescent elite athletes and non-athlete controls},
journal = {British Journal of Sports Medicine},
volume = {44},
number = {1},
@@ -128,7 +128,7 @@ @article{Nafstad1999
}

@article{Gilovich1985,
author = {Thomas Gilovich and Robert Vallone and and Amos Tversky},
author = {Thomas Gilovich and Robert Vallone and Amos Tversky},
title = {The Hot Hand in Basketball: On the Misperception of Random Sequences},
journal = {Cognitive Psychology},
volume = {17},
@@ -173,7 +173,7 @@ @article{Brown2004

@article{Witte2007,
author = {John Witte and David Weimer and Arnold Shober and Paul Schlomer},
title = {The Performance of Charter Schools in Wisconsin},
title = {The Performance of Charter Schools in {Wisconsin}},
journal = {Journal of Policy Analysis and Management},
volume = {26},
number = {3},
@@ -250,7 +250,7 @@ @article{Hesterberg2015
}

@article{Poole1989,
title = {Mate guarding, reproductive success and female choice in African elephants},
title = {Mate guarding, reproductive success and female choice in {African} elephants},
journal = {Animal Behaviour},
volume = {37},
pages = {842--849},
@@ -265,7 +265,7 @@ @article{Gelman2007
year = {2007},
month = {Sept},
pages = {813--823},
title = {An Analysis of the NYPD's Stop-and-Frisk Policy in the Context of Claims of Racial Bias},
title = {An Analysis of the {NYPD's} Stop-and-Frisk Policy in the Context of Claims of Racial Bias},
volume = {102}
}

@@ -376,7 +376,7 @@ @article {Sampson1997

@article{Anderson2009,
author = {Kyle J. Anderson and David A. Pierce},
title = {Officiating bias: the effect of foul differential on foul calls in NCAA basketball},
title = {Officiating bias: the effect of foul differential on foul calls in {NCAA} basketball},
volume = {27},
number = {7},
pages = {687-94},
@@ -387,7 +387,7 @@ @article{Anderson2009

@Article{Noecker2012,
author={Noecker, Cecilia A. and Roback, Paul},
title={New Insights on the Tendency of NCAA Basketball Officials to Even Out Foul Calls},
title={New Insights on the Tendency of {NCAA} Basketball Officials to Even Out Foul Calls},
journal={Journal of Quantitative Analysis in Sports},
year={2012},
volume={8},
@@ -397,8 +397,7 @@ @Article{Noecker2012
}

@article{Randall2014,
title = {Exploring disparities in acute myocardial infarction events between Aboriginal
and non-Aboriginal Australians: Roles of age, gender, geography and area-level disadvantage},
title = {Exploring disparities in acute myocardial infarction events between {Aboriginal and non-Aboriginal Australians}: Roles of age, gender, geography and area-level disadvantage},
journal = {Health \& Place},
volume = {28},
pages = {58-66},
@@ -471,7 +470,7 @@ @article{Holst1988

@article{Proudfoot2003,
author = {J Proudfoot and D Goldberg and A Mann and B Everitt and I Marks and J A Gray},
journal = {psychological Medicine},
journal = {Psychological Medicine},
title = {Computerized, Interactive, Multimedia Cognitive-Behavioural Program for Anxiety and Depression in General Practice},
month = {Feb},
year = {2003},
@@ -480,3 +479,14 @@ @article{Proudfoot2003
number = {2},
doi = {10.1017/s0033291702007225}
}

@article{Camill2004,
author = {Camill, Philip and McKone, Mark J. and Sturges, Sean T. and Severud, William J. and Ellis, Erin and Limmer, Jacob and Martin, Christopher B. and Navratil, Ryan T. and Purdie, Amy J. and Sandel, Brody S. and Talukder, Shano and Trout, Andrew},
title = {Community- and Ecosystem-level Changes in a Species-rich Tallgrass Prairie Restoration},
year = {2004},
journal = {Ecological Applications},
volume = {14},
number = {6},
pages = {1680-1694},
doi = {10.1890/03-5273},
}
Binary file modified docs/bookdown-bysh.pdf
Binary file not shown.