-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathchart-blind.Rmd
445 lines (334 loc) · 29.8 KB
/
chart-blind.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
---
title: "Discussion of Piechart of the Blind 1874"
author: "Heike Hofmann"
date: "February 26, 2016"
output: html_document
---
This is the chart that we are discussing:
![Chart of the Blind by Fred H. Wines](ca000104.gif), [high resolution chart](https://www.loc.gov/resource/g3701gm.gct00008/?sp=104)
- Fred H. Wines was the Secretary of the Illinois State Board of Charities. It seems that he had a personal interest in this aspect of the census records.
- At the top left of the chart a scale is given (just as in a geographic map): 1 square inch corresponds to 4,800 individuals. Obviously graphs now can be re-scaled, so it would be great to include a visual scale as well.
- Each of the state circles shows the number of blind individuals in each state. This number should be highly correlated with the actual population - so it might be more informative to also show the proportion of blind compared to the overall population. Presumably there a case could be made to get a proportion of state funding for the blind plus some extra allocation in case the proportion of the blind is higher than in other states.
- The whole interpretation of this chart is confounded by the missing comparison to the overall population. At face value, there are more blind men than blind women. Does that mean that the proportion of men affected by blindness is also higher or is this a reflection of the gender balance at the time?
- donut rings: the inner ring is the 1860 data, the outer ring is the data from 1870. These chart only work, if there is an increase in the number of blind individuals in every single state. The U.S. was growing at a fast pace during this time (growing from about 31.2 million people in 1860 to 38.5 million people in 1870), and as long as living conditions and health care conditions did not improve, we would expect proportionally about the same increase in the blind population.
- link to [current statistics](https://nfb.org/blindness-statistics)
- data for Colorado (state since August 1, 1876; territory since February 28, 1861) and Nevada (state since October 31, 1864) is missing.
- pros for the angles: they are not affected by the overall number, but con: comparing angles (with different starting points) is cognitively a hard task (Cleveland & McGill reference).
- con against angles: the cumulative angles introduce dependencies
- con against the pies: the areas of the first two rows of circles combined are the same area as each one of the large circles at the top of the page. Visually this is not at all clear. HH: I am not at all sure how we could visualize this relationship in a meaningful way.
- earliest pie chart is credited to Statistical Breviary (William Playfair, 1801)
- other sources critiquing pie charts
- [Save the Pies for Dessert (Stephen Few)](http://www.perceptualedge.com/articles/08-21-07.pdf) Pro: shows part-to-whole relationships, better than bar charts at quarter percentage marks (0%,25%,50%,75%,or 100%) Con: Pie charts often worse than bar charts for other percentage differences. "[William Cleaveland] refers to pie charts as 'pop charts' because they are commonly found in pop culture media, but much less in science and technology media."
- The Visual Display of Quantitative Information (Tufte, 2001, p 178) "A table is nearly always better than a dumb pie chart; the only worse design then a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between pies"
- Did we want to comment on technology 1870's vs 2016 when creating graphics? HH: I think commenting on the technology a bit would be good.
- Have we lost anything from that time? HH: I guess we have, but it's hard to tell what we have lost - but one example would be the 1890 census, which got damaged by fire in two separate incidences: [fate of 1890 census](http://www.archives.gov/publications/prologue/1996/spring/1890-census-1.html)
- now it is difficult to importing handwriting as a particular font for a pie chart. (I've heard mixed reactions on if handwritten text on graphics looks unprofessional or more artistic) - HH: Howard Wainer and others seem to think of it as artistic. There are several instances where they make charts look like their more dated counterparts, e.g. in [A century and a half of moral statistics in the United Kingdom: Variations on Joseph Fletcher's thematic maps](http://onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2012.00575.x/epdf)
- Placing "United States" across the entire pie chart. Placing "(Colored, including Indian and Chinese)" across two slices of the pie. Still possible by specifying xy-coordinates but not as straightforward as writing it on the chart.
- An interactive display of a pie chart, might have the division across all of the United States as an opening graphic. User clicks on a particular slice to see pie chart of that particular slice and continue to zoom in. Still not as good as bar chart to quickly get a sense of the divisions across all states.
- Does part of the appeal of pie charts vs bar chart have to do with rounded corners vs straight edge?
- [Why round corners are easier on the eyes](http://uxmovement.com/thinking/why-rounded-corners-are-easier-on-the-eyes/) Curved lines seem more organic. Does that mean a graphic that has too many straight lines seems unreal or unsettling? Is that why hand-drawn seems appealing? Is there an uncanny valley for statistical graphics? However adding a randomization to the lines may be adjusting the data...
## Approach to re-do the chart
###The data
Getting the data turns out to be a bit problematic. What we would like are state-level aggregates of the population size and the number of blind individuals by state, gender, nativity and race for both the 1870 census and the 1860 census (for the comparisons).
There are a few avenues available:
- individual data is available from a lot of sites (such as ancestry.com, FamilySearch, etc): e.g. in the 1870 census there are 391 individuals listed by the name of Helen Keller, who are white females (obviously not THE Helen Keller, because she was not born until 1880).
- state aggregate data as part of the Historical Census Browser provided by the University of Virginia at [http://mapserver.lib.virginia.edu/](http://mapserver.lib.virginia.edu/). Citation: (2004). Historical Census Browser. Retrieved [Date you accessed source], from the University of Virginia, Geospatial and Statistical Data Center: http://mapserver.lib.virginia.edu/.
The data comes from [ICPSR, Inter-university Consortium for Political and Social Research](http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/0003). Unfortunately, only political and economic factors are included in these data sources, but no information on disabilities, i.e. the number of blind or deaf individuals is absent.
- [ipums USA](https://usa.ipums.org/usa/) provides access to microsamples of all US census. However, microsamples of 1.2% for the 1860 and 1870 census include only a handful of blind individuals for each of the state, and therefore do not provide a reliable estimate of these proportions.
The reliability of the microsample for questions regarding the blind is not high enough to allow accurate statements: The microsample for the 1870 census consists of 443,377 individuals, out of which 252 are identified as blind, i.e. on average there are only 5 to 6 blind individuals in a population of size 10,000.
Let's assume that the actual proportion of blind individuals in the US at 1870 is actually 252 in 443,377. The microsample consists of 443,377 individuals. Let X be the number of blind individuals in the microsample. We can assume that X follows a Binomial distribution $B_{443377, 252/443377}$. This means that X has an expected value of 252, but its variance is almost as big as this (251.9), which means that a Wald 95% confidence interval is (220, 284), i.e. putting the uncertainty at about a quarter of the number of individuals.
At the state level the situation is worse, i.e. the microsample does not allow us to draw any inference at this level.
- Wines himself provides us with estimates of the proportions: for each of the pie segments he included a numeric value for the angle (out of 360). Based on these numbers, we can recover the proportions (if not the overall numbers).
###The new chart
- States are sorted according to proportions of native/foreign born individuals.
- Some things we didn't see before (unless looking very closely):
+ the proportion of blind individuals among foreigners is high for states that have been founded within the last two decades before the census was taken (such as WI 1848, ... IA 1846). However, this is not consistent - states along the East Coast, particularly New York, are traditional immigration states with large percentages of foreign born population. On the other hand, Kansas and West Virginia became states in 1861 and 1863, respectively, and the percentages of foreign born population are at below 10% in these two states, which is well below the national level of foreign born population of 16% in 1870. Again, these percentages are much more likely a reflection of state level demographics rather than characteristics of the state-level population of blind individuals.
+ Generally, the number of blind men is higher than the number of blind women in all states, but there are some states that show a huge gender imbalance between blind individuals: Delaware is the only state, in which the population of blind females is much higher than the male population. Oregon, Vermont, Maine, Virginia, and to a lesser degree, Mississippi, show a (significantly?) markedly higher percentage of blind men than women.
```{r, echo=FALSE, fig.width=9, fig.height=6, message=FALSE}
blindprop <- read.csv("data/blind-per.state.csv", na.strings="NA")
library(ggplot2)
#qplot(reorder(State, Female.blind.perc), Female.blind.perc, data=blindprop) + coord_flip()
library(tidyr)
library(dplyr)
blind <- blindprop %>% select(State, Male.native.blind.perc, Male.foreign.blind.perc, Female.native.blind.perc, Female.foreign.blind.perc) %>% gather(gender.born, blind, -State)
blind$gender <- "Male"
blind$gender[grep("Female", blind$gender.born)] <- "Female"
blind$nativity <- "Foreign"
blind$nativity[grep("native", blind$gender.born)] <- "Native"
library(RColorBrewer)
cols <- brewer.pal(n=8, name="Paired")
blind$gender.born <- with(blind, interaction(gender, nativity, sep="/"))
blind$gender.born <- factor(blind$gender.born, levels=c("Female/Native", "Female/Foreign", "Male/Foreign", "Male/Native"))
blind$State <- reorder(blind$State, blind$blind, function(x) x[1]+x[3])
ggplot(data=blind, aes(x=State, weight=blind, fill=gender.born)) +
geom_bar() +
coord_flip() +
scale_fill_manual("Gender/Nativity", values=cols[c(6,5,1,2)]) +
theme(legend.position="bottom") +
ylab("Percentage")
```
## Expected versus observed proportions of blind individuals
XXX We will need only one of these expected values. At the moment I lean towards using the values from the microsample because they take care of the two-way interaction between gender and nativitiy in each of the states. XXX
### State-wide aggregates
As a follow-up to the state-wide proportions of blind individuals by gender and nativity, we can calculate the proportions of all individuals, regardless of disabilities, and calculate the ratios of gender by nativity for the overall population. Using the marginal distributions for gender and nativity for each state, we calculate an expected percentage.
```{r, echo=FALSE, fig.width=9, fig.height=6, message=FALSE}
census1870 <- read.csv("data/states-1870.csv", stringsAsFactors = FALSE)
names(census1870)[1] <- "STATE"
foreigners <- census1870[, c("STATE", "FOREIGN.BORN.PERSONS", "NATIVE.BORN.PERSONS", "TOTAL.POPULATION")]
genders <- census1870[, c("STATE", "TOTAL.FEMALES", "TOTAL.MALES", "TOTAL.POPULATION")]
foreigners.all <- foreigners %>% gather(NATIVITY, number, -c(STATE, TOTAL.POPULATION ))
foreigners.all$NATIVITY <- gsub(".BORN.PERSONS", "", foreigners.all$NATIVITY)
foreigners.all$NATIVITYperc <- with(foreigners.all, number/TOTAL.POPULATION*100)
genders.all <- genders %>% gather(GENDER, number, -c(STATE, TOTAL.POPULATION ))
genders.all$GENDER <- gsub("TOTAL.", "", genders.all$GENDER)
genders.all$GENDERperc <- with(genders.all, number/TOTAL.POPULATION*100)
capwords <- function(s, strict = FALSE) {
cap <- function(s) paste(toupper(substring(s, 1, 1)),
{s <- substring(s, 2); if(strict) tolower(s) else s},
sep = "", collapse = " " )
sapply(strsplit(s, split = " "), cap, USE.NAMES = !is.null(names(s)))
}
genders.all$GENDER <- capwords(genders.all$GENDER, strict=TRUE)
genders.all$GENDER <- gsub("s$", "", genders.all$GENDER)
foreigners.all$NATIVITY <- capwords(foreigners.all$NATIVITY, strict=TRUE)
blind.plus <- merge(blind, foreigners.all[,c("STATE", "NATIVITY", "NATIVITYperc")], by.x=c("State", "nativity"), by.y=c("STATE", "NATIVITY"))
blind.plus <- merge(blind.plus, genders.all[,c("STATE", "GENDER", "GENDERperc")], by.x=c("State", "gender"), by.y=c("STATE", "GENDER"))
blind.plus$blind.exp <- with(blind.plus, NATIVITYperc*GENDERperc/100)
blind.plus$blind.resid <- with(blind.plus, (blind-blind.exp)/blind.exp)
blind.plus.fem <- blind.plus %>% group_by(State, gender) %>% summarize(
blind.exp = sum(blind.exp)
)
ggplot(data=blind, aes(x=State, weight=blind, fill=gender.born)) +
geom_bar() +
coord_flip() +
scale_fill_manual("Gender/Nativity", values=cols[c(6,5,1,2)]) +
theme(legend.position="bottom") +
ylab("Percentage") +
geom_point(aes(y=blind.exp), data=subset(blind.plus, gender.born=="Female/Native"), colour=cols[6], show.legend=FALSE) +
geom_point(aes(y=blind.exp), data=subset(blind.plus, gender.born=="Female/Native"), colour="white", shape=1, show.legend=FALSE) +
geom_point(aes(y=100-blind.exp), data=subset(blind.plus, gender.born=="Male/Native"), colour=cols[2], show.legend=FALSE) +
geom_point(aes(y=100-blind.exp), data=subset(blind.plus, gender.born=="Male/Native"), colour="white", shape=1, show.legend=FALSE) +
geom_point(aes(x= State, y=blind.exp), data=subset(blind.plus.fem, gender=="Female"), shape="|", show.legend=FALSE, inherit.aes=FALSE, colour="white", size=3)
```
The dots and the white vertical line segment shows the number of expected blind individuals in each of the four classes of nativity and gender, assuming that the proportion of blind individuals follows the distribution of the overall population, and that there is no interaction between gender and nativity for any of the states. Except for a few deviations, this latter assumption generally holds well (Oregon is an exception: here, there are fewer women than expected).
### 1% microsample
The chart below shows the actual proportions of blind individuals by gender and nativity for each state in 1870. The dots are showing expected proportions based on the 1.2% microsample for the 1870 census. The expected proportions are based on the assumption that gender and nativity demographics of each state are the same for blind individuals as for the total population.
There are obvious differences:
- Generally, we can see that with a few notable exceptions (Delaware, Rhode Island; Vermont, Connecticut, Louisiana, New Hampshire, Maine, Alabama, Mississippi), the proportions of native born blind men are following the expectation.
- in most states, the proportion of foreign born blind individuals is higher compared to what might be expected compared to the number of foreigners in each state. Exceptions to that are
- California: here, the proportion of foreign born blind reflect the proportion of foreigners in the whole population. However, there are about twice as many blind women among the foreigners as expected.
- Nebraska: the proportion of observed foreign born blind is only about 2/3 of the expected proportion.
These deviations go hand in hand with an over-estimation of the proportion of native born blind women in almost all states. Kansas, Delaware and Nebraska are the only states, in which the proportion of blind native born women is higher than expected based on the demographics of the overall population.
```{r, echo=FALSE, fig.width=9, fig.height=6, message=FALSE, warning=FALSE}
ipums <- read.csv("data/ipums-1870.csv")
gborn <- as.data.frame(xtabs(data=ipums, ~STATEICP+SEX+I(NATIVITY=="Foreign born")))
gborn$State <- toupper(gborn$STATEICP)
names(gborn)[3] <- "nativity"
levels(gborn$nativity) <- c("Native", "Foreign")
gborn<- gborn %>% group_by(State) %>% mutate(totals=sum(Freq))
gborn$blind.exp <- with(gborn, Freq/totals*100)
blind.plus2 <- merge(blind, gborn[, c("State", "SEX","nativity", "blind.exp")], by.x=c("State","gender", "nativity"), by.y=c("State", "SEX", "nativity"), all.x=TRUE)
blind.plus2.fem <- blind.plus2 %>% group_by(State, gender) %>% summarize(
blind.exp = sum(blind.exp)
)
ggplot(data=blind.plus2, aes(x=State, weight=blind, fill=gender.born)) +
geom_bar() +
coord_flip() +
scale_fill_manual("Gender/Nativity", values=cols[c(6,5,1,2)]) +
theme(legend.position="bottom") +
ylab("Percentage") +
geom_point(aes(y=blind.exp), data=subset(blind.plus2, gender=="Female" & nativity=="Native"), colour=cols[6], show.legend=FALSE) +
geom_point(aes(y=blind.exp), subset(blind.plus2, gender=="Female" & nativity=="Native"), colour="white", shape=1, show.legend=FALSE) +
geom_point(aes(y=100-blind.exp), data=subset(blind.plus2, gender=="Male" & nativity=="Native"), colour=cols[2], show.legend=FALSE) +
geom_point(aes(y=100-blind.exp), subset(blind.plus2, gender=="Male" & nativity=="Native"), colour="white", shape=1, show.legend=FALSE) +
geom_point(aes(x= State, y=blind.exp), data=subset(blind.plus2.fem, gender=="Female"), shape="|", show.legend=FALSE, inherit.aes=FALSE, colour="white", size=3)
```
## Putting the residuals on a map
This might be the real story in the data: we calculate expected proportions for blind by gender and nativity based on the assumption that the demographics of the general population are reflected among the blind. The deviations (in form of absolute residuals) are plotted on choropleth maps below. We see that the largest negative residuals (i.e. most overestimated proportion of bline) are the native born female. There are far fewer native born blind women than expected. The most under-estimated group are foreign born blind men, who in almost all of the states make up a larger number than expected.
California makes an exception: foreign born blind women in California are a hugely underestimated group, while there are fewer foreign born blind men than expected.
Kansas and Nebraska are two of the youngest states and might have been affected the most by the Homestead Act of 1862, which gave the opportunity to anyone 'who had not taken up arms against the United States' to own 160 acres of land, if they settled and farmed the land for five years. In both Kansas and Nebraska, the proportion of foreign-born blind individuals is lower than expected, favoring the proportion of native born blind men and women, instead.
Nebraska is a state since 1867, Kansas since 1861 (Nevada 1864, West Virginia 1863).
```{r, echo=FALSE, message=FALSE, fig.width=8, fig.height=7}
library(maps)
states <- map_data("state")
blind.plus2$State <- tolower(blind.plus2$State)
states.resids <- merge(states, na.omit(blind.plus2), by.x="region", by.y="State")
states.resids <- states.resids[order(states.resids$order),]
library(ggplot2)
library(ggthemes)
ggplot(data=states,
aes(x=long, y=lat, group=group, order=order)) +
geom_polygon(fill=NA, colour="grey90", size=.25) +
geom_polygon(aes(fill=blind-blind.exp), data=states.resids) +
facet_wrap(~gender.born) + scale_fill_gradient2() +
theme_bw() +
ggtitle("absolute residuals") +
theme(legend.position="bottom") +
theme(axis.text=element_blank(), axis.title = element_blank(),
axis.ticks=element_blank())
```
```{r, echo=FALSE, message=FALSE, fig.width=8, fig.height=7}
states.resids$rel.resids <- with(states.resids, (blind-blind.exp)/blind.exp)
# scale the relative residuals back to +-2:
states.resids$rel.resids <- with(states.resids, pmax(pmin(2, rel.resids), -2))
ggplot(data=states,
aes(x=long, y=lat, group=group, order=order)) +
geom_polygon(fill=NA, colour="grey90", size=.25) +
geom_polygon(aes(fill=rel.resids), data=states.resids) +
facet_wrap(~gender.born) + scale_fill_gradient2(limits=c(-2,2)) +
theme_bw() +
ggtitle("relative residuals") +
theme(legend.position="bottom") +
theme(axis.text=element_blank(), axis.title = element_blank(),
axis.ticks=element_blank())
```
##Investigating the relationship of gender and race by state
```{r, echo=FALSE}
blind2 <- blindprop %>% select(State, REGION, Male.white.blind.perc, Male.colored.blind.perc, Female.white.blind.perc, Female.colored.blind.perc) %>% gather(gender.race, blind, Male.white.blind.perc, Male.colored.blind.perc, Female.white.blind.perc, Female.colored.blind.perc)
blind2$gender <- "Male"
blind2$gender[grep("Female", blind2$gender.race)] <- "Female"
blind2$race <- "White"
blind2$race[grep("colored", blind2$gender.race)] <- "Non-White"
library(RColorBrewer)
cols <- brewer.pal(n=8, name="Paired")
blind2$gender.race <- with(blind2, interaction(gender, race, sep="/"))
blind2$gender.race <- factor(blind2$gender.race, levels=c("Female/White", "Female/Non-White", "Male/Non-White", "Male/White"))
blind2$State <- reorder(blind2$State, blind2$blind, function(x) x[1]+x[3])
ggplot(data=blind2, aes(x=State, weight=blind, fill=gender.race)) +
geom_bar() +
coord_flip() +
scale_fill_manual("Gender/Race", values=cols[c(6,5,1,2)]) +
theme(legend.position="bottom") +
ylab("Percentage") + ggtitle("Proportion of blind by Gender & Race for each State")
```
```{r, echo=FALSE}
grace <- as.data.frame(xtabs(data=ipums, ~STATEICP+SEX+I(RACE=="White")))
grace$State <- toupper(grace$STATEICP)
names(grace)[3] <- "race"
levels(grace$race) <- c("Non-White", "White")
grace<- grace %>% group_by(State) %>% mutate(totals=sum(Freq))
grace$blind.exp <- with(grace, Freq/totals*100)
blind.plus3 <- merge(blind2, grace[, c("State", "SEX","race", "blind.exp")], by.x=c("State","gender", "race"), by.y=c("State", "SEX", "race"), all.x=TRUE)
blind.plus3.fem <- blind.plus3 %>% group_by(State, gender) %>% summarize(
blind.exp = sum(blind.exp)
)
ggplot(data=blind.plus3, aes(x=State, weight=blind, fill=gender.race)) +
geom_bar() +
coord_flip() +
scale_fill_manual("Gender/Race", values=cols[c(6,5,1,2)]) +
theme(legend.position="bottom") +
ylab("Percentage") +
geom_point(aes(y=blind.exp), data=subset(blind.plus3, gender=="Female" & race=="White"), colour=cols[6], show.legend=FALSE) +
geom_point(aes(y=blind.exp), subset(blind.plus3, gender=="Female" & race=="White"), colour="white", shape=1, show.legend=FALSE) +
geom_point(aes(y=100-blind.exp), data=subset(blind.plus3, gender=="Male" & race=="White"), colour=cols[2], show.legend=FALSE) +
geom_point(aes(y=100-blind.exp), subset(blind.plus3, gender=="Male" & race=="White"), colour="white", shape=1, show.legend=FALSE) +
geom_point(aes(x= State, y=blind.exp), data=subset(blind.plus3.fem, gender=="Female"), shape="|", show.legend=FALSE, inherit.aes=FALSE, colour="white", size=3)
```
The maps seem to tell a story of discrimination: the proportion of white men among the blind is so much higher higher than expected, that it dominates all of the other groups. Blind women are generally fewer than expected in the North, while the estimates seem to be mostly on target in the South. The proportion of Non-White blind (black, chinese and american indian/Alaska native) are estimated at higher levels than observed, particularly in the South and the West Coast.
Nebraska and Kansas have, again, a different pattern, maybe due to their special status as youngest states.
```{r, echo=FALSE, message=FALSE, fig.width=8, fig.height=7}
library(maps)
states <- map_data("state")
blind.plus3$State <- tolower(blind.plus3$State)
states.resids <- merge(states, na.omit(blind.plus3), by.x="region", by.y="State")
states.resids <- states.resids[order(states.resids$order),]
library(ggplot2)
library(ggthemes)
ggplot(data=states,
aes(x=long, y=lat, group=group, order=order)) +
geom_polygon(fill=NA, colour="grey90", size=.25) +
geom_polygon(aes(fill=blind-blind.exp), data=states.resids) +
facet_wrap(~gender.race) + scale_fill_gradient2() +
theme_bw() +
ggtitle("absolute residuals") +
theme(legend.position="bottom") +
theme(axis.text=element_blank(), axis.title = element_blank(),
axis.ticks=element_blank())
```
# Changes between 1860 and 1870
```{r}
circles <- read.csv("data/diameters.csv")
fourth <- subset(circles, location %in% c("4I", "4O"))
fourth$Year <- factor(c(1860, 1870)[as.numeric(factor(fourth$location))])
fsp <- tidyr::spread(fourth[, c("State","location", "blindEst")], location, blindEst)
fsp$Change <- with(fsp, `4O` - `4I`)
fsp$relChange <- with(fsp, Change/`4I`)
fsp$Multiple <- cut(fsp$relChange, breaks=c(0,0.5, 1, 2, 11))
fourth <- merge(fourth, fsp[, c("State", "Change", "relChange", "Multiple")], by="State")
qplot(Year, blindEst, colour=Multiple, data=fourth, size=I(3)) + geom_line(aes(group=State), size=1.25) +
scale_colour_brewer(palette="Reds") + theme_bw() +
geom_label(aes(x=Year, y=blindEst, label=State)) + facet_grid(.~Multiple)
```
A relative change of 1 means that the numbers have doubled between 1860 and 1870. All states see an influx in the number of blind.
There are several other comparisons possible:
The first one would be against the overall increase in the number of people living in the US in 1860 and 1870. The population totals in 1860 and 1870 are 31,443,321 and 38,515,505, respectively. This comes out at a net increase of 22.5%.
We would expect the same percent-wise increase in the number of blind in each state.
```{r}
fourth$adjRelChange <- fourth$relChange - ((38515505 - 31443321)/31443321)
summary(fourth$adjRelChange)
```
The number of blind in all states increases by more than the 22.5%
There is a definite geographic pattern as can be seen in the figure below. The most established states of the North East and the East Coast show the lowest rates of increases in the blind population, while the less established, young states shows the largest increases. West Virgina is missing from the set - note, that it is also missing from the fourth line of circles (shaped like donuts), because it only became a state three years after the 1860 census. Note that while West Virginia is a relatively young state, its inhabitants might have lived there longer, because West Virginia seceded from Virginia and parts of the British Virginia Colony during the Civil War. Some of the West Virginians in the 1870 census will therefore have been counted towards Virginia in the 1860 census.
```{r}
require(ggthemes)
states <- map_data("state")
fourth$region <- tolower(fourth$State)
ggplot(aes(fill=Multiple), data=fourth) +
geom_polygon(aes(x=long, y=lat, group=group),
fill="grey95", colour = "grey80", data=states) +
geom_map(aes(map_id=region), map=states)+
expand_limits(x=states$long, y = states$lat) +
theme_map() +
scale_fill_brewer(palette="Reds")
```
- "Males are shown to be in excess among the blind"
what is the comparison? - under the assumption that the gender ratio is 50:50, Wines' statement is true; there are more men among the blind than women. Based on the pie charts this is what he seems to suggest. However, Wines should be aware that the gender ratio in the US overall is slightly tilted towards men -- in the 1870 census 50.5% of the population are men. However, the ratio is very different from one state to the next.
```{r, echo=FALSE}
library(dplyr)
#blind <- read.csv("data/blind-per.state.csv")
blindGender <- blind %>% group_by(State, gender) %>%
summarize(
blind = sum(blind)
)
blindGender$state <- tolower(blindGender$State)
state_pop <- merge(state_pop, subset(blindGender, gender=="Male")[,-(1:2)], by="state", all.x=TRUE)
state_pop$state <- with(state_pop, reorder(state, TOTAL.MALES/TOTAL.POPULATION*100))
qplot(state, weight=TOTAL.MALES/TOTAL.POPULATION*100, data=state_pop, alpha=I(0.8)) +
ylab("Percentage") + ylim(c(0,100)) + xlab("State") +
theme(axis.text.x=element_text(angle=90, vjust=0.5, hjust=1)) +
geom_hline(yintercept=50, colour="white") +
ggtitle("Percent male population, blind overlaid") +
geom_bar(aes(x = state, weight=blind), fill="orange", alpha=0.4)
```
- "Foreigners are shown to be in excess of their proportion among the blind"
```{r, echo=FALSE}
blindForeign <- blind %>% group_by(State, nativity) %>%
summarize(
blindForeign = sum(blind)
)
blindForeign$state <- tolower(blindForeign$State)
state_pop <- merge(state_pop, subset(blindForeign, nativity=="Foreign")[,-(1:2)], by="state", all.x=TRUE)
state_pop$state <- with(state_pop, reorder(state, FOREIGN.BORN.PERSONS/TOTAL.POPULATION*100))
qplot(state, weight=FOREIGN.BORN.PERSONS/TOTAL.POPULATION*100, data=state_pop, alpha=I(0.8)) +
ylab("Percentage") + ylim(c(0,100)) + xlab("State") +
theme(axis.text.x=element_text(angle=90, vjust=0.5, hjust=1)) +
geom_hline(yintercept=50, colour="white") +
ggtitle("Percent foreign population, blind overlaid") +
geom_bar(aes(x = state, weight=blindForeign), fill="orange", alpha=0.4)
```
- "The Colored are shown to be in excess of their proportion among the Blind"
```{r, echo=FALSE}
blindColored <- blind2 %>% group_by(State, race) %>%
summarize(
blindRace = sum(blind)
)
blindColored$state <- tolower(blindColored$State)
state_pop <- merge(state_pop, subset(blindColored, race=="Non-White")[,-(1:2)], by="state", all.x=TRUE)
state_pop$state <- with(state_pop, reorder(state, COLORED.PERSONS/TOTAL.POPULATION*100))
qplot(state, weight=COLORED.PERSONS/TOTAL.POPULATION*100, data=state_pop, alpha=I(0.8)) +
ylab("Percentage") + ylim(c(0,100)) + xlab("State") +
theme(axis.text.x=element_text(angle=90, vjust=0.5, hjust=1)) +
geom_hline(yintercept=50, colour="white") +
ggtitle("Percent coloured population, blind overlaid") +
geom_bar(aes(x = state, weight=blindRace), fill="orange", alpha=0.4)
```