The purpose of resurrectionr is to provide recode_religion(), which
recodes a set of religious identification variables from the US General
Social Survey in a way that is analytically useful and easy to implement
in data processing.
The broader sociological and political problems in recoding religious identification are discussed, and addressed with a recoding schema, in the paper by Darren E. Sherkat and Derek Lehman,[1] which builds on a classification[2] from 2001. This package addresses the main technical issues of recoding religious identification from the GSS in R, using that schema:
- Variables contain a large number of codes: 16 (relig), 30 (denom), and 201 (other).
- The same pattern of variables describes religious identification for a large number of items (e.g. spouse’s, father’s, mother’s, etc.), and each of them naturally has different variable names.
- Punch codes are not consecutive indexes and often skip numbers, and for certain items the data tend to contain only punches, not labels.
This package provides a universal function, recode_religion(), that can
recode all these variables and adds further benefits such as checking
values, providing a recoding key, etc.
# Install development version from GitHub
# install.packages("devtools") # If you don't have it
devtools::install_github("mdjeric/resurrectionr")
Basic use of the function:
library(resurrectionr)
gss <- gss14_f
gss$religion <- recode_religion(gss$relig, gss$denom, gss$other)
#> Distribution of religious identification, in your data of 2538 is:
#> Freq Relative Cumul
#> Catholic or Orthodox 615 24.23 24.23
#> None 522 20.57 44.80
#> Baptist 324 12.77 57.57
#> Christian, no group given 307 12.10 69.66
#> Moderate Protestant 206 8.12 77.78
#> Sectarian Protestant 178 7.01 84.79
#> Lutheran 92 3.62 88.42
#> Other religion 88 3.47 91.88
#> Liberal Protestant 77 3.03 94.92
#> Jewish 40 1.58 96.49
#> Episcopalian 39 1.54 98.03
#> Mormon 32 1.26 99.29
#> No answer 15 0.59 99.88
#> Don't know 3 0.12 100.00
summary(gss$religion)
#> Baptist Catholic or Orthodox
#> 324 615
#> Christian, no group given Don't know
#> 307 3
#> Episcopalian Jewish
#> 39 40
#> Liberal Protestant Lutheran
#> 77 92
#> Moderate Protestant Mormon
#> 206 32
#> No answer None
#> 15 522
#> Other religion Sectarian Protestant
#> 88 178
recode_religion() is a wrapper for the basic function that does all the
recoding. It confirms that the variables have proper codes and the same
length, transforms their type, and provides additional information,
warning messages, errors, etc.
fct_rec_relig(), the simple function, uses the data frames in the list
rdo_cdbk to form three vectors (for relig, denom, and other) in which the
index corresponds to the punch number and the value is the label.
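The lookup idea can be sketched in base R as follows. This is only an illustration, not the package's code: the codebook data frame, its column names (punch, label), and the punch values and labels are all made up for the example, and the real rdo_cdbk may be structured differently.

```r
# Hypothetical codebook; punches are not consecutive (illustrative values only)
cdbk_denom <- data.frame(
  punch = c(10, 14, 70, 98),
  label = c("Label A", "Label B", "Label C", "Label D"),
  stringsAsFactors = FALSE
)

# Build a vector whose index is the punch number and whose value is the label;
# punches that are absent from the codebook stay NA
lookup <- rep(NA_character_, max(cdbk_denom$punch))
lookup[cdbk_denom$punch] <- cdbk_denom$label

# Translate a vector of punches into labels by plain indexing
punches <- c(14, 70, 11, 98)
labels <- lookup[punches]
print(labels)
```

Indexing by punch keeps the translation to a single vectorised step, and a punch that is missing from the codebook surfaces as NA rather than as a wrong label.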
For each new religious identification, separate vectors (for relig,
denom, and/or other identification) are created that contain the
corresponding punch codes and labels, following the paper and its SPSS
syntax. In addition, logical vectors are created for Sectarian
Protestants with valid denomination codes, which are used to sort cases
between ‘Sectarian Protestant’, ‘Christian, no group given’, and ‘Other
religion’.
Next, twelve logical vectors, one for the respondent’s belonging to each group, are created by checking against the appropriate vectors of labels. Names are then assigned, and the function prints a frequency table and returns a vector containing the new variable.
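A minimal sketch of this step, assuming only four of the groups and a handful of labels taken from the recoding key shown further down. The object names (cath_relig, is_cath, etc.) are hypothetical and the package's internals may differ.

```r
# A few example cases, using labels that appear in the recoding key
relig <- c("CATHOLIC", "PROTESTANT", "NONE", "PROTESTANT")
denom <- c("IAP", "SOUTHERN BAPTIST", "IAP", "EVANGELICAL LUTH")

# Per-group vectors of the labels that belong to each new category
# (hypothetical subset, not the full schema)
cath_relig <- c("CATHOLIC")
none_relig <- c("NONE")
bapt_denom <- c("SOUTHERN BAPTIST")
luth_denom <- c("EVANGELICAL LUTH")

# One logical vector per group
is_cath <- relig %in% cath_relig
is_none <- relig %in% none_relig
is_bapt <- relig == "PROTESTANT" & denom %in% bapt_denom
is_luth <- relig == "PROTESTANT" & denom %in% luth_denom

# Assign the new names from the logical vectors
religion <- rep(NA_character_, length(relig))
religion[is_cath] <- "Catholic or Orthodox"
religion[is_none] <- "None"
religion[is_bapt] <- "Baptist"
religion[is_luth] <- "Lutheran"
religion <- factor(religion)

print(table(religion))
```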
The function checks that (1) the vectors are of the same length and (2) their values are valid answers, throwing errors like this:
religion <- recode_religion("PROTESTANT", c(12, 13), "IAP")
#> Error: Vectors must be of the same lenght, currently they are:
#> Length of `relig`: 1.
#> Length of `denom`: 2.
#> Length of `other`: 1.
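A length check of this kind could be written roughly as below. The helper check_same_length() is hypothetical and is not the package's implementation; it only mirrors the shape of the message above.

```r
# Hypothetical helper, for illustration only
check_same_length <- function(relig, denom, other) {
  lens <- c(relig = length(relig), denom = length(denom), other = length(other))
  if (length(unique(lens)) != 1) {
    stop(
      "Vectors must be of the same length, currently they are:\n",
      paste0("  Length of `", names(lens), "`: ", lens, ".", collapse = "\n"),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# The failing call is wrapped in tryCatch() so the example keeps running
msg <- tryCatch(
  check_same_length("PROTESTANT", c(12, 13), "IAP"),
  error = function(e) conditionMessage(e)
)
cat(msg, "\n")
```

Reporting all three lengths at once makes it easier to see which argument is the odd one out.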
It is also possible to combine punches with labels and get the same results, or to print the coding key that was used for the specific data.
set.seed(999) # for reproducibility
short <- sample(1:2358, 50, replace = FALSE) # choose 50 random cases
gshrt <- gss[short, ]
gshrt$religion <- recode_religion(gshrt$relig, gshrt$denom, gshrt$other,
add_missing_levels = TRUE, print_key = TRUE)
#> Distribution of religious identification, in your data of 50 is:
#> Freq Relative Cumul
#> Christian, no group given 13 26 26
#> Catholic or Orthodox 12 24 50
#> None 7 14 64
#> Moderate Protestant 6 12 76
#> Baptist 3 6 82
#> Liberal Protestant 2 4 86
#> Lutheran 2 4 90
#> Other religion 2 4 94
#> Sectarian Protestant 2 4 98
#> Jewish 1 2 100
#> Don't know 0 0 100
#> No answer 0 0 100
#> Episcopalian 0 0 100
#> Mormon 0 0 100
#> *** Key for recoding variables ***
#> In this case, number of combinations was: 19 .
#>
#> religion| Baptist Baptist Catholic or Orthodox
#> ^ + ^^^ ^^^ ^^^
#> relig | PROTESTANT PROTESTANT CATHOLIC
#> denom | BAPTIST-DK WHICH SOUTHERN BAPTIST IAP
#> other | IAP IAP IAP
#> --------+- ---------- ---------- ----------
#>
#> religion| Christian, no group given Christian, no group given Jewish
#> ^ + ^^^ ^^^ ^^^
#> relig | CHRISTIAN PROTESTANT JEWISH
#> denom | NO DENOMINATION NO DENOMINATION IAP
#> other | IAP IAP IAP
#> --------+- ---------- ---------- ----------
#>
#> religion| Liberal Protestant Liberal Protestant Lutheran
#> ^ + ^^^ ^^^ ^^^
#> relig | PROTESTANT PROTESTANT PROTESTANT
#> denom | PRESBYTERIAN-DK WH PRESBYTERIAN, MERGED EVANGELICAL LUTH
#> other | IAP IAP IAP
#> --------+- ---------- ---------- ----------
#>
#> religion| Lutheran Moderate Protestant Moderate Protestant
#> ^ + ^^^ ^^^ ^^^
#> relig | PROTESTANT PROTESTANT PROTESTANT
#> denom | OTHER LUTHERAN AM BAPT CH IN USA NAT BAPT CONV USA
#> other | IAP IAP IAP
#> --------+- ---------- ---------- ----------
#>
#> religion| Moderate Protestant Moderate Protestant Moderate Protestant
#> ^ + ^^^ ^^^ ^^^
#> relig | PROTESTANT PROTESTANT PROTESTANT
#> denom | OTHER OTHER UNITED METHODIST
#> other | Christian Reform Disciples of Christ IAP
#> --------+- ---------- ---------- ----------
#>
#> religion| None Other religion Other religion
#> ^ + ^^^ ^^^ ^^^
#> relig | NONE HINDUISM PROTESTANT
#> denom | IAP IAP OTHER
#> other | IAP IAP Unitarian, Universalist
#> --------+- ---------- ---------- ----------
#>
#> religion| Sectarian Protestant
#> ^ + ^^^
#> relig | PROTESTANT
#> denom | OTHER
#> other | Christian; Central Christian
#> --------+- ----------
#> ===================================
#> This is probably not the best way to present or inspect them.
#> There will be a better way soon, or use unique() on your data.
It also works well when some (or all) variables contain punches, either as characters or numbers.
gshrt$d_num <- gss14_n$denom[short]
gshrt$o_num_char <- as.character(gss14_n$other[short])
gshrt$religion2 <- recode_religion(gshrt$relig, gshrt$d_num, gshrt$o_num_char,
add_missing_levels = TRUE, frequencies = FALSE)
#> * `denom` recoded from punches to labels; and 'NA' introduced.
#> * `other` recoded from punches to labels; and 'NA' introduced.
# We can see that there is no difference
FALSE %in% (gshrt$religion2 == gshrt$religion)
#> [1] FALSE
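One caveat about this kind of check, shown on made-up vectors: `==` returns NA wherever either side is NA, so `FALSE %in% (...)` silently skips those positions. A stricter comparison, sketched here on the assumption that NAs should also be in the same places, is:

```r
# Made-up vectors with a missing value in the same position
a <- factor(c("Baptist", "None", NA))
b <- factor(c("Baptist", "None", NA))

# Element-wise comparison yields NA for the missing pair
print(a == b)

# identical() on the character values also requires NAs to line up
same <- identical(as.character(a), as.character(b))
print(same)
```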
There are minor differences, however, when all missing values are
imported from SPSS. For example, if all missing values are treated
equally, certain groups might appear as NA, as seen in this example:
gss$religion2 <- recode_religion(gss14_n$relig, gss14_n$denom, gss14_n$other, frequencies = FALSE)
#> Some of the variables contain NA: `Don't know` and `NA`will be merged. Please see documentation for more details.
#> * `relig` recoded from punches to labels; and 'NA' introduced.
#> * `denom` recoded from punches to labels; and 'NA' introduced.
#> * `other` recoded from punches to labels; and 'NA' introduced.
unique(gss[,c("religion2", "religion")])
#> religion2 religion
#> 1 Catholic or Orthodox Catholic or Orthodox
#> 3 Lutheran Lutheran
#> 6 Episcopalian Episcopalian
#> 13 Christian, no group given Christian, no group given
#> 18 None None
#> 19 Moderate Protestant Moderate Protestant
#> 21 Liberal Protestant Liberal Protestant
#> 34 Other religion Other religion
#> 43 Sectarian Protestant Sectarian Protestant
#> 46 Baptist Baptist
#> 54 Jewish Jewish
#> 119 <NA> No answer
#> 220 <NA> Don't know
#> 260 Mormon Mormon
#> 309 <NA> Christian, no group given
# cases in the 2014 GSS that are NA in religion2 but "Christian, no group given" in religion:
gss[is.na(gss$religion2) & gss$religion == "Christian, no group given",]
#> bible relig denom other religion
#> 309 INSPIRED WORD PROTESTANT OTHER NA Christian, no group given
#> 419 WORD OF GOD PROTESTANT DK IAP Christian, no group given
#> 677 INSPIRED WORD PROTESTANT OTHER DK Christian, no group given
#> 1138 BOOK OF FABLES PROTESTANT DK IAP Christian, no group given
#> 2487 WORD OF GOD PROTESTANT DK IAP Christian, no group given
#> religion2
#> 309 <NA>
#> 419 <NA>
#> 677 <NA>
#> 1138 <NA>
#> 2487 <NA>
# These cases have different types of missing values on denom and other;
# this should be corrected soon.
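Why the distinction is lost can be illustrated with a small base R sketch. The values below are made up, and the set of codes treated as missing is an assumption for the example, not the package's definition.

```r
# A labelled import keeps the kinds of missing answers apart (illustrative values)
denom_labelled <- c("OTHER", "DK", "IAP", "NA")

# If every missing code is imported as R's NA, the distinction disappears
missing_codes <- c("DK", "IAP", "NA")
denom_collapsed <- ifelse(denom_labelled %in% missing_codes,
                          NA_character_, denom_labelled)

print(denom_labelled)
print(denom_collapsed)

# "DK" can no longer be told apart from "IAP" or "NA"
n_missing <- sum(is.na(denom_collapsed))
print(n_missing)
```

Any recoding rule that depends on which kind of missing answer was given, such as DK versus IAP on denom, has nothing left to work with once the codes are collapsed.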
Three more things are planned for this package:
- Recoding into an often more useful set of 7 religious identifications.
- A function that just prints the results.
- A function that returns, as a data frame, the key with all recoding tetrads, or with all recoding tetrads present in a particular data set (it might be useful for someone).
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
[1] Sherkat, Darren E., and Derek Lehman. “After The Resurrection: The Field of the Sociology of Religion in the United States.”
[2] Sherkat, Darren E. 2001. “Tracking the restructuring of American religion: Religious affiliation and patterns of religious mobility, 1973–1998.” Social Forces 79(4), 1459-1493. doi:10.1353/sof.2001.0052