Using wru offline #99

Closed
emcghee73 opened this issue May 3, 2023 · 4 comments

@emcghee73

I have data on a server without access to the internet, and I'm not allowed to remove the data from that server. Can I use wru without internet access? I tried pre-downloading the census data and using it in the predict_race function, like so:

census20 <- readRDS("wru_census_2020.rds")  # load the pre-downloaded census data
d <- predict_race(filter(d, !is.na(surname)), census.geo="tract", census.data=census20,
                  year="2020", age=F, sex=F, party="party",
                  names.to.use="surname, first, middle", model="fBISG")

but I got this error message:

Using predict_race to obtain initial race prediction priors with BISG model
Proceeding with first, last, and middle name predictions...
There was an error retrieving dataCannot access release data for repo "kosukeimai/wru".
Error in gzfile(file, "rb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file 'C:\Users\mcghee\AppData\Local\Temp\4\RtmpYpAfz4/wru-data-first_c.rds', probable reason 'No such file or directory'

If it's not downloading the census data, what is it downloading? Can I do without it, or pre-download that, too?

@1beb
Collaborator

1beb commented May 4, 2023

Hello @emcghee73

There is an option for this:

options("wru_data_wd" = TRUE)

If you download the files manually from the release and put them in the working directory, it won't access the internet. You'll need to download all the files manually:

[Image: screenshot of the files to download from the release]
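
A minimal sketch of the setup described above, for anyone reproducing it. The helper name `find_wru_files` and the `wru-data-` filename prefix are assumptions (the prefix is inferred from the `wru-data-first_c.rds` path in the error message above), not part of wru itself. The helper only lists candidate files in a directory so they can be compared against the release page.

```r
# Tell wru to look for its data files in the working directory
# (option name taken from the comment above).
options("wru_data_wd" = TRUE)

# Hypothetical helper, not part of wru: list files that look like release data.
# Assumption: release files share the "wru-data-" prefix and ".rds" extension.
find_wru_files <- function(dir = getwd()) {
  list.files(dir, pattern = "^wru-data-.*\\.rds$")
}

found <- find_wru_files()
if (length(found) == 0) {
  message("No wru-data-*.rds files found in ", getwd())
} else {
  message("Found: ", paste(found, collapse = ", "))
}
```

Run this after `setwd()` so that the directory checked is the one the session will actually use.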

1beb closed this as completed May 4, 2023
@emcghee73
Author

Thanks! I tried this--downloading these four files and putting them in the working directory--and I got this error message (along with the code leading up to it):

setwd("Z:/projects/mcghee/Voter Files")
census20 <- readRDS("wru_census_2020.rds")
start.time <- proc.time()
options("wru_data_wd"=TRUE)
d2 <- predict_race(filter(d2, !is.na(surname), !is.na(tract), !is.na(county)),
                   census.geo="tract", census.data=census20,
                   year="2020", age=F, sex=F, party="party",
                   names.to.use="surname, first, middle", model="fBISG")

Using predict_race to obtain initial race prediction priors with BISG model
Proceeding with first, last, and middle name predictions...
There was an error retrieving dataCannot access release data for repo "kosukeimai/wru".
Proceeding with Census geographic data at tract level...
Using Census geographic data from provided census.data object...
State 1 of 1: CA
There was an error retrieving dataCannot access release data for repo "kosukeimai/wru".
Error in toupper(as.character(df$surname)) :
invalid input 'NUÑEZ' in 'utf8towcs'

proc.time() - start.time

That's farther than it made it before, but it's still giving the same basic error. Am I using the options command wrong?

@1beb
Collaborator

1beb commented May 5, 2023

Can you show me the str() of your census object? Also, I think that error points to a character encoding problem in your data. Can you also show us Sys.locale()?

@emcghee73
Author

I fixed the character encoding problem with stri_trans_general(LastName, "Latin-ASCII") and the rest of it ran. So I now have imputed race probabilities and I think everything is ok? That said, I am still getting the error about accessing release data, and it seems like there are a lot of unmatched names (see below). Should I be worried about that?

Using predict_race to obtain initial race prediction priors with BISG model
Proceeding with first, last, and middle name predictions...
There was an error retrieving dataCannot access release data for repo "kosukeimai/wru".
Proceeding with Census geographic data at tract level...
Using Census geographic data from provided census.data object...
State 1 of 1: CA
There was an error retrieving dataCannot access release data for repo "kosukeimai/wru".
2363620 (11.4%) individuals' last names were not matched.
795341 (3.8%) individuals' first names were not matched.
797290 (3.8%) individuals' middle names were not matched.
There was an error retrieving dataCannot access release data for repo "kosukeimai/wru".
fBISG relies on MCMC; for reproducibility, I am setting RNG seed and returning it as attribute 'RNGseed'.
To silence this message, you can set a seed explicitly by defining the 'seed' element in the control list.
Forming Pr(race | location) tables from census data...
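
The transliteration step mentioned above can be sketched as follows. This assumes the stringi package is installed; the helper name `to_ascii_upper` and the sample surnames are illustrative only. Transliteration drops diacritics, so whether the resulting surnames still match the package's name dictionaries is something to check rather than assume.

```r
library(stringi)

# Hypothetical helper: transliterate to ASCII, then upper-case.
to_ascii_upper <- function(x) {
  toupper(stri_trans_general(x, "Latin-ASCII"))
}

# \u00d1 is "Ñ" and \u00ed is "í"; written as escapes so the script
# does not depend on the encoding of the source file.
surnames <- c("NU\u00d1EZ", "Garc\u00eda", "Smith")
print(to_ascii_upper(surnames))
```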

In case it's helpful, here's the output from str(census20). I couldn't get Sys.locale() to work--it said "could not find function Sys.locale()."

List of 1
$ CA:List of 7
..$ state : chr "CA"
..$ age : logi FALSE
..$ sex : logi FALSE
..$ year : chr "2020"
..$ block :'data.frame': 519723 obs. of 17 variables:
.. ..$ state : chr [1:519723] "CA" "CA" "CA" "CA" ...
.. ..$ county : chr [1:519723] "001" "001" "001" "001" ...
.. ..$ tract : chr [1:519723] "400100" "400100" "400100" "400100" ...
.. ..$ block : chr [1:519723] "1002" "1005" "1009" "1027" ...
.. ..$ P2_005N: num [1:519723] 0 0 0 49 144 0 65 17 55 68 ...
.. ..$ P2_006N: num [1:519723] 0 0 6 3 0 0 7 1 2 5 ...
.. ..$ P2_007N: num [1:519723] 0 0 0 0 0 0 0 0 0 0 ...
.. ..$ P2_008N: num [1:519723] 0 0 0 10 26 0 39 22 8 37 ...
.. ..$ P2_009N: num [1:519723] 0 0 0 0 0 0 0 0 0 0 ...
.. ..$ P2_010N: num [1:519723] 0 0 0 0 0 0 0 0 0 0 ...
.. ..$ P2_011N: num [1:519723] 0 0 2 2 16 0 9 7 6 3 ...
.. ..$ P2_002N: num [1:519723] 0 0 3 4 18 0 8 4 6 10 ...
.. ..$ r_whi : num [1:519723] 0.00 0.00 0.00 3.57e-06 1.05e-05 ...
.. ..$ r_bla : num [1:519723] 0.00 0.00 2.83e-06 1.42e-06 0.00 ...
.. ..$ r_his : num [1:519723] 0.00 0.00 1.93e-07 2.57e-07 1.16e-06 ...
.. ..$ r_asi : num [1:519723] 0.00 0.00 0.00 1.63e-06 4.25e-06 ...
.. ..$ r_oth : num [1:519723] 0.00 0.00 9.96e-07 9.96e-07 7.97e-06 ...
..$ tract :'data.frame': 9129 obs. of 16 variables:
.. ..$ state : chr [1:9129] "CA" "CA" "CA" "CA" ...
.. ..$ county : chr [1:9129] "001" "001" "001" "001" ...
.. ..$ tract : chr [1:9129] "400100" "400200" "400300" "400400" ...
.. ..$ P2_005N: num [1:9129] 1840 1383 3293 2676 1898 ...
.. ..$ P2_006N: num [1:9129] 140 39 544 283 617 ...
.. ..$ P2_007N: num [1:9129] 0 0 11 6 8 5 10 8 4 13 ...
.. ..$ P2_008N: num [1:9129] 538 190 540 386 304 125 406 574 243 566 ...
.. ..$ P2_009N: num [1:9129] 10 2 23 8 5 1 9 17 3 32 ...
.. ..$ P2_010N: num [1:9129] 49 14 37 29 38 42 52 40 24 49 ...
.. ..$ P2_011N: num [1:9129] 256 166 509 350 337 190 358 277 224 443 ...
.. ..$ P2_002N: num [1:9129] 205 207 547 374 437 188 633 500 374 977 ...
.. ..$ r_whi : num [1:9129] 0.000134 0.000101 0.00024 0.000195 0.000138 ...
.. ..$ r_bla : num [1:9129] 6.61e-05 1.84e-05 2.57e-04 1.34e-04 2.91e-04 ...
.. ..$ r_his : num [1:9129] 1.32e-05 1.33e-05 3.51e-05 2.40e-05 2.80e-05 ...
.. ..$ r_asi : num [1:9129] 8.96e-05 3.14e-05 9.20e-05 6.44e-05 5.05e-05 ...
.. ..$ r_oth : num [1:9129] 1.52e-04 8.97e-05 2.77e-04 1.92e-04 1.91e-04 ...
..$ county:'data.frame': 58 obs. of 15 variables:
.. ..$ state : chr [1:58] "CA" "CA" "CA" "CA" ...
.. ..$ county : chr [1:58] "001" "003" "007" "011" ...
.. ..$ P2_005N: num [1:58] 472277 801 139651 6941 455421 ...
.. ..$ P2_006N: num [1:58] 159499 10 3320 182 97994 ...
.. ..$ P2_007N: num [1:58] 4131 214 3050 280 2553 ...
.. ..$ P2_008N: num [1:58] 540511 12 10333 252 214520 ...
.. ..$ P2_009N: num [1:58] 13209 0 508 70 5720 ...
.. ..$ P2_010N: num [1:58] 10440 7 1184 92 8366 ...
.. ..$ P2_011N: num [1:58] 88537 76 13474 546 66453 ...
.. ..$ P2_002N: num [1:58] 393749 84 40112 13476 314900 ...
.. ..$ r_whi : num [1:58] 3.44e-02 5.84e-05 1.02e-02 5.06e-04 3.32e-02 ...
.. ..$ r_bla : num [1:58] 7.53e-02 4.72e-06 1.57e-03 8.59e-05 4.62e-02 ...
.. ..$ r_his : num [1:58] 2.53e-02 5.39e-06 2.57e-03 8.65e-04 2.02e-02 ...
.. ..$ r_asi : num [1:58] 9.05e-02 1.96e-06 1.77e-03 5.26e-05 3.60e-02 ...
.. ..$ r_oth : num [1:58] 0.051355 0.000148 0.00882 0.000457 0.038537 ...
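
On the locale question: base R has no `Sys.locale()`. The current locale is reported by `Sys.getlocale()`, and `l10n_info()` indicates whether the session uses UTF-8. A short sketch (the variable names are illustrative):

```r
# Full locale string for the session
locale_all <- Sys.getlocale()

# Character-type category only, which governs how functions
# such as toupper() interpret text
locale_ctype <- Sys.getlocale("LC_CTYPE")

# List of flags describing the session's encoding, including `UTF-8`
info <- l10n_info()

print(locale_all)
print(locale_ctype)
print(info[["UTF-8"]])
```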
