basic data input and structure vignette

delomast · Jun 23, 2020 · commit b806959 · 1 parent df28df7
Showing 4 changed files with 32 additions and 4 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -18,3 +18,4 @@ Suggests:
     knitr,
     rmarkdown
 VignetteBuilder: knitr
+LazyData: true
diff --git a/README.md b/README.md
@@ -9,3 +9,14 @@ Current options: </br>
 * "single-sided" grandparentage: a trio of grandchild + both maternal grandparents (or both paternal grandparents) </br>
 * single parentage: a pair of parent + offspring </br>
 
+Install with:
+```
+devtools::install_github("delomast/gRandma")
+```
+
+To install and view the vignette:
+```
+devtools::install_github("delomast/gRandma", build_vignettes = TRUE)
+browseVignettes("gRandma")
+```
+
diff --git a/test.R b/test.R
@@ -30,7 +30,7 @@ for(i in seq(3,ncol(data_mh_snp) - 1, 2)){
 to_remove <- c(to_remove, to_remove + 1)
 data_mh_snp <- data_mh_snp[,-to_remove]
 # add data to package
-usethis::use_data(data_mh_snp, overwrite = TRUE)
+usethis::use_data(data_mh_snp, overwrite = TRUE, internal = FALSE)
 
 # add vignette
 usethis::use_vignette("Load_in_data_and_gmaData_structure")
diff --git a/vignettes/Load_in_data_and_gmaData_structure.Rmd b/vignettes/Load_in_data_and_gmaData_structure.Rmd
@@ -44,15 +44,15 @@ For most things, you will just take this object and use it as input for other fu
 
 gmaData objects are lists with nine entries:
 
-* `baseline`: A recoded, one-column per call version of the baseline. Genotypes are represented by an integer (starting at 0) and missing genotypes are `NA`.
-* `mixture`: A recoded version of the input mixture (if one was input).
+* `baseline`: A recoded, one-column per call version of the baseline (potential grandparents/parents). Genotypes are represented by an integer (starting at 0) and missing genotypes are `NA`.
+* `mixture`: A recoded version of the input mixture (potential descendants, if one was input).
 * `unsampledPops`: experimental - ignore and do not use for now
 * `genotypeErrorRates`: a list of matrices giving the genotyping error model. The row label indicates the true genotype and the column indicates the observed genotypes. The values are the probability of observing each genotype given the true genotype.
 * `genotypeKeys`: A list of dataframes defining each of the genotypes as represented by integers
 * `alleleKeys`: A list of dataframes defining each of the alleles as represented by integers
 * `baselineParams`: A list with an entry for each baseline population. Each population is represented by another list of numeric vectors - one for each locus. These vectors are the parameters of a Dirichlet posterior for estimates of allele frequencies given a Dirichlet prior with 1/n for all parameters where n is the number of alleles at that locus and ignoring genotyping error. The allele frequencies used by gRandma for a given baseline population and locus are these values normalized to sum to 1.
 * `unsampledPopsParams`: experimental - ignore and do not use for now
-* `missingParams`: A list of numeric vectors, one for each locus. These have the parameters of a Beta posterior for estimates of the proability a genotype is missing given a Beta(.5,.5) prior.
+* `missingParams`: A list of numeric vectors, one for each locus. These have the parameters of a Beta posterior for estimates of the probability a genotype is missing given a Beta(.5,.5) prior. The probabilities of missing genotypes used by gRandma for a given locus are these values normalized to sum to 1.
 
 ```{r}
 head(gData_1$baseline)
@@ -65,4 +65,20 @@ gData_1$missingParams[1:5]
 ```
 
 
+Now let's pretend we have separate mixture and baseline populations. These should be two separate dataframes, with identical columns except that the mixture dataframe should not have the column representing populations. We can split our example data into two:
+```{r}
+mixtureData <- data_mh_snp[data_mh_snp$Pop == "Pop_4",]
+mixtureData <- mixtureData[,-1] # remove Pop column from mixture
+baselineData <- data_mh_snp[data_mh_snp$Pop != "Pop_4",]
+mixtureData[1:5,1:5]
+baselineData[1:5,1:5]
+
+```
+
+And now we can create a gmaData object with both a baseline and a mixture.
+```{r}
+gData_2 <- createGmaInput(baseline = baselineData, mixture = mixtureData, perAlleleError = .005, dropoutProb = .005,
+                          markerType = "microhaps")
+gData_2
+```
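
The normalization described for `baselineParams` and `missingParams` in the vignette text above can be sketched in base R. This is only an illustration and not gRandma's internal code: the counts are made up, the variable names are hypothetical, and the order of the two Beta parameters (missing first, observed second) is an assumption.

```r
# Hypothetical allele counts at one locus with three alleles (made-up numbers)
alleleCounts <- c(A1 = 12, A2 = 6, A3 = 2)
nAlleles <- length(alleleCounts)

# Dirichlet posterior parameters under a Dirichlet prior with 1/n for every allele
dirichletPosterior <- alleleCounts + 1 / nAlleles

# Allele frequencies: the posterior parameters normalized to sum to 1
alleleFreqs <- dirichletPosterior / sum(dirichletPosterior)

# Hypothetical counts of missing and observed genotypes at the same locus
nMissing <- 3
nObserved <- 97

# Beta posterior parameters under a Beta(.5, .5) prior
# (assumed order: missing, observed)
betaPosterior <- c(missing = nMissing, observed = nObserved) + 0.5

# Probability that a genotype is missing: normalize and take the "missing" entry
missingProb <- unname(betaPosterior["missing"] / sum(betaPosterior))

print(alleleFreqs)
print(missingProb)
```

In a real `gmaData` object the posterior parameters are already stored, so only the normalization step would apply, for example to one locus vector taken from `gData_1$missingParams`.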