The goal of ensemblQueryR is to seamlessly integrate querying of Ensembl databases into your R workflow. It does this by formatting user queries and submitting them to the Ensembl REST API. At present, the package contains functions for the three Ensembl linkage disequilibrium (LD) endpoints:

1. Query LD in a window around one SNP.
2. Query LD for a pair of query SNPs.
3. Query LD for SNPs at a specified genomic locus.
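To show roughly what such a query looks like, the sketch below builds request URLs for the three LD endpoints of the Ensembl REST API (https://rest.ensembl.org). The helper names `ld_window_url()`, `ld_pair_url()` and `ld_region_url()` are hypothetical. They are not part of ensemblQueryR and are not the package's internal code. They only format strings and send nothing.

```r
# Hypothetical helpers (NOT part of ensemblQueryR): build Ensembl REST URLs
# for the three LD endpoints. Nothing is sent over the network.
ensembl_rest <- "https://rest.ensembl.org"

# LD in a window around one variant, filtered by minimum r2 and D'
ld_window_url <- function(rsid, pop, r2 = 0.8, d.prime = 0.8,
                          window.size = 500, species = "human") {
  sprintf("%s/ld/%s/%s/%s?window_size=%s;r2=%s;d_prime=%s",
          ensembl_rest, species, rsid, pop, window.size, r2, d.prime)
}

# LD between one pair of variants
ld_pair_url <- function(rsid1, rsid2, pop, species = "human") {
  sprintf("%s/ld/%s/pairwise/%s/%s?population_name=%s",
          ensembl_rest, species, rsid1, rsid2, pop)
}

# LD between all variant pairs in a region, written as chr:start..end
ld_region_url <- function(chr, start, end, pop, species = "human") {
  sprintf("%s/ld/%s/region/%s:%s..%s/%s",
          ensembl_rest, species, chr, start, end, pop)
}

ld_window_url("rs3851179", "1000GENOMES:phase_3:EUR")
# "https://rest.ensembl.org/ld/human/rs3851179/1000GENOMES:phase_3:EUR?window_size=500;r2=0.8;d_prime=0.8"
```

To send such a request yourself, one option is `httr::GET(url, httr::accept_json())`. ensemblQueryR handles both the formatting and the submission for you.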
You can install the stable release of ensemblQueryR from CRAN, or the development version from GitHub:

```r
# install the stable CRAN release
install.packages("ensemblQueryR")

# or install the development version from GitHub (requires the remotes package)
install.packages("remotes")
remotes::install_github("ainefairbrother/ensemblQueryR")
```
To check that the Ensembl server is up and running, ping it:

```r
library(ensemblQueryR)
ensemblQueryR::pingEnsembl()
```
All of the LD query functions below take a `pop` argument, which defines the population for which to retrieve LD metrics. To list the available options for this argument, run `ensemblQueryGetPops()`:

```r
ensemblQueryR::ensemblQueryGetPops()
```
To get all variants in LD with one query variant, use `ensemblQueryLDwithSNPwindow()`. This function constrains the query with minimum r-squared (`r2`) and D-prime (`d.prime`) cut-offs and a window size around the variant, in kilobases (`window.size`).

```r
ensemblQueryR::ensemblQueryLDwithSNPwindow(rsid="rs3851179",
                                           r2=0.8,
                                           d.prime=0.8,
                                           window.size=500,
                                           pop="1000GENOMES:phase_3:EUR")
```
For more than one query variant, the `ensemblQueryLDwithSNPwindowDataframe()` function takes a `data.frame` as input and gets all variants in LD with each query variant in its `rsid` column. This operation can be parallelised by setting `cores` to a value greater than 1.

```r
# example input data: six rsIDs, each repeated 500 times
in.table <- data.frame(rsid=rep(c("rs7153434","rs1963154","rs12672022","rs3852802","rs12324408","rs56346870"), 500))

# run the query on in.table
ensemblQueryR::ensemblQueryLDwithSNPwindowDataframe(
  in.table=in.table,
  r2=0.8,
  d.prime=0.8,
  window.size=500,
  pop="1000GENOMES:phase_3:EUR",
  cores=1
)
```
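The Ensembl window endpoint takes one variant per request, so each row of `in.table` likely becomes a separate query. Under that assumption, cleaning the rsIDs and dropping duplicates first avoids redundant requests. The sketch below assumes your IDs start as a plain character vector. `prepare_rsid_table()` is a hypothetical helper, not part of ensemblQueryR.

```r
# Hypothetical helper (NOT part of ensemblQueryR): tidy a vector of rsIDs
# before building the in.table for ensemblQueryLDwithSNPwindowDataframe().
prepare_rsid_table <- function(rsids) {
  rsids <- trimws(rsids)
  # keep only well-formed dbSNP identifiers such as "rs3851179"
  valid <- grepl("^rs[0-9]+$", rsids)
  if (any(!valid)) {
    warning("Dropping malformed IDs: ",
            paste(unique(rsids[!valid]), collapse = ", "))
  }
  # one row per distinct rsID, so each variant is queried only once
  data.frame(rsid = unique(rsids[valid]), stringsAsFactors = FALSE)
}

in.table <- prepare_rsid_table(c("rs7153434", "rs1963154 ", "rs7153434", "chr6:123"))
in.table$rsid
# "rs7153434" "rs1963154" (plus a warning about "chr6:123")
```

Deduplicating breaks the one-to-one correspondence with your original rows. If you need results for every original row, map them back via the query rsID afterwards.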
The `ensemblQueryLDwithSNPpair()` function takes a single pair of query SNPs and returns a `data.frame` of LD metrics.

```r
ensemblQueryR::ensemblQueryLDwithSNPpair(
  rsid1="rs6792369",
  rsid2="rs1042779",
  pop="1000GENOMES:phase_3:EUR"
)
```
The `ensemblQueryLDwithSNPpairDataframe()` function takes a `data.frame` with columns `rsid1` and `rsid2` and returns a `data.frame` of LD metrics for all variant pairs. This operation can be parallelised by setting `cores` to a value greater than 1.

```r
# example input data
in.table <- data.frame(rsid1=rep("rs6792369", 10), rsid2=rep("rs1042779", 10))

# run the query on in.table
ensemblQueryR::ensemblQueryLDwithSNPpairDataframe(
  in.table=in.table,
  pop="1000GENOMES:phase_3:EUR",
  keep.original.table.row.n=FALSE,
  cores=1
)
```
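To test LD between every pair in a set of SNPs, you can generate an `in.table` in the `rsid1`/`rsid2` layout with base R's `combn()`. `all_snp_pairs()` is a hypothetical helper, not part of ensemblQueryR. Keep in mind that the number of pairs, and therefore queries, grows as n(n-1)/2.

```r
# Hypothetical helper (NOT part of ensemblQueryR): every unordered pair of the
# supplied SNPs, in the rsid1/rsid2 layout used by ensemblQueryLDwithSNPpairDataframe().
all_snp_pairs <- function(rsids) {
  rsids <- unique(rsids)
  stopifnot(length(rsids) >= 2)
  pairs <- utils::combn(rsids, 2)  # 2 x choose(n, 2) character matrix
  data.frame(rsid1 = pairs[1, ], rsid2 = pairs[2, ], stringsAsFactors = FALSE)
}

in.table <- all_snp_pairs(c("rs6792369", "rs1042779", "rs3851179"))
nrow(in.table)  # 3 pairs
```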
The `ensemblQueryLDwithSNPregion()` function takes genomic coordinates as input and returns all variant pairs, with their LD metrics, within the defined region.

```r
ensemblQueryR::ensemblQueryLDwithSNPregion(
  chr="6",
  start="25837556",
  end="25843455",
  pop="1000GENOMES:phase_3:EUR"
)
```
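The Ensembl REST region LD endpoint caps the size of the queried region. The cap was 500 kb at the time of writing, but check the current Ensembl REST documentation. Assuming that limit, you could split a larger locus into consecutive chunks in the chr/start/end layout expected by `ensemblQueryLDwithSNPregionDataframe()`. `split_region()` is a hypothetical helper, not part of ensemblQueryR.

```r
# Hypothetical helper (NOT part of ensemblQueryR). Assumes the Ensembl REST
# region LD endpoint accepts regions of at most max.kb kilobases.
split_region <- function(chr, start, end, max.kb = 500) {
  start <- as.numeric(start)
  end <- as.numeric(end)
  stopifnot(!is.na(start), !is.na(end), end >= start)
  step <- max.kb * 1000
  # consecutive, non-overlapping chunks of at most `step` base pairs
  starts <- seq(start, end, by = step)
  ends <- pmin(starts + step - 1, end)
  data.frame(chr = as.character(chr),
             start = sprintf("%.0f", starts),  # character, as in the examples above
             end = sprintf("%.0f", ends),
             stringsAsFactors = FALSE)
}

split_region("6", "25837556", "26837555")  # two 500 kb chunks
```

Variants that fall in different chunks are never compared with each other. If LD across chunk boundaries matters, the chunks would need to overlap.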
The `ensemblQueryLDwithSNPregionDataframe()` function takes a `data.frame` with columns `chr`, `start` and `end` and returns a `data.frame` of LD metrics for all variant pairs within each genomic region (one region per row of `in.table`). This operation can be parallelised by setting `cores` to a value greater than 1.

```r
# example input data
in.table <- data.frame(chr=rep("6", 10),
                       start=rep("25837556", 10),
                       end=rep("25843455", 10))

# run the query on in.table
ensemblQueryR::ensemblQueryLDwithSNPregionDataframe(
  in.table=in.table,
  pop="1000GENOMES:phase_3:EUR",
  cores=2
)
```
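Your regions may be stored as strings such as `"6:25837556-25843455"`. If so, you can convert them into the chr/start/end `in.table` layout shown above. `regions_to_table()` is a hypothetical helper, not part of ensemblQueryR. The accepted formats are assumptions: an optional `chr` prefix, a numeric chromosome or X/Y/MT, and a dash between start and end.

```r
# Hypothetical helper (NOT part of ensemblQueryR): parse "chr:start-end"
# strings into the chr/start/end columns used by
# ensemblQueryLDwithSNPregionDataframe().
regions_to_table <- function(regions) {
  pattern <- "^(chr)?([0-9]+|X|Y|MT):([0-9]+)-([0-9]+)$"
  ok <- grepl(pattern, regions)
  if (!all(ok)) {
    stop("Unrecognised region(s): ", paste(regions[!ok], collapse = ", "))
  }
  # keep coordinates as character, matching the example input above
  data.frame(chr   = sub(pattern, "\\2", regions),
             start = sub(pattern, "\\3", regions),
             end   = sub(pattern, "\\4", regions),
             stringsAsFactors = FALSE)
}

in.table <- regions_to_table(c("6:25837556-25843455", "chr19:44905000-44910000"))
```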
We provide a Docker image, `ainefairbrotherbrowne/ensemblqueryr` on Docker Hub, so that the tool can be run regardless of your local operating system or R version. With Docker installed, the commands below pull the image, start a container and open an R session inside it. You can then use `ensemblQueryR` as described above.

```shell
docker pull ainefairbrotherbrowne/ensemblqueryr:1.0
docker run -t -d --name ensemblqueryr ainefairbrotherbrowne/ensemblqueryr:1.0
docker exec -i -t ensemblqueryr R
```
Please note that this code is still under development and may contain bugs or errors. It is not recommended for use in production environments. Use at your own risk. I am working on improving the code, addressing any issues and expanding the package's capabilities, so please check back for updates.