Commit 8461b1e (1 parent: 62e0040). Showing 1 changed file with 19 additions and 5 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,12 +1,26 @@ | ||
# 2vcf - convert raw 23andme or ancestry.com data to VCF | ||
## 2vcf | ||
|
||
The [VCF](https://samtools.github.io/hts-specs/VCFv4.3.pdf) is a widely adopted format for storing detailed data about genetic variation. Services like [23andme](https://www.23andme.com/) and [ancestry.com](https://www.ancestry.com/) offer to genotype customers at fewer than a million well-characterized sites in the human genome. It is possible to obtain the raw data collected by these services, but the raw data are provided in a minimal format, which is not trivial to enrich and transform into the VCF format. _2vcf_ converts the raw output from 23andme or ancestry.com into a gzipped VCF file. The output VCF is populated with [human variant data](https://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/), which includes all alternate alleles, annotations, etc. | ||
In order to improve individual sovereignty over genetic/genomic information, facilitate a deeper understanding of biology and computation, and promote shared meaning, openb.io provides `2vcf` under the [MIT license](https://mit-license.org). `2vcf` converts raw genotype data exports from [23andme](https://www.23andme.com) or [Ancestry.com](https://www.ancestry.com) into [VCF format](https://samtools.github.io/hts-specs/VCFv4.2.pdf). | ||
|
||
To build 2vcf, the [golang](https://golang.org/) build tool is required. On OS X, use [homebrew](https://brew.sh/) to install it: `brew install go`. | ||
`2vcf` produces a VCF that contains annotations from dbSNP [build 151](https://github.com/ncbi/dbsnp/tree/master/Build%20Announcements/151) on `GRCh37.p13`. These annotations include allele frequencies from sources such as [1000 Genomes](https://www.internationalgenome.org) and [ExAC](http://exac.broadinstitute.org/), [RefSeq](https://www.ncbi.nlm.nih.gov/refseq/) gene annotations, and the functional class of each variant. | ||
|
||
Build 2vcf by checking out the [source repo](https://github.com/plantimals/2vcf), entering the directory with `cd 2vcf`, and running `make`. Build for Windows with `make windows`. | ||
The source VCF for dbSNP build 151 weighs in at around 15GB, and the sites assayed by personal genomics companies are only a tiny fraction of all dbSNP sites. So I make available a reference version of the dbSNP VCF that has been filtered down to the sites likely to be contained in your exported 23andme or Ancestry.com raw data. For more details on which sites are included and why, see this writeup on the sources for `2vcf reference v2.0`. | ||
|
||
Convert your raw data by running the utility: `./2vcf --input-file my-raw-data.zip --output-file my-personal-genotypes.vcf.gz`. Running the utility from another location also works, but remember to specify the path to the reference data: `--vcf-ref /home/me/git/2vcf/reference.vcf.gz`. | ||
## usage | ||
|
||
1. Download the appropriate binary for your architecture from the [most recent GitHub release](https://github.com/plantimals/2vcf/releases/tag/v0.4.0) and un-tar the contents. | ||
|
||
2. Download the [reference VCF](http://openb.io/2vcf/2vcf-v2.0.vcf.gz). | ||
|
||
3. Download your raw genotype data from [23andme](https://customercare.23andme.com/hc/en-us/articles/212196868-Accessing-and-Downloading-Your-Raw-Data) or [Ancestry](https://support.ancestry.com/s/article/Downloading-AncestryDNA-Raw-Data). | ||
|
||
4. Now run the `2vcf` binary with the appropriate options: | ||
|
||
``` | ||
./2vcf conv 23andme --ref path/to/2vcf-v2.0.vcf.gz \ | ||
--input path/to/my/raw/genotypes.zip \ | ||
--output my-personal-annotated.vcf.gz | ||
``` | ||
|
||
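As a quick sanity check on the file written by the command above, the gzipped VCF can be inspected with standard tools. The following is a minimal sketch, assuming `gzip` and `grep` are installed; the helper names `vcf_record_count` and `vcf_fileformat` are hypothetical and are not part of `2vcf`.

```shell
# Hypothetical helpers for sanity-checking a gzipped VCF; not part of 2vcf.

# Print the number of variant records, i.e. lines that are not
# header lines (header lines in a VCF start with '#').
vcf_record_count() {
  gzip -dc "$1" | grep -vc '^#'
}

# Print the first "##fileformat" header line, if one is present.
vcf_fileformat() {
  gzip -dc "$1" | grep -m1 '^##fileformat'
}
```

For example, `vcf_record_count my-personal-annotated.vcf.gz` prints how many records ended up in the output, and `vcf_fileformat my-personal-annotated.vcf.gz` shows which VCF version the file declares.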
Please report any errors or difficulties with the utility. | ||
|