Project code for the Machine Learning for Functional Genomics course at Columbia University.
Two files are too large to upload directly to Git. Download them from these Google Drive links:
ENCFF690FNR.bed.gz, which contains methylation data for loci in the genome: https://drive.google.com/file/d/1c_oTFgF7mzB6KfIzh8TJ8t9smBoG1GvV/view?usp=sharing
The data is described here: https://www.encodeproject.org/documents/964e2676-d0be-4b5d-aeec-f4f02310b221/@@download/attachment/WGBS%20pipeline%20overview.pdf
hg38-002.pkl.zip, which contains the nucleotide located at each locus in the human genome (hg38 assembly, released in 2013): https://drive.google.com/file/d/1QAUnQgW38F0qfrsjN2DUhUSxgJTbP9ru/view?usp=sharing
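To check that the methylation download is intact, a minimal sketch along these lines can print its first few rows. The helper name peek_bed is hypothetical and not part of this repository. The sketch only assumes the standard BED layout, where the first three tab-separated columns are chromosome, start, and end; the meaning of the remaining columns is covered in the pipeline overview linked above.

```python
import gzip
from itertools import islice


def peek_bed(path, n=5):
    """Return the first n data rows of a gzipped BED file.

    peek_bed is a hypothetical helper for illustration only.
    Each row is a list of fields; start and end are converted to int,
    all other fields are left as strings.
    """
    rows = []
    with gzip.open(path, "rt") as handle:
        # Skip blank lines and the optional header lines some BED files carry.
        data_lines = (
            line for line in handle
            if line.strip() and not line.startswith(("#", "track", "browser"))
        )
        for line in islice(data_lines, n):
            fields = line.rstrip("\n").split("\t")
            # Standard BED: column 1 is chromosome, 2 is start, 3 is end.
            fields[1], fields[2] = int(fields[1]), int(fields[2])
            rows.append(fields)
    return rows
```

Calling peek_bed("ENCFF690FNR.bed.gz") from the directory holding the download returns up to five rows without reading the whole file into memory.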
The remaining two data files are described below:
genes.tsv lists each gene on every chromosome with its start and end loci
ENCFF292KIL.tsv contains TPM (transcripts per million) expression values for genes in the human genome
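As a rough sketch of how the expression file might be read, the snippet below builds a gene-to-TPM dictionary using only the standard library. The function name load_tpm and the column names gene_id and TPM are assumptions, not confirmed by this repository; check the header row of ENCFF292KIL.tsv and pass the real names if they differ.

```python
import csv


def load_tpm(path, gene_col="gene_id", tpm_col="TPM"):
    """Map each gene identifier to its TPM value from a tab-separated file.

    load_tpm is a hypothetical helper. gene_col and tpm_col are assumed
    column names; adjust them to match the file's actual header.
    """
    tpm_by_gene = {}
    with open(path, newline="") as handle:
        # DictReader takes the column names from the first line of the file.
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            tpm_by_gene[row[gene_col]] = float(row[tpm_col])
    return tpm_by_gene
```

The same pattern applies to genes.tsv once its column names are known, with the start and end columns parsed as integers instead of floats.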