cliao5/GISMO

GISMO

Gene Identity Score of Mammalian Orthologs

GenerateScore

  • Scripts to generate the GISMO-mis and GISMO metrics.

    inputs/

    • consensus_v2_missense-counts.gz and consensus_v2_synonymous-counts.gz, output from the scripts in GISMO-mis
    • unmerged-species_combined_matrix_2023-07-26.tsv, output from the scripts in one2_matrices

GISMO-mis/

  • Scripts to generate input files for GISMO-mis
  • Instead of comparing to human (">REFERENCE" in the fasta), generate a consensus reference sequence by codon for each gene. Note that this permits multiple reference codons at a position (in the event of ties).
  • If there is a deletion ("-") anywhere in either the reference or query sequence for a gene, skip over it (it is not counted towards the mis or syn totals).
  • For any "N" bp, consider all possible substitutions. To be the most conservative, if it can be synonymous, mark it as synonymous. In other words, only classify it as missense if all possibilities are missense.
  • For synonymous and missense scoring in the case of multiple reference codons, again take the most conservative route: only call missense if every reference codon is missense against the queried codon.
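
The rules above can be illustrated with a minimal Python sketch. This is not the repository's code: the function names (`consensus_codons`, `expand_n`, `classify`) and the return labels are hypothetical, and several details are assumptions made for the example: the standard genetic code is used, codons containing "-" or "N" are left out of the consensus vote, stop codons are treated as one more amino-acid symbol, and a query codon identical to a reference codon is reported as "match" rather than counted.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)
}


def consensus_codons(codons):
    """Return the set of most frequent codons at one alignment position.

    Ties are kept, so the set may hold several reference codons.
    Assumption: codons with gaps or ambiguous bases do not vote.
    """
    counts = Counter(
        c.upper() for c in codons if set(c.upper()) <= set(BASES)
    )
    if not counts:
        return set()
    top = max(counts.values())
    return {c for c, n in counts.items() if n == top}


def expand_n(codon):
    """List every concrete codon that an N-containing codon could be."""
    pools = [BASES if base == "N" else base for base in codon]
    return ["".join(p) for p in product(*pools)]


def classify(ref_codons, query):
    """Classify a query codon against one or more reference codons.

    Returns "skip", "match", "synonymous", or "missense" (hypothetical
    labels). Missense is called only if every (reference, possible query)
    pair changes the amino acid; anything else is treated as synonymous.
    Only A, C, G, T, N, and "-" are handled; other symbols raise KeyError.
    """
    query = query.upper()
    if not ref_codons or "-" in query or any("-" in r for r in ref_codons):
        return "skip"  # deletions are not counted towards either total
    if query in ref_codons:
        return "match"  # no substitution at this position
    pairs = [(r, q) for r in ref_codons for q in expand_n(query)]
    if all(CODON_TABLE[r] != CODON_TABLE[q] for r, q in pairs):
        return "missense"
    return "synonymous"
```

For example, under these assumptions `classify({"CTT", "ATT"}, "GTT")` is called missense because valine differs from both leucine and isoleucine, while `classify({"ATG"}, "ANG")` is called synonymous because one expansion of the "N" reproduces the reference amino acid.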

    inputs/

    outputs/

    • Primary outputs moved to: GenerateScore/inputs/consensus_v2_missense-counts.gz and GenerateScore/inputs/consensus_v2_synonymous-counts.gz

one2_matrices/

partitioned-heritability/

  • Scripts to run partitioned heritability

    scripts/

    • Scripts used to run partitioned heritability

    inputs/

    • Most of the input files needed to run partitioned heritability
    • SNP list: list.txt
    • Gene coordinate file: ENSG_coord.txt
    • Dependent files that are not included here:
      • 1000G_EUR_Phase3_plink/ hosted at: gs://broad-alkesgroup-public-requester-pays/LDSCORE/1000G_Phase3_plinkfiles.tgz
      • hapmap3_snps/ hosted at: gs://broad-alkesgroup-public-requester-pays/LDSCORE/hapmap3_snps.tgz. Using w_hm3.snplist instead.
      • Baseline model hosted at: gs://broad-alkesgroup-public-requester-pays/LDSCORE/1000G_Phase3_baselineLD_v2.2_ldscores.tgz
      • Weights hosted at: gs://broad-alkesgroup-public-requester-pays/LDSCORE/1000G_Phase3_weights_hm3_no_MHC.tgz
      • frqfiles hosted at: gs://broad-alkesgroup-public-requester-pays/LDSCORE/1000G_Phase3_frq.tgz
      • GWAS sumstats can be downloaded according to details in: Manifest_201807.csv
    • decile, decile_rand, and decile-list are used as inputs to the partitioned heritability scripts (all generated by GISMO-mis_gencode-v44_annotate-ensembl.R)
    • gencode.v44.chr_patch_hapl_scaff.basic.annotation.gff3.genes are gene names from gencode v44 (hosted at: https://www.gencodegenes.org/human/). Generated by v44_ensembl-from-gencode.py.
