Feature/orchestration 3322 merge (#225)
* adding new index
* added unit tests for new index
* adding max_count to index buckets
* simplifying name
* adding test for the index
* preparing for a run
* adding reader
* using the extended readers and writers
* removing large (>1000 bp) clinvar variants
* adding ref, alt, and compressed length to the record
* fixing the unit tests
* compressing the nsa index buckets
* avoiding creating an extended reader for every GetAnnotation query
* creating db and index for OneKg
* adding the lazy index
* adding unit tests for lazy index and buckets
* adding data source versions and genome assembly to the nsa index
* adding an interface for supplementary annotation data items
* adding the provider
* updating the provider
* adding multiple annotations per position
* adding data source details like matchByAllele, isArray, etc.
* adding new sa object to replace the old complex logic
* testing the nsa provider
* fixing bugs
* adding the heap to the writer and making dbsnp an array of ids
* optimizing index memory requirements
* creating gnomad db
* adding topmed nsa creator
* adding indexes and readers for intervals
* adding unit tests for interval index
* Nirvana is now working with everything streamed in
* Cleaned the code for handling inputs from the S3 bucket
* Created the lambda wrapper project
* WIP
* preparing the interval readers and writers
* Signed-URL-based solution for VCF reading
* adding zstd dict to readers and writers; adding interval to NsaProvider
* Use POCO type for Lambda input parsing
* dgv ready to roll
* StreamAnnotation class implemented
* Bugfix: StreamAnnotation works well locally
* WIP
* switching over to fix SonarQube issues
* WIP
* make NirvanaLambda work
* Use JSON for lambda output
* Use POCO for lambda output
* refactoring
* remove code that has been commented out
* return null if chrom does not exist in index
* displaying data related to compression/decompression
* starting with indexes
* adding tests
* working on the reader
* Fixed the issue that truncated the generated JSON file
* modifying the saWriter to become the new writer that has the blocked streams built in
* removing compile errors
* WIP
* unit tests for reader/writer working
* Added regionEndpoint to LambdaWrapper
* starting the preloading effort
* reading SA from S3
* adding timing output
* code cleanup
* adding new sa to lambda
* ready for unit testing
* reorganizing code
* better memory usage with better dictionary initialization
* bugfix: NsaProvider should not add null arrays of supplementary intervals to annotated positions
* added onek SV database
* removing files, updating unit tests
* discarding large variants
* adding the cosmic db creator
* removing all signs of TSV creators, readers, indexes, etc.
* Updated the code to generate MITOMAP databases directly (#210)
* fixing bug, refactoring NsaProvider, added GlobalMinor item
* compiling with ref minor index
* bugfixing the ref minor db creator
* bug fixing: ref minor tags are now showing up
* phylop database maker ready for testing
* phylop database is huge (7.4 GB), even larger than Gnomad
* npd writer done; working on npd reader
* comments from code review
* initial implementation
* adding omim support
* GetFileSize method tested
* adding omim db creator, gene db writer
* Tested Orchestrator locally
* Use MemoryStream to pass the payload
* Add JSON output index support
* adding unit tests for gene reader and writer
* gene annotations are showing up in output
* now the JSON output is valid
* Updated the csproj files for two Lambda projects
* exac gene scores in place
* removing empty omim entries
* Additional changes in csproj files
* removing legacy code; end-to-end unit tests broken
* phylop bug for unknown chromosome fixed
* use sync long-running tasks
* adding custom annotation and a minor bug fix
* removing debugging code
* VCF filtering feature implemented
* Accept config files for HTTP SA resources
* Code cleanup
* Custom SA Lambda created; Refactoring
* Fix: always add chromosome as the key in the preload dictionary
* Refactoring
* CustomSaLambda tested
* Get region endpoint from environment variable
* adding custom interval support; removing .version file for custom annotation
* adding unit tests for SA
* Refactoring and bugfix
* Use two readers for header stream and variant stream
* Remove the need for a version file when creating custom SA
* new custom nsa naming convention; increased unit test coverage for SAutils
* Fixed the bug in preload VCF stream
* LambdaWrapper bugfix
* adding unit tests for omim, exacScores, clingen
* PartitionInfo refactoring
* incrementing data version for anavrin
* Get the references in input VCF from tabix query
* Tabix now uses IChromosome internally
* Merged some updates from develop
* removing duplicates from data source versions in header
* Also check the chromosome in PassedTheEnd method of IVcfFilter
* Integrate Tabix bugfix and update
* fixing dbsnp output
* fix for dbsnp
* Fixed issues related to differing reference and tabix indices
* Make blockoffset seek work
* bugfixing the preload operation in NsaReader
* Bugfix: FastForward now skips the header lines and checks chr name
* Don't throw an exception when ending a section not yet opened in JasixIndex
* unit test for Nsa reader preload and nsa provider
* removing null global minor entries, reverting gene entries to old schema
* Refactoring; Enabled SIFT and PolyPhen
* Bugfix: create seekable webreadstream from HttpStreamSource
* fixed omim bug where omim gene symbol lookup was wrong
* Created Cloud project for common POCO classes and AWS utils; Updated the POCO models according to the Swagger page
* Bugfix: Update the name of annotation lambda; add missing base name to annotation output
* more omim bug fixes; discarding entries with no gene symbol from OMIM
* moving SV annotations to positions; added reciprocal overlap
* fixing reciprocal overlap issue
* fixed unit test
* dgv was missing
* updating cosmic schema
* Refactoring for unit tests
* custom annotation fields are all strings
* trimming whitespace from headers in custom TSV
* Refactored the S3 upload function
* updated cosmic tissue object and removed AA from onekg
* Fixed bugs in MitoMap database generation
* Fixed the bugs in handling annotation jobs with annotation range set to null
* Update the break points in chromosome partitioning
* fixing bug for unrecognized contigs in preLoad utilities
* Reduce the memory usage of the preload function
* fixing cosmic differences
* replacing _ with space for cancer types
* fixing unit test
* cosmic small variants are capped at length 1k; filtering for conflicting alleles applied
* running after onekg bug for GRCh38: input file was incorrect
* global major freq has 7 decimal places
* Only preload a subset of the annotations
* fixing the empty genes section issue
* WIP
* removing items that don't have a valid refAllele
* removing reciprocal overlap for breakends
* fixed clinvar bug introduced by ref base checking
* Use both RefSeq and Ensembl cache; AnnotationLambda with the same qualifier as the NirvanaLambda will be invoked
* Added debugging code
* Upgrade to .NET Core 2.1
* ancestral allele back to onekg, and custom annotation intervals being reported for all
* Set AmazonLambdaClient timeout to 5 mins; clean /tmp folder before and after each annotation job
* fixing clinvar bug and removing phylop scores for GRCh37 chrM
* reciprocal overlap is 0 for insertions
* fixing a typo
* reciprocal overlap for insertions is not reported
* Revert staggered preloading of SA
* Fixed problems introduced during the merge
* Cleaned the repo
* Additional code cleaning
* Add unit tests; Fix the bug in PassedTheEnd method
* Added more unit tests
* Calling AnnotationLambda w/o credentials; Always invoke the latest version of AnnotationLambda
* Get ARN of annotation lambda as an environment variable
* Bugfix: check null SaProvider before preloading
* Remove the SortedVcfChecker
* Use type keywords
* More changes about using type keywords