
Errors parsing plastid genbank records #8

Open
nilsj9 opened this issue Jul 27, 2020 · 1 comment


nilsj9 commented Jul 27, 2020

Hi @gmbecker,
I am currently trying to parse a set of plastid genome records with genbankr and keep running into recurring errors. I am not sure whether they are caused by a bug in genbankr or by incorrectly formatted GenBank flat files. Below are the three most frequent error messages:

genbankr::readGenBank(genbankr::GBAccession("NC_033333"))
Error in `[[<-`(`*tmp*`, name, value = c("BWX36_gp082.1", "BWX36_gp082.1",  : 
  28 elements in value to replace 44 elements

genbankr::readGenBank(genbankr::GBAccession("NC_029719"))
Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") : 
  In range 13: at least two out of 'start', 'end', and 'width', must
  be supplied.
In addition: Warning messages:
1: In FUN(X[[i]], ...) : NAs introduced by coercion
2: In FUN(X[[i]], ...) : NAs introduced by coercion

genbankr::readGenBank(genbankr::GBAccession("NC_017894"))
Error : subscript contains NAs

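When parsing many accessions in one loop, a single failing record like the ones above aborts the whole batch. Below is a minimal sketch of a workaround, assuming you only want to skip and log the failing records until the underlying bug is fixed. `parse_all_safely` is a hypothetical helper, not part of genbankr, and it only catches errors: warnings such as "NAs introduced by coercion" still pass through.

```r
# Hypothetical helper (not part of genbankr): run `parser` on each accession
# independently, so that one malformed record does not stop the whole batch.
parse_all_safely <- function(accessions, parser) {
  results <- vector("list", length(accessions))  # NULL slot = failed record
  names(results) <- accessions
  errors <- character(0)                         # accession -> error message
  for (i in seq_along(accessions)) {
    acc <- accessions[i]
    res <- tryCatch(parser(acc), error = function(e) e)
    if (inherits(res, "error")) {
      # Keep the error message and leave this accession's slot as NULL
      errors[acc] <- conditionMessage(res)
    } else {
      # list() wrapper avoids dropping the slot if the parser returns NULL
      results[i] <- list(res)
    }
  }
  list(results = results, errors = errors)
}
```

With genbankr, the parser could be `function(acc) genbankr::readGenBank(genbankr::GBAccession(acc))`, for example `out <- parse_all_safely(c("NC_033333", "NC_029719", "NC_017894"), parser)`; `out$errors` then lists which accessions failed and why, which can help narrow down the records that trigger each bug.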
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5                  lattice_0.20-41             prettyunits_1.1.1           Rsamtools_2.4.0             Biostrings_2.56.0          
 [6] assertthat_0.2.1            digest_0.6.25               BiocFileCache_1.12.0        R6_2.4.1                    GenomeInfoDb_1.24.2        
[11] stats4_4.0.2                RSQLite_2.2.0               httr_1.4.2                  pillar_1.4.6                zlibbioc_1.34.0            
[16] rlang_0.4.7                 GenomicFeatures_1.40.1      progress_1.2.2              curl_4.3                    rentrez_1.2.2              
[21] blob_1.2.1                  S4Vectors_0.26.1            Matrix_1.2-18               BiocParallel_1.22.0         stringr_1.4.0              
[26] RCurl_1.98-1.2              bit_1.1-15.2                biomaRt_2.44.1              DelayedArray_0.14.1         compiler_4.0.2             
[31] rtracklayer_1.48.0          pkgconfig_2.0.3             askpass_1.1                 BiocGenerics_0.34.0         openssl_1.4.2              
[36] tidyselect_1.1.0            SummarizedExperiment_1.18.2 tibble_3.0.3                GenomeInfoDbData_1.2.3      IRanges_2.22.2             
[41] matrixStats_0.56.0          XML_3.99-0.5                crayon_1.3.4                dplyr_1.0.0                 dbplyr_1.4.4               
[46] GenomicAlignments_1.24.0    bitops_1.0-6                rappdirs_0.3.1              grid_4.0.2                  jsonlite_1.7.0             
[51] lifecycle_0.2.0             DBI_1.1.0                   magrittr_1.5                stringi_1.4.6               XVector_0.28.0             
[56] ellipsis_0.3.1              generics_0.0.2              vctrs_0.3.2                 tools_4.0.2                 bit64_0.9-7                
[61] BSgenome_1.56.0             Biobase_2.48.0              glue_1.4.1                  purrr_0.3.4                 hms_0.5.3                  
[66] parallel_4.0.2              AnnotationDbi_1.50.1        GenomicRanges_1.40.0        memoise_1.1.0               genbankr_1.16.0            
[71] VariantAnnotation_1.34.0 

I would be very grateful if you could help me fix these problems.
Thank you in advance and best wishes.


kathooks commented Aug 8, 2022

Hi @gmbecker, hi @nilsj9,

I get the first of these errors with a number of human RefSeq identifiers, e.g.:

genbankr::readGenBank(genbankr::GBAccession("NM_000494"))
Annotations don't have 'locus_tag' label, using 'gene' as gene_id column
Annotations don't have 'locus_tag' label, using 'gene' as gene_id column
 Error in `[[<-`(`*tmp*`, name, value = c("COL17A1.1", "COL17A1.1", "COL17A1.1",  : 
  53 elements in value to replace 56 elements

The error originates from line 873 of genbankReader.R. Parsing works when I replace that line with:

exns$transcript_id = cdss$transcript_id
