I just found your package - looks really useful: thank you!
I had no trouble reading in my GenBank sequence using readGenBank, but when I tried to use makeTxDbFromGenBank I encountered some errors. I got a little way down the road in trying to hack a solution before giving up - I've included notes below.
I suspect it is something about the specific GenBank entry I'm trying to use: it's a viral genome, so most genes are intronless, and most genes lack both a transcript annotation and an exon annotation (most have only the gene and CDS). I don't know how common this situation is, so I don't know how worthwhile it will be for you to fix it.
But as I started looking at the code, I thought it might not be too difficult to fix, and I would guess this situation is common enough for genomes like viruses and simple eukaryotes like S. cerevisiae and friends that it could be good motivation.
library(genbankr)
# this works
ref <- readGenBank(GBAccession("JQ673480.1"))
##### makeTxDbFromGenBank error #1:
ref_txdb <- makeTxDbFromGenBank(ref, reassign.ids = FALSE)
# Error in data.frame(tx_id = as.integer(factor(txgr$transcript_id)), gene_id = as.character(txgr$gene_id), :
# arguments imply differing number of rows: 3, 0
I started playing around with the makeTxDbFromGenBank function and was able to get past this error by changing line 87 of txdb.R as follows:
gene_id = as.character(txgr$gene_id)
# changed to
gene_id = as.character(txgr$gene)
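The "3, 0" in the error message is consistent with `txgr$gene_id` being NULL (no metadata column of that name), which `as.character()` turns into a zero-length vector. The following is a minimal base-R sketch of that behaviour. The `txgr` list and its values are made-up stand-ins, not the real GRanges object or the actual genbankr code:

```r
# Hypothetical stand-in for the transcript metadata: a 'gene' column exists
# but 'gene_id' does not, so txgr$gene_id evaluates to NULL.
txgr <- list(
  transcript_id = c("t1", "t2", "t3"),
  gene          = c("g1", "g2", "g3")
)

# as.character(NULL) is character(0), so data.frame() is asked to combine
# a column of length 3 with a column of length 0 and throws an error.
msg <- tryCatch(
  data.frame(
    tx_id   = as.integer(factor(txgr$transcript_id)),
    gene_id = as.character(txgr$gene_id)
  ),
  error = function(e) conditionMessage(e)
)
print(msg)

# Using the column that is actually present gives equal-length columns.
fixed <- data.frame(
  tx_id   = as.integer(factor(txgr$transcript_id)),
  gene_id = as.character(txgr$gene),
  stringsAsFactors = FALSE
)
print(fixed)
```

Whether `gene` is always the right substitute for `gene_id` presumably depends on how the GenBank record names its qualifiers, so this only illustrates why the one-line change got past the error.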
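After fixing that I tried again and hit a second error, which I was able to get past by changing line 71 of txdb.R. On trying again I got a third error:
###### error # 3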
makeTxDbFromGenBank_JY(ref, reassign.ids = FALSE)
# Error in .check_foreign_key(splicings$tx_id, "integer", "splicings$tx_id", :
# all the values in 'splicings$tx_id' must be present in 'transcripts$tx_id'
I can see that the problem stems from the fact that I have many CDSs without a corresponding transcript. The help page ?readGenBank sheds some light on why that is for this genome: "In files where transcripts are not present, 'approximate transcripts' defined by the ranges spanned by groups of exons are used. Currently, we do not support generating approximate transcripts from CDSs in files that contain actual transcript annotations, even if those annotations do not cover all genes with CDS/exon annotations."
Would you be able to implement that support to help in cases like mine? Thanks for considering it!
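To make the request concrete, here is a rough base-R sketch of the kind of fallback meant: for each gene that has CDS features but no annotated transcript, use the span of its CDS ranges as an approximate transcript, and keep the real transcript annotations for the other genes. The data frames, column names and coordinates are all hypothetical toy values. In genbankr these would be GRanges objects, and this is not the package's implementation:

```r
# Toy CDS and transcript annotations (hypothetical values, not JQ673480.1).
cds <- data.frame(
  gene   = c("A", "A", "B", "C"),
  start  = c(100, 300, 700, 1000),
  end    = c(200, 400, 900, 1500),
  strand = c("+", "+", "-", "+"),
  stringsAsFactors = FALSE
)
tx <- data.frame(
  gene = "B", start = 650, end = 950, strand = "-",
  stringsAsFactors = FALSE
)

# CDS rows belonging to genes that have no annotated transcript.
orphan <- cds[!cds$gene %in% tx$gene, ]

# One approximate transcript per such gene: the range spanned by its CDSs.
# All three tapply() calls group by the same factor, so the order matches.
starts  <- tapply(orphan$start, orphan$gene, min)
ends    <- tapply(orphan$end, orphan$gene, max)
strands <- tapply(orphan$strand, orphan$gene, function(s) s[1])

approx_tx <- data.frame(
  gene   = names(starts),
  start  = as.vector(starts),
  end    = as.vector(ends),
  strand = as.vector(strands),
  stringsAsFactors = FALSE
)

# Real transcripts plus approximate ones: every gene with a CDS is covered.
all_tx <- rbind(tx, approx_tx)
print(all_tx)
```

With every CDS-bearing gene represented in the transcript table, a foreign-key check like the one in error #3 would have a matching transcript for each splicing row, at least in this toy setting.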
all the best,
Janet Young