Empty BED file created - no matching gene symbols or IDs available #278

fritjoflammers opened this issue Jan 23, 2025 · 3 comments

fritjoflammers commented Jan 23, 2025

Hi,

I'm trying to run IsoQuant with pre-assembled isoforms, generated from PacBio's IsoSeq pipeline.

This is my command:

isoquant.py \
        --reference genome.fa \
        --genedb gemoma_interproscan_annotation.modified.filtered.sorted.gff.db \
        --complete_genedb \
        --fastq collapsed_isoforms-BioSample_1.fasta  collapsed_isoforms-BioSample_2.fasta \
        --data_type assembly -o isoquant_realdata_test --fl_data --count_exons

For my organism, a bird, I have a gene annotation in GTF format that conforms to the requirements for Pigeon.

IsoQuant runs to completion, but all output files contain generic (newly created) gene IDs, which makes it impossible to trace the data back to my original gene annotation. The logs otherwise seem normal. The only thing I noticed is that the BED file created from the gene annotation is completely empty.

The same happens when manually creating a genedb with gffutils and providing the database file.

Here's the relevant log:

2025-01-23 14:22:31,040 - INFO - Converting gene annotation file /cluster/scratch/flamme/isoquant_realdata_test/data/gemoma_interproscan_annotation.modified.filtered.sorted.gff.db to .bed format
2025-01-23 14:22:32,258 - INFO - Gene database BED written to isoquant_realdata_test/gemoma_interproscan_annotation.modified.filtered.sorted.gff.bed

Later it continues with thousands of entries like:

2025-01-23 14:31:17,685 - WARNING - Gene gene_12921 has no exons / transcripts, check your input annotation
2025-01-23 14:31:17,686 - WARNING - Genes gene_12921 have no exons, check you GTF file

I suppose there is some incompatibility in my GTF file, but I have no idea where to start troubleshooting this and would appreciate any help you could give me in this regard.

Thanks!
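
For the "has no exons / transcripts" warnings quoted above, one possible starting point is to tally the feature types in the annotation and list the genes that have no exon records attached. The following is a minimal sketch, not part of IsoQuant: the function names `parse_attributes` and `summarize` are hypothetical, and it assumes a tab-separated GTF whose ninth column holds `key value;` pairs, quoted or unquoted.

```python
import re
from collections import Counter

# Matches attributes such as  gene_id "X";  as well as the unquoted form  gene_id X;
ATTR_RE = re.compile(r'(\S+)\s+"?([^";]+)"?\s*;')


def parse_attributes(field):
    """Return a dict of attribute key -> value from the ninth GTF column."""
    return {key: value.strip() for key, value in ATTR_RE.findall(field)}


def summarize(lines):
    """Count feature types and list gene IDs that have no exon records."""
    feature_counts = Counter()
    genes = set()
    genes_with_exons = set()
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed annotation record
        feature = cols[2]
        feature_counts[feature] += 1
        gene_id = parse_attributes(cols[8]).get("gene_id")
        if gene_id is None:
            continue
        if feature == "gene":
            genes.add(gene_id)
        elif feature == "exon":
            genes_with_exons.add(gene_id)
    return feature_counts, sorted(genes - genes_with_exons)
```

It could be run on the original annotation with something like `summarize(open("annotation.gtf"))`, where the file name is a placeholder. If the second return value is empty, every gene has at least one exon record in the text file, which would suggest the problem arises when the annotation is parsed or converted rather than from exons actually missing.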

@fritjoflammers
Author

I also tried without --complete_genedb, which might be more correct here, but it still results in an empty BED file.

@andrewprzh
Collaborator

Dear @fritjoflammers

Sorry for the delay, I was out of the office.

How did you get the GTF and .db files?
Could you send a small part of your GTF as an example?

Best
Andrey

@fritjoflammers
Author

Hi Andrey,

The GTF was generated with GeMoMa and then modified to match the requirements for Pigeon. The DB was generated within IsoQuant. What other information do you need?

This is how the GTF looks:

chr1A	GAF	gene	18420	208592	.	-	.	transcripts 1; gene_name PDE3A; gene_id OENMELG00000003088; 
chr1A	GeMoMa	transcript	18420	208592	4614	-	.	transcript_id OENMELG00000003088.t1; gene_name PDE3A; gene_id OENMELG00000003088; 
chr1A	GeMoMa	exon	18420	18667	.	-	.	transcript_id OENMELG00000003088.t1; gene_name PDE3A; gene_id OENMELG00000003088; exon_id OENMELG00000003088.t1.e1; 
chr1A	GeMoMa	exon	19776	20016	.	-	.	transcript_id OENMELG00000003088.t1; gene_name PDE3A; gene_id OENMELG00000003088; exon_id OENMELG00000003088.t1.e2; 

Thanks!
Fritjof
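
One detail visible in the excerpt above is that the attribute values are written without double quotes (`gene_id OENMELG00000003088;`), whereas the GTF convention is to quote textual values (`gene_id "OENMELG00000003088";`). Whether this is what leads to the empty BED file is only an assumption and has not been confirmed in this thread. The sketch below uses a hypothetical helper, `unquoted_attributes`, to report which keys in an attribute column carry unquoted values.

```python
def unquoted_attributes(attr_field):
    """Return the keys in a GTF attribute column whose values lack double quotes.

    Simplification: the field is split on ';', so a quoted value that itself
    contains a semicolon would be mis-split. Numeric values are flagged too.
    """
    keys = []
    for chunk in attr_field.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # trailing separator or empty entry
        key, _, value = chunk.partition(" ")
        value = value.strip()
        quoted = len(value) >= 2 and value.startswith('"') and value.endswith('"')
        if not quoted:
            keys.append(key)
    return keys


# Attribute column copied from the transcript line in the excerpt above.
attrs = "transcript_id OENMELG00000003088.t1; gene_name PDE3A; gene_id OENMELG00000003088; "
print(unquoted_attributes(attrs))
```

If quoting does turn out to matter, rewriting the ninth column with quoted values and rebuilding the database would be one experiment to try.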
