Update genome_annotation.rst

JonEilers committed Dec 21, 2023 (commit e9a1cde, parent 869b464)

docs/source/genome_annotation.rst: 6 additions, 2 deletions

Expression Data Mapping and Protein Database Alignment
------------------------------------------------------

The next two steps are mapping gene expression data, typically obtained through RNA sequencing (`RNA-Seq <https://en.wikipedia.org/wiki/RNA-Seq>`_), to the genome assembly, and aligning sequences from established protein databases, such as `UniProt <https://en.wikipedia.org/wiki/UniProt>`_ and `RefSeq <https://en.wikipedia.org/wiki/RefSeq>`_, to the same assembly. RNA-Seq mapping reveals which regions of the genome are transcribed, which is instrumental in pinpointing gene locations and in resolving gene structures, including the boundaries of `exons <https://en.wikipedia.org/wiki/Exon>`_ and `introns <https://en.wikipedia.org/wiki/Intron>`_. This evidence is crucial because it links predicted genes to direct observations of their expression.
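Spliced RNA-Seq aligners record a read that spans an intron with the ``N`` (skipped region) operation of its SAM CIGAR string, which is how read mapping exposes exon and intron boundaries. The following is a minimal sketch, assuming each alignment is given as its 1-based leftmost reference position plus its CIGAR string (the POS and CIGAR fields of a SAM record); the function names are hypothetical and not part of any aligner. It recovers intron coordinates and counts how many reads support each one.

.. code-block:: python

    import re
    from collections import Counter

    # One CIGAR operation: a length followed by an operation code (SAM spec).
    CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
    # A whole CIGAR string is one or more such operations.
    CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
    # Operations that advance along the reference sequence.
    REF_CONSUMING = set("MDN=X")


    def introns_from_cigar(pos, cigar):
        """Return 1-based inclusive (start, end) intron coordinates implied by
        the N operations of an alignment whose leftmost base is at `pos`."""
        if not CIGAR_FULL.fullmatch(cigar):
            raise ValueError(f"malformed or unavailable CIGAR: {cigar!r}")
        introns = []
        ref = pos  # reference coordinate of the next base to be consumed
        for length, op in CIGAR_OP.findall(cigar):
            n = int(length)
            if op == "N":
                introns.append((ref, ref + n - 1))
            if op in REF_CONSUMING:
                ref += n
        return introns


    def junction_support(alignments):
        """Count supporting reads per intron; `alignments` yields (pos, cigar)."""
        counts = Counter()
        for pos, cigar in alignments:
            counts.update(introns_from_cigar(pos, cigar))
        return counts

For example, a read at position 100 with CIGAR ``50M200N50M`` implies one intron spanning bases 150 to 349. Because any single read can be misaligned, junctions are usually weighed by how many reads support them; choosing a support threshold is left to the user in this sketch.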

In addition to RNA-Seq data, protein databases are an invaluable source of evidence. They host millions of protein sequences that serve as references for locating key gene features such as start and stop codons and splice sites. This is particularly useful where RNA-Seq data do not cover every gene: gene expression is often tissue-specific, and without a comprehensive expression atlas covering all tissue types and developmental stages, some genes may lack corresponding expression data. In such cases, protein databases offer a complementary line of evidence.
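Once protein evidence has been used to place start and stop codons, the resulting coding sequence (CDS) can be sanity-checked. The sketch below is illustrative only: it assumes the standard genetic code, treats ATG as the only start codon (real genomes also use alternative starts), and ``check_cds`` is a hypothetical helper rather than part of any annotation tool.

.. code-block:: python

    # Stop codons of the standard genetic code (assumed here).
    STOP_CODONS = {"TAA", "TAG", "TGA"}


    def check_cds(cds):
        """Return a list of problems found in a candidate CDS (empty if none).

        Assumes the standard genetic code and ATG as the only start codon.
        """
        seq = cds.upper()
        problems = []
        if len(seq) % 3 != 0:
            problems.append("length is not a multiple of 3")
        if not seq.startswith("ATG"):
            problems.append("missing ATG start codon")
        # Split into complete codons, ignoring any trailing partial codon.
        codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
        if not codons or codons[-1] not in STOP_CODONS:
            problems.append("missing terminal stop codon")
        if any(c in STOP_CODONS for c in codons[:-1]):
            problems.append("internal (premature) stop codon")
        return problems

A model that fails these checks is not necessarily wrong (it may be a pseudogene or use a non-standard code), but it is a good candidate for closer inspection against the protein alignment.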

Moreover, these protein databases often include manually curated gene models. This manual curation involves thorough verification and correction of gene structures based on available evidence, enhancing the accuracy of gene predictions. By integrating RNA-Seq data with information from protein databases, researchers can achieve a more complete and precise understanding of the genomic landscape, even in areas where gene expression data is limited or absent.
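The complementary roles of the two evidence types can be illustrated by labeling each gene model according to which evidence overlaps it. This minimal sketch assumes all features lie on one sequence and are given as 1-based inclusive ``(start, end)`` intervals; the gene IDs and coordinates are made up, and a real pipeline would also check strand and exon-level agreement rather than simple overlap.

.. code-block:: python

    def overlaps(a, b):
        """True if 1-based inclusive intervals a and b share at least one base."""
        return a[0] <= b[1] and b[0] <= a[1]


    def classify_support(genes, rnaseq, proteins):
        """Map each gene ID to 'both', 'rna-seq only', 'protein only' or 'none'."""
        labels = {}
        for gene_id, interval in genes.items():
            has_rna = any(overlaps(interval, r) for r in rnaseq)
            has_prot = any(overlaps(interval, p) for p in proteins)
            if has_rna and has_prot:
                labels[gene_id] = "both"
            elif has_rna:
                labels[gene_id] = "rna-seq only"
            elif has_prot:
                labels[gene_id] = "protein only"
            else:
                labels[gene_id] = "none"
        return labels


    if __name__ == "__main__":
        # Hypothetical gene models and evidence intervals.
        genes = {"g1": (100, 500), "g2": (1000, 1800),
                 "g3": (3000, 3400), "g4": (5000, 5200)}
        rnaseq = [(150, 300), (2900, 3100)]
        proteins = [(120, 480), (1100, 1700)]
        print(classify_support(genes, rnaseq, proteins))
        # {'g1': 'both', 'g2': 'protein only', 'g3': 'rna-seq only', 'g4': 'none'}

Here ``g2`` is the case described above: a gene with no expression evidence, perhaps because it is expressed only in a tissue that was not sequenced, that is still supported by a protein alignment. The nested loops are quadratic, which is fine for a sketch; interval trees or a sorted sweep scale better.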

* :doc:`Mapping Gene Expression Data to the Genome Assembly </annotation/rna-seq_mapping>`
* :doc:`Aligning Protein Databases to the Genome Assembly </annotation/protein_database_alignment>`

Moreover, the significance of ncRNAs extends to disease, with their dysregulation linked to various conditions including `cancers, neurological disorders, and heart diseases <https://link.springer.com/article/10.1007/s10142-022-00947-4>`_. This association offers insights into disease mechanisms and potential therapeutic targets. Some ncRNAs also mediate intercellular communication: they are packaged into exosomes and can influence neighboring or distant cells. Many ncRNAs are conserved across species, which underscores their evolutionary importance. Identifying and properly annotating ncRNAs in a genome assembly is therefore not just a matter of cataloging; it is a crucial step in unraveling the complex orchestration of life at the molecular level, revealing mechanisms fundamental to both health and disease. The ongoing discovery and study of ncRNAs continue to illuminate the vast, uncharted territory of non-protein-coding genes, offering insights into the complexities of `genetic regulation and function <https://www.sciencedirect.com/science/article/abs/pii/S1874939919302160>`_.

Because ncRNAs are difficult to distinguish from protein-coding RNAs, a combination of bioinformatic tools and ncRNA databases is used to identify and annotate them. Databases include `NONCODE <http://www.noncode.org/>`_, `RNAcentral <https://rnacentral.org/>`_, `FANTOM <https://fantom.gsc.riken.jp/>`_, and `Rfam <https://rfam.org/>`_, among others. Tools for identifying and annotating ncRNAs are also numerous; popular ones include `Infernal <http://eddylab.org/infernal/>`_ and tools specific to particular ncRNA types, such as those for `tRNA <http://gtrnadb.ucsc.edu/>`_, `lncRNA <https://academic.oup.com/nar/article/45/8/e57/2798184?login=false>`_, and `miRs <https://tools4mirs.org/>`_. The list goes on (`piRNA, tsRNA, rRNA, snoRNA, sRNA, etc. <https://pubmed.ncbi.nlm.nih.gov/29730207/>`_).
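Part of the difficulty is that almost any long transcript contains open reading frames (ORFs) by chance, so coding potential has to be judged rather than simply read off. Dedicated tools combine many features, but ORF length is one of the simplest signals. The following is a crude sketch, not the method of any tool listed above: it finds the longest ATG-initiated ORF on the forward strand of a transcript, and the 100-codon cutoff in the hypothetical ``looks_noncoding`` helper is an assumed, illustrative threshold rather than a standard.

.. code-block:: python

    # Stop codons of the standard genetic code (assumed here).
    STOP_CODONS = {"TAA", "TAG", "TGA"}


    def longest_orf_codons(transcript):
        """Length in codons (including the stop) of the longest ORF that starts
        with ATG and ends at an in-frame stop, scanning the forward strand only.
        ORFs that run off the end of the transcript are ignored."""
        seq = transcript.upper().replace("U", "T")  # accept RNA or DNA letters
        best = 0
        for frame in range(3):
            start = None  # index of the first ATG in the current open stretch
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, (i + 3 - start) // 3)
                    start = None
        return best


    def looks_noncoding(transcript, min_codons=100):
        """Crude heuristic: flag transcripts whose longest ORF is shorter than
        `min_codons` codons (an assumed cutoff) as candidate ncRNAs."""
        return longest_orf_codons(transcript) < min_codons

A short ORF alone is weak evidence: some genuine ncRNAs contain long ORFs by chance, and some small proteins are encoded by very short ones, which is why the specialized tools and databases above are preferred in practice.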

* :doc:`An attempt at finding all the ncRNAs in an assembly <>`
