2.CNN_RNN_sequence_analysis

To use this repository, first download the data here and decompress the files in this folder.

This example shows how to combine a CNN and an RNN to predict the function of non-coding DNA sequences, using the data from DeepSEA. As described in DeepSEA and DanQ, the human GRCh37 reference genome was segmented into non-overlapping 200-bp bins. Each model input is a 1000-bp DNA sequence centered on one of those 200-bp bins. The labels were generated by collecting profiles from the ENCODE and Roadmap Epigenomics data releases, which yields a 919-dimensional binary vector for each sequence (690 transcription factor binding profiles, 125 DNase I–hypersensitive profiles and 104 histone-mark profiles).

To turn each DNA sequence string into a numerical form that can be fed to the model, we use one-hot encoding. For DNA sequences, function depends not only on the specific motifs but also on the interactions between upstream and downstream motifs. We therefore combine a CNN and an RNN, stacking a bi-directional LSTM layer on top of 1D convolutional layers. The original implementation of DanQ requires Theano, which has been discontinued, so we reimplemented the idea using only Keras.
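The windowing and one-hot encoding described above can be sketched roughly as follows. This is a minimal illustration rather than the code used in the notebook: the helper names `window_for_bin` and `one_hot_encode`, the `ACGT` column order, and the choice to encode unknown bases such as `N` as all-zero rows are all assumptions made for the example.

```python
import numpy as np

# Assumed column order; any fixed order works if used consistently.
ALPHABET = "ACGT"
BIN_SIZE = 200      # size of each non-overlapping genome bin (bp)
WINDOW_SIZE = 1000  # size of the model input centered on a bin (bp)


def window_for_bin(bin_start):
    """Return the half-open (start, end) range of the 1000-bp window
    centered on the 200-bp bin that begins at bin_start."""
    flank = (WINDOW_SIZE - BIN_SIZE) // 2  # 400 bp on each side
    return bin_start - flank, bin_start + BIN_SIZE + flank


def one_hot_encode(seq):
    """Encode a DNA string as a (len(seq), 4) float32 array.

    Assumption: bases outside ACGT (for example N) become all-zero rows.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = np.zeros((len(seq), len(ALPHABET)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        col = index.get(base)
        if col is not None:
            encoded[pos, col] = 1.0
    return encoded


if __name__ == "__main__":
    start, end = window_for_bin(1000)
    print(start, end, end - start)
    print(one_hot_encode("ACGTN"))
```

Under these assumptions, a full 1000-bp input becomes a `(1000, 4)` array, which is the kind of shape a 1D convolutional layer can consume directly.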

Reference:
