scripts

get_DE_events.py

This script and its related files are part of the supplemental material for the paper
"Quantifying RNA editing in deep transcriptome datasets"

This script compares REDItools output tables from multiple samples and reports dysregulated RNA editing at recoding events, using either the Mann-Whitney U-test described in Silvestris et al. (2019) or the statistical pipeline proposed by Tran et al. (2019). REDItools output tables are pre-filtered according to the following main criteria:

  • RNAseq coverage per position (default 10 reads)
  • Minimum editing frequency per position (default 10%)
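
The two pre-filtering criteria above can be illustrated with a minimal sketch. It is not the script's actual code: the row layout (dictionaries with hypothetical `coverage` and `frequency` keys), the function name `prefilter`, the use of inclusive comparisons, and the representation of the editing frequency as a fraction (0.10 for 10%) are all assumptions made for illustration.

```python
from typing import Dict, List


def prefilter(rows: List[Dict], min_coverage: int = 10,
              min_frequency: float = 0.10) -> List[Dict]:
    """Keep only positions that pass both thresholds.

    Assumption: each row is a dict with hypothetical keys 'coverage'
    (reads covering the position) and 'frequency' (editing frequency
    expressed as a fraction between 0 and 1).
    """
    return [
        row for row in rows
        if row["coverage"] >= min_coverage and row["frequency"] >= min_frequency
    ]


# Made-up positions, for illustration only.
rows = [
    {"position": 100, "coverage": 25, "frequency": 0.32},  # passes both filters
    {"position": 200, "coverage": 8, "frequency": 0.50},   # coverage too low
    {"position": 300, "coverage": 40, "frequency": 0.05},  # frequency too low
]

kept_positions = [row["position"] for row in prefilter(rows)]
print(kept_positions)  # [100]
```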

For each editing candidate, the script applies the Mann-Whitney U-test to check whether editing differs significantly between the two groups, A and B.
By default, the test is carried out only if the number of editing events per position is at least 50% of the samples in each group.
This threshold can be changed for each group with the -mtsA and -mtsB options, respectively.
Returned p-values can be corrected using the Benjamini-Hochberg or Bonferroni methods.
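
To make the sample threshold and the two p-value corrections concrete, here is a small pure-Python sketch. It shows the standard Bonferroni and Benjamini-Hochberg adjustments and a simple fraction check; the function names and the inclusive `>=` comparison are assumptions for illustration and are not taken from get_DE_events.py.

```python
from typing import List


def has_enough_samples(n_edited: int, n_samples: int,
                       min_fraction: float = 0.5) -> bool:
    """Return True if enough samples in a group show editing at a position.

    Assumption: the threshold is inclusive (>=) and expressed as a fraction.
    """
    return n_samples > 0 and n_edited >= min_fraction * n_samples


def bonferroni(pvalues: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values, for illustration only.
pvals = [0.01, 0.04, 0.03, 0.005]
bh = benjamini_hochberg(pvals)
bonf = bonferroni(pvals)
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate and typically retains more sites when many positions are tested.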

Usage:

usage: get_DE_events.py [-h] [-c MIN_COVERAGE] [-cpval PVALUE_CORRECTION]
                        [-input_file SAMPLES_INFORMATIONS_FILE]
                        [-gene_pos_file GENE_POS_FILE] [-f MIN_EDIT_FREQUENCY]
                        [-mtsA GROUPA_MIN_SAMPLE_TESTING]
                        [-mtsB GROUPB_MIN_SAMPLE_TESTING]
                        [-sig ONLY_SIGNIFICANT]
                        [-siglevel STATISTICAL_SIGNIFICANCE] [-linear]
                        [-graph] [-chr_col CHR_COLUMN] [-rsite RSITE]

optional arguments:

-h, --help show this help message and exit

-c MIN_COVERAGE Coverage-q30

-cpval PVALUE_CORRECTION 1 --> Bonferroni correction / 2 --> Benjamini Hochberg

-input_file SAMPLES_INFORMATIONS_FILE (.sif) Comma-separated file with the columns Sample,Group,Type (e.g. SRR1093527,GROUPA,BrainCerebellum..., SRR1088437,GROUPB,ArteryTibial... etc.). An example file is provided here

-gene_pos_file GENE_POS_FILE nonsynonymous_table_NONREP derived from REDIportal. NOTE: A gene_pos file is required by -graph or -rsite. An example file can be found here.

-f MIN_EDIT_FREQUENCY Editing Frequency

-mtsA GROUPA_MIN_SAMPLE_TESTING min percentage of groupA samples

-mtsB GROUPB_MIN_SAMPLE_TESTING min percentage of groupB samples

-sig ONLY_SIGNIFICANT Return only statistically significant editing events

-siglevel STATISTICAL_SIGNIFICANCE cutoff level to reject H0 hypothesis default 0.05

-linear Enable linear statistical model (Tran et al., 2019).

-graph R graph compatible table containing the following columns: Site|Delta|Mannwhitney|pval|Benjamini Hochberg corrected pvalue|status NOTE: THIS OPTION CAN BE USED ONLY IN COMBINATION WITH -gene_pos_file

-chr_col CHR_COLUMN If set to "yes", a chromosome_position column will be added to the R graph table. NOTE: THIS OPTION IS SPECIFIC TO THE -graph & -gene_pos_file COMBINATION

-rsite RSITE If set to "yes" all recoding sites will be shown in the output table. NOTE: THIS OPTION ONLY WORKS IN DEFAULT MODE.
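
As an illustration of the sample information file expected by -input_file, the sketch below reads a comma-separated file and groups sample names by their Group column. It assumes the file starts with a `Sample,Group,Type` header row, which may differ from the provided example file; the function name `samples_by_group` is hypothetical and the parsing is not taken from the script itself.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def samples_by_group(handle: TextIO) -> Dict[str, List[str]]:
    """Map each group label to the list of its sample names.

    Assumption: the first line is a header with 'Sample' and 'Group' columns.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[row["Group"].strip()].append(row["Sample"].strip())
    return dict(groups)


# In-memory stand-in for a .sif file, using the sample names quoted above.
example = io.StringIO(
    "Sample,Group,Type\n"
    "SRR1093527,GROUPA,BrainCerebellum\n"
    "SRR1088437,GROUPB,ArteryTibial\n"
)

groups = samples_by_group(example)
print(groups)  # {'GROUPA': ['SRR1093527'], 'GROUPB': ['SRR1088437']}
```

With a real file, the same function could be called as `samples_by_group(open("sample_information.csv", newline=""))`.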

e.g. python ../REDItools/accessory/get_DE_events.py -cpval 2 -input_file sample_information.csv -sig yes

The script will filter the REDItoolDnaRna.py outputs for each sample listed in the SAMPLES_INFORMATIONS_FILE, returning only significant editing events (pval <= 0.05) after Benjamini-Hochberg correction.

Accessory files

  • sample_status_file_creator.py
  • This script generates a sample_information.csv file (.sif) compatible with get_DE_events.py. It requires:
    • A csv sample file containing the main information about each sample to be used in the experiment. An example of this file is included.
    • Samples group1 (e.g. ArteryTibial)
    • Samples group2 (e.g. BrainCerebellum)

    Usage:
    python sample_status_file_creator.py csv_input_file, sample_group1, sample_group2

  • sample_path_folder_creator.py
  • This script copies the REDItools tables into different directories following the sample/group subdivisions reported in the sample information file (.sif). It requires:
    • A sample status file (.sif), like this one, which can be generated with the previous script.

    Usage:
     python sample_path_folder_creator.py csv_sample_file.sif

Note. The script assumes that the REDItools outputs (e.g. SRR1071289, SRR1101591) are contained in a "tables" folder in your main working directory; otherwise, modify the last line of the script accordingly.