PerturbSci_Kinetics

Read-processing scripts for PerturbSci-Kinetics. bioRxiv preprint: https://doi.org/10.1101/2023.01.29.526143

Background SNP calling (/bg_SNP_calling/SNP_calling_processing.sh)

Key parameters

Fastq input: paired-end, full-coverage bulk RNA-seq fastq files.
Sample ID: a text file listing one sample prefix per line. The R1 and R2 files of each sample should share that prefix.
Reference fasta: the fasta file of the reference genome, used for the read pileup.
Index: the STAR index of the reference genome.
Output folder: the directory for all output files.
Script folder: the folder for all sub scripts.
Other parameters include the number of cores and the paths to required packages.
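The sketch below shows one way a sample ID file could be paired with R1/R2 fastq files. It is a minimal illustration, and the `.R1.fastq.gz`/`.R2.fastq.gz` suffixes and helper names are hypothetical. The README only states that both mates of a sample share the same prefix.

```python
from pathlib import Path

# Hypothetical suffixes: the README does not state the naming scheme,
# only that R1 and R2 of a sample share the same prefix.
R1_SUFFIX = ".R1.fastq.gz"
R2_SUFFIX = ".R2.fastq.gz"


def parse_sample_ids(text):
    """Return one sample prefix per non-empty line, rejecting duplicates."""
    prefixes = [line.strip() for line in text.splitlines() if line.strip()]
    if len(prefixes) != len(set(prefixes)):
        raise ValueError("duplicate sample prefixes in sample ID file")
    return prefixes


def fastq_pair(prefix, fastq_dir):
    """Build the (R1, R2) paths that share the given prefix."""
    fastq_dir = Path(fastq_dir)
    return fastq_dir / f"{prefix}{R1_SUFFIX}", fastq_dir / f"{prefix}{R2_SUFFIX}"
```

Rejecting duplicate prefixes early avoids processing the same sample twice. Adjust the suffix constants to match the actual file names.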

Steps

Trim adapter sequences (adapters are detected automatically).
STAR alignment.
Filter aligned reads.
Merge the BAM files and sort the merged BAM.
Summarize the base identities of reads mapped to each genomic position.
Call inherent SNPs.
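The last two steps can be illustrated with a simple allele-fraction filter over per-position base counts. This is a hypothetical sketch, not the rule applied by SNP_calling_processing.sh. The `PileupSite` structure, the `min_depth` and `min_alt_frac` thresholds, and the minimal VCF columns are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class PileupSite:
    chrom: str
    pos: int       # 1-based position, as in VCF
    ref: str       # reference base
    counts: dict   # base -> number of reads showing that base


def call_background_snps(sites, min_depth=10, min_alt_frac=0.2):
    """Yield (chrom, pos, ref, alt, alt_frac) for sites that look like inherent SNPs.

    Thresholds are illustrative defaults, not values from the pipeline.
    """
    for site in sites:
        depth = sum(site.counts.values())
        if depth < min_depth:
            continue  # too few reads to call anything
        for base, n in sorted(site.counts.items()):
            if base == site.ref:
                continue
            frac = n / depth
            if frac >= min_alt_frac:
                yield (site.chrom, site.pos, site.ref, base, round(frac, 3))


def to_vcf_lines(calls):
    """Render calls as a minimal VCF (header plus one line per call)."""
    lines = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
    for chrom, pos, ref, alt, frac in calls:
        lines.append(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\tPASS\tAF={frac}")
    return lines
```

A position with enough coverage and a high alternative-allele fraction in bulk RNA is more likely an inherent variant than a metabolic-labeling conversion. Such sites are therefore masked later.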

Key output

A vcf file containing background mutations in RNA.

Single-cell whole/nascent transcriptome preprocessing steps (/whole_tx_processing/Main_processing.sh)

Key parameters

Fastq input: demultiplexed paired-end PerturbSci-Kinetics whole-transcriptome fastq files.
Sample ID: a text file listing one sample prefix per line. The R1 and R2 files of each sample should share that prefix.
Reference fasta: the fasta file of the reference genome.
Index: the STAR index of the reference genome.
Gtf file: the annotation file for the matched reference genome. It is used in feature counting.
Reference SNP file: the SNP vcf file generated from the script above. It is used to filter out inherent mutations in the RNA.
Output folder: the directory for all output files.
Script folder: the folder for all sub scripts.
Cutoff: only cell barcodes with more reads than this cutoff are kept for further processing.
Custom barcode folder: the folder containing all barcode files.
RT barcode, ligation barcode: pickle files containing all valid barcode sequences, allowing at most 1 mismatch.
Barcodes: the text file containing all RT+ligation barcode combinations.
Other parameters include the number of cores and the paths to required packages.
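The sketch below illustrates what a pickle of "valid barcodes with at most 1 mismatch" might hold and how it could be used. It assumes two things, neither confirmed by this README:

- Each file stores a dict mapping every sequence within one mismatch of a valid barcode to that barcode.
- A cell barcode is the RT barcode concatenated with the ligation barcode.

The function names are hypothetical.

```python
import pickle

BASES = "ACGTN"


def build_correction_dict(valid_barcodes):
    """Map each sequence within 1 mismatch (Hamming distance) of a valid barcode to it.

    Sequences within 1 mismatch of two different valid barcodes are ambiguous
    and are dropped. Assumes valid barcodes differ from each other at >= 2 positions.
    """
    mapping, ambiguous = {}, set()
    for barcode in valid_barcodes:
        variants = {barcode}
        for i, original in enumerate(barcode):
            for base in BASES:
                if base != original:
                    variants.add(barcode[:i] + base + barcode[i + 1:])
        for seq in variants:
            if seq in mapping and mapping[seq] != barcode:
                ambiguous.add(seq)
            else:
                mapping[seq] = barcode
    for seq in ambiguous:
        del mapping[seq]
    return mapping


def load_correction_dict(path):
    """Load a pickled correction dict (layout assumed, see above)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def correct_cell_barcode(rt_seq, lig_seq, rt_dict, lig_dict, valid_combos):
    """Return the corrected RT+ligation barcode, or None if it cannot be rescued."""
    rt = rt_dict.get(rt_seq)
    lig = lig_dict.get(lig_seq)
    if rt is None or lig is None:
        return None
    combo = rt + lig  # concatenation order is an assumption
    return combo if combo in valid_combos else None
```

Precomputing every one-mismatch variant turns barcode correction into a single dict lookup per read. This is why such tables are worth pickling once and reusing.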

Steps

Rename the fastq files to the naming scheme expected by the following steps.
Attach the UMI sequence from R1 to the header of the corresponding R2 read.
Trim potential polyA sequences from the 3' end of R2.
STAR alignment.
Filter aligned reads.
Remove PCR duplicates based on both the mapped genomic coordinates and the UMI.
Generate single-cell SAM files.
Convert the alignment information in the single-cell SAMs into single-base-level tables.
Identify T>C mutations in each read and extract the names of nascent reads.
Extract nascent reads from the single-cell SAMs.
Count reads per gene in both the whole and nascent single-cell SAMs, then reformat the counts into single-cell gene expression matrices.
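The T>C identification step can be sketched as follows. The sketch assumes a hypothetical single-base table in which each row is (read name, chromosome, position, reference base, read base, base quality, transcript strand).

- Background SNP positions from the reference SNP file are skipped.
- On minus-strand transcripts, a T>C conversion appears as A>G relative to the genome.
- The quality cutoff and the one-conversion threshold for calling a read nascent are illustrative defaults, not values taken from the scripts.

```python
from collections import defaultdict


def count_tc_per_read(rows, snp_sites, min_qual=20):
    """Count T>C conversions per read from a single-base alignment table.

    rows: iterable of (read_name, chrom, pos, ref_base, read_base, base_qual, strand),
    where strand is the transcript strand ('+' or '-'). Every read appears in the
    result, including reads with zero conversions.
    snp_sites: set of (chrom, pos) background SNPs to ignore.
    """
    counts = defaultdict(int)
    for read, chrom, pos, ref, base, qual, strand in rows:
        # T>C on a '+' transcript is seen as A>G on the genome for a '-' transcript.
        is_conversion = (strand == "+" and ref == "T" and base == "C") or (
            strand == "-" and ref == "A" and base == "G"
        )
        usable = qual >= min_qual and (chrom, pos) not in snp_sites
        counts[read] += int(is_conversion and usable)
    return dict(counts)


def nascent_read_names(tc_counts, min_tc=1):
    """Reads with at least min_tc T>C conversions are treated as nascent."""
    return sorted(read for read, n in tc_counts.items() if n >= min_tc)
```

The returned read names are what a later step would use to pull nascent reads out of each single-cell SAM.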

Key output

Two R object files containing the single-cell whole and nascent transcriptome count matrices, respectively.

Single-cell sgRNA preprocessing steps (/sgRNA_reads_processing/sgRNA_processing.sh)

Key parameters

Fastq input: Paired-end PerturbSci-Kinetics demultiplexed sgRNA fastq files.
Cutoff: only cell barcodes with more sgRNA UMIs than this cutoff are considered.
sgRNA correction file: a pickle file containing all valid sgRNA sequences, allowing at most 1 mismatch.
sgRNA annotation df: a txt file listing the sgRNA names and their corresponding gene symbols. It is used when constructing the expression matrix.
Other parameters are roughly the same as above.

Steps

Rename the fastq files to the naming scheme expected by the following steps.
Identify, de-duplicate, and count sgRNAs in one step.
Reformat the single-cell sgRNA expression matrix.
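The one-step identification, de-duplication, and counting can be sketched as below. This is a minimal sketch with three assumptions:

- Reads have already been reduced to (cell barcode, UMI, sgRNA sequence) tuples.
- The correction pickle is a dict from observed sequence to valid sgRNA sequence.
- The function name and input layout are hypothetical.

```python
from collections import Counter, defaultdict


def count_sgrna_umis(reads, sgrna_dict, cutoff):
    """Correct, de-duplicate and count sgRNA reads per cell.

    reads: iterable of (cell_barcode, umi, observed_sgrna_seq).
    sgrna_dict: observed sequence -> valid sgRNA sequence (<= 1 mismatch).
    Returns {cell: {sgRNA: unique UMI count}} for cells whose total
    sgRNA UMI count is greater than cutoff.
    """
    seen = set()
    per_cell = defaultdict(Counter)
    for cell, umi, seq in reads:
        guide = sgrna_dict.get(seq)
        if guide is None:  # not within 1 mismatch of any valid sgRNA
            continue
        key = (cell, umi, guide)
        if key in seen:  # PCR duplicate of an already counted molecule
            continue
        seen.add(key)
        per_cell[cell][guide] += 1
    return {
        cell: dict(guides)
        for cell, guides in per_cell.items()
        if sum(guides.values()) > cutoff
    }
```

Correcting the sequence before de-duplicating means two reads of the same molecule still collapse into one UMI, even when one carries a sequencing error.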

Key output

An R object file containing a single-cell sgRNA expression matrix.

Paired-end bulk SLAM-seq read preprocessing steps (/SLAMseq_processing/SLAM_seq_main_processing.sh)

Key parameters

Parameters are roughly the same as those in single-cell processing scripts.

Steps

Rename the fastq files to the naming scheme expected by the following steps.
Attach the UMI sequence from R1 to the header of the corresponding R2 read.
Trim potential adapter sequences from the 3' ends of R1 and R2.
STAR alignment.
Filter aligned reads.
Remove PCR duplicates with Picard.
Convert the alignment information in the SAMs into single-base-level tables.
Split the alignment table into smaller sub-tables.
Identify T>C mutations in each read pair.
Merge the mutation info from all sub-tables of each sample and extract the names of nascent reads.
Extract nascent reads from the SAMs.
Count reads per gene in both the whole and nascent BAMs, then reformat the counts into a gene expression matrix.
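Splitting the table only helps if both mates of a pair land in the same sub-table. The sketch below works in three stages:

1. Assign rows to chunks by a stable CRC32 hash of the read name. Python's built-in `hash()` is salted per process for strings, so it is avoided.
2. Count T>C positions per pair, without double counting where the two mates overlap.
3. Merge the per-chunk results.

The row layout (read name, chromosome, position, reference base, read base, already in transcript orientation) and all function names are assumptions.

```python
import zlib


def split_by_read(rows, n_chunks):
    """Assign rows to n_chunks sub-tables so both mates of a pair share a chunk.

    rows: iterable of (read_name, chrom, pos, ref_base, read_base), assumed to be
    in transcript orientation already. CRC32 gives the same chunk for the same
    name in every process, unlike the salted built-in hash() for strings.
    """
    chunks = [[] for _ in range(n_chunks)]
    for row in rows:
        chunks[zlib.crc32(row[0].encode()) % n_chunks].append(row)
    return chunks


def tc_per_pair(chunk):
    """Count distinct T>C positions per read pair; overlapping mates count once."""
    sites = {}
    for name, chrom, pos, ref, base in chunk:
        pair_sites = sites.setdefault(name, set())
        if ref == "T" and base == "C":
            pair_sites.add((chrom, pos))
    return {name: len(s) for name, s in sites.items()}


def merge_and_select(chunk_results, min_tc=1):
    """Merge per-chunk counts and return (counts, sorted nascent pair names)."""
    merged = {}
    for result in chunk_results:
        merged.update(result)  # each pair lives in exactly one chunk
    nascent = sorted(name for name, n in merged.items() if n >= min_tc)
    return merged, nascent
```

Because each chunk is independent, the per-chunk counting could run in parallel, for example one process per sub-table, before the cheap merge.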

Key output

Two R object files containing the gene x sample whole and nascent transcriptome count matrices, respectively.

Key R functions in downstream analysis (/downstream_functions/key_functions.R)

filter_dT_cells(): Get the single-cell whole tx expression matrix from the R object output by the preprocessing script.
gene_id2gene_names(): Convert gene IDs to gene symbols using the matched gtf file.
gRNA_cell_reformatting(): Read and reformat the single-cell sgRNA expression matrix so that it can be integrated with the whole tx data.
match_whole_nascent_txme_with_gRNA(): Integrate whole tx data with sgRNA info, identify sgRNA-based singlets, and return an integrated object.
synth_deg_bootstrapping_NTC_vs_KD(): Calculate synthesis and degradation rates for cell populations, and perform permutation tests between each perturbation and NTC to assess statistical significance.
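As a rough illustration of population-level rate estimation and a label-permutation test, the sketch below assumes first-order kinetics at steady state. Under that model, after a labeling time t the nascent fraction of a gene is N/W = 1 - exp(-γt). The degradation rate is then γ = -ln(1 - N/W)/t, and the synthesis rate is γ times the mean whole count per cell.

The README does not state the model used in key_functions.R. This model, the pooling of counts across cells, and the function names are therefore assumptions. The sketch is not a reimplementation of synth_deg_bootstrapping_NTC_vs_KD().

```python
import math
import random


def pooled_rates(nascent, whole, label_hours):
    """Synthesis and degradation rates from counts pooled over a cell population.

    Assumes first-order kinetics at steady state, so the nascent fraction after
    label_hours is N/W = 1 - exp(-deg * t).
    """
    n_total, w_total = sum(nascent), sum(whole)
    if w_total <= 0 or not 0 <= n_total < w_total:
        raise ValueError("need 0 <= total nascent < total whole counts")
    deg = -math.log(1 - n_total / w_total) / label_hours
    syn = deg * w_total / len(whole)  # deg times mean whole count per cell
    return syn, deg


def permutation_test_deg(kd_cells, ntc_cells, label_hours, n_perm=1000, seed=0):
    """Two-sided label-permutation test on the KD minus NTC degradation rate.

    kd_cells, ntc_cells: lists of (nascent_count, whole_count) for one gene.
    Returns (observed difference, permutation p-value).
    """
    def deg_of(cells):
        return pooled_rates([c[0] for c in cells], [c[1] for c in cells], label_hours)[1]

    observed = deg_of(kd_cells) - deg_of(ntc_cells)
    pooled = list(kd_cells) + list(ntc_cells)
    rng = random.Random(seed)
    n_kd = len(kd_cells)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign KD/NTC labels at random
        diff = deg_of(pooled[:n_kd]) - deg_of(pooled[n_kd:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)
```

Shuffling the cell labels keeps each cell's own nascent/whole pair intact, so the null distribution reflects cell-to-cell variability under "perturbation has no effect".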

Repository: JunyueCaoLab/PerturbSci_Kinetics

Folders and files

Latest commit

History

Repository files navigation

PerturbSci_Kinetics

Background SNP calling (/bg_SNP_calling/SNP_calling_processing.sh)

Key parameters

Steps

Key output

Single cell whole/nascent transcriptomes reprocessing steps (/whole_tx_processing/Main_processing.sh)

Key parameters

Steps

Key output

Single cell sgRNA reprocessing steps (/sgRNA_reads_processing/sgRNA_processing.sh)

Key parameters

Steps

Key output

Paired-end bulk SLAM-seq reads reprocessing steps (/SLAMseq_processing/SLAM_seq_main_processing.sh)

Key parameters

Steps

Key output

Key R functions in downstream analysis (/downstream_functions/key_functions.R)

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages