This repository contains the code for the paper "Exaptation of ancestral cell identity networks enables C4 photosynthesis" published in XXXX.
Ensure that R version 4.0.5 is installed on your system. Download it from CRAN.
# Read the requirements file (tab-separated; the first line is a header and is skipped)
requirements <- read.table("R_requirements.txt", header = FALSE, skip = 1, sep = "\t", stringsAsFactors = FALSE, col.names = c("Package", "Version"))
# Check for missing packages and install them
# (install.packages() fetches the current CRAN release; the Version column is not enforced here)
missing_packages <- setdiff(requirements$Package, installed.packages()[, "Package"])
if (length(missing_packages) > 0) {
  install.packages(missing_packages)
}
# Verify installation by re-checking the installed packages
installed <- installed.packages()[, "Package"]
# Identify any packages that failed to install
failed_to_install <- setdiff(missing_packages, installed)
if (length(failed_to_install) > 0) {
  warning("The following packages failed to install: ", paste(failed_to_install, collapse = ", "))
} else {
  message("All packages installed successfully.")
}
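The install step only checks that each package is present, so an installed version may differ from the one listed in the requirements file. A minimal sketch of a version comparison follows, assuming the same two-column Package/Version layout. `check_versions` is a hypothetical helper, not part of this repository, and the toy input uses a base package so that the example runs anywhere.

```r
# Hypothetical helper: compare installed package versions with the listed ones.
check_versions <- function(requirements) {
  installed <- rownames(installed.packages())
  status <- vapply(seq_len(nrow(requirements)), function(i) {
    pkg <- requirements$Package[i]
    if (!pkg %in% installed) return("missing")
    have <- as.character(packageVersion(pkg))
    if (have == requirements$Version[i]) "ok" else "version differs"
  }, character(1))
  data.frame(requirements, Status = status, stringsAsFactors = FALSE)
}

# Toy input (assumption): one base package at its installed version
# and one package name that does not exist.
toy <- data.frame(
  Package = c("stats", "notARealPackage123"),
  Version = c(as.character(packageVersion("stats")), "1.0.0"),
  stringsAsFactors = FALSE
)
report <- check_versions(toy)
print(report)
```

In practice the `requirements` data frame read from R_requirements.txt could be passed in place of `toy`.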
Clone this repository to access the scripts on your local machine:
git clone https://github.com/joey1463/C3-C4.git
cd C3-C4
The R scripts in this repository cover the RNA-seq analysis pipeline for the Rice and Sorghum experiments, including sci-RNA-seq3 demultiplexing, clustering, and visualization. Each script handles one stage of the pipeline.
- Purpose: Initializes the environment by loading necessary R packages.
- Purpose: Processes read counts from sci-RNA-seq3 experiments for 0-12 hour time points, separating data by species and compiling into counts matrices suitable for Seurat analysis.
- Operations:
- Barcode Reading: Differentiates between RT and LIG barcodes, handling variations in barcode length (9 or 10 bp).
- Matrix Generation: Constructs a matrix ensuring all rows are aligned and consistent.
- Data Cleaning: Removes control wells and ensures no duplicate names in the datasets for Sorghum and Rice.
- Exporting Data: Outputs a report detailing the barcode processing.
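As an illustration of the variable barcode length described above, the sketch below matches the start of a read against a whitelist containing both 9 bp and 10 bp barcodes. It is not the repository's implementation: the whitelist, the read layout, the function name `match_lig_barcode`, and the choice to try the 10 bp length first are all assumptions.

```r
# Hypothetical whitelist; real barcodes would come from the experiment's barcode files.
lig_barcodes <- c("ACGTACGTAC", "TTGCAATGC")  # one 10 bp and one 9 bp barcode

# Try the longer barcode length first, then the shorter one (assumed order).
match_lig_barcode <- function(read, whitelist) {
  for (len in c(10L, 9L)) {
    candidate <- substr(read, 1L, len)
    if (candidate %in% whitelist[nchar(whitelist) == len]) {
      return(list(barcode = candidate,
                  length = len,
                  rest = substr(read, len + 1L, nchar(read))))
    }
  }
  # No whitelist match: return the read unchanged
  list(barcode = NA_character_, length = NA_integer_, rest = read)
}

hit10 <- match_lig_barcode("ACGTACGTACGGGG", lig_barcodes)
hit9  <- match_lig_barcode("TTGCAATGCAAAA", lig_barcodes)
miss  <- match_lig_barcode("GGGGGGGGGGGGG", lig_barcodes)
```

Returning the remainder of the read alongside the barcode keeps downstream parsing independent of which barcode length matched.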
- Purpose: Similar to Script 01, but processes data from 48-hour time points.
- Purpose: Integrates sci-RNA-seq3 and 10X-RNA experiments for rice into a single Seurat object.
- Purpose: Similar to Script 03 but for sorghum.
- Purpose: Performs cross-species clustering for rice and sorghum using ortholog matches.
- Purpose: Visualizes the clustered rice atlas from RNA data, supporting subsampling for performance.
- Data: Loads L1_rice_combined.RData (full dataset).
- Purpose: Generates barplots to compare cluster representation across assay types and time points.
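The quantity behind such a barplot is the fraction of nuclei from each assay that falls into each cluster. A minimal base R sketch on toy metadata follows; the column names `cluster` and `assay` and all values are assumptions made for illustration.

```r
# Toy per-nucleus metadata (hypothetical values)
meta <- data.frame(
  cluster = c("0", "0", "1", "1", "1", "2", "0", "1"),
  assay   = c("sci", "sci", "sci", "sci", "10X", "10X", "10X", "10X"),
  stringsAsFactors = FALSE
)

# Count nuclei per assay and cluster, then convert each assay's row to fractions
counts <- table(meta$assay, meta$cluster)
fractions <- prop.table(counts, margin = 1)
print(round(fractions, 2))
```

Passing `t(fractions)` to `barplot()` would draw one stacked bar per assay.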
- Purpose: Identifies the top 200 specific markers for each cluster in the rice data.
- Purpose: Labels cell types in the Rice Atlas and generates dot plots using published markers.
- Data: Loads L1_rice_combined_labelled.RData.
- Purpose: Processes and clusters 10X-RNA nuclei data from the rice mTurquoise line.
- Features:
- Combines data from two replicates.
- Identifies markers.
- Generates plots.
- Purpose: Uses mTurquoise line specific markers to identify bundle sheath cells, comparing with the Rice atlas.
- Features:
- Analysis of marker overlap.
- Module score plot generation.
- Purpose: Re-clusters various cell types including mesophyll, epidermal, and vasculature clusters.
- Cell Classes: Mesophyll, Vasculature, Epidermis
- Purpose: Visualizes clustering results for different rice cell classes.
- Visualized Cell Classes: Vasculature, Mesophyll, Epidermis
- Purpose: Calculates differentially expressed genes across identified cell types.
- Purpose: Analyzes transcriptional profiles and differential expression under etiolated conditions.
- Features:
- Generates and analyzes pseudo-bulked expression profiles.
- Compares T0 and T12 responses.
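Pseudo-bulking sums the counts of all nuclei that share a label, giving one expression profile per group. The sketch below shows the idea in base R on a toy matrix. The gene names, cell labels, and counts-per-million scaling are assumptions, and the repository's scripts may rely on different functions.

```r
set.seed(1)
# Toy genes x cells count matrix (hypothetical data)
counts <- matrix(rpois(4 * 6, lambda = 5), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6)))
cell_type <- c("mesophyll", "mesophyll", "bundle_sheath",
               "bundle_sheath", "mesophyll", "bundle_sheath")

# rowsum() adds up rows by group, so transpose to cells x genes and back
pseudo_bulk <- t(rowsum(t(counts), group = cell_type))

# Optional scaling to counts per million for each pseudo-bulk profile
cpm <- sweep(pseudo_bulk, 2, colSums(pseudo_bulk), "/") * 1e6
print(pseudo_bulk)
```

The same grouping vector could combine cell type and time point, for example with `paste(cell_type, time_point)`, to obtain one profile per cell type and time.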
- Purpose: Subclusters mesophyll cells using differentially expressed genes.
- Features:
- Reclusters using variable features.
- Purpose: Imports pseudo-bulked expression profiles and other relevant data sets.
- Features:
- Handles time-series data.
- Exports gene lists for further analysis.
- Purpose: Analyzes light-responsive genes involved in photosynthesis within specific cell types.
- Features:
- Subsets and analyzes differentially expressed genes.
- Generates heatmaps.
- Purpose: Constructs volcano plots to visualize differentially expressed genes between cell types over time.
- Features:
- Generates volcano plots for 0h and 12h time points.
- Additional scatterplot visualization.
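A volcano plot places each gene by its log2 fold change (x axis) and the negative log10 of its adjusted p-value (y axis). The sketch below prepares those values and labels each gene. The table, its column names, and both cutoffs are assumptions chosen for illustration.

```r
# Toy differential expression table (hypothetical values)
de <- data.frame(
  gene   = paste0("gene", 1:5),
  log2FC = c(2.5, -3.1, 0.2, 1.2, -0.4),
  p_adj  = c(1e-8, 1e-5, 0.6, 0.03, 0.2),
  stringsAsFactors = FALSE
)

# Assumed thresholds, not taken from the repository
lfc_cutoff <- 1
p_cutoff <- 0.05

de$neg_log10_p <- -log10(de$p_adj)
de$class <- ifelse(de$p_adj < p_cutoff & de$log2FC >= lfc_cutoff, "up",
            ifelse(de$p_adj < p_cutoff & de$log2FC <= -lfc_cutoff, "down", "ns"))
print(de)
```

Calling `plot(de$log2FC, de$neg_log10_p, col = factor(de$class))` would then draw the points coloured by class.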
- Description: Generates cell-type specific gene expression profiles for selected candidate genes.
- Key Operations: Gene expression plotting.
- Figures: HY5s and PIFs, supplementary figures.
- Description: Clusters and visualizes expression patterns of genes differentially expressed in response to light.
- Key Operations: Differential expression clustering followed by Z-score plotting.
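Z-scoring each gene across time points removes differences in absolute expression level, so genes cluster by the shape of their response. A minimal sketch follows, using hierarchical clustering as a stand-in. The time point labels, gene names, and clustering method are assumptions.

```r
# Toy genes x time points matrix (hypothetical values and time labels)
expr <- rbind(
  up1   = c(1, 2, 4, 8),
  up2   = c(10, 20, 40, 80),
  down1 = c(8, 4, 2, 1),
  down2 = c(80, 40, 20, 10)
)
colnames(expr) <- c("0h", "2h", "6h", "12h")

# Row-wise Z-score: scale() works on columns, so transpose before and after
z <- t(scale(t(expr)))

# Group genes with similar Z-score trajectories into two clusters
clusters <- cutree(hclust(dist(z)), k = 2)
print(clusters)
```

Here `up1` and `up2` differ tenfold in magnitude but have identical Z-scores, so they fall into the same cluster.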
- Description: Performs ANCOVA analysis on pseudo-bulked transcriptional profiles to analyze differential expression between cell type pairs.
- Description: Visualizes clustered Sorghum Atlas using RNA data.
- Key Operations: Data subsampling for local processing, feature plotting.
- Description: Creates barplots to compare cluster representation across different assay types and time points.
- Description: Identifies top cluster-specific markers for each cluster.
- Key Operations: Marker identification.
- Description: Labels cell types in Sorghum Atlas and generates dot plots using published markers.
- Key Operations: UMAP visualization, dot plot creation.
- Description: Clusters specific cell classes including mesophyll, epidermal, and vasculature clusters.
- Description: Visualizes clustering of cell classes like vasculature, mesophyll, and epidermis.
- Description: Computes differentially expressed genes for each identified cell type.
- Purpose: Computes pseudo-bulk transcriptional profiles for each cell type and identifies differentially expressed genes between mesophyll and bundle sheath cell types under etiolated conditions.
- Key Functions: Average and aggregate expression computation, T0 and T12 response analysis.
- Purpose: Subclusters bundle sheath 10X nuclei from sorghum using differentially expressed genes as variable features.
- Data: Loads reclustered data from L3_sorghum_bundle_sheath_reclustered.RData.
- Purpose: Reads in pseudo-bulked expression profiles, names of photosynthesis-related genes, and transcription factors.
- Operations: Time averaging, gene list export for supplementary materials.
- Purpose: Analyzes light-responsive differentially expressed genes in specific cell types, focusing on photosynthesis genes.
- Output: Generates heatmaps for visual analysis and saves them to heatmap_sup_sorghum.txt.
- Purpose: Constructs volcano plots to visualize differentially expressed genes between mesophyll and bundle sheath cell types at T0 and T12.
- Features: Includes extended data scatterplot.
- Purpose: Creates cell-type specific expression profiles for candidate genes.
- Purpose: Clusters differentially expressed genes by light response to identify dominant expression trends, visualized through Z-score plots.
- Figures: Includes figures for both upregulated and downregulated genes.
- Purpose: Performs ANCOVA analysis on unnormalized pseudo-bulked transcriptional profiles to evaluate differential expression due to cell type and light response.
- Details: Outputs tissue or time factor coefficients and p-values for significant genes.
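An ANCOVA for one gene can be written as a linear model with time as a continuous covariate and cell type as a factor. The sketch below fits such a model to toy data and extracts coefficients and p-values. The values, variable names, and model formula are assumptions and may not match the design used in the scripts.

```r
# Toy pseudo-bulk values for a single gene (hypothetical data)
time <- rep(c(0, 2, 4, 6, 8, 12), times = 2)
cell_type <- factor(rep(c("mesophyll", "bundle_sheath"), each = 6),
                    levels = c("mesophyll", "bundle_sheath"))
noise <- rep(c(0.1, -0.1, 0.1, -0.1, 0.1, -0.1), times = 2)
expr_values <- 2 + 0.5 * time + 3 * (cell_type == "bundle_sheath") + noise

# Time as covariate, cell type as factor (assumed additive model)
fit <- lm(expr_values ~ time + cell_type)
coefs <- summary(fit)$coefficients

result <- data.frame(
  term     = rownames(coefs),
  estimate = coefs[, "Estimate"],
  p_value  = coefs[, "Pr(>|t|)"],
  row.names = NULL
)
print(result)
```

Running the same fit per gene and collecting `result` would give a table of time and cell type coefficients with their p-values, which would then need multiple-testing correction.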
- Purpose: Identifies cell type-specific marker genes across species.
- Purpose: Analyzes overlap of cell type-specific genes across species using orthology datasets.
- Outputs: Heatmaps of significance and gene overlap counts.
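One common way to score the overlap between two marker sets is a hypergeometric test against a shared universe of orthogroups. The sketch below is an illustration only: the function `overlap_test`, the orthogroup identifiers, and the choice of test are assumptions, not the repository's method.

```r
# Hypothetical helper: overlap size and one-sided hypergeometric p-value
overlap_test <- function(set_a, set_b, universe) {
  set_a <- intersect(set_a, universe)
  set_b <- intersect(set_b, universe)
  k <- length(intersect(set_a, set_b))
  # P(overlap >= k) when set_b is drawn at random from the universe
  p <- phyper(k - 1, length(set_a), length(universe) - length(set_a),
              length(set_b), lower.tail = FALSE)
  list(overlap = k, p_value = p)
}

# Toy orthogroup sets (hypothetical identifiers)
universe      <- paste0("OG", 1:100)
rice_markers  <- paste0("OG", 1:10)
sorghum_marks <- paste0("OG", 6:15)

res <- overlap_test(rice_markers, sorghum_marks, universe)
print(res)
```

Applying the function to every pair of cell types would give the two matrices, overlap counts and p-values, that a heatmap could display.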
- Purpose: Generates Sankey plots to display comparisons of cell-type marker genes across species.
- Features:
- Plotting Sankey diagrams.
- Exporting specific lists related to the plots.
- Purpose: Processes cross-species clustered data to label nuclei based on their annotation from the rice or sorghum RNA atlas.
- Features:
- Visualizing unlabelled and labelled UMAPs.
- Conditional labeling based on cell type source.
- Purpose: Identifies differential expression of cell type markers in the bundle sheath cells of rice and sorghum.
- Features:
- Calculation and visualization of gained and lost genes.
- Saving and exporting data subsets for further analysis.
- Purpose: Compares differentially partitioned genes across species.
- Features:
- Handling data partitions and merging by orthogroups.
- Visualizing results through heatmaps and bar plots.
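Merging by orthogroup pairs each rice gene with its sorghum counterpart so their partitioning can be compared. A minimal sketch with toy tables follows. The column names, labels, and one-gene-per-orthogroup simplification are assumptions.

```r
# Toy partitioning calls per species (hypothetical data)
rice <- data.frame(
  orthogroup     = c("OG1", "OG2", "OG3"),
  rice_partition = c("bundle_sheath", "mesophyll", "bundle_sheath"),
  stringsAsFactors = FALSE
)
sorghum <- data.frame(
  orthogroup        = c("OG1", "OG2", "OG4"),
  sorghum_partition = c("bundle_sheath", "bundle_sheath", "mesophyll"),
  stringsAsFactors = FALSE
)

# Inner join: keep only orthogroups present in both species
merged <- merge(rice, sorghum, by = "orthogroup")

# Same partition in both species = consistent, otherwise differential
merged$pattern <- ifelse(merged$rice_partition == merged$sorghum_partition,
                         "consistent", "differential")
print(merged)
```

Orthogroups with several genes per species would produce one row per gene pair in `merge()`, so real data may need an extra summarising step.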
- Purpose: Examines partitioning patterns specifically in mesophyll vs. bundle sheath across species.
- Features:
- Generating heatmaps for differentially and consistently partitioned gene pairs.
- Purpose: Calls peaks using MACS2 on different 10X-multiome libraries for rice and sorghum.
- Features:
- Peak calling with replication management.
- Purpose: Clusters Rice 10X-Multiome RNA data.
- Features:
- Analysis of multiple replicates.
- Assessment of clustering metrics.
- Purpose: Clusters Rice 10X-Multiome ATAC data.
- Features:
- Data merging and object creation.
- Visualization of clustering results.
- Purpose: Combines assembled RNA and ATAC multiome data into a single object to identify cell types.
- Features:
- Comprehensive data assembly and cell type identification.
- Purpose: Compares cell type-specific gene expression markers from multiome data with RNA atlas data.
- Features:
- Identification and comparison of markers.
- Visualization via heatmaps.
- Features:
- Computes mesophyll-specific and bundle sheath-specific genes.
- Visualizes DOF family gene expression patterns in these cell types.
- Features:
- Identifies over-represented cis-regulatory elements responsive to light in each cell type.
- Uses JASPAR database for motif information.
- Features:
- Clusters Sorghum 10X-Multiome RNA data.
- Features:
- Clusters Sorghum 10X-Multiome ATAC data, including peak calling and data merging.
- Features:
- Combines RNA and ATAC data into one object and identifies cell types.
- Features:
- Compares cell type-specific gene expression markers with RNA Atlas data.
- Features:
- Similar to Script 50, adapted for Sorghum.
- Features:
- Similar to Script 51, adapted for Sorghum.
- Features:
- Analyzes gene expression partitioning across species and identifies differentially and consistently partitioned genes.
- Features:
- Overlaps cis-regulatory elements across species, assessing statistical significance.
- Purpose: Compares DOF site counts associated with differentially partitioned genes across species using a binomial test.
- Visualizations: Includes scatterplots for differential and consistent comparisons.
- Output: Exports analysis results.
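A binomial test asks whether the share of sites found in one species departs from an expected proportion. The sketch below uses made-up counts and a null proportion of 0.5. Both are assumptions, and the script may define its expected proportion differently.

```r
# Hypothetical DOF site counts for one gene set
sites_sorghum <- 70
sites_rice <- 30

# Null hypothesis (assumed): sites are equally likely in either species
test <- binom.test(sites_sorghum, sites_sorghum + sites_rice,
                   p = 0.5, alternative = "two.sided")

observed_fraction <- unname(test$estimate)
print(observed_fraction)
print(test$p.value)
```

If promoter lengths or gene numbers differ between species, the null proportion `p` would need to reflect that rather than 0.5.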
- Purpose: Generates heatmaps of cell type-specific accessible chromatin from a multiome dataset.
- Key Features: Visualizes patterns of accessibility and outputs gene names for GO terms analysis.
- Purpose: Analyzes light-responsive changes in chromatin accessibility for selected genes.
- Output: Plots changes and reports statistical significance.
- Purpose: Identifies and ranks enriched motifs within accessible chromatin of differentially partitioned genes.
- Data Management: Includes steps to save and load results for further iteration and analysis.
- Purpose: Counts Dof.2 motifs in differentially partitioned genes using motif scanning.
- Verification: Compares motif count to manual counts as a sanity check.
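The sanity check described above can be imitated with a plain string scan whose result is easy to verify by hand. The sketch below counts a short core sequence on both strands, including overlapping hits. The `AAAG` core stands in for the Dof.2 motif, and `count_motif` and the test sequence are assumptions.

```r
# Hypothetical helper: count a motif on both strands, allowing overlapping hits
count_motif <- function(seq, motif = "AAAG") {
  revcomp <- function(s) {
    paste(rev(strsplit(chartr("ACGT", "TGCA", s), "")[[1]]), collapse = "")
  }
  count_one <- function(pattern) {
    # Zero-width lookahead reports every start position, including overlaps
    hits <- gregexpr(paste0("(?=", pattern, ")"), seq, perl = TRUE)[[1]]
    if (hits[1] == -1) 0L else length(hits)
  }
  forward <- count_one(motif)
  reverse <- count_one(revcomp(motif))
  c(forward = forward, reverse = reverse, total = forward + reverse)
}

# Toy sequence: AAAG starts at positions 3 and 14, CTTT at position 9
motif_counts <- count_motif("TTAAAGCCCTTTAAAAG")
print(motif_counts)
```

A palindromic motif would be counted once per strand by this helper, so its total would need halving.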
- Purpose: Plots accessible chromatin tracks within specific cell types for candidate genes.
- Key Feature: Focuses on genes like GAPDH and NADP-ME.
- Purpose: Similar to the Rice script 61, adapted for Sorghum.
- Data Handling: Includes commands to set working directories and export results.
- Purpose: Assesses changes in chromatin accessibility in response to light conditions.
- Statistical Analysis: Reports p-values and conducts pairwise t-tests.
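Pairwise t-tests compare every pair of conditions and adjust the resulting p-values for multiple testing. A minimal sketch with base R's `pairwise.t.test` follows. The accessibility values, time point labels, and Benjamini-Hochberg adjustment are assumptions.

```r
# Toy accessibility measurements, three replicates per condition (hypothetical)
accessibility <- data.frame(
  value = c(1.0, 1.2, 0.9, 2.0, 2.3, 2.1, 3.9, 4.2, 4.0),
  condition = factor(rep(c("0h", "2h", "12h"), each = 3),
                     levels = c("0h", "2h", "12h"))
)

# All pairwise comparisons with pooled SD and BH-adjusted p-values
res <- pairwise.t.test(accessibility$value, accessibility$condition,
                       p.adjust.method = "BH")
print(res$p.value)
```

`res$p.value` is a lower-triangular matrix, with one cell per pair of conditions and `NA` elsewhere.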
- Purpose: Analyzes enriched motifs in Sorghum, similar to the corresponding Rice script.
- Purpose: Counts Dof.2 motifs in Sorghum, ensuring consistency with similar analyses in Rice.
- Purpose: Plots accessible chromatin tracks within mesophyll and bundle sheath cell types for candidate genes in Sorghum, to visualize differences in chromatin accessibility between these cell types.
- Main Functions:
- Read in data
- Call peaks
- Plot for genes like GAPDH and NADP-ME
- Purpose: Computes and analyzes motif enrichment for genes that are consistently partitioned into the bundle sheath in both rice and sorghum. This script extends the analysis to include orthologs from other C3 grasses.
- Main Functions:
- Read in required data for multiple species (Chasmanthium, Rice, Sorghum, Barley, Brachypodium)
- Use AME from the MEME suite to assess cis-regulatory enrichment within their promoters
- Plot and export outcomes