Data processing scripts for polyploid A. thaliana transcriptome analysis.

barneypotter24/ploidy-seq

Ploidy-seq data analysis

DISCLAIMER: Don't do any of this yet. I haven't debugged everything, so it probably won't work.

First, we will need to create an instance on Jetstream Atmosphere and connect to iRODS. Instructions for both are available on Mick Song's blog: https://michaelsongagradstudent.github.io/blog/2017/04/12/Cheat_Sheet_Atmosphere

Once we are in our Atmosphere web shell, we first want to install the package manager conda:

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

We will need to re-initialize our web shell:

source ~/.bashrc
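
To confirm that the reload worked, a quick check like the one below may help. This is only a minimal sketch: the check_tool helper is a hypothetical name and is not part of this repository.

```shell
# check_tool is a hypothetical helper, not part of this repository.
# It reports whether a command can be found on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
        return 1
    fi
}

# If conda is missing, opening a new shell or re-sourcing ~/.bashrc may fix it.
check_tool conda || echo "try opening a new shell or re-running: source ~/.bashrc"
```

The same helper can be pointed at other programs, such as snakemake, once the ploidy-seq environment has been activated.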

Next, clone the repository and cd into it:

git clone https://github.com/barneypotter24/ploidy-seq.git
cd ploidy-seq

From there, we can install all the software that we will need for the analysis and migrate our data from the CyVerse data store. All of this can be done by running the command:

bash setup.sh

At several points during the installation, we will be prompted to accept the installation of programs that we will use. Just hit the return key to accept the installation. Once the command is done running we will have a few things:

  • populated directories for all of our raw and reference data
  • a new conda environment called ploidy-seq, in which all of our programs are installed
  • empty directories that will store temporary files used during the pipeline, as well as our pipeline output

We will activate our new environment by running:

source activate ploidy-seq

This gives us access to all the programs that we need to continue.

Now, we can test that everything is correctly installed and all of our data is living in the correct place:

snakemake -n

If no errors come up, we can start our analysis. Note that analysis will run on every file that is listed in config.json, and it may take a long time.

snakemake

All output should end up in the htseq and fastqc folders. No intermediate files are kept, which keeps disk usage to a minimum.

Finally, upload the HTSeq output files back to iPlant:

cd htseq/
iput -bf *.txt /iplant/home/jcoate/Arabidopsis/2017/HTSeq/
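
Before running the upload, it may be worth checking that the htseq folder actually contains output files. The sketch below assumes that the outputs are the .txt files matched by the iput command and that it is run from the repository root. The count_txt helper is a hypothetical name, not part of this repository.

```shell
# count_txt is a hypothetical helper, not part of this repository.
# It prints how many .txt files sit directly inside the given directory.
count_txt() {
    find "$1" -maxdepth 1 -type f -name '*.txt' | wc -l | tr -d ' '
}

# Assumes we are in the repository root, where snakemake wrote htseq/.
if [ -d htseq ]; then
    echo "htseq contains $(count_txt htseq) .txt file(s)"
else
    echo "htseq/ not found; run snakemake first"
fi
```

A count of zero would suggest the pipeline did not finish, in which case there is nothing for iput to transfer.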
