A workflow and toolkit of perl scripts for selecting rev-env variants for synthesis

MullinsLab/AMP-pl-variant-selection

AMP-pl-variant-selection

A collection of Perl scripts for selecting rev-env (RE) variants for synthesis. Variants are selected using a 9.5% amino acid (AA) frequency cutoff for gp160, and a consensus rev is appended to the selected sequence.

IMPORTANT: This workflow will only process rev-env-nef (REN) amplicons in its current state. Processing for the gag-pol (GP) amplicons may be pursued in the future.

Setup and Usage

A Snakemake workflow is provided for running the scripts. The postproc/ directory should have the following structure:

postproc/
  dataset-1/
    sample-1/
      sample-1.fasta
    sample-2/
      sample-2.fasta
  dataset-2/
    sample-1/
      sample-1.fasta
    ...

The input .fasta files (i.e. sample-1.fasta and sample-2.fasta above) should contain single-genome nucleotide sequences. The result is the same whether or not the input is aligned, though BLAST may print a warning about gap characters. The input sequences can be uppercase or lowercase.
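Before running the workflow, it can help to confirm that postproc/ really follows this layout. The following is a minimal sketch, not part of the toolkit: `scan_postproc` is a hypothetical helper that lists, per dataset, the samples that have a matching `<sample>/<sample>.fasta` file. The demo builds a throwaway directory tree rather than touching real data.

```python
from pathlib import Path
import tempfile


def scan_postproc(root):
    """Map each dataset directory to the samples that have <sample>/<sample>.fasta.

    Hypothetical helper for checking the layout; not part of the workflow.
    """
    found = {}
    for dataset_dir in sorted(Path(root).iterdir()):
        if not dataset_dir.is_dir():
            continue
        samples = [
            sample_dir.name
            for sample_dir in sorted(dataset_dir.iterdir())
            if sample_dir.is_dir()
            and (sample_dir / f"{sample_dir.name}.fasta").is_file()
        ]
        if samples:
            found[dataset_dir.name] = samples
    return found


# Demo on a temporary tree that mirrors the layout described above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "postproc"
    for dataset, sample in [
        ("dataset-1", "sample-1"),
        ("dataset-1", "sample-2"),
        ("dataset-2", "sample-1"),
    ]:
        sample_dir = root / dataset / sample
        sample_dir.mkdir(parents=True)
        (sample_dir / f"{sample}.fasta").write_text(">seq1\nACGT\n")
    layout = scan_postproc(root)
    print(layout)
```

Samples whose .fasta file is missing or named differently from their directory are simply left out of the result, which makes mismatches easy to spot.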

If you are running the porpid-postproc workflow, you can simply copy over its postproc/ directory. If the workflow does not pick up the right sequence collections, you can edit the input line of the analyze_variants rule in the Snakefile:

rule analyze_variants:
    input:
        "postproc/{dataset}/{sample}/{sample}.fasta" #add porpid_postproc() here if running subworkflow
    params:
        p = variant_params
...

To match differently named input files, add your suffix of choice after the last {sample} wildcard.

Next, specify which samples you want to run in the config.yaml file, e.g.

dataset-1:
  - sample-1
  - sample-2
dataset-2:
  - sample-1

Finally, verify that the path to blast at the top of the Snakefile is correct.

configfile: "config.yaml"
BLAST = "/usr/local/bin" #configure path to blast executable
DATASETS = [d for d in config for s in config[d]]
SAMPLES = [s for d in config for s in config[d]]
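The two list comprehensions build parallel lists: each dataset name is repeated once per sample, so position i in DATASETS belongs with position i in SAMPLES. The sketch below reproduces them in plain Python, with a dict standing in for the parsed config.yaml. How the Snakefile consumes the lists is not shown here, so the pairing with `zip` is an assumption about their intended use.

```python
# Stand-in for the parsed config.yaml shown earlier.
config = {"dataset-1": ["sample-1", "sample-2"], "dataset-2": ["sample-1"]}

# Same comprehensions as in the Snakefile.
DATASETS = [d for d in config for s in config[d]]
SAMPLES = [s for d in config for s in config[d]]

# Pairing the lists element by element recovers each (dataset, sample) combination.
pairs = list(zip(DATASETS, SAMPLES))
for dataset, sample in pairs:
    print(f"postproc/{dataset}/{sample}/{sample}.fasta")
```

Because the lists are paired by position, they need to be combined element-wise and not as a full cross product, which would also produce dataset and sample combinations that are not in the config.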

Execute the workflow by running snakemake from the command line. The results are written to the variant_analysis/ directory. The principal output files are the .fasta files selected for synthesis (which end in a variant number, like *00[0-9]_s.fasta) and a log file ending in *_log.txt. There is also a sample_summary.csv file for each submitted dataset with relevant statistics for each sample.
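After a run, the principal outputs can be collected with the filename patterns given above. This is a minimal sketch: `principal_outputs` is a hypothetical helper, the directory layout inside variant_analysis/ is not documented here (hence the recursive search), and the file names in the demo are invented for illustration.

```python
from pathlib import Path
import tempfile


def principal_outputs(root="variant_analysis"):
    """Collect the principal output files by filename pattern (recursive search)."""
    root = Path(root)
    return {
        "synthesis_fastas": sorted(root.rglob("*00[0-9]_s.fasta")),
        "logs": sorted(root.rglob("*_log.txt")),
        "summaries": sorted(root.rglob("sample_summary.csv")),
    }


# Demo with invented file names in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "variant_analysis" / "dataset-1"
    out_dir.mkdir(parents=True)
    for name in [
        "sample-1_001_s.fasta",  # hypothetical synthesis file
        "sample-1_all.fasta",    # hypothetical file that should not match
        "sample-1_log.txt",
        "sample_summary.csv",
    ]:
        (out_dir / name).write_text("")
    outputs = principal_outputs(Path(tmp) / "variant_analysis")
    found_names = {kind: [p.name for p in paths] for kind, paths in outputs.items()}
    print(found_names)
```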
