Simple scripts to perform basic FastX file manipulations
Copyright Jackson M. Tsuji, 2022
- All scripts require the basic-sequence-analysis-version helper script to run.
- A few scripts have additional dependencies (e.g., seqtk, bbmap, and so on). These are mentioned in the help statement of each script.
View each script's help statement by running the script. Many of the simpler scripts support STDIN/STDOUT.
- check_md5_hashes.sh: tests whether .fastq.gz files have the same MD5 hash as when downloaded (e.g., from a webserver)
- download_NCBI_genomes.sh: downloads genomes from NCBI given a list of search queries
- fasta_dealign.sh: dealigns a FastA file
- fasta_U_T_swap.sh: converts between U's and T's in FastA files
- fastq_get_names.sh: grabs names from a FastQ file
- fastx_subset.sh: subsets a batch of FastX files (FastA/FastQ) to a given number or proportion of reads
- phylogeny_builder_16S_rRNA.sh: builds a 16S rRNA gene tree subset to a particular part of the 16S gene, starting from unaligned sequences
- seq_name_simplify.sh: removes special characters and such from FastA files
- text_find_and_replace.sh: finds and replaces specific text entries in an input text file
- predict_short_orfs.sh: predicts open reading frames (ORFs) from short read metagenome data by wrapping bbmap's reformat.sh and FGS++
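
To give a feel for the kind of STDIN/STDOUT manipulation the simpler scripts perform, here is a minimal sketch of two such operations. It is not taken from the repository's code: the function names `fastq_names` and `fasta_u_to_t` are hypothetical, and the FastQ helper assumes unwrapped records of exactly four lines each.

```shell
#!/usr/bin/env bash
# Illustrative helpers only; names and behaviour are assumptions,
# not the implementation used by the scripts listed above.

# Print read names from a FastQ stream on STDIN.
# Assumes 4 lines per record; strips the leading '@' and any
# description after the first whitespace.
fastq_names() {
  awk 'NR % 4 == 1 { sub(/^@/, "", $1); print $1 }'
}

# Convert U/u to T/t in the sequence lines of a FastA stream on STDIN.
# Header lines (starting with '>') are left untouched.
fasta_u_to_t() {
  sed '/^>/!y/Uu/Tt/'
}
```

With these functions sourced into a shell, something like `gunzip -c reads.fastq.gz | fastq_names` would list read names (`reads.fastq.gz` is a placeholder file name). For real work, use the repository scripts and follow their help statements.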

- test_data: for testing some of the scripts. Still in progress.
- tutorials: explaining how to install/use the more complex scripts. Still in progress.

These scripts are 'quick and dirty' and do not carefully check user input, so follow the usage instructions closely or look at the code to see how the tools work (the workflow is very straightforward for most). Please let me know if you have questions or run into issues.