This is a simple command-line utility for working with FASTA-formatted biological sequence collections, inspired by goalign and built with bashly.
The idea is to have very simple operations (mainly using awk) so they can be executed on a per-sequence basis, eliminating the need to load the whole file into memory. This allows the user to operate on very large FASTA files.
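To illustrate the per-sequence streaming idea, here is a minimal sketch of an awk filter that reports the length of each sequence while holding only one record's running total in memory. The function name `seq_lengths` is hypothetical and this is not fastatools' actual implementation, only an example of the general approach.

```shell
# Hypothetical helper (not part of fastatools): print "name<TAB>length"
# for each FASTA record, reading from files given as arguments or from stdin.
seq_lengths() {
  awk '
    /^>/ {
      # A new header closes the previous record, if there was one.
      if (seen) print name "\t" len
      name = substr($0, 2)   # drop the leading ">"
      len = 0
      seen = 1
      next
    }
    # Sequence lines may be wrapped, so lengths are summed per record.
    { len += length($0) }
    END { if (seen) print name "\t" len }
  ' "$@"
}

# Example: two records, the first wrapped over two lines.
printf '>seq1\nACGT\nAC\n>seq2\nGGG\n' | seq_lengths
```

Because awk only keeps the current name and a counter, memory use stays constant regardless of how large the input file is.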
This tool is distributed as a shell script, so if you download the fastatools script it should work seamlessly on *NIX systems.
For now, you can refer to the generated help message by using fastatools help or fastatools [command] help, or to the definition file.
This is designed to be pipeable, so the default IO is standard input and output. However, for all commands the -i or --input flag can be used to specify an input file, and for most commands the -o or --output flag can be used to specify an output file.
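The stdin/stdout default with optional -i and -o overrides can be sketched as follows. This is an illustrative assumption about how such a command might behave, not the code bashly generates for fastatools; the function name `count_seqs` is hypothetical.

```shell
# Hypothetical sketch (not fastatools' real code): count FASTA records,
# reading stdin and writing stdout unless -i/--input or -o/--output is given.
count_seqs() {
  local input="-"    # awk treats the file operand "-" as standard input
  local output=""
  while [ $# -gt 0 ]; do
    case "$1" in
      -i|--input)  input="$2";  shift 2 ;;
      -o|--output) output="$2"; shift 2 ;;
      *) echo "unknown option: $1" >&2; return 1 ;;
    esac
  done
  if [ -n "$output" ]; then
    awk '/^>/ { n++ } END { print n + 0 }' "$input" > "$output"
  else
    awk '/^>/ { n++ } END { print n + 0 }' "$input"
  fi
}

# Piped input, result on standard output.
printf '>a\nAC\n>b\nGT\n' | count_seqs
```

With this shape, the same command works at any position in a pipeline or directly against files.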
This is a short list presenting the available fastatools commands.
count
: Get the number of sequences
names
: Get names of sequences
length
: Get lengths of sequences
freqs
: Get character frequencies in sequences
select
: Select sequences in FASTA file by name
subset
: Select sequences in FASTA file by index
head
: Print first n sequences
tail
: Print last n sequences
subsite
: Select specific sites in aligned sequences
upper
: Transform sequences to uppercase
lower
: Transform sequences to lowercase
pretty
: Pretty print FASTA file, wrapping sequences to desired width
rc
: Reverse complement sequences
rename
: Rename sequences in FASTA file
addid
: Add an identifier to each sequence name
split
: Split a FASTA file into several FASTA files
completion
: Generate BASH completion script (auto-generated by bashly)
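As an example of how a transformation such as rc can still run one record at a time, the sketch below reverse complements each sequence with awk. The function name `revcomp` is hypothetical, and the handling of characters other than A, C, G and T (passed through unchanged here) is an assumption that may differ from what fastatools actually does.

```shell
# Hypothetical sketch (not fastatools' real code): reverse complement each
# FASTA record, reading from files given as arguments or from stdin.
revcomp() {
  awk '
    # Emit the buffered record with its sequence reversed and complemented.
    function flush(   i, c, out) {
      if (!seen) return
      out = ""
      for (i = length(seq); i >= 1; i--) {
        c = substr(seq, i, 1)
        # Assumption: characters without a mapping are left unchanged.
        out = out ((c in comp) ? comp[c] : c)
      }
      print header
      print out
    }
    BEGIN {
      comp["A"] = "T"; comp["T"] = "A"; comp["C"] = "G"; comp["G"] = "C"
      comp["a"] = "t"; comp["t"] = "a"; comp["c"] = "g"; comp["g"] = "c"
    }
    /^>/ { flush(); header = $0; seq = ""; seen = 1; next }
    { seq = seq $0 }   # join wrapped sequence lines
    END { flush() }
  ' "$@"
}

# Example: a single record wrapped over two lines.
printf '>s1\nAACG\nT\n' | revcomp
```

Only the current record is buffered, so memory use is bounded by the longest single sequence rather than by the size of the file.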