Home
Latest release is version 0.6.2.
While every full-featured shell provides job control, it is only meant for manual, interactive handling of several jobs, and not much more. prll (pronounced "parallel") was created to simplify a common task of running a large number of jobs a few at a time. Its distinguishing feature is the ability to run shell functions in the context of the current shell. See the features summary below for a quick overview.
If you have a bunch of files to process, a loop is what you need. However, if you have a multicore/multiprocessor machine, it is much more efficient to run as many processes in parallel as there are CPUs available. While a minor extension to the loop might be adequate, it is not the most efficient solution. This article describes how to do parallel execution using a loop, or using the shell's notion of a job, and the shortcomings of both methods. It also describes prll's predecessor, which was called mapp, and on which prll is based. In the end, they do the same thing, but use different means of interprocess communication.
prll is implemented as a shell function, with helper programs written in C. While there are other ways to tackle the problem, like using the xargs utility, and while many are "saner" in some sense, having a shell function has a distinct advantage: you don't need to write any scripts or programs. Implement your task as a shell function, and prll will run it using the context of your current shell. This makes one-off commands possible without having to put them into script files, which would be too bothersome. As an example, to flip all photos in the current directory, just do:
myfn() { mogrify -flip "$1" ; }
prll myfn *.jpg
With version 0.3 or later, you can even do just:
prll -s 'mogrify -flip "$1"' *.jpg
For comparison, here is the same thing with a non-parallel loop:
for i in *.jpg ; do
    mogrify -flip "$i"
done
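The "minor extension to the loop" mentioned earlier can be sketched roughly as follows. This is only an illustration of the general idea, not the code from the article or from prll. The names `process` and `run_batched` are hypothetical, and `process` just prints its argument where a real task such as `mogrify -flip "$1"` would go.

```shell
# Hypothetical stand-in for real work such as: mogrify -flip "$1"
process() {
    echo "done: $1"
}

# Run process on every remaining argument, at most $1 jobs at a time.
# Jobs are started in batches: once a batch is full, the loop waits for
# the whole batch to finish before starting the next one.
run_batched() {
    max=$1
    shift
    running=0
    for arg in "$@"; do
        process "$arg" &
        running=$((running + 1))
        if [ "$running" -ge "$max" ]; then
            wait            # waits for ALL background jobs of this shell
            running=0
        fi
    done
    wait                    # wait for the last, possibly partial, batch
}

run_batched 2 a.jpg b.jpg c.jpg
```

The weakness of this sketch is visible in the `wait` call: one slow job holds up the entire batch, so some CPUs sit idle until it finishes. The output of jobs within a batch may also appear in any order.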
prll also has an xargs-like ability to read standard input, with both newline and null separators, which enables processing of data that is harder to quote. The difference from xargs is that prll is fed a shell function, making interactive use easier. xargs takes a simple command, and complex commands must be wrapped in a script or in bash -c or such. Also, parallel execution in xargs must be specified separately, while prll reads the number of CPUs automatically. Not to mention that xargs is prone to data loss when doing parallel execution, while prll features full output buffering and locking, which prevents that. Please note that this is not a rant against xargs. xargs is simply not a tool for parallel execution; it is a tool for constructing argument lists for other programs, and it cooperates with prll.
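To make the comparison concrete, here is a rough sketch of the xargs side of it: null-separated input, a complex command wrapped in `sh -c`, and an explicitly requested job count. The `-0` and `-P` options are extensions found in GNU and BSD xargs rather than POSIX features, and the function name `demo_xargs` and the file names are made up for the demonstration.

```shell
# Feed three null-separated names (one with a space, one with a quote)
# to xargs, which runs up to two sh processes at a time.
demo_xargs() {
    printf '%s\0' 'a b.jpg' "it's.jpg" 'c.jpg' |
        xargs -0 -n 1 -P 2 sh -c 'printf "processing <%s>\n" "$1"' sh
}

demo_xargs
```

The trailing `sh` fills `$0` of the wrapped command, so each name arrives as `$1`. The parallel jobs write straight to the same standard output, so the order of the lines is not fixed, and the job count (`-P 2`) has to be chosen by hand.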
The shell function you write can be anything. The manual has an example of a function that takes more than one argument. Also, if you use ssh, preferably with key-based authentication and ssh-agent, you can use prll to handle execution over several machines, forming an ad-hoc cluster (but see below for an alternative).
- Easy to use. Focuses on a single task and doesn't try to emulate a kitchen sink.
- Code is passed in shell functions to ease interactive use.
- Should work in all Bourne-like shells and in several operating systems.
- Execution can be terminated gracefully, letting started jobs finish their work.
- Can be terminated from within the code it executes, easing aborting on errors or implementing an ad-hoc parallel search.
- Does internal buffering and locking to prevent mangling/interleaving of output from separate jobs.
- User-accessible locks to easily share resources between processes.
While there are many other programs available to run parallel jobs, none (to the author's knowledge) allow running shell functions. On the other hand, there are times when prll simply isn't enough. Its networking abilities, for example, are limited by what the user manages to put into the function they execute. Using other computers on the network as additional processors (i.e. automatically fetching the number of physical CPUs from other computers and launching jobs accordingly) is not trivial.
When the task at hand requires a more featureful tool, you should take a look at GNU parallel. It won't run shell functions, of course, but in a networked environment, that doesn't make much sense anyway.
- Bourne-like shell, such as bash, zsh, dash and others (even busybox!)
- C compiler, such as gcc
- GNU make
- OS support for System V Message Queues and Semaphores
- device files /dev/urandom or /dev/random
- the cat utility
- optional tests require utilities ps, tr, grep, sort, split, diff and uname
- optional rebuilding of the man page requires txt2man
These requirements should be satisfied by your system by default, excepting perhaps the compiler and its toolchain, which are not installed by default on systems such as Ubuntu Linux. Refer to your system's documentation on how to install missing programs.
Optionally (on Linux), the /proc/cpuinfo file can be used to automatically determine the number of processors, but it is not mandatory.
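As an illustration of how a processor count can be read from /proc/cpuinfo, here is a small sketch. It is not necessarily how prll itself does it. The function name `cpu_count` is hypothetical, the `getconf _NPROCESSORS_ONLN` fallback is an assumption that holds on Linux and Mac OS X but not everywhere, and the pattern assumes the common layout with one `processor` line per logical CPU.

```shell
# Print the number of processors, or 1 if it cannot be determined.
cpu_count() {
    n=
    if [ -r /proc/cpuinfo ]; then
        # Linux: count lines starting with "processor" (one per logical CPU
        # on most architectures).
        n=$(grep -c '^processor' /proc/cpuinfo)
    fi
    if [ -z "$n" ] || [ "$n" -lt 1 ]; then
        # Assumed fallback for systems without a usable /proc/cpuinfo.
        n=$(getconf _NPROCESSORS_ONLN 2>/dev/null)
    fi
    # Anything empty or non-numeric falls back to a single processor.
    case $n in
        ''|*[!0-9]*) n=1 ;;
    esac
    [ "$n" -ge 1 ] || n=1
    echo "$n"
}

echo "processors: $(cpu_count)"
```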
prll passes basic tests on the following operating systems: GNU/Linux, FreeBSD, OpenBSD, Mac OS X, and Solaris versions 8-10.
See the manual for the list of issues known at the time of release. There were no further bugs reported for the current version.