Jure Varlec edited this page Mar 13, 2015 · 1 revision

Latest release is version 0.6.2.

Description

While every full-featured shell provides job control, it is only meant for manual, interactive handling of several jobs, and not much more. prll (pronounced "parallel") was created to simplify a common task of running a large number of jobs a few at a time. Its distinguishing feature is the ability to run shell functions in the context of the current shell. See the features summary below for a quick overview.

If you have a bunch of files to process, a loop is what you need. However, if you have a multicore/multiprocessor machine, it is much more efficient to run as many processes in parallel as there are CPUs available. While a minor extension to the loop might be adequate, it is not the most efficient solution. This article describes how to do parallel execution using a loop, or using the shell's notion of a job, and the shortcomings of both methods. It also describes prll's predecessor, which was called mapp, and on which prll is based. In the end, they do the same thing, but use different means of interprocess communication.

prll is implemented as a shell function, with helper programs written in C. While there are other ways to tackle the problem, like using the xargs utility, and while many are "saner" in some sense, having a shell function has a distinct advantage: you don't need to write any scripts or programs. Implement your task as a shell function, and prll will run it using the context of your current shell. This makes one-off commands possible without having to put them into script files, which would be too bothersome. As an example, to flip all photos in the current directory, just do

myfn() { mogrify -flip "$1" ; }
prll myfn *.jpg

With version 0.3 or later, you can even do just

prll -s 'mogrify -flip "$1"' *.jpg

For comparison, here is the same thing with a non-parallel loop:

for i in *.jpg ; do
  mogrify -flip "$i"
done
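
The "minor extension to the loop" mentioned earlier can be sketched with plain job control. This is only an illustration, not how prll works: the name run_batched and the batch size are made up for the example. It starts a batch of background jobs and waits for the whole batch before starting the next one, so a single slow job leaves the other CPUs idle. That is the inefficiency noted above.

```shell
# run_batched N CMD ARGS...: run CMD once per ARG, N jobs at a time.
# Hypothetical helper for illustration only; it is not part of prll.
run_batched() {
  batch=$1 ; cmd=$2 ; shift 2
  running=0
  for arg in "$@" ; do
    "$cmd" "$arg" &              # start the job in the background
    running=$((running + 1))
    if [ "$running" -ge "$batch" ] ; then
      wait                       # block until the whole batch has finished
      running=0
    fi
  done
  wait                           # collect the final, possibly partial, batch
}

flip() { mogrify -flip "$1" ; }
# Usage, four jobs at a time:
#   run_batched 4 flip *.jpg
```

Because shell functions run in background subshells, the flip function can be used directly, but nothing here buffers output or keeps all CPUs busy between batches.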

prll also has an xargs-like ability to read standard input, with either newline or null separators, which enables processing of data that is harder to quote. The difference from xargs is that prll is fed a shell function, making interactive use easier. xargs takes a simple command, so complex commands must be wrapped in a script or in something like bash -c. Also, parallel execution in xargs must be requested explicitly, while prll reads the number of CPUs automatically. Finally, xargs is prone to data loss when doing parallel execution, while prll features full output buffering and locking, which prevents that. Please note that this is not a rant against xargs. xargs is simply not a tool for parallel execution; it is a tool for constructing argument lists for other programs, and it cooperates with prll.
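
To make the comparison concrete, here is roughly what the xargs route looks like. It is a sketch that assumes an xargs supporting the -0 and -P options (the GNU and BSD versions do, but POSIX does not require them). The helper name each_jpg and the job count are assumptions made for the example.

```shell
# each_jpg DIR JOBS SNIPPET: run SNIPPET, with the file name in $1,
# for every .jpg file under DIR, at most JOBS at a time.
# Hypothetical helper for illustration only.
each_jpg() {
  find "$1" -name '*.jpg' -print0 |
    xargs -0 -n 1 -P "$2" sh -c "$3" sh   # the snippet must be wrapped in sh -c
}

# Usage, four jobs at a time:
#   each_jpg . 4 'mogrify -flip "$1"'
```

Note that the snippet runs in a fresh sh process, so functions defined in the interactive shell are not visible to it, and the number of jobs has to be given by hand.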

The shell function you write can be anything. The manual has an example of a function that takes more than one argument. Also, if you use ssh, preferably with key-based authentication and ssh-agent, you can use prll to handle execution over several machines, forming an ad-hoc cluster (but see below for an alternative).

Summary of features

  • Easy to use. Focuses on a single task and doesn't try to emulate a kitchen sink.
  • Code is passed in shell functions to ease interactive use.
  • Should work in all Bourne-like shells and on several operating systems.
  • Execution can be terminated gracefully, letting started jobs finish their work.
  • Can be terminated from within the code it executes, easing aborting on errors or implementing an ad-hoc parallel search.
  • Does internal buffering and locking to prevent mangling/interleaving of output from separate jobs.
  • User-accessible locks to easily share resources between processes.

Alternatives

While there are many other programs available to run parallel jobs, none (to the author's knowledge) allow running shell functions. On the other hand, there are times when prll simply isn't enough. Its networking abilities, for example, are limited by what the user manages to put into the function they execute. Using other computers on the network as additional processors (i.e. automatically fetching the number of physical CPUs from other computers and launching jobs accordingly) is not trivial.

When the task at hand requires a more featureful tool, you should take a look at GNU parallel. It won't run shell functions, of course, but in a networked environment, that doesn't make much sense anyway.

Requirements

  • Bourne-like shell, such as bash, zsh, dash and others (even busybox!)
  • C compiler, such as gcc
  • GNU make
  • OS support for System V Message Queues and Semaphores
  • device files /dev/urandom or /dev/random
  • the cat utility
  • optional tests require the utilities ps, tr, grep, sort, split, diff and uname
  • optional rebuilding of the man page requires txt2man

These requirements should be satisfied by your system by default, excepting perhaps the compiler and its toolchain, which are not installed by default on systems such as Ubuntu Linux. Refer to your system's documentation on how to install missing programs.

Optionally (on Linux), the /proc/cpuinfo file can be used to automatically determine the number of processors, but it is not mandatory.
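
As an illustration of how a processor count can be obtained this way, here is a minimal sketch. It is not prll's actual detection code: the function name cpu_count is made up, and the getconf fallback is an assumption that holds on many systems but is not guaranteed everywhere.

```shell
# cpu_count: print the number of processors, defaulting to 1.
# Hypothetical helper for illustration only; it is not part of prll.
cpu_count() {
  n=0
  if [ -r /proc/cpuinfo ] ; then
    # Count the "processor" entries; grep -c returns non-zero when none match.
    n=$(grep -c '^processor' /proc/cpuinfo) || n=0
  fi
  if [ "$n" -lt 1 ] ; then
    # Fallback for systems without a usable /proc/cpuinfo.
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
  fi
  case $n in
    ''|*[!0-9]*) n=1 ;;          # anything that is not a number becomes 1
  esac
  [ "$n" -ge 1 ] || n=1
  echo "$n"
}

cpu_count
```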

prll passes basic tests on the following operating systems: GNU/Linux, FreeBSD, OpenBSD, Mac OS X, and Solaris versions 8-10.

Known issues / bugs

See the manual for the list of issues known at the time of release. No further bugs have been reported for the current version.
