MPL-EM (MPL*) is a compiler for Standard ML that supports nested (fork-join) parallelism. It builds on the MPL compiler and generates executables with excellent multicore performance by using the theory of disentanglement to manage the memory of parallel programs [1,2,3,4,5].
Roughly speaking, the disentanglement property states that an object allocated by a thread may not be shared with a concurrently executing thread. MPL-EM exploits disentanglement at the granularity of memory objects. Specifically, it distinguishes between disentangled and entangled objects and handles disentangled objects very efficiently, while incurring modest overhead for entangled objects, which are rare.
We note, for the artifact reviewers, that MPL does not support entanglement, but MPL-EM (the language presented in the paper) does. We are in the process of merging MPL-EM and MPL. In the meantime, the two live on separate branches of the same repo: MPL is on the master branch and MPL-EM is on pldi23-artifact.
MPL extends SML with a number of primitives for parallelism and concurrency.
To run some examples directly, please read examples/README.md, which provides instructions for running programs.
val par: (unit -> 'a) * (unit -> 'b) -> 'a * 'b
val parfor: int -> (int * int) -> (int -> unit) -> unit
val alloc: int -> 'a array
The par primitive takes two functions to execute in parallel and returns their results.
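As a minimal sketch of how par might be used, the following computes Fibonacci numbers by running the two recursive calls in parallel. It assumes the primitive is exposed as ForkJoin.par (the ForkJoin structure loaded by fork-join.mlb); the function name fib is only illustrative.

```sml
(* Parallel Fibonacci: a sketch, assuming par is available as ForkJoin.par. *)
fun fib n =
  if n < 2 then n
  else
    let
      (* Run both recursive calls in parallel and wait for both results. *)
      val (a, b) = ForkJoin.par (fn () => fib (n - 1), fn () => fib (n - 2))
    in
      a + b
    end

val result = fib 20
val () = print (Int.toString result ^ "\n")
```

In practice one would likely switch to a sequential version below some small n to keep the cost of spawning tasks low.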
The parfor primitive is a "parallel for loop". It takes a grain-size g, a range (i, j), and a function f, and executes f(k) in parallel for each i <= k < j. The grain-size g is for manual granularity control: parfor splits the input range into approximately (j-i)/g subranges, each of size at most g, and each subrange is processed sequentially. The grain-size must be at least 1; with g = 1 the loop is "fully parallel".
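To illustrate the grain-size parameter, here is a small sketch that fills an array with squares. It assumes the primitive is exposed as ForkJoin.parfor; the names n and squares are hypothetical.

```sml
(* Fill an array with squares in parallel: a sketch, assuming ForkJoin.parfor. *)
val n = 1000
val squares = Array.array (n, 0)

(* Grain-size 100 over the range (0, 1000): roughly 10 subranges of at most
 * 100 indices each, with each subrange processed sequentially. *)
val () =
  ForkJoin.parfor 100 (0, n) (fn k => Array.update (squares, k, k * k))
```

Each iteration writes only to its own index k, so the iterations do not interfere with one another.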
The alloc primitive takes a length and returns a fresh, uninitialized array of that size. Warning: to guarantee no errors, the programmer must be careful to initialize the array before reading from it. alloc is intended to be used as a low-level primitive in the efficient implementation of high-performance libraries. It is integrated with the scheduler and memory management system to perform allocation in parallel and be safe-for-GC.
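A library-style use of alloc could look like the following sketch of a parallel tabulate, which initializes every cell before the array is returned. It assumes the primitives are exposed as ForkJoin.alloc and ForkJoin.parfor; the function name tabulate and the grain-size of 1000 are illustrative choices, not part of MPL's documented API.

```sml
(* Parallel tabulate: a sketch, assuming ForkJoin.alloc and ForkJoin.parfor. *)
fun tabulate (n: int, f: int -> 'a) : 'a array =
  let
    (* Fresh, uninitialized array: no cell may be read yet. *)
    val a : 'a array = ForkJoin.alloc n
    (* Write every index in [0, n) exactly once before returning. *)
    val () = ForkJoin.parfor 1000 (0, n) (fn k => Array.update (a, k, f k))
  in
    a
  end

val evens = tabulate (10, fn k => 2 * k)
```

Because the array only escapes after the loop has written every index, callers never observe an uninitialized cell.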
val compareAndSwap: 'a ref -> ('a * 'a) -> 'a
val arrayCompareAndSwap: ('a array * int) -> ('a * 'a) -> 'a
compareAndSwap r (x, y) performs an atomic CAS that attempts to swap the contents of r from x to y, returning the original value stored in r before the CAS. Polymorphic equality is determined in the same way as MLton.eq, which is a standard equality check for simple types (char, int, word, etc.) and a pointer equality check for other types (array, string, tuples, datatypes, etc.). The semantics are a bit murky.
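A typical use of the returned value is a retry loop: the CAS succeeded exactly when the value it returns equals the expected value. The sketch below builds an atomic fetch-and-add on an int ref this way. It assumes the primitive is exposed as MLton.Parallel.compareAndSwap and uses ForkJoin.parfor to drive it; fetchAndAdd and counter are hypothetical names.

```sml
(* Atomic fetch-and-add via a CAS retry loop: a sketch, assuming
 * MLton.Parallel.compareAndSwap and ForkJoin.parfor. *)
fun fetchAndAdd (r: int ref) (d: int) : int =
  let
    val old = !r
    (* Try to replace old with old + d; seen is what r held before the CAS. *)
    val seen = MLton.Parallel.compareAndSwap r (old, old + d)
  in
    (* The CAS took effect only if r still held old; otherwise retry. *)
    if seen = old then old else fetchAndAdd r d
  end

val counter = ref 0
val () = ForkJoin.parfor 1 (0, 1000) (fn _ => ignore (fetchAndAdd counter 1))
```

Using int keeps the comparison a plain value equality check, which sidesteps the pointer-equality cases mentioned above.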
arrayCompareAndSwap (a, i) (x, y) behaves the same as compareAndSwap but on arrays instead of references. This performs a CAS at index i of array a, and does not read or write at any other locations of the array.
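One common pattern, sketched here under the assumption that the primitive is exposed as MLton.Parallel.arrayCompareAndSwap, is to let concurrent tasks "claim" a slot in a shared flag array so that exactly one of them wins. The names tryClaim and flags are hypothetical.

```sml
(* Claim a slot at most once: a sketch, assuming
 * MLton.Parallel.arrayCompareAndSwap. A flag of 0 means unclaimed. *)
fun tryClaim (flags: int array) (i: int) : bool =
  (* The CAS succeeded iff the slot still held 0 when we swapped in 1. *)
  MLton.Parallel.arrayCompareAndSwap (flags, i) (0, 1) = 0

val flags = Array.array (8, 0)
val first = tryClaim flags 3   (* slot was unclaimed, so this wins *)
val second = tryClaim flags 3  (* slot already holds 1, so this loses *)
```

This kind of check is useful, for example, for marking vertices as visited in a parallel graph traversal.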
MPL uses .mlb files (ML Basis) to describe source files for compilation. A typical .mlb file for MPL is shown below. The first three lines of this file respectively load:
- The SML Basis Library
- The ForkJoin structure, as described above
- The MLton structure, which includes the MPL extension MLton.Parallel as described above, as well as various MLton-specific features. Not all MLton features are supported (see "Unsupported MLton Features" below).
(* libraries *)
$(SML_LIB)/basis/basis.mlb
$(SML_LIB)/basis/fork-join.mlb
$(SML_LIB)/basis/mlton.mlb
(* your source files... *)
A.sml
B.sml
The command to compile a .mlb is as follows. By default, MPL produces an executable with the same base name as the source file, i.e. this would create an executable named foo:
$ mpl [compile-time options...] foo.mlb
MPL has a number of compile-time options derived from MLton, which are documented here. Note that MPL only supports C codegen and does not support profiling.
Some useful compile-time options are
- -output <NAME>
  Give a specific name to the produced executable.
- -default-type int64 -default-type word64
  Use 64-bit integers and words by default.
- -debug true -debug-runtime true -keep g
  For debugging, keeps the generated C files and uses the debug version of the runtime (with assertions enabled). The resulting executable is somewhat peruse-able with tools like gdb.
For example:
$ mpl -default-type int64 -output foo sources.mlb
MPL executables can take options at the command line that control the run-time system. The syntax is
$ <program> [@mpl [run-time options...] --] [program args...]
The runtime arguments must begin with @mpl and end with --, and these are not visible to the program via CommandLine.arguments.
Some useful run-time options are
- procs <N>
  Use N worker threads to run the program.
- set-affinity
  Pin worker threads to processors. Can be used in combination with affinity-base <B> and affinity-stride <S> to pin thread i to processor number B + S*i.
- block-size <X>
  Set the heap block size to X bytes. This can be written with suffixes K, M, and G, e.g. 64K is 64 kilobytes. The block-size must be a multiple of the system page size (typically 4K). By default it is set to one page.
For example, the following runs a program foo with a single command-line argument bar using 4 pinned processors.
$ foo @mpl procs 4 set-affinity -- bar
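To make the separation between run-time options and program arguments concrete, here is a sketch of what the source of such a program foo might contain. It uses only CommandLine.arguments from the SML Basis Library; the program itself is hypothetical.

```sml
(* Print each program argument on its own line. Per the description above,
 * anything between @mpl and -- is consumed by the run-time system and
 * should not appear in this list. *)
val args = CommandLine.arguments ()
val () = List.app (fn a => print (a ^ "\n")) args
```

With the invocation above, this program would be expected to print only bar.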
In general, the basis library has not yet been thoroughly scrubbed, and many functions may not be safe for parallelism (#41). Some known issues:
- Int.toString is racy when called in parallel.
- Real.fromString may throw an error when called in parallel.
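One way to stay clear of these issues, offered only as a sketch and not as an officially recommended workaround, is to keep the parallel phase purely numeric and do any string conversion sequentially afterwards. The example assumes ForkJoin.parfor; the names results and strings are hypothetical.

```sml
(* Compute in parallel, format sequentially: a sketch, assuming
 * ForkJoin.parfor. *)
val n = 100
val results = Array.array (n, 0)

(* Parallel phase: numeric work only, no basis-library string functions. *)
val () =
  ForkJoin.parfor 10 (0, n) (fn k => Array.update (results, k, k * k))

(* Sequential phase: Int.toString is called only after the parallel loop
 * has finished, from a single thread of control. *)
val strings =
  List.tabulate (n, fn k => Int.toString (Array.sub (results, k)))
```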
Many MLton-specific features are unsupported, including (but not limited to):
- share
- shareAll
- size
- Finalizable
- Profile
- Signal
- Thread (partially supported but not documented)
- Cont (partially supported but not documented)
- Weak
- World
[1] Hierarchical Memory Management for Parallel Programs. Ram Raghunathan, Stefan K. Muller, Umut A. Acar, and Guy Blelloch. ICFP 2016.
[2] Hierarchical Memory Management for Mutable State. Adrien Guatto, Sam Westrick, Ram Raghunathan, Umut Acar, and Matthew Fluet. PPoPP 2018.
[3] Disentanglement in Nested-Parallel Programs. Sam Westrick, Rohan Yadav, Matthew Fluet, and Umut A. Acar. POPL 2020.
[4] Provably Space-Efficient Parallel Functional Programming. Jatin Arora, Sam Westrick, and Umut A. Acar. POPL 2021.
[5] Entanglement Detection with Near-Zero Cost. Sam Westrick, Jatin Arora, and Umut A. Acar. ICFP 2022.