Skip to content

MPLLang/mpl

Repository files navigation

MPL

MPL* is a compiler for standard ML that implements support for nested (fork-join) parallelism. MPL* generates executables with excellent multicore performance because it utilizes the theory of disentanglement to manage memory of parallel programs. [1,2,3,4,5]. It builds on the MPL compiler.

Roughly speaking, the disentanglement property states that an object allocated by a thread may not be shared with a concurrently executing thread. MPL* exploits disentanglement at the granularity of memory objects. Specifically, it distinguishes between disentangled and entangled objects and handles disentangled objects very efficiently, while incurring modest overhead for entangled objects, which are rare.

We note, for the artifact reviewers, that MPL does not support entanglement but MPL* (the language presented in the paper) does. We are in the process of merging MPL* and MPL. In the meantime, we have two separate branches within the same repo. MPL is at the master branch and MPL* is at pldi23-artifact. In this tour of MPL*, we call it MPL because the difference between them is not important.

Parallel and Concurrent Extensions

MPL extends SML with a number of primitives for parallelism and concurrency. Take a look at examples/ to see these primitives in action and run MPL programs.

The ForkJoin Structure

val par: (unit -> 'a) * (unit -> 'b) -> 'a * 'b
val parfor: int -> (int * int) -> (int -> unit) -> unit
val alloc: int -> 'a array

The par primitive takes two functions to execute in parallel and returns their results.

The parfor primitive is a "parallel for loop". It takes a grain-size g, a range (i, j), and a function f, and executes f(k) in parallel for each i <= k < j. The grain-size g is for manual granularity control: parfor splits the input range into approximately (j-i)/g subranges, each of size at most g, and each subrange is processed sequentially. The grain-size must be at least 1, in which case the loop is "fully parallel".

The alloc primitive takes a length and returns a fresh, uninitialized array of that size. Warning: To guarantee no errors, the programmer must be careful to initialize the array before reading from it. alloc is intended to be used as a low-level primitive in the efficient implementation of high-performance libraries. It is integrated with the scheduler and memory management system to perform allocation in parallel and be safe-for-GC.

The MLton.Parallel Structure

val compareAndSwap: 'a ref -> ('a * 'a) -> 'a
val arrayCompareAndSwap: ('a array * int) -> ('a * 'a) -> 'a

compareAndSwap r (x, y) performs an atomic CAS which attempts to atomically swap the contents of r from x to y, returning the original value stored in r before the CAS. Polymorphic equality is determined in the same way as MLton.eq, which is a standard equality check for simple types (char, int, word, etc.) and a pointer equality check for other types (array, string, tuples, datatypes, etc.). The semantics are a bit murky.

arrayCompareAndSwap (a, i) (x, y) behaves the same as compareAndSwap but on arrays instead of references. This performs a CAS at index i of array a, and does not read or write at any other locations of the array.

Using MPL

MPL uses .mlb files (ML Basis) to describe source files for compilation. A typical .mlb file for MPL is shown below. The first three lines of this file respectively load:

  • The SML Basis Library
  • The ForkJoin structure, as described above
  • The MLton structure, which includes the MPL extension MLton.Parallel as described above, as well as various MLton-specific features. Not all MLton features are supported (see "Unsupported MLton Features" below).
(* libraries *)
$(SML_LIB)/basis/basis.mlb
$(SML_LIB)/basis/fork-join.mlb
$(SML_LIB)/basis/mlton.mlb

(* your source files... *)
A.sml
B.sml

Compiling a Program

The command to compile a .mlb is as follows. By default, MPL produces an executable with the same base name as the source file, i.e. this would create an executable named foo:

$ mpl [compile-time options...] foo.mlb

MPL has a number of compile-time options derived from MLton, which are documented here. Note that MPL only supports C codegen and does not support profiling.

Some useful compile-time options are

  • -output <NAME> Give a specific name to the produced executable.
  • -default-type int64 -default-type word64 Use 64-bit integers and words by default.
  • -debug true -debug-runtime true -keep g For debugging, keeps the generated C files and uses the debug version of the runtime (with assertions enabled). The resulting executable is somewhat peruse-able with tools like gdb.

For example:

$ mpl -default-type -int64 -output foo sources.mlb

Running a Program

MPL executables can take options at the command line that control the run-time system. The syntax is

$ <program> [@mpl [run-time options...] --] [program args...]

The runtime arguments must begin with @mpl and end with --, and these are not visible to the program via CommandLine.arguments.

Some useful run-time options are

  • procs <N> Use N worker threads to run the program.
  • set-affinity Pin worker threads to processors. Can be used in combination with affinity-base <B> and affinity-stride <S> to pin thread i to processor number B + S*i.
  • block-size <X> Set the heap block size to X bytes. This can be written with suffixes K, M, and G, e.g. 64K is 64 kilobytes. The block-size must be a multiple of the system page size (typically 4K). By default it is set to one page.

For example, the following runs a program foo with a single command-line argument bar using 4 pinned processors.

$ foo @mpl procs 4 set-affinity -- bar

Known Issues

Basis Library

In general, the basis library has not yet been thoroughly scrubbed, and many functions may not be safe for parallelism (#41). Some known issues:

  • Int.toString is racy when called in parallel.
  • Real.fromString may throw an error when called in parallel.

Unsupported MLton Features

Many MLton-specific features are unsupported, including (but not limited to):

  • share
  • shareAll
  • size
  • Finalizable
  • Profile
  • Signal
  • Thread (partially supported but not documented)
  • Cont (partially supported but not documented)
  • Weak
  • World

References

[1] Hierarchical Memory Management for Parallel Programs. Ram Raghunathan, Stefan K. Muller, Umut A. Acar, and Guy Blelloch. ICFP 2016.

[2] Hierarchical Memory Management for Mutable State. Adrien Guatto, Sam Westrick, Ram Raghunathan, Umut Acar, and Matthew Fluet. PPoPP 2018.

[3] Disentanglement in Nested-Parallel Programs. Sam Westrick, Rohan Yadav, Matthew Fluet, and Umut A. Acar. POPL 2020.

[4] Provably Space-Efficient Parallel Functional Programming. Jatin Arora, Sam Westrick, and Umut A. Acar. POPL 2021.

[5] Entanglement Detection with Near-Zero Cost. Sam Westrick, Jatin Arora, and Umut A. Acar. ICFP 2022.