Skip to content

Latest commit

 

History

History
356 lines (273 loc) · 13.5 KB

beam_loader.asciidoc

File metadata and controls

356 lines (273 loc) · 13.5 KB

The BEAM Loader

Transforming from Generic to Specific instructions

The BEAM loader does not just take the external beam format and writes it to memory. It also does a number of transformations on the code and translates from the external (generic) format to the internal (specific) format.

The code for the loader can be found in beam_load.c (in erts/emulator/beam) but most of the logic for the translations are in the file ops.tab (in the same directory).

The first step of the loader is to parse beam file, basically the same work as we did in Erlang in [CH-beam_modules] but written in C.

Then the rules in ops.tab are applied to instructions in the code chunk to translate the generic instruction to one or more specific instructions.

The translation table works through pattern matching. Each line in the file defines a pattern of one or more generic instructions with arguments and optionally an arrow followed by one or more instructions to translate to.

The transformations in ops tab tries to handle patterns of instructions generated by the compiler and peephole optimize them to fewer specific instructions. The ops tab transformations tries to generate jump tables for patterns of selects.

The file ops.tab is not parsed at runtime, instead a pattern matching program is generated from ops.tab and stored in an array in a generated C file. The perl script beam_makeops (in erts/emulator/utils) generates a target specific set of opcodes and translation programs in the files beam_opcodes.h and beam_opcodes.c (these files end up in the given target directory e.g. erts/emulator/x86_64-unknown-linux-gnu/opt/smp/).

The same program (beam_makeops) also generates the Erlang code for the compiler back end beam_opcodes.erl.

Understanding ops.tab

The transformations in ops.tab are executed in the order that they are written in the file. So just like in Erlang pattern matching, the different rules are triggered from top to bottom.

The types that ops.tab uses for arguments in instructions can be found in [AP-Instructions].

Transformations

Most of the rules in ops.tab are transformations between different instructions. A simple transformation looks like this:

move S x==0 | return => move_return S

This combines a move from any location to x(0) and return into a single instruction called move_return. Let’s break the tranformation apart to see what the different parts do.

move

is the instruction that the pattern first has to match. This can be either a generic instruction that the compiler has emitted, or a temporary instruction that ops.tab has emitted to help with transformations.

S

is a variable binding any type of value. Any value in the pattern (left hand side or ) that is used in the generator (right hand side of ) has to be bound to a variable.

x==0

is a guard that says that we only apply the transformation if the target location is an x register with the value 0. It is possible to chain multiple types and also bind a variable here. For instance D=xy==0 would allow both x and y registers with a value of 0 and also bind the argument to the variable D.

|

signifies the end of this instruction and the beginning of another instruction that is part of the same pattern.

return

is the second instruction to match in this pattern.

signifies the end of the pattern and the start of the code that is to be generated.

move_return S

is the name of the generated instruction together with the name of the variable on the lhs. It is possible to generate multiple instructions as part of a transformation by using the | symbol.

A more complex example

More complex translations can be done in ops.tab. For instance take the select_val instruction. It will be translated by the loader into either a jump table, a linear search array or a binary search array depending on the input values.

is_integer Fail=f S | select_val S=s Fail=f Size=u Rest=* | \
  use_jump_tab(Size, Rest) => gen_jump_tab(S, Fail, Size, Rest)

The above transformation creates a jump table if possible of the select_val. There are a bunch of new techniques used in the transformations.

S

is used in both is_integer and select_val. This means that both the values have to be of the same type and have the same value. Furthermore the S=s guard limits the type to a be a source register.

Rest=*

allows a variable number of arguments in the instruction and binds them to variable Rest.

use_jump_tab(Size, Rest)

calls a the use_jump_tab C function in beam_load.c that decides whether the arguments in the select_val can be transformed into a jump table.

\

signifies that the transformation rule continues on the next line.

gen_jump_tab(S, Fail, Size, Rest)

calls the gen_jump_tab C function in beam_load.c that takes care of generating the appropriate instruction.

Specific instruction

When all transformations are done, we have to decide how the specifc instruction should look like. Let’s continue to look at move_return:

%macro: move_return MoveReturn -nonext
move_return x
move_return c
move_return n

This will generate three different instructions that will use the MoveReturn macro in beam_emu.c to do the work.

%macro: move_return

this tells ops.tab to generate the code for move_return. If there is no %macro line, the instruction has to be implemented by hand in beam_emu.c. The code for the instruction will be places in beam_hot.h or beam_cold.h depending on if the %hot or %cold directive is active.

MoveReturn

tells the code generator to that the name of the c-macro in beam_emu.c to use is MoveReturn. This macro has to be implemented manually.

-nonext

tells the code generator that it should not generate a dispatch to the next instruction, the MoveReturn macro will take care of that.

move_return x

tells the code generator to generate a specific instruction for when the instruction argument is an x register. c for when it is a constant, n when it is NIL. No instructions are in this case generated for when the argument is a y register as the compiler will never generate such code.

The resulting code in beam_hot.h will look like this:

OpCase(move_return_c):
    {
    MoveReturn(Arg(0));
    }

OpCase(move_return_n):
    {
    MoveReturn(NIL);
    }

OpCase(move_return_x):
    {
    MoveReturn(xb(Arg(0)));
    }

All the implementor has to do is to define the MoveReturn macro in beam_emu.c and the instruction is complete.

Macro flags

The %macro rules can take multiple different flags to modify the code that gets generated.

The examples below assume that there is a specific instructions looking like this:

%macro move_call MoveCall
move_call x f

without any flags to the %macro we the following code will be generated:

BeamInstr* next;
PreFetch(2, next);
MoveCall(Arg(0));
NextPF(2, next);
Note
The PreFetch and NextPF macros make sure to load the address to jump to next before the instruction is executed. This trick increases performance on all architectures by a variying amount depending on cache architecture and super scalar properties of the CPU.
-nonext

Don’t emit a dispatch for this instructions. This is used for instructions that are known to not continue with the next instructions, i.e. return, call, jump.

%macro move_call MoveCall -nonext

MoveCall(xb(Arg(0)));
-arg_*

Include the arguments of type * as arguments to the c-macro. Not all argument types are included by default in the c-macro. For instance the type f used for fail labels and local function calls is not included. So giving the option -arg_f will include that as an argument to the c-macro.

%macro move_call MoveCall -arg_f

MoveCall(xb(Arg(0)), Arg(1));
-size

Include the size of the instruction as an argument to the c-macro.

%macro move_call MoveCall -size

MoveCall(xb(Arg(0)), 2);
-pack

Pack any arguments if possible. This places multiple register arguments in the same word if possible. As register arguments can only be 0-1024, we only need 10 bits to store them + 2 for tagging. So on a 32-bit system we can put 2 registers in one word, while on a 64-bit we can put 4 registers in one word. Packing instruction can greatly decrease the memory used for a single instruction. However there is also a small cost to unpack the instruction, which is why it is not enabled for all instructions.

The example with the call cannot do any packing as f cannot be packed and only one other argument exists. So let’s look at the put_list instruction as an example instead.

%macro:put_list PutList -pack
put_list x x x
BeamInstr tmp_packed1;
BeamInstr* next;
PreFetch(1, next);
tmp_packed1 = Arg(0);
PutList(xb(tmp_packed1&BEAM_TIGHT_MASK),
        xb((tmp_packed1>>BEAM_TIGHT_SHIFT)&BEAM_TIGHT_MASK),
        xb((tmp_packed1>>(2*BEAM_TIGHT_SHIFT))));
NextPF(1, next);

This packs the 3 arguments into 1 machine word, which halves the required memory for this instruction.

-fail_action

Include a fail action as an argument to the c-macro. Note that the ClauseFail() macro assumes the fail label is in the first argument of the instructions, so in order to use this in the above example we should transform the move_call x f to move_call f x.

%macro move_call MoveCall -fail_action

MoveCall(xb(Arg(0)), ClauseFail());
-gen_dest

Include a store function as an argument to the c-macro.

%macro move_call MoveCall -gen_dest

MoveCall(xb(Arg(0)), StoreSimpleDest);
-goto

Replace the normal next dispatch with a jump to a c-label inside beam_emu.c

%macro move_call MoveCall -goto:do_call

MoveCall(xb(Arg(0)));
goto do_call;

Optimizations

The loader performs many peephole optimizations when loading the code. The most important ones are instruction combining and instruction specialization.

Instruction combining is the joining of two or more smaller instructions into one larger instruction. This can lead to a large speed up of the code if the instructions are known to follow each other most of the time. The speed up is achieved because there is no longer any need to do a dispatch inbetween the instructions, and also the C compiler gets more information to work with when it is optimizing that instruction. When to do instruction combining is a trade-off where one has to consider the impact the increased size of the main emulator loop has vs the gain when the instruction is executed.

Instruction specialization removes the need to decode the arguments in an instruction. So instead of having one move_sd instruction, move_xx, move_xy etc are generated with the arguments already decoded. This reduces the decode cost of the instructions, but again it is a tradeoff vs emulator code size.

select_val optimizations

The select_val instruction is emitted by the compiler to do control flow handling of many functions or case clauses. For instance:

select(1) -> 3;
select(2) -> 3;
select(_) -> error.

compiles to:

{function, select, 1, 2}.
  {label,1}.
    {line,[{location,"select.erl",5}]}.
    {func_info,{atom,select},{atom,select},1}.
  {label,2}.
    {test,is_integer,{f,4},[{x,0}]}.
    {select_val,{x,0},{f,4},{list,[{integer,2},{f,3},{integer,1},{f,3}]}}.
  {label,3}.
    {move,{integer,3},{x,0}}.
    return.
  {label,4}.
    {move,{atom,error},{x,0}}.
    return.

The values in the condition are only allowed to be either integers or atoms. If the value is of any other type the compiler will not emit a select_val instruction. The loader uses a couple of hearistics to figure out what type algorithm to use when doing the select_val.

jump_on_val

Create a jump table and use the value as the index. This if very efficient and happens when a group of close together integers are used as the value to select on. If not all values are present, the jump table is padded with extra fail label slots.

select_val2

Used when only two values are to be selected upon and they to not fit in a jump table.

select_val_lins

Do a linear search of the sorted atoms or integers. This is used when a small amount of atoms or integers are to be selected from.

select_val_bins

Do a binary search of the sorted atoms or integers.

pre-hashing of literals

When a literal is loaded and used as an argument to any of the bifs or instructions that need a hashed value of the literal, instead of hashing the literal value every time, the hash is created by the loader and used by the instructions.

Examples of code using this technique is maps instructions and also the process dictionary bifs.