The BEAM loader does not just take the external beam format and writes it to memory. It also does a number of transformations on the code and translates from the external (generic) format to the internal (specific) format.
The code for the loader can be found in beam_load.c (in erts/emulator/beam) but most of the logic for the translations are in the file ops.tab (in the same directory).
The first step of the loader is to parse beam file, basically the same work as we did in Erlang in [CH-beam_modules] but written in C.
Then the rules in ops.tab are applied to instructions in the code chunk to translate the generic instruction to one or more specific instructions.
The translation table works through pattern matching. Each line in the file defines a pattern of one or more generic instructions with arguments and optionally an arrow followed by one or more instructions to translate to.
The transformations in ops tab tries to handle patterns of instructions generated by the compiler and peephole optimize them to fewer specific instructions. The ops tab transformations tries to generate jump tables for patterns of selects.
The file ops.tab is not parsed at runtime, instead a pattern matching program is generated from ops.tab and stored in an array in a generated C file. The perl script beam_makeops (in erts/emulator/utils) generates a target specific set of opcodes and translation programs in the files beam_opcodes.h and beam_opcodes.c (these files end up in the given target directory e.g. erts/emulator/x86_64-unknown-linux-gnu/opt/smp/).
The same program (beam_makeops) also generates the Erlang code for the compiler back end beam_opcodes.erl.
The transformations in
ops.tab
are executed in the order that they are written in the file. So just like in
Erlang pattern matching, the different rules are triggered from top to bottom.
The types that ops.tab
uses for arguments in instructions can be found in
[AP-Instructions].
Most of the rules in ops.tab
are transformations between different
instructions. A simple transformation looks like this:
move S x==0 | return => move_return S
This combines a move
from any location to x(0)
and return into a single
instruction called move_return
. Let’s break the tranformation apart to
see what the different parts do.
- move
-
is the instruction that the pattern first has to match. This can be either a generic instruction that the compiler has emitted, or a temporary instruction that
ops.tab
has emitted to help with transformations. - S
-
is a variable binding any type of value. Any value in the pattern (left hand side or
⇒
) that is used in the generator (right hand side of⇒
) has to be bound to a variable. - x==0
-
is a guard that says that we only apply the transformation if the target location is an
x
register with the value0
. It is possible to chain multiple types and also bind a variable here. For instanceD=xy==0
would allow bothx
andy
registers with a value of0
and also bind the argument to the variableD
. - |
-
signifies the end of this instruction and the beginning of another instruction that is part of the same pattern.
- return
-
is the second instruction to match in this pattern.
⇒
-
signifies the end of the pattern and the start of the code that is to be generated.
- move_return S
-
is the name of the generated instruction together with the name of the variable on the lhs. It is possible to generate multiple instructions as part of a transformation by using the
|
symbol.
More complex translations can be done in ops.tab
. For instance take the
select_val
instruction. It will be translated by the loader into either a jump table, a linear
search array or a binary search array depending on the input values.
is_integer Fail=f S | select_val S=s Fail=f Size=u Rest=* | \ use_jump_tab(Size, Rest) => gen_jump_tab(S, Fail, Size, Rest)
The above transformation creates a jump table if possible of the select_val
.
There are a bunch of new techniques used in the transformations.
- S
-
is used in both
is_integer
andselect_val
. This means that both the values have to be of the same type and have the same value. Furthermore theS=s
guard limits the type to a be a source register. - Rest=*
-
allows a variable number of arguments in the instruction and binds them to variable
Rest
. - use_jump_tab(Size, Rest)
-
calls a the use_jump_tab C function in
beam_load.c
that decides whether the arguments in theselect_val
can be transformed into a jump table. - \
-
signifies that the transformation rule continues on the next line.
- gen_jump_tab(S, Fail, Size, Rest)
-
calls the gen_jump_tab C function in
beam_load.c
that takes care of generating the appropriate instruction.
When all transformations are done, we have to decide how the specifc instruction should
look like. Let’s continue to look at move_return
:
%macro: move_return MoveReturn -nonext move_return x move_return c move_return n
This will generate three different instructions that will use the
MoveReturn
macro in beam_emu.c
to do the work.
- %macro: move_return
-
this tells
ops.tab
to generate the code formove_return
. If there is no%macro
line, the instruction has to be implemented by hand in beam_emu.c. The code for the instruction will be places inbeam_hot.h
orbeam_cold.h
depending on if the%hot
or%cold
directive is active. - MoveReturn
-
tells the code generator to that the name of the c-macro in beam_emu.c to use is MoveReturn. This macro has to be implemented manually.
- -nonext
-
tells the code generator that it should not generate a dispatch to the next instruction, the
MoveReturn
macro will take care of that. - move_return x
-
tells the code generator to generate a specific instruction for when the instruction argument is an x register.
c
for when it is a constant,n
when it isNIL
. No instructions are in this case generated for when the argument is a y register as the compiler will never generate such code.
The resulting code in beam_hot.h
will look like this:
OpCase(move_return_c):
{
MoveReturn(Arg(0));
}
OpCase(move_return_n):
{
MoveReturn(NIL);
}
OpCase(move_return_x):
{
MoveReturn(xb(Arg(0)));
}
All the implementor has to do is to define the MoveReturn
macro in beam_emu.c
and
the instruction is complete.
The %macro
rules can take multiple different flags to modify the code that
gets generated.
The examples below assume that there is a specific instructions looking like this:
%macro move_call MoveCall move_call x f
without any flags to the %macro
we the following code will be generated:
BeamInstr* next;
PreFetch(2, next);
MoveCall(Arg(0));
NextPF(2, next);
Note
|
The PreFetch and NextPF macros make sure to load the address to jump to next before the instruction is executed. This trick increases performance on all architectures by a variying amount depending on cache architecture and super scalar properties of the CPU. |
- -nonext
-
Don’t emit a dispatch for this instructions. This is used for instructions that are known to not continue with the next instructions, i.e. return, call, jump.
%macro move_call MoveCall -nonext
MoveCall(xb(Arg(0)));
- -arg_*
-
Include the arguments of type * as arguments to the c-macro. Not all argument types are included by default in the c-macro. For instance the type
f
used for fail labels and local function calls is not included. So giving the option-arg_f
will include that as an argument to the c-macro.
%macro move_call MoveCall -arg_f
MoveCall(xb(Arg(0)), Arg(1));
- -size
-
Include the size of the instruction as an argument to the c-macro.
%macro move_call MoveCall -size
MoveCall(xb(Arg(0)), 2);
- -pack
-
Pack any arguments if possible. This places multiple register arguments in the same word if possible. As register arguments can only be 0-1024, we only need 10 bits to store them + 2 for tagging. So on a 32-bit system we can put 2 registers in one word, while on a 64-bit we can put 4 registers in one word. Packing instruction can greatly decrease the memory used for a single instruction. However there is also a small cost to unpack the instruction, which is why it is not enabled for all instructions.
The example with the call cannot do any packing as f
cannot be packed and only one
other argument exists. So let’s look at the
put_list
instruction as an example instead.
%macro:put_list PutList -pack put_list x x x
BeamInstr tmp_packed1;
BeamInstr* next;
PreFetch(1, next);
tmp_packed1 = Arg(0);
PutList(xb(tmp_packed1&BEAM_TIGHT_MASK),
xb((tmp_packed1>>BEAM_TIGHT_SHIFT)&BEAM_TIGHT_MASK),
xb((tmp_packed1>>(2*BEAM_TIGHT_SHIFT))));
NextPF(1, next);
This packs the 3 arguments into 1 machine word, which halves the required memory for this instruction.
- -fail_action
-
Include a fail action as an argument to the c-macro. Note that the
ClauseFail()
macro assumes the fail label is in the first argument of the instructions, so in order to use this in the above example we should transform themove_call x f
tomove_call f x
.
%macro move_call MoveCall -fail_action
MoveCall(xb(Arg(0)), ClauseFail());
- -gen_dest
-
Include a store function as an argument to the c-macro.
%macro move_call MoveCall -gen_dest
MoveCall(xb(Arg(0)), StoreSimpleDest);
- -goto
-
Replace the normal next dispatch with a jump to a c-label inside beam_emu.c
%macro move_call MoveCall -goto:do_call
MoveCall(xb(Arg(0)));
goto do_call;
The loader performs many peephole optimizations when loading the code. The most important ones are instruction combining and instruction specialization.
Instruction combining is the joining of two or more smaller instructions into one larger instruction. This can lead to a large speed up of the code if the instructions are known to follow each other most of the time. The speed up is achieved because there is no longer any need to do a dispatch inbetween the instructions, and also the C compiler gets more information to work with when it is optimizing that instruction. When to do instruction combining is a trade-off where one has to consider the impact the increased size of the main emulator loop has vs the gain when the instruction is executed.
Instruction specialization removes the need to decode the arguments in an instruction.
So instead of having one move_sd
instruction, move_xx
, move_xy
etc are generated
with the arguments already decoded. This reduces the decode cost of the instructions,
but again it is a tradeoff vs emulator code size.
The select_val
instruction is emitted by the compiler to do control flow handling
of many functions or case clauses. For instance:
select(1) -> 3;
select(2) -> 3;
select(_) -> error.
compiles to:
{function, select, 1, 2}.
{label,1}.
{line,[{location,"select.erl",5}]}.
{func_info,{atom,select},{atom,select},1}.
{label,2}.
{test,is_integer,{f,4},[{x,0}]}.
{select_val,{x,0},{f,4},{list,[{integer,2},{f,3},{integer,1},{f,3}]}}.
{label,3}.
{move,{integer,3},{x,0}}.
return.
{label,4}.
{move,{atom,error},{x,0}}.
return.
The values in the condition are only allowed to be either integers
or atoms. If the value is of any other type the compiler will not emit a
select_val
instruction. The loader uses a couple of hearistics to figure
out what type algorithm to use when doing the select_val
.
- jump_on_val
-
Create a jump table and use the value as the index. This if very efficient and happens when a group of close together integers are used as the value to select on. If not all values are present, the jump table is padded with extra fail label slots.
- select_val2
-
Used when only two values are to be selected upon and they to not fit in a jump table.
- select_val_lins
-
Do a linear search of the sorted atoms or integers. This is used when a small amount of atoms or integers are to be selected from.
- select_val_bins
-
Do a binary search of the sorted atoms or integers.
When a literal is loaded and used as an argument to any of the bifs or instructions that need a hashed value of the literal, instead of hashing the literal value every time, the hash is created by the loader and used by the instructions.
Examples of code using this technique is maps instructions and also the process dictionary bifs.