Beam has two different instructions sets, an internal instructions set, called specific, and an external instruction set, called generic.
The generic instruction set is what could be called the official instruction set, this is the set of instructions used by both the compiler and the Beam interpreter. If there was an official Erlang Virtual Machine specification it would specify this instruction set. If you want to write your own compiler to the Beam, this is the instruction set you should target. If you want to write your own EVM this is the instruction set you should handle.
The external instruction set is quite stable, but it does change between Erlang versions, especially between major versions.
This is the instruction set which we will cover in this chapter.
The other instruction set, the specific, is an optimized instruction set used by the Beam to implement the external instruction set. To give you an understanding of how the Beam works we will cover this instruction set in [CH-Internal_instructions]. The internal instruction set can change without warning between minor version or even in patch releases. Basing any tool on the internal instruction set is risky.
In this chapter I will go through the general syntax for the instructions and some instruction groups in detail, a complete list of instructions with short descriptions can be found in [AP-Instructions].
The names and opcodes of the generic instructions are defined in lib/compiler/src/genop.tab.
The file contains a version number for the Beam instruction format, which also is written to .beam files. This number has so far never changed and is still at version 0. If the external format would be changed in a non backwards compatible way this number would be changed.
The file genop.tab is used as input by beam_makeops which is a perl script which generate code from the ops tabs. The generator is used both to generate Erlang code for the compiler (beam_opcodes.hrl and beam_opcodes.erl) and to generate C code for the emulator ( TODO: Filenames).
Any line in the file starting with "#" is a comment and ignored by beam_makeops. The file can contain definitions, which turns into a binding in the perl script, of the form:
NAME=EXPR
Like, e.g.:
BEAM_FORMAT_NUMBER=0
The Beam format number is the same as the instructionset field in the external beam format. It is only bumped when a backwards incompatible change to the instruction set is made.
The main content of the file are the opcode definitions of the form:
OPNUM: [-]NAME/ARITY
Where OPNUM and ARITY are integers, NAME is an identifier starting with a lowercase letter (a-z), and :, -, and / are literals.
For example:
1: label/1
The minus sign (-) indicates a deprecated function. A deprecated function keeps its opcode in order for the loader to be sort of backwards compatible (it will recognize deprecated instructions and refuse to load the code).
In the rest of this Chapter we will go through some BEAM instructions in detail. For a full list with brief descriptions see: [AP-Instructions].
As we saw in [CH-Compiler] we can give the option 'S' to the Erlang compiler to get a .S file with the BEAM code for the module in a human and machine readable format (actually as Erlang terms).
Given the file beamexample1.erl:
-module(beamexample1).
-export([id/1]).
id(I) when is_integer(I) -> I.
When compiled with erlc -S beamexample.erl we get the following beamexmaple.S file:
{module, beamexample1}. %% version = 0
{exports, [{id,1},{module_info,0},{module_info,1}]}.
{attributes, []}.
{labels, 7}.
{function, id, 1, 2}.
{label,1}.
{line,[{location,"beamexample1.erl",5}]}.
{func_info,{atom,beamexample1},{atom,id},1}.
{label,2}.
{test,is_integer,{f,1},[{x,0}]}.
return.
{function, module_info, 0, 4}.
{label,3}.
{line,[]}.
{func_info,{atom,beamexample1},{atom,module_info},0}.
{label,4}.
{move,{atom,beamexample1},{x,0}}.
{line,[]}.
{call_ext_only,1,{extfunc,erlang,get_module_info,1}}.
{function, module_info, 1, 6}.
{label,5}.
{line,[]}.
{func_info,{atom,beamexample1},{atom,module_info},1}.
{label,6}.
{move,{x,0},{x,1}}.
{move,{atom,beamexample1},{x,0}}.
{line,[]}.
{call_ext_only,2,{extfunc,erlang,get_module_info,2}}.
In addition to the actual beam code for the integer identity function we also get some meta instructions.
The first line {module, beamexample1}. %% version = 0 tells us the module name "beamexample1" and the version number for the instruction set "0".
Then we get a list of exported functions "id/1, module_info/0, module_info/1". As we can see the compiler has added two auto generated functions to the code. These two functions are just dispatchers to the generic module info BIFs (erlang:module_info/1 and erlang:module_info/2) with the name of the module added as the first argument.
The line {attributes, []} list all defined compiler attributes, none in our case.
Then we get to know that there are less than 7 labels in the module, {labels, 7}, which makes it easy to do code loading in one pass.
The last type of meta instruction is the function instruction on the format {function, Name, Arity, StartLabel}. As we can see with the id function the start label is actually the second label in the code of the function.
The instruction {label, N} is not really an instruction, it does not take up any space in memory when loaded. It is just to give a local name (or number) to a position in the code. Each label potentially marks the beginning of a basic block since it is a potential destination of a jump.
The first two instructions following the first label ({label,1}) are actually error generating code which adds the line number and module, function and arity information and throws an exception. That are the instructions line and func_info.
The meat of the function is after {label,2}, the instruction {test,is_integer,{f,1},[{x,0}]}. The test instruction tests if its arguments (in the list at the end, that is variable {x,0} in this case) fulfills the test, in this case is an integer (is_integer). If the test succeeds the next instruction (return) is executed. Otherwise the functions fails to label 1 ({f,1}), that is, execution continues at label one where a function clause exception is thrown.
The other two functions in the file are auto generated. If we look at the second function the instruction {move,{x,0},{x,1}} moves the argument in register x0 to the second argument register x1. Then the instruction {move,{atom,beamexample1},{x,0}} moves the module name atom to the first argument register x1. Finally a tail call is made to erlang:get_module_info/2 ({call_ext_only,2,{extfunc,erlang,get_module_info,2}}). As we will see in the next section there are several different call instructions.
As we have seen in [CH-Calls] there are several different types of calls in Erlang. To distinguish between local and remote calls in the instruction set, remote calls have _ext in their instruction names. Local calls just have a label in the code of the module, while remote calls takes a destination of the form {extfunc, Module, Function, Arity}.
To distinguish between ordinary (stack building) calls and tail-recursive calls, the latter have either _only or _last in their name. The variant with _last will also deallocate as many stack slot as given by the last argument.
There is also a call_fun Arity instruction that calls the closure stored in register {x, Arity}. The arguments are stored in x0 to {x, Arity-1}.
For a full listing of all types of call instructions see [AP-Instructions].
The stack and the heap of an Erlang process on Beam share the same memory area see [CH-Processes] and [CH-Memory] for a full discussion. The stack grows toward lower addresses and the heap toward higher addresses. Beam will do a garbage collection if more space than what is available is needed on either the stack or the heap.
- A leaf function
-
A leaf function is a function which doesn’t call any other function.
- A non leaf function
-
A non leaf function is a function which may call another function.
On entry to a non leaf function the continuation pointer (CP) is saved on the stack, and on exit it is read back from the stack. This is done by the allocate and deallocate instructions, which are used for setting up and tearing down the stack frame for the current instruction.
A function skeleton for a leaf function looks like this:
{function, Name, Arity, StartLabel}.
{label,L1}.
{func_info,{atom,Module},{atom,Name},Arity}.
{label,L2}.
...
return.
A function skeleton for a non leaf function looks like this:
{function, Name, Arity, StartLabel}.
{label,L1}.
{func_info,{atom,Module},{atom,Name},Arity}.
{label,L2}.
{allocate,Need,Live}.
...
call ...
...
{deallocate,Need}.
return.
The instruction allocate StackNeed Live saves the continuation pointer (CP) and allocate space for StackNeed extra words on the stack. If a GC is needed during allocation save Live number of X registers. E.g. if Live is 2 then registers X0 and X1 are saved.
When allocating on the stack, the stack pointer (E) is decreased.
Before After | xxx | | xxx | E -> | xxx | | xxx | | | | ??? | caller save slot ... E -> | CP | ... ... HTOP -> | | HTOP -> | | | xxx | | xxx | options: - ".*": {fill: [[0.7, 0.7, 0.7], no-shadow], frame: [[0.9, 0.9, 0.9], line]} - ".*": {text : ["Monospace 10", no-shadow]}
For a full listing of all types of allocate and deallocate instructions see [AP-Instructions].
Sending a message is straight forward in beam code. You just use the send instruction. Note though that the send instruction does not take any arguments, it is more like a function call. It assumes that the arguments (the destination and the message) are in the argument registers X0 and X1. The message is also copied from X1 to X0.
Receiving a message is a bit more complicated since it involves both selective receive with pattern matching and introduces a yield/resume point within a function body. (There is also a special feature to minimize message queue scanning using refs, more on that later.)
A minimal receive loop, which accepts any message and has no timeout (e.g. receive _ -> ok end) looks like this in BEAM code:
{label,2}.
{wait,{f,1}}.
{label,1}.
{loop_rec,{f,2},{x,0}}.
remove_message.
{jump,{f,3}}.
{label,2}.
{wait,{f,1}}.
{label,3}.
...
The loop_rec L2 x0 instruction first checks if there is any message in the message queue. If there are no messages execution jumps to L2, where the process will be suspended waiting for a message to arrive.
If there is a message in the message queue the loop_rec instruction also moves the message from the m-buf to the process heap. See [CH-Memory] and [CH-Processes] for details of the m-buf handling.
For code like receive _ -> ok end, where we accept any messages, there is no pattern matching needed, we just do a remove_message which unlinks the next message from the message queue. (It also removes any timeout, more on this soon.)
For a selective receive like e.g. receive [] -> ok end we will loop over the message queue to check if any message in the queue matches.
{label,1}.
{loop_rec,{f,3},{x,0}}.
{test,is_nil,{f,2},[{x,0}]}.
remove_message.
{jump,{f,4}}.
{label,2}.
{loop_rec_end,{f,1}}.
{label,3}.
{wait,{f,1}}.
{label,4}.
...
In this case we do a pattern match for Nil after the loop_rec instruction if there was a message in the mailbox. If the message doesn’t match we end up at L3 where the instruction loop_rec_end advances the save pointer to the next message (p->msg.save = &(*p->msg.save)->next) and jumps back to L2.
If there are no more messages in the message queue the process is suspended by the wait instruction at L4 with the save pointer pointing to the end of the message queue. When the processes is rescheduled it will only look at new messages in the message queue (after the save point).
If we add a timeout to our selective receive the wait instruction is replaced by a wait_timeout instruction followed by a timeout instruction and the code following the timeout.
{label,1}.
{loop_rec,{f,3},{x,0}}.
{test,is_nil,{f,2},[{x,0}]}.
remove_message.
{jump,{f,4}}.
{label,2}.
{loop_rec_end,{f,1}}.
{label,3}.
{wait_timeout,{f,1},{integer,1000}}.
timeout.
{label,4}.
...
The wait_timeout instructions sets up a timeout timer with the given time (1000 ms in our example) and it also saves the address of the next instruction (the timeout) in p->def_arg_reg[0] and then when the timer is set, p->i is set to point to def_arg_reg.
This means that if no matching message arrives while the process is suspended a timeout will be triggered after 1 second and execution for the process will continue at the timeout instruction.
Note that if a message that doesn’t match arrives in the mailbox, the process is scheduled for execution and will run the pattern matching code in the receive loop, but the timeout will not be canceled. It is the remove_message code which also removes any timeout timer.
The timeout instruction resets the save point of the mailbox to the first element in the queue, and clears the timeout flag (F_TIMO) from the PCB.
We have now come to the last version of our receive loop, where we use the ref trick alluded to earlier to avoid a long message box scan.
A common pattern in Erlang code is to implement a type of "remote call" with send and a receive between two processes. This is for example used by gen_server. This code is often hidden behind a library of ordinary function calls. E.g., you call the function counter:increment(Counter) and behind the scene this turns into something like Counter ! {self(), inc}, receive {Counter, Count} -> Count end.
This is usually a nice abstraction to encapsulate state in a process. There is a slight problem though when the mailbox of the calling process has many messages in it. In this case the receive will have to check each message in the mailbox to find out that no message except the last matches the return message.
This can quite often happen if you have a server that receives many messages and for each message does a number of such remote calls, if there is no back throttle in place the servers message queue will fill up.
To remedy this there is a hack in ERTS to recognize this pattern and avoid scanning the whole message queue for the return message.
The compiler recognizes code that uses a newly created reference (ref) in a receive (see [ref_trick_code]), and emits code to avoid the long inbox scan since the new ref can not already be in the inbox.
Ref = make_ref(),
Counter ! {self(), inc, Ref},
receive
{Ref, Count} -> Count
end.
This gives us the following skeleton for a complete receive, see [ref_receive].
{recv_mark,{f,3}}.
{call_ext,0,{extfunc,erlang,make_ref,0}}.
...
send.
{recv_set,{f,3}}.
{label,3}.
{loop_rec,{f,5},{x,0}}.
{test,is_tuple,{f,4},[{x,0}]}.
...
{test,is_eq_exact,{f,4},[{x,1},{y,0}]}.
...
remove_message.
...
{jump,{f,6}}.
{label,4}.
{loop_rec_end,{f,3}}.
{label,5}.
{wait,{f,3}}.
{label,6}.
The recv_mark instruction saves the current position (the end msg.last) in msg.saved_last and the address of the label in msg.mark
The recv_set instruction checks that msg.mark points to the next instruction and in that case moves the save point (msg.save) to the last message received before the creation of the ref (msg.saved_last). If the mark is invalid ( i.e. not equal to msg.save) the instruction does nothing.