The Erlang RunTime System (ERTS) is a complex system with many interdependent components. It is written in a very portable way so that it can run on anything from a gum stick computer to the largest multicore system with terabytes of memory. In order to be able to optimize the performance of such a system for your application, you need to not only know your application, but you also need to have a thorough understanding of ERTS itself.
There is a difference between any Erlang Runtime System and a specific implementation of an Erlang Runtime System. "Erlang/OTP" by Ericsson is the de facto standard implementation of Erlang and the Erlang Runtime System. In this book I will refer to this implementation as ERTS or spelled out Erlang RunTime System with a capital T. (See ERTS for a definition of OTP).
There is no official definition of what an Erlang Runtime System is, or what an Erlang Virtual Machine is. You could sort of imagine what such an ideal Platonic system would look like by taking ERTS and removing all the implementation specific details. This is unfortunately a circular definition, since you need to know the general definition to be able to identify an implementation specific detail. In the Erlang world we are usually too pragmatic to worry about this.
We will try to use the term Erlang Runtime System to refer to the general idea of any Erlang Runtime System as opposed to the specific implementation by Ericsson which we call the Erlang RunTime System or usually just ERTS.
Note This book is mostly a book about ERTS in particular and only to a small extent about any general Erlang Runtime System. If you assume that we talk about the Ericsson implementation unless it is clearly stated that we are talking about a general principle you will probably be right.
In [P-Running] of this book we will look at how to tune the runtime system for your application and how to profile and debug your application and the runtime system. In order to really know how to tune the system you also need to know the system. In [P-ERTS] of this book you will get a deep understanding of how the runtime system works.
The following chapters of [P-ERTS] will go over each component of the system by itself. You should be able to read any one of these chapters without having a full understanding of how the other components are implemented, but you will need a basic understanding of what each component is. The rest of this introductory chapter should give you enough basic understanding and vocabulary to be able to jump between the rest of the chapters in part one in any order you like.
However, if you have the time, read the book in order the first time. Words that are specific to Erlang and ERTS or used in a specific way in this book are usually explained at their first occurrence. Then, when you know the vocabulary, you can come back and use Part I as a reference whenever you have a problem with a particular component.
In this there is a basic overview of the main components of ERTS and some vocabulary needed to understand the more detailed descriptions of each component in the following chapters.
When you start an Elixir or Erlang application or system, what you really start is an Erlang node. The node runs the Erlang RunTime System and the virtual machine BEAM. (or possibly another implementation of Erlang (see Other Erlang Implementations)).
Your application code will run in an Erlang node, and all the layers of the node will affect the performance of your application. We will look at the stack of layers that makes up a node. This will help you understand your options for running your system in different environments.
In OO terminology one could say that an Erlang node is an object of the Erlang Runtime System class. The equivalent in the Java world is a JVM instance.
All execution of Elixir/Erlang code is done within a node. An Erlang node runs in one OS process, and you can have several Erlang nodes running on one machine.
To be completely correct according to the Erlang OTP documentation a
node is actually an executing runtime system that has been given a
name. That is, if you start Elixir without giving a name through one
of the command line switches --name NAME@HOST
or --sname NAME
(or
-name
and -sname
for an Erlang runtime.) you will have a runtime
but not a node. In such a system the function Node.alive?
(or in Erlang is_alive()
) returns false.
$ iex Erlang/OTP 19 [erts-8.1] [source-0567896] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false] Interactive Elixir (1.4.0) - press Ctrl+C to exit (type h() ENTER for help) iex(1)> Node.alive? false iex(2)>
The runtime system itself is not that strict in its use
of the terminology. You can ask for the name of the node even
if you didn’t give it a name. In Elixir you use the function
Node.list
with the argument :this
, and in Erlang you
call nodes(this).
:
iex(2)> Node.list :this [:nonode@nohost] iex(3)>
In this book we will use the term node for any running instance of the runtime whether it is given a name or not.
Your program (application) will run in one or more nodes, and the performance of your program will depend not only on your application code but also on all the layers below your code in the ERTS stack. In ERTS Stack you can see the ERTS Stack illustrated with two Erlang nodes running on one machine.
Node1 Node2 +------+ +------+ | APP | | APP | +------+ +------+ | OTP | | OTP | +------+ +------+ | BEAM | | BEAM | +------+ +------+ | ERTS | | ERTS | +------+--+------+ | OS | +----------------+ | HW or VM | +----------------+
If you are using Elixir there is yet another layer to the stack.
Node1 Node2 +------+ +------+ | APP | | APP | +------+ +------+ |Elixir| |Elixir| +------+ +------+ | OTP | | OTP | +------+ +------+ | BEAM | | BEAM | +------+ +------+ | ERTS | | ERTS | +------+ +------+ +----------------+ | OS | +----------------+ | HW or VM | +----------------+
Let’s look at each layer of the stack and see how you can tune them to your application’s need.
At the bottom of the stack there is the hardware you are running on. The easiest way to improve the performance of your app is probably to run it on better hardware. You might need to start exploring higher levels of the stack if economical or physical constraints or environmental concerns won’t let you upgrade your hardware.
The two most important choices for your hardware is whether it is multicore and whether it is 32-bit or 64-bit. You need different builds of ERTS depending on whether you want to use multicore or not and whether you want to use 32-bit or 64-bit.
The second layer in the stack is the OS level. ERTS runs on most versions of Windows and most POSIX "compliant" operating systems, including Linux, VxWorks, FreeBSD, Solaris, and Mac OS X. Today most of the development of ERTS is done on Linux and OS X, and you can expect the best performance on these platforms. However, Ericsson has been using Solaris internally in many projects and ERTS has been tuned for Solaris for many years. Depending on your use case you might actually get the best performance on a Solaris system. The OS choice is usually not based on performance requirements, but is restricted by other factors. If you are building an embedded application you might be restricted to Raspbian or VxWork, and if you for some reason are building an end user or client application you might have to use Windows. The Windows port of ERTS has so far not had the highest priority and might not be the best choice from a performance or maintenance perspective. If you want to use a 64-bit ERTS you of course need to have both a 64-bit machine and a 64-bit OS. We will not cover many OS specific questions in this book, and most examples will be assuming that you run on Linux.
The third layer in the stack is the Erlang Runtime System. In our case this will be ERTS. This and the fourth layer, the Erlang Virtual Machine (BEAM), is what this book is all about.
The fifth layer, OTP, supplies the Erlang standard libraries. OTP
originally stood for "Open Telecom Platform" and was a number of
Erlang libraries supplying building blocks (such as supervisor
,
gen_server
and gen_tcp
) for building robust applications (such as
telephony exchanges). Early on, the libraries and the meaning of OTP
got intermingled with all the other standard libraries shipped with
ERTS. Nowadays most people use OTP together with Erlang in
"Erlang/OTP" as the name for ERTS and all Erlang libraries shipped by
Ericsson. Knowing these standard libraries and how and when to use
them can greatly improve the performance of your application. This
book will not go into any details of the standard libraries and
OTP, there are many other books that cover these aspects.
If you are running an Elixir program the sixth layer provides the Elixir environment and the Elixir libraries.
Finally, the seventh layer (APP) is your application, and any third party libraries you use. The application can use all the functionality provided by the underlying layers. Apart from upgrading your hardware this is probably the place where you most easily can improve your application’s performance. In [CH-Tracing] there are some hints and some tools that can help you profile and optimize your application. In [CH-Debugging] we will look at how to find the cause of crashing applications and how to find bugs in your application.
For information on how to build and run an Erlang node see [AP-BuildingERTS], and read the rest of the book to learn all about the components of an Erlang node.
One of the key insights by the Erlang language designers was that in order to build a system that works 24/7 you need to be able to handle hardware failure. Therefore you need to distribute your system over at least two physical machines. You do this by starting a node on each machine and then you can connect the nodes to each other so that processes can communicate with each other across the nodes just as if they were running in the same node.
Node1 Node2 Node3 Node4 +------+ +------+ +------+ +------+ | APP | | APP | | APP | | APP | +------+ +------+ +------+ +------+ |Elixir| |Elixir| |Elixir| |Elixir| +------+ +------+ +------+ +------+ | OTP | | OTP | | OTP | | OTP | +------+ +------+ +------+ +------+ | BEAM | | BEAM | | BEAM | | BEAM | +------+ +------+ +------+ +------+ | ERTS | | ERTS | | ERTS | | ERTS | +------+ +------+ +------+ +------+ +----------------+ +----------------+ | OS | | OS | +----------------+ +----------------+ | HW or VM | | HW or VM | +----------------+ +----------------+ +-------------------------------------+ | Network | +-------------------------------------+
The Erlang Compiler is responsible for compiling Erlang source code, from .erl files into virtual machine code for BEAM (the virtual machine). The compiler itself is written in Erlang and compiled by itself to BEAM code and usually available in a running Erlang node. To bootstrap the runtime system there are a number of precompiled BEAM files, including the compiler, in the bootstrap directory.
For more information about the compiler see [CH-Compiler].
BEAM is the Erlang virtual machine used for executing Erlang code, just like the JVM is used for executing Java code. BEAM runs in an Erlang Node.
BEAM: The name BEAM originally stood for Bogdan’s Erlang Abstract Machine, but nowadays most people refer to it as Björn’s Erlang Abstract Machine, after the current maintainer.
Just as ERTS is an implementation of a more general concept of a Erlang Runtime System so is BEAM an implementation of a more general Erlang Virtual Machine (EVM). There is no definition of what constitutes an EVM but BEAM actually has two levels of instructions Generic Instructions and Specific Instructions. The generic instruction set could be seen as a blueprint for an EVM.
For a full description of BEAM see [CH-BEAM], [CH-beam_modules] and [CH-Instructions].
An Erlang process basically works like an OS process. Each process has its own memory (a mailbox, a heap and a stack) and a process control block (PCB) with information about the process.
All Erlang code execution is done within the context of a process. One Erlang node can have many processes, which can communicate through message passing and signals. Erlang processes can also communicate with processes on other Erlang nodes as long as the nodes are connected.
To learn more about processes and the PCB see [CH-Processes].
The Scheduler is responsible for choosing the Erlang process to execute. Basically the scheduler keeps two queues, a ready queue of processes ready to run, and a waiting queue of processes waiting to receive a message. When a process in the waiting queue receives a message or gets a time out it is moved to the ready queue.
The scheduler picks the first process from the ready queue and hands it to BEAM for execution of one time slice. BEAM preempts the running process when the time slice is used up and adds the process to the end of the ready queue. If the process is blocked in a receive before the time slice is used up, it gets added to the waiting queue instead.
Erlang is concurrent by nature, that is, each process is conceptually running at the same time as all other processes, but in reality there is just one process running in the VM. On a multicore machine Erlang actually runs more than one scheduler, usually one per physical core, each having their own queues. This way Erlang achieves true parallelism. To utilize more than one core ERTS has to be built (see [AP-BuildingERTS]) in SMP mode. SMP stands for Symmetric MultiProcessing, that is, the ability to execute a processes on any one of multiple CPUs.
In reality the picture is more complicated with priorities among processes and the waiting queue is implemented through a timing wheel. All this and more is described in detail in [CH-Scheduling].
Erlang is a dynamically typed language, and the runtime system needs a way to keep track of the type of each data object. This is done with a tagging scheme. Each data object or pointer to a data object also has a tag with information about the data type of the object.
Basically some bits of a pointer are reserved for the tag, and the emulator can then determine the type of the object by looking at the bit pattern of the tag.
These tags are used for pattern matching and for type test and for primitive operations as well as by the garbage collector.
The complete tagging scheme is described in [CH-TypeSystem].
Erlang uses automatic memory management and the programmer does not have to worry about memory allocation and deallocation. Each process has a heap and a stack which both can grow, and shrink, as needed.
When a process runs out of heap space, the VM will first try to reclaim free heap space through garbage collection. The garbage collector will then go through the process stack and heap and copy live data to a new heap while throwing away all the data that is dead. If there still isn’t enough heap space, a new larger heap will be allocated and the live data is moved there.
The details of the current generational copying garbage collector, including the handling of reference counted binaries can be found in [CH-Memory].
In a system which uses HiPE compiled native code, each process actually has two stacks, a BEAM stack and a native stack, the details can be found in [CH-Native].
When you start an Erlang node with erl you get a command prompt. This is the Erlang read eval print loop (REPL) or the command line interface (CLI) or simply the Erlang shell.
You can actually type in Erlang code and execute it directly from the shell. In this case the code is not compiled to BEAM code and executed by the BEAM, instead the code is parsed and interpreted by the Erlang interpreter. In general the interpreted code behaves exactly as compiled code, but there are a few subtle differences, these differences and all other aspects of the shell are explained in [CH-Ops].
This book is mainly concerned with the "standard" Erlang implementation by Ericsson/OTP called ERTS, but there are a few other implementations available and in this section we will look at some of them briefly.
Erlang on Xen (https://github.com/cloudozer/ling) is an Erlang implementation running directly on server hardware with no OS layer in between, only a thin Xen client.
Ling, the virtual machine of Erlang on Xen is almost 100% binary compatible with BEAM. In xref:the_eox_stack you can see how the Erlang on Xen implementation of the Erlang Solution Stack differs from the ERTS Stack. The thing to note here is that there is no operating system in the Erlang on Xen stack.
Since Ling implements the generic instruction set of BEAM, it can reuse the BEAM compiler from the OTP layer to compile Erlang to Ling.
Node1 Node2 Node2 Node3 +------+ +------+ +------+ +------+ | APP | | APP | | APP | | APP | +------+ +------+ +------+ +------+ | OTP | | OTP | | OTP | | OTP | +------+ +------+ +------+ +------+ | Ling | | Ling | | BEAM | | BEAM | +------+ +------+ +------+ +------+ | EoX | | EoX | | ERTS | | ERTS | +------+--+------+ +------+--+------+ | XEN | | OS | +----------------+ +----------------+ | HW | | HW or VM | +----------------+ +----------------+
Erjang (https://github.com/trifork/erjang) is an Erlang implementation which runs on the JVM. It loads .beam files and recompiles the code to Java .class files. Erjang is almost 100% binary compatible with (generic) BEAM.
In xref:the_erjang_stack you can see how the Erjang implementation of the Erlang Solution Stack differs from the ERTS Stack. The thing to note here is that JVM has replaced BEAM as the virtual machine and that Erjang provides the services of ERTS by implementing them in Java on top of the VM.
Node1 Node2 Node3 Node4 +------+ +------+ +------+ +------+ | APP | | APP | | APP | | APP | +------+ +------+ +------+ +------+ | OTP | | OTP | | OTP | | OTP | +------+ +------+ +------+ +------+ |Erjang| |Erjang| | BEAM | | BEAM | +------+ +------+ +------+ +------+ | JVM | | JVM | | ERTS | | ERTS | +------+--+------+ +------+--+------+ | OS | | OS | +----------------+ +----------------+ | HW or VM | | HW or VM | +----------------+ +----------------+
Now that you have a basic understanding of all the major pieces of ERTS, and the necessary vocabulary you can dive into the details of each component. If you are eager to understand a certain component, you can jump directly to that chapter. Or if you are really eager to find a solution to a specific problem you could jump to the right chapter in [P-Running], and try the different methods to tune, tweak, or debug your system.