One guiding principle behind the design of the runtime system is that bugs are more or less inevitable. Even if through an enormous effort you manage to build a bug free application you will soon learn that the world or your user changes and your application will need to be "fixed."
The Erlang runtime system is designed to facilitate change and to minimize the impact of bugs.
The impact of bugs is minimized by compartmentalization. This is done from the lowest level where each data structure is separate and immutable to the highest level where running systems are dived into separate nodes. Change is facilitated by making it easy to upgrade code and interacting and examining a running system.
We will look at many different ways to monitor and maintain a running system. There are many tools and techniques available but we must not forget the most basic tool, the shell and the ability to connect a shell to node.
In order to connect two nodes they need to share or
know a secret pass phrase, called a cookie. As long
as you are running both nodes on the same machine
and the same user starts them they will automatically
share the cookie (in the file $HOME/.erlang.cookie
).
We can see this in action by starting two nodes, one Erlang node
and one Elixir node. First we start an Erlang node called
node1
.
$ erl -sname node1
Erlang/OTP 19 [erts-8.1] [source-0567896] [64-bit] [smp:4:4]
[async-threads:10] [hipe] [kernel-poll:false]
Eshell V8.1 (abort with ^G)
(node1@GDC08)1> nodes().
[]
(node1@GDC08)2>
Then we start an Elixir node called node2
:
$ iex --sname node2
Erlang/OTP 19 [erts-8.1] [source-0567896] [64-bit] [smp:4:4]
[async-threads:10] [hipe] [kernel-poll:false]
Interactive Elixir (1.4.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(node2@GDC08)1>
In Elixir we can connect the nodes by running the command Node.connect
name. In Erlang you do this with net_kernel:connect(Name)
.
The node connection is bidirectional so you only
need to run the command on one of the nodes.
iex(node2@GDC08)1> Node.connect :node1@GDC08 true iex(node2@GDC08)2>
In the distributed case this is somewhat more complicated since we
need to make sure that all nodes know or share the cookie. This can
be done in three ways. You can set the cookie used when talking to a
specific node, you can set the same cookie for all systems at start up
with the -set_cookie
parameter, or you can copy the file
.erlang.cookie
to the home directory of the user running the system
on each machine.
The last alternative, to have the same cookie in the cookie file of each machine in the system is usually the best option since it makes it easy to connect to the nodes from a local OS shell. Just set up some secure way of logging in to the machine either through VPN or ssh. In the next section we will see how to then connect a shell to a running node.
Using the second option it might look like this:
happi@GDC08:~$ cat ~/.erlang.cookie
pepparkaka
happi@GDC08:~$ ssh gds01
happi@gds01:~$ erl -sname node3 -setcookie pepparkaka
Erlang/OTP 18 [erts-7.3] [source-d2a6d81] [64-bit] [smp:8:8]
[async-threads:10] [hipe] [kernel-poll:false]
Eshell V7.3 (abort with ^G)
(node3@gds01)1> net_kernel:connect('node1@GDC08').
true
(node3@gds01)2> nodes().
[node1@GDC08,node2@GDC08]
(node3@gds01)3>
Note
|
A Potential Problem with Different Cookies
Note that the default for the Erlang distribution is to create a
fully connected network. That is, all nodes are connected to all
other nodes in the network. In the example, once node3 connects to
node1 it also is connected to node2.
If each node has its own cookie you will have to tell each node the
cookies of each other node before you try to connect them. You can
start up a node with the flag -connect_all false in order to
prevent the system from trying to make a fully connected network.
Alternatively, you can start a node as hidden with the flag
-hidden , which makes node connections to that node non transitive.
|
Now that we know how to connect nodes, even on different machines, to each other, we can look at how to connect a shell to a node.
The Elixir and the Erlang shells works much the same way as a shell or a terminal window on your computer, except that they give you a terminal window directly into your runtime system. This gives you an extremely powerful tool, a basically CLI with full access to the runtime. This is fantastic for operation and maintenance.
In this section we will look at different ways of connecting to a node through the shell and some of the shell’s perhaps less known but more powerful features.
Both the Elixir shell and the Erlang shell can be configured to provide you with shortcuts for functions that you often use.
The Elixir shell will look for the file .iex.exs
first in
the local directory and then in the users home directory.
The code in this file is executed in the shell process
and all variable bindings will be available in the shell.
In this file you can configure aspects such as the syntax coloring and the size of the history. [See hexdocs for a full documentation.](https://hexdocs.pm/iex/IEx.html#module-the-iex-exs-file)
You can also execute arbitrary code in the shell context.
When the Erlang runtime system starts, it first interprets the
code in the Erlang configuration file. The default location
of this file is in the users home directory ~/.erlang
.
This file is usually used to load the user default settings for the shell by adding the line
code:load_abs("/home/happi/.config/erlang/user_default").
Replace "/home/happi/.config/erlang/" with the absolute path you want to use.
If you call a local function from the shell it will try to
call this function first in the module user_default
and
then in the module shell_default
(located in stdlib
).
This is how command such as ls()
and help()
are implemented.
When running a production system you will want to start the nodes in
daemon mode through run_erl
. We will go through how to start a node
and some of the best practices for deployment and running in
production in [xxx](#ch.live). Fortunately, even when you have started
a system in daemon mode, without a shell, you can connect a shell to
the system. There are actually several ways to do that. Most of these
methods rely on the normal distribution mechnaisms and hence require
that you have the same Erlang cookie on both machines as described
in the previous section.
The easiest and probably the most common way to connect to an Erlang
node is by starting a named node that connects to the system node
through a remote shell. This is done with the erl
command line flag
-remsh Name
. Note that you need to start a named node in order to
be able to connect to another node, so you also need the -name
or
-sname
flag. Also, note that these are arguments to the Erlang
runtime so if you are starting an Elixir shell you need to add an
extra -
to the flags, like this:
$ iex --sname node4 --remsh node2@GDC08
Erlang/OTP 19 [erts-8.1] [source-0567896] [64-bit] [smp:4:4]
[async-threads:10] [hipe] [kernel-poll:false]
Interactive Elixir (1.4.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(node2@GDC08)1>
Another thing to note here is that in order to start a remote Elixir shell you need to have IEx running on that node. There is no problem to connect Elixr and Erlang nodes to each other as we saw in the previous section, but you need to have the code of the shell you want to run loaded on the node you connect to.
It is also worth noting that there is no security built into either
the normal Erlang distribution or to the remote shell implementation.
You do not want to have your system node exposed to the internet and
you do not want to connect from your local machine to a node. The safest
way is probably to have a VPN tunnel to your live environment and use ssh
to connect a machine running one of your live nodes. Then you can connect
to one of the nodes using remsh
.
It is important to understand that there are actually two nodes
involved when you start a remote shell. The local node, named node4
in the previous example and the remote node node2
. These nodes can
be on the same machine or on different machines. The local node is
always running on the machine on which you gave the iex
or erl
command. On the local node there is a process running the tty
program which interacts with the terminal window. The actual shell
process runs on the remote node. This means, first of all, that the
code for the shell you want to run (i.e. iex or the Erlang shell) has
to exist at the remote node. It also means that code is executed on
the remote node. And it also means that any shell default settings are
taken from the settings of the remote machine.
Imagine that we have the following .erlang
file in our home directory
on the machine GDC08.
And the <filename>user_default.erl</filename> file looks like this:
Then we create two directories ~/example/dir1
and ~/example/dir2
and we put two different .iex.exs
files in those directories.
Now if we start four different nodes from these directories we will see how the shell configurations are loaded.
GDC08:~/example/dir1$ iex --sname node1
Erlang/OTP 19 [erts-8.1] [source-0567896] [64-bit]
[smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
ERTS is starting in /home/happi/example/dir1
on [node1@GDC08]
Interactive Elixir (1.4.0) - press Ctrl+C to exit (type h() ENTER for help)
iEx starting in
/home/happi/example/dir1
iEx starting on
node1@GDC08
(node1@GDC08)iex<d1>
GDC08:~/example/dir2$ iex --sname node2
Erlang/OTP 19 [erts-8.1] [source-0567896] [64-bit]
[smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
ERTS is starting in /home/happi/example/dir2
on [node2@GDC08]
Interactive Elixir (1.4.0) - press Ctrl+C to exit (type h() ENTER for help)
iEx starting in
/home/happi/example/dir2
iEx starting on
node2@GDC08
(node2@GDC08)iex<d2>
GDC08:~/example/dir1$ iex --sname node3 --remsh node2@GDC08
Erlang/OTP 19 [erts-8.1] [source-0567896] [64-bit] [smp:4:4]
[async-threads:10] [hipe] [kernel-poll:false]
ERTS is starting in /home/happi/example/dir1
on [node3@GDC08]
Interactive Elixir (1.4.0) - press Ctrl+C to exit (type h() ENTER for help)
iEx starting in
/home/happi/example/dir2
iEx starting on
node2@GDC08
(node2@GDC08)iex<d2>
GDC08:~/example/dir2$ erl -sname node4
Erlang/OTP 19 [erts-8.1] [source-0567896] [64-bit] [smp:4:4]
[async-threads:10] [hipe] [kernel-poll:false]
ERTS is starting in /home/happi/example/dir2
on [node4@GDC08]
Eshell V8.1 (abort with ^G)
(node4@GDC08)1> tt().
test
(node4@GDC08)2>
The shell configuration is loaded from the node running the shell, as you can see from the previous examples. If we were to connect to a node on a different machine, these configurations would not be present.
You can actually change which node and shell you are connected to by going into job control mode.
By pressing control+G (ctrl-G) you enter the job control mode (JCL). You are then greeted by another prompt:
User switch command -->
By typing h
(followed by enter)
you get a help text with the available commands in JCL:
c [nn] - connect to job i [nn] - interrupt job k [nn] - kill job j - list all jobs s [shell] - start local shell r [node [shell]] - start remote shell q - quit erlang ? | h - this message
The interesting command here is the r
command which
starts a remote shell. You can give it the name of the
shell you want to run, which is needed if you want to start
an Elixir shell, since the default is the standard Erlang shell.
Once you have started a new job (i.e. a new shell)
you need to connect to that job with the c
command.
You can also list all jobs with j
.
(node2@GDC08)iex<d2> User switch command --> r node1@GDC08 'Elixir.IEx' --> c Interactive Elixir (1.4.0) - press Ctrl+C to exit (type h() ENTER for help) iEx starting in /home/happi/example/dir1 iEx starting on node1@GDC08
See the [Erlang Shell manual](http://erlang.org/doc/man/shell.html) for a full description of JCL mode.
You can quit your session by typing ctrl+G q [enter]
. This
shuts down the local node. You do not want to quit with
any of q().
, halt()
, init:stop()
, or System.halt.
All of these will bring down the remote node which seldom
is what you want when you have connected to a live server.
Instead use ctrl+\
, ctrl+c ctrl+c
, ctrl+g q [enter]
or ctrl+c a [enter]
.
If you do not want to use a remote shell, which requires you to have two instances of the Erlang runtime system running, there are actually two other ways to connect to a node. You can also connect either through a Unix pipe or directly through ssh, but both of these methods require that you have prepared the node you want to connect to by starting it in a special way or by starting an ssh server.
By starting the node through the command run_erl
you will
get a named pipe for IO and you can attach a shell to that
pipe without the need to start a whole new node. As we shall
see in the next chapter there are some advantages to using
run_erl
instead of just starting Erlang in daemon mode,
such as not losing standard IO and standard error output.
The run_erl command is only available on Unix-like operating systems that implement pipes. If you start your system with run_erl, something like:
> run_erl -daemon log/erl_pipe log "erl -sname node1"
or
> run_erl -daemon log/iex_pipe log "iex --sname node2"
You can then attach to the system through the named pipe (the first argument to run_erl).
> to_erl dir1/iex_pipe
iex(node2@GDC08)1>
You can exit the shell by sending EOF (ctrl+d
) and leave the system
running in the background. Note that with to_erl
the terminal is
connected directly to the live node so if you exit with ctrl-G q
[enter]
you will bring down that node, probably not what you want.
The last method for connecting to the node is through ssh.
Erlang comes with a built in ssh server which you can start on your node and then connect to directly. The [documentation for the ssh module](http://erlang.org/doc/man/ssh.html) explains all the details. For a quick test all you need is a server key which you can generate with ssh-keygen:
> mkdir ~/ssh-test/
> ssh-keygen -t rsa -f ~/ssh-test/ssh_host_rsa_key
Then you start the ssh daemon on the Erlang node:
gds01> erl
Erlang/OTP 18 [erts-7.3] [source-d2a6d81] [64-bit] [smp:8:8]
[async-threads:10] [hipe] [kernel-poll:false]
Eshell V7.3 (abort with ^G)
1> ssh:start().
{ok,<0.47.0>}
2> ssh:daemon(8021, [{system_dir, "/home/happi/.ssh/ehost/"},
{auth_methods, "password"},
{password, "pwd"}]).
You can now connect from another machine:
happi@GDC08:~> ssh -p 8021 happi@gds01
happi@gds01's password: [pwd]
Eshell V7.3 (abort with ^G)
1>
In a real world setting you would want to set up your server and user ssh keys as described in the documentation. At least you would want to have a better password.
To disconnect from the shell you need to shut down your terminal
window. Using q()
or init:stop()
would bring down the node.
In this shell you do not have access to neither JCL mode (ctrl+g
)
nor the BREAK mode (ctrl+c
).
The break mode is really powerful when developing, profiling and debugging. We will take a look at it next.
When you press ctrl+c
you enter BREAK mode. This is most
often used just to break out of the shell by either tying
a [enter]
for abort or by hitting ctrl+c
once more.
But you can actually use this mode to break in to the
internals of the Erlang runtime system.
When you enter BREAK mode you get a short menu:
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded (v)ersion (k)ill (D)b-tables (d)istribution
Abort exits the node and continue takes you back in to
the shell. Hitting p [enter]
will give you internal
information about all processes in the system. We
will look closer at what this information means
in the next chapter (See [xxx](#ch.processes)).
You can also get information about the memory and
the memory allocators in the node through the
info choice (i [enter]
). In [xxx](#ch.memory)
we will look at how to decipher this information.
You can see all loaded modules and their sizes with
l [enter]
and the system version with v [enter]
,
while k [enter]
will let you step through all processes
and inspect them and kill them. Capital D [enter]
will
show you information about all the ETS tables in the
system and lower case d [enter]
will show you
information about the distribution. That is basically
just the node name.
If you have built your runtime with OPPROF or DEBUG you will be able to get even more information. We will look at how to do this in [AP-BuildingERTS]. The code for the break mode can be found in <filename>[OTP_SOURCE]/erts/emulator/beam/break.c</filename>.
Note that going into break mode freezes the node. This is not something you want to do on a production system. But when debugging or profiling in a test system, this mode can help us find bugs and bottlenecks, as we will see later in this book.