By default, all files created inside a container are stored on a writable container layer. This means that the data doesn't persist when that container no longer exists.
Docker has two options for containers to store files on the host machine, so that the files are persisted even after the container stops: volumes, and bind mounts.
Bind mounts provide a way to mount a directory or file from the host machine into a container: a path on the host is mapped directly to a path inside the container.
Let's take the good old nginx image:
docker run --rm -p 8080:80 -v /path/on/host:/usr/share/nginx/html --name my-nginx nginx:latest
In this example, we're running an Nginx container; the `-v /path/on/host:/usr/share/nginx/html` flag specifies the bind mount. It maps the directory `/path/on/host` on the host machine to the `/usr/share/nginx/html` directory inside the container.
Whenever the `my-nginx` container accesses the `/usr/share/nginx/html` path, the directory it actually sees is `/path/on/host` on the host machine.
Moreover, every change the container makes to the `/usr/share/nginx/html` directory is reflected in the corresponding location on the host machine, `/path/on/host`, due to the bind mount.
The reverse also applies: changes made on the host machine in `/path/on/host` are visible inside the container under `/usr/share/nginx/html`.
As you may know, `/usr/share/nginx/html` is the location from which Nginx serves static content files.
That way, files you place in `/path/on/host` on your host machine are served by the running Nginx container, without the need to copy those files into the container at build time.
Bind mounts are commonly used for development workflows, where file changes on the host are immediately reflected in the container without the need to rebuild or restart the container. They also allow for easy access to files on the host machine, making it convenient to provide configuration files, logs, or other resources to the container.
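Bind mounts are just as handy for configuration files. As a minimal sketch (the local file name `my-nginx.conf` is only an example; the target path is where the official Nginx image reads its default site configuration from):

# Bind-mount a local config file into the container, read-only
docker run --rm -p 8080:80 \
  -v "$(pwd)/my-nginx.conf":/etc/nginx/conf.d/default.conf:ro \
  --name my-nginx nginx:latest

Editing `my-nginx.conf` on the host and restarting the container picks up the change without rebuilding any image.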
- On your host machine, create a directory and put a sample `index.html` file in it (you can find a sample file on the internet); one possible sequence of commands is sketched after this list.
- Run an Nginx container, publishing port 8080 (`-p 8080:80`) and mapping the path `/usr/share/nginx/html` within the container to the directory you created.
- Open up your browser and visit the Nginx web server at `http://localhost:8080`. Make sure Nginx serves your file.
- On your host machine, delete `index.html`.
- Refresh the page and make sure Nginx responds with a 404 page not found error.
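One possible sequence of commands for the first two steps (the directory name `~/nginx-test` and the HTML content are just examples):

# Create a directory with a trivial index.html and serve it with Nginx
mkdir -p ~/nginx-test
echo '<h1>Hello from the host</h1>' > ~/nginx-test/index.html
docker run --rm -p 8080:80 -v ~/nginx-test:/usr/share/nginx/html --name my-nginx nginx:latest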
Docker volumes are another way to persist data in containers. While bind mounts depend on the directory structure and OS of the host machine, volumes are logical storage spaces that are completely managed by Docker. Volumes offer a higher level of abstraction and allow us to work with different kinds of storage, e.g. volumes stored on remote hosts or with cloud providers. Volumes can be shared among multiple containers.
Unlike a bind mount, you can create and manage volumes outside the scope of any container.
docker volume create my-vol
Let's inspect the created volume:
$ docker volume inspect my-vol
[
    {
        "CreatedAt": "2023-05-10T14:28:15+03:00",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/my-vol/_data",
        "Name": "my-vol",
        "Options": {},
        "Scope": "local"
    }
]
Inspecting the volume reveals the real location where the data will be stored on the host machine: `/var/lib/docker/volumes/my-vol/_data`.
The `local` volume driver is the default built-in driver and stores data on the local host machine.
But Docker offers many more drivers that allow you to create different types of volumes that can be mapped into your containers.
For example, the Azure File Storage plugin lets you mount Microsoft Azure File Storage shares to Docker containers as volumes.
The following example mounts the volume `my-vol` into `/usr/share/nginx/html` in the container.
docker run --rm --name my-nginx -v my-vol:/usr/share/nginx/html nginx:latest
Essentially we get the same effect as in the example above, but now the mounted location is not just a directory in the host OS; it is logically managed by Docker.
Can you see how elegant this is? Volumes provide a seamless abstraction layer for persisting data within containers.
The container believes it reads and writes data from some location in its filesystem (`/usr/share/nginx/html`), without any knowledge of the underlying storage details (of the `my-vol` volume).
Had we used the Azure File Storage plugin, the data would actually be stored in the cloud.
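To make the idea of drivers and their options a bit more concrete, here is a sketch that uses the built-in `local` driver to back a volume with an NFS share. The server address `192.168.1.1` and the exported path `/path/to/dir` are placeholders for whatever NFS export you actually have:

# Create a volume whose data lives on a remote NFS share
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.1.1,rw \
  --opt device=:/path/to/dir \
  nfs-vol

Containers that mount `nfs-vol` then read and write the remote share without knowing anything about NFS.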
Remove a volume:
docker volume rm my-vol
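Two related housekeeping commands you may find useful: `docker volume ls` lists all volumes Docker knows about, and `docker volume prune` removes volumes that are not referenced by any container (it asks for confirmation first).

docker volume ls
docker volume prune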
- A given volume can be mounted into multiple containers simultaneously.
- When no running container is using a volume, the volume is still available to Docker and is not removed automatically.
- If you mount an empty volume into a directory in the container in which files or directories exist, these files or directories are propagated (copied) into the volume. This is a good way to pre-populate data that another container needs (see the sketch after this list).
- Similarly, if you start a container and specify a volume which does not already exist, an empty volume is created for you.
- If you mount a bind mount or non-empty volume into a directory in the container in which some files or directories exist, these files or directories are obscured by the mount.
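You can observe the propagation behavior directly. A small sketch (the volume name `demo-vol` is just an example):

# Mount a freshly created, empty volume over a directory that already has
# content in the image; the image's files get copied into the volume
docker volume create demo-vol
docker run --rm -v demo-vol:/usr/share/nginx/html nginx:latest ls /usr/share/nginx/html

The listing should show the image's default files (such as `index.html`), which now also live inside `demo-vol`.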
Sometimes a container only needs read access to the data; that way, multiple containers can safely mount the same volume without write race conditions. You can simultaneously mount a single volume as read-write for some containers and as read-only for others.
The following example mounts the volume as read-only, by adding `ro` to the `-v` flag:
docker run -d \
--name=nginxtest \
-v nginx-vol:/usr/share/nginx/html:ro \
nginx:latest
Other possible options are:
- `ro` (read-only): mounts the volume in read-only mode, allowing only read operations.
- `rw` (read-write): mounts the volume in read-write mode, allowing both read and write operations (the default).
- `z` / `Z` (SELinux labels): relabel the mounted content so that it can be shared between multiple containers (`z`) or kept private to the current container (`Z`).
- `nocopy` (no copy): disables copying existing content from the container's target directory into a newly created, empty volume.
- `delegated` (along with `cached` and `consistent`): consistency hints used by Docker Desktop on macOS; on Linux hosts they have no effect.
Where multiple options are present, you can separate them using commas.
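For example, a read-only mount combined with `nocopy` on the same volume might look like this (whether this particular combination makes sense depends on your use case):

docker run --rm -v nginx-vol:/usr/share/nginx/html:ro,nocopy nginx:latest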
Volumes and bind mounts let you share files between the host machine and container so that you can persist data even after the container is stopped.
As opposed to volumes and bind mounts, a tmpfs mount is a temporary filesystem that lives only in the host's memory (RAM). When the container stops, the tmpfs mount is removed, and files written there won't be persisted.
This is useful for temporarily storing sensitive files that you don't want to persist in either the host or the container writable layer.
To use a tmpfs mount in a container, use the `--tmpfs` flag. There is no source for tmpfs mounts. The following example creates a tmpfs mount at `/app` in an Nginx container.
docker run -d \
-it \
--name tmptest \
--tmpfs /app \
nginx:latest
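tmpfs mounts can also be created with the more verbose `--mount` flag, which accepts extra options such as the filesystem mode. A sketch, with the mode value chosen arbitrarily for illustration:

docker run -d \
  --name tmptest2 \
  --mount type=tmpfs,destination=/app,tmpfs-mode=1770 \
  nginx:latest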
No matter which type of mount you choose to use, the data looks the same from within the container. It is exposed as either a directory or an individual file in the container's filesystem.
An easy way to visualize the difference among volumes, bind mounts, and tmpfs mounts is to think about where the data lives on the host machine.
- Volumes are stored in a part of the host filesystem which is managed by Docker (`/var/lib/docker/volumes/` on Linux). Non-Docker processes should not modify this part of the filesystem.
- Bind mounts may be stored anywhere on the host system. They may even be important system files or directories. Non-Docker processes on the host machine or a Docker container can modify them at any time.
- `tmpfs` mounts are stored in the host system's memory only, and are never written to the host system's filesystem.
When running a container, the default user inside the container is often set to the `root` user, and this user has full control of the container's processes.
Since containers are isolated processes, in general we don't really care that `root` is the user operating within the container.
But when mounting directories from the host machine using the `-v` flag, it is important to be cautious about using the root user in a container. Why?
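You can quickly check who the default user is:

# A plain Ubuntu container runs as root unless told otherwise
docker run --rm ubuntu id
# Typical output: uid=0(root) gid=0(root) groups=0(root)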
We will investigate this case in this exercise...
- On your host machine, create a directory under `~/test_docker`.
- Run the `ubuntu` container while mounting `/test` within the container to `~/test_docker` on the host machine:

docker run -it -v ~/test_docker:/test ubuntu /bin/bash

- From your host machine, create a file within the `~/test_docker` directory.
- From the `ubuntu` container, list the mounted directory (`/test`). Can you see the file you've created from your host machine? Which UID (user ID) and GID (group ID) own the file?
- From within the container, create a file within `/test`.
- List `~/test_docker` from your host machine. Which user and group own the file created from the container?
- Try to spell out the potential vulnerability: "If an attacker gains control of the container, they may be able to..."
- Repeat the above scenario, but instead of using `-v`, use Docker volumes. Start by creating a new volume: `docker volume create testvol`. Describe how using a Docker-managed volume can reduce the potential risk.
Your goal is to run a MongoDB container persistently. Running a MongoDB container persistently means that data stored in the container is preserved even if the container is stopped or restarted.
You need to store the data in a location outside the container, using either bind mounts or volumes, at your choice.
Here are general guidelines:
- Review the Mongo official image in DockerHub. Specifically, read the Start a mongo server instance section to get an idea of how to run the Mongo container. Then read the Where to Store Data section to see where Mongo stores its data, and think about how to persist the data stored in this path.
- Run the container with the relevant environment variables and volume mounts.
- Watch the container logs and make sure the database is up and running (search for the `ready for connections` log line).
- Exec into your running container: `docker exec -it <container-id-or-name> /bin/bash`.
- Within the container, start a Mongo shell session with `mongosh`. The `mongosh` terminal session is used for interacting with a Mongo database server from the command line.
- Switch to work against some experimental database: `use mydatabase`.
- Write some data: `db.myCollection.insertOne({ name: "John Doe", age: 30 })`.
- Exit the terminal.
- Kill the container: `docker kill <container-id-or-name>`.
- Start a new Mongo container with the same volume mapping.
- Exec into your new running container: `docker exec -it <container-id-or-name> /bin/bash`.
- Within the container, start a Mongo shell: `mongosh`.
- Switch to work against your previous db: `use mydatabase`.
- Verify that the data successfully survived the container kill: `db.myCollection.find()`. You should see the inserted data.
Use the `ubuntu` container to experiment with volume mounting and answer the following questions:
- What happens when you mount an empty Docker volume into a directory in the container in which files or directories already exist?
- What happens when you start a container and specify a Docker volume name that does not exist?
- What happens when you bind mount a host path over a non-empty directory in the container?
- What happens when you start a container with a bind mount and specify paths (both on the host machine and in the container) that do not exist?