title | author | tags |
---|---|---|
Multi-stage docker build of Haskell webapp |
Oleg Grenrus |
engineering |
Since Docker 17.05, there is support for multi-stage builds.
This is an example and tutorial for using it to build simple Haskell webapp.
The setup is simple: single Dockerfile
, yet the resulting docker image
is only megabytes large.
Essentially, I read through Best practices for writing Dockerfiles
and made a Dockerfile
.
A word of warning: If you think Nix is the tool to use, I'm fine with that.
But there's nothing for you in this tutorial.
This is an opinionated setup: Ubuntu and cabal-install
's nix-style build.
Also all non-Haskell dependencies are assumed to be avaliable for install
through apt-get
. If that's not true in your case, maybe you should
check Nix.
The files are on GitHub: phadej/docker-haskell-example. I refer to files by names, not paste them here.
Assuming you have docker tools installed, there are seven steps to build a docker image:
-
Write your application. Any web-app would do. The assumptions are that the app
- listens at port 8000
- doesn't daemonize itself
I use a minimal servant app:
docker-haskell-example.cabal
andMain.hs
. If you want to learn about servant, its tutorial is a good starting point. -
Write
cabal.project
containing at leastindex-state: 2019-06-17T09:52:09Z with-compiler: ghc-8.4.4 packages: .
- Pinning
index-state
makes builds reproducible enough with-compiler
select the compiler so it's not the defaultghc
- Pinning
-
Add
.dockerignore
. The more stuff you can ignore, the better. Less things to copy to docker build context. Less things would invalidate docker cache. Especially hidden files are not hidden from docker, like editors' temporary files. I hide.git
directory. If you want to burn git-hash look at known issues section. -
Add
Dockerfile
anddocker.cabal.config
.docker.cabal.config
is used inDockerfile
. In most cases you don't need to editDockerfile
. You need, if you need some additional system dependencies. The next step will tell, if you need something. -
Build an image with
docker build --build-arg EXECUTABLE=docker-haskell-example --tag docker-haskell-example:latest .
If it fails, due missing library, see next section. You'll need to edit
Dockerfile
, and iterate until you get a successful build. -
After successful build, you can run the container locally
docker run -ti --publish 8000:8000 docker-haskell-example:latest
This step is important, to test that all runtime dependencies are there.
-
And try it from another terminal
curl -D - localhost:8000
It should respond something like:
HTTP/1.1 200 OK Transfer-Encoding: chunked Date: Thu, 04 Jul 2019 16:15:37 GMT Server: Warp/3.2.27 Content-Type: application/json;charset=utf-8 ["hello","world"]
The Dockerfile
is written with a monorepo setup in mind. In other words
setup, where you could build different docker images from a single repository.
That explains the --build-arg EXECUTABLE=
in a docker build
command.
It's also has some comments explaining why particular steps are done.
There are two stages in the Dockerfile
, builder and deployment.
In builder stage we install all build dependencies,
separating them into different RUN
s, so we could avoid
cache invalidation as much as possible.
A general rule: Often changing things have to installed latter.
We install few dependencies from Ubuntu's package repositories. That list is something you'll need to edit once in a while. The assumption is that all non-Haskell stuff comes from there (or some PPA). There's also a corresponding list in deployment stage, there we install only non-dev versions.
# More dependencies, all the -dev libraries
# - some basic collection of often needed libs
# - also some dev tools
RUN apt-get -yq --no-install-suggests --no-install-recommends install \
build-essential \
ca-certificates \
curl \
git \
libgmp-dev \
liblapack-dev \
liblzma-dev \
libpq-dev \
libyaml-dev \
netbase \
openssh-client \
pkg-config \
zlib1g-dev
At some point we reach a point, where we add *.cabal
file. This is something
you might need to edit as well, if you have multiple cabal files in different
directories.
# Add a .cabal file to build environment
# - it's enough to build dependencies
COPY *.cabal cabal.project /build/
We only add these, so we can build dependencies.
# Build package dependencies first
# - beware of https://github.com/haskell/cabal/issues/6106
RUN cabal v2-build -v1 --dependencies-only all
and their cache
won't be violated by changes in the actual implementation of the webapp.
This is common idiom in Dockerfile
s.
Issue 6106 might
be triggered if you vendor some dependencies. In that case
change the build command to
RUN cabal v2-build -v1 --dependencies-only some-dependencies
listing as many dependencies (e.g. servant
, warp
) as possible.
After dependencies are built, the rest of the source files are added
and the executables are built, stripped, and moved to known location
out of dist-newstyle
guts.
The deployment image is slick. We pay attention and don't install
development dependencies anymore. In other words we install only runtime dependencies.
E.g. we install libgmp10
, not libgmp-dev
.
I also tend to install curl
and some other cli tool to help debugging.
In deployment environments where you can shell into the running containers, it
helps if there's something you can do. That feature is useful to debug network
problems for example.
The resulting image is not the smallest possible, but it's not huge either:
REPOSITORY TAG SIZE
docker-haskell-example latest 137MB
Cold build is slow.
Rebuilds are reasonably fast,
if you don't touch .cabal
or cabal.project
files.
-
If you have
data-files
, situation is tricky: Consider usingfile-embed-lzma
orfile-embed
packages. I.e. avoiddata-files
. -
Cabal issue #6106 may require you to edit
--dependencies-only
build step, as explained above. -
Git Hash into built executable. My approach is to ignore whole
.git
directory, as it might grow quite large. Maybe unignoring (with!
) of.git/HEAD
and.git/refs
(which are relatively small) will makegitrev
and alike work. Please tell me if you try! -
Caching of Haskell dependencies is very rudimentary. It could be improved largely, if
/cabal/store
could be copied out after the build, and in before the build. I don't really know how to that in Docker. Any tips are welcome. For example withdocker run
one could use volumes, but not withdocker build
.
Look at the example repository. I hope this is useful for you.