Docker: Best practices
Docker builds should:
- Be reliable
- Be reproducible
- Minimise image size
- Minimise image build time
- Minimise image complexity
The recommendations below are intended to help developers apply these principles when deploying new services.
Writing modular code is always good practice, but in 'dockerizing' a service it becomes a necessity. Your code should be written to perform exactly one of the WA service types. For example, if your service retrieves data from the knowledge graph, derives other data from it and then plots them on a map, the visualisation component will need to be extracted and deployed using a separate Dockerfile.
If your image has dependencies (e.g. model binaries) that will be updated periodically, they should be under version control and readily accessible to WA developers.
One way to achieve this is to use Maven, which allows you to package dependencies into an artifact with metadata, including a version number.
The artifact can then be deployed to the WA Maven repository (https://maven.pkg.github.com/cambridge-cares/TheWorldAvatar/), so that others can use it to build your service.
See the examples of deploying and using Maven dependencies for more details.
To promote consistency across different WA services, and to make it easier to test containers via an IDE, the following directory structure is recommended for any services that need to be built from source:
```
<service dir>/
    <source dir>                    : directory containing source code, named according to language conventions
    <other language-specific files> : e.g. "requirements.txt" and "setup.py" for a Python app, or "pom.xml" for a Java servlet
    docker/                         : directory containing any files used exclusively for the Docker build
        Dockerfile
        docker-compose.yml          : an optional docker-compose configuration file to facilitate development and testing
```
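To illustrate how the optional docker-compose.yml might tie into this layout, here is a minimal sketch; the service name, port mapping, and volume name are assumptions for illustration only:

```yaml
# Placed in <service dir>/docker/; builds from the service root so that
# source code outside the docker/ directory is available to the build
version: "3.8"
services:
  my-service:
    build:
      context: ..
      dockerfile: docker/Dockerfile
    ports:
      - "8080:8080"
    volumes:
      # Named volume, per the recommendation to name all volumes
      - my-service-data:/app/data
volumes:
  my-service-data:
```

Running `docker-compose up` from the docker directory then builds and starts the container, which makes testing from an IDE straightforward.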
- Avoid OS-level images - It rarely makes sense to start from an OS-level (e.g. ubuntu:20.04) base image and then install all of the other software you need on top. Instead, try to choose a base that already includes the primary piece of software that your service uses, then install anything else you require.
- Use official images - Well-established tools and services (e.g. Python, Tomcat, Maven) have official images on Docker Hub which are usually well maintained and supported - these should be preferred over unofficial versions or forks wherever possible.
- Use lightweight image versions - Official images very often have lightweight versions (tags) that are as compact as possible but include all of the core functionality - these should be preferred unless there's a good reason not to. For example, if you're building a Python agent, use something like python:3.8-slim-buster rather than python:3.8. You can also use images based on the super-lightweight Alpine Linux distribution. This cuts image size even further, but there are some complications to be aware of.
Comparison of official Python image sizes:

| Tag             | Size / MB |
| --------------- | --------- |
| 3.8             | 332       |
| 3.8-slim-buster | 42        |
| 3.8-alpine3.12  | 16        |

- Don't use 'latest' image tags - Most images have a 'latest' tag. These are convenient, but can produce unexpected behaviour after an update; use a fixed version tag instead.
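As a sketch of the tag-pinning advice above (the digest shown is a placeholder, not a real value):

```dockerfile
# Avoid: the image this resolves to can change under you after an update
# FROM python:latest

# Better: pin a fixed version tag
FROM python:3.8-slim-buster

# Strictest option: pin the exact image digest for fully reproducible pulls
# FROM python:3.8-slim-buster@sha256:<digest>
```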
The general principle here is: install prerequisites and set up the environment first, copy in source code last.
Since your source code will change frequently during development, copying it into the image as one of the first commands in your Dockerfile will force Docker to regenerate all subsequent layers each time you build. If you instead add it as late as possible, Docker can cache all the previous layers, greatly speeding up the build. For example:
Bad - Docker reinstalls requirements every time the source code changes:

```dockerfile
FROM python:3.9.5-slim-buster
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
ENTRYPOINT ["python", "my_app.py"]
```

Good - requirements are only reinstalled when requirements.txt itself changes:

```dockerfile
FROM python:3.9.5-slim-buster
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
ENTRYPOINT ["python", "my_app.py"]
```
Docker's multi-stage builds are an incredibly useful tool. Two of the main applications are:
- to reduce image size by restricting build-time-only files to an earlier, intermediate Docker stage
- to avoid adding build-time credentials and other secrets to final images
The Dockerfile for the Java agent example illustrates both of these applications. The first stage builds a Java WAR file, copying in the source code and some Maven credentials in the process. The second stage discards everything from the first stage other than the WAR file, meaning that the credentials and source code are not included in the final image.
Caution - Even with multi-stage builds, it remains possible to push intermediate stages to registries/other machines, though it's not easy to do by accident.
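The pattern described above can be sketched as follows. This is illustrative only, not the actual Java agent Dockerfile; the image tags, file paths, and credentials location are assumptions:

```dockerfile
# Stage 1: build the WAR file. Source code and Maven credentials
# are copied in here, but only exist in this intermediate stage.
FROM maven:3.8-openjdk-11 AS builder
WORKDIR /build
COPY credentials/settings.xml /root/.m2/settings.xml
COPY pom.xml .
COPY src ./src
RUN mvn --no-transfer-progress package

# Stage 2: start from a clean base and copy across only the WAR file.
# The credentials and source code from stage 1 never reach this image.
FROM tomcat:9-jdk11
COPY --from=builder /build/target/*.war /usr/local/tomcat/webapps/
```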
- Use "WORKDIR" in preference to "RUN mkdir ..." - it creates the directory and cd's to it using a single directive
- Name all volumes, even if they're not used - this helps tell other developers what the volume should contain, particularly when running multiple containers.
- Remove debugging statements to reduce the layer count - layers like 'RUN echo "Build complete"' don't usually add much, since the build will report the execution of each Docker directive anyway.
- Set up basic startup tests (Specifications TBD) - being able to run tests to check that a container has started and is providing basic functionality is extremely useful when deploying large systems/stacks of containers.
- Add "ENV DISPLAY=host.docker.internal:0.0" to the Dockerfile if you need to run a graphical app, e.g. one showing a plot. This requires the VcXsrv Windows server to be running (can be downloaded here). Note that this solution only works on Windows with Docker Desktop; there is currently no universal solution that works on all platforms.
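While the startup-test specifications are still TBD, Docker's built-in HEALTHCHECK directive is one way to sketch the idea. The port and endpoint below are assumptions for illustration:

```dockerfile
FROM python:3.9.5-slim-buster
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
# Mark the container unhealthy if the (hypothetical) /status endpoint
# stops responding; uses Python's stdlib since slim images lack curl
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/status')" || exit 1
ENTRYPOINT ["python", "my_app.py"]
```

The health status then appears in `docker ps` output and can be queried by orchestration tools when deploying large stacks of containers.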