Persisting data in Docker: Docker volumes


Docker images as layers

Before getting into the details of persisting data in Docker, we need to take a step back and look at images and layers in Docker.

A Docker image is made of a series of layers:

  • each instruction in the Dockerfile creates a new layer
  • each new layer is stacked on top of the previous one
  • each layer only contains the differences from the layers below it
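
You can see those layers for yourself with docker history. A minimal sketch, assuming the python:3.8 image is available locally (any image will do):

# list the layers that make up an image, roughly one line per Dockerfile instruction
docker history python:3.8

The SIZE column shows that each layer only stores its own changes.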

Creating a new container can be seen as adding a distinct writable layer (also called “container layer”) on top of the image. So any change that you perform on a container (writing new data, deleting data, etc.) will only affect this thin layer that sits on top of your image layers.

When your container gets deleted the writable layer also gets deleted but the image remains unchanged. This also means that multiple containers can actually share the same image and manage their own data independently.
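
Here is a quick sketch to see this in action; the container name scratchpad and the file path are made up for the example:

# write a file into the container's writable layer
docker run --name scratchpad python:3.8 python -c "open('/tmp/note.txt', 'w').write('hello')"

# removing the container also removes its writable layer
docker rm scratchpad

# a fresh container starts from the unchanged image layers, so the file is gone
docker run --rm python:3.8 cat /tmp/note.txt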

This approach has the main advantages of keeping start times and disk usage low for new containers, but it also raises a couple of questions:

  • how do you share data between containers?
  • how do you actually persist data beyond the container lifecycle?

This is basically why Docker volumes are there. They also have two positive side effects:

  • they can be optimised for I/O intensive applications
  • they reduce the size of the container's writable layer

This is why they are the recommended way for persisting data within Docker containers.

Bind mounts

Bind mounts have been around since the early days of Docker, and there are still some valid use cases for them, but you should only consider this option when volumes don't let you achieve what you need.

If you want to read more about bind mounts, you can find more details here.
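
For comparison, here is what a minimal bind mount looks like; the host path (the current directory) is just an example:

# mount a host directory into the container instead of a Docker-managed volume
docker run --rm --mount type=bind,source="$(pwd)",target=/app python:3.8 ls /app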

Docker volumes

Let’s get into the actual topic of this blog post!

The main thing about Docker volumes is that they live outside of the container lifecycle. We’ll try out a few commands to prove this.

Mounting a Docker volume into a container

# create a Docker volume named test-volume
docker volume create test-volume

# list the Docker volumes
# test-volume should now be within the response
docker volume ls

# inspect the Docker volume test-volume
docker volume inspect test-volume
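
The inspect output should look roughly like the following (values such as CreatedAt and Mountpoint will differ on your machine):

[
    {
        "CreatedAt": "2021-01-01T00:00:00Z",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/test-volume/_data",
        "Name": "test-volume",
        "Options": {},
        "Scope": "local"
    }
]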

So we just created a Docker volume without creating any container, confirming that volumes can live outside the container lifecycle. The next step is to actually mount the volume into a container.

# run a container and mount test-volume into /app
docker run -d --name test --mount source=test-volume,target=/app python:3.8

If you now run docker inspect test, you should see the following in the Mounts section:

"Mounts": [
    {
        "Type": "volume",
        "Name": "test-volume",
        "Source": "/var/lib/docker/volumes/test-volume/_data",
        "Destination": "/app",
        "Driver": "local",
        "Mode": "z",
        "RW": true,
        "Propagation": ""
    }
]

Note that docker run would have created the volume for you if it didn't already exist.
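
To actually watch data outlive a container, here is a minimal sketch; the container name writer, the sleep infinity command and the file data.txt are only there for the example:

# start a throwaway container that keeps running, with the same volume mounted
docker run -d --name writer --mount source=test-volume,target=/app python:3.8 sleep infinity

# write a file into the mounted volume from inside the container
docker exec writer sh -c 'echo "hello from the first container" > /app/data.txt'

# remove the container entirely
docker rm -f writer

# mount the same volume into a brand new container: the data is still there
docker run --rm --mount source=test-volume,target=/app python:3.8 cat /app/data.txt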

Sharing data between multiple replicas of the same service

When deploying containers in a production environment, you are very likely to deploy more than one replica of the same service, either for fault-tolerance or for performance improvements (horizontal scaling). As discussed before, a volume's data can then be shared between the different replicas of the same service.

docker service create -d \
  --replicas=2 \
  --name test \
  --mount source=test-volume,target=/app \
  python:3.8

The above command creates a new test service with 2 replicas, both sharing the test-volume volume. Note that docker service requires Docker to run in swarm mode, and that with the default local volume driver the replicas only share the volume when they are scheduled on the same node.

If any of the replicas goes down, because of a problem or because the service gets scaled down, the volume will still be there for the remaining replicas. Even if we eventually remove all the replicas, the volume will still be there and you'd need to remove it explicitly.
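
A minimal sketch of that last point, scaling the service down and then cleaning up:

# scale the service down: the volume is untouched
docker service scale test=1

# remove the service entirely: the volume still shows up in the list
docker service rm test
docker volume ls

# the volume only disappears once you remove it explicitly
docker volume rm test-volume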

Conclusion

We just had a first look into the world of Docker volumes. They provide a nice separation of concerns between your Docker containers and the way you persist and share data between containers. That's why they are now recommended by Docker as the best way of persisting data. Hope you enjoyed this post, see you soon for your next 8 min code workout.
