Maintain Docker daemon without terminating the containers


Photo by Ian Taylor on Unsplash

Introduction

Docker was born to help developers build microservices applications. However, going back in time to when Docker was first released, it was a monolith itself. That sounds ironic!

As happens with almost any monolithic app, that design became a burden, so the Docker team worked hard on a new version of the engine. At the time of writing, Docker has been redesigned along the lines of a microservices architecture.

One benefit of this architecture is that containers can be decoupled from the daemon. In other words, restarting the daemon does not have to affect the running containers. Such containers are called daemon-less containers, and the feature that makes them possible is called live restore.

In this post, let’s look at how to restart the Docker daemon while keeping containers running with live restore.

Live restore

Docker does not have live restore enabled by default. To turn it on, you need to update the daemon configuration.

Where you edit that configuration depends on your platform:

Docker on Linux: /etc/docker/daemon.json
Docker Desktop (Windows/Mac): open the app, hit Settings or Preferences -> Docker Engine

Before you edit the file, take a copy of it as a backup, so that you can restore the previous version if the change breaks the daemon. This is highly recommended in production.

Next, add the following option to the file. If the file already contains other settings, add the key to the existing JSON object rather than replacing it:

{
    "live-restore": true
}
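If daemon.json already holds other options, the key has to be merged in rather than pasted over the file. Here is a minimal Python sketch of the backup-then-merge step. The function name enable_live_restore and the ".bak" suffix are my own choices for illustration, not anything Docker provides.

```python
import json
import shutil
from pathlib import Path


def enable_live_restore(config_path: str) -> dict:
    """Set "live-restore": true in a daemon.json file, keeping other keys."""
    path = Path(config_path)
    config = {}
    if path.exists():
        # Keep a copy so the previous version can be restored if needed.
        # The ".bak" suffix is an arbitrary choice for this sketch.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        text = path.read_text()
        if text.strip():
            # json.loads raises on malformed JSON before anything is written.
            config = json.loads(text)
    config["live-restore"] = True
    path.write_text(json.dumps(config, indent=4) + "\n")
    return config
```

On a Linux host this would be called as enable_live_restore("/etc/docker/daemon.json") with root privileges. The daemon still has to pick up the change afterwards.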

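For the change to take effect, the daemon needs to be restarted. Keep in mind that this first restart still happens without live restore, so running containers will be stopped. On Linux, the official documentation notes that reloading the daemon instead (systemctl reload docker) applies the setting without that downtime.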

$ sudo service docker restart
Redirecting to /bin/systemctl restart docker.service
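After the restart, it is worth confirming that the daemon actually picked up the setting. The sketch below assumes the docker CLI is on the PATH and reads the LiveRestoreEnabled field from the JSON output of docker info. The helper names are hypothetical.

```python
import json
import subprocess


def live_restore_enabled(info_json: str) -> bool:
    """Read the LiveRestoreEnabled field from `docker info` JSON output."""
    return bool(json.loads(info_json).get("LiveRestoreEnabled", False))


def query_docker_info() -> str:
    """Ask the local daemon for its info as JSON (needs a running daemon)."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling live_restore_enabled(query_docker_info()) on the Docker host should return True once the option is active. The same information appears in the plain docker info output as a "Live Restore Enabled" line.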

How the live restore works

By default, stopping the Docker daemon terminates the running containers. With live restore turned on, your containers stay alive through daemon downtime such as upgrades, restarts and even crashes. Once the daemon is up and running again, it reconnects to the containers. This is a big deal for production workloads.

The deep dive

Let’s take a deep dive into how Docker makes daemon-less containers possible. Before we start, remember to enable live restore on your Docker host by following the procedure in the previous section.

Once everything is ready, it’s time to create a new container.

$ docker run --name nginx-test -d nginx
eb97280508f5ce3122df693355f27aa9d7924be02b190cd56a233eaa037776c1

How a container is created by Docker

When you run the “docker run …” command, the Docker client (/usr/bin/docker) turns your input parameters into a payload and sends it in a POST request to the Docker daemon (dockerd).
After receiving the request, dockerd calls the container daemon (containerd).
containerd does not create the container itself. The image, pulled first if it is not available locally, is turned into an OCI bundle, which containerd hands to runc through a shim process (containerd-shim-runc-v2).
runc talks to the OS kernel to create new namespaces and configure cgroups for a new process, in which everything we defined in the Docker image is executed. This isolated process is our container.
As soon as runc finishes the job, it exits and leaves the container process to the shim for further monitoring and management.
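To make the first step more concrete, here is a rough Python sketch of the kind of request the client sends to dockerd. It assumes the daemon listens on the default Unix socket /var/run/docker.sock and uses the Engine API endpoint /containers/create. The class and function names are hypothetical. The real client is written in Go and does more than this: it negotiates an API version, pulls the image if needed, and follows up with a separate start request.

```python
import http.client
import json
import socket
from urllib.parse import quote


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix socket instead of TCP."""

    def __init__(self, socket_path: str):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


def build_create_request(name: str, image: str):
    """Return (method, path, body) for a container-create call."""
    path = "/containers/create?name=" + quote(name)
    body = json.dumps({"Image": image})
    return "POST", path, body


def create_container(name, image, socket_path="/var/run/docker.sock"):
    """Send the create request to dockerd; return (status, parsed reply)."""
    method, path, body = build_create_request(name, image)
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request(method, path, body=body,
                     headers={"Content-Type": "application/json"})
        response = conn.getresponse()
        return response.status, json.loads(response.read())
    finally:
        conn.close()
```

For the example above, build_create_request("nginx-test", "nginx") produces a POST to /containers/create?name=nginx-test with the body {"Image": "nginx"}. create_container only works on a host where the daemon is running and the caller may access the socket.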

If things go smoothly, you will see the following processes running:

$ ps -xf
1233 ?        Ssl   12:02 /usr/bin/containerd
30856 ?        Ssl    0:02 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
31047 ?        Sl     0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id eb97280508f5ce3122df693355f27aa9d7924be02b190cd
31065 ?        Ss     0:00  \_ nginx: master process nginx -g daemon off;

dockerd and containerd are both there.

One more thing…

Take a look! My nginx container process is running as a child of “containerd-shim-runc-v2” (the shim), not of dockerd.
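The same parent-child relationship can be checked without reading the ps tree by eye. The sketch below is Linux-only and reads /proc/<pid>/stat, where the fourth field is the parent PID. The function names are my own.

```python
from pathlib import Path


def parse_stat(stat_line: str):
    """Return (pid, comm, ppid) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ")",
    # so split around the first "(" and the last ")".
    pid_part, rest = stat_line.split("(", 1)
    comm, after = rest.rsplit(")", 1)
    fields = after.split()
    # fields[0] is the process state, fields[1] is the parent PID.
    return int(pid_part), comm, int(fields[1])


def parent_name(pid: int) -> str:
    """Name of the parent process of `pid` (Linux only)."""
    _, _, ppid = parse_stat(Path(f"/proc/{pid}/stat").read_text())
    _, comm, _ = parse_stat(Path(f"/proc/{ppid}/stat").read_text())
    return comm
```

With the PIDs from the listing above, parent_name(31065) would be expected to name the shim. The kernel truncates this field to 15 characters, so it would show up as containerd-shim rather than the full binary name.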

Simulate a daemon downtime

To verify that live restore is working, we can simulate daemon downtime. I suggest trying this in a test environment.

Stop the daemon

$ sudo service docker stop
Redirecting to /bin/systemctl stop docker.service
Warning: Stopping docker.service, but it can still be activated by:
docker.socket

Check again and make sure the docker service is inactive.

$ sudo service docker status
Redirecting to /bin/systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Sun 2022-10-09 07:22:48 +07; 7s ago

Now, let’s call the ps command again to see the differences.

$ ps -xf
1233 ?        Ssl   12:11 /usr/bin/containerd
31047 ?        Sl     0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id eb97280508f5ce3122df693355f27aa9d7924be02b190cd
31065 ?        Ss     0:00  \_ nginx: master process nginx -g daemon off;

dockerd is gone, but the other processes are still there, and my nginx container process is still alive. Bingo!
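If you want to script this check rather than compare two ps listings, one option is to probe the container's PID directly. This is a small sketch with a hypothetical helper name. It relies on the POSIX convention that signal 0 only tests whether the process exists.

```python
import os


def is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    try:
        # Signal 0 performs the existence and permission checks only;
        # nothing is delivered to the process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

In the run above, is_alive(31065) would be called once before and once after stopping the daemon. With live restore enabled, both calls should return True.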

That’s why a microservices-style architecture is such a beautiful thing. Stopping the daemon does not mean everything goes down. With live restore enabled, Docker leaves the shim running so that it can:

Keep any STDIN/STDOUT streams open for the container process.
Monitor the container’s status and report back to the daemon once it is available again.

Conclusion

The live restore feature is a big help when running containers in production, thanks to the efforts of the Docker team. That said, there are some limitations you should be aware of. For more details, see the official documentation at Live restore.