Successfully deployed!

Getting services deployed using minimal containers

So I’ve finally managed to get Plume installed and deployed on my VPS. The idea is to write fairly regularly, even if it’s just a small update.

Today though, I’ll give some backstory on how I ended up on the infrastructure I’m running today. I plan on writing about various things in more detail in the future.

TL;DR: podman pod is pretty awesome. It doesn’t fit my ideal deployment for my infrastructure, but it’s pretty darn close. It also works today, which is a good deal better than what I had been trying to get working before.

Background

First, some background. I’ve come to enjoy running my own infrastructure for my technical needs. I like to understand more about how these things work, and I’ve definitely learned a lot along the way. There are some things that I have finally ended up migrating to “the cloud”, but that’s because those things ended up being more trouble than they’re worth in the long run (namely email, which I have almost fully migrated over to Fastmail from Google’s offerings).

My first foray into hosting services myself started in college (2007), using the old computer my parents had purchased in 2000. By then it had become a spare machine, and it was the standard “single Linux install that hosts everything” experience. I mainly hosted Git repositories that I used for classes and for collaborating with friends on projects.

Using FreeBSD jails

Once I graduated and moved out on my own in 2010, I used a box a friend had given me and set up FreeBSD jails to keep each service separate from the others. Each jail saw only what it needed of the filesystem, exposed only the ports it needed, had DNS set up, and so on. This is where most of my (useful) networking knowledge comes from: adding IP addresses to virtual interfaces, netmasks, DNS setup, NAT, etc. I ran my own DNS server which registered new clients from the DHCP server, used static routing for the jailed services, and various other bits. It was complicated to get set up, but it made it really easy to move the network underneath whatever Internet connection I had at the time (tethering, apartment building wifi, etc.) without otherwise disrupting anything. Eventually, in 2017 or so, I got tired of updating and rebuilding jails across FreeBSD releases and decided to start looking at Linux containers to bring everything together, since I could at least develop somewhere other than the deployment hardware. It would also mean that building the image to run could be more easily separated from the instance in which it runs.

Building service images

Alas, I was not very thrilled with the tooling available at the time. Docker was, more or less, the only game in town for building and running containers at the (small) scale I wanted. Kubernetes existed, but for running fewer than a dozen containers, none of which needed dynamic scaling, it was just absolute overkill. I also looked at docker-compose, but it didn’t mesh with what I wanted either, because I enjoy using systemd for managing services. I wanted each container to use systemd for the services and tasks inside of it, since that would also help to centralize logging and make it easier to ensure that the service started up reliably within the container.

This ended up being a high bar. Docker doesn’t like starting systemd inside of a container, and it also doesn’t play well with systemd on the host in that its central daemon design makes it tough to ensure that each container itself is being reliably run and managed. I wasted quite a bit of effort trying to use Docker’s runtime functionality to handle the systemd integration, but as far as I know, this is still not all that well supported. Luckily, tools such as podman had become more viable by late 2018. Around this time, I had set up a DigitalOcean droplet (my referral link if you’re interested in trying it out) in order to actually deploy this somewhere publicly reachable. After experimenting with various ways of building images, I decided on using buildah to actually build the images, because Docker’s layering system and RUN design make it hard to factor things out at the end of a deployment setup.
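
To give a flavor of why buildah fit better, the build flow looks roughly like this. This is a sketch rather than my actual build scripts (the base image, packages, and tag are made up); the point is that any number of commands can run against a working container, with a single commit at the end instead of a layer per RUN:

# Rough sketch of a buildah-based image build; names are illustrative.
ctr=$(buildah from docker.io/library/debian:stable)

# Any number of steps can run without each one becoming a layer.
buildah run "$ctr" -- apt-get update
buildah run "$ctr" -- apt-get install -y --no-install-recommends systemd
buildah run "$ctr" -- apt-get clean

# Set what the container runs, then commit a single image at the end.
buildah config --cmd /lib/systemd/systemd "$ctr"
buildah commit "$ctr" localhost/base:latest
buildah rm "$ctr"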

The image builds are part of a CMake project so that I can control things via cache variables, have images rebuild as necessary, easily push images as needed, and make an “install” which I can deploy to the host for the bits that need to live there. There are also some declarative bits so that a service can state “I need a backup” and the restic container knows to add it to its set of directories to back up.

systemd-nspawn is oh-so-close

In the systemd-nspawn world, there are .nspawn files which declare things like networking, how the journal should be handled, which directories should be mounted where, and so on. It’s really nicely done and keeps “how the host interacts” separate from “the content of the container”. Docker and podman both prefer to store such information with the container instance itself through run arguments. In order to try and ensure that there’s no state inside of the containers themselves, the containers are run with ReadOnly=true, which ensures that anything that gets written goes through a writable volume mounted into the container. This helps to ensure that I can update the containers without worrying that there’s some important state left inside of the container.
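
To make that concrete, here’s a rough sketch of the kind of .nspawn file I mean. The options shown are real systemd-nspawn settings, but the machine name and the bind mount are illustrative rather than pulled from my old setup:

# Hypothetical example; "plume" and the mounted paths are illustrative.
cat > /etc/systemd/nspawn/plume.nspawn <<'EOF'
[Exec]
Boot=yes
LinkJournal=try-guest

[Files]
ReadOnly=true
Bind=/var/lib/containers/volumes/plume/media:/var/lib/plume/static/media

[Network]
VirtualEthernet=yes
Port=tcp:80:80
EOF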

I had eventually decided to try and use systemd-nspawn in order to get the maximum use out of the systemd integrations. The host can get logs from the containers easily using journalctl -M, machinectl is a decent interface for inspecting things, bind mounts and port exposures are in declarative .nspawn files, read-only images are well supported, and more. However, it seems that inter-container networking with systemd-nspawn is not really a thing. I didn’t want the internal bits (e.g., the database) to be publicly exposed, so I couldn’t use the top-level DNS to do this routing for me. I could get all of the containers running just fine, but without them being able to communicate with each other, nginx couldn’t be a reverse proxy, nothing could find the database, and all the other problems one has when services can’t talk to each other. I would try to get the containers to communicate with each other for a few hours over a few days, get maybe a little bit farther, then put it off for a few months until I got the urge to try again. This last time, I decided to just get it done even if it meant I couldn’t get logging exactly right.
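
For reference, the host-side integration that makes nspawn so attractive is just this sort of thing (the machine name here is a stand-in):

machinectl list                  # running containers, seen from the host
machinectl status plume          # state, addresses, leader PID, recent log lines
journalctl -M plume -u plume     # a unit's journal from inside the container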

Using podman to actually get things working

With podman, instead of managing .nspawn files, I now have a list of arguments to pass to podman run to manage how each container runs. These are basically the volume mounts and ports that need to be exposed. During the CMake configure step, this information is gathered, and at the end a script is written that creates the pod and adds the containers to it. First, there is some boilerplate to get things set up:

# Run an image as a container inside the pod, with a read-only root
# filesystem and explicit tmpfs mounts for the paths systemd needs to write.
add_to_pod () {
    local name="$1"
    readonly name
    shift

    podman run -d --pod "$pod" \
        --name "$pod-$name" \
        --read-only \
        --read-only-tmpfs=false \
        --tmpfs /var/tmp \
        --tmpfs /var/log/journal \
        --tmpfs /var/lib/systemd \
        "$@" \
        "localhost/$name:latest"
}

podman pod create -n "$pod" \
    --replace=true \
    --infra=true \
    --network=slirp4netns:enable_ipv6=true \
    # port flags
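
The trailing comment is where the generated script appends the ports to publish. Filled in, a hypothetical web-facing pod would end up looking something like:

# Hypothetical ports; the real list comes from the CMake configuration.
podman pod create -n "$pod" \
    --replace=true \
    --infra=true \
    --network=slirp4netns:enable_ipv6=true \
    -p 80:80 \
    -p 443:443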

Then each container can describe what it needs from the host:

podman_pod_add_container(NAME plume
    VOLUMES
        # Password required for the machine.
        /var/lib/containers/secrets/plume:/run/secrets/plume:ro,Z
        # Search storage.
        /var/lib/containers/volumes/plume/search_index:/var/lib/plume/search_index:rw,Z
        # Media storage.
        /var/lib/containers/volumes/plume/media:/var/lib/plume/static/media:rw,Z)

which then turns into:

add_to_pod plume \
    -v /var/lib/containers/secrets/plume:/run/secrets/plume:ro,Z \
    -v /var/lib/containers/volumes/plume/search_index:/var/lib/plume/search_index:rw,Z \
    -v /var/lib/containers/volumes/plume/media:/var/lib/plume/static/media:rw,Z

This add_to_pod call is generated for each container seen in the deployment during the configure step. At the end, actually running the pod is handed over to the host by using podman generate systemd to create service files for the pod:

# Generate systemd files for the pod.
cd /etc/systemd/system
podman generate systemd --name "$pod" --files
systemctl enable "pod-$pod.service"
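
Bringing the pod up for the first time (or poking at it afterwards) is then just the usual systemd and podman commands, for example:

systemctl daemon-reload
systemctl start "pod-$pod.service"

# Check that the pod and its containers are running.
podman pod ps
podman ps --pod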

With this, I can use systemd to bring the services up at boot and ensure that everything works and deploys together. There are a few issues left to tackle (namely that the containers do not start in the right order, so deployment requires some manual service restarts right now), but I’ll handle those if deployment updates become more common.
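
If the ordering problem ever becomes annoying enough, one likely fix is a systemd drop-in adding dependencies between the generated container units. This is only a sketch: the unit names follow what podman generate systemd --name produces for containers named "$pod-$name", and "postgres" is a stand-in for whatever the database container is actually called:

# Hypothetical sketch: make the plume container wait for the database container.
mkdir -p "/etc/systemd/system/container-$pod-plume.service.d"
cat > "/etc/systemd/system/container-$pod-plume.service.d/ordering.conf" <<EOF
[Unit]
After=container-$pod-postgres.service
Requires=container-$pod-postgres.service
EOF
systemctl daemon-reload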