This is a text-only version of the following page on https://raymii.org:
---
Title  : Proxmox VE 7 Corosync QDevice in a Docker container
Author : Remy van Elst
Date   : 17-04-2022
URL    : https://raymii.org/s/tutorials/Proxmox_VE_7_Corosync_QDevice_in_Docker.html
Format : Markdown/HTML
---

At home I have a 2 node Proxmox VE cluster consisting of 2 HP EliteDesk Mini machines, both with 16 GB RAM and both an NVMe and a SATA SSD with ZFS on root (256 GB). It's small enough (physically) and is just enough for my homelab needs spec-wise. I run a few services at home; the rest runs on 'regular' virtual private servers online.

Proxmox VE, a virtualization stack based on Debian, has support for clustering. Clustering can mean different things to different people; in this case it means a group of physical servers, but Proxmox also supports actual high availability with failover and (live) migration. I also have a small NAS with spinning disks for shared storage via NFS, which is helpful when experimenting with live migration.

![cluster][8]

> My modest two node Proxmox VE cluster, with extra quorum device

For a cluster (in any sense of the word), you need at least 3 nodes, otherwise there is no quorum. If one node goes down, it (and the other node) cannot know whether the problem is on their side or the other side. With an uneven number of nodes, a node can always ask another node: hey, is it just me, or do you see the issue as well? If it receives no reply, it knows the problem is on its own side; if the other node does reply, they both know the third node has the problem.

Corosync, the cluster software used by Proxmox, supports an external quorum device. This is a small piece of software running on a third machine which provides the extra vote for the quorum without being a Proxmox VE server. Any cluster with an even number of nodes can end up in a split-brain situation, and in my experience, those are bad. A two node Proxmox cluster shuts down all virtual machines and containers when one node fails, and the cluster goes into read-only mode. So if you power down one of your Proxmox servers, the other one is effectively offline as well, even if you've just migrated all machines to that node. With the extra quorum device, you can safely power down one node.

In my case I wanted to run this on my NAS, since (physical) space is at a premium. The NAS supports Docker, and this guide explains how to run the QDevice for Proxmox in a Docker container. The Docker container is a Debian 11 container with systemd and openssh, since the Proxmox QDevice setup requires that; it is more like an LXC container than a typical Docker container. The NAS itself does not run Debian and has no corosync packages, which is why I resorted to this route.
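Before changing anything, it helps to see the quorum arithmetic for yourself on the existing two node cluster. A quick check, run on either Proxmox node (`corosync-quorumtool` is part of the standard corosync package that Proxmox already ships; `pvecm` is Proxmox's own wrapper):

    pvecm status
    # or ask corosync's quorum subsystem directly:
    corosync-quorumtool -s

With two nodes there are 2 expected votes and the quorum threshold is also 2, so losing a single node drops the cluster below quorum and the cluster filesystem goes read-only. The QDevice adds a third vote to fix exactly that.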
If you can run an extra Debian server, there is no need to go the Docker route; you can just follow the regular guide. There is a [qdevice docker image][3], but that guide does not work for Proxmox VE 7 and requires a lot of manual setup. My method involves far fewer steps, since you're basically running an extra Debian VPS (a container with systemd and openssh). I've been working with Corosync since 2012 and have written [a few posts in 2013][7], so I consider myself experienced in corosync usage. The Proxmox web interface makes clustering very easy.

### Proxmox VE cluster QDevice technical explanation

My cluster has 2 nodes, both using ZFS as local storage. The [Proxmox Docs][1] explain all the details on running a cluster and the GUI makes it very easy to set up. For this guide I'm assuming you already have the 2 node cluster set up. Quoting the [documentation page][2] with more information on the QDevice:

#### Corosync External Vote Support

This section describes a way to deploy an external voter in a Proxmox VE cluster. When configured, the cluster can sustain more node failures without violating safety properties of the cluster communication. For this to work, there are two services involved:

- A `QDevice daemon` which runs on each Proxmox VE node
- An external vote daemon which runs on an independent server

As a result, you can achieve higher availability, even in smaller setups (for example 2+1 nodes).

#### QDevice Technical Overview

The Corosync Quorum Device (`QDevice`) is a daemon which runs on each cluster node. It provides a configured number of votes to the cluster's quorum subsystem, based on an externally running third-party arbitrator's decision. Its primary use is to allow a cluster to sustain more node failures than standard quorum rules allow. This can be done safely as the external device can see all nodes and thus choose only one set of nodes to give its vote. This will only be done if said set of nodes can have quorum (again) after receiving the third-party vote.

Currently, only `QDevice Net` is supported as a third-party arbitrator. This is a daemon which provides a vote to a cluster partition, if it can reach the partition members over the network. It will only give votes to one partition of a cluster at any time. It's designed to support multiple clusters and is almost configuration and state free. New clusters are handled dynamically and no configuration file is needed on the host running a `QDevice`.

The only requirements for the external host are that it needs network access to the cluster and to have a `corosync-qnetd` package available. We provide a package for Debian based hosts, and other Linux distributions should also have a package available through their respective package manager.

Note: In contrast to corosync itself, a `QDevice` connects to the cluster over TCP/IP. The daemon may even run outside of the cluster's LAN and can have longer latencies than 2 ms.

We support `QDevices` for clusters with an even number of nodes and recommend it for 2 node clusters, if they should provide higher availability. For clusters with an odd node count, **we currently discourage the use of QDevices.** The reason for this is the difference in the votes which the `QDevice` provides for each cluster type. Even numbered clusters get a single additional vote, which only increases availability, because if the `QDevice` itself fails, you are in the same position as with no `QDevice` at all.
On the other hand, with an odd numbered cluster size, the `QDevice` provides (N-1) votes, where N corresponds to the cluster node count. This alternative behavior makes sense; if it had only one additional vote, the cluster could get into a split-brain situation. This algorithm allows for all nodes but one (and naturally the `QDevice` itself) to fail. However, there are two drawbacks to this:

- If the QNet daemon itself fails, no other node may fail or the cluster immediately loses quorum. For example, in a cluster with 15 nodes, 7 could fail before the cluster becomes inquorate. But, if a QDevice is configured here and it itself fails, no single node of the 15 may fail. The QDevice acts almost as a single point of failure in this case.
- The fact that all but one node plus `QDevice` may fail sounds promising at first, but this may result in a mass recovery of HA services, which could overload the single remaining node. Furthermore, a Ceph server will stop providing services if only `((N-1)/2)` nodes or less remain online.

(end quote)

### Proxmox VE host setup, part 1

On all your Proxmox VE 7 machines you must manually install an extra package:

    apt install corosync-qdevice

Again, I'm assuming you have your cluster configured already. I'm also using different VLANs (extra USB 3 gigabit NICs) for the different networks, but I'm leaving that outside the scope of this guide to keep it simple.

After installing the package we can continue with configuring our Docker container. When the Docker container is running, we finish off the setup by configuring the new `qdevice` from one of our Proxmox VE hosts.

### Docker container setup

The Docker container cannot run on one of your Proxmox servers or inside a VM on Proxmox. You must run it on a different, external server. (You technically could run it on Proxmox, but that defeats the whole point.) I'm using my NAS, which supports Docker.

The container image is based on [this image][4], which builds a Debian image with OpenSSH and systemd. Mine is simplified to only run Debian Bullseye (11). As opposed to the example command, `CAP_SYS_ADMIN` is not required; the systemd version in Debian 11 is recent enough. The Docker image also installs the `corosync-qnetd` package and changes the permissions on the `/etc/corosync` folder to the correct user.

The container should automatically start at boot of the Docker host, but if you want to run it on another host, copy the data volume folder and start it there. It will then have all the config files required.

Create a folder on the Docker host where the Corosync container will store its corosync cluster config:

    mkdir -p /volume1/docker/qnetd/corosync-data

This can be any path you like; the container will mount it as a volume. Navigate to the folder one level above, where I'll store the Dockerfile and some other info:

    cd /volume1/docker/qnetd

On my Docker host, all named containers get a folder under `/volume1/docker/$container-name`, with subfolders for each volume.
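For the `qnetd` container that results in a layout roughly like this (the paths are just my convention on the NAS; any location on your Docker host works):

    /volume1/docker/qnetd/                 # build context, holds the Dockerfile
    /volume1/docker/qnetd/corosync-data/   # data volume, mounted as /etc/corosync in the container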
Create a `Dockerfile`:

    vim Dockerfile

Place the following contents inside:

    # Stage 1: install systemd, the qnetd daemon and an SSH server
    FROM debian:bullseye
    RUN echo 'debconf debconf/frontend select teletype' | debconf-set-selections
    RUN apt-get update
    RUN apt-get dist-upgrade -qy
    RUN apt-get install -qy --no-install-recommends systemd systemd-sysv corosync-qnetd openssh-server
    RUN apt-get clean
    RUN rm -rf /var/lib/apt/lists/* /var/log/alternatives.log /var/log/apt/history.log /var/log/apt/term.log /var/log/dpkg.log
    # Allow root login over SSH with a password; 'pvecm qdevice setup' uses ssh-copy-id as root
    RUN sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config
    RUN echo 'root:password' | chpasswd
    # corosync-qnetd runs as the coroqnetd user, so the (mounted) config folder must be writable by it
    RUN chown -R coroqnetd:coroqnetd /etc/corosync/
    # Mask mount units that systemd cannot handle inside a container
    RUN systemctl mask -- dev-hugepages.mount sys-fs-fuse-connections.mount
    RUN rm -f /etc/machine-id /var/lib/dbus/machine-id

    # Stage 2: copy the result into a fresh image, squashing everything into one layer
    FROM debian:bullseye
    COPY --from=0 / /
    ENV container docker
    # SIGRTMIN+3 is systemd's halt signal, so 'docker stop' shuts the container down cleanly
    STOPSIGNAL SIGRTMIN+3
    VOLUME [ "/sys/fs/cgroup", "/run", "/run/lock", "/tmp" ]
    CMD [ "/sbin/init" ]

Change the `password` part in the line `'root:password'` to a secure password which will be used for SSH root login. Build the Docker image:

    docker build . -t debian-qdevice

The `-t` flag gives the image a recognizable name. You can now start a container based on that image:

    docker run -d -it \
      --name qnetd \
      --net=macvlan \
      --ip=192.0.2.20 \
      -v /volume1/docker/qnetd/corosync-data:/etc/corosync \
      -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
      --restart=always \
      debian-qdevice:latest

I'm using [macvlan][5] to give my containers their own IP address. The [Proxmox QDevice Setup][6] code does not show support for a different SSH port, which is why I gave the container its own IP. You could put a name/port combo in your `~/.ssh/config` file on the Proxmox hosts like so:

    Host 192.0.2.19
        HostName 192.0.2.19
        Port 2222
        User root

Then you can start the container without `--net=macvlan --ip=192...`, but you must specify the ports used by corosync and SSH: `-p 5403:5403 -p 2222:22`. The setup command later on can then use the IP of the Docker host, but be extra careful that you have set up your SSH config file correctly. `192.0.2.19` is the example IP of my Docker host, not of the Docker container. You could unexpectedly run commands as root on your Docker host.

The start command prints the ID of the new container; you can check whether it's running with the `docker ps` command:

    docker ps

Example output:

    CONTAINER ID   IMAGE                   COMMAND        CREATED              STATUS                    PORTS     NAMES
    45d861ce6acf   debian-qdevice:latest   "/sbin/init"   About a minute ago   Up About a minute                   qnetd
    d07f990f245e   pihole/pihole:latest    "/s6-init"     8 days ago           Up 8 days (healthy)                 pihole2

Test if you can SSH into the container with the username `root` and your password. If SSH works, continue on with the guide.

### Proxmox VE host setup, part 2

This part has to be done on one of your Proxmox VE servers; it is synced automatically to the other servers. Execute the following command:

    pvecm qdevice setup 192.0.2.20

You're asked for the root password:

    /bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
    /bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
    /bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
    root@192.0.2.20's password:

The rest of the output is all setup of corosync:

    Number of key(s) added: 1

    Now try logging into the machine, with:   "ssh 'root@192.0.2.20'"
    and check to make sure that only the key(s) you wanted were added.
    INFO: initializing qnetd server
    Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db
    INFO: copying CA cert and initializing on all nodes
    node 'pve1': Creating /etc/corosync/qdevice/net/nssdb
    password file contains no data
    node 'pve1': Creating new key and cert db
    node 'pve1': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
    node 'pve1': Importing CA
    node 'pve2': Creating /etc/corosync/qdevice/net/nssdb
    password file contains no data
    node 'pve2': Creating new key and cert db
    node 'pve2': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
    node 'pve2': Importing CA
    INFO: generating cert request
    Creating new certificate request
    Generating key. This may take a few moments...
    Certificate request stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.crq
    INFO: copying exported cert request to qnetd server
    INFO: sign and export cluster cert
    Signing cluster certificate
    Certificate stored in /etc/corosync/qnetd/nssdb/cluster-cluster1.crt
    INFO: copy exported CRT
    INFO: import certificate
    Importing signed cluster certificate
    Notice: Trust flag u is set automatically if the private key is present.
    pk12util: PKCS12 EXPORT SUCCESSFUL
    Certificate stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12
    INFO: copy and import pk12 cert to all nodes
    node 'pve1': Importing cluster certificate and key
    node 'pve1': pk12util: PKCS12 IMPORT SUCCESSFUL
    node 'pve2': Importing cluster certificate and key
    node 'pve2': pk12util: PKCS12 IMPORT SUCCESSFUL
    INFO: add QDevice to cluster configuration
    INFO: start and enable corosync qdevice daemon on node 'pve1'...
    Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
    Executing: /lib/systemd/systemd-sysv-install enable corosync-qdevice
    Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qdevice.service -> /lib/systemd/system/corosync-qdevice.service.
    INFO: start and enable corosync qdevice daemon on node 'pve2'...
    Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
    Executing: /lib/systemd/systemd-sysv-install enable corosync-qdevice
    Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qdevice.service -> /lib/systemd/system/corosync-qdevice.service.
    Reloading corosync.conf...
    Done

You can check the status of the cluster and quorum device with the following command:

    pvecm status

Example output with the quorum device set up:

    Cluster information
    -------------------
    Name:             cluster1
    Config Version:   7
    Transport:        knet
    Secure auth:      on

    Quorum information
    ------------------
    Date:             Sun Apr 17 21:31:06 2022
    Quorum provider:  corosync_votequorum
    Nodes:            2
    Node ID:          0x00000001
    Ring ID:          1.98
    Quorate:          Yes

    Votequorum information
    ----------------------
    Expected votes:   3
    Highest expected: 3
    Total votes:      3
    Quorum:           2
    Flags:            Quorate Qdevice

    Membership information
    ----------------------
        Nodeid      Votes    Qdevice Name
    0x00000001          1    A,V,NMW 192.0.2.10 (local)
    0x00000002          1    A,V,NMW 192.0.2.11
    0x00000000          1            Qdevice

If you have not set up the QDevice correctly, the last part of the output will be different, showing no votes for the QDevice:

    Votequorum information
    ----------------------
    Expected votes:   3
    Highest expected: 3
    Total votes:      2
    Quorum:           2
    Flags:            Quorate Qdevice

    Membership information
    ----------------------
        Nodeid      Votes    Qdevice Name
    0x00000001          1    A,V,NMW 192.0.2.10 (local)
    0x00000002          1    A,V,NMW 192.0.2.11
    0x00000000          0            Qdevice (votes 1)

Check the Docker container (log in via SSH); you should see the daemon running via `ps auxf`:

    root@d74eb68b6507:~# ps auxf
    USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    root           1  0.0  0.1  17676  9568 ?        Ss   19:30   0:00 /sbin/init
    coroqne+      30  0.0  0.2  17692 13572 ?        Ss   19:30   0:00 /usr/bin/corosync-qnetd -f
    root          33  0.0  0.0   4332  2140 pts/0    Ss+  19:30   0:00 /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400
    root          34  0.0  0.1  13272  7644 ?        Ss   19:30   0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
    root          97  6.5  0.1  13736  8256 ?        Ss   19:34   0:00  \_ sshd: root@pts/1
    root         103  2.0  0.0   3964  3420 pts/1    Ss   19:34   0:00      \_ -bash
    root         106  0.0  0.0   6696  3000 pts/1    R+   19:34   0:00          \_ ps auxf

On my first attempt at Dockerizing the QDevice, I forgot to set the correct permissions on the `/etc/corosync` folder and the daemon failed to start. There is no logging inside the container, but after running the daemon manually as root (which worked) and inspecting the systemd unit file, the cause of the issue was clear. As far as I know, the QDevice is not visible inside the web interface.

[1]: https://pve.proxmox.com/wiki/Cluster_Manager
[2]: https://web.archive.org/web/20220323202732/https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_external_vote_support
[3]: https://github.com/modelrockettier/docker-corosync-qnetd
[4]: https://github.com/alehaa/docker-debian-systemd
[5]: https://docs.docker.com/network/macvlan/
[6]: https://web.archive.org/web/20220417190820/https://github.com/proxmox/pve-cluster/blob/master/data/PVE/CLI/pvecm.pm#L149
[7]: /s/tags/corosync.html
[8]: /s/inc/img/pve-cluster.png

---

License:

All the text on this website is free as in freedom unless stated otherwise. This means you can use it in any way you want, you can copy it, change it the way you like and republish it, as long as you release the (modified) content under the same license to give others the same freedoms you've got and place my name and a link to this site with the article as source.

This site uses Google Analytics for statistics and Google Adwords for advertisements. You are tracked and Google knows everything about you. Use an adblocker like ublock-origin if you don't want it.
All the code on this website is licensed under the GNU GPL v3 license unless already licensed under a license which does not allow this form of licensing or if another license is stated on that page / in that software:

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

Just to be clear, the information on this website is meant for educational purposes and you use it at your own risk. I do not take responsibility if you screw something up. Use common sense, do not 'rm -rf /' as root for example. If you have any questions then do not hesitate to contact me. See https://raymii.org/s/static/About.html for details.