I screw up the services in my homelab quite often, so I rely on having regular Proxmox backups that I can recover from.

Once or twice though, I’ve screwed up the networking between my hypervisors and Proxmox Backup Server. This has led to silently failing backup jobs. To make those failures less silent, I’ve now set up Healthchecks.io to monitor my backup jobs and alert me if they don’t finish on time.

How it works

vzdump is the utility that Proxmox uses to run backup jobs. It lets us configure a hook script that is called every time the job moves to a new phase, with the phase name (for example job-start, job-end or job-abort) passed as the first argument. We’ll write a hook script that pings a specific URL whenever a job starts, finishes, or is aborted.
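To see what vzdump actually hands to the hook, here is a minimal sketch that only classifies its arguments. It assumes the calling convention used by the example hook script that ships with Proxmox VE (vzdump-hook-script.pl): job-level phases get just the phase name, while per-guest phases also get the backup mode and the VMID. describe_phase is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: classify the arguments vzdump passes to a hook script.
# Assumption: job-* phases get only the phase name, while per-guest
# backup-* phases also get the mode (stop/suspend/snapshot) and the VMID.

describe_phase() {
    local phase="$1"
    case "$phase" in
        job-start|job-end|job-abort)
            echo "job phase: $phase"
            ;;
        backup-start|backup-end|backup-abort)
            # $2 = backup mode, $3 = VMID of the guest being backed up
            echo "guest phase: $phase (mode=$2, vmid=$3)"
            ;;
        *)
            # Any other phase (log-end, pre-stop, ...) is just reported
            echo "other phase: $phase"
            ;;
    esac
}

describe_phase job-start                 # prints: job phase: job-start
describe_phase backup-start snapshot 100 # prints: guest phase: backup-start (mode=snapshot, vmid=100)
```

The hook in this post only reacts to the job-level phases, which is why it can ignore everything after the first argument.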

Healthchecks.io is a free, push-based monitoring service. It’s very straightforward: you create a so-called check and tell it how often you expect your service, script or whatever to ping it. In return, you get a unique ping URL. If a ping doesn’t arrive on time, or if you send an explicit “fail” ping, the service notifies you that something is wrong.
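The three kinds of pings used in this post map onto three URLs: the plain ping URL signals success, /start marks the beginning of a run, and /fail reports an explicit failure. A minimal sketch, assuming HC_URL is replaced with your own check’s ping URL (the value below is a placeholder) and that hc_url and hc_ping are hypothetical helper names:

```bash
#!/bin/bash
# Sketch: the three Healthchecks.io pings used for backup jobs.
# HC_URL is a placeholder; use the ping URL of your own check.
HC_URL="https://hc-ping.com/your-check-uuid"

# Print the ping URL for an event: start, success or fail.
hc_url() {
    case "$1" in
        start)   echo "$HC_URL/start" ;;
        success) echo "$HC_URL" ;;
        fail)    echo "$HC_URL/fail" ;;
        *)       return 1 ;;
    esac
}

# Send the ping. -fsS: fail on HTTP errors, no progress bar but show errors;
# -m 10: give up after 10 seconds; --retry 5: retry transient failures.
hc_ping() {
    local url
    url="$(hc_url "$1")" || return 1
    curl -fsS -m 10 --retry 5 -o /dev/null "$url"
}
```

Sending /start as well as the final ping is optional, but it lets Healthchecks.io show how long each backup run took.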

If you are already running Uptime Kuma, you can achieve the same thing with a push monitor there. The script below includes some commented-out examples for it, but I don’t use it myself, so YMMV.
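An Uptime Kuma push URL carries the status and message as query parameters, and because it contains &, it has to be quoted whenever it is passed to curl. A minimal sketch, assuming KUMA_BASE and KUMA_TOKEN are placeholders for the values in your push monitor’s URL, and that kuma_url and kuma_push are hypothetical helper names:

```bash
#!/bin/bash
# Sketch: build and send an Uptime Kuma push ping.
# KUMA_BASE and KUMA_TOKEN are placeholders; copy the real values
# from the push URL shown in your push monitor's settings.
KUMA_BASE="https://uptime-kuma.example.com"
KUMA_TOKEN="your-push-token"

# Print the push URL for a status (up/down) and a short message.
# msg is inserted as-is, so stick to URL-safe words like OK or aborted.
kuma_url() {
    local status="$1" msg="$2"
    printf '%s/api/push/%s?status=%s&msg=%s&ping=\n' \
        "$KUMA_BASE" "$KUMA_TOKEN" "$status" "$msg"
}

# The URL is always quoted: an unquoted & would end the curl command
# and run it in the background.
kuma_push() {
    curl -fsS -m 10 --retry 5 -o /dev/null "$(kuma_url "$1" "$2")"
}
```

For example, kuma_push up OK on success and kuma_push down aborted on failure.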

How to (by hand)

  1. Set up a monitor in Healthchecks.io or Uptime Kuma, with the appropriate schedule and notification settings.
  2. Add the script below to each of your PVE hypervisors and make it executable. In our example, we save it as /root/backup-hook.sh, so that’s chmod +x /root/backup-hook.sh.
  3. Edit /etc/vzdump.conf, adding script: <path-to-script> at the end. In our case, this will be script: /root/backup-hook.sh.
  4. Run a backup job and check Healthchecks.io or Uptime Kuma to make sure the pings are being registered. Don’t forget to also test a failure scenario, e.g. by turning off your PBS server or disabling your scheduled backups for a day.
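Steps 2 to 4 can be sketched as shell commands. This is a minimal sketch, assuming the paths used in this post; set_vzdump_script is a hypothetical helper, not part of Proxmox, and it deliberately replaces an existing script: line instead of appending a second one.

```bash
#!/bin/bash
# Sketch of steps 2-4. Paths follow this post's example;
# set_vzdump_script is a hypothetical helper, not a Proxmox tool.

# Make vzdump call the given hook script. Any existing "script:" line is
# removed first, so running this twice does not leave duplicate entries.
set_vzdump_script() {
    local conf="$1" script="$2"
    touch "$conf"
    sed -i '/^script:/d' "$conf"
    # Add a newline first if the file does not already end with one.
    if [ -s "$conf" ] && [ -n "$(tail -c1 "$conf")" ]; then
        echo >> "$conf"
    fi
    echo "script: $script" >> "$conf"
}

# On each hypervisor, as root:
#   chmod +x /root/backup-hook.sh
#   set_vzdump_script /etc/vzdump.conf /root/backup-hook.sh
# Before waiting for a real job, the hook can be called by hand to check
# that pings arrive:
#   /root/backup-hook.sh job-start
#   /root/backup-hook.sh job-end
```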

How to (using Ansible)

Since I end up re-installing my PVE hypervisors from time to time, I try to only configure them using Ansible. Here are two tasks that will set up the hook script for you:

- name: "Backup script: Add to vzdump config"
  ansible.builtin.lineinfile:
    dest: /etc/vzdump.conf
    regexp: "^script"
    line: "script: /root/backup-hook.sh"
    state: present
- name: "Backup script: Add script file"
  ansible.builtin.copy:
    src: backup-hook.sh
    dest: /root/backup-hook.sh
    mode: '0700'
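After the playbook has run, a quick sanity check that the two tasks line up can save a confusing night of missing pings. A minimal sketch: verify_hook_setup is a hypothetical helper, and it only checks that the last script: line in vzdump.conf points at an executable file.

```bash
#!/bin/bash
# Sketch: check what the two Ansible tasks should leave behind.
# verify_hook_setup is hypothetical; the default path is the one used here.
verify_hook_setup() {
    local conf="${1:-/etc/vzdump.conf}"
    local script
    # Value of the last "script:" line, if any (empty if the file is missing).
    script="$(sed -n 's/^script:[[:space:]]*//p' "$conf" 2>/dev/null | tail -n1)"
    if [ -z "$script" ]; then
        echo "no script: line in $conf" >&2
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "$script is missing or not executable" >&2
        return 1
    fi
    echo "ok: vzdump will call $script"
}
```

Running it ad hoc across hosts (for example through Ansible’s shell module) would be one way to use it; that part is left to your own setup.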

The hook script

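#!/bin/bash
# vzdump hook script: reports backup job progress to Healthchecks.io
# (or Uptime Kuma). vzdump passes the current phase as the first argument.

HC_URL="<your hc-ping.com URL>"

case "$1" in
    job-start)
        ## Healthchecks
        curl -fsS -m 10 --retry 5 -o /dev/null "$HC_URL/start"
        ## Uptime Kuma doesn't support "start pings"
        ;;
    job-end)
        ## Healthchecks
        curl -fsS -m 10 --retry 5 -o /dev/null "$HC_URL"
        ## Uptime Kuma (keep the quotes: an unquoted & would send curl to the background)
        # curl -fsS -m 10 --retry 5 -o /dev/null "https://uptimekuma-ct.paradise-drum.ts.net/api/push/5ym0rCfx67?status=up&msg=OK&ping="
        ;;
    job-abort)
        ## Healthchecks
        curl -fsS -m 10 --retry 5 -o /dev/null "$HC_URL/fail"
        ## Uptime Kuma
        # curl -fsS -m 10 --retry 5 -o /dev/null "https://uptimekuma-ct.paradise-drum.ts.net/api/push/5ym0rCfx67?status=down&msg=aborted&ping="
        ;;
esac

# vzdump treats a non-zero exit from the hook as an error, so don't let an
# unreachable monitoring service break the backup job itself.
exit 0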
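One caveat worth checking on your own setup: as far as I can tell, a job in which some guest backups fail (for example because PBS is unreachable) can still reach job-end rather than job-abort, which would send a success ping. The sketch below is one way to guard against that, under a few assumptions: vzdump calls the hook with backup-abort and the VMID for each failed guest, only one backup job runs at a time on the node, and STATE_FILE, ping_hc and handle_phase are hypothetical names. It remembers failed guests in a state file and turns the final job-end ping into a fail ping if any were recorded.

```bash
#!/bin/bash
# Sketch: a hook variant that also reports failed guest backups.
# Assumes one backup job at a time per node; names are hypothetical.
HC_URL="https://hc-ping.com/your-check-uuid"   # placeholder
STATE_FILE="/run/backup-hook.failed"           # hypothetical path

# Ping Healthchecks.io; $1 is the suffix ("", "/start" or "/fail").
ping_hc() {
    curl -fsS -m 10 --retry 5 -o /dev/null "$HC_URL$1"
}

handle_phase() {
    case "$1" in
        job-start)
            rm -f "$STATE_FILE"
            ping_hc /start
            ;;
        backup-abort)
            # $3 is the VMID of the guest whose backup failed
            echo "$3" >> "$STATE_FILE"
            ;;
        job-end)
            # Report failure if any guest backup was aborted during the job
            if [ -s "$STATE_FILE" ]; then
                ping_hc /fail
            else
                ping_hc ""
            fi
            rm -f "$STATE_FILE"
            ;;
        job-abort)
            ping_hc /fail
            rm -f "$STATE_FILE"
            ;;
    esac
    # Never fail the backup job just because a ping could not be delivered.
    return 0
}

# In the real hook script, finish with: handle_phase "$@"
```

The “turn off your PBS server” test from step 4 is a good way to confirm whether your PVE version needs this at all: if the check stays green after that run, it does.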