# Monitoring Strategy

How the homelab is monitored, alerted, and observed.
## Monitoring Stack Overview
| Component | Location | Purpose |
|---|---|---|
| Uptime Kuma | VPS | External health checks (HTTP, TCP, ping) |
| ntfy | VPS | Push notifications (alerts, backup status) |
| VictoriaMetrics (Papa) | Docker VM | Time-series metrics database |
| vmagent | Docker VM | Prometheus-compatible metrics scraper |
| Grafana (Papa) | Docker VM | Dashboards and visualization |
| Dozzle (Ysyry) | Docker VM | Real-time container log viewer |
| Glances | NAS | System resource monitor (HA integration) |
| Watchtower | Docker VM | Auto-update monitoring (Sunday 4 AM) |
## Metrics Pipeline
```
node_exporter (Docker VM :9100) ──┐
node_exporter (NAS :9100) ────────┤
VictoriaMetrics (:8428) ──────────┤
vmagent (:8429) ──────────────────┤
Grafana (:3000) ──────────────────┼──► vmagent ──► VictoriaMetrics ──► Grafana
vmalert (:8880) ──────────────────┤  (30s scrape)  (90d retention)   (papa.cronova.dev)
Alertmanager (:9093) ─────────────┤                      │
cAdvisor (:8080) ─────────────────┤                   vmalert ──► Alertmanager ──► ntfy
Home Assistant (/api/prometheus) ─┘                  (30s eval)   (group/dedup)   (push)
```
Config: `docker/fixed/docker-vm/monitoring/prometheus.yml`
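
The wiring between these components lives in the compose file alongside `prometheus.yml`. A hedged sketch of the relevant service commands (the flag names are real vmagent/vmalert options; service names and file paths are assumptions, not taken from the actual compose file):

```yaml
services:
  vmagent:
    image: victoriametrics/vmagent:latest
    command:
      - -promscrape.config=/etc/prometheus/prometheus.yml  # scrape targets below
      - -remoteWrite.url=http://victoriametrics:8428/api/v1/write
  vmalert:
    image: victoriametrics/vmalert:latest
    command:
      - -datasource.url=http://victoriametrics:8428  # query metrics for rule evaluation
      - -notifier.url=http://alertmanager:9093       # fire alerts to Alertmanager
      - -rule=/etc/alerts/*.yml                      # alert rule files (path assumed)
      - -evaluationInterval=30s                      # the 30s eval in the diagram
```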
## Scrape Targets
| Job | Target | Labels |
|---|---|---|
| node-docker-vm | host.docker.internal:9100 | instance: docker-vm |
| node-nas | 100.82.77.97:9100 | instance: nas |
| victoriametrics | victoriametrics:8428 | instance: victoriametrics |
| vmagent | vmagent:8429 | instance: vmagent |
| grafana | grafana:3000 | instance: grafana |
| cadvisor | cadvisor:8080 | instance: docker-vm |
| vmalert | vmalert:8880 | instance: vmalert |
| alertmanager | alertmanager:9093 | instance: alertmanager |
| home-assistant | host.docker.internal:8123/api/prometheus | instance: home-assistant |
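
A hedged sketch of how these targets look in `prometheus.yml` (two jobs shown; the HA job needs the non-default metrics path from the table, and bearer-token auth with a long-lived HA token is an assumption since this doc does not say how that scrape authenticates):

```yaml
global:
  scrape_interval: 30s                     # matches the pipeline diagram

scrape_configs:
  - job_name: node-docker-vm
    static_configs:
      - targets: ["host.docker.internal:9100"]
        labels:
          instance: docker-vm

  - job_name: home-assistant
    metrics_path: /api/prometheus          # HA's Prometheus endpoint
    # authorization via a long-lived HA access token is assumed here
    static_configs:
      - targets: ["host.docker.internal:8123"]
        labels:
          instance: home-assistant
```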
## VictoriaMetrics
- Image: `victoriametrics/victoria-metrics:latest`
- Port: 8428 (localhost only)
- Retention: 90 days
- Memory limit: 1 GB
- Data volume: `vm-data`
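
A minimal compose sketch consistent with these settings (`-retentionPeriod` and `-storageDataPath` are real VictoriaMetrics flags; anything not in the bullets above is an assumption):

```yaml
services:
  victoriametrics:
    image: victoriametrics/victoria-metrics:latest
    command:
      - -retentionPeriod=90d          # 90-day retention
      - -storageDataPath=/storage
    ports:
      - "127.0.0.1:8428:8428"         # localhost only
    volumes:
      - vm-data:/storage
    mem_limit: 1g

volumes:
  vm-data:
```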
## Grafana
- Image: `grafana/grafana:latest`
- Port: 3000 (localhost only, behind Caddy + Authelia)
- URL: https://papa.cronova.dev
- Plugin: `victoriametrics-metrics-datasource`
- Dashboards provisioned via `grafana/provisioning/dashboards/json/`
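
With the plugin installed, the datasource can be provisioned declaratively. A hedged sketch (the `type` matches the plugin ID above; the name, URL, and file location are assumptions):

```yaml
# e.g. grafana/provisioning/datasources/victoriametrics.yml (path assumed)
apiVersion: 1
datasources:
  - name: VictoriaMetrics
    type: victoriametrics-metrics-datasource
    access: proxy
    url: http://victoriametrics:8428   # container-network address from the scrape table
    isDefault: true
```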
## Grafana Dashboards
| Dashboard | Grafana ID | Source | What it shows |
|---|---|---|---|
| Node Exporter Full | 1860 | node-docker-vm, node-nas | Host CPU, RAM, disk, network |
| VictoriaMetrics Single | 10229 | victoriametrics | TSDB health, ingestion rate, storage |
| cAdvisor Docker | 19792 | cadvisor | Per-container CPU, memory, network |
| vmagent | 12683 | vmagent | Scrape stats, target health, remote write |
| Grafana Internals | 3590 | grafana | API response times, sessions, memory |
| Homelab Overview | — (custom) | all targets | Host health, containers, network, monitoring health |
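
Dashboards with a Grafana ID can be imported once, then exported as JSON into `grafana/provisioning/dashboards/json/`, where a file provider loads them on startup. A hedged provider sketch (provider name and in-container path are assumptions):

```yaml
apiVersion: 1
providers:
  - name: homelab-dashboards
    type: file
    options:
      path: /etc/grafana/provisioning/dashboards/json
```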
## Uptime Kuma Monitors
Uptime Kuma runs on the VPS and monitors all services over the Tailscale mesh. All 35 monitors are managed via `scripts/setup-uptime-kuma.py` (the single source of truth), and alerts route to ntfy topics by priority tier.
### Critical (60s interval, ntfy urgent)
| Monitor | Type | Target |
|---|---|---|
| Headscale | HTTP | https://hs.cronova.dev/health |
| Vaultwarden | HTTP | https://vault.cronova.dev/alive |
| Pi-hole DNS | TCP | 100.68.63.168:53 |
| Caddy (Docker VM) | HTTP | https://cronova.dev |
| OPNsense Gateway | Ping | 192.168.0.1 |
| Uptime Kuma | HTTP | https://status.cronova.dev |
| ntfy | HTTP | https://notify.cronova.dev |
| Caddy (VPS) | TCP | 100.77.172.46:443 |
| VPS Pi-hole | TCP | 127.0.0.1:53 |
| cronova.dev | HTTP | https://cronova.dev (900s interval) |
### Warning (60-300s interval, ntfy high)
| Monitor | Type | Target |
|---|---|---|
| Home Assistant (Jara) | HTTP | https://jara.cronova.dev (300s) |
| Frigate (Taguato) | HTTP | https://taguato.cronova.dev/api/version (60s) |
| Forgejo | HTTP | http://100.82.77.97:3000 (60s) |
| NAS Samba | TCP | 100.82.77.97:445 (300s) |
| Restic REST | Keyword | http://100.82.77.97:8000 (keyword: "Unauthorized", expect 401) |
| Coolify (Tajy) | HTTP | https://tajy.cronova.dev (300s) |
| Authelia (Okẽ) | HTTP | https://auth.cronova.dev (300s) |
| Javya | HTTP | https://javya.cronova.dev (60s) |
| Javya API | HTTP | https://javya-api.cronova.dev/health (60s) |
| NAS | Ping | 100.82.77.97 (60s) |
| Docker VM | Ping | 100.68.63.168 (300s) |
| Watchtower | Ping | 100.68.63.168 (60s) |
### Info (300-900s interval, ntfy default)
| Monitor | Type | Target |
|---|---|---|
| Jellyfin (Yrasema) | HTTP | https://yrasema.cronova.dev/health |
| Grafana (Papa) | HTTP | https://papa.cronova.dev |
| Immich (Vera) | HTTP | https://vera.cronova.dev |
| Syncthing | HTTP | http://100.82.77.97:8384/rest/noauth/health |
| Glances | Keyword | http://100.82.77.97:61208/api/4/cpu (keyword: "total") |
| Pi-hole Fixed | TCP | 100.68.63.168:53 (300s) |
| DNS - cronova.dev | DNS | cronova.dev via 1.1.1.1 |
| Beryl AX | Ping | 100.102.244.131 (120s, may be offline) |
| Beryl AX - Admin | TCP | 100.102.244.131:80 (120s) |
| hermosilla.me | HTTP | https://hermosilla.me/ (900s) |
## ntfy Notification Architecture
URL: https://notify.cronova.dev (VPS, Caddy reverse proxy)
### Topics
| Topic | Purpose | Priority |
|---|---|---|
| cronova-critical | Service down, data loss risk | Urgent (wakes phone) |
| cronova-warning | Degraded performance | High |
| cronova-info | Backups completed, maintenance | Default (silent) |
| cronova-test | Testing notifications | Low |
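
On the metrics side, Alertmanager maps alert severity onto these topics. Alertmanager emits its own webhook JSON rather than ntfy's API format, so a small translation bridge (e.g. ntfy-alertmanager) is a common pattern; the sketch below assumes such a bridge and is not taken from the actual config:

```yaml
route:
  group_by: [alertname, instance]   # the "group/dedup" step in the pipeline
  receiver: ntfy
receivers:
  - name: ntfy
    webhook_configs:
      # hypothetical bridge container; it translates Alertmanager payloads
      # into ntfy posts and picks the topic from the alert's severity label
      - url: http://ntfy-alertmanager:8080/
```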
### Auth
- Anonymous access: deny-all
- Service tokens for automation (backup sidecars, scripts)
- User `augusto` has full read/write on all topics
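
The deny-all posture comes down to a couple of lines in ntfy's `server.yml` (`base-url`, `auth-file`, and `auth-default-access` are real ntfy server options; the paths are assumed). Users and tokens are created with the `ntfy user` / `ntfy token` CLI, not in this file:

```yaml
base-url: https://notify.cronova.dev
auth-file: /var/lib/ntfy/user.db     # path assumed
auth-default-access: deny-all        # anonymous clients can neither read nor publish
```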
### Integration Points
| Source | Topic | Trigger |
|---|---|---|
| Uptime Kuma | cronova-critical / cronova-warning | Service down/degraded |
| Backup sidecars | cronova-critical / cronova-info | Backup failure / success |
| scripts/backup-notify.sh | Per-service routing | Backup event notifications |
### Subscribe on Phone
Android/iOS ntfy app → subscribe to:

- https://notify.cronova.dev/cronova-critical
- https://notify.cronova.dev/cronova-warning
## Container Log Monitoring — Dozzle (Ysyry)
- URL: https://ysyry.cronova.dev (Caddy + Authelia)
- Real-time Docker log viewer for all containers on the Docker VM
- No persistent storage — live view only
- Useful for debugging container startup issues, watching Frigate detections, and checking backup logs
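
Because it only streams logs, Dozzle needs nothing beyond read access to the Docker socket. A minimal sketch (image name and port are Dozzle defaults; the localhost binding mirrors the Caddy + Authelia fronting and is an assumption):

```yaml
services:
  dozzle:
    image: amir20/dozzle:latest
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro  # read-only socket access
    ports:
      - "127.0.0.1:8080:8080"   # reached via Caddy + Authelia, not directly
```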
## Auto-Update Monitoring — Watchtower
- Schedule: Sunday 4:00 AM (label-enabled; containers opt in via `com.centurylinklabs.watchtower.enable=true`)
- Image: `nicholas-fedor/watchtower:1.14.2` (maintained fork; the official containrrr image is abandoned and incompatible with Docker 29+)
- Behavior: rolling restarts, old-image cleanup
- Excluded from auto-update (manual only): vaultwarden, frigate, headscale
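
A hedged sketch of the label-enabled setup (the `WATCHTOWER_*` names are standard Watchtower environment options; the cron expression and the example service are assumptions):

```yaml
services:
  watchtower:
    image: nicholas-fedor/watchtower:1.14.2
    environment:
      WATCHTOWER_LABEL_ENABLE: "true"     # only opted-in containers are updated
      WATCHTOWER_SCHEDULE: "0 0 4 * * 0"  # Sunday 04:00, 6-field cron (assumed)
      WATCHTOWER_ROLLING_RESTART: "true"
      WATCHTOWER_CLEANUP: "true"          # remove superseded images
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

  example-app:                            # hypothetical opted-in service
    image: example/app:latest
    labels:
      com.centurylinklabs.watchtower.enable: "true"
```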
## Home Assistant Integrations

| Integration | Source | What It Monitors |
|---|---|---|
| System Monitor | Docker VM | CPU, RAM, disk usage |
| Glances | NAS (100.82.77.97:61208) | NAS system metrics |
| Proxmox VE (HACS) | Oga (100.78.12.241) | Host and VM status |
| Frigate | MQTT (mqtt-net) | Camera events, detection counts |
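
Going the other direction, the `home-assistant` scrape job from earlier relies on HA's Prometheus integration being enabled. A minimal `configuration.yaml` sketch (the filter is illustrative, not taken from this doc):

```yaml
prometheus:
  filter:
    include_domains:
      - sensor
      - binary_sensor
```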
## Monitoring Checklist
### Weekly
- [ ] Check Uptime Kuma dashboard — all monitors green
- [ ] Review ntfy alert history — any unexpected alerts
- [ ] Spot-check Dozzle for container error logs
### Monthly (1st Sunday)
- [ ] Spot-check NAS restic repos for snapshot freshness (until the alerting plan — `docs/plans/backup-success-alerting-2026-04-22.md` — lands and this becomes automatic)
- [ ] Check Grafana dashboards — disk usage trends, RAM pressure
- [ ] Verify vmagent scrape targets are all up (`/targets` endpoint)
- [ ] Review Watchtower update logs
- [ ] Check NAS Purple 2TB usage (97% full — monitor closely)
### Quarterly
- [ ] Manual backup restore drill per `docs/guides/backup-test-procedure.md` (pending Task #18 to automate)
- [ ] Review and update Uptime Kuma monitors for new/removed services
- [ ] Test ntfy notification delivery (all priority levels)
- [ ] Review VictoriaMetrics retention and disk usage
## References