Monitoring Strategy¶

How the homelab is monitored and observed, and how alerts are delivered.

Monitoring Stack Overview¶

Component	Location	Purpose

Metrics Pipeline¶

node_exporter (Docker VM :9100) ──┐
node_exporter (NAS :9100) ────────┤
VictoriaMetrics (:8428) ──────────┤
vmagent (:8429) ──────────────────┤
Grafana (:3000) ──────────────────┼──► vmagent ──► VictoriaMetrics ──► Grafana
vmalert (:8880) ──────────────────┤    (30s scrape)    (90d retention)    (papa.cronova.dev)
Alertmanager (:9093) ─────────────┤                         │
cAdvisor (:8080) ─────────────────┤                    vmalert ──► Alertmanager ──► ntfy
Home Assistant (/api/prometheus) ──┘                   (30s eval)    (group/dedup)    (push)

Config: docker/fixed/docker-vm/monitoring/prometheus.yml
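
One way to confirm the pipeline above end to end is to ask VictoriaMetrics for the `up` series that vmagent records for each scrape target. The following is a minimal sketch, assuming VictoriaMetrics answers the Prometheus-compatible `/api/v1/query` endpoint on 127.0.0.1:8428; the helper names are hypothetical and not part of the repository.

```python
import json
import urllib.parse
import urllib.request

# Assumption: VictoriaMetrics listens on localhost:8428, as listed in this document.
VM_URL = "http://127.0.0.1:8428"


def down_targets(response: dict) -> list[str]:
    """Return 'job/instance' for every `up` series whose latest value is 0."""
    down = []
    for series in response.get("data", {}).get("result", []):
        labels = series.get("metric", {})
        _timestamp, value = series["value"]  # instant vector: [unix_ts, "value"]
        if float(value) == 0.0:
            down.append(f"{labels.get('job', '?')}/{labels.get('instance', '?')}")
    return sorted(down)


def query_up(base_url: str = VM_URL) -> dict:
    """Run the instant query `up` and return the decoded JSON response."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": "up"})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Calling `down_targets(query_up())` on the Docker VM would list any target whose last scrape failed; an empty list means every target reported `up == 1`.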

Scrape Targets¶

Job	Target	Labels

VictoriaMetrics¶

Image: victoriametrics/victoria-metrics:latest
Port: 8428 (localhost only)
Retention: 90 days
Memory limit: 1GB
Data volume: vm-data
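
The 30s scrape interval and 90-day retention fix how many samples each series accumulates, which helps when reviewing disk usage. The sketch below works through that arithmetic; the series count and bytes-per-sample figures passed in are purely illustrative assumptions, not measurements from this setup.

```python
SCRAPE_INTERVAL_S = 30  # from the pipeline diagram
RETENTION_DAYS = 90     # from the VictoriaMetrics settings above


def samples_per_series(retention_days: int = RETENTION_DAYS,
                       interval_s: int = SCRAPE_INTERVAL_S) -> int:
    """Number of samples one series holds over the full retention window."""
    return retention_days * 86_400 // interval_s


def estimated_disk_bytes(active_series: int, bytes_per_sample: float) -> float:
    """Rough on-disk size; both arguments are assumptions supplied by the caller."""
    return active_series * samples_per_series() * bytes_per_sample
```

With these settings each series holds 259,200 samples. For a hypothetical 10,000 active series at an assumed 1 byte per sample, `estimated_disk_bytes(10_000, 1.0)` gives about 2.6 GB; the real compression ratio should be read from the vm-data volume itself.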

Grafana¶

Image: grafana/grafana:latest
Port: 3000 (localhost only, behind Caddy + Authelia)
URL: https://papa.cronova.dev
Plugin: victoriametrics-metrics-datasource
Dashboards provisioned via grafana/provisioning/dashboards/json/
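
Because dashboards are provisioned from JSON files, a malformed or duplicated file usually shows up only as a missing dashboard. A small pre-flight check can catch that earlier. This is a sketch under the assumption that each file is a plain Grafana dashboard export with top-level `title` and `uid` fields; `check_dashboards` is a hypothetical helper.

```python
import json
from pathlib import Path

# Path taken from this document, relative to the monitoring stack directory.
DASHBOARD_DIR = Path("grafana/provisioning/dashboards/json")


def check_dashboards(directory: Path) -> list[str]:
    """Return human-readable problems found in the dashboard JSON files."""
    problems = []
    seen_uids: dict[str, str] = {}
    for path in sorted(directory.glob("*.json")):
        try:
            dashboard = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(dashboard, dict) or not dashboard.get("title"):
            problems.append(f"{path.name}: missing title")
            continue
        uid = dashboard.get("uid")
        if uid and uid in seen_uids:
            problems.append(f"{path.name}: uid {uid!r} also used by {seen_uids[uid]}")
        elif uid:
            seen_uids[uid] = path.name
    return problems
```

Running `check_dashboards(DASHBOARD_DIR)` before restarting Grafana returns an empty list when every file parses and no two dashboards share a `uid`.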

Grafana Dashboards¶

Dashboard	Grafana ID	Source	What it shows

Uptime Kuma Monitors¶

Uptime Kuma runs on the VPS and monitors all services over the Tailscale mesh. The 35 monitors are managed by scripts/setup-uptime-kuma.py, which is the single source of truth. Alerts route to ntfy topics by priority tier.

Critical (60s interval, ntfy urgent)¶

Monitor	Type	Target

Warning (60-300s interval, ntfy high)¶

Monitor	Type	Target

Info (300-900s interval, ntfy default)¶

Monitor	Type	Target
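
The three tiers above pair a check interval with an ntfy priority. The sketch below encodes that mapping so a monitor definition can be validated before it is pushed to Uptime Kuma. It is illustrative only: the `Tier` class and `ntfy_priority_for` function are hypothetical and do not reflect how scripts/setup-uptime-kuma.py is actually written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    min_interval_s: int
    max_interval_s: int
    ntfy_priority: str


# Intervals and priorities copied from the tier headings in this section.
TIERS = {
    "critical": Tier(60, 60, "urgent"),
    "warning": Tier(60, 300, "high"),
    "info": Tier(300, 900, "default"),
}


def ntfy_priority_for(tier_name: str, interval_s: int) -> str:
    """Return the ntfy priority for a monitor, rejecting out-of-range intervals."""
    tier = TIERS[tier_name]
    if not tier.min_interval_s <= interval_s <= tier.max_interval_s:
        raise ValueError(
            f"{tier_name} monitors must check every "
            f"{tier.min_interval_s}-{tier.max_interval_s}s, got {interval_s}s"
        )
    return tier.ntfy_priority
```

For example, `ntfy_priority_for("warning", 120)` returns `"high"`, while a critical monitor with a 120s interval raises `ValueError`.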

ntfy Notification Architecture¶

URL: https://notify.cronova.dev (VPS, Caddy reverse proxy)

Topics¶

Topic	Purpose	Priority

Auth¶

Anonymous access: deny-all
Service tokens for automation (backup sidecars, scripts)
User augusto has full read/write on all topics

Integration Points¶

Source	Topic	Trigger

Android/iOS ntfy app → Subscribe to:
  https://notify.cronova.dev/cronova-critical
  https://notify.cronova.dev/cronova-warning

Container Log Monitoring — Dozzle (Ysyry)¶

URL: https://ysyry.cronova.dev (Caddy + Authelia)
Real-time Docker log viewer for all containers on Docker VM
No persistent storage — live view only
Useful for debugging container startup issues, watching Frigate detections, and checking backup logs

Auto-Update Monitoring — Watchtower¶

Schedule: Sundays at 4:00 AM (only containers that opt in with the label com.centurylinklabs.watchtower.enable=true are updated)
Image: nicholas-fedor/watchtower:1.14.2 (maintained fork; the official containrrr image is abandoned and incompatible with Docker 29+)
Behavior: Rolling restarts, old image cleanup
Excluded from auto-update (manual only): vaultwarden, frigate, headscale
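
Because enrollment is label-based, a manual-only service can be opted in by accident when a compose file is copied. The sketch below audits a mapping of container names to labels against the exclusion list above. How that mapping is collected (for example from `docker inspect` output) is left open, and both function names are hypothetical.

```python
WATCHTOWER_LABEL = "com.centurylinklabs.watchtower.enable"

# Services this document lists as manual-update only.
MANUAL_ONLY = {"vaultwarden", "frigate", "headscale"}


def is_enrolled(labels: dict[str, str]) -> bool:
    """True when the container opts in to Watchtower updates via its label."""
    return labels.get(WATCHTOWER_LABEL, "").strip().lower() == "true"


def misenrolled(containers: dict[str, dict[str, str]]) -> list[str]:
    """Return manual-only containers that nevertheless carry the opt-in label."""
    return sorted(
        name for name, labels in containers.items()
        if name in MANUAL_ONLY and is_enrolled(labels)
    )
```

An empty result from `misenrolled(...)` means none of vaultwarden, frigate, or headscale would be touched by the Sunday run.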

Integration	Source	What It Monitors

Monitoring Checklist¶

Weekly¶

[ ] Check Uptime Kuma dashboard — all monitors green
[ ] Review ntfy alert history — any unexpected alerts
[ ] Spot-check Dozzle for container error logs

Monthly (1st Sunday)¶

[ ] Run backup-verify.sh on Docker VM
[ ] Check Grafana dashboards — disk usage trends, RAM pressure
[ ] Verify vmagent scrape targets are all up (/targets endpoint)
[ ] Review Watchtower update logs
[ ] Check NAS Purple 2TB usage (97% used; monitor closely)
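
The disk-usage item in this checklist can be scripted rather than eyeballed. A minimal sketch using the standard library follows; the 90% warning threshold and any mount path passed in (for example a hypothetical `/mnt/purple`) are assumptions, not values from this setup.

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Used space as a percentage of total capacity."""
    return 100.0 * used / total


def disk_status(path: str, warn_at: float = 90.0) -> tuple[float, bool]:
    """Return (percent used, needs attention) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    pct = percent_used(usage.total, usage.used)
    return pct, pct >= warn_at
```

Run on the NAS against the Purple volume's mount point, `disk_status(...)` returns the current percentage and whether it crosses the threshold. The figure can differ slightly from `df` output because of filesystem-reserved blocks.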

Quarterly¶

[ ] Full backup restore drill (backup-verify.sh --full)
[ ] Review and update Uptime Kuma monitors for new/removed services
[ ] Test ntfy notification delivery (all priority levels)
[ ] Review VictoriaMetrics retention and disk usage