Platform

From alert to action — automatically

Most tools stop at "it's down". Qualimonitor attaches the why — and, behind guardrails, the next step.

qualimonitor — alerts
api.client.com LOSS 18% 14:32 · 5/6 regions

Probable cause

Upstream transit — loss starts at hop 7 (AS26599), confirmed from 5 of 6 regions.

  • private probe: local network clean
  • route changed: Level3 → Cogent, 40 min ago
  • 3 other monitors degraded in the same AS

delivered via webhook · e-mail · status-page incident

Alert with context

Webhooks, e-mail and an automatic status-page incident go out the moment a monitor changes state. The alert carries the diagnosis with it, so your team opens the message, not the laptop.

Diagnose on its own

On degradation (packet loss, latency drift, a failing check), the platform runs the full battery of tests by itself: MTR from every region, DNS and SSL checks and, soon, a diff against the healthy baseline. The verdict says where the problem lives: your server, your DNS or someone else's transit.
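As a rough illustration of how a verdict such as "loss starts at hop 7, confirmed from 5 of 6 regions" could be derived, the sketch below takes simplified per-region MTR results, finds where loss begins and persists to the final hop in each region, and reports the hop most regions agree on. The data shape (a list of loss percentages per hop), the 5% threshold, the region names, the sample numbers and the function names are all assumptions made for this example, not Qualimonitor's actual algorithm.

```python
from __future__ import annotations

from collections import Counter


def loss_start_hop(losses: list[float], threshold: float = 5.0) -> int | None:
    """Return the 1-based hop where loss begins AND persists to the last hop.

    `losses` is a hypothetical simplified MTR result: loss % per hop, hop 1 first.
    Loss at one intermediate hop that vanishes downstream (typically ICMP rate
    limiting on that router) is ignored because the run is reset.
    """
    start = None
    for i, loss in enumerate(losses):
        if loss >= threshold:
            if start is None:
                start = i + 1  # a new lossy run begins here
        else:
            start = None  # clean hop: earlier loss did not persist
    return start


def localize(mtr: dict[str, list[float]], threshold: float = 5.0) -> tuple[int | None, int, int]:
    """Return (hop, regions_agreeing, regions_total) for the most common loss start."""
    starts = [loss_start_hop(hops, threshold) for hops in mtr.values()]
    counts = Counter(s for s in starts if s is not None)
    if not counts:
        return None, 0, len(mtr)
    hop, agreeing = counts.most_common(1)[0]
    return hop, agreeing, len(mtr)


if __name__ == "__main__":
    # Made-up sample data: six hypothetical regions, nine hops each.
    mtr = {
        "us-east":    [0, 0, 0, 0, 0, 0, 18, 17, 19],
        "us-west":    [0, 0, 0, 0, 0, 0, 20, 21, 18],
        "eu-west":    [0, 0, 40, 0, 0, 0, 16, 18, 17],  # hop 3 only rate-limits ICMP
        "eu-central": [0, 0, 0, 0, 0, 0, 19, 18, 18],
        "sa-east":    [0, 0, 0, 0, 0, 0, 17, 17, 18],
        "ap-south":   [0, 0, 0, 0, 0, 0, 0, 0, 0],
    }
    hop, agreeing, total = localize(mtr)
    print(f"loss starts at hop {hop}, confirmed from {agreeing} of {total} regions")
    # prints: loss starts at hop 7, confirmed from 5 of 6 regions
```

Requiring the loss to persist to the last hop is a deliberate choice: a router that rate-limits ICMP replies shows loss at its own hop without dropping forwarded traffic, and counting it would blame the wrong network.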

Fix on its own (coming soon)

Corrective playbooks: swap a DNS record over to the standby firewall, restart a service through the private probe, or trigger your own runbook. Every action runs behind quorum and a cooldown, with an optional human approval button.

Run the incident process

Incidents open, update and resolve together with the monitor, and the status page follows on its own. Ticket integrations and escalation chains are next on the roadmap.
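One way to picture incidents that "open, update and resolve together with the monitor" is a small function that maps each monitor state change to an incident action. The state names (`up`, `degraded`, `down`), the `Incident` class and `on_transition` are hypothetical names for this sketch; only `degraded` appears in the example webhook payload on this page, and the real lifecycle may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field

HEALTHY = "up"  # hypothetical name for the healthy state


@dataclass
class Incident:
    """Hypothetical incident record; a status page would render its timeline."""
    monitor: str
    status: str = "open"  # "open" or "resolved"
    timeline: list[str] = field(default_factory=list)


def on_transition(active: Incident | None, monitor: str, old: str, new: str) -> Incident | None:
    """Return the active incident after `monitor` moves from state `old` to `new`.

    Unhealthy with no open incident -> open one; unhealthy with one open -> update
    it; back to healthy -> resolve it and report that nothing is active any more.
    """
    if old == new:
        return active  # not a state change
    if new == HEALTHY:
        if active is not None:
            active.status = "resolved"
            active.timeline.append("resolved")
        return None
    if active is None:
        active = Incident(monitor)
        active.timeline.append(f"opened: {new}")
    else:
        active.timeline.append(f"updated: {old} -> {new}")
    return active


if __name__ == "__main__":
    first = on_transition(None, "api.client.com", "up", "degraded")
    active = on_transition(first, "api.client.com", "degraded", "down")
    active = on_transition(active, "api.client.com", "down", "up")
    print(first.status, first.timeline)
    # prints: resolved ['opened: degraded', 'updated: degraded -> down', 'resolved']
```

Returning `None` once the monitor is healthy again means the next failure opens a fresh incident instead of reopening a resolved one, which keeps each status-page entry tied to a single outage.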

Guardrails by default

An automation that can touch production has to earn trust first. Every corrective action runs behind four gates: quorum (only acts when enough regions agree it is really down), anti-flapping cooldowns with an hourly action cap, a dry-run mode that shows what would happen without doing it, and a per-action audit log with optional human approval.
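To make the four gates concrete, here is a minimal sketch of how they might compose in front of a corrective action. Everything in it (the `Guardrails` class, its field names, the verdict strings and the default values) is hypothetical and chosen for illustration; it is not Qualimonitor's implementation.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class Guardrails:
    # All names and defaults below are hypothetical illustration values.
    quorum: int = 3                # regions that must confirm the failure
    cooldown_s: float = 600.0      # minimum seconds between two real actions
    max_per_hour: int = 3          # hourly action cap
    dry_run: bool = True           # report what would happen instead of doing it
    require_approval: bool = True  # wait for a human to press "approve"
    audit: list[str] = field(default_factory=list)
    _history: list[float] = field(default_factory=list)  # times of real actions

    def decide(self, failing_regions: int, approved: bool, now: float | None = None) -> str:
        """Return "run", "dry_run", or the reason the action was held back."""
        now = time.time() if now is None else now
        recent = [t for t in self._history if now - t < 3600]
        if failing_regions < self.quorum:
            verdict = "blocked: no quorum"
        elif recent and now - recent[-1] < self.cooldown_s:
            verdict = "blocked: cooldown"
        elif len(recent) >= self.max_per_hour:
            verdict = "escalate: hourly cap reached"  # hand over to humans
        elif self.require_approval and not approved:
            verdict = "pending: awaiting approval"
        elif self.dry_run:
            verdict = "dry_run"
        else:
            verdict = "run"
            self._history.append(now)
        # Every decision, including blocked ones, lands in the audit log.
        self.audit.append(f"{now:.0f} regions={failing_regions} -> {verdict}")
        return verdict


if __name__ == "__main__":
    g = Guardrails(dry_run=False, require_approval=False, max_per_hour=2)
    for regions, t in [(2, 0), (5, 0), (5, 300), (5, 900), (5, 1800)]:
        print(g.decide(regions, approved=False, now=t))
    # prints, one per line: blocked: no quorum, run, blocked: cooldown, run,
    # escalate: hourly cap reached
```

The gates are checked in order, so a failure seen by too few regions never reaches the approval or dry-run step, and hitting the hourly cap returns an escalation rather than another attempt.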

The enriched alert itself is a plain webhook — point it at Slack, your SIEM or your own scripts:

POST https://hooks.yourteam.com/alerts
{
  "monitor": "api.client.com",
  "event": "degraded",
  "verdict": {
    "summary": "Loss starts at hop 7 (AS26599) in 5 of 6 regions",
    "layer": "transit",
    "confidence": 0.92
  },
  "evidence": {
    "mtr_reports": 6,
    "dns": "ok",
    "ssl": "ok",
    "baseline_diff": "route_change"
  },
  "report_url": "https://app.qualimonitor.com/r/abc123"
}
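A minimal receiver for the payload above, using only Python's standard library (`json` and `http.server`). It assumes the JSON shape shown on this page, with field names copied from the example; the port, the catch-all path handling, the `summarize` helper and the one-line summary format are illustrative choices. This sketch does not authenticate requests, which a production receiver should.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize(payload: dict) -> str:
    """Turn the enriched alert into one chat-friendly line (hypothetical format)."""
    verdict = payload.get("verdict", {})
    line = (
        f"[{payload.get('event', '?').upper()}] {payload.get('monitor', '?')}: "
        f"{verdict.get('summary', 'no verdict')} "
        f"(layer={verdict.get('layer', '?')}, confidence={verdict.get('confidence', 0):.0%})"
    )
    if "report_url" in payload:
        line += f" {payload['report_url']}"
    return line


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except ValueError:  # covers malformed JSON and undecodable bytes
            payload = None
        if not isinstance(payload, dict):
            self.send_response(400)
            self.end_headers()
            return
        print(summarize(payload))  # swap for forwarding to chat, SIEM or scripts
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Block forever, accepting alert POSTs on any path."""
    HTTPServer(("", port), AlertHandler).serve_forever()
```

To try it locally, call `serve()` and POST the example JSON to `http://localhost:8080/` with any HTTP client; the handler prints a line such as `[DEGRADED] api.client.com: Loss starts at hop 7 (AS26599) in 5 of 6 regions (layer=transit, confidence=92%)` followed by the report URL.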

Questions

FAQ

Can an automation act without a human?

Only if you configure it that way. Every corrective playbook supports an approval step — the alert arrives with an approve button, and nothing touches production until someone presses it. Fully automatic mode is opt-in, per playbook.

How do you avoid acting on a false positive?

Quorum. A corrective action only fires when the failure is confirmed from multiple regions — one probe with a bad route is a routing observation, not a trigger. You choose how many regions must agree.

What happens if the monitor flaps?

Cooldowns and an action cap: after acting, the playbook waits before it may act again, and repeated triggers escalate to humans instead of repeating the action in a loop.

What exists today and what is on the roadmap?

Today: webhooks, e-mail, automatic status-page incidents and multi-region diagnosis. In development: enriched verdicts with baseline diff, corrective playbooks (DNS failover, restarts via private probe) and ticket integrations. This page marks roadmap items as such.

Does this replace my RMM or my CI pipeline?

No — it feeds them. Qualimonitor is the network layer that detects, diagnoses and documents; the webhook payload is designed to be consumed by whatever already runs your operation.