
Incident response without VPN access: a practical guide

Your pager just went off and the VPN is down. Here is a practical runbook for reaching the affected system, gathering context, and fixing it without a VPN tunnel.

It’s 2:47 a.m. The pager goes off. You roll over, open your laptop — and the VPN won’t connect. Either the concentrator is having a moment, or your ISP is doing something creative, or the on-call playbook from 2021 still assumes you’re at the office. Here’s how to keep responding anyway.

The core problem

The VPN is load-bearing in most incident-response runbooks. When it’s down, you can’t reach the affected system, and sometimes the failure you’re responding to is the same one that took the VPN down. Debugging a failed VPN concentrator while ops keeps paging you is the opposite of a fast recovery.

The substitute: outbound-agent access

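LynxTrac and similar outbound-tunnel tools don’t depend on your VPN, because an agent on the target is already connected outbound to a relay. Since that connection was opened from inside the network, reaching the target needs no inbound firewall rule and no VPN route. You authenticate to the relay via SSO and get a shell or a desktop regardless of your VPN state.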

Practical consequence: if your VPN is down, you can still recover services that matter.
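To make the direction of the connection concrete, here is a toy sketch in Python using asyncio on localhost. It is not LynxTrac’s actual protocol, which this post doesn’t describe: the one-line "command" exchange, the function names `run_agent` and `demo`, and the complete lack of authentication and encryption are all simplifying assumptions. What it does show is the core idea: the agent only ever calls `open_connection` outward, and the operator’s request travels back down that already-open connection, so the target never listens on a port.

```python
import asyncio

# Toy model of an outbound-agent relay. Hypothetical protocol: the relay sends one
# newline-terminated command and the agent replies with one line. No auth, no TLS.


async def run_agent(relay_host: str, relay_port: int) -> None:
    """Runs on the target. It dials OUT to the relay and never listens on a port."""
    reader, writer = await asyncio.open_connection(relay_host, relay_port)
    command = (await reader.readline()).decode().strip()
    # A real agent would execute the command in a shell; the toy only echoes it back.
    writer.write(f"agent ran: {command}\n".encode())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def demo(command: str = "uptime") -> str:
    """Start a relay, let the agent connect out, then push a command down that link."""
    agent_links: asyncio.Queue = asyncio.Queue()

    async def on_agent_connect(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        # The relay parks the agent's outbound connection until an operator needs it.
        await agent_links.put((reader, writer))

    relay = await asyncio.start_server(on_agent_connect, "127.0.0.1", 0)
    port = relay.sockets[0].getsockname()[1]
    agent = asyncio.create_task(run_agent("127.0.0.1", port))

    # Operator side (after SSO, in a real product): reuse the parked connection.
    reader, writer = await agent_links.get()
    writer.write(f"{command}\n".encode())
    await writer.drain()
    reply = (await reader.readline()).decode().strip()

    await agent
    writer.close()
    await writer.wait_closed()
    relay.close()
    await relay.wait_closed()
    return reply


if __name__ == "__main__":
    print(asyncio.run(demo()))  # agent ran: uptime
```

Because the only socket the target opens is an outgoing one, whatever blocks inbound traffic (a NAT, a firewall, a dead VPN concentrator) doesn’t stand between you and the shell.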

The runbook

  1. Open the dashboard. You need monitoring data first — without context, you are flailing.
  2. Confirm the alert. Is it a real outage or a noisy monitor? Five seconds spent here costs nothing.
  3. Get a shell. Click the affected host and open a terminal. You now have the same access you would have had over the VPN.
  4. Collect before you fix. Grab logs, metrics, and the process tree. You will want them for the post-mortem.
  5. Act. Run your remediation. Document what you did in the session; LynxTrac also captures the keystrokes automatically.
  6. Verify. Monitor the host for 5 minutes after the fix; declaring recovery too early is a common cause of reopened incidents.
  7. Hand off or sleep. Update the ticket, tag the on-call follow-up, go back to bed.
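Step 4 above is the easiest one to skip at 3 a.m., so it helps to script it ahead of time. Below is a minimal Python sketch, assuming a Linux host with `ps`, `df`, and `journalctl` available; the command list, the `collect_evidence` name, and the `incident-<timestamp>` folder layout are illustrative choices, not something LynxTrac provides. It snapshots each command’s output into a fresh timestamped folder and records failures instead of aborting, so one missing tool doesn’t cost you the rest of the evidence.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import Mapping, Sequence

# Assumed diagnostics for a Linux host; edit for your stack.
DEFAULT_COMMANDS: Mapping[str, Sequence[str]] = {
    "processes.txt": ["ps", "auxf"],  # process tree (procps "forest" view)
    "disk.txt": ["df", "-h"],  # disk usage
    "journal_tail.txt": ["journalctl", "-n", "500", "--no-pager"],  # last 500 log lines
}


def collect_evidence(out_root: Path, commands: Mapping[str, Sequence[str]] = DEFAULT_COMMANDS) -> Path:
    """Write each command's output into a new timestamped folder and return its path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    bundle = out_root / f"incident-{stamp}"
    bundle.mkdir(parents=True, exist_ok=False)  # raise rather than overwrite an earlier snapshot
    for filename, argv in commands.items():
        try:
            proc = subprocess.run(list(argv), capture_output=True, text=True, timeout=30)
            body = proc.stdout + proc.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Keep going: partial evidence beats none.
            body = f"could not run {list(argv)!r}: {exc}\n"
        (bundle / filename).write_text(body)
    return bundle


if __name__ == "__main__":
    print(collect_evidence(Path("/tmp")))
```

Run it from the shell you opened in step 3, before you change anything, and attach the folder to the ticket in step 7.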
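Step 6 of the runbook can also be made mechanical: instead of glancing at a graph once, poll a health check for the whole window and treat any single failure as "not recovered yet." This sketch assumes the service exposes an HTTP health endpoint that returns 2xx when healthy; `http_ok`, `verify_recovery`, and the URL in the usage line are hypothetical placeholders, the 5-minute window comes from the runbook, and the 15-second interval is an arbitrary choice. The clock and sleep functions are injectable so the logic can be tested without waiting five minutes.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """True if the URL answers with a 2xx status; network and HTTP errors count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError (non-2xx) and timeouts are all OSError subclasses
        return False


def verify_recovery(
    check: Callable[[], bool],
    window_s: float = 300.0,
    interval_s: float = 15.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if `check` passes on every poll from now until the window ends."""
    deadline = clock() + window_s
    while True:
        if not check():
            return False  # one failed poll means recovery is not confirmed
        if clock() >= deadline:
            return True
        sleep(interval_s)


if __name__ == "__main__":
    # Placeholder URL: point this at your service's real health check.
    print(verify_recovery(lambda: http_ok("http://localhost:8080/healthz")))
```

Returning on the first failure is deliberate: a check that flaps during the window is exactly the case the runbook wants you to catch before closing the incident.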

What to watch

If your access depends on a single relay region, an outage in that region breaks your response. LynxTrac relays run in multiple regions with automatic failover, but verify this on a non-incident day with a tabletop exercise.

Also: the access tool’s control plane (the relay plus SSO) is now part of your critical path. Treat it with the same uptime rigor you’d want for your status page.

The meta-lesson

Every piece of infrastructure in your incident response runbook is itself subject to incidents. The goal isn’t to remove dependencies — you can’t — it’s to make sure the dependencies are more reliable than what you’re responding to.
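A quick way to see why this matters: if the response path only works when every dependency in it works, and failures are independent (a simplifying assumption), the path’s availability is the product of its parts, so it always sits below its weakest link. Redundant alternatives, such as a second relay region, combine the other way. The function names are hypothetical and the figures in the usage lines are illustrative, not measurements.

```python
from math import prod


def serial_availability(*parts: float) -> float:
    """All parts must be up (e.g. laptop network AND relay AND SSO AND agent)."""
    return prod(parts)


def redundant_availability(*alternatives: float) -> float:
    """Any one alternative suffices (e.g. relay region A OR region B)."""
    return 1 - prod(1 - a for a in alternatives)


# Illustrative numbers only.
print(round(serial_availability(0.999, 0.999, 0.999), 6))  # 0.997003
print(round(redundant_availability(0.99, 0.99), 6))  # 0.9999
```

Three dependencies at 99.9% each already cost you roughly three times the downtime of any single one, while two independent 99% regions together reach 99.99%.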

Outbound tunnels are not immune to outages. In practice, though, they tend to be more reliable than self-hosted VPN concentrators, because several failure modes that plague concentrators (NAT traversal, IP rotation, client version drift) don’t arise when the agent dials out to the relay.

Try it yourself

LynxTrac is free forever for 2 servers: no credit card, no sales call.
