#01 · 2026-05-20 · 11 min · failure
A 28-minute network outage caused by a $19 unmanaged switch, a single line of "fix-it" config from 2019, and a backup job. The full post-mortem: symptoms, false leads, the tell that cracked it, the fix, and the Claude prompt that would have saved 15 minutes.
03:14 ALERT ping loss 47% · all VLANs
03:14 ALERT core cpu 98% · mac-flap 2400/s
03:31 "why is bpdufilter set on Gi1/0/14?"
03:33 RESOLVED port [BLK] · cpu 5%
[!] root cause: rogue switch + bpdufilter
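The tell in this card, a port with bpdufilter set, is something a config audit can surface before an outage does. Below is a minimal sketch, assuming a Cisco IOS-style running config saved as text. The sample config and the function name `ports_with_bpdufilter` are hypothetical illustrations; only the port Gi1/0/14 comes from the post-mortem.

```python
import re

# Hypothetical sample config for illustration; only the port number
# (Gi1/0/14) is taken from the post-mortem timeline.
SAMPLE_CONFIG = """\
interface GigabitEthernet1/0/13
 switchport mode access
 spanning-tree portfast
!
interface GigabitEthernet1/0/14
 switchport mode access
 spanning-tree bpdufilter enable
!
"""


def ports_with_bpdufilter(config_text: str) -> list[str]:
    """Return the interfaces whose stanza enables BPDU filter."""
    flagged = []
    current = None  # interface stanza we are currently inside, if any
    for line in config_text.splitlines():
        match = re.match(r"^interface\s+(\S+)", line)
        if match:
            current = match.group(1)
        elif line.strip() == "!":
            current = None  # "!" closes the stanza
        elif current and line.strip() == "spanning-tree bpdufilter enable":
            flagged.append(current)
    return flagged


print(ports_with_bpdufilter(SAMPLE_CONFIG))
```

A port that filters BPDUs cannot detect a loop behind it, so every interface this returns deserves a documented reason for the setting.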
#02 · 2026-05-31 · 10 min · failure
An app server lost its database the instant the DB was migrated to a new IP — and DNS pointed at the right place the entire time. A two-hour outage caused by one line in a hosts file, added as a "temporary" override in 2018. The post-mortem: the false lead, the tell, the fix, and the Claude prompt that would have ended it in five minutes.
14:02 db migrated → 10.0.6.40
14:03 app01: connection refused
nslookup db.corp → 10.0.6.40 (right!)
app still connects to 10.0.6.12
[!] hosts file pinned it since 2018
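The false lead here is that nslookup queries DNS directly and never reads the hosts file, while the application's resolver normally consults the hosts file first. Below is a minimal sketch of the comparison that exposes the mismatch, assuming the hosts file content is available as text. The sample content and the function name `hosts_overrides` are hypothetical; the two IP addresses and the name db.corp come from the card.

```python
def hosts_overrides(hosts_text: str, hostname: str) -> list[str]:
    """Return every IP address the hosts file pins to `hostname`."""
    pinned = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        ip, *names = line.split()
        if hostname.lower() in (name.lower() for name in names):
            pinned.append(ip)
    return pinned


# Hypothetical hosts file content for illustration.
SAMPLE_HOSTS = """\
127.0.0.1   localhost
10.0.6.12   db.corp   # temporary override
"""

dns_answer = "10.0.6.40"  # what nslookup returned in the post-mortem

for ip in hosts_overrides(SAMPLE_HOSTS, "db.corp"):
    if ip != dns_answer:
        print(f"hosts file pins db.corp to {ip}, DNS says {dns_answer}")
```

On a real host, the text would come from /etc/hosts or C:\Windows\System32\drivers\etc\hosts.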
#03 · 2026-05-31 · 11 min · failure
Logins failing building-wide, services throwing errors, the identity team in a war room, yet the domain controllers were up and replication was healthy. The root cause was a 47-minute clock skew on the PDC emulator and Kerberos doing exactly what it is designed to do. The post-mortem: the false lead, the tell, the fix, and the prompt that names it.
08:31 auth failing: OWA/VPN/shares
DCs up · replication healthy
every error: KRB_AP_ERR_SKEW
w32tm offset +2847s on the PDC
[!] clock drifted 47m; Kerberos >5m = no
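The arithmetic behind the last two lines can be sketched as follows, assuming the 5-minute tolerance stated in the card. The function name `skew_verdict` is a hypothetical helper, not part of any Kerberos or w32tm tooling.

```python
KERBEROS_MAX_SKEW_S = 300  # 5-minute tolerance, as stated in the card


def skew_verdict(offset_s: float, tolerance_s: float = KERBEROS_MAX_SKEW_S) -> str:
    """Compare a measured clock offset against the Kerberos tolerance."""
    minutes, seconds = divmod(abs(int(offset_s)), 60)
    status = "REJECT" if abs(offset_s) > tolerance_s else "ok"
    return (
        f"{status}: offset {minutes}m{seconds:02d}s "
        f"vs tolerance {int(tolerance_s) // 60}m"
    )


# Offset reported by w32tm on the PDC in the post-mortem.
print(skew_verdict(2847))  # REJECT: offset 47m27s vs tolerance 5m
```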
2026-04-24 · 6 min · failure
When DNS breaks, everything looks broken — but the real cause is rarely obvious. This step-by-step guide takes you from "the internet is down" to root cause using nslookup, dig, and a handful of resolver checks.
$ dig @192.168.1.1 corp.local +stats
;; ANSWER SECTION:
corp.local. 60 IN A 10.0.4.20
;; Query time: 412 ms ← way too slow
↳ check resolver upstream chain
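The "way too slow" judgment in this preview can be automated by pulling the query time out of dig's stats footer. Below is a minimal sketch, assuming the dig output has been captured as text. The 100 ms threshold and the function name `query_time_ms` are assumptions for illustration, not values from the guide.

```python
import re
from typing import Optional

SLOW_MS = 100  # hypothetical threshold; tune it for your own resolver


def query_time_ms(dig_output: str) -> Optional[int]:
    """Extract the query time in milliseconds from dig output, if present."""
    # Matches both "412 ms" (as in the preview) and "412 msec" (dig's wording).
    match = re.search(r"Query time:\s*(\d+)\s*ms", dig_output)
    return int(match.group(1)) if match else None


# Output lines copied from the preview above.
SAMPLE_DIG = """\
;; ANSWER SECTION:
corp.local. 60 IN A 10.0.4.20
;; Query time: 412 ms
"""

elapsed = query_time_ms(SAMPLE_DIG)
if elapsed is not None and elapsed > SLOW_MS:
    print(f"slow answer: {elapsed} ms, check the resolver's upstream chain")
```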
2026-04-24 · 8 min · failure
A saturated WAN feels like the entire network is broken — but the cause is usually one app, one host, or one runaway backup. This step-by-step guide takes you from "everything is slow" to the exact source using interface stats, NetFlow, DPI, and Wireshark.
TOP TALKERS · last 5m
10.0.4.42 480 Mb/s ████████████
10.0.4.20 96 Mb/s ███
10.0.7.11 44 Mb/s ██
→ host .42 = nightly backup window
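The ranking in this preview can be sketched as a small aggregation over per-host rates. The rates below are the three shown in the card; the function name `top_talkers` is hypothetical, and the share is computed against the listed hosts only, not against link capacity, which the card does not state.

```python
# Per-host rates in Mb/s, copied from the preview above.
RATES_MBPS = {"10.0.4.20": 96, "10.0.7.11": 44, "10.0.4.42": 480}


def top_talkers(rates: dict[str, float]) -> list[tuple[str, float, float]]:
    """Rank hosts by rate and attach each host's share of the listed total."""
    total = sum(rates.values())
    ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    return [(host, rate, 100 * rate / total) for host, rate in ranked]


for host, rate, share in top_talkers(RATES_MBPS):
    print(f"{host:<12} {rate:>4} Mb/s  {share:5.1f}% of listed traffic")
```

In this sample, one host accounts for roughly three quarters of the listed traffic, which is the pattern that points at a single runaway job.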
2026-04-18 · 9 min · failure
Packet loss kills voice calls, video, and file transfers — but the cause is rarely obvious. This step-by-step guide walks you from complaint to root cause using ping, traceroute, switch commands, and Wireshark.
C:\> ping 192.168.1.1 -n 100
Sent=100 Received=83 Lost=17 (17%)
SW01# show interfaces Gi0/4
CRC: 3,201 ← bad cable / duplex
[!] root cause: Gi0/4 cabling
2026-04-18 · 10 min · failure
A systematic methodology for diagnosing slow networks — from end-user complaint to root cause. Covers ping, traceroute, packet capture, interface stats, and common fixes.
LAYER 1 cable/port ✓ pass
LAYER 2 switching ✓ pass
LAYER 3 routing/MTU ✗ MTU 1492
LAYER 4 TCP retrans ✗ 4.1%
LAYER 7 app — N/A
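The Layer 3 finding is easier to test once the MTU is turned into concrete probe sizes. Below is a minimal sketch of that arithmetic, assuming IPv4 and TCP headers without options. The function names are hypothetical helpers.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, echo request header
TCP_HEADER = 20   # bytes, assuming no TCP options


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def tcp_mss(mtu: int) -> int:
    """Largest TCP segment payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


for mtu in (1500, 1492):
    print(f"MTU {mtu}: ping payload {max_ping_payload(mtu)}, TCP MSS {tcp_mss(mtu)}")
```

With a path MTU of 1492, a don't-fragment ping with a 1464-byte payload should pass and a 1465-byte one should fail (for example `ping -f -l 1464` on Windows or `ping -M do -s 1464` on Linux).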