How to Diagnose DNS Failures Fast

When DNS breaks, everything looks broken — but the real cause is rarely obvious. This step-by-step guide takes you from "the internet is down" to root cause using nslookup, dig, and a handful of resolver checks.

$ nslookup app.example.com

Server:   192.168.1.10
Address:  192.168.1.10#53

** server can't find app.example.com: SERVFAIL

$ nslookup app.example.com 8.8.8.8

Non-authoritative answer:
Name:     app.example.com
Address:  203.0.113.42

[!] Internal resolver fails. Public resolver works.
[→] The record is fine. Your DNS server isn't.

DNS failures show up as app errors, Outlook outages, and VPN authentication failures — almost never as “DNS is broken.” The fastest diagnosis: run nslookup <hostname> against your internal resolver, then against 8.8.8.8. If internal fails but public resolves, your DNS server is the problem — not the record. If both fail, the record is wrong or the authoritative nameserver is down. This guide walks the full step-by-step workflow from complaint to root cause.


What Does DNS Resolution Actually Do?

Every time something on your network connects to a name — office.com, update.windows.com, an internal app — it makes a sequence of DNS lookups:

  1. Check the local cache (yes/no answer in microseconds)
  2. Ask the configured resolver (your domain controller, your firewall, or a public resolver)
  3. The resolver walks the chain: root → TLD → authoritative nameserver
  4. Answer comes back, gets cached for the TTL

Any one of those four steps can fail. The trick is knowing which.


Step 1: How Do You Confirm the Problem Is DNS?

Before chasing resolvers, prove the symptom is name resolution. Ping the destination by name and by IP:

# Ping by name
ping app.example.com

# Ping by IP (use one you know works, like the gateway or a known internal host)
ping 192.168.1.1

Interpreting results:

| What you see | What it means |
| --- | --- |
| Name fails, IP works | DNS: your problem |
| Name works, IP works | Not DNS: look elsewhere |
| Both fail | Not DNS: connectivity issue, go troubleshoot that first |
| Name resolves but app still broken | DNS resolved, but maybe to the wrong record; keep reading |

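If ping <name> returns “could not find host” (Windows) or “Name or service not known” (Linux) but the IP responds, you’ve confirmed DNS. Move on. A “request timed out” that appears after ping has already printed a resolved IP address is a different case: the name resolved, and the host or a firewall is dropping ICMP.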
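The table above can be encoded as a small decision helper. This is a minimal sketch, assuming a POSIX shell; `classify_ping` is a hypothetical name, and it takes the two ping exit statuses as arguments so the logic can be checked without touching the network.

```shell
# classify_ping NAME_STATUS IP_STATUS
# Each argument is the exit status of a ping run (0 = replies received).
# Hypothetical helper that maps the two results onto the table above.
classify_ping() {
    name_status=$1
    ip_status=$2
    if [ "$name_status" -ne 0 ] && [ "$ip_status" -eq 0 ]; then
        echo "DNS suspected: name fails, IP works"
    elif [ "$name_status" -eq 0 ] && [ "$ip_status" -eq 0 ]; then
        echo "Not DNS: name and IP both work"
    elif [ "$name_status" -ne 0 ] && [ "$ip_status" -ne 0 ]; then
        echo "Connectivity issue: fix that before DNS"
    else
        echo "Name works but IP fails: check the IP you tested"
    fi
}

# Usage (Linux / Mac; -c 1 sends a single echo request):
#   ping -c 1 app.example.com >/dev/null 2>&1; name=$?
#   ping -c 1 192.168.1.1     >/dev/null 2>&1; ip=$?
#   classify_ping "$name" "$ip"
```

A nonzero status for the name can also mean the name resolved but ICMP was dropped, so read the ping output itself before trusting the "DNS suspected" verdict.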


Step 2: How Do You Check Which DNS Server the Client Is Using?

The client may be asking the wrong server. Check what resolvers it’s using:

# Windows (/C: matches the literal phrase; without it, findstr matches "DNS" OR "Servers")
ipconfig /all | findstr /C:"DNS Servers"

# Linux / Mac
cat /etc/resolv.conf

# Or, on systemd-resolved systems:
resolvectl status

You’re looking for two things:

  • Which servers are listed — the order matters; the first is tried first
  • Are they reachable? Ping each:
ping 192.168.1.10

A common failure mode: the client has a stale DHCP lease pointing at a DNS server that no longer exists, or a manual override set during a long-ago troubleshooting session. If a listed resolver doesn’t answer a ping, treat it as the prime suspect, unless you know ICMP is filtered on that segment.
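To check every configured resolver in one pass on Linux or Mac, something like the following could work. It is a sketch, not a definitive tool: `list_resolvers` and `check_resolvers` are hypothetical names, and it assumes the resolvers appear as `nameserver` lines in a resolv.conf-style file. On systemd-resolved systems that file often lists only the local stub (127.0.0.53), in which case `resolvectl status` is the better source.

```shell
# list_resolvers [FILE]
# Prints the address from every "nameserver" line, in file order.
# Defaults to /etc/resolv.conf when no file is given.
list_resolvers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# check_resolvers [FILE]
# Sends one ping to each listed resolver and reports the result.
check_resolvers() {
    for server in $(list_resolvers "$1"); do
        if ping -c 1 "$server" >/dev/null 2>&1; then
            echo "$server reachable"
        else
            echo "$server no ICMP reply"
        fi
    done
}

# Usage:
#   check_resolvers
```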

Also flush the local cache before going further — stale negative caches will lie to you:

# Windows
ipconfig /flushdns

# Linux (systemd-resolved)
sudo resolvectl flush-caches

# Mac
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
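
If you support mixed fleets, the three flush commands above can be selected by operating system. A minimal sketch, assuming the OS name comes from `uname -s` (Git Bash and Cygwin on Windows report names starting with MINGW, MSYS, or CYGWIN); `flush_command` is a hypothetical helper that only prints the command, so nothing runs with sudo by accident.

```shell
# flush_command OS_NAME
# Prints (does not run) the cache-flush command for an OS name
# as reported by `uname -s`.
flush_command() {
    case "$1" in
        Linux)
            echo "sudo resolvectl flush-caches" ;;
        Darwin)
            echo "sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder" ;;
        MINGW*|MSYS*|CYGWIN*)
            echo "ipconfig /flushdns" ;;
        *)
            echo "unknown OS: $1" ;;
    esac
}

# Usage:
#   flush_command "$(uname -s)"
```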

Step 3: How Do You Test If the Problem Is Your DNS Server or the Record?

This is the fastest way to isolate “is it the record or is it my DNS server?”

# Ask Google's resolver directly
nslookup app.example.com 8.8.8.8

# Ask Cloudflare's resolver
nslookup app.example.com 1.1.1.1

What the answers tell you:

| Result | Meaning |
| --- | --- |
| Public resolvers answer, internal doesn’t | Your internal DNS server is broken or filtering |
| Both return the same wrong IP | The record itself is wrong (authoritative side) |
| Both return SERVFAIL | The authoritative nameservers are down |
| Internal answers, public doesn’t | Internal-only record (expected behavior) |
| Public answers, internal returns NXDOMAIN | Internal resolver has a bad zone or stale forwarder config |

If public works and internal doesn’t, you’ve narrowed it to your resolver. Move to Step 4.
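
The comparison can be scripted with dig, whose header line carries the response code as `status: ...`. The sketch below assumes that header format; `rcode_of` and `compare_resolvers` are hypothetical helper names, and NOERROR is treated as "an answer came back", which ignores the empty-answer case.

```shell
# rcode_of
# Reads dig output on stdin and prints the status field from the
# header line, e.g. NOERROR, NXDOMAIN, SERVFAIL.
rcode_of() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# compare_resolvers INTERNAL_RCODE PUBLIC_RCODE
# Maps the pair of response codes onto the table above.
compare_resolvers() {
    internal=$1
    public=$2
    case "$internal/$public" in
        NOERROR/NOERROR)   echo "Both answer: compare the returned IPs" ;;
        NOERROR/*)         echo "Internal-only record (expected for private zones)" ;;
        NXDOMAIN/NOERROR)  echo "Internal resolver has a bad zone or stale forwarder" ;;
        */NOERROR)         echo "Internal resolver is broken or filtering" ;;
        SERVFAIL/SERVFAIL) echo "Authoritative nameservers are likely down" ;;
        *)                 echo "Both fail: check the record at the authoritative servers" ;;
    esac
}

# Usage:
#   internal=$(dig @192.168.1.10 app.example.com | rcode_of)
#   public=$(dig @8.8.8.8 app.example.com | rcode_of)
#   compare_resolvers "$internal" "$public"
```

A resolver that never replies produces no header, so its code is empty and falls into the "broken or filtering" or "both fail" branches.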


Step 4: Use dig +trace to Find the Failing Step

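dig +trace performs the iterative lookup itself: it starts at the root servers and follows each referral (root, TLD, authoritative), so you see exactly which step fails. Because it bypasses your configured resolver, it tests the path from your machine to each nameserver rather than the resolver's view.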

dig +trace app.example.com

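dig ships with macOS and many Linux distros; where it is missing, it usually comes in the dnsutils or bind-utils package. On Windows, install BIND tools, or use Resolve-DnsName, which queries a resolver but does not trace the delegation chain: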

Resolve-DnsName app.example.com -DnsOnly
Resolve-DnsName app.example.com -Server 8.8.8.8

What +trace output reveals:

  • Each level (., com., example.com.) returns the nameservers for the next level
  • A timeout or REFUSED at any level pinpoints the broken layer
  • If the chain succeeds but the final A record is wrong, the authoritative server has a stale or incorrect record

If you see connection timed out; no servers could be reached partway through, that’s a network/firewall issue between you and that nameserver — not a DNS configuration problem.
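
For long traces, a small filter can summarize the outcome. This is a rough sketch under two assumptions: record lines follow dig's usual `name TTL class type data` layout, and an unreachable server produces the "no servers could be reached" message quoted above. `trace_verdict` is a hypothetical name.

```shell
# trace_verdict
# Reads `dig +trace` output on stdin and prints a one-line summary.
trace_verdict() {
    awk '
        /no servers could be reached/ { unreachable = 1 }
        $4 == "A" || $4 == "AAAA"     { answer = $5 }
        END {
            if (unreachable) {
                print "network or firewall problem on the path"
            } else if (answer != "") {
                print "chain completed, final answer: " answer
            } else {
                print "chain ended without an address record"
            }
        }
    '
}

# Usage:
#   dig +trace app.example.com | trace_verdict
```

It reports the last address record seen, so a name that ends in a CNAME to another zone still needs a manual read of the full output.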


Step 5: Can a Firewall Break DNS? (Port 53 and Port 853)

DNS traffic uses UDP/53 for normal queries, TCP/53 for large responses (and zone transfers), and TCP/853 for DNS-over-TLS (DoT). A firewall blocking any of these breaks resolution in ways that aren’t always obvious.

Test port reachability from the client:

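# Windows: Test-NetConnection only checks TCP, so this tests TCP/53
Test-NetConnection -ComputerName 192.168.1.10 -Port 53

# Windows: exercise UDP/53 with a real query (UDP is the default transport)
Resolve-DnsName app.example.com -Server 192.168.1.10

# Linux: TCP/53 test
nc -zv 192.168.1.10 53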

# DNS-over-TLS check (used by Android, modern resolvers)
nc -zv 1.1.1.1 853

Common firewall failures:

| Symptom | Likely cause |
| --- | --- |
| Most names resolve, big ones (DNSSEC) fail | TCP/53 blocked outbound (large responses fall back to TCP) |
| Some clients work, some don’t | Per-VLAN or per-host ACL blocking 53 |
| Resolution works on Wi-Fi but not VPN | Split-tunnel or DNS leak protection routing queries oddly |
| Mobile devices fail, desktops work | DoT (port 853) blocked, for example Android Private DNS |

If your firewall logs show drops to UDP/53 from the affected client, the answer is in front of you.
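
On Linux or Mac, dig can test both transports with real queries, which is more reliable than a bare port probe for UDP. The sketch below assumes dig's documented behavior of exiting 0 when any reply arrives and nonzero when no server answers; `classify_transport` is a hypothetical helper that interprets the two exit statuses.

```shell
# classify_transport UDP_STATUS TCP_STATUS
# Each argument is the exit status of a dig run over that transport
# (0 = a reply arrived, regardless of its response code).
classify_transport() {
    udp=$1
    tcp=$2
    if [ "$udp" -eq 0 ] && [ "$tcp" -eq 0 ]; then
        echo "UDP/53 and TCP/53 both open"
    elif [ "$udp" -eq 0 ]; then
        echo "TCP/53 blocked: large responses will fail"
    elif [ "$tcp" -eq 0 ]; then
        echo "UDP/53 blocked: normal queries will fail"
    else
        echo "Port 53 unreachable on both transports"
    fi
}

# Usage (+time and +tries keep a blocked port from hanging the test):
#   dig @192.168.1.10 app.example.com +notcp +time=2 +tries=1 >/dev/null 2>&1; udp=$?
#   dig @192.168.1.10 app.example.com +tcp   +time=2 +tries=1 >/dev/null 2>&1; tcp=$?
#   classify_transport "$udp" "$tcp"
```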


Step 6: What If the DNS Record Itself Is Wrong?

If the record itself is wrong, no amount of resolver troubleshooting fixes it. Query the authoritative nameservers directly:

# Find the authoritative nameservers for the zone
dig NS example.com

# Query each one directly
dig @ns1.example.com app.example.com

What to check:

  • Does each authoritative nameserver return the same answer? Inconsistency points to a failed zone replication.
  • Is the TTL very long? A record with TTL 86400 means a stale answer survives in caches for a full day after you fix it.
  • Was the record recently changed? If yes, intermediate caches (your ISP’s resolver, Cloudflare’s resolver, 8.8.8.8) may still hold the old value until the TTL expires.

For zones you control: lower the TTL (300 seconds is a common pre-change value) at least one full old-TTL period before any planned record change, for example 24 hours ahead when the old TTL is 86400. For zones you don’t control: wait, or query an authoritative server directly to bypass cache.
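
The consistency check across authoritative servers is easy to loop. A minimal sketch, assuming `dig +short` is available; `answer_from`, `check_zone_consistency`, and `count_distinct_answers` are hypothetical names. Each server's answers are sorted and joined onto one line so that servers returning the same set in a different order still compare equal.

```shell
# answer_from SERVER NAME
# Prints the server's answers for NAME, sorted and joined on one line.
answer_from() {
    dig +short @"$1" "$2" | sort | tr '\n' ' '
}

# check_zone_consistency ZONE NAME
# Prints one "server answers..." line per authoritative nameserver.
check_zone_consistency() {
    for ns in $(dig +short NS "$1"); do
        printf '%s %s\n' "$ns" "$(answer_from "$ns" "$2")"
    done
}

# count_distinct_answers
# Reads "server answers..." lines on stdin and prints how many
# different answer sets appear. A server with no answer counts as
# its own (empty) set.
count_distinct_answers() {
    cut -d' ' -f2- | sort -u | awk 'END { print NR }'
}

# Usage:
#   check_zone_consistency example.com app.example.com | count_distinct_answers
```

A result of 1 means every nameserver agrees; anything higher points at replication.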

To see what’s currently cached at a public resolver:

dig app.example.com @8.8.8.8 +noall +answer

The TTL in the answer line shows how much time is left on the cached record.
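
To pull that number out and make it readable, a filter like this could help. It is a sketch that assumes the usual answer layout, where the TTL is the second whitespace-separated field; `ttl_of` and `human_ttl` are hypothetical names.

```shell
# ttl_of
# Reads `dig +noall +answer` output on stdin and prints the TTL
# (second field) of the first record line.
ttl_of() {
    awk 'NF >= 5 && $1 !~ /^;/ { print $2; exit }'
}

# human_ttl SECONDS
# Formats a TTL in seconds as hours, minutes, and seconds.
human_ttl() {
    s=$1
    printf '%dh %dm %ds\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

# Usage:
#   human_ttl "$(dig app.example.com @8.8.8.8 +noall +answer | ttl_of)"
```

For example, `human_ttl 86400` prints `24h 0m 0s`, the full day a stale record can survive in caches.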


What Are the Most Common DNS Failure Root Causes?

| Symptom | Most likely cause | Where to look |
| --- | --- | --- |
| Name fails, IP works | Resolver config or resolver itself | ipconfig /all, ping the resolvers |
| Public works, internal doesn’t | Internal DNS server / forwarder | DNS server logs, forwarder config |
| One zone broken, others fine | Zone replication or stale forwarder | Authoritative servers, conditional forwarders |
| Works on some clients, fails on others | Firewall ACL or DHCP option mismatch | Firewall logs, DHCP scope DNS option |
| Worked yesterday, fails today | TTL expired with bad upstream record | Query authoritative directly |
| Random intermittent failures | UDP fragmentation or rate limiting | Try TCP/53, check ISP rate-limit policy |
| App works locally, fails on VPN | DNS leak / split-horizon mismatch | VPN client DNS settings |

The Workflow at a Glance

User reports "site/app/service is down"

├─ ping <name> vs ping <IP> → name fails, IP works?
│  └─ Yes → confirmed DNS, continue

├─ ipconfig /all → are the configured resolvers reachable?
│  └─ No → fix resolver config or DHCP

├─ nslookup <name> 8.8.8.8 → does a public resolver answer?
│  ├─ Yes → your internal resolver is the problem
│  └─ No → record itself is broken (go to authoritative)

├─ dig +trace → which level fails?
│  └─ Pinpoint the broken layer

├─ Test port 53 (UDP and TCP) → firewall blocking?
│  └─ Yes → firewall rule

└─ Query authoritative nameservers directly → record correct?
   └─ No → zone file or replication problem

DNS has fingerprints. Follow them, and you’ll have your answer before the user finishes typing the ticket.


Frequently Asked Questions

How do I tell if DNS is causing my problem? Run nslookup <failing-hostname>. If it returns an error but resolves when you add 8.8.8.8 as the server (nslookup <hostname> 8.8.8.8), your internal DNS resolver is broken. If both fail, the record or authoritative nameserver is the problem — not your resolver.

What’s the difference between NXDOMAIN and SERVFAIL? NXDOMAIN means the name doesn’t exist: the lookup completed and the answer was that no records of any type live at that name. SERVFAIL means the resolver couldn’t complete the lookup, often because it can’t reach the authoritative nameserver, has a misconfigured zone, or DNSSEC validation failed. NXDOMAIN is a record problem; SERVFAIL is a server or network problem.

Why does nslookup work but my browser still fails? Browsers cache DNS independently. Flush the OS DNS cache (ipconfig /flushdns on Windows, sudo resolvectl flush-caches on Linux) and clear the browser’s own DNS cache — in Chrome: chrome://net-internals/#dns. If the browser is using a different resolver than the system (some do), that’s also a cause.

How do I flush the DNS cache on Windows? Run ipconfig /flushdns from an elevated command prompt. This clears the Windows resolver cache. If the problem returns immediately after flushing, the resolver itself is returning the wrong answer — the issue is upstream of the client.

Why does DNS work for some users but not others? Different machines may be configured to use different resolvers. Run ipconfig /all (Windows) or cat /etc/resolv.conf (Linux) on both an affected and an unaffected machine and compare the “DNS Servers” entries. If they differ, the resolver that only the affected machine uses is the likely cause.

[ FAQ ]

Q: How do I diagnose DNS failures on my network?

A: Start with `nslookup <hostname> <dns-server>` or `dig @<dns-server> <hostname>` to test the specific resolver. If the resolver responds but slowly (over 100ms query time in the `+stats` output), the forwarder chain is the likely issue. If it does not respond at all, check that UDP and TCP port 53 are not blocked by a firewall rule.

Q: Why is DNS resolution slow on my corporate network?

A: Slow DNS is almost always a misconfigured or overloaded upstream forwarder. Run `dig @<your-resolver> corp.internal +stats` to measure query time at the resolver level. If the resolver is fast but clients are slow, verify clients are actually using the correct DNS server with `ipconfig /all` on Windows or `resolvectl status` on Linux.

Q: What is the dig command to test DNS from Linux?

A: Use `dig @<nameserver-ip> <hostname> +short` for a quick answer, or `dig @<nameserver-ip> <hostname> +stats` to include query latency and full response details. Example: `dig @192.168.1.1 corp.local +stats` tests your internal resolver directly and shows TTL, answer, and query time in milliseconds.
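
To turn the query time into a pass/fail check, a filter along these lines could work. It is a sketch that assumes dig's statistics line reads `;; Query time: N msec`, and it reuses the 100ms threshold mentioned above as a default; `query_time_ms` and `is_slow` are hypothetical names.

```shell
# query_time_ms
# Reads dig output on stdin and prints the number from the
# ";; Query time: N msec" statistics line.
query_time_ms() {
    awk '/Query time:/ { print $4; exit }'
}

# is_slow MILLISECONDS [THRESHOLD]
# Succeeds (exit 0) when the query time exceeds the threshold.
# The threshold defaults to 100 ms.
is_slow() {
    [ "$1" -gt "${2:-100}" ]
}

# Usage:
#   ms=$(dig @192.168.1.1 corp.local +stats | query_time_ms)
#   if is_slow "$ms"; then echo "resolver slow: ${ms} ms"; fi
```

Run it a few times before drawing conclusions: the first query may be a cache miss, and later ones show the cached response time.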