Understanding DNS Problem Types
DNS issues usually fall into a few recognizable categories:
- Name not resolving at all
  - NXDOMAIN: the name doesn’t exist
  - SERVFAIL: the server failed to answer
  - Timeouts: no response from server/port
- Wrong answer
  - Wrong IP / record type / old data (stale cache)
  - Split‑horizon configuration mistakes
- Intermittent resolution
  - Some resolvers work, some don’t
  - Propagation / delegation / load balancing issues
- Performance issues
  - Slow responses
  - High latency from remote resolvers
- Protocol / security issues
  - DNSSEC validation failures
  - EDNS / truncation problems
  - TCP vs UDP issues (firewalls, MTU)
When troubleshooting, try to classify the symptom into one of these. It helps you decide which tools and checks to use.
Basic Troubleshooting Approach
A disciplined process prevents you from chasing the wrong problem:
- Clarify the scope
  - A single record, a whole zone, or all DNS?
  - Only inside your network or also from the internet?
  - Only some clients / subnets / ISPs?
- Check from multiple vantage points
  - Local resolver (e.g., dig example.com)
  - Public resolvers (e.g., dig @1.1.1.1 example.com)
  - Authoritative servers directly (e.g., dig @ns1.example.com example.com)
- Work from the client outward
  - Client configuration → local resolver → upstream / recursive → authoritative
- Separate layers
  - Name resolution vs. connectivity:
    - Can you resolve the name?
    - Can you reach the resolved IP?
Core Tools for DNS Troubleshooting
Using `dig`
dig is the primary diagnostic tool. Key patterns:
# Basic A record query using system resolver
dig example.com
# Query a specific DNS server
dig @8.8.8.8 example.com
# Query specific record type
dig example.com MX
dig example.com TXT
dig -t AAAA example.com
# Request all record types (many servers now refuse ANY or return a minimal answer, per RFC 8482)
dig example.com ANY
# Show only the answer section, with multi-line formatting for long records
dig +multiline +nocmd +noall +answer example.com
# Show full trace from root down (similar to a resolver’s recursion path)
dig +trace example.com
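# Debugging EDNS / DNSSEC behavior
dig +dnssec example.com   # set the DO bit to request DNSSEC records
dig +edns=0 example.com   # send EDNS version 0 explicitly
dig +noedns example.com   # disable EDNS to test plain-DNS behavior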
Key fields in dig output to read:
- status: NOERROR, NXDOMAIN, SERVFAIL, etc.
- flags: qr, aa (authoritative), ra (recursion available), ad (DNSSEC validated), tc (truncated)
- ANSWER SECTION vs AUTHORITY SECTION vs ADDITIONAL SECTION
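When comparing many responses, it can help to reduce each one to just its status and flags. The following is a minimal sketch, assuming the usual dig header layout (a `->>HEADER<<-` line followed by a `;; flags:` line); the function name `dig_summary` is made up for this example.

```shell
# Summarize the header of a dig response read from standard input.
# Prints one line: status=<RCODE> flags=<space-separated flags>
dig_summary() {
    awk '
        /->>HEADER<<-/ {
            # e.g. ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4242"
            for (i = 1; i < NF; i++)
                if ($i == "status:") { status = $(i + 1); sub(/,$/, "", status) }
        }
        /^;; flags:/ {
            # e.g. ";; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1"
            flags = $0
            sub(/^;; flags: */, "", flags)   # drop the label
            sub(/;.*/, "", flags)            # drop the section counters
        }
        END { printf "status=%s flags=%s\n", status, flags }
    '
}
```

It would typically be used as `dig example.com | dig_summary`. A `tc` in the flags means the answer was truncated; an `aa` means it came from an authoritative server.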
Using `host` and `nslookup`
These are simpler / legacy alternatives:
host example.com
host -t MX example.com
nslookup example.com
nslookup -type=TXT example.com
host is cleaner and script‑friendly; nslookup is still around but dig is preferred for serious troubleshooting.
Low-level network tools
To confirm UDP/53 and TCP/53 connectivity:
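# Simple reachability (ICMP, not DNS)
ping dns-server.example.com
# On the DNS server itself: list listening TCP and UDP sockets on port 53
ss -tulpen | grep ':53 '
# From a client: check that TCP port 53 is reachable (requires nc)
nc -vz dns-server.example.com 53
# From a client: send a real query over UDP (dig's default) and over TCP
dig @dns-server.example.com example.com
dig +tcp @dns-server.example.com example.com
Troubleshooting from the Client Side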
Verify local resolver configuration
Check what DNS servers the client is using:
- On most Linux systems:
  - View resolv.conf:
    cat /etc/resolv.conf
  - Check for local stub resolvers (e.g., systemd-resolved):
    - resolv.conf might point to 127.0.0.53
    - Inspect with:
      resolvectl status   # older releases: systemd-resolve --status
- On NetworkManager-managed systems:
  nmcli device show | grep IP4.DNS
  nmcli general status
Common issues:
- Wrong DNS server IPs (old addresses, typos)
- DNS servers only reachable from some networks (VPN, split‑tunnel)
- resolv.conf overwritten by VPNs, DHCP clients, or network managers
Distinguish DNS from connectivity issues
Steps:
- Test DNS directly:
  dig example.com
- Test connectivity to the resolved IP:
  ping 93.184.216.34 # example IP
  curl -v http://93.184.216.34/
If DNS resolution works but the IP is unreachable, it’s not a DNS problem.
If IP connectivity is fine but name resolution fails, focus on DNS.
Test different resolvers
Compare behavior:
dig example.com # default resolver
dig @8.8.8.8 example.com # Google
dig @1.1.1.1 example.com # Cloudflare
dig @ns1.example.com example.com # authoritative
Patterns:
- Works everywhere except your resolver → your recursive resolver or its policy
- Works on public resolvers but not on your authoritative server → zone/config problem
- Fails everywhere identically → delegation or authoritative misconfiguration
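The comparison above can be scripted so that the answers line up side by side. This is a rough sketch rather than a finished tool; the function name `compare_resolvers` is invented here, and it only looks at A records.

```shell
# Print the sorted A-record answers for NAME from each resolver, one line per resolver.
# Usage: compare_resolvers NAME RESOLVER...
compare_resolvers() {
    cr_name=$1
    shift
    for cr_server in "$@"; do
        # +short prints only the record data. Sorting keeps round-robin
        # ordering differences from looking like mismatches.
        cr_answer=$(dig +short +time=2 +tries=1 "@$cr_server" "$cr_name" A | sort | paste -sd ' ' -)
        printf '%-20s %s\n' "$cr_server" "${cr_answer:-<no answer>}"
    done
}
```

For example, `compare_resolvers example.com 8.8.8.8 1.1.1.1 ns1.example.com` prints one line per server, which makes it easy to spot the resolver that disagrees with the rest.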
Troubleshooting Authoritative DNS (BIND / others)
Check zone loading and syntax
For BIND:
# Validate zone file
named-checkzone example.com /etc/bind/db.example.com
# Check overall configuration
named-checkconf
Typical issues named-checkzone finds:
- Missing trailing dots on FQDNs
- Duplicate records with incompatible types
- Bad SOA serial or formatting errors
- Illegal characters in names
For other DNS servers, use their equivalent validation tools or built‑in commands.
Inspect logs
Common locations (may vary by distro):
- /var/log/messages
- /var/log/syslog
- journalctl -u named or -u bind9
- For other daemons, replace with their service names.
Look for:
- Zone loading errors
- Permission issues with zone files
- Failures to bind to an IP address or port 53
- DNSSEC signing errors (if enabled)
Verify that the server is listening and reachable
On the server:
ss -tulpen | grep ':53 '
Confirm:
- Listening on UDP and TCP
- Listening on the expected interfaces (public IP vs localhost only)
From a client:
dig @your-dns-ip example.com
dig @your-dns-ip example.com SOA
If there’s a timeout:
- Check firewalls (host and network)
- Confirm no other service grabbed port 53
Check recursion and access control
If your server is both recursive and authoritative (or you rely on its recursion):
- Examine ACLs such as:
  - allow-query
  - allow-recursion
  - allow-query-cache
Symptoms:
- Authoritative data answers but recursive queries time out or are refused
- Works from inside your LAN but not from outside (or the reverse)
Use:
dig @your-dns-ip external-name.com
REFUSED usually points to an ACL denying the query or recursion; SERVFAIL more often points to a forwarding or upstream resolution failure.
Troubleshooting DNS Delegation and Public Zones
Validate NS records and delegation chain
- Check the domain’s NS records as seen by the world:
  dig example.com NS
- Check at the parent zone (e.g., TLD) by asking one of its servers directly, or follow the whole chain:
  dig @a.gtld-servers.net example.com NS +norecurse # delegation as published by a .com server
  dig +trace example.com # show the full delegation chain
With +trace, inspect:
- NS records provided by the TLD
- Whether they match your intended authoritative servers
- Any missing or dead authoritative servers
Common mistakes:
- NS in parent zone does not match NS in your zone
- Authoritative servers not reachable (firewall, routing)
- Glue record problems (A records for NS missing at parent)
Glue records and in-zone nameservers
If your NS records point to names inside the same zone:
- Example:
  - Zone: example.com
  - NS: ns1.example.com, ns2.example.com
Then the parent zone (e.g., .com) must hold glue A/AAAA records for ns1.example.com and ns2.example.com.
Troubleshoot:
dig +trace ns1.example.com
dig @a.gtld-servers.net example.com NS +norecurse # ask a .com server; glue appears in the ADDITIONAL section (whois or web-based checkers can confirm)
Watch for:
- Missing or incorrect glue IPs
- Glue out of sync with your A/AAAA records
Verifying SOA and serials
Check SOA:
dig example.com SOA
Important fields:
- Primary nameserver
- Contact email (with . instead of @)
- Serial number (used for slave synchronization)
If you use secondary (slave) servers:
- Ensure you increment serial numbers on zone changes.
- On slaves, ensure zone transfers succeed and logs don’t show transfer failures.
Check from each authoritative server:
dig @ns1.example.com example.com SOA
dig @ns2.example.com example.com SOA
Verify the serial is identical; if not, troubleshoot zone transfers.
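With more than two nameservers, the serial comparison is easier to do in a loop. The sketch below assumes the `dig +short` SOA layout (primary, contact, serial, refresh, retry, expire, minimum); `check_serials` is a name chosen for illustration.

```shell
# Print the SOA serial each listed nameserver reports for ZONE and
# return non-zero if any is missing or they are not all identical.
# Usage: check_serials ZONE NS...
check_serials() {
    cs_zone=$1
    shift
    cs_first=
    cs_rc=0
    for cs_ns in "$@"; do
        # +short SOA output: mname rname serial refresh retry expire minimum
        cs_serial=$(dig +short "@$cs_ns" "$cs_zone" SOA | awk 'NR == 1 { print $3 }')
        printf '%s %s\n' "$cs_ns" "${cs_serial:-unknown}"
        if [ -z "$cs_serial" ]; then
            cs_rc=1                      # no answer from this server
        elif [ -z "$cs_first" ]; then
            cs_first=$cs_serial          # first serial seen is the reference
        elif [ "$cs_serial" != "$cs_first" ]; then
            cs_rc=1                      # this server is out of sync
        fi
    done
    return "$cs_rc"
}
```

A call such as `check_serials example.com ns1.example.com ns2.example.com || echo "serial mismatch"` flags a secondary that has not picked up the latest zone.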
Troubleshooting Caching, TTL, and Stale Records
Understanding TTL behavior
Each record has a TTL. Caches keep the record up to that TTL.
Common issues:
- You changed a record, but some clients still see old IPs.
- Different resolvers show different answers due to cache aging.
Use:
dig example.com A
dig example.com A +trace
dig @resolver-ip example.com A
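Note the TTL column in the answer. To make future changes take effect faster, lower the TTL ahead of a planned migration, at least one full old-TTL period before the change, so that caches have already picked up the shorter value.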
Forcing cache bypass
Some resolvers can bypass cache; with plain dig you cannot force upstream servers to ignore their cache, but you can:
- Query different resolvers.
- Use +trace to query the authoritative servers directly.
- Query your own authoritative nameserver explicitly:
  dig @ns1.example.com example.com A
If authoritative shows the new value but some resolvers still have the old one and TTL hasn’t expired yet, you must wait; there is no way to “pull back” cached answers.
Negative caching (NXDOMAIN)
Nonexistent domain responses can be cached, too.
Check:
dig nonexistent.example.com
dig nonexistent.example.com SOA
The negative response carries the zone’s SOA record in its authority section; caches may keep the NXDOMAIN for the smaller of that SOA record’s TTL and its minimum field (RFC 2308).
If you later create that record, some caches will still respond NXDOMAIN until their negative cache expires.
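To estimate how long a stale NXDOMAIN may linger, the negative-caching TTL can be read from the SOA record in the authority section. This is a small sketch assuming dig's default single-line SOA format (not +multiline); `negative_ttl` is an illustrative name.

```shell
# Read the AUTHORITY section of a negative response on standard input and
# print the negative-caching TTL in seconds: the smaller of the SOA record's
# own TTL and its MINIMUM field (the last field of the record).
negative_ttl() {
    awk '$4 == "SOA" {
        ttl = $2 + 0
        minimum = $NF + 0
        print (ttl < minimum ? ttl : minimum)
        exit
    }'
}
```

One way to use it is `dig +noall +authority nonexistent.example.com | negative_ttl`, run against the authoritative server or a resolver of interest.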
DNSSEC Troubleshooting Basics
Recognizing DNSSEC issues
Signs:
- Some resolvers return SERVFAIL, others work.
- dig +dnssec shows status: SERVFAIL and no ad (Authenticated Data) bit.
Check DNSSEC-specific fields:
dig +dnssec example.com
dig +trace +dnssec example.com
Look for:
- Presence of RRSIG, DNSKEY, DS records
- Inconsistent signatures between authoritative servers
Common DNSSEC misconfigurations
- DS record at parent does not match your zone’s DNSKEY (key rollover problems)
- Expired signatures (RRSIGs beyond their validity periods)
- Signed zone served by some nameservers but not others
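Expired signatures can be spotted by comparing each RRSIG's expiration field with the current UTC time. The sketch below assumes dig's presentation format, where the expiration is the ninth field and is printed as YYYYMMDDHHMMSS; the function name `expired_rrsigs` is hypothetical.

```shell
# Read dig output on standard input and list RRSIG records whose expiration
# timestamp is earlier than NOW (UTC, format YYYYMMDDHHMMSS).
# Returns non-zero if at least one expired signature was found.
# Usage: expired_rrsigs NOW
expired_rrsigs() {
    awk -v now="$1" '
        # Fields: name ttl class RRSIG type-covered algorithm labels
        #         original-ttl expiration inception key-tag signer signature
        $4 == "RRSIG" && ($9 + 0) < (now + 0) {
            printf "%s %s expired %s\n", $1, $5, $9
            found = 1
        }
        END { exit (found ? 1 : 0) }
    '
}
```

A typical invocation would be `dig +dnssec +noall +answer example.com | expired_rrsigs "$(date -u +%Y%m%d%H%M%S)"`. Running it against each authoritative server in turn also helps reveal a server that is still serving an older signed copy of the zone.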
Troubleshooting steps:
- Compare DNSKEY and DS:
  dig example.com DNSKEY
  dig example.com DS # answered from the parent zone; online tools or TLD utilities can cross-check
- Validate with external online validators (useful to confirm if the chain validates).
- Ensure all authoritative servers serve the same, correctly signed zone.
Fixing DNSSEC issues often involves:
- Regenerating/signing the zone
- Updating DS at the registrar
- Coordinating key rollovers carefully
Split-Horizon and Internal vs External DNS
Split-horizon (different answers depending on client location) is a common source of confusion.
Typical patterns:
- Inside the network, name resolves to internal IPs; outside, to public IPs.
- VPN users get different DNS servers and see different data.
Troubleshooting tips:
- Always note from where you’re testing.
- Test both “views” explicitly:
  dig @internal-dns example.com
  dig @external-dns example.com
- Verify each view has a complete, correct set of records (especially SOA and NS).
- Check that unintended overlap does not leak internal names externally.
Reverse DNS (PTR) Troubleshooting
Reverse lookups use PTR records in in-addr.arpa (IPv4) and ip6.arpa (IPv6) zones.
Common issues:
- Missing PTR records cause services (especially mail servers) to distrust your IP.
- Reverse doesn’t match forward lookup.
Check:
# Forward lookup
dig mail.example.com A
# Reverse lookup
dig -x 203.0.113.10
For public IPs, reverse zones are usually controlled by the ISP or hosting provider; you often must configure PTRs via their portal, not your own DNS server.
Ensure:
- The PTR points to the correct hostname.
- The hostname’s A/AAAA points back to the same IP (forward-confirmed reverse mapping when required, e.g., for mail).
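The two checks above can be combined into a single forward-confirmed reverse DNS test. This is a minimal IPv4-only sketch; `check_fcrdns` is an invented name, and a real mail setup may need the AAAA equivalent as well.

```shell
# Forward-confirmed reverse DNS check for an IPv4 address.
# Succeeds if some PTR name for IP has an A record pointing back to IP.
# Usage: check_fcrdns IP
check_fcrdns() {
    fc_ip=$1
    # dig -x builds the in-addr.arpa name and asks for its PTR records.
    for fc_name in $(dig +short -x "$fc_ip"); do
        # -F fixed string, -x whole-line match, -q quiet
        if dig +short "$fc_name" A | grep -Fxq "$fc_ip"; then
            printf 'OK %s -> %s -> %s\n' "$fc_ip" "$fc_name" "$fc_ip"
            return 0
        fi
    done
    printf 'FAIL %s has no forward-confirmed PTR\n' "$fc_ip"
    return 1
}
```

For instance, `check_fcrdns 203.0.113.10` reports whether the PTR name for that address resolves back to it.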
Performance and Load-related DNS Problems
Measuring DNS latency
Use +stats and query times:
dig example.com +stats
Look at:
- Query time: X msec
- Compare between resolvers and from various locations.
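To compare latency across resolvers, the query time can be pulled out of each response. The helpers below are a sketch that assumes the standard `;; Query time: N msec` statistics line; both function names are made up for this example.

```shell
# Print the "Query time" value (milliseconds) from dig output on standard input.
query_time_ms() {
    awk '/^;; Query time:/ { print $4; exit }'
}

# Print the query time for NAME against each listed resolver.
# Usage: resolver_latency NAME RESOLVER...
resolver_latency() {
    rl_name=$1
    shift
    for rl_server in "$@"; do
        rl_ms=$(dig "@$rl_server" "$rl_name" | query_time_ms)
        printf '%s %s ms\n' "$rl_server" "${rl_ms:-n/a}"
    done
}
```

Keep in mind that a second query to the same resolver is often answered from cache, so a single run can mix cached and uncached timings; repeating the measurement gives a fairer picture.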
If authoritative DNS is slow:
- Check server load, I/O, network congestion.
- Verify no external amplification/DoS is overwhelming port 53.
Amplification and rate limiting
DNS can be abused for amplification attacks. Countermeasures (like Response Rate Limiting, RRL) can introduce:
- Intermittent failures for heavy clients
- Truncated responses
Look for:
- Logs indicating RRL limiting
- Large responses that trigger truncation (tc flag set)
If responses are frequently truncated:
- Ensure TCP port 53 is open as fallback.
- Consider reducing response size (e.g., minimal responses, DNSSEC considerations).
Firewall and Network-related DNS Issues
Firewalls blocking or altering DNS
Symptoms:
- Timeouts contacting certain DNS servers
- UDP works but TCP fails (or vice versa)
- Large responses fail due to MTU/fragmentation issues
Checks:
- Host firewalls (iptables, nftables, firewalld, ufw) allow:
  - UDP/53
  - TCP/53
- Network firewalls and routers do not:
  - Drop fragmented UDP packets
  - Perform unwanted protocol inspection
Use packet captures:
tcpdump -ni any port 53
Analyze:
- Are queries leaving, and responses returning?
- Are responses larger than MTU, getting fragmented or dropped?
ISP / network interception
Some ISPs:
- Redirect port 53 to their own resolvers.
- Block external DNS over port 53.
If you see unexpected answers from a different resolver than you queried:
- Confirm the SERVER line in dig output.
- Try alternative transports (DNS over TLS/HTTPS, if appropriate).
- Use a non-standard port temporarily (for testing only; not a general solution).
Systematic Checklist for DNS Troubleshooting
When facing a DNS issue for a specific name:
- From affected client:
  dig name
  - Check status, answer, and server used.
  - If failure, try dig @8.8.8.8 name.
- From a different network or public DNS checker:
  - Confirm whether the issue is global or local.
- From an admin host or the DNS server itself:
  - Query authoritative server:
    dig @ns1.example.com name ANY
  - If authoritative fails → check zone files, logs, and listening ports.
- Check delegation:
  dig +trace name
  - Verify NS and glue are correct and reachable.
- Check caching / TTL:
  - Compare authoritative vs cached answers.
  - Note TTL and whether caches still serve old data.
- If DNSSEC is enabled:
  dig +dnssec name
  - Verify RRSIG, DNSKEY, DS; look for SERVFAIL on validating resolvers.
- Confirm network path:
  - Ensure UDP/TCP 53 is allowed end‑to‑end.
  - Use tcpdump / ss / firewall tools to verify.
- Document findings and changes:
  - Record serial numbers, TLD updates, and TTLs.
  - Log when changes were made, to correlate with cache expiry.
Following this structured approach helps isolate whether the problem is client configuration, local resolver, authoritative configuration, delegation, DNSSEC, or underlying network transport.