DNS troubleshooting

Understanding DNS Problem Types

DNS issues usually fall into a few recognizable categories:

Name not resolving at all

NXDOMAIN: the name doesn’t exist
SERVFAIL: the server failed to answer
Timeouts: no response from server/port

Wrong answer

Wrong IP / record type / old data (stale cache)
Split‑horizon configuration mistakes

Intermittent resolution

Some resolvers work, some don’t
Propagation / delegation / load balancing issues

Performance issues

Slow responses
High latency from remote resolvers

Protocol / security issues

DNSSEC validation failures
EDNS / truncation problems
TCP vs UDP issues (firewalls, MTU)

When troubleshooting, try to classify the symptom into one of these. It helps you decide which tools and checks to use.
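The classification step can be sketched in shell. The functions below read dig's full output on stdin and map the status code to the categories above. The names dig_status and classify_dns_symptom are hypothetical. The only thing relied on is dig's standard header line (";; ->>HEADER<<- opcode: QUERY, status: ..."). A missing status is treated as a timeout, because a query that gets no reply produces no header.

```shell
# Hypothetical helpers: classify a DNS symptom from dig's full output on stdin.

# Extract the status code (NOERROR, NXDOMAIN, SERVFAIL, ...) from the header line.
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Map the status code to the problem categories described in the text.
classify_dns_symptom() {
  local status
  status=$(dig_status)
  case "$status" in
    NXDOMAIN) echo "not resolving: name does not exist" ;;
    SERVFAIL) echo "not resolving: server failed to answer" ;;
    REFUSED)  echo "not resolving: query refused (check ACLs)" ;;
    NOERROR)  echo "answered: check whether the data is the expected data" ;;
    "")       echo "no response: timeout or unreachable server/port" ;;
    *)        echo "other status: $status" ;;
  esac
}
```

For example: dig example.com | classify_dns_symptom. A NOERROR result still needs a look at the data itself. Wrong answers and empty (NODATA) answers also arrive with NOERROR.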

Basic Troubleshooting Approach

A disciplined process prevents you from chasing the wrong problem:

Clarify the scope

A single record, a whole zone, or all DNS?
Only inside your network or also from the internet?
Only some clients / subnets / ISPs?

Check from multiple vantage points

Local resolver (e.g., dig example.com)
Public resolvers (e.g., dig @1.1.1.1 example.com)
Authoritative servers directly (e.g., dig @ns1.example.com example.com)

Work from the client outward

Client configuration → local resolver → upstream / recursive → authoritative

Separate layers

Name resolution vs. connectivity:

Can you resolve the name?
Can you reach the resolved IP?

Core Tools for DNS Troubleshooting

Using `dig`

dig is the primary diagnostic tool. Key patterns:

# Basic A record query using system resolver
dig example.com
# Query a specific DNS server
dig @8.8.8.8 example.com
# Query specific record type
dig example.com MX
dig example.com TXT
dig -t AAAA example.com
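# Request ANY (many servers now return a minimal reply per RFC 8482,
# so this does not reliably list every record)
dig example.com ANY
# Compact output: answer section only, long records wrapped across lines
dig +multiline +nocmd +noall +answer example.com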
# Show full trace from root down (similar to a resolver’s recursion path)
dig +trace example.com
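# Debugging EDNS / DNSSEC behavior
dig +dnssec example.com       # request DNSSEC records (sets the DO bit)
dig +edns=0 example.com       # EDNS version 0 explicitly (dig's default)
dig +noedns example.com       # plain query without EDNS, for servers that mishandle it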

Key fields in dig output to read:

status: NOERROR, NXDOMAIN, SERVFAIL, etc.
flags: qr (response), rd (recursion desired), aa (authoritative answer), ra (recursion available), ad (authenticated data: DNSSEC validated), tc (truncated)
ANSWER SECTION vs AUTHORITY SECTION vs ADDITIONAL SECTION
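Checking individual flags is easy to script once you know the shape of the flags line. The hypothetical helper below reads dig's full output and succeeds if a given flag appears on the ";; flags:" line. It assumes that line format and nothing else.

```shell
# Hypothetical helper: succeed if a header flag (qr, aa, rd, ra, ad, tc, ...) is set.
# Reads dig's full output on stdin; the flags line looks like
# ";; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1"
dig_has_flag() {
  local flag=$1
  # Keep only the text between ";; flags:" and the first ";", one flag per line.
  sed -n 's/^;; flags:\([^;]*\);.*/\1/p' | head -n 1 | tr ' ' '\n' | grep -qx "$flag"
}
```

For example, dig +dnssec example.com | dig_has_flag ad && echo validated confirms that the resolver validated the answer. The pipeline dig @ns1.example.com example.com | dig_has_flag aa checks that a server really answers authoritatively.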

Using `host` and `nslookup`

These are simpler / legacy alternatives:

host example.com
host -t MX example.com
nslookup example.com
nslookup -type=TXT example.com

host is cleaner and script‑friendly; nslookup is still around but dig is preferred for serious troubleshooting.

Low-level network tools

To confirm UDP/53 and TCP/53 connectivity:

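# Simple reachability (ICMP, not DNS; ICMP may be filtered even when DNS works)
ping dns-server.example.com
# On the DNS server: list listening UDP and TCP sockets on port 53
ss -tulpen | grep ':53 '
# From a client: check that TCP/53 is reachable (nc tests TCP by default)
nc -vz dns-server.example.com 53
# From a client: send a real DNS query over each transport
dig +notcp @dns-server.example.com example.com
dig +tcp @dns-server.example.com example.com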
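The UDP and TCP checks can be combined into one report. The sketch below relies on dig's documented exit codes: 0 when any reply arrives (even NXDOMAIN, SERVFAIL or REFUSED) and 9 when none does. The function name check_dns_transports is hypothetical, and example.com SOA is only a default test query.

```shell
# Hypothetical helper: query one server over UDP only and over TCP only, then report.
check_dns_transports() {
  local server=$1 name=${2:-example.com} udp=0 tcp=0
  # +notcp +ignore: stay on UDP and do not retry over TCP if the reply is truncated
  dig +notcp +ignore +time=2 +tries=1 @"$server" "$name" SOA >/dev/null 2>&1 || udp=$?
  dig +tcp +time=2 +tries=1 @"$server" "$name" SOA >/dev/null 2>&1 || tcp=$?
  if [ "$udp" -eq 0 ] && [ "$tcp" -eq 0 ]; then
    echo "UDP and TCP both answer"
  elif [ "$udp" -eq 0 ]; then
    echo "UDP answers, TCP does not: check firewalls for TCP/53"
  elif [ "$tcp" -eq 0 ]; then
    echo "TCP answers, UDP does not: check firewalls for UDP/53"
  else
    echo "no reply on either transport: server down, wrong address, or port 53 blocked"
  fi
}
```

Usage: check_dns_transports dns-server.example.com.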

Troubleshooting from the Client Side

Verify local resolver configuration

Check what DNS servers the client is using:

On most Linux systems:

View resolv.conf:

    cat /etc/resolv.conf

Check for local stub resolvers (e.g., systemd-resolved):

resolv.conf might point to 127.0.0.53
Inspect with:

      resolvectl status          # older releases: systemd-resolve --status

On NetworkManager-managed systems:

    nmcli device show | grep IP4.DNS
    nmcli general status

Common issues:

Wrong DNS server IPs (old addresses, typos)
DNS servers only reachable from some networks (VPN, split‑tunnel)
Overwritten resolv.conf by VPNs, DHCP clients, or network managers
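To check the configured servers one by one, the sketch below lists the nameserver lines from resolv.conf and sends the same query to each. The helper names are hypothetical. If the only entry is a local stub such as 127.0.0.53, this tests only the stub; the upstream servers are shown by resolvectl status.

```shell
# Hypothetical helpers: list the resolvers a client is configured with, then query each.
resolv_nameservers() {
  # Print the address on every "nameserver" line (default file: /etc/resolv.conf).
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

query_each_nameserver() {
  local name=$1 ns answer
  for ns in $(resolv_nameservers "${2:-/etc/resolv.conf}"); do
    # Show only the first answer line per server to keep the report compact.
    answer=$(dig +short +time=2 +tries=1 @"$ns" "$name" | head -n 1)
    printf '%s: %s\n' "$ns" "${answer:-<no answer>}"
  done
}
```

Usage: query_each_nameserver example.com.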

Distinguish DNS from connectivity issues

Steps:

Test DNS directly:

   dig example.com

Test connectivity to the resolved IP:

   ping 93.184.216.34        # example IP
   curl -v http://93.184.216.34/

If DNS resolution works but the IP is unreachable, it’s not a DNS problem.

If IP connectivity is fine but name resolution fails, focus on DNS.
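The two steps can be folded into one check that reports which layer fails. This is a sketch under assumptions. getent hosts resolves through the system's NSS path, as most applications do. The nc -z -w flags are those of common netcat variants but differ between implementations. The function name check_layers is hypothetical.

```shell
# Hypothetical helper: separate "does it resolve?" from "can I reach it?".
check_layers() {
  local host=$1 port=${2:-443} ip
  # First address returned by the system resolver (may be IPv4 or IPv6).
  ip=$(getent hosts "$host" | awk 'NR == 1 { print $1 }')
  if [ -z "$ip" ]; then
    echo "DNS: $host does not resolve, focus on DNS"
    return 1
  fi
  # TCP connect test only; -z sends no data, -w 3 limits the wait to 3 seconds.
  if nc -z -w 3 "$ip" "$port" >/dev/null 2>&1; then
    echo "OK: $host -> $ip, port $port reachable"
  else
    echo "Network: $host -> $ip, but port $port unreachable (not a DNS problem)"
    return 2
  fi
}
```

The return codes (1 for DNS, 2 for connectivity) make the function usable in scripts as well as interactively.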

Test different resolvers

Compare behavior:

dig example.com                  # default resolver
dig @8.8.8.8 example.com         # Google
dig @1.1.1.1 example.com         # Cloudflare
dig @ns1.example.com example.com # authoritative

Patterns:

Works everywhere except your resolver → your recursive resolver or its policy
Works on public resolvers but not on your authoritative server → zone/config problem
Fails everywhere identically → delegation or authoritative misconfiguration
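To spot these patterns faster, a small loop can print each resolver's sorted answer side by side. In the sketch below, compare_resolvers is a hypothetical name. The word "default" is a made-up convention meaning "use the system resolver", not a dig feature.

```shell
# Hypothetical helper: show each resolver's sorted answer for one name/type.
# Pass "default" as a server to query the resolver from /etc/resolv.conf.
compare_resolvers() {
  local name=$1 type=$2 server answer
  shift 2
  for server in "$@"; do
    if [ "$server" = default ]; then
      answer=$(dig +short +time=2 +tries=1 "$name" "$type" | sort | paste -s -d ' ' -)
    else
      answer=$(dig +short +time=2 +tries=1 @"$server" "$name" "$type" | sort | paste -s -d ' ' -)
    fi
    printf '%-16s %s\n' "$server" "${answer:-<no answer>}"
  done
}
```

Usage: compare_resolvers example.com A default 8.8.8.8 1.1.1.1 ns1.example.com. Sorting matters because resolvers may return the same records in a different order.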

Troubleshooting Authoritative DNS (BIND / others)

Check zone loading and syntax

For BIND:

# Validate zone file
named-checkzone example.com /etc/bind/db.example.com
# Check overall configuration
named-checkconf
# Check configuration and test-load every primary zone it declares
named-checkconf -z

Typical issues named-checkzone finds:

Missing trailing dots on FQDNs
Duplicate records with incompatible types
Bad SOA serial or formatting errors
Illegal characters in names

For other DNS servers, use their equivalent validation tools or built‑in commands.

Inspect logs

Common locations (may vary by distro):

/var/log/messages
/var/log/syslog
journalctl -u named (or journalctl -u bind9; the unit name varies by distribution)
For other daemons, replace with their service names.

Look for:

Zone loading errors
Permission issues with zone files
Failures to bind to IP/port 53 (e.g., address already in use)
DNSSEC signing errors (if enabled)

Verify that the server is listening and reachable

On the server:

ss -tulpen | grep ':53 '

Confirm:

Listening on UDP and TCP
Listening on the expected interfaces (public IP vs localhost only)

From a client:

dig @your-dns-ip example.com
dig @your-dns-ip example.com SOA

If there’s a timeout:

Check firewalls (host and network)
Confirm no other service grabbed port 53

Check recursion and access control

If your server is both recursive and authoritative (or you rely on its recursion):

Examine ACLs such as:

allow-query
allow-recursion
allow-query-cache

Symptoms:

Authoritative data answers but recursive queries time out or are refused
Works from inside your LAN but not from outside (or the reverse)

Use:

dig @your-dns-ip external-name.com

REFUSED usually points to ACLs (allow-query, allow-recursion); SERVFAIL more often points to forwarding or upstream resolution problems.

Troubleshooting DNS Delegation and Public Zones

Validate NS records and delegation chain

Check the domain’s NS records as seen by the world:

   dig example.com NS

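Check at the parent zone (e.g., the .com TLD servers), which holds the delegation:

   dig +short com NS                                  # list the parent's nameservers
   dig @a.gtld-servers.net example.com NS +norec      # referral: NS records appear in the AUTHORITY section
   dig +trace example.com                             # full delegation chain from the root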

With +trace, inspect:

NS records provided by the TLD
Whether they match your intended authoritative servers
Any missing or dead authoritative servers

Common mistakes:

NS in parent zone does not match NS in your zone
Authoritative servers not reachable (firewall, routing)
Glue record problems (A records for NS missing at parent)
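The first mistake can be checked mechanically. The sketch below compares the NS set in the parent's referral with the NS set your own server publishes. It normalizes case and trailing dots first. The function names are hypothetical, and a.gtld-servers.net in the usage line is one of the real .com servers, used only as an example.

```shell
# Hypothetical helpers: compare the parent's delegation with the zone's own NS set.
normalize_names() {
  # Lower-case, strip trailing dot, one name per line, sorted and de-duplicated.
  tr '[:upper:]' '[:lower:]' | sed 's/\.$//' | sort -u
}

compare_delegation() {
  local zone=$1 parent_server=$2 child_server=$3 parent child
  # The parent answers with a referral: NS records are in the AUTHORITY section.
  parent=$(dig +norec +noall +authority @"$parent_server" "$zone" NS |
    awk '$4 == "NS" { print $5 }' | normalize_names)
  # Your own authoritative server answers in the ANSWER section.
  child=$(dig +norec +short @"$child_server" "$zone" NS | normalize_names)
  # Unquoted $(echo $var) below deliberately joins the names onto one line.
  if [ -n "$parent" ] && [ "$parent" = "$child" ]; then
    echo "delegation matches: $(echo $parent)"
  else
    printf 'MISMATCH\nparent: %s\nchild:  %s\n' "$(echo $parent)" "$(echo $child)"
    return 1
  fi
}
```

Usage: compare_delegation example.com a.gtld-servers.net ns1.example.com.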

Glue records and in-zone nameservers

If your NS records point to names inside the same zone:

Example:

Zone: example.com
NS: ns1.example.com, ns2.example.com

Then the parent zone (e.g., .com) must hold glue A/AAAA records for ns1.example.com and ns2.example.com.

Troubleshoot:

dig +trace ns1.example.com
dig @a.gtld-servers.net example.com NS +norec   # glue A/AAAA records appear in the ADDITIONAL section (registrar portals and web-based checkers show the same data)

Watch for:

Missing or incorrect glue IPs
Glue out of sync with your A/AAAA records
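Glue drift can be detected the same way. The sketch below compares the glue for one in-zone nameserver with the A/AAAA records that nameserver serves for itself. The function name check_glue is hypothetical. It assumes dig prints addresses in the same textual form in both queries, which holds for IPv4 and for IPv6 as printed by dig.

```shell
# Hypothetical helper: compare parent glue for one nameserver with the zone's own records.
check_glue() {
  local zone=$1 parent_server=$2 ns=$3 glue own
  # Glue arrives in the ADDITIONAL section of the parent's referral.
  glue=$(dig +norec +noall +additional @"$parent_server" "$zone" NS |
    awk -v n="$ns." 'tolower($1) == tolower(n) && ($4 == "A" || $4 == "AAAA") { print $5 }' | sort)
  # What the nameserver itself publishes for its own name.
  own=$( { dig +norec +short @"$ns" "$ns" A; dig +norec +short @"$ns" "$ns" AAAA; } | sort)
  if [ -n "$glue" ] && [ "$glue" = "$own" ]; then
    echo "glue OK for $ns: $(echo $glue)"
  else
    printf 'glue MISMATCH for %s\nparent glue: %s\nzone data:   %s\n' "$ns" "$(echo $glue)" "$(echo $own)"
    return 1
  fi
}
```

Usage: check_glue example.com a.gtld-servers.net ns1.example.com. Repeat for each in-zone nameserver.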

Verifying SOA and serials

Check SOA:

dig example.com SOA

Important fields:

Primary nameserver
Contact email (with . instead of @)
Serial number (used for slave synchronization)

If you use secondary (slave) servers:

Ensure you increment serial numbers on zone changes.
On slaves, ensure zone transfers succeed and logs don’t show transfer failures.
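Any increase of the serial works, because secondaries compare serials using RFC 1982 serial-number arithmetic. Many operators follow the YYYYMMDDnn convention so the serial also records the change date. The hypothetical helper below assumes that convention, which is common but not required. It returns today's date with nn=00, or the current serial plus one if that would not be larger.

```shell
# Hypothetical helper, assuming the YYYYMMDDnn serial convention.
next_serial() {
  local current=$1 today=${2:-$(date -u +%Y%m%d)} candidate
  candidate=$((today * 100))          # e.g. 20240101 -> 2024010100
  if [ "$current" -ge "$candidate" ]; then
    echo $((current + 1))             # already changed today: bump the counter
  else
    echo "$candidate"                 # first change today
  fi
}
```

Usage: next_serial 2024010103 prints a serial for today's date. The optional second argument fixes "today", which helps when testing.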

Check from each authoritative server:

dig @ns1.example.com example.com SOA
dig @ns2.example.com example.com SOA

Verify the serial is identical; if not, troubleshoot zone transfers.
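This comparison is easy to repeat across any number of servers. In the sketch below, check_serials is a hypothetical name. It relies on the seven-field layout of dig +short SOA output (mname rname serial refresh retry expire minimum). A server that does not answer counts as a mismatch.

```shell
# Hypothetical helper: print each server's SOA serial; fail if any differ or are missing.
check_serials() {
  local zone=$1 ns serial ref="" rc=0
  shift
  for ns in "$@"; do
    # Field 3 of `dig +short SOA` output is the serial.
    serial=$(dig +norec +short +time=2 +tries=1 @"$ns" "$zone" SOA | awk 'NF == 7 { print $3; exit }')
    printf '%-20s %s\n' "$ns" "${serial:-<no answer>}"
    if [ -z "$serial" ]; then
      rc=1
    elif [ -z "$ref" ]; then
      ref=$serial
    elif [ "$serial" != "$ref" ]; then
      rc=1
    fi
  done
  return $rc
}
```

Usage: check_serials example.com ns1.example.com ns2.example.com.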

Troubleshooting Caching, TTL, and Stale Records

Understanding TTL behavior

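Each record has a TTL in seconds. A caching resolver may keep serving the record from its cache for up to that long before asking the authoritative servers again.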

Common issues:

You changed a record, but some clients still see old IPs.
Different resolvers show different answers due to cache aging.

Use:

dig example.com A
dig example.com A +trace
dig @resolver-ip example.com A

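Note the TTL column in the answer. Through a caching resolver it shows the time remaining in that cache; from an authoritative server it shows the full configured TTL. To make future changes take effect faster, lower the TTL before a planned migration, far enough ahead that the old, longer TTL has expired by the time you make the change.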
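The TTLs can be summarized from dig +noall +answer output with a small awk filter. The helper name ttl_report is hypothetical. It assumes dig's usual column order (name, TTL, class, type, data). It prints each record's type, name and TTL, plus the longest TTL seen, which bounds how long that cache may keep the current data.

```shell
# Hypothetical helper: summarize TTLs from `dig +noall +answer` output on stdin.
ttl_report() {
  awk 'BEGIN { max = 0; n = 0 }
       !/^;/ && NF >= 5 {
         n++
         print $4, $1, $2                  # type, name, TTL
         if ($2 + 0 > max) max = $2 + 0
       }
       END { if (n) print "longest TTL:", max }'
}
```

Usage: dig @resolver-ip example.com A +noall +answer | ttl_report. Run it against several resolvers to see how long each one may still serve old data.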

Forcing cache bypass

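You cannot make a recursive resolver discard its cached answer from the client side (dig has no option for that), but you can:

Query different resolvers.
Use +trace, which walks the delegation from the root itself instead of relying on your resolver's cache.
If you operate the resolver, flush the name there (e.g., rndc flushname example.com for BIND, unbound-control flush example.com for Unbound, resolvectl flush-caches for systemd-resolved).
Query your own authoritative nameserver explicitly: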

  dig @ns1.example.com example.com A

If authoritative shows the new value but some resolvers still have the old one and TTL hasn’t expired yet, you must wait; there is no way to “pull back” cached answers.

Negative caching (NXDOMAIN)

Nonexistent domain responses can be cached, too.

Check:

dig nonexistent.example.com
dig nonexistent.example.com SOA

Per RFC 2308, a negative answer is cached for the smaller of the SOA record's own TTL and the SOA MINIMUM field (the last field of the SOA).

If you later create that record, some caches will still respond NXDOMAIN until their negative cache expires.
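That RFC 2308 rule is simple enough to compute from an SOA line. The helper negative_ttl below is hypothetical. It assumes dig's normal one-line SOA layout: name, TTL, class, SOA, mname, rname, serial, refresh, retry, expire, minimum.

```shell
# Hypothetical helper: negative-cache time = min(SOA record TTL, SOA MINIMUM field).
negative_ttl() {
  awk '$4 == "SOA" && NF >= 11 { t = $2 + 0; m = $11 + 0; print (t < m ? t : m); exit }'
}
```

Usage: dig +noall +answer example.com SOA | negative_ttl. Alternatively, pipe the authority section of a failed lookup (dig +noall +authority nonexistent.example.com). The result is roughly how long a newly created name may keep returning NXDOMAIN from caches that saw it missing.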

DNSSEC Troubleshooting Basics

Recognizing DNSSEC issues

Signs:

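Some resolvers return SERVFAIL, others work.
dig +dnssec shows status: SERVFAIL and no ad (Authenticated Data) bit.
The same query with +cd (checking disabled) succeeds, which points to validation rather than reachability.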

Check DNSSEC-specific fields:

dig +dnssec example.com
dig +trace +dnssec example.com

Look for:

Presence of RRSIG, DNSKEY, DS records
Inconsistent signatures between authoritative servers

Common DNSSEC misconfigurations

DS record at parent does not match your zone’s DNSKEY (key rollover problems)
Expired signatures (RRSIGs beyond their validity periods)
Signed zone served by some nameservers but not others
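Expired signatures are straightforward to detect from dig output, because dig prints RRSIG validity times as YYYYMMDDHHMMSS. The hypothetical helper below reads dig +dnssec +noall +answer output without +multiline. It compares the expiration (field 9) and inception (field 10) of each RRSIG with the current UTC time.

```shell
# Hypothetical helper: flag RRSIGs outside their validity window.
# RRSIG fields: name TTL class RRSIG covered alg labels orig-ttl expiration inception keytag signer sig
check_rrsig_times() {
  local now=${1:-$(date -u +%Y%m%d%H%M%S)}
  awk -v now="$now" '$4 == "RRSIG" {
      state = "ok"
      if ($9 + 0 < now + 0) state = "EXPIRED"
      else if ($10 + 0 > now + 0) state = "NOT YET VALID"
      print $5, "keytag=" $11, "expires=" $9, state
    }'
}
```

Usage: dig @ns1.example.com example.com A +dnssec +norec +noall +answer | check_rrsig_times. Running it against each authoritative server also reveals servers that serve an unsigned or stale copy of the zone.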

Troubleshooting steps:

Compare DNSKEY and DS:

   dig example.com DNSKEY
   dig example.com DS                              # DS lives in the parent zone
   dig @a.gtld-servers.net example.com DS +norec   # ask a parent server directly

Validate with external online validators (useful to confirm if the chain validates).
Ensure all authoritative servers serve the same, correctly signed zone.

Fixing DNSSEC issues often involves:

Regenerating/signing the zone
Updating DS at the registrar
Coordinating key rollovers carefully

Split-Horizon and Internal vs External DNS

Split-horizon (different answers depending on client location) is a common source of confusion.

Typical patterns:

Inside the network, name resolves to internal IPs; outside, to public IPs.
VPN users get different DNS servers and see different data.

Troubleshooting tips:

Always note from where you’re testing.
Test both “views” explicitly:

  dig @internal-dns example.com
  dig @external-dns example.com

Verify each view has a complete, correct set of records (especially SOA and NS).
Check that unintended overlap does not leak internal names externally.
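One concrete leak test is to ask the external view for names and flag private addresses in the answers. The sketch below covers only IPv4 A records and the RFC 1918 ranges, so treat it as a partial check. The function names and the list of names to test are hypothetical.

```shell
# Hypothetical helpers: report external-view answers containing RFC 1918 addresses.
is_rfc1918() {
  case $1 in
    10.*|192.168.*|172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

find_private_leaks() {
  local server=$1 name ip
  shift
  for name in "$@"; do
    # Non-address lines (CNAME targets, timeout messages) simply fail the range test.
    for ip in $(dig +short +time=2 +tries=1 @"$server" "$name" A); do
      if is_rfc1918 "$ip"; then
        echo "LEAK: $name -> $ip via $server"
      fi
    done
  done
}
```

Usage: find_private_leaks external-dns www.example.com intranet.example.com. No output means none of the tested names returned a private IPv4 address.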

Reverse DNS (PTR) Troubleshooting

Reverse lookups use PTR records in in-addr.arpa (IPv4) and ip6.arpa (IPv6) zones.

Common issues:

Missing PTR records cause services (especially mail servers) to distrust your IP.
Reverse doesn’t match forward lookup.

Check:

# Forward lookup
dig mail.example.com A
# Reverse lookup
dig -x 203.0.113.10
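dig -x builds the reverse name for you: the IPv4 octets reversed under in-addr.arpa. The hypothetical helper below does the same so the name can be reused in other queries. It handles IPv4 only. IPv6 uses nibble-reversed names under ip6.arpa.

```shell
# Hypothetical helper: in-addr.arpa name for an IPv4 address (what `dig -x` queries).
ptr_name_v4() {
  printf '%s\n' "$1" | awk -F. 'NF == 4 { print $4 "." $3 "." $2 "." $1 ".in-addr.arpa." }'
}
```

Querying dig "$(ptr_name_v4 203.0.113.10)" PTR is equivalent to dig -x 203.0.113.10. Querying the same name with type SOA returns the enclosing reverse zone's SOA in the AUTHORITY section, which usually shows who operates that zone.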

For public IPs, reverse zones are usually controlled by the ISP or hosting provider; you often must configure PTRs via their portal, not your own DNS server.

Ensure:

The PTR points to the correct hostname.
The hostname’s A/AAAA points back to the same IP (forward-confirmed reverse mapping when required, e.g., for mail).
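The forward-confirmed check can be scripted for IPv4 as a sketch. The function name fcrdns_check is hypothetical. The check passes if at least one PTR name has an A record pointing back to the same address.

```shell
# Hypothetical helper: forward-confirmed reverse DNS for an IPv4 address.
fcrdns_check() {
  local ip=$1 names host
  names=$(dig +short -x "$ip")
  if [ -z "$names" ]; then
    echo "FAIL: no PTR record for $ip"
    return 1
  fi
  for host in $names; do
    # -x: whole-line match, -F: treat the address literally (dots are not wildcards)
    if dig +short "$host" A | grep -qxF "$ip"; then
      echo "OK: $ip -> $host -> $ip"
      return 0
    fi
  done
  echo "FAIL: PTR name(s) for $ip do not resolve back to it"
  return 1
}
```

Usage: fcrdns_check 203.0.113.10.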

Performance and Load-related DNS Problems

Measuring DNS latency

Use +stats and query times:

dig example.com +stats

Look at:

Query time: X msec
Compare between resolvers and from various locations.
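The query time line can be collected for several resolvers in one pass. The sketch below extracts the ";; Query time: N msec" line from dig's normal output. The helper names are hypothetical.

```shell
# Hypothetical helpers: extract dig's query time and compare resolvers.
query_time_ms() {
  sed -n 's/^;; Query time: \([0-9]*\) msec.*/\1/p'
}

compare_latency() {
  local name=$1 server ms
  shift
  for server in "$@"; do
    ms=$(dig +time=2 +tries=1 @"$server" "$name" | query_time_ms)
    if [ -n "$ms" ]; then
      printf '%-16s %s ms\n' "$server" "$ms"
    else
      printf '%-16s no reply\n' "$server"
    fi
  done
}
```

Usage: compare_latency example.com 8.8.8.8 1.1.1.1 ns1.example.com. Run it twice, because the first query to a recursive resolver may be a cache miss and the second usually is not.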

If authoritative DNS is slow:

Check server load, I/O, network congestion.
Verify no external amplification/DoS is overwhelming port 53.

Amplification and rate limiting

DNS can be abused for amplification attacks. Countermeasures (like Response Rate Limiting, RRL) can introduce:

Intermittent failures for heavy clients
Truncated responses

Look for:

Logs indicating RRL limiting
Large responses that trigger truncation (tc flag set)

If responses are frequently truncated:

Ensure TCP port 53 is open as fallback.
Consider reducing response size (e.g., minimal responses, DNSSEC considerations).
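To find out which answers are at risk, it helps to measure the full response size. The sketch below queries over TCP so truncation does not hide the size. It compares the result with 1232 bytes, the EDNS buffer size recommended by DNS Flag Day 2020. The helper names are hypothetical, and adding +dnssec is a choice made here because signed answers are the usual culprits.

```shell
# Hypothetical helpers: measure a full response and compare it with 1232 bytes.
msg_size() {
  sed -n 's/^;; MSG SIZE[[:space:]]*rcvd: \([0-9]*\).*/\1/p'
}

check_response_size() {
  local name=$1 type=${2:-A} server=${3:-} size
  # Optional third argument: server to query instead of the default resolver.
  size=$(dig +tcp +dnssec +time=2 +tries=1 ${server:+"@$server"} "$name" "$type" | msg_size)
  if [ -z "$size" ]; then
    echo "no reply"
    return 2
  elif [ "$size" -gt 1232 ]; then
    echo "$size bytes: over 1232, expect truncation (tc) over UDP and a TCP retry"
    return 1
  fi
  echo "$size bytes: fits in a 1232-byte UDP response"
}
```

Usage: check_response_size example.com DNSKEY ns1.example.com.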

Firewall and Network-related DNS Issues

Firewalls blocking or altering DNS

Symptoms:

Timeouts contacting certain DNS servers
UDP works but TCP fails (or vice versa)
Large responses fail due to MTU/fragmentation issues

Checks:

Host firewalls (iptables, nftables, firewalld, ufw) allow:

UDP/53
TCP/53

Network firewalls and routers do not:

Drop fragmented UDP packets
Perform unwanted protocol inspection

Use packet captures:

tcpdump -ni any port 53

Analyze:

Are queries leaving, and responses returning?
Are responses larger than MTU, getting fragmented or dropped?

ISP / network interception

Some ISPs:

Redirect port 53 to their own resolvers.
Block external DNS over port 53.

If you see unexpected answers from a different resolver than you queried:

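Don't rely on the SERVER line in dig output alone: it shows where dig sent the query, and a transparent interceptor replies from that same address. Compare the answers with those from a resolver reached over a different path.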
Try alternative transports (DNS over TLS/HTTPS, if appropriate).
Use non-standard port temporarily (for testing only; not a general solution).
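One common interception test is to query an address where no DNS server should exist. The sketch below uses 192.0.2.1 from TEST-NET-1 (RFC 5737). Any reply at all means something on the path is answering port-53 traffic itself. The function name is hypothetical. No reply does not prove the path is clean, because some interceptors only redirect traffic aimed at well-known resolvers.

```shell
# Hypothetical probe: a reply from a documentation-only address implies interception.
probe_interception() {
  if dig +time=2 +tries=1 @192.0.2.1 example.com A 2>/dev/null | grep -q 'status:'; then
    echo "reply received: DNS to 192.0.2.1 is being intercepted"
    return 1
  fi
  echo "no reply: no sign of transparent interception on this path"
}
```

Usage: probe_interception.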

Systematic Checklist for DNS Troubleshooting

When facing a DNS issue for a specific name:

From affected client:

dig name
Check status, answer, and server used.
If failure, try dig @8.8.8.8 name.

From a different network or public DNS checker:

Confirm whether the issue is global or local.

From an admin host or the DNS server itself:

Query authoritative server:

dig @ns1.example.com name ANY

If authoritative fails → check zone files, logs, and listening ports.

Check delegation:

dig +trace name
Verify NS and glue are correct and reachable.

Check caching / TTL:

Compare authoritative vs cached answers.
Note TTL and whether caches still serve old data.

If DNSSEC is enabled:

dig +dnssec name
Verify RRSIG, DNSKEY, DS; look for SERVFAIL on validating resolvers.

Confirm network path:

Ensure UDP/TCP 53 is allowed end‑to‑end.
Use tcpdump/ss/firewall tools to verify.

Document findings and changes:

Record serial numbers, parent-zone updates (NS, glue, DS at the registrar), and TTLs.
Log when changes were made, to correlate with cache expiry.
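A change log is more useful if each entry also records when caches holding the old data should have expired (change time plus old TTL). The hypothetical helper below appends one timestamped line per change. It writes the expiry as a Unix epoch value because date-arithmetic flags differ between GNU and BSD date.

```shell
# Hypothetical helper: append a timestamped DNS change record to a log file.
log_dns_change() {
  local logfile=$1 zone=$2 serial=$3 old_ttl=$4 now
  shift 4
  now=$(date -u +%s)
  # Remaining arguments form a free-text note.
  printf '%s zone=%s serial=%s old_ttl=%s caches_expire_by_epoch=%s note=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$zone" "$serial" "$old_ttl" \
    "$((now + old_ttl))" "$*" >> "$logfile"
}
```

Usage: log_dns_change dns-changes.log example.com 2024010102 3600 moved www to new host.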

Following this structured approach helps isolate whether the problem is client configuration, local resolver, authoritative configuration, delegation, DNSSEC, or underlying network transport.
