
DNS Not Resolving: Troubleshoot and Fix DNS Failures Step by Step

You type a domain name into your browser, terminal, or application and nothing happens. No page loads. No connection is made. The name just will not resolve. DNS failures are one of the most common and most frustrating issues in infrastructure, and they can come from a dozen different places. Let us walk through every cause and fix it systematically.

TL;DR: Test with nslookup domain.com 8.8.8.8. If it works with 8.8.8.8 but not your default DNS, your local resolver is the problem. If it fails everywhere, the DNS records themselves are wrong.

Step 1: Is It Your Machine or the Domain?

Before you start changing anything, you need to figure out whether the problem is on your side or on the domain side. This is the single most important step and will save you hours of chasing the wrong thing.

Open a terminal and run these three commands:

# Test with your default resolver
nslookup example.com

# Test with Google's public DNS
nslookup example.com 8.8.8.8

# Test with Cloudflare's public DNS
nslookup example.com 1.1.1.1

If the first command fails but the second and third succeed, your local resolver is the problem. Your machine is configured to use a DNS server that is either down, misconfigured, or blocking the query. We will fix that in Step 2 and Step 3.

If all three fail, the domain itself has a DNS problem. Skip ahead to Step 4 (propagation) or Step 6 (NXDOMAIN).
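The decision logic above can be wrapped in a small helper if you run this triage often. This is a sketch, not a standard tool: classify_dns_failure is a hypothetical function name, and its three arguments are the exit codes of the three nslookup commands (0 means the lookup succeeded).

```shell
# Hypothetical helper: classify a DNS failure from the exit codes of
# the three nslookup commands above (0 = success, non-zero = failure).
classify_dns_failure() {
  local_rc=$1; google_rc=$2; cloudflare_rc=$3
  if [ "$local_rc" -ne 0 ] && [ "$google_rc" -eq 0 ] && [ "$cloudflare_rc" -eq 0 ]; then
    echo "local resolver problem - see Steps 2 and 3"
  elif [ "$local_rc" -ne 0 ] && [ "$google_rc" -ne 0 ] && [ "$cloudflare_rc" -ne 0 ]; then
    echo "domain problem - see Steps 4 and 6"
  else
    echo "resolution healthy or intermittent"
  fi
}

# Example: local lookup failed, both public resolvers succeeded
classify_dns_failure 1 0 0
```

In practice you would feed it real exit codes, e.g. `nslookup example.com >/dev/null 2>&1; local_rc=$?`.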

You can also use dig for more detailed output:

# dig gives you the full response including TTL and flags
dig example.com A
dig example.com A @8.8.8.8

# host is the simplest tool - just gives you the answer
host example.com
host example.com 8.8.8.8

The dig output includes a status field in the header. NOERROR means the query succeeded. NXDOMAIN means the domain does not exist. SERVFAIL means the DNS server could not process the request. REFUSED means the server rejected your query. Each of these tells you something different about where the failure is happening.
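If you are scripting health checks, you can extract that status field with a one-line sed. The header line below is canned sample output so the parsing is reproducible offline; point the same pipeline at real dig output in practice.

```shell
# Parse the status out of a dig response header. The $header value is
# a canned sample; in practice use: dig example.com A | grep 'status:'
header=';; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 48512'
status=$(printf '%s\n' "$header" | sed -n 's/.*status: \([A-Z]*\).*/\1/p')
echo "$status"   # NXDOMAIN
```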

Step 2: Flush Your Local DNS Cache

Your operating system caches DNS responses so it does not have to look up the same domain repeatedly. This is normally a good thing, but it becomes a problem when a DNS record changes and your machine keeps using the old, cached answer. Flushing the cache forces your machine to do a fresh lookup.

macOS

sudo dscacheutil -flushcache
sudo killall -HUP mDNSResponder

Both commands are needed on modern macOS. The first clears the directory service cache, and the second sends a HUP signal to mDNSResponder, the daemon that handles DNS resolution, telling it to discard its cached entries.

Linux (systemd-resolved)

# Flush the cache
sudo systemd-resolve --flush-caches

# Verify it was flushed (cache size should be 0)
sudo systemd-resolve --statistics

On newer versions of Ubuntu and Fedora, you might need to use resolvectl instead:

sudo resolvectl flush-caches
resolvectl statistics

Windows

ipconfig /flushdns

You should see "Successfully flushed the DNS Resolver Cache." If flushing does not fix the issue, the problem is not in your local cache. Move on to the next step.

One thing people forget: browsers have their own DNS cache too. Chrome, for example, keeps a separate cache at chrome://net-internals/#dns. If you flushed the OS cache but the browser still shows the old result, clear the browser DNS cache as well or just test from the terminal instead.

Step 3: Check /etc/resolv.conf

On Linux systems, /etc/resolv.conf tells your machine which DNS servers to use. If this file is wrong, every DNS query from that machine will fail, regardless of whether the domain is perfectly healthy.

cat /etc/resolv.conf

A healthy resolv.conf looks something like this:

nameserver 8.8.8.8
nameserver 8.8.4.4
search example.com

Here are the common problems you will find:

  • Wrong nameserver IP: The nameserver line points to an IP that is not running a DNS service. This happens frequently when VMs are cloned or when DHCP leases change.
  • Missing nameserver entirely: If there is no nameserver line at all, your machine has nowhere to send DNS queries.
  • Pointing to 127.0.0.53: This is normal on systems running systemd-resolved. It is a local stub resolver that forwards queries. If resolution is failing, the problem is in the systemd-resolved configuration, not resolv.conf itself. Check /etc/systemd/resolved.conf for the upstream DNS settings.
  • Missing search domain: The search directive lets you resolve short hostnames. If you are trying to reach myservice instead of myservice.internal.company.com, you need the search domain configured.
  • File is immutable: Some systems (especially containers or cloud VMs) mark resolv.conf as immutable so it cannot be changed. Check with lsattr /etc/resolv.conf. If you see an i flag, remove it with sudo chattr -i /etc/resolv.conf before editing.
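The checks in that list are easy to automate. This is a sketch, not a standard tool: the function name and messages are mine, and it only covers the first three bullets.

```shell
# Sketch: flag the common resolv.conf problems described above.
# Pass the path to any resolv.conf-style file.
check_resolv_conf() {
  file=$1
  if ! grep -q '^nameserver' "$file"; then
    echo "no nameserver configured"
  elif grep -q '^nameserver 127\.0\.0\.53' "$file"; then
    echo "systemd-resolved stub - check /etc/systemd/resolved.conf"
  else
    echo "$(grep -c '^nameserver' "$file") nameserver(s) configured"
  fi
}

# Try it against a sample file (use /etc/resolv.conf on a real system)
printf 'nameserver 8.8.8.8\nnameserver 8.8.4.4\n' > /tmp/resolv.sample
check_resolv_conf /tmp/resolv.sample
```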

A quick fix to test whether resolv.conf is the problem:

# Temporarily override DNS to use Google
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf

If resolution starts working after that change, you know the original nameserver was the issue. Just be aware that NetworkManager, systemd-resolved, or DHCP may overwrite this file on the next network event. For a permanent fix, configure your DNS servers in the appropriate network manager.

Step 4: DNS Propagation

If you recently changed a DNS record and it is not resolving to the new value, you are probably dealing with DNS propagation delay. DNS is a globally distributed system, and changes do not take effect instantly everywhere.

When you update a DNS record, the old record continues to be served by caching resolvers worldwide until the TTL (Time To Live) expires. If your old record had a TTL of 3600 seconds (one hour), resolvers that cached it will keep serving the old value for up to one hour.

To check what the authoritative nameservers are returning right now:

# Find the authoritative nameservers
dig NS example.com

# Query the authoritative nameserver directly
dig example.com A @ns1.yourprovider.com

If the authoritative nameserver returns the correct record, the change is live. The delay you are seeing is just caching resolvers that have not expired their cached copy yet. You can verify worldwide propagation at whatsmydns.net, which queries DNS resolvers across dozens of locations.

Tips for avoiding propagation pain in the future:

  • Lower the TTL before making changes. At least 24 hours before a planned DNS change, drop the TTL to 60 or 300 seconds. This ensures that by the time you make the actual change, most resolvers have the short TTL cached and will refresh quickly.
  • Verify at the authoritative level first. If your authoritative nameserver has the correct record, propagation is just a matter of time.
  • Do not panic. Propagation can take anywhere from a few minutes to 48 hours depending on the old TTL. Most records propagate globally within an hour if the TTL was reasonable.
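The arithmetic behind "do not panic" is simple: the worst case equals the old record's TTL, because a resolver could have cached the old value one second before your change. A quick sketch of the calculation:

```shell
# Worst-case propagation window = the TTL of the *old* record.
old_ttl=86400   # the record's previous TTL, in seconds (example value)
hours=$((old_ttl / 3600))
minutes=$(( (old_ttl % 3600) / 60 ))
echo "worst case: ${hours}h ${minutes}m until every cache has expired the old value"
```

With an 86400-second old TTL, that prints a 24-hour window, which is why lowering the TTL a day in advance matters.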

Step 5: Firewall Blocking Port 53

DNS uses port 53 for both TCP and UDP. If a firewall is blocking outbound traffic on port 53, DNS queries will silently fail. This is a sneaky one because the symptoms look identical to a DNS server being down.

Linux (iptables)

# Check for rules matching port 53 (port matches appear as dpt:53)
sudo iptables -L OUTPUT -n | grep 'dpt:53'

# If blocked, insert ACCEPT rules ahead of any blocking rule
# (-I puts them at the top of the chain; -A would append after a DROP,
# where they would never be reached)
sudo iptables -I OUTPUT -p udp --dport 53 -j ACCEPT
sudo iptables -I OUTPUT -p tcp --dport 53 -j ACCEPT

Linux (ufw)

# Check UFW status
sudo ufw status verbose

# Allow DNS
sudo ufw allow out 53

AWS Security Groups

If you are running on AWS, check that your security group allows outbound traffic on port 53. The default security group allows all outbound traffic, but if you are using a custom, restrictive security group, DNS might be blocked. Go to EC2 > Security Groups > your SG > Outbound Rules and make sure UDP and TCP port 53 are allowed to 0.0.0.0/0 (or at least to your DNS resolver IPs).

A quick way to test if port 53 is being blocked:

# Try a TCP DNS query (easier to test than UDP)
dig example.com A @8.8.8.8 +tcp

# Or use netcat to test connectivity to port 53
nc -zv 8.8.8.8 53

If the TCP query times out but you can ping 8.8.8.8 successfully, a firewall is almost certainly blocking port 53.
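Putting those two probes together gives a quick yes/no script. This is a sketch: the messages are mine, the 2-second timeouts are arbitrary, and 8.8.8.8 is used only as a known-good target.

```shell
# Distinguish "DNS blocked" from "network down". dig exits non-zero
# when it gets no reply at all; ping then tells you whether the path
# to 8.8.8.8 is up. Tune the timeouts for your network.
if dig +time=2 +tries=1 example.com @8.8.8.8 >/dev/null 2>&1; then
  echo "dns ok"
elif ping -c 1 -W 2 8.8.8.8 >/dev/null 2>&1; then
  echo "network up but dns blocked - check firewall rules for port 53"
else
  echo "network down or host unreachable"
fi
```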

Step 6: NXDOMAIN - The Domain Does Not Exist

An NXDOMAIN response means the authoritative DNS server is explicitly saying "this domain does not exist." This is different from a timeout or SERVFAIL. The server received your query and answered it clearly: no such domain.

Common causes:

  • Domain expired: Check the domain registration status with a Whois lookup. Expired domains stop resolving immediately. Renewal is the only fix.
  • Deleted DNS record: Someone deleted the A record (or whatever record type you need) from the DNS zone. Log into your DNS provider and verify the record exists.
  • Typo in the domain name: This is more common than anyone wants to admit. Double check spelling, especially with subdomains. api.example.com is not the same as aip.example.com.
  • Wrong DNS zone: If you manage multiple domains, make sure you added the record to the right zone. Adding an A record for app.example.com in the example.org zone will not work.
  • Registrar DNS not configured: You bought a domain but never pointed the nameservers at your DNS provider, or you moved providers and forgot to update the nameserver delegation at the registrar.

To check from the terminal:

# Check if the domain exists at all
dig example.com SOA

# If you get NXDOMAIN, check whois
whois example.com | grep -i "expir"
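Whois output formats vary by registrar, so treat this as a sketch: it parses the common "Registry Expiry Date" line (shown here against a canned sample so it runs offline) and compares the date to today.

```shell
# Check whether an expiry date has passed. The $line value is canned
# sample output; in practice: line=$(whois example.com | grep -i 'Registry Expiry')
line='Registry Expiry Date: 2031-08-13T04:00:00Z'
expiry=$(printf '%s\n' "$line" | sed -n 's/.*: \([0-9][0-9-]*\)T.*/\1/p')
expiry_num=$(printf '%s' "$expiry" | tr -d '-')   # 2031-08-13 -> 20310813
today_num=$(date -u +%Y%m%d)
if [ "$expiry_num" -lt "$today_num" ]; then
  echo "domain expired on $expiry"
else
  echo "domain valid until $expiry"
fi
```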

Check Your DNS Records Instantly

Use SecureBin DNS Lookup to query A, AAAA, CNAME, MX, TXT, NS, and SOA records for any domain. Free, instant, no signup required.


Step 7: Docker DNS Issues

Docker containers have their own DNS resolution, and it does not always behave the way you expect. By default, Docker derives each container's /etc/resolv.conf from the host's. But if the host uses 127.0.0.53 (systemd-resolved), that loopback address would be unreachable from inside the container's network namespace, so Docker strips it out, and depending on the Docker version and network the container may be left with fallback defaults or no usable nameserver at all.

Symptoms: your host can resolve domains fine, but containers get timeouts or SERVFAIL on every DNS query.

Quick fix: specify DNS per container

docker run --dns 8.8.8.8 --dns 8.8.4.4 myimage

Permanent fix: configure Docker daemon DNS

Edit /etc/docker/daemon.json:

{
  "dns": ["8.8.8.8", "8.8.4.4"]
}

Then restart Docker:

sudo systemctl restart docker

This sets the default DNS for all containers so you do not have to pass --dns every time.
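A malformed daemon.json will stop the Docker daemon from starting, so it is worth validating before the restart. One way is to run it through jq, assuming jq is installed (shown here against an inline copy of the config rather than the real file):

```shell
# Validate the DNS settings parse as JSON and list the servers.
# On a real host, replace the echo with: jq -r '.dns[]' /etc/docker/daemon.json
echo '{"dns": ["8.8.8.8", "8.8.4.4"]}' | jq -r '.dns[]'
```

If jq prints an error instead of the two addresses, fix the JSON before restarting Docker.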

Docker Compose

In a Compose file, you can set DNS per service:

services:
  myapp:
    image: myimage
    dns:
      - 8.8.8.8
      - 8.8.4.4

Reaching the host from a container

If your container needs to reach a service running on the Docker host (like a local database), use host.docker.internal on Docker Desktop (Mac/Windows). On Linux, you need to add --add-host=host.docker.internal:host-gateway to your run command or use the equivalent in Compose:

services:
  myapp:
    image: myimage
    extra_hosts:
      - "host.docker.internal:host-gateway"

Step 8: Kubernetes CoreDNS Issues

DNS inside Kubernetes is handled by CoreDNS, which runs as a Deployment in the kube-system namespace. When CoreDNS is unhealthy or misconfigured, every pod in the cluster loses the ability to resolve both internal service names and external domains.

Check if CoreDNS is running

# Check CoreDNS pod status
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Check logs for errors
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50

If CoreDNS pods are in CrashLoopBackOff, check the logs for the specific error. Common causes include a malformed Corefile (ConfigMap coredns in kube-system), insufficient memory (CoreDNS defaults to very low limits), and upstream DNS being unreachable from the cluster network.

The ndots:5 problem

This one catches a lot of people. By default, Kubernetes sets ndots:5 in every pod's /etc/resolv.conf. This means that any domain name with fewer than 5 dots gets the search domains appended before the bare name is tried.

When a pod tries to resolve api.stripe.com (which has 2 dots, fewer than 5), Kubernetes will first try:

  1. api.stripe.com.mynamespace.svc.cluster.local
  2. api.stripe.com.svc.cluster.local
  3. api.stripe.com.cluster.local
  4. api.stripe.com. (the actual query)

That is three unnecessary DNS queries before the real one. This can cause slow resolution and, in some cases, failures if CoreDNS is overloaded. The fix is to either add a trailing dot to external domains in your application configuration (api.stripe.com.) or reduce ndots in your pod spec:

spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
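You can reproduce that expansion order offline. This sketch hardcodes the default Kubernetes search path for a namespace called mynamespace (a placeholder); a real pod gets its search path from its own /etc/resolv.conf.

```shell
# Reproduce ndots-driven search-domain expansion. 'mynamespace' is a
# placeholder; real pods read the search path from /etc/resolv.conf.
expand_query() {
  name=$1; ndots=$2
  dots=$(printf '%s' "$name" | tr -cd '.' | wc -c)
  if [ "$dots" -lt "$ndots" ]; then
    for suffix in mynamespace.svc.cluster.local svc.cluster.local cluster.local; do
      echo "$name.$suffix"
    done
  fi
  echo "$name."   # the bare (absolute) query is tried last
}

expand_query api.stripe.com 5
```

With ndots lowered to 2, the same call for api.stripe.com (2 dots, not fewer than 2) skips the search domains and goes straight to the bare query.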

Testing DNS from inside a pod

# Spin up a debug pod
kubectl run dns-test --image=busybox --restart=Never -- sleep 3600

# Test internal service resolution
kubectl exec dns-test -- nslookup kubernetes.default.svc.cluster.local

# Test external resolution
kubectl exec dns-test -- nslookup google.com

# Check the pod's resolv.conf
kubectl exec dns-test -- cat /etc/resolv.conf

# Clean up
kubectl delete pod dns-test

If internal names resolve but external names do not, CoreDNS cannot reach upstream DNS servers. Check your cluster networking and make sure CoreDNS pods can reach the internet (or your corporate DNS) on port 53.

Pro Tip: Use dig +trace to Find Exactly Where It Breaks

dig +trace domain.com walks the entire DNS resolution path from root servers down to the authoritative nameserver, showing you exactly where the chain breaks.

This command starts at the root DNS servers (.), finds the TLD servers (.com), then finds the authoritative nameservers for your domain, and shows the response at each level. If the trace stops at a particular level or returns an unexpected result, that is where your problem is.

dig +trace example.com

The output will look something like this (abbreviated):

.                  518400  IN  NS  a.root-servers.net.
com.               172800  IN  NS  a.gtld-servers.net.
example.com.       172800  IN  NS  ns1.provider.com.
example.com.       300     IN  A   93.184.216.34

If the trace stops at the TLD level and returns NXDOMAIN, the domain is not registered or the nameserver delegation is broken at the registrar. If the trace reaches your authoritative nameserver but returns an empty answer, the record is missing from your DNS zone.

DNS Record Types: Quick Reference

  • A: maps a domain to an IPv4 address. Common mistakes: wrong IP after a server migration, forgotten updates.
  • AAAA: maps a domain to an IPv6 address. Common mistakes: adding an AAAA record when your server does not support IPv6, causing dual-stack failures.
  • CNAME: aliases one domain to another. Common mistakes: setting a CNAME on the root domain (not allowed by the RFC), CNAME loops.
  • MX: names the mail server for the domain. Common mistakes: pointing MX at a CNAME instead of an A record, wrong priority order.
  • TXT: arbitrary text (SPF, DKIM, verification). Common mistakes: duplicate SPF records, missing quotes, exceeding the 255-character string limit.
  • NS: delegates the zone to nameservers. Common mistakes: NS records at the registrar not matching the NS records in the zone, stale nameservers.
  • SOA: Start of Authority, zone metadata. Common mistakes: serial number not incremented after zone changes, too-low refresh interval.

Use the SecureBin DNS Lookup tool to quickly query all record types for any domain.

Prevention: Stop DNS Failures Before They Start

Fixing DNS issues reactively is painful. Here is how to prevent them proactively:

Monitor DNS with external checks

Set up external DNS monitoring that queries your domain from multiple global locations on a schedule. Services like UptimeRobot, Pingdom, or even a simple cron job running dig and alerting on failure will give you early warning before your users notice.
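The cron-job version mentioned above can be as small as this sketch. DOMAIN and the alert action are placeholders; wire the failure branch into your real notification hook (a Slack webhook, PagerDuty, email, whatever you use).

```shell
# Minimal scheduled DNS check: alert when the domain stops resolving.
# Replace the echo with your real alerting command.
DOMAIN=example.com
if ! dig +short +time=3 +tries=2 "$DOMAIN" @8.8.8.8 | grep -q .; then
  echo "ALERT: DNS lookup failed for $DOMAIN"
fi
```

Run it from crontab every few minutes, e.g. `*/5 * * * * /usr/local/bin/dns-check.sh`, and query from more than one resolver if you want to catch resolver-specific failures.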

TTL strategy

Use a moderate TTL (300 to 3600 seconds) for records that might change. Critical records like MX should have higher TTLs (3600 to 86400) for reliability. Before any planned change, lower the TTL to 60 seconds at least 24 hours in advance so the old TTL has time to expire from caches worldwide.

Redundant nameservers

Never rely on a single nameserver. Use at least two, preferably from different providers or networks. If your primary DNS provider goes down and you only have their nameservers configured, your entire domain goes offline. Major DNS providers like Cloudflare, Route 53, and Google Cloud DNS all provide built-in redundancy, but having a secondary provider as a backup is even better for critical domains.

Automate record management

Manual DNS changes are error-prone. Use infrastructure-as-code tools like Terraform or Pulumi to manage DNS records. This gives you version control, peer review, and the ability to roll back changes if something goes wrong.

Common Mistakes That Cause DNS Failures

  1. Forgetting to update DNS after a server migration. You move your application to a new server with a new IP address but never update the A record. Traffic keeps going to the old server until it is decommissioned, then the site goes down.
  2. Setting a CNAME on the zone apex. The DNS specification does not allow a CNAME record on the root domain (example.com). Some providers offer workarounds (Cloudflare uses CNAME flattening, Route 53 uses ALIAS records), but a standard CNAME at the apex will break MX and other records.
  3. Letting the domain expire. Domain registrations expire silently if auto-renew fails (expired credit card, changed email). Set up multiple notification channels and calendar reminders for domain renewals.
  4. Editing resolv.conf on a managed system. If NetworkManager or systemd-resolved manages your DNS, editing resolv.conf directly will get overwritten. Configure DNS through the proper management tool instead.
  5. Ignoring TTL before making changes. Changing a record that had an 86400-second TTL means up to 24 hours of some users seeing the old value and others seeing the new one. Always lower the TTL first.
  6. Assuming DNS is instant. It is not. Even with low TTLs, propagation takes time. Plan DNS changes with enough lead time and verify propagation with tools like whatsmydns.net before declaring the change complete.

Frequently Asked Questions

Why does my DNS work with 8.8.8.8 but not my default resolver?

Your default DNS resolver (usually provided by your ISP or configured in /etc/resolv.conf) is either down, returning stale cached data, or blocking the domain. Switching to 8.8.8.8 (Google) or 1.1.1.1 (Cloudflare) bypasses your local resolver entirely. If those public resolvers work, the problem is your configured resolver, not the domain itself. Update your DNS settings to use a reliable public resolver, or contact your ISP if you suspect their DNS servers are having issues.

How long does DNS propagation take after changing records?

DNS propagation typically takes anywhere from a few minutes to 48 hours, depending on the TTL (Time To Live) value of the old record. If the previous TTL was 3600 (one hour), most resolvers worldwide will pick up the new record within one hour. If the TTL was 86400 (24 hours), it can take up to a full day. You can check propagation status using whatsmydns.net or by querying specific resolvers with dig. Pro tip: always lower the TTL before making changes so the switch happens faster.

My Kubernetes pods cannot resolve external domains. What do I check first?

First, verify CoreDNS pods are running: kubectl get pods -n kube-system -l k8s-app=kube-dns. If they are crashing, check logs with kubectl logs. If CoreDNS is healthy, check the pod's /etc/resolv.conf to make sure it points to the CoreDNS ClusterIP. Also check the ndots:5 setting, which causes Kubernetes to append search domains before trying the bare domain, leading to slow or failed lookups for external names. You can test from inside a pod with kubectl exec and nslookup to narrow down whether the issue is internal resolution, external resolution, or both.

Diagnose DNS and Security Issues

Run a DNS Lookup to query all record types, or use the Exposure Checker to scan for DNS, SSL, header, and file exposure issues across your domain.


Wrapping Up

DNS failures come in many flavors, but the troubleshooting process is always the same. Start by isolating whether the problem is your machine, your network, or the domain itself. Work through the steps: flush caches, check resolv.conf, verify propagation, look for firewall blocks, inspect the actual records, and if you are in Docker or Kubernetes, check the container-level DNS configuration.

The dig +trace command is your best friend for DNS debugging. It shows you the entire resolution chain from root servers to the authoritative answer, and it will point you directly at the layer where things go wrong.

Set up monitoring, keep your TTLs sensible, use redundant nameservers, and manage your records with infrastructure-as-code. Most DNS outages are preventable with a little planning.

Related tools: DNS Lookup, Exposure Checker, Whois Lookup, SSL Checker, SPF/DKIM/DMARC Checker, and 70+ more free tools.

Written by Usman Khan
DevOps Engineer | MSc Cybersecurity | CEH | AWS Solutions Architect

Usman has 10+ years of experience securing enterprise infrastructure, managing high-traffic servers, and building zero-knowledge security tools. Read more about the author.