Cloudflare's 1.1.1.1 DNS Outage (14 July 2025) - 62 Minutes of DNS Downtime
On Monday 14 July 2025, Cloudflare's 1.1.1.1 public DNS resolver experienced a complete global outage lasting 62 minutes. From 21:52 UTC to 22:54 UTC, millions of users, devices and applications that depend on 1.1.1.1 for DNS resolution were unable to access the internet.
Not slow. Not degraded. Completely unavailable.
For anyone relying solely on 1.1.1.1 as their DNS resolver, the practical impact was total. Without DNS resolution, browsers cannot find websites, APIs cannot reach their endpoints, email cannot be delivered, and applications cannot function. DNS is the foundation that everything else sits on, and when it disappears, everything built on top of it stops working.
This is the third significant Cloudflare incident we have covered this year, following the R2 object storage outage in February and the R2 credential rotation failure in March. Each incident has a different root cause, but the recurring theme is the same: relying on a single provider for critical infrastructure creates risk.
What is 1.1.1.1 and why does it matter
Cloudflare launched its 1.1.1.1 public DNS resolver in 2018. It quickly became one of the most widely used DNS services globally, second only to Google's 8.8.8.8. The service handles hundreds of billions of DNS queries daily, used by individual devices, corporate networks, routers and IoT devices worldwide.
DNS - the Domain Name System - translates human-readable domain names like theapiguys.co.uk into the IP addresses that computers use to communicate. Every time you visit a website, make an API call, send an email or connect to any internet service, a DNS lookup happens first. If your configured DNS resolver is unavailable, none of those actions can complete.
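To make that lookup step concrete, here is a minimal sketch (in Python; the function name is our own) of the wire-format query that a resolver such as 1.1.1.1 receives when a client asks for an A record, following RFC 1035:

```python
import struct

def build_dns_query(domain: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet for an A record (RFC 1035)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, AN/NS/AR counts = 0
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed with its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.split(".")
    ) + b"\x00"
    # Question section: QNAME, QTYPE=A (1), QCLASS=IN (1)
    question = qname + struct.pack(">HH", 1, 1)
    return header + question

packet = build_dns_query("theapiguys.co.uk")
```

In practice your operating system's stub resolver builds a packet like this and sends it to the configured resolver over UDP port 53. If that resolver is unreachable, as 1.1.1.1 was during the outage, there is simply nowhere for the packet to go.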
Cloudflare serves 1.1.1.1 using anycast routing, meaning the same IP address is announced from data centres around the world. Your DNS query is automatically routed to the nearest available data centre, providing both performance and redundancy. However, this approach has a critical characteristic: if the routes to those addresses are withdrawn globally, the service disappears everywhere simultaneously.
What actually happened
The root cause traces back to 6 June 2025 - over five weeks before the outage itself. On that date, Cloudflare engineers were preparing configuration for a new Data Localisation Suite (DLS) service. During this work, a configuration error was introduced that inadvertently associated the 1.1.1.1 resolver's IP prefixes with the new, non-production DLS service.
Because the DLS service was not yet active, this error had no immediate effect. No traffic was impacted, no alerts fired, and nobody noticed. The misconfiguration sat dormant in production for five weeks.
On 14 July at 21:48 UTC, engineers made a separate, routine change to attach a test location to the same DLS service. This triggered a global refresh of Cloudflare's network configuration. During that refresh, the system processed the dormant error from June and concluded that the 1.1.1.1 resolver prefixes should only be served from a single offline test location rather than from every data centre worldwide.
The result was immediate. Cloudflare's network withdrew the BGP route announcements for 1.1.1.1 from every production data centre globally. Within four minutes, DNS queries to 1.1.1.1 were failing worldwide.
The affected address ranges were extensive - not just the well-known 1.1.1.1 address, but also 1.0.0.1, the associated IPv6 addresses (2606:4700:4700::1111 and 2606:4700:4700::1001), and several other prefixes used by Cloudflare's resolver infrastructure. Notably, DNS-over-HTTPS traffic through cloudflare-dns.com was largely unaffected, as it operates under a different routing model.
The timeline
The sequence of events unfolded rapidly:
- 21:48 UTC - Engineers apply the configuration change. The dormant error causes 1.1.1.1 route withdrawals globally.
- 21:52 UTC - DNS queries to 1.1.1.1 begin failing worldwide. Users lose the ability to resolve domain names.
- 21:54 UTC - An unrelated complication: Tata Communications India (AS4755) begins advertising the now-withdrawn 1.1.1.0/24 prefix, creating what appeared to be a BGP hijack. Cloudflare later confirmed this was not the cause of the outage but an opportunistic or accidental event that became visible once Cloudflare withdrew its own routes.
- 22:01 UTC - Cloudflare's internal monitoring detects the impact and a formal incident is declared.
- 22:20 UTC - Engineers revert the configuration, re-announcing the withdrawn BGP prefixes. Approximately 77% of traffic is restored immediately.
- 22:54 UTC - The remaining 23% of edge servers complete their IP binding reconfiguration. Full service is restored.
A coincidental BGP hijack
One of the more interesting aspects of this incident was the simultaneous BGP hijack by Tata Communications. When Cloudflare withdrew its route announcements for 1.1.1.0/24, Tata's network began advertising that prefix - meaning DNS traffic intended for Cloudflare was briefly being routed to Tata's infrastructure instead.
Cloudflare has stated this was not the cause of the outage and was likely related to the historical use of the 1.1.1.0/24 prefix, which was used for testing and internal purposes by various networks long before Cloudflare adopted it in 2018. The hijack resolved itself at 22:19 UTC when Tata withdrew the announcement.
Regardless of intent, this illustrates an important point about BGP security: when legitimate route announcements are withdrawn, the door opens for other networks - intentionally or not - to fill the gap. It is a reminder that the internet's core routing infrastructure still relies heavily on trust.
The dormant configuration problem
What makes this incident particularly instructive is the five-week gap between cause and effect. The actual error was introduced on 6 June. It had no impact at the time. It passed through whatever review and deployment processes Cloudflare had in place. And then five weeks later, an unrelated change activated it.
This class of failure - a dormant misconfiguration triggered by a later, seemingly safe change - is one of the hardest to defend against. Traditional testing catches errors that have immediate effects. Code review catches errors that are visible in the change being reviewed. But an error that only manifests when combined with a future change is invisible to both.
Cloudflare's post-mortem acknowledged this directly, noting that their legacy addressing system - which relies on hard-coded lists of data centre locations for BGP announcements - lacks progressive deployment methodology. They are migrating to a newer system, but during the transition period, both old and new systems coexist, creating exactly the kind of complexity that leads to these issues.
Why fallback DNS is not optional
The most immediate practical lesson from this outage is simple: never rely on a single DNS resolver.
Organisations that had configured multiple DNS providers experienced little to no disruption during this incident. If your primary resolver is 1.1.1.1 but your secondary is Google's 8.8.8.8 or Quad9's 9.9.9.9, your devices and applications will automatically fall back to the working resolver when the primary becomes unavailable.
This applies at every level of your infrastructure:
- Server configurations - Your /etc/resolv.conf (or equivalent) should list at least two resolvers from different providers. On Ubuntu servers, this typically means configuring systemd-resolved with both a primary and fallback DNS.
- Router and network equipment - Your office and data centre routers should have multiple upstream DNS resolvers configured. Many consumer and enterprise routers default to a single provider.
- Application-level DNS - If your application makes DNS lookups programmatically (as many API integrations do), ensure your DNS client library supports fallback resolvers.
- Container and orchestration platforms - Docker, Kubernetes and similar platforms have their own DNS resolution chains. Verify that these are configured with redundancy, not just pointing to a single upstream resolver.
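As a sketch of the server-level configuration, a systemd-resolved setup that mixes providers might look like this (exact file location and defaults vary by distribution, so treat this as illustrative):

```ini
# /etc/systemd/resolved.conf - illustrative sketch, adjust for your distro
[Resolve]
# Primary resolvers from two different providers
DNS=1.1.1.1 8.8.8.8
# Used only if every DNS= entry is unreachable
FallbackDNS=9.9.9.9 8.8.4.4
```

With this layout, an outage of any single provider leaves at least one working resolver in the primary list, and the fallback list covers the unlikely case that both primaries fail at once.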
For Laravel and API developers
If you are running Laravel applications (as we do), DNS failures can manifest in ways that are not immediately obvious. Queued jobs that make external API calls will fail and retry. Scheduled tasks that depend on external services will error out. Webhook deliveries will be unable to resolve destination hosts. Cache drivers that connect to external services will lose connectivity.
Some practical steps for your Laravel applications:
- Handle DNS failures gracefully - Wrap external API calls in appropriate try-catch blocks with meaningful error handling. A DNS failure should not crash your application; it should trigger a retry with backoff or queue the operation for later.
- Monitor your queue workers - If your queue workers are making external HTTP requests, a DNS outage will cause a spike in failed jobs. Ensure you have alerting on queue failure rates so you know quickly when something is wrong.
- Check your server DNS configuration - If you use Laravel Forge or similar deployment tools, verify what DNS resolvers are configured on your servers. Many cloud providers set their own resolvers by default, which provides some resilience, but it is worth confirming.
- Test your error handling - Deliberately configure an invalid DNS resolver on a staging server and observe how your application behaves. The results are often surprising.
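The first step above, retrying with backoff rather than crashing, can be sketched language-agnostically. Shown here in Python for brevity; in a Laravel application the equivalent would be a try-catch around your HTTP client call with a queued retry. The function name and parameters are our own, not part of any framework:

```python
import socket
import time

def call_with_dns_retry(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn, retrying DNS resolution failures with exponential backoff.

    socket.gaierror is what Python raises when a hostname cannot be
    resolved - the failure mode seen during the 1.1.1.1 outage.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except socket.gaierror:
            if attempt == attempts - 1:
                raise  # retries exhausted: surface the failure to the caller
            # Back off: 0.5s, 1s, 2s, ... before the next attempt
            sleep(base_delay * (2 ** attempt))
```

The key design choice is that a DNS failure becomes a delayed retry rather than an unhandled exception, so a transient resolver outage degrades your application instead of crashing it.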
The bigger picture - provider concentration risk
This is the third major Cloudflare incident we have covered in 2025. The February R2 outage was caused by a human error in admin tooling. The March R2 outage was caused by a missing deployment flag during credential rotation. This July incident was caused by a dormant configuration error in BGP route management.
Three different root causes, three different services, but one consistent lesson: concentration risk is real. If your DNS, CDN, DDoS protection, object storage and edge computing all run through a single provider, a bad day for that provider becomes a very bad day for you.
We are not suggesting you abandon Cloudflare. Their transparency through detailed post-mortems is commendable, and their services remain excellent. But building resilient infrastructure means planning for the failure of any individual component, including the providers you trust most.
Practical checklist
Based on this incident and the previous Cloudflare outages this year, here is what we recommend reviewing:
- Audit your DNS resolver configuration across all servers, containers, network equipment and developer machines. Ensure at least two resolvers from different providers are configured.
- Document your DNS dependencies - Do you know every place in your infrastructure where a DNS resolver is configured? Routers, servers, containers, CI/CD pipelines, monitoring systems - all of them need working DNS.
- Consider DNS-over-HTTPS for critical services - DoH traffic through cloudflare-dns.com was largely unaffected during this outage because it uses domain-based routing rather than direct IP address routing. For critical services, DoH or DoT can provide an additional layer of resilience.
- Monitor DNS resolution as a metric - Most monitoring stacks check whether services respond but do not monitor whether DNS resolution itself is working. Add DNS resolution checks to your monitoring.
- Review your provider concentration - List every service you consume from each infrastructure provider. If one provider appears more than three times, consider whether you have adequate fallback options.
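The monitoring point above can be sketched as a small health check. This Python version (function name ours) takes an injectable resolver so it can be exercised in tests without live DNS; in production you would call it against a hostname you control:

```python
import socket

def dns_resolves(hostname, resolve=socket.getaddrinfo):
    """Return True if the hostname resolves to at least one address."""
    try:
        # getaddrinfo raises socket.gaierror when resolution fails -
        # exactly the symptom a resolver outage produces
        return len(resolve(hostname, None)) > 0
    except socket.gaierror:
        return False
```

Wiring a check like this into your monitoring stack, separate from your normal service checks, distinguishes "the service is down" from "DNS itself is down", which matters when deciding how to respond.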
Moving forward
Cloudflare has committed to replacing their legacy addressing system, implementing configuration validation that detects service overlap, adding canarying and progressive deployment for routing changes, and building rapid rollback capabilities. These are the right steps.
But the responsibility for resilience does not sit solely with your providers. Your infrastructure needs to be designed so that when any single provider has a bad day - and they all do eventually - your applications keep running.
If you would like help reviewing your infrastructure's DNS configuration, resilience planning, or general architecture, get in touch with our team. We are always happy to talk through these challenges.
