The Cloudflare Down Error Explained, and Downdetector

Do you remember browsing the web on November 18, 2025, and finding that websites which use Cloudflare’s services were suddenly unreachable? Around 11:20 UTC that day, Cloudflare’s network began to experience a serious internal issue that triggered a wave of HTTP 5xx errors across the Internet. From a user’s vantage point, it looked as though countless sites were “down” when, in fact, Cloudflare simply could not deliver those sites’ content to you.

Given the scale of the disruption, you might assume it was a cyber attack, right? On the surface, the symptom profile did look almost identical to a massive DDoS attack. But the real cause was internal: a change to database permission settings that had been made earlier that same day.

So what actually went wrong? Cloudflare runs a Bot Management system that relies on a configuration file containing numerous data points (referred to as “features”). The machine-learning model uses these features to decide whether incoming traffic originates from a bot or a human. The feature file is kept up to date in near real time and distributed every few minutes to every machine across Cloudflare’s global network.
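The real format of that file is not described here, so the sketch below is purely hypothetical; it only illustrates the idea of a small, frequently refreshed list of named signals that every proxy reloads.

```python
# Purely hypothetical illustration: the real feature file format is Cloudflare-internal.
# Conceptually it is a versioned list of named signals that the bot-scoring model reads.
import json

feature_file = json.dumps({
    "version": "2025-11-18T11:00:00Z",
    "features": [
        {"name": "ua_entropy", "weight": 0.42},       # invented example signal
        {"name": "tls_fingerprint", "weight": 0.31},  # invented example signal
        {"name": "req_rate_per_ip", "weight": 0.27},  # invented example signal
    ],
})

parsed = json.loads(feature_file)
print(f"{len(parsed['features'])} features loaded")  # reloaded on every machine every few minutes
```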

At 11:05 UTC, a change Cloudflare made to the permission settings on a ClickHouse database caused the query that builds the feature file to return duplicate entries, and the resulting files were unexpectedly inflated in size. This became a problem once the oversized files were distributed to the servers: the Bot Management software could not load them and failed.
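To make that failure mode concrete, here is a small Python simulation. The database, table, and column names are invented; the point is that after the permission change the same table metadata becomes visible under two databases, so a query that never filters on the database name silently returns every row twice.

```python
# Simulated metadata rows, in the spirit of a database's system catalog.
# After the permission change, the same table is visible in two databases,
# so an unfiltered query sees every column twice. Names are illustrative only.
metadata = [
    {"database": "default", "table": "traffic_features", "column": "ua_entropy"},
    {"database": "default", "table": "traffic_features", "column": "tls_fingerprint"},
    # Newly visible duplicates from the underlying storage database:
    {"database": "storage", "table": "traffic_features", "column": "ua_entropy"},
    {"database": "storage", "table": "traffic_features", "column": "tls_fingerprint"},
]

# Before: the query implicitly saw only one database, so this "worked".
unfiltered = [row["column"] for row in metadata if row["table"] == "traffic_features"]
print(len(unfiltered))  # 4 -> the feature list has doubled in size

# Fix: filter explicitly on the database the pipeline actually intends to read.
filtered = [
    row["column"]
    for row in metadata
    if row["table"] == "traffic_features" and row["database"] == "default"
]
print(len(filtered))  # 2 -> the expected feature count
```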

Because the Bot Management module is a key component of Cloudflare’s core traffic-handling system, its failure brought down the proxy engine around it. That is why visitors to websites protected by Cloudflare saw error pages instead of the usual content.
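As a rough, invented illustration of why a single failing module on the critical path turns into user-facing errors, consider a toy request handler: when the bot-scoring step cannot run, the only thing the proxy can return is a 5xx page.

```python
# Toy handler, invented for illustration: a failure in a critical-path module
# surfaces to the visitor as a 5xx error page rather than the site's content.
def score_request(features):
    if features is None:
        raise RuntimeError("bot management: feature file failed to load")
    return 0.1  # pretend bot score

def handle_request(features, path):
    try:
        score = score_request(features)
        return 200, f"origin content for {path} (bot score {score})"
    except RuntimeError:
        return 500, "Internal Server Error"  # what visitors actually saw

print(handle_request(None, "/"))              # (500, 'Internal Server Error')
print(handle_request({"loaded": True}, "/"))  # (200, 'origin content for / (bot score 0.1)')
```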

Why the Outage Came and Went

You may have noticed that the outage came and went, with errors fluctuating in frequency. That pattern came from the feature file being regenerated every five minutes: at first only some of the database nodes produced the bad file, so depending on which node ran the job, the network flipped between working and failing states. Eventually every node was generating the incorrect file, and the failure became total and sustained until the problem was addressed.
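The on-again, off-again behavior is easy to simulate. In this sketch, the node fraction and seed are assumptions for illustration; only the five-minute cycle comes from the description above. Each regeneration happens to run on either a node that produces a good file or one that produces a bad file, and the network’s health follows whichever file came out last.

```python
import random

random.seed(18)  # fixed seed so the example output is repeatable

BAD_NODE_FRACTION = 0.5  # assumption: roughly half the nodes produced the bad file at first

def regenerate_feature_file():
    """Every five minutes the job runs on some node; that node determines if the file is good or bad."""
    return "bad (duplicated features)" if random.random() < BAD_NODE_FRACTION else "good"

for cycle in range(6):  # about half an hour of five-minute cycles
    file_state = regenerate_feature_file()
    status = "5xx errors across the network" if file_state.startswith("bad") else "traffic flows normally"
    print(f"t+{cycle * 5:>2} min: feature file is {file_state} -> {status}")
```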

By around 14:00 UTC, Cloudflare engineers had identified the Bot Management configuration file as the culprit. At 14:24 UTC they paused all file updates, automated and manual, and replaced the bad file with a known-good earlier version. Once the proxy was restarted, traffic began flowing through the servers normally again.

At approximately 14:30 UTC, Cloudflare reported the main impact as resolved. By 17:06 UTC, all remaining affected services, including Workers KV, Access, and Turnstile, were back to completely normal operation.

Which Services Were Affected

Because Cloudflare sits between visitors and the websites it protects, the outage meant a loss of performance and security services for many of its customers. As a user, you may have faced:

  • Websites that were unavailable and returned various HTTP 5xx errors.
  • Turnstile CAPTCHA failing to load, causing login failures on many sites.
  • Login problems in the Cloudflare dashboard due to Turnstile failures.
  • Increased response times as a result of overloaded monitoring systems.
  • Access controls blocking users out of private applications.

Due to a separate, unrelated problem, Cloudflare’s status page also went offline, which reinforced the misconception that the incident was an attack.

What is Downdetector

Downdetector.com is a platform that monitors and reports outages and disruptions of websites and apps. The service collects user reports of problems such as failed logins, video playback errors, and unresponsive servers. When a large number of users report problems at once, Downdetector identifies the outage in real time and shows the spike on its graphs. It is a quick way to check whether you are the only one having trouble with a platform such as Instagram, YouTube, or a game server, or whether many others are affected and the service really is down. In short, it is a useful tool for gauging the scale of an outage and ruling out a problem on your own end.

Root Cause: A Small Database Change With Big Consequences

The root of the problem was how ClickHouse answered queries under the new permission model. The system that creates the feature file was not filtering for the correct database, and the new permissions let it see a duplicated copy of the schema, causing the file size to unexpectedly balloon. The Bot Management module enforces a hard cap of 200 features, and the corrupted file exceeded that limit, causing the module to fail.
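Here is a minimal sketch of that kind of hard limit, assuming the cap of 200 features described above; the loader function and error type are invented for illustration. A loader like this rejects the bloated file outright, and if nothing above it handles the failure, the whole module goes down.

```python
MAX_FEATURES = 200  # hard cap on the number of features, as described above

class FeatureFileError(Exception):
    """Raised when a feature file cannot be loaded safely."""

def load_feature_file(feature_names):
    # Illustrative loader: duplicated metadata roughly doubles the feature list,
    # which blows past the hard limit and makes the load fail.
    if len(feature_names) > MAX_FEATURES:
        raise FeatureFileError(
            f"{len(feature_names)} features exceed the limit of {MAX_FEATURES}"
        )
    return list(feature_names)

healthy = [f"feature_{i}" for i in range(120)]
corrupt = healthy + healthy  # duplicates push the count to 240

load_feature_file(healthy)  # loads fine
try:
    load_feature_file(corrupt)
except FeatureFileError as err:
    print(f"load failed: {err}")  # in production, this failure cascaded into 5xx responses
```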

The entire outage came down to a single assumption: that the metadata would always come from a single database.

Cloudflare’s Response and Future Fixes

Cloudflare acknowledged that this was its most serious outage since 2019, and that this level of failure is unacceptable given how much of the Internet depends on its network.

To prevent a recurrence, Cloudflare announced a series of improvements, including:

  • Additional safeguards when ingesting and processing internal configuration files.
  • More robust mechanisms to disable problematic features quickly.
  • Improved error handling in the core proxy systems.
  • Ensuring that debugging and error-reporting systems cannot consume excessive CPU during incidents.
  • A comprehensive review of all modules that rely on configuration files, to ensure that they fail safely (a sketch of that pattern follows below).
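The last point lends itself to a short sketch. One common way to “fail safely” (shown here as an assumption about the general pattern, not as Cloudflare’s actual implementation) is to validate each new configuration before adopting it and to keep serving the last known-good version when validation fails.

```python
MAX_FEATURES = 200  # same illustrative limit as in the earlier sketch

class ConfigStore:
    """Holds the last known-good feature list and rejects bad updates instead of crashing."""

    def __init__(self, initial_features):
        self.current = list(initial_features)

    def try_update(self, new_features):
        if len(new_features) > MAX_FEATURES:
            # Fail safely: refuse the update and keep serving the previous configuration.
            print(f"rejected update with {len(new_features)} features; keeping last good config")
            return False
        self.current = list(new_features)
        return True

store = ConfigStore([f"feature_{i}" for i in range(120)])
store.try_update([f"feature_{i}" for i in range(240)])  # rejected; traffic keeps flowing
print(len(store.current))  # still 120
```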