
Today, June 21, 2022, Cloudflare suffered an outage that affected traffic in 19 of our data centers. Unfortunately, these 19 locations handle a significant proportion of our global traffic. This outage was caused by a change that was part of a long-running project to increase resilience in our busiest locations. A change to the network configuration in those locations caused an outage which started at 06:27 UTC. At 06:58 UTC the first data center was brought back online and by 07:42 UTC all data centers were online and working correctly.

Depending on your location in the world you may have been unable to access websites and services that rely on Cloudflare. In other locations, Cloudflare continued to operate normally. This was our error and not the result of an attack or malicious activity.

Over the last 18 months, Cloudflare has been working to convert all of our busiest locations to a more flexible and resilient architecture. In this time, we’ve converted 19 of our data centers to this architecture, internally called Multi-Colo PoP (MCP): Amsterdam, Atlanta, Ashburn, Chicago, Frankfurt, London, Los Angeles, Madrid, Manchester, Miami, Milan, Mumbai, Newark, Osaka, São Paulo, San Jose, Singapore, Sydney, Tokyo.

A critical part of this new architecture, which is designed as a Clos network, is an added layer of routing that creates a mesh of connections. This mesh allows us to easily disable and enable parts of the internal network in a data center for maintenance or to deal with a problem. This layer is represented by the spines in the following diagram.

This new architecture has provided us with significant reliability improvements, as well as allowing us to run maintenance in these locations without disrupting customer traffic. As these locations also carry a significant proportion of the Cloudflare traffic, any problem here can have a very wide impact, and unfortunately, that’s what happened today.

In order to be reachable on the Internet, networks like Cloudflare make use of a protocol called BGP. As part of this protocol, operators define policies which decide which prefixes (a collection of adjacent IP addresses) are advertised to peers (the other networks they connect to), or accepted from peers.

These policies have individual components, which are evaluated sequentially. The end result is that any given prefix will either be advertised or not advertised. A change in policy can mean a previously advertised prefix is no longer advertised, known as being "withdrawn", and those IP addresses will no longer be reachable on the Internet.

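To make the idea of sequentially evaluated policy components concrete, here is a minimal Python sketch. It is not Cloudflare's configuration or any router's real policy language: the `Term` class, the `evaluate` and `advertised` helpers, the term names, the first-match-wins rule and the default of rejecting unmatched prefixes are all assumptions made for illustration, and the prefixes are reserved documentation ranges. It shows how each prefix ends up either advertised or not, and how a policy change can leave a previously advertised prefix withdrawn.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, ip_network
from typing import Callable, Iterable, List, Set


@dataclass
class Term:
    """One hypothetical policy component: a match test plus an action."""
    name: str
    matches: Callable[[IPv4Network], bool]
    action: str  # "accept" or "reject"


def evaluate(policy: List[Term], prefix: IPv4Network) -> bool:
    """Walk the terms in order; the first matching term decides the outcome."""
    for term in policy:
        if term.matches(prefix):
            return term.action == "accept"
    return False  # assumption: a prefix matched by no term is not advertised


def advertised(policy: List[Term], prefixes: Iterable[IPv4Network]) -> Set[IPv4Network]:
    """Return the subset of prefixes that the policy would advertise."""
    return {p for p in prefixes if evaluate(policy, p)}


# Documentation-only prefixes (RFC 5737), standing in for real ones.
SITE_A = ip_network("192.0.2.0/24")
SITE_B = ip_network("198.51.100.0/24")

old_policy = [
    Term("accept-site-a", lambda p: p == SITE_A, "accept"),
    Term("accept-site-b", lambda p: p == SITE_B, "accept"),
    Term("reject-rest", lambda p: True, "reject"),
]

# Same terms, different order: the catch-all reject now precedes accept-site-b.
new_policy = [
    Term("accept-site-a", lambda p: p == SITE_A, "accept"),
    Term("reject-rest", lambda p: True, "reject"),
    Term("accept-site-b", lambda p: p == SITE_B, "accept"),
]

before = advertised(old_policy, [SITE_A, SITE_B])
after = advertised(new_policy, [SITE_A, SITE_B])
withdrawn = before - after

print(sorted(str(p) for p in withdrawn))  # ['198.51.100.0/24']
```

Because evaluation stops at the first matching term, the order of the terms matters as much as their content. In this sketch the two policies contain identical terms, yet the second one no longer advertises `SITE_B`.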