During a regular upgrade of our control plane, several controller nodes became unresponsive. This caused routing issues that surfaced as intermittent HTTP 503 errors returned to some callers. Our monitoring systems alerted our Operations Engineering Team to the anomaly and, as a result, the affected cluster, located in our US Central region (Chicago), was immediately removed from production rotation. The cluster was then restored to full health by provisioning new cloud resources using our IaC (infrastructure as code) systems. Following this, the cluster was confirmed healthy by our automated monitoring and was subsequently reintroduced into active rotation.
Posted Dec 30, 2024 - 22:32 UTC
This incident affected: International Street [Address] API (US East 1, US West 1, US Central 1), US Street [Address] API (US East 1, US West 1, US Central 1), US Extract API (US East 1, US West 1, US Central 1), US ZIP Code API (US East 1, US West 1, US Central 1), Account Management Portal, and Public Website.