During the last three days we have been observing a significant increase in CPU utilization of various application servers. As a result of this CPU utilization, we made some adjustments to the software running our proxy API servers to better watch for traffic causing the CPU spikes.
As part of this remediation effort, at 2023-11-05 00:54 UTC, a new version of our proxy API with this new behavior was deployed to a single cluster. After the “canary” release appeared to be functioning correctly according to available metrics, we rolled the change out to a second cluster roughly 15 minutes later. Finally, roughly 21 hours later, at 2023-11-05 21:45 UTC nearly, we rolled out the change to the remaining third cluster.
At 2023-11-06 01:40 UTC, we received notice from approximately four different customers of some intermittent issues wherein an HTTP request to our US Street API (https://us-street.api.smarty.com) would return an HTTP 400 response status. Within five minutes after that notice, at 2023-11-06 01:45 UTC, we rolled back the above changes to a working version of the proxy API across our entire fleet.
Since that time, the underlying bug within the proxy API introduced and originally deployed at 2023-11-05 00:54 UTC was resolved and a new version of the software that is unaffected by this issue was deployed to a single cluster today at 2023-11-06 16:50 UTC.
Additional tests and monitoring have been put in place to proactively watch for similar kinds of issues in the future.
Posted Nov 06, 2023 - 21:39 UTC
This incident affected: US Street Address API (us-east, us-central, us-west).