Root Cause Analysis of IP Address Failure
Incident Report for Smarty
One of our IP addresses became unresponsive. Our investigation showed that the IP was configured properly, but its connection to the load balancer it was assigned to was incomplete. The situation began when one of our nodes went offline and the IP was automatically reassigned to a different node (a common and normal occurrence). When the original node came back online, the IP was assigned back to it (also normal). Although the assignment itself executed correctly, the IP provider did not finish routing traffic to the IP (abnormal and unusual). Because we do not control IP routing at the provider level, we are implementing additional monitoring checks that will allow us to detect and mitigate this situation in the future.
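As an illustration only, a check of this kind could probe each public IP directly from outside the provider's network, since in this incident the assignment looked correct while traffic was not being routed. The sketch below is not our production monitoring; the function names, the port, the timeout, and the example addresses (taken from the reserved documentation range 203.0.113.0/24) are all assumptions.

```python
import socket
from typing import Callable, Iterable, List


def is_reachable(ip: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable addresses.
        return False


def find_unreachable(
    ips: Iterable[str],
    port: int = 443,
    timeout: float = 3.0,
    probe: Callable[[str, int, float], bool] = is_reachable,
) -> List[str]:
    """Return the IPs that fail the probe, preserving input order."""
    return [ip for ip in ips if not probe(ip, port, timeout)]


if __name__ == "__main__":
    # Hypothetical addresses; replace with the IPs assigned to the load balancers.
    failing = find_unreachable(["203.0.113.10", "203.0.113.11"])
    if failing:
        print("ALERT: unreachable IPs:", ", ".join(failing))
```

Probing each IP individually, instead of a hostname that resolves to several IPs, is what would surface a single address whose routing was left incomplete after a reassignment.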
Posted Jul 12, 2023 - 14:00 UTC