A mistaken peering advertisement from a European network took Google Cloud’s europe-west1 region offline last week for around 70 minutes.
The slip-up happened when an unnamed network owner connected a new peering link to Google and, in the process, advertised reachability for far more destinations than the link could handle.
As Google explains in this post, most of the traffic lost as a result carried destination addresses in eastern Europe and the Middle East.
“The peer’s network signalled that it could route traffic to many more destinations than Google engineers had anticipated, and more than the link had capacity for. Google’s network responded accordingly by routing a large volume of traffic to the link. At 11:55, the link saturated and began dropping the majority of its traffic”, Google says.
That kind of error, Google’s report mentions, would usually be caught by automated safety checks, but “the automation was not operational due to an unrelated failure, and the link was brought online manually, so the automation’s safety checks did not occur”.
“To prevent recurrence of this issue, Google network engineers are changing procedure to disallow manual link activation”, the post notes.
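Google's post does not say what its automated safety checks actually test. Purely as an illustration of the general idea, a pre-activation check might compare what a peer advertises against what engineers expected and refuse to bring the link up on a mismatch. In the Python sketch below, the function name, its parameters and the prefix limit are all hypothetical, and the addresses are reserved documentation ranges, not anything from the incident.

```python
import ipaddress


def check_peer_advertisement(advertised, expected, max_prefixes):
    """Hypothetical pre-activation check (not Google's actual automation).

    Returns a list of reasons to refuse activation; an empty list
    means the advertisement looks like what was anticipated.
    """
    adv = [ipaddress.ip_network(p) for p in advertised]
    exp = [ipaddress.ip_network(p) for p in expected]
    problems = []

    # Guard 1: the peer announces more prefixes than the agreed limit.
    if len(adv) > max_prefixes:
        problems.append(
            f"{len(adv)} prefixes advertised, limit is {max_prefixes}"
        )

    # Guard 2: every advertised prefix must sit inside an expected one.
    # The version test comes first because subnet_of() raises TypeError
    # when asked to compare an IPv4 network with an IPv6 network.
    for net in adv:
        covered = any(
            net.version == e.version and net.subnet_of(e) for e in exp
        )
        if not covered:
            problems.append(f"unexpected prefix {net}")

    return problems


# Example: the peer was expected to announce two /24s only.
expected = ["192.0.2.0/24", "198.51.100.0/24"]
ok = check_peer_advertisement(["192.0.2.0/25"], expected, max_prefixes=10)
bad = check_peer_advertisement(
    ["192.0.2.0/25", "203.0.113.0/24"], expected, max_prefixes=10
)
```

Here `ok` is an empty list, while `bad` flags `203.0.113.0/24` as unexpected. A check along these lines only helps if nothing can bypass it, which is the gap Google says it is closing by disallowing manual link activation.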
Route announcement errors are a continuing problem on the Internet.