Twitter has revealed that yesterday’s outage was caused by two data centre systems failing at roughly the same time.
"Data centers are designed to be redundant: when one system fails (as everything does at one time or another), a parallel system takes over," said Mazen Rawashdeh, VP of engineering. "What was noteworthy about today’s outage was the coincidental failure of two parallel systems at nearly the same time."
Describing the incident as an "infrastructural double-whammy", Rawashdeh said Twitter is now "investing aggressively in our systems to avoid this situation in the future".
The outage began at around 4:30pm UK time and lasted for almost two hours. Instead of the usual ‘fail whale’ image, visitors to Twitter.com were met with an error message that said "Twitter is currently down for <%=reason %>. We expect to be back in <%=deadline %>".
In June, a similar outage was attributed to a ‘cascading bug’, described by Rawashdeh as "a bug with an effect that isn’t confined to a particular software element, but rather its effect ‘cascades’ into other elements as well".