Delta Down and Cloud Outages…What Happened?


On Aug. 8, 2016, Delta Air Lines experienced an extended six-hour outage, which some analysts and journalists falsely attributed to flaws in aging technology, i.e., TPF running on mainframe systems.

This false trope failed to identify the real culprits, which are exposures common to all IT solutions, in public and private clouds and even more so in non-mainframe environments. IT executives should not change their views of particular technologies, or their decisions, on the basis of bias and false reporting. Nor should business or IT executives assume that outages will become history if they move to the cloud.

Normally I do not write about a single system outage that affects one company, but the Delta downtime became a cause célèbre.

On Aug. 8 at 2:38 AM EDT, a power outage hit the Delta data center and caused a global system failure that lasted six hours before business could begin to return to normal. Tens of thousands of passengers were stranded around the world, and all systems (check-in, flight scheduling and departures, airport screens, reservations, websites, etc.) were affected by the meltdown.

Getting all parts of the airline back to normal after the glitch, and all passengers to their ultimate destinations, actually took days, as hundreds of Delta flights and flight crews were out of position after recovery.

So who’s to blame?

The True Story

The initial report that a power outage was the culprit was partially correct. According to Delta’s COO, “a critical power control module at our Technology Command Center malfunctioned, causing a surge to the transformer and a loss of power. When this happened, critical systems and network equipment didn’t switch over to backups. Other systems did. And now we’re seeing instability in these systems.”

What the executive did not mention was that it all started when Delta’s IT staff attempted to perform a routine switch to its backup generator, which resulted in a spike that caused a fire in an Automatic Transfer Switch (ATS).

Thus, what Delta and its users experienced was in effect a two-step failure. First, the ATS fire and subsequent shutdown meant that a server farm of about 500 servers also shut down abnormally.

Second, Delta’s staff then ran its standard failover process and attempted switchovers to the backup IT systems. But this process also failed, as critical systems and network equipment did not switch over to backup power. It was determined after the fact that about 300 of the approximately 7,000 data center components (of which the TPF mainframes were a very small part) had not been configured correctly for the available backup power and therefore remained offline.
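The kind of misconfiguration described above, where a fraction of components (here roughly 4%, or 300 of 7,000) has no path to backup power, is something an inventory audit can in principle catch before a failover is ever needed. The following is a minimal sketch of such an audit, not a description of Delta’s tooling; the `Component` record, the feed names, and the sample inventory are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One piece of data center equipment (hypothetical record layout)."""
    name: str
    power_feeds: set = field(default_factory=set)  # feeds this unit is wired to


def find_unprotected(components, backup_feeds):
    """Return the components that are not connected to any backup feed."""
    return [c for c in components if not (c.power_feeds & backup_feeds)]


# Illustrative inventory; names and feeds are made up for this example.
inventory = [
    Component("router-a", {"primary", "backup"}),
    Component("server-17", {"primary"}),   # primary only: lost on failover
    Component("switch-3", {"backup"}),
    Component("server-42"),                # no feed recorded at all
]

unprotected = find_unprotected(inventory, backup_feeds={"backup"})
print([c.name for c in unprotected])
```

Run against a real asset database on a schedule, a check like this would flag equipment that will stay dark when primary power is lost, which is cheaper to learn from a report than from an outage.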

Even before the details of the problem were made public, it was apparent that the power outage affected only Delta: two separate power grids feed the site, and one provider, the Atlanta utility Georgia Power, said it was not responsible for the failure and had not received notifications of any outages in its territory.

In fact, Delta’s passenger service system (PSS), like all major PSSs, is theoretically configured with no single point of failure, from the power supply through all equipment components and databases. But in Delta’s case, either redundancy was lacking or the backup ATS failed to kick in as expected.
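To make "no single point of failure" concrete, one can model the power path as a graph and ask whether removing any one element cuts a load off from every power source. The sketch below does this by brute force. The topology is a hypothetical illustration (two sources feeding one transfer switch), not Delta’s actual layout, and the function names are my own.

```python
from collections import deque


def is_powered(graph, sources, target, removed=None):
    """Breadth-first search from all live sources; True if target is reached."""
    queue = deque(s for s in sources if s != removed)
    seen = set(queue)
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(graph, sources, target):
    """List every element whose loss alone leaves target without power."""
    nodes = set(graph) | {n for outs in graph.values() for n in outs} | set(sources)
    return sorted(
        n for n in nodes
        if n != target and not is_powered(graph, sources, target, removed=n)
    )


# Hypothetical topology: edges point from supplier to consumer.
topology = {
    "utility": ["ats"],
    "generator": ["ats"],
    "ats": ["pdu-a", "pdu-b"],
    "pdu-a": ["server"],
    "pdu-b": ["server"],
}

print(single_points_of_failure(topology, ["utility", "generator"], "server"))
```

In this made-up layout the sources and distribution units are duplicated, yet everything still funnels through one transfer switch, so the switch is the only element reported. Adding a second, independently fed switch to the model makes the list come back empty.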
