July 3, 2025

Heathrow shutdown: Resilience by design — or by assumption?

The system didn’t fail because something unexpected happened. It failed because what was expected wasn’t managed.

Aerial image of SGT3 in flames, captured by London Fire Brigade drone at 02:22 on 21 March 2025. Source: NESO – North Hyde Review Final Report.

Based on the NESO–North Hyde Review Final Report, this article offers a reflection on what the incident reveals about infrastructure governance, risk culture, and resilience. What follows is not a technical breakdown but a systemic reading of a failure that was neither sudden nor unpredictable.


The myth of resilient infrastructure design

We often speak of critical infrastructure as if it were inherently robust — engineered, regulated, protected. But when you look closely at how the UK electricity system actually works, a different picture emerges: one of technical complexity layered over institutional fragmentation.

The transmission-distribution hierarchy — high-voltage networks akin to motorways, low-voltage circuits resembling local roads — is built for flow, not friction. Substations act as crossroads. Transformers as gatekeepers. In theory, the design allows for redundancy. It should absorb faults, redirect power, and avoid cascading failures.

But theory doesn’t protect us. Governance does. Culture does. Follow-through does.

At North Hyde, a critical substation serving Heathrow Airport, a Category 1 fault risk was recorded in 2018: moisture ingress in SGT3’s bushing. Known. Documented. Unresolved. Maintenance was deferred. The fire suppression system, inoperative since 2022, remained unfixed.

When SGT3 failed in March 2025, the system responded as designed. Until it didn’t. The fire spread. Redundancy collapsed. And Heathrow went dark.

This wasn’t a failure of design. It was a failure to govern risk.


When redundancy fails: Design flaws in critical infrastructure systems

North Hyde housed three supergrid transformers. Two in service. One in reserve. On paper, this looked like resilience. In practice, it was brittle.

At 23:21 on 20 March 2025, SGT3 caught fire. SGT2, the reserve unit, took up the load. Minutes later, SGT1 tripped, and because it shared a transmission circuit with SGT2, the backup was lost as well. The entire node collapsed.

71,655 customers lost power. Heathrow was paralysed. The illusion of redundancy shattered.
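
The shared-circuit dependency can be made concrete with a small sketch. It is an illustration only, not a model of the real substation: the circuit labels "A", "B" and "C" are invented, and the rule that a tripped unit de-energises everything on its circuit is a simplifying assumption. The only detail taken from the account above is that SGT1 and SGT2 sat on the same transmission circuit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transformer:
    name: str
    circuit: str  # label of the upstream circuit (labels are invented)


def surviving_units(units, tripped):
    """Names of units still energised after the given units trip.

    Simplifying assumption: a trip takes the unit's whole circuit out
    of service, so every unit on that circuit is lost with it.
    """
    dead_circuits = {u.circuit for u in units if u.name in tripped}
    return [u.name for u in units if u.circuit not in dead_circuits]


# As described in the article: SGT1 and SGT2 share a circuit.
as_built = [
    Transformer("SGT1", "A"),
    Transformer("SGT2", "A"),
    Transformer("SGT3", "B"),
]

# Hypothetical alternative: every unit on its own circuit.
independent = [
    Transformer("SGT1", "A"),
    Transformer("SGT2", "C"),
    Transformer("SGT3", "B"),
]

print(surviving_units(as_built, {"SGT3"}))             # ['SGT1', 'SGT2']
print(surviving_units(as_built, {"SGT3", "SGT1"}))     # []
print(surviving_units(independent, {"SGT3", "SGT1"}))  # ['SGT2']
```

Counted by transformers, the site had a spare. Counted by independent circuits, once SGT3 was gone it had none.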

Even Heathrow’s internal network, with three separate supply points, proved inflexible. Switching required manual reconfiguration. There was no auto-switching. No cross-load sharing. The airport wasn’t fragile. It was rigid.

And the fire response? Crews arrived quickly but waited six hours for full access. The site wasn’t designed for coordinated emergency intervention. Documentation was delayed. Protocols were unclear.

This wasn’t an isolated accident. It was a systemic failure — tolerated, layered into infrastructure, and triggered by the absence of integrated risk thinking.


When infrastructure warnings are ignored: How risk becomes normalised

Some failures surprise. This one didn’t.

The Category 1 moisture reading from 2018 should have triggered urgent intervention. It didn’t. Internal policies existed. So did monitoring tools. But enforcement was inconsistent, and action was delayed.

A 2020 policy update clarified procedures. But no retrospective review was ordered. In 2024, scheduled maintenance was postponed. The risk signal had been there for six years. The governance to act on it had not.

What failed wasn’t the technology. It was the institutional will to connect insight with intervention.

Resilience isn’t just detecting faults. It’s acting on them. A system that flags risk but doesn’t respond isn’t resilient. It’s performative.
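
One way to picture the gap between detection and response is to track how long a flag stays open. The sketch below is hypothetical: the `RiskFlag` record and the `MAX_OPEN_YEARS` limits are invented for illustration and do not reflect any operator's actual policy. Only the asset name, the category, and the years 2018 and 2024 come from the account above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RiskFlag:
    asset: str
    category: int                        # 1 = most severe
    raised_year: int
    resolved_year: Optional[int] = None  # None while the flag is open


# Hypothetical limits: years a flag may stay open before escalation.
MAX_OPEN_YEARS = {1: 0, 2: 1, 3: 3}


def years_open(flag, as_of_year):
    """Years between raising the flag and its resolution (or as_of_year)."""
    end = flag.resolved_year if flag.resolved_year is not None else as_of_year
    return end - flag.raised_year


def needs_escalation(flag, as_of_year):
    """True if the flag is unresolved and older than its category allows."""
    if flag.resolved_year is not None:
        return False
    return years_open(flag, as_of_year) > MAX_OPEN_YEARS[flag.category]


sgt3 = RiskFlag(asset="SGT3", category=1, raised_year=2018)
print(years_open(sgt3, 2024))        # 6
print(needs_escalation(sgt3, 2024))  # True
```

The check itself is trivial. The hard part is making someone answerable when it returns True.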


Compliance without strategy: A hidden risk in critical infrastructure

From a distance, the system held. No voltage deviation. No frequency anomaly. No national alarms.

Technically, the UK grid passed the test. But that test — what we chose to measure — may have been too narrow.

National Grid complied with the Security and Quality of Supply Standard (SQSS). SSEN followed Engineering Recommendation P2. Rules were respected. Risk was not managed.

Each actor followed its incident plan. But there was no shared risk dashboard. No unified emergency governance. No central authority.

The legal framework dates back to 1974 and 1989. Last updated in 2002. Built for a different era. In today’s interconnected world, this patchwork of legacy regulation cannot match the complexity of modern risk.

Resilience demands more than compliance. It demands accountability, coordination, and standards that reflect real-world consequences — not just megawatt thresholds.


The limits of airport resilience: When planning isn’t enough

Heathrow was not unprepared. It had three power feeds. On-site engineers. Standby generation.

But when one feed failed, the others couldn’t take over. Switching was manual. Response took hours. Engineers worked through the night to restore operations. They did their job. But the system had set them up to fail.

Risk scenarios had been modelled. A 33kV ring to enable internal switching had been proposed. But not implemented. The risk was known. Just not prioritised.

Worse: national energy actors didn’t know how Heathrow’s internal network functioned. There is no regulatory requirement to map demand-side criticality.

We assume everyone knows their part. But no one is assigned to connect the whole.


When infrastructure doesn’t know its users: A hidden vulnerability in critical systems

Perhaps the most revealing truth: energy operators do not know which of their customers are critical.

Critical National Infrastructure (CNI) is designated by the Cabinet Office. But energy networks are blind to CNI clients. There is no requirement to design for their resilience.

In theory, CNI sectors coordinate their own protections. In practice, this leads to fragmentation. Some sectors, like water, have strict safeguards. Others, like aviation or data, may have none.

The UK is building a CNI Knowledge Base to map these interdependencies. It remains unfinished. And voluntary.

Electricity licence conditions prohibit discrimination between users. Fair in theory. But in a crisis, this means a data centre is restored on the same terms as a domestic meter.

Equality in connection. Uniformity in fragility.


Beyond the blackout: The human consequences of power loss

When North Hyde failed, more than 71,000 customers were affected. Heathrow shut down. Three data centres isolated themselves to avoid collapse. Over 170 people were evacuated. But the technical recovery was swift. No deaths. No prolonged outages.

Still, the cracks were not electrical. They were human and institutional.

Critical actors received no formal communication. Data centres relied on informal networks. Transport for London had to reconstruct the incident from scratch. Hospitals saw pharmacy operations disrupted. And no one could say with certainty who qualified as a priority user.

Everyone responded. But the experience was of a system that doesn’t think like one.


Final reflection: We don’t just need resilience, we need coherence

We often imagine resilience as a technical attribute: switches, sensors, generators. But the North Hyde incident reveals a deeper truth: Resilience is relational.
It depends on how infrastructures speak to each other.
How risk is communicated.
How systems are governed.

We didn’t just witness an electrical failure. We witnessed a governance failure. A design blind to interdependence. A policy environment that values compliance over adaptability.

Heathrow didn’t fail. It did exactly what it was designed to do. And that’s the problem.

Until we design infrastructure not just to operate, but to withstand,
until we map not just circuits, but consequences,
until we govern risk as a shared responsibility,

we will keep assuming resilience,
until the next blackout proves we never had it.