NZ Herald
  • Home
  • Latest news
  • Herald NOW
  • Video
  • New Zealand
  • Sport
  • World
  • Business
  • Entertainment
  • Podcasts
  • Quizzes
  • Opinion
  • Lifestyle
  • Travel
  • Viva
  • Weather

Subscriptions

  • Herald Premium
  • Viva Premium
  • The Listener
  • BusinessDesk

Sections

  • Latest news
  • New Zealand
    • All New Zealand
    • Crime
    • Politics
    • Education
    • Open Justice
    • Scam Update
  • Herald NOW
  • On The Up
  • World
    • All World
    • Australia
    • Asia
    • UK
    • United States
    • Middle East
    • Europe
    • Pacific
  • Business
    • All Business
    • MarketsSharesCurrencyCommoditiesStock TakesCrypto
    • Markets with Madison
    • Media Insider
    • Business analysis
    • Personal financeKiwiSaverInterest ratesTaxInvestment
    • EconomyInflationGDPOfficial cash rateEmployment
    • Small business
    • Business reportsMood of the BoardroomProject AucklandSustainable business and financeCapital markets reportAgribusiness reportInfrastructure reportDynamic business
    • Deloitte Top 200 Awards
    • CompaniesAged CareAgribusinessAirlinesBanking and financeConstructionEnergyFreight and logisticsHealthcareManufacturingMedia and MarketingRetailTelecommunicationsTourism
  • Opinion
    • All Opinion
    • Analysis
    • Editorials
    • Business analysis
    • Premium opinion
    • Letters to the editor
  • Politics
  • Sport
    • All Sport
    • OlympicsParalympics
    • RugbySuper RugbyNPCAll BlacksBlack FernsRugby sevensSchool rugby
    • CricketBlack CapsWhite Ferns
    • Racing
    • NetballSilver Ferns
    • LeagueWarriorsNRL
    • FootballWellington PhoenixAuckland FCAll WhitesFootball FernsEnglish Premier League
    • GolfNZ Open
    • MotorsportFormula 1
    • Boxing
    • UFC
    • BasketballNBABreakersTall BlacksTall Ferns
    • Tennis
    • Cycling
    • Athletics
    • SailingAmerica's CupSailGP
    • Rowing
  • Lifestyle
    • All Lifestyle
    • Viva - Food, fashion & beauty
    • Society Insider
    • Royals
    • Sex & relationships
    • Food & drinkRecipesRecipe collectionsRestaurant reviewsRestaurant bookings
    • Health & wellbeing
    • Fashion & beauty
    • Pets & animals
    • The Selection - Shop the trendsShop fashionShop beautyShop entertainmentShop giftsShop home & living
    • Milford's Investing Place
  • Entertainment
    • All Entertainment
    • TV
    • MoviesMovie reviews
    • MusicMusic reviews
    • BooksBook reviews
    • Culture
    • ReviewsBook reviewsMovie reviewsMusic reviewsRestaurant reviews
  • Travel
    • All Travel
    • News
    • New ZealandNorthlandAucklandWellingtonCanterburyOtago / QueenstownNelson-TasmanBest NZ beaches
    • International travelAustraliaPacific IslandsEuropeUKUSAAfricaAsia
    • Rail holidays
    • Cruise holidays
    • Ski holidays
    • Luxury travel
    • Adventure travel
  • Kāhu Māori news
  • Environment
    • All Environment
    • Our Green Future
  • Talanoa Pacific news
  • Property
    • All Property
    • Property Insider
    • Interest rates tracker
    • Residential property listings
    • Commercial property listings
  • Health
  • Technology
    • All Technology
    • AI
    • Social media
  • Rural
    • All Rural
    • Dairy farming
    • Sheep & beef farming
    • Horticulture
    • Animal health
    • Rural business
    • Rural life
    • Rural technology
    • Opinion
    • Audio & podcasts
  • Weather forecasts
    • All Weather forecasts
    • Kaitaia
    • Whangārei
    • Dargaville
    • Auckland
    • Thames
    • Tauranga
    • Hamilton
    • Whakatāne
    • Rotorua
    • Tokoroa
    • Te Kuiti
    • Taumaranui
    • Taupō
    • Gisborne
    • New Plymouth
    • Napier
    • Hastings
    • Dannevirke
    • Whanganui
    • Palmerston North
    • Levin
    • Paraparaumu
    • Masterton
    • Wellington
    • Motueka
    • Nelson
    • Blenheim
    • Westport
    • Reefton
    • Kaikōura
    • Greymouth
    • Hokitika
    • Christchurch
    • Ashburton
    • Timaru
    • Wānaka
    • Oamaru
    • Queenstown
    • Dunedin
    • Gore
    • Invercargill
  • Meet the journalists
  • Promotions & competitions
  • OneRoof property listings
  • Driven car news

Puzzles & Quizzes

  • Puzzles
    • All Puzzles
    • Sudoku
    • Code Cracker
    • Crosswords
    • Cryptic crossword
    • Wordsearch
  • Quizzes
    • All Quizzes
    • Morning quiz
    • Afternoon quiz
    • Sports quiz

Regions

  • Northland
    • All Northland
    • Far North
    • Kaitaia
    • Kerikeri
    • Kaikohe
    • Bay of Islands
    • Whangarei
    • Dargaville
    • Kaipara
    • Mangawhai
  • Auckland
  • Waikato
    • All Waikato
    • Hamilton
    • Coromandel & Hauraki
    • Matamata & Piako
    • Cambridge
    • Te Awamutu
    • Tokoroa & South Waikato
    • Taupō & Tūrangi
  • Bay of Plenty
    • All Bay of Plenty
    • Katikati
    • Tauranga
    • Mount Maunganui
    • Pāpāmoa
    • Te Puke
    • Whakatāne
  • Rotorua
  • Hawke's Bay
    • All Hawke's Bay
    • Napier
    • Hastings
    • Havelock North
    • Central Hawke's Bay
    • Wairoa
  • Taranaki
    • All Taranaki
    • Stratford
    • New Plymouth
    • Hāwera
  • Manawatū - Whanganui
    • All Manawatū - Whanganui
    • Whanganui
    • Palmerston North
    • Manawatū
    • Tararua
    • Horowhenua
  • Wellington
    • All Wellington
    • Kapiti
    • Wairarapa
    • Upper Hutt
    • Lower Hutt
  • Nelson & Tasman
    • All Nelson & Tasman
    • Motueka
    • Nelson
    • Tasman
  • Marlborough
  • West Coast
  • Canterbury
    • All Canterbury
    • Kaikōura
    • Christchurch
    • Ashburton
    • Timaru
  • Otago
    • All Otago
    • Oamaru
    • Dunedin
    • Balclutha
    • Alexandra
    • Queenstown
    • Wanaka
  • Southland
    • All Southland
    • Invercargill
    • Gore
    • Stewart Island
  • Gisborne

Media

  • Video
    • All Video
    • NZ news video
    • Herald NOW
    • Business news video
    • Politics news video
    • Sport video
    • World news video
    • Lifestyle video
    • Entertainment video
    • Travel video
    • Markets with Madison
    • Kea Kids news
  • Podcasts
    • All Podcasts
    • The Front Page
    • On the Tiles
    • Ask me Anything
    • The Little Things
  • Cartoons
  • Photo galleries
  • Today's Paper - E-editions
  • Photo sales
  • Classifieds

NZME Network

  • Advertise with NZME
  • OneRoof
  • Driven Car Guide
  • BusinessDesk
  • Newstalk ZB
  • Sunlive
  • ZM
  • The Hits
  • Coast
  • Radio Hauraki
  • The Alternative Commentary Collective
  • Gold
  • Flava
  • iHeart Radio
  • Hokonui
  • Radio Wanaka
  • iHeartCountry New Zealand
  • Restaurant Hub
  • NZME Events

SubscribeSign In
Advertisement
Advertise with NZME.
Home / Business

Facebook explains goof-up that caused massive global outage

Chris Keall
By Chris Keall
Technology Editor/Senior Business Writer·NZ Herald·
5 Oct, 2021 10:26 PM7 mins to read

Subscribe to listen

Access to Herald Premium articles require a Premium subscription. Subscribe now to listen.
Already a subscriber?  Sign in here

Listening to articles is free for open-access content—explore other articles or learn more about text-to-speech.
‌
Save

    Share this article

Inage / Facebook

Inage / Facebook

Facebook has confirmed reports that its massive outage yesterday was self-inflicted rather than the result of any cyber attack.

The social network says a mistake made during "routine maintenance" on its servers led to the catastrophic outage, which saw Facebook, and Facebook-owned Instagram and WhatsApp offline for around half a day.

A command was issued that was supposed to assess the capacity on the backbone network that Facebook uses to link its giant data centres.

But it "unintentionally took down all the connections in our backbone network, effectively disconnecting Facebook data centers globally," Facebook vice-president for infrastructure Santosh Janardhan said in a blog posted this morning NZT.

Advertisement
Advertise with NZME.
Advertisement
Advertise with NZME.

"Our systems are designed to audit commands like these to prevent mistakes like this, but a bug in that audit tool prevented it from properly stopping the command."

Safeguards backfire - or at least slow recovery efforts

And ironically, although no cyber attack was involved, safeguards put in place to prevent one slowed recovery efforts.

"We've done extensive work hardening our systems to prevent unauthorised access, and it was interesting to see how that hardening slowed us down as we tried to recover from an outage caused not by malicious activity, but an error of our own making," Janardhan said.

"I believe a tradeoff like this is worth it - greatly increased day-to-day security vs. a slower recovery from a hopefully rare event like this. From here on out, our job is to strengthen our testing, drills, and overall resilience to make sure events like this happen as rarely as possible."

Advertisement
Advertise with NZME.

The tight lockdown safeguards not only slowed recovery efforts but, according to a New York Times report, meant rank-and-file staff could not even use their swipe cards to enter Facebook offices.

Janardhan's full post:

"Now that our platforms are up and running as usual after yesterday's outage, I thought it would be worth sharing a little more detail on what happened and why — and most importantly, how we're learning from it.

This outage was triggered by the system that manages our global backbone network capacity. The backbone is the network Facebook has built to connect all our computing facilities together, which consists of tens of thousands of miles of fiber-optic cables crossing the globe and linking all our data centers.

Discover more

Business

Call for change as Facebook NZ files return but no financials

08 Oct 04:42 AM
Business

Money talks for Facebook but NZ Super Fund isn't speaking it

29 Jun 05:58 AM
Business

Gone in minutes, out for hours: Outage shakes Facebook

05 Oct 01:40 AM
Business

NZ Privacy Commissioner lands top UK role after Boris Johnson sign-off

26 Aug 05:14 AM

Those data centers come in different forms. Some are massive buildings that house millions of machines that store data and run the heavy computational loads that keep our platforms running, and others are smaller facilities that connect our backbone network to the broader internet and the people using our platforms.

When you open one of our apps and load up your feed or messages, the app's request for data travels from your device to the nearest facility, which then communicates directly over our backbone network to a larger data center. That's where the information needed by your app gets retrieved and processed, and sent back over the network to your phone.

The data traffic between all these computing facilities is managed by routers, which figure out where to send all the incoming and outgoing data. And in the extensive day-to-day work of maintaining this infrastructure, our engineers often need to take part of the backbone offline for maintenance — perhaps repairing a fibre line, adding more capacity, or updating the software on the router itself.

This was the source of yesterday's outage. During one of these routine maintenance jobs, a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network, effectively disconnecting Facebook data centers globally. Our systems are designed to audit commands like these to prevent mistakes like this, but a bug in that audit tool prevented it from properly stopping the command.

This change caused a complete disconnection of our server connections between our data centers and the internet. And that total loss of connection caused a second issue that made things worse.

One of the jobs performed by our smaller facilities is to respond to DNS queries. DNS is the address book of the internet, enabling the simple web names we type into browsers to be translated into specific server IP addresses. Those translation queries are answered by our authoritative name servers that occupy well-known IP addresses themselves, which in turn are advertised to the rest of the internet via another protocol called the border gateway protocol (BGP).

Advertisement
Advertise with NZME.

To ensure reliable operation, our DNS servers disable those BGP advertisements if they themselves can not speak to our data centers, since this is an indication of an unhealthy network connection. In the recent outage the entire backbone was removed from operation, making these locations declare themselves unhealthy and withdraw those BGP advertisements. The end result was that our DNS servers became unreachable even though they were still operational. This made it impossible for the rest of the internet to find our servers.

All of this happened very fast. And as our engineers worked to figure out what was happening and why, they faced two large obstacles: first, it was not possible to access our data centers through our normal means because their networks were down, and second, the total loss of DNS broke many of the internal tools we'd normally use to investigate and resolve outages like this.

Our primary and out-of-band network access was down, so we sent engineers onsite to the data centers to have them debug the issue and restart the systems. But this took time, because these facilities are designed with high levels of physical and system security in mind. They're hard to get into, and once you're inside, the hardware and routers are designed to be difficult to modify even when you have physical access to them. So it took extra time to activate the secure access protocols needed to get people onsite and able to work on the servers. Only then could we confirm the issue and bring our backbone back online.

Once our backbone network connectivity was restored across our data center regions, everything came back up with it. But the problem was not over — we knew that flipping our services back on all at once could potentially cause a new round of crashes due to a surge in traffic. Individual data centers were reporting dips in power usage in the range of tens of megawatts, and suddenly reversing such a dip in power consumption could put everything from electrical systems to caches at risk.

Helpfully, this is an event we're well prepared for thanks to the "storm" drills we've been running for a long time now. In a storm exercise, we simulate a major system failure by taking a service, data center, or entire region offline, stress testing all the infrastructure and software involved. Experience from these drills gave us the confidence and experience to bring things back online and carefully manage the increasing loads. In the end, our services came back up relatively quickly without any further systemwide failures. And while we've never previously run a storm that simulated our global backbone being taken offline, we'll certainly be looking for ways to simulate events like this moving forward.

Every failure like this is an opportunity to learn and get better, and there's plenty for us to learn from this one. After every issue, small and large, we do an extensive review process to understand how we can make our systems more resilient. That process is already underway.

Advertisement
Advertise with NZME.

We've done extensive work hardening our systems to prevent unauthorized access, and it was interesting to see how that hardening slowed us down as we tried to recover from an outage caused not by malicious activity, but an error of our own making. I believe a tradeoff like this is worth it — greatly increased day-to-day security vs. a slower recovery from a hopefully rare event like this. From here on out, our job is to strengthen our testing, drills, and overall resilience to make sure events like this happen as rarely as possible."

Save

    Share this article

Latest from Business

Premium
Media InsiderUpdated

David Seymour v John Campbell: Act leader turns camera on broadcaster

22 Jun 06:15 AM
Business

$175k in costs awarded in $10 million Auckland mansion stoush

22 Jun 05:32 AM
Premium
Opinion

Liam Dann: The upside to this painfully slow economic recovery

22 Jun 05:30 AM

Audi offers a sporty spin on city driving with the A3 Sportback and S3 Sportback

sponsored
Advertisement
Advertise with NZME.

Latest from Business

Premium
David Seymour v John Campbell: Act leader turns camera on broadcaster

David Seymour v John Campbell: Act leader turns camera on broadcaster

22 Jun 06:15 AM

Campbell asks if interview is 'weaponised'; Act says it's giving viewers the full picture.

$175k in costs awarded in $10 million Auckland mansion stoush

$175k in costs awarded in $10 million Auckland mansion stoush

22 Jun 05:32 AM
Premium
Liam Dann: The upside to this painfully slow economic recovery

Liam Dann: The upside to this painfully slow economic recovery

22 Jun 05:30 AM
Premium
Property manager fined $3500 for breaching healthy homes standards

Property manager fined $3500 for breaching healthy homes standards

22 Jun 03:00 AM
Gold demand soars amid global turmoil
sponsored

Gold demand soars amid global turmoil

NZ Herald
  • About NZ Herald
  • Meet the journalists
  • Newsletters
  • Classifieds
  • Help & support
  • Contact us
  • House rules
  • Privacy Policy
  • Terms of use
  • Competition terms & conditions
  • Our use of AI
Subscriber Services
  • NZ Herald e-editions
  • Daily puzzles & quizzes
  • Manage your digital subscription
  • Manage your print subscription
  • Subscribe to the NZ Herald newspaper
  • Subscribe to Herald Premium
  • Gift a subscription
  • Subscriber FAQs
  • Subscription terms & conditions
  • Promotions and subscriber benefits
NZME Network
  • The New Zealand Herald
  • The Northland Age
  • The Northern Advocate
  • Waikato Herald
  • Bay of Plenty Times
  • Rotorua Daily Post
  • Hawke's Bay Today
  • Whanganui Chronicle
  • Viva
  • NZ Listener
  • Newstalk ZB
  • BusinessDesk
  • OneRoof
  • Driven Car Guide
  • iHeart Radio
  • Restaurant Hub
NZME
  • About NZME
  • NZME careers
  • Advertise with NZME
  • Digital self-service advertising
  • Book your classified ad
  • Photo sales
  • NZME Events
  • © Copyright 2025 NZME Publishing Limited
TOP