
All Flights in US Grounded for a Few Hours After FAA Computer System Glitch

Shreyas1

Summary

This morning the FAA grounded all flights in the US after a glitch caused the NOTAM system to fail. Although the system is back up (as of 9AM EST), it seems that a total of 4,300 flights into or out of the US were delayed. There is no evidence of a cyberattack at this point.

 

Quotes

Quote

The FAA posted an advisory notice early Wednesday that notes that the United States NOTAM (Notice to Air Missions) system “failed” but said just before 9AM ET that “normal air traffic operations are resuming gradually.”

NOTAM is a critical system that keeps pilots and other flight personnel informed of the status of airports across the country, Reuters reports. It can offer information on runway closures, bird hazards, and other obstacles.

Quote

A NOTAM is a notice to personnel containing important safety information about potential facility outages and hazards that could affect the flight. The information in a NOTAM is unclassified and is not known far enough in advance to be publicized any other way, according to a 2021 PowerPoint presentation from the FAA on the history of NOTAMs.

The construction of a NOTAM includes a specialized NOTAM number, an affected location, a keyword and the start time of the activity that could affect safety.
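As a rough illustration of the four parts listed in the quote above, here is a minimal Python sketch of a NOTAM-like record. The class name, field names, number format, and example values are all assumptions made for illustration; this is not the FAA's actual data format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Notam:
    """Hypothetical record holding the four parts named in the quote."""
    number: str           # specialized NOTAM number (format here is made up)
    location: str         # affected location, e.g. an airport identifier
    keyword: str          # keyword describing the subject, e.g. "RWY"
    start_time: datetime  # when the activity affecting safety begins

    def summary(self) -> str:
        # One-line, human-readable form; not an official NOTAM layout.
        start = self.start_time.strftime("%Y-%m-%d %H:%MZ")
        return f"{self.number} {self.location} {self.keyword} from {start}"


# Illustrative values only, not a real notice.
example = Notam(
    number="01/123",
    location="JFK",
    keyword="RWY",
    start_time=datetime(2023, 1, 11, 12, 0, tzinfo=timezone.utc),
)
print(example.summary())
```

A real notice carries more than this (free text, an end time, and so on); the sketch only covers the parts the quote mentions.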

My thoughts

 I'm surprised there isn't more redundancy in these systems, or if there is, the issue was so big that it was able to cause such a big problem. Many airlines must have lost a lot of money due to this error.
 

 

Sources

 https://www.washingtonpost.com/nation/2023/01/11/faa-notam-outage-flights-us/
 

https://www.theverge.com/2023/1/11/23549834/usa-flights-grounded-faa-computer-glitch

 


It's like the biggest companies make the worst crap.

-dankpods

 

 

| If someone's post is helpful or solves your problem please mark it as a solution 🙂 |

I am a human that makes mistakes! If I'm wrong please correct me and tell me where I made the mistake. I try my best to be helpful.

System Specs

<Ryzen 5 3600 3.5-4.2Ghz> <Noctua NH-U12S chromax.Black> <ZOTAC RTX 2070 SUPER 8GB> <16gb 3200Mhz Crucial CL16> <DarkFlash DLM21 Mesh> <650w Corsair RMx 2018 80+ Gold> <Samsung 970 EVO 500gb NVMe> <WD blue 500gb SSD> <MSI MAG b550m Mortar> <5 Noctua P12 case fans>

Peripherals

<Lepow Portable Monitor + AOC 144hz 1080p monitor> 

<Keymove Snowfox 61m>

<Razer Mini>


I'm always surprised at just how many flights there are in a given day. It's amazing we're able to organize it at all.


4 hours ago, Shreyas1 said:

 I'm surprised there isn't more redundancy in these systems, or if there is, the issue was so big that it was able to cause such a big problem. Many airlines must have lost a lot of money due to this error.
 

 

 

Their redundancy was a phone hotline, which is how the system started back in the '40s. But it was quickly overloaded and they had to shut it down.

 

Unfortunately it usually takes incidents like this for someone to realize that the system needs to be updated.


41 minutes ago, dilpickle said:

Unfortunately it usually takes incidents like this for someone to realize that the system needs to be updated.

Given how unprecedented this is, I'm sure that before this incident the FAA had a lot of other things higher up on the list to spend money on. Makes me wonder what other systems they have that are also prone to a failure like this. At least this time the outcome wasn't really bad.

 


Sure is a ton of flights daily.

| Ryzen 7 7800X3D | AM5 B650 Aorus Elite AX | G.Skill Trident Z5 Neo RGB DDR5 32GB 6000MHz C30 | Sapphire PULSE Radeon RX 7900 XTX | Samsung 990 PRO 1TB with heatsink | Arctic Liquid Freezer II 360 | Seasonic Focus GX-850 | Lian Li Lancool III | Mousepad: Skypad 3.0 XL / Zowie GTF-X | Mouse: Zowie S1-C | Keyboard: Ducky One 3 TKL (Cherry MX-Speed-Silver) | Beyerdynamic MMX 300 (2nd Gen) | Acer XV272U | OS: Windows 11 |


A great reason the US needs high-speed rail. But we have too many "can nots" in our country to make it happen.

I just want to sit back and watch the world burn. 


On 1/11/2023 at 10:32 AM, Shreyas1 said:

NOTAM (Notice to Air Missions) system

I was confused about why the American news feeds were saying "notice to air missions", then I found out the FAA has reinterpreted what NOTAM actually stands for.

 

The Canadian definition still calls it notice to airmen, the rest of the world probably couldn't be bothered to change it either.

 

Regardless, the problems were due to the system not taking new updates.

It was 100% a failure of the backend systems. They'd better come up with a solution to prevent further interruptions, as it's a really dumb reason to cancel scheduled flights.

 

 

 

 

 

 


Update: It turns out it was due to a corrupt file (or database entry) that was present in both the live and backup systems?

Quote

Officials are still trying to figure out exactly what led to the Federal Aviation Administration system outage on Wednesday but have traced it to a corrupt file, which was first reported by CNN.
"Our preliminary work has traced the outage to a damaged database file. At this time, there is no evidence of a cyberattack," the FAA said.
The FAA is still trying to determine whether any one person or "routine entry" into the database is responsible for the corrupted file, a government official familiar with the investigation into the NOTAM system outage told CNN.

FAA officials told reporters early Wednesday that the issues developed in the 3 p.m. ET hour on Tuesday.
Officials ultimately found a corrupt file in the main NOTAM system, the source told CNN. A corrupt file was also found in the backup system.

Source: https://edition.cnn.com/travel/article/faa-ground-stop-causes
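To illustrate how a corrupt file can end up in both a primary and a backup copy, and one generic way to guard against it, here is a minimal Python sketch that only replicates a file once it matches a known-good checksum. Nothing in the reporting says how the FAA actually replicates its data; the function names, the file names, and the idea that a trusted checksum is recorded at write time are all assumptions.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    # Hash the whole file; fine for a small demo file.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def replicate_if_valid(live: Path, backup: Path, expected_sha256: str) -> bool:
    """Copy live to backup only when live matches the trusted checksum."""
    if sha256_of(live) != expected_sha256:
        return False  # leave the older, presumably good, backup untouched
    shutil.copy2(live, backup)
    return True


with tempfile.TemporaryDirectory() as tmp:
    live = Path(tmp) / "live.db"      # hypothetical file names
    backup = Path(tmp) / "backup.db"

    live.write_bytes(b"good data")
    good_hash = sha256_of(live)       # assumed to be recorded by the writer
    first = replicate_if_valid(live, backup, good_hash)

    live.write_bytes(b"corrupted")    # simulate damage to the live file
    second = replicate_if_valid(live, backup, good_hash)
    backup_contents = backup.read_bytes()

print(first, second, backup_contents)
```

Without a check like this, a plain copy would happily mirror the damaged file, which is one way both copies can end up corrupt at the same time.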

"A high ideal missed by a little, is far better than low ideal that is achievable, yet far less effective"

 

If you think I'm wrong, correct me. If I've offended you in some way tell me what it is and how I can correct it. I want to learn, and along the way one can make mistakes; Being wrong helps you learn what's right.


Sounds like ancient code running on ancient systems. If the same outdated infra is used for both the primary and backup system, then it shouldn't be surprising that both systems failed at the same time. The US government seriously needs to spend some money upgrading their terrible infra across the country instead of spending it on rubbish.

Currently working towards fully restoring my ThinkPad T510

AMD Ryzen 5 5600 -- ASUS Dual Radeon RX 6650 XT OC Edition 8GB -- Crucial Ballistix Sport LT 32GB 3200MHz DDR4 (4x8GB) -- ASRock B450M Steel Legend -- Corsair CX650M -- SAMA IM01 -- Crucial P2 500GB NVMe, 2x Crucial BX500 1TB


On 1/11/2023 at 10:32 AM, Shreyas1 said:

 I'm surprised there isn't more redundancy in these systems, or if there is, the issue was so big that it was able to cause such a big problem. Many airlines must have lost a lot of money due to this error.

I'm not. It would make sense, and I would certainly hope there would be more, but even Google and Facebook (and I believe Amazon) have had major, worldwide outages, which also cost massive amounts of money. Part of it is sloppiness to be sure, but part of it is that it's practically impossible to make something 100% failsafe.


On 1/13/2023 at 12:35 PM, AyesC said:

Sounds like ancient code running on ancient systems

During the lockdowns, with everyone being laid off, many of the state-level unemployment systems were written in COBOL and basically all had strokes due to the high demand. Since COBOL is rarely taught anymore, new devs couldn't just come in and fix the issue. They had to find some old farts to do it, and that's if they could even find them.

 

On 1/13/2023 at 12:35 PM, AyesC said:

The US government seriously needs to spend some money upgrading their terrible infra across the country instead of spending it on rubbish.

But then they couldn't spend $800+ billion on defense. The other issue is that not all infrastructure is government controlled. Take rail, for example: the 4 big railroads own most of the tracks. Amtrak, our government-owned passenger rail service, only owns 600 miles of track; it pays fees to use the freight companies' rail. The freight companies do a piss-poor job at maintenance, so trains can't go fast on much of the track. Their train of thought is that the shit gets there when it gets there. The electrical infrastructure is also privately owned, BUT power companies are generally "legal" monopolies and are subject to stricter rules and regulations. The same applies to natural gas.

 

The infrastructure that the government is directly responsible for, like the interstate system, is falling apart in many areas, BUT the interstates also fall under state control. Michigan, for example, has ripped up miles of I-275 and rebuilt it. Don't get me started on drinking water, because there are major failures there as well.

I just want to sit back and watch the world burn. 


Update on this story: a worker who was synchronising the backups with the active copy accidentally deleted several critical system files, which is what caused the outage.

 

https://www.bbc.co.uk/news/world-us-canada-64341873

 

Given the tendency of a lot of critical systems to run on really old hardware and software, I wouldn't be surprised to find that the system isn't compatible with off-the-shelf backup methods and they were resorting to manual copy-paste, with the worker probably trying to delete files on the backup copy and deleting them on the live copy instead. Easy to understand, but oof, baaaad system of doing things.
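To make the "deleted it on the live copy instead of the backup" scenario concrete, here is a minimal Python sketch of a delete helper that refuses to touch anything under a directory marked as live. This is pure speculation about the failure mode, as above; the function, exception, and directory names are hypothetical and say nothing about the FAA's real tooling.

```python
import tempfile
from pathlib import Path


class LiveCopyError(Exception):
    """Raised when a delete targets the copy marked as live."""


def guarded_delete(target: Path, live_root: Path) -> None:
    # Resolve both paths so ".." or symlinks cannot sneak past the check.
    target = target.resolve()
    live_root = live_root.resolve()
    if target == live_root or live_root in target.parents:
        raise LiveCopyError(f"refusing to delete {target}: inside live copy")
    target.unlink()


with tempfile.TemporaryDirectory() as tmp:
    live_root = Path(tmp) / "live"      # hypothetical layout
    backup_root = Path(tmp) / "backup"
    live_root.mkdir()
    backup_root.mkdir()
    live_file = live_root / "data.db"
    backup_file = backup_root / "data.db"
    live_file.write_text("live")
    backup_file.write_text("backup")

    # Deleting from the backup copy is allowed.
    guarded_delete(backup_file, live_root)
    backup_deleted = not backup_file.exists()

    # The same call aimed at the live copy is rejected.
    try:
        guarded_delete(live_file, live_root)
        blocked = False
    except LiveCopyError:
        blocked = True
    live_survived = live_file.exists()

print(backup_deleted, blocked, live_survived)
```

A guard like this doesn't replace proper backup tooling, but it turns an easy slip of the hand into a loud error.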
