
NANDpocalypse - 6 Exabytes lost

LukeSavenije
15 minutes ago, WereCat said:

I call BS on these claims.

If they lost 6 exabytes in 13 minutes, they can make 6 exabytes in 13 minutes, which means production is so large that 6 exabytes is only a small fraction of it and doesn't matter.

They didn't say that they wouldn't have lost everything if the outage had lasted only, e.g., 12 minutes. The 13-minute figure is irrelevant; the relevant part is that the machinery lost power, and even a second of power loss would have resulted in the same outcome. You can't just stop the process and resume it like that; it has to go through from start to finish.
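To put rough numbers on that point, here is a minimal sketch comparing what a fab would finish in 13 minutes with what is sitting part-way through the line at any moment. The throughput and cycle-time figures are made up for illustration and do not come from the article; only the 13 minutes is from the thread.

```python
# Hypothetical figures for illustration only; they are NOT from the article.
WAFER_STARTS_PER_DAY = 1_000   # assumed fab throughput
CYCLE_TIME_DAYS = 60           # assumed time one wafer spends in the line
OUTAGE_MINUTES = 13            # the outage length quoted in the thread


def wafers_made_during(minutes, starts_per_day=WAFER_STARTS_PER_DAY):
    """Wafers the fab would have finished in a window of this length."""
    return starts_per_day * minutes / (24 * 60)


def wafers_in_progress(starts_per_day=WAFER_STARTS_PER_DAY,
                       cycle_days=CYCLE_TIME_DAYS):
    """Little's law: work in progress = throughput x time spent in the line."""
    return starts_per_day * cycle_days


naive_loss = wafers_made_during(OUTAGE_MINUTES)
wip_at_risk = wafers_in_progress()

print(f"Finished in {OUTAGE_MINUTES} min: {naive_loss:.1f} wafers")
print(f"In the line when power drops: {wip_at_risk} wafers")
```

Under these assumptions, the wafers at risk are everything in the line, not 13 minutes of output, so the length of the outage barely enters into it.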

Hand, n. A singular instrument worn at the end of the human arm and commonly thrust into somebody’s pocket.


Time for price fixing 2.0. RAM prices falling quickly? Just artificially inflate them. NAND prices falling quickly? Just turn it off and turn it back on again. Prices go up, stocks go up, profits go up. Nice to see that I can always count on the same couple of NAND/RAM companies to disappoint time and time again.

 

I'll be damned if the companies not only make up the loss in product through price hikes, but probably make money off of this.


5 minutes ago, WereCatf said:

They didn't say that they wouldn't have lost everything if the outage had lasted only, e.g., 12 minutes. The 13-minute figure is irrelevant; the relevant part is that the machinery lost power, and even a second of power loss would have resulted in the same outcome. You can't just stop the process and resume it like that; it has to go through from start to finish.

Yes. Still, it's a BS excuse for shareholders.


2 hours ago, iamdarkyoshi said:

Why wasn't this stuff battery backed?

They probably had a redundant source of power (though most likely not on site, you can't power a whole fab with a backyard diesel generator) - maybe that failed too for some reason. The article seems to indicate the whole region suffered an outage.

Don't ask to ask, just ask... please 🤨

sudo chmod -R 000 /*


7 minutes ago, Sauron said:

They probably had a redundant source of power (though most likely not on site, you can't power a whole fab with a backyard diesel generator) - maybe that failed too for some reason. The article seems to indicate the whole region suffered an outage.

You don't even really need to keep the whole plant running, you only need enough power to keep critical systems up long enough to shut down properly. I'm not versed in fabrication methods, but I have to assume it's not an unstoppable process. There certainly must be a procedure to safely stop production. The sudden loss of power is the killer here; you only need power long enough to safely shut down.


1 minute ago, Dacien said:

You don't even really need to keep the whole plant running, you only need enough power to keep critical systems up long enough to shut down properly. I'm not versed in fabrication methods, but I have to assume it's not an unstoppable process. There certainly must be a procedure to safely stop production. The sudden loss of power is the killer here; you only need power long enough to safely shut down.

Maybe you're right, though it's possible the process cannot be cleanly stopped and restarted. Maybe once the silicon is in you can no longer get it out, and if you stop halfway through, the material is rendered unusable. It wouldn't surprise me given the precision required.

Don't ask to ask, just ask... please 🤨

sudo chmod -R 000 /*


So how much did they pay Bob the electrician to "accidentally" flip the switch to the wrong power node? "Oops, my bad, it was an honest mistake. Oh well, I'm sure it'll be all okay".


How can 13 minutes of power outage damage half your quarterly output?

I call BS.


3 hours ago, TetraSky said:

How do you somehow lose 6EB of NAND in 13 minutes without power?

I'm more confused as to why they can't start production again until mid-July. What the hell is the startup sequence of their foundry that it takes half a month to do anything?

🌲🌲🌲

 

 

 

◒ ◒ 


4 hours ago, 2FA said:

 

I'm no nanoengineer but the equipment used to produce NAND is really expensive and sensitive (nanometer scale after all) so I'm guessing that time frame includes calibration and verification of all the machinery.

I work for an engineering firm that does the mechanical, electrical, and plumbing design for buildings, and when we work with companies that have extremely sensitive equipment, they have us put in a lot of redundancy because the equipment will literally break if the power is out for too long. I mean, we are talking about a couple hundred thousand dollars a machine. I don't know what type of equipment they use, but I would imagine it is a similar case.


5 hours ago, TetraSky said:

How do you somehow lose 6EB of NAND in 13 minutes without power?

I smell bullshit here; it feels like they just wanted a reason to raise the prices of NAND-based products after they've been going down a lot lately.

 

3 hours ago, WereCat said:

I call BS on these claims.

If they lost 6 exabytes in 13 minutes, they can make 6 exabytes in 13 minutes, which means production is so large that 6 exabytes is only a small fraction of it and doesn't matter.

Keep in mind that an enormous number of wafers are cut from a single crystal, and each wafer can have a large number of individual chips made from it. Crystals take a long time to grow (1" per hour at best), and interrupting the process will reduce production, maybe even damaging what has already been grown. Restarting the process from scratch (you can't resume growing a crystal once it has been stopped) is time consuming. Also, these plants are not just growing a few crystals at a time; they are growing hundreds, if not thousands, at once.

Jeannie

 

As long as anyone is oppressed, no one will be safe and free.

One has to be proactive, not reactive, to ensure the safety of one's data so backup your data! And RAID is NOT a backup!

 


5 hours ago, Dacien said:

As someone who works in the electrical industry, a backup generator with ATS, the wiring, additional labor to install, whatever you throw at it, will never cost more than what was lost here. It is routine to have these backup generators even in uninteresting projects like apartment buildings. At least here in the states. I can almost guarantee that there was a malfunction in either their ATS or generator.

 

They would only need a short-term backup battery bank in the range of seconds before the generator took over, so that wouldn't have been the problem. If the short-term had failed, the outage would not have lasted 13 minutes, it would have lasted seconds.

The amount of power needed just to create the crystals is enormous; the rest of the process is also power hungry. Also, building generation is not inexpensive (I worked in Supply, including support for Generation, for a large electrical and irrigation utility for 30 of the 32 years I was there). A power plant of the size needed is unbelievably expensive, which would be hard to justify for the rare times it would be needed. Also, a plant of that size takes time to ramp up. A gas turbine can be brought up in no less than 30 minutes. A modern combined cycle plant needs an hour. Conventional boiler steam generation needs at least 24 hours and usually takes up to 48 hours to get online.
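As a quick check on those ramp-up times, here is a small sketch that compares each one against the 13-minute outage. The startup figures are the minimums given above; the assumption that the plant starts cold at the moment power drops is mine.

```python
OUTAGE_MINUTES = 13  # outage length quoted in the thread

# Minimum startup times as stated in the post above, in minutes.
startup_minutes = {
    "gas turbine": 30,
    "combined cycle": 60,
    "conventional boiler steam": 24 * 60,
}


def online_before_outage_ends(startup, outage=OUTAGE_MINUTES):
    """True if a plant started cold when power dropped is up before it returns."""
    return startup <= outage


cold_start_helps = {
    name: online_before_outage_ends(minutes)
    for name, minutes in startup_minutes.items()
}

for name, helps in cold_start_helps.items():
    print(f"{name}: {'online in time' if helps else 'still ramping up'}")
```

With these figures, none of the plant types would be online before grid power came back, so a standby plant would only help if it were already running.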

 

Then there is the issue of the fuel needed to run a power plant. Enormous amounts are needed. Natural gas has to be piped in, and that may not be feasible due to distance or terrain. Coal would need to be hauled in by truck or train, and the nearest mines may be too far off to be practical. It is far easier and less expensive to deliver electric power from far-off generation.

 

The best redundancy is to be connected to a grid that has multiple sources of generation so that, if one plant or power line goes down, others can take up the slack. Still, depending on the percentage of power being drawn by the chip plant compared to the total amount of generation normally in operation, it can take a short while to bring things back up to speed.

 

Of course, the loss of generation is not the only cause of power loss. Control circuits, transformers, switchgear, etc. can all fail, and the restoration of power or repair of damage takes time.

Jeannie

 

As long as anyone is oppressed, no one will be safe and free.

One has to be proactive, not reactive, to ensure the safety of one's data so backup your data! And RAID is NOT a backup!

 


4 hours ago, WereCat said:

True, but not everything needs recalibration. That's corporate BS. I work in a similar company. It takes a day or two at most to fix a failure like this. Nobody likes to lose money, especially not the kind of revenue that was mentioned.

This seems more like a subterfuge to increase the cost of SSDs, which had been dropping in price a lot.

I still doubt that a failure at one company will affect the prices. Maybe stall them for a few more days.

Quoting a comment left on AnandTech's article.

Quote

Clearly you have no idea how semiconductor manufacturing works. 13 minutes is an INSANELY long outage. Samsung had a 200ms (yeah, 0.2s) outage in 2017 that cost something like $50M in product losses.

Losing power means they lost:
All RF etch tools - any wafers in those tools are scrap
All fab exhaust/pumps/cooling water - any tools with sensitive environments scrap 100%
All power to furnaces - temperature varied from spec, 100% scrap

 

4 hours ago, Sauron said:

They probably had a redundant source of power (though most likely not on site, you can't power a whole fab with a backyard diesel generator) - maybe that failed too for some reason. The article seems to indicate the whole region suffered an outage.

According to Dr. Ian Cutress of AnandTech, the 13 minutes came after the batteries were depleted.

2 hours ago, Rune said:

How can 13 minutes of power outage damage half your quarterly output?

I call BS.

Quoting a comment left on AnandTech's article.

Quote

Clearly you have no idea how semiconductor manufacturing works. 13 minutes is an INSANELY long outage. Samsung had a 200ms (yeah, 0.2s) outage in 2017 that cost something like $50M in product losses.

Losing power means they lost:
All RF etch tools - any wafers in those tools are scrap
All fab exhaust/pumps/cooling water - any tools with sensitive environments scrap 100%
All power to furnaces - temperature varied from spec, 100% scrap

 

if you have to insist you think for yourself, i'm not going to believe you.


For sure they have backup generators that are tested for reliability to make sure they work when needed.

 

The storage industry seems to be the most accident-prone of industries. It's like they just spin a wheel to see which story they're going to go with to increase prices next time: Flood, power-outage, locusts, space-pirates...

You own the software that you purchase - Understanding software licenses and EULAs

 

"We’ll know our disinformation program is complete when everything the american public believes is false" - William Casey, CIA Director 1981-1987


2 minutes ago, Delicieuxz said:

The storage industry seems to be the most accident-prone of industries. It's like they just spin a wheel to see which story they're going to go with to increase prices next time: Flood, power-outage, locusts, space-pirates...

That or the production of NAND, which is in almost everything and has quite the demand, is a lot more sensitive than you make it out to be.

if you have to insist you think for yourself, i'm not going to believe you.


38 minutes ago, Delicieuxz said:

For sure they have backup generators that are tested for reliability to make sure they work when needed.

Can you back that up?

Jeannie

 

As long as anyone is oppressed, no one will be safe and free.

One has to be proactive, not reactive, to ensure the safety of one's data so backup your data! And RAID is NOT a backup!

 


Regarding the restart time: bear in mind they just had, at the very least, several thousand very sensitive machines go down in a hard crash. They undoubtedly have technicians of their own to reset, recalibrate, and recertify everything, but they won't normally be doing that except at scheduled maintenance, which means they'll only have enough techs to deal with expected maintenance plus a bit more for a sudden short-term spike. Recertifying an entire plant's worth of machinery, though, is certainly far beyond what they can handle quickly, and many parts of the plant need other parts to work.

 

And that's assuming they didn't have any chemicals go where they shouldn't during all of that. Even a small chlorine trifluoride leak could have done a real number on a whole group of machines, requiring their complete replacement or at least hauling them off for major repairs. That stuff is evil. And I'm sure there are other things they use that could have been equally nasty from the PoV of the machines.


1 hour ago, Lady Fitzgerald said:

Can you back that up?

It's a supposition, so backing it up would simply be me confirming that I assert it. I confirm it.

 

Much smaller companies have backup power to protect their operations. I think it would be pretty crazy if a large tech production process was running at the mercy of the unknown. Kind of a big oversight for an advanced tech company.

You own the software that you purchase - Understanding software licenses and EULAs

 

"We’ll know our disinformation program is complete when everything the american public believes is false" - William Casey, CIA Director 1981-1987


25 minutes ago, Delicieuxz said:

It's a supposition, so backing it up would simply be me confirming that I assert it. I confirm it.

 

Much smaller companies have backup power to protect their operations. I think it would be pretty crazy if a large tech production process was running at the mercy of the unknown. Kind of a big oversight for an advanced tech company.

These places use so much power that generators become infeasible. More commonly they have two utility grid sources, so if there is a major fault at a substation, power is not interrupted.

It's a similar situation for the largest supercomputers: many of them had a power plant built for them or were located next to or near one to meet their power demands.

Facilities with megawatt-type draws don't have backup generators; those would actually be power plants. Some things just don't scale, and backup power is one of them.


6 minutes ago, leadeater said:

These places use so much power that generators become infeasible. More commonly they have two utility grid sources, so if there is a major fault at a substation, power is not interrupted.

It's a similar situation for the largest supercomputers: many of them had a power plant built for them or were located next to or near one to meet their power demands.

Facilities with megawatt-type draws don't have backup generators; those would actually be power plants. Some things just don't scale, and backup power is one of them.

 

Not strictly accurate, but it is a very specialised use case and something you'll only see either when you absolutely, absolutely, absolutely have to have it (like a nuclear power plant) or when you're dealing with something especially expensive (like the British National Grid; we love our tea).


12 minutes ago, CarlBar said:

Not strictly accurate, but it is a very specialised use case and something you'll only see either when you absolutely, absolutely, absolutely have to have it (like a nuclear power plant) or when you're dealing with something especially expensive (like the British National Grid; we love our tea).

Not sure how this applies? A semiconductor fab uses 60 MW of power or more; building something that large and expensive (100mil-400mil) in case power might fail is near impossible to justify.

For supercomputers, these are located in places that do have their own power plants, Los Alamos National Laboratory for example, but that has less to do with the supercomputer itself and more to do with the entire facility. But this is a common trend at the very top of the Top 500.

Delivering very high power over long distances to a single place is rather expensive and hard to do reliably, so you tend to build these things close to a power source. Our only aluminium smelter is next to one, for example, and if that were to lose power and the metal were to cool enough to solidify, that would be a huge problem.

 

Edit:

Anyway, hospital in my city has 2 grid feeds plus 3 very large gas turbine generators but they aren't megawatt scale either.


Just now, leadeater said:

Not sure how this applies? A semiconductor fab uses 60 MW of power or more; building something that large and expensive (100mil-400mil) in case power might fail is near impossible to justify.

For supercomputers, these are located in places that do have their own power plants, Los Alamos National Laboratory for example, but that has less to do with the supercomputer itself and more to do with the entire facility. But this is a common trend at the very top of the Top 500.

Delivering very high power over long distances to a single place is rather expensive and hard to do reliably, so you tend to build these things close to a power source. Our only aluminium smelter is next to one, for example, and if that were to lose power and the metal were to cool enough to solidify, that would be a huge problem.

 

I wasn't saying it did apply to this situation, just that there are times when you do need multi-megawatt emergency power measures. They're rare and most situations don't justify them, but there are edge cases (which is all I was saying, btw).


Just now, CarlBar said:

I wasn't saying it did apply to this situation, just that there are times when you do need multi-megawatt emergency power measures. They're rare and most situations don't justify them, but there are edge cases (which is all I was saying, btw).

I just wasn't sure how the British national grid applied, sounded more like a grid centric situation rather than a particular facility needing to maintain power etc.
