A 1 Petabit DVD-like disc has been created

1 hour ago, Uttamattamakin said:

When you can put 100 of these into a box and FedEx them to someone why bother with a cloud for that transfer?  

Simply because almost nobody actually needs to transport that amount of data, or in that way. A single point-to-point transfer of a point-in-time data set of 125 PB (125 TB x 100) basically doesn't exist, and when it is needed we already have solutions for it, because the data size isn't the important part; it's the writing and reading of it, which is why it's done with shipping containers filled with HDDs, servers and network equipment.

 

[Image: AWS Snowmobile]

 

Quote

Snowmobile moves up to 100 petabytes (PB) of data in a 45-foot long ruggedized shipping container and is ideal for multi-PB or exabyte (EB)-scale digital media migrations and data center shutdowns. A Snowmobile arrives at the customer site and appears as a network-attached data store for a more secure, high-speed data transfer. After data is transferred to Snowmobile, it is driven back to an AWS Region where the data is loaded into Amazon S3.

 

Snowmobile pricing is based on the amount of data stored on the truck per month.

 

Quote

Designed to address data challenges for companies dealing with large film vaults and troves of satellite imagery, the truck consumes a whopping 350 KW of AC power. All this power fuels a switch that can handle one terabit of data per-second across multiple 40gbps connections.

 

This means a Snowmobile can be filled to the brim in roughly 10 days.
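A quick back-of-the-envelope check of that figure (a sketch in Python; it assumes the full 1 Tb/s switch throughput is sustained end to end, which is optimistic):

```
# Back-of-the-envelope check of the Snowmobile fill time quoted above.
capacity_bits = 100e15 * 8      # 100 PB -> bits (decimal prefixes)
throughput_bps = 1e12           # 1 Tb/s aggregate switch throughput

seconds = capacity_bits / throughput_bps
print(f"{seconds / 86400:.1f} days")   # ~9.3 days, i.e. "roughly 10 days"
```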

 

Take Netflix for example, while they could and do transfer Petabytes worth of data it's not to you, it's to millions of "you". So you don't need 125PB data carriers you need 1 millionth of that aka 125GB which makes current internet more than capable of this task and faster than "Fedex".

 

For some perspective, the above (125 GB) can be achieved in one day with an 11 Mbps internet connection. You really do need to be transferring an enormous amount of data from one place to another to exceed the capability of internet connections, and that is squarely not in the realm of consumer demands.
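For anyone who wants to check that arithmetic, a minimal sketch (decimal prefixes assumed throughout):

```
# Link speed needed to move 125 GB in 24 hours.
data_bits = 125e9 * 8                  # 125 GB -> bits
seconds_per_day = 24 * 60 * 60

required_mbps = data_bits / seconds_per_day / 1e6
print(f"{required_mbps:.1f} Mbps")     # ~11.6 Mbps, i.e. roughly an 11 Mbps line
```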


2 hours ago, Uttamattamakin said:

This I agree with, and @wanderingfool2 you are missing the point. That it would take over 120 of what would be VERY large current-year SSDs or HDDs to fill up one of these discs is the point, whether it's GiB or GB or whatever. The issue is what having a "sneakernet" that can carry that much data will enable. When you can put 100 of these into a box and FedEx them to someone why bother with a cloud for that transfer?  

If anything, the existence of such a medium might shake things up. You know we are getting into a very "720k is enough for anyone" phase of computing again. So many devices have 1 TB as the limit for their storage. If discs like this threaten to take business, that might change. Even if they do only that, it would be a game changer.

No, I'm not, you are just being ignorant of reality. The difference between GiB and GB matters a great deal (btw, the capitalization of the B matters) because you literally went at someone saying the article's use of bits was misleading and then went on to get everything wildly wrong.

 

Again, you offer no use-case just some imaginary scenario that anyone who has any knowledge of the digital landscape would say is wrong.

 

Your whole "1 TB as the limit for their storage." and "threaten to take business that might change" shows how you are missing the point because you lack basic knowledge/research into this; despite people telling you you are wrong.  Look at what I said before, Moores law is dead.  Have a massive storage disc wouldn't change the business.  They can't magically change things to make it better.

 

Again, HDD areal density can only go so far before the physics breaks down. SSDs can only get packed in so tightly...the saving grace with SSDs is that they are managing to store multiple bits per cell (but at the cost of durability).

 

If they could make 1TB cheaper or higher capacity at the same pricing they would...but it's a slow progression (because SSD's have a competitor already, HDD's).

 

A regular person likely has less than 10 TB of data. There aren't use cases for people having a lot of data. Your whole "eliminate the cloud" is pointless because, again, you are talking about it as though it's a consumer-level thing. Again, consumers have already spoken by largely abandoning Blu-ray and discs to go for digital downloads. Music CDs, despite being lossless, got beaten out by digital. You have yet to state why you somehow think more space would magically convince people to do it (or companies to go back to releasing things like that).

 

4 hours ago, Godlygamer23 said:

If a RAM manufacturer is representing GiB as GB, then it's wrong, and I say the same thing about Windows as well. Represent it for what it actually is - if there are additional units that are available to define what you're talking about...then fucking use them. There's no excuse except for companies that refuse to change. 

 

Microsoft clearly acknowledges the difference between GiB and GB, and units of similar caliber(KiB and KB for example) because it's in the default Windows calculator, and yet Windows is representing storage space in binary vs decimal. It doesn't even need to change anything - just change how the unit is being represented, and users will probably figure it out. 

 

This is a debate that shouldn't even be a debate.

It's not wrong though.  There was like 30 years of history behind it.

 

The whole binary prefix stuff wasn't even formalized into a standard by the IEC until 1999...there were still groups pushing for k vs K to be the difference (with HDD's being a notable hold out for adopting that).

 

Actually, back in 1994 the IEEE stated the definition of a kilobyte to be 1024 bytes. The 2000 ANSI/IEEE standard later redefined it to be either 1024 bytes or 10^3 bytes, where the person using the term was to specify which was meant in the current context. So for a time there actually were two competing bodies saying their own thing. Generally in computing, though, the 1024 meaning was there first.

 

So while you say this shouldn't be a debate, I'd like to point out that computers are very much binary systems, and the term had meant 1024 in a lot of cases...it was even taught that way in schools (with the caveat that the storage industry just played by their own definition of megabyte). You are effectively suggesting forcing a standard onto a term that, for the 30 years prior to that standard, had a different meaning.
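To illustrate how far the two conventions drift apart as the prefixes grow, here is a small sketch using the standard SI and IEC definitions:

```
# Ratio between the binary (IEC) and decimal (SI) value of each prefix.
prefixes = ["kilo/kibi", "mega/mebi", "giga/gibi", "tera/tebi", "peta/pebi"]
for power, name in enumerate(prefixes, start=1):
    ratio = (1024 ** power) / (1000 ** power)
    print(f"{name:9s}  binary unit is {ratio:.3f}x the decimal unit")
# ~1.024x at kilo, growing to ~1.126x at peta -- which is why the gap is far
# more noticeable on multi-terabyte drives than it ever was on floppies.
```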


7 hours ago, leadeater said:

Simply because almost nobody actually needs to transport that amount of data, or in that way. A single point-to-point transfer of a point-in-time data set of 125 PB (125 TB x 100) basically doesn't exist, and when it is needed we already have solutions for it, because the data size isn't the important part; it's the writing and reading of it, which is why it's done with shipping containers filled with HDDs, servers and network equipment.

 

[Image: AWS Snowmobile]

 

 

 

Take Netflix for example, while they could and do transfer Petabytes worth of data it's not to you, it's to millions of "you". So you don't need 125PB data carriers you need 1 millionth of that aka 125GB which makes current internet more than capable of this task and faster than "Fedex".

 

For some perspective, the above (125 GB) can be achieved in one day with an 11 Mbps internet connection. You really do need to be transferring an enormous amount of data from one place to another to exceed the capability of internet connections, and that is squarely not in the realm of consumer demands.

Still need to do a Snowball transfer of my dad's NAS to have a backup in some form of deep Glacier.

The large reason we looked at Snowball is because Comcast's 1.2 TB data cap is stupid.

I just also need to find out if we can courier it ourselves rather than pay FedEx, as the cost to ship those HDDs is gross. Snowball with optical discs would be far cheaper for everyone involved due to the weight per bit.


5 minutes ago, starsmine said:

Still need to do a Snowball transfer of my dad's NAS to have a backup in some form of deep Glacier.

The large reason we looked at Snowball is because Comcast's 1.2 TB data cap is stupid.

Just do the initial backup over a few months and then the incremental changes won't/shouldn't be a problem.

 

5 minutes ago, starsmine said:

I just also need to find out if we can courier it ourselves rather than pay FedEx, as the cost to ship those HDDs is gross. Snowball with optical discs would be far cheaper for everyone involved due to the weight per bit.

You can only do what the service allows so you have to find one that allows HDD mail-in i.e. Backblaze.


1 minute ago, leadeater said:

Just do the initial backup over a few months and then the incremental changes won't/shouldn't be a problem.

 

You can only do what the service allows so you have to find one that allows HDD mail-in i.e. Backblaze.

By the time you are looking at Snowball, you are looking at way more than a dozen terabytes.

Backblaze would be hundreds of dollars a month for something like that; Glacier is far, far cheaper. This isn't normal S3 storage, where Backblaze is cheaper.
I'm confused about what you are talking about here; Snowball IS HDD mail-in. But I know where the servers are in Fairfax County, VA. That's a common trip for us, so it would be cheaper to NOT spend 100 dollars on FedEx to ship it to them. Of course the other Snowball fees would apply.

A disc would cut the FedEx costs to sub $10.

Amazon Deep Glacier is $0.00099 per GB per month, with $0.0025 per GB to retrieve if the NAS blows up.
That's like 13 dollars a month for a dozen TB.

Backblaze is $72 a month.
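Plugging the figures quoted in this post into a quick sketch (the ~$6/TB-month Backblaze rate is inferred from the $72 figure above and may not be current):

```
# Rough monthly cost comparison using the figures quoted in this post.
data_gb = 12_000   # "a dozen TB"

glacier_deep_monthly = 0.00099 * data_gb   # ~$11.88/month at $0.00099/GB-month
glacier_full_restore = 0.0025 * data_gb    # ~$30 one-off if the NAS blows up
backblaze_monthly = 0.006 * data_gb        # ~$72/month, assuming ~$6/TB-month

print(f"Glacier Deep Archive: ${glacier_deep_monthly:.2f}/month "
      f"(+${glacier_full_restore:.2f} for a full restore)")
print(f"Backblaze (B2-style): ${backblaze_monthly:.2f}/month")
```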


34 minutes ago, starsmine said:

Backblaze would be hundreds of dollars a month for something like that; Glacier is far, far cheaper. This isn't normal S3 storage, where Backblaze is cheaper.

Backblaze is $99/year unlimited, so long as it's personal use and not for a business.

 

And yes, I know what Glacier is; I'm literally a backup solution architect, and where I work we protect actual petabytes of data on disk and LTO tape.

 

34 minutes ago, starsmine said:

I'm confused about what you are talking about here; Snowball IS HDD mail-in.

Snowball is a service offering that has a price and is used how they tell you; Backblaze allows you to mail in an HDD, which is literally what I said. And no, it's not $100 to send an HDD, find a better courier. The same goes for the disc: once you put it in a protective carrier so it won't get broken, you'll be paying just as much, since there are minimum costs based on dimensions and weight.

 

If you actually fairly priced out sending an HDD and a Disc they'd be quite similar. Now 10 Discs, that is where you win easily.


9 hours ago, Zodiark1593 said:

Comcast is at about 1.2 TB per month. Most satellite providers, Starlink aside perhaps, are only 100-200GB of “Priority Data”. 

Well idk about satellite I meant general wire. Even my phone plan is unlimited. Sad if so.


9 hours ago, leadeater said:

Just remember the internet isn't tied to economic status of the country or the "physical world", economic super powers can have hilariously terrible internet access and internet access equality. 

 

But also yea here all we really have is "fair use" which is not specifically defined. Just don't be a problem and you can use as much as you like.

Yeah, I somewhat figured that, because I've seen a bit worse-off countries have better internet than mine. Yet we are among the worse in Europe, quite sad.

 

But caps aside, yes quality is another thing, it's not really good, can be iffy a lot of the times. But I'm not on fiber.


@starsmine I think you can have an Unlimited plan.

 

Quote

However you use the Internet, our data plans have you covered. Customers in select markets are automatically given our 1.2 Terabyte Data Plan. That's enough monthly data to stream HD Movies for nearly 18 hours a day. Need more? Go unlimited.

 

Quote

The 1.2 Terabyte Internet Data Usage Plan provides you with 1.2 terabytes (TB) of Internet data usage each month as part of your monthly Xfinity Internet service. If you choose to use more than 1.2 TB in a month without being on an unlimited plan, we will automatically add blocks of 50 GB to your account for an additional fee of $10 each. Your charges, however, will not exceed $100 each month, no matter how much you use. We're also offering you a courtesy month, so you will not be billed the first time you exceed the limit.

^ Or just play the game unless you have already exceeded it once.


11 hours ago, leadeater said:

Simply because almost nobody actually needs to transport that amount of data, or in that way. A single point-to-point transfer of a point-in-time data set of 125 PB (125 TB x 100) basically doesn't exist.

(Consider the size of the simple screen shot at the bottom of this image)

 

Not yet, not for most applications. I can think of scientific applications, where more data is always better, where it might make sense to record all of that.
https://www.phenix.bnl.gov/WWW/publish/rseidl/talks/Triggering In Particle Physics Experiments.pdf


To greatly oversimplify, in particle physics most of the information is deemed to be noise because it would not be indicative of expected new physics. There is always a chance, a good one now, that our predictions are a bit off, and the ability to store more of this data and look for less-expected physics would be welcome. That's science though: more data is always better for that.

 

You are right, it is very hard to think of an application right now that would need a Pb or Pib, much less a PB or PiB. On the other hand, there is the idea that if the space exists we'll find something to fill it with.

 

11 hours ago, leadeater said:

 

and when it is needed we already have solutions for it, because the data size isn't the important part; it's the writing and reading of it, which is why it's done with shipping containers filled with HDDs, servers and network equipment.

 

It makes total sense that things like this exist. Something more compact than the technology we have now could be a good thing, instead of a 40-foot container of network equipment and hard drives for that amount of data.

 

Your post was really very informative and I compliment you on it. I knew such things existed somehow but couldn't have named a service that did that for people or what exactly it looked like. In academia, "sneakernet" as we call it usually just means harnessing the power of 100 eager young undergraduates each carrying a hard drive. 🙂 

 

 

11 hours ago, leadeater said:

For some perspective the above can be achieved in one day with a 11Mbps internet connection (125GB). You really do need to be transferring an enormous amount of data from one place to another to exceed the capability of internet connections and that is squarely not in the realms of consumer demands.

Oh, no doubt you are right about that. Right now that is not called for, for consumers. I remember computing in the 80's. People used to argue "You can store 1000 copies of the bible on a 720k floppy.... no one's ever going to need more than that" etc. So far, whenever there was a storage size that was "enough" or an amount of RAM that was "enough", we found something to do with more capacity.

Let me be clear: you are 100% right that this is not needed now. In 10 or 20 years I'd not be surprised if we look back on 1 TB as a quaint, tiny amount of storage, barely able to hold part of some sort of photorealistic holographic video or whatever.

 

Did you consider the size of, say, a normal screenshot of the plausible future? This one is 5 MB in size. In 1995 one could run Windows 95 on a computer with less total RAM than the size of this one image, and a 120 MB hard drive was a lot. Now we toss these around like they are nothing.

 

[Screenshot, ~5 MB]

 


8 hours ago, wanderingfool2 said:

No, I'm not, you are just being ignorant of reality. The difference between GiB and GB matters a great deal (btw, the capitalization of the B matters) because you literally went at someone saying the article's use of bits was misleading and then went on to get everything wildly wrong.

Every article about this says Petabit.  Not petabyte.    Converting from bits to bytes is so it can be expressed in terms we really use for storage media.   I used an online converter and gave you a screen shot of it.  I used a different one and gave a screen shot of that.  Write to Wolfram with your concerns.   The point of the article and this post is not harmed or inaccurate or misleading in any way. 

 

So just relax huh.  

[Animated GIF: daruku-hoshino-tired.gif, 3.6 MB]

 

(As an aside, consider that the above gif is 3.6 MB. There was a time when it would not have fit in RAM at the same time as one's OS. It may take a period as long as from 1993 to now for this to be used, but it's progress; I am confident it will be.)

 

One last thing: our beloved message board does not meet your standard of outrage at MB, Mb or mb being capitalized.

Spoiler

[Screenshot: Screenshot_20240223_095548.png]

So really, truly, just relax. We aren't writing a technical manual here.


3 hours ago, Doobeedoo said:

Yeah, I somewhat figured that, because I've seen a bit worse-off countries have better internet than mine. Yet we are among the worse in Europe, quite sad.

 

But caps aside, yes quality is another thing, it's not really good, can be iffy a lot of the times. But I'm not on fiber.

Guess what country in the world actually has really cheap high speed internet, and has had it for longer than most of the US. 

Go on Guess. 

 

They took down a Blackhawk helicopter once and we made a movie about it. 


6 minutes ago, Uttamattamakin said:

 

 

Did you consider the size of, say, a normal screenshot of the plausible future? This one is 5 MB in size. In 1995 one could run Windows 95 on a computer with less total RAM than the size of this one image, and a 120 MB hard drive was a lot. Now we toss these around like they are nothing.

I had a similar thought a bit ago. My Sony camera isn’t even all that high resolution (24 Megapixels), but it spits out 50MB RAWs every click of the button. Would’ve been an absolutely bonkers amount of data back in the 90’s. 


3 minutes ago, Zodiark1593 said:

I had a similar thought a bit ago. My Sony camera isn’t even all that high resolution (24 Megapixels), but it spits out 50MB RAWs every click of the button. Would’ve been an absolutely bonkers amount of data back in the 90’s. 

It would've taken a late 80's to early 90's equivalent of the AWS snowball to do.  

 

I remember a news report from the 90's about NASA trying to gather and store one TB of data on the Earth, and hearing scientists talk about how astounding the amount of data was. I can't find it, but I found this.

https://www.jpl.nasa.gov/news/mission-accomplished-by-twin-telescope-sky-survey

 

Quote

For the past three and a half years, the twin telescopes of the Two Micron All-Sky Survey (2MASS), located in Arizona and Chile, have conducted the first high-resolution digital survey of the complete sky. The successful completion of observations marks a milestone in modern astronomy. For the next two years, data processing will continue for the 24 terabytes of archive data, which is enough to fill more than 2,000 hard drives on an average home computer.

 

"These telescopes have given us the first detailed global view of our Milky Way galaxy and the galaxies that lie beyond," said Dr. Michael Skrutskie, of the University of Massachusetts, Amherst, 2MASS principal investigator. 

 


23 minutes ago, Uttamattamakin said:

Not yet, not for most applications. I can think of scientific applications, where more data is always better, where it might make sense to record all of that.
https://www.phenix.bnl.gov/WWW/publish/rseidl/talks/Triggering In Particle Physics Experiments.pdf

Those do create a lot of data, yes, but not on the order of 100+ PB in a single experimentation run. Further to that, they actually want to do something with that data, so it sits somewhere actually useful and readable so computation can be run against it; the data is generated and stored close to each other, and if it's moved it's across a very high-speed network designed to move that amount of data.

 

Finally when it's no longer needed to be ready on demand it's moved to tape.

 

So my question to you would be for what purpose and to where would someone like for example CERN want to send 100+ PB of data to? What functional need for this is there? Why is CERN wanting to send 100+ PB of data and to where/who?

 

23 minutes ago, Uttamattamakin said:

It makes total sense that things like this exist.  Something more compact than the technology we have now could be a good thing.  Instead of a 40 foot container of network equipment and hard drives for that amount of data. 

The issue is primarily that to move this amount of data you need a sufficient data rate, and that is where the size comes from. You need enough servers to handle that data rate and you need a fast enough physical medium to read it from and write it to. So I don't think much space will be saved by doing this on a ~100 TB optical disc medium, since it would be so painfully slow it wouldn't be practical, which leads to requiring multiple servers connected to one or more optical drives, which still leads to needing a shipping container.

 

Moving 100PB of data is simply difficult and the physical medium the data is written to is only one factor.

 

Have you considered how much computational resources would be required to actually read and write 125PB of data in say one month?


1 minute ago, leadeater said:

Those do create a lot of data, yes, but not on the order of 100+ PB in a single experimentation run. Further to that, they actually want to do something with that data, so it sits somewhere actually useful and readable so computation can be run against it; the data is generated and stored close to each other, and if it's moved it's across a very high-speed network designed to move that amount of data.

100 PB in a single run, no. Scientists don't do single runs for something like a particle physics experiment.

Just to make it more confusing, they don't express the data in bits and bytes in particle physics lingo; they use inverse femtobarns. (LOL, as if this wasn't all confusing enough.)

https://atlas.cern/updates/news/atlas-reaches-milestone-5-inverse-femtobarns-data

 

Quote

In an amazing year that has exceeded our expectations, the Large Hadron Collider has delivered, and ATLAS has recorded, over 5 inverse femtobarns (fb^-1) of collisions. These units correspond to having 3.4 x 10^14 or 340 000 000 000 000 total collisions. Most analyses presented at the last major conference (the Lepton Photon Symposium in August in Mumbai) made use of about 1 fb^-1, so this is a big jump.

 

Now the question is how much data can one gather about each collision?  

As I recall they don't record all of it.  Triggering circuits are used to look for the expected physics ...but unexpected physics is also a thing so gathering more and more data is good.    Yeah if there is a cost effective medium that can store Pb and eventually PB's of data we can fill it.   Network speed may improve enough to make this a moot point but at some point we do want to store it 

 

1 minute ago, leadeater said:

 

Finally when it's no longer needed to be ready on demand it's moved to tape.  

 

So my question to you would be for what purpose and to where would someone like for example CERN want to send 100+ PB of data to? What functional need for this is there? Why is CERN wanting to send 100+ PB of data and to where/who?

Here's my answer for you.  

After the principal researchers are done with the data. In the EU they get a certain period of time, some years, where they and only they have access; then it becomes open to the public.

 

In the USA, data that is financed by the government is open to the public from the start. So there has to be a reasonable way to transfer it. If Pbs of data exist and can be stored on 1, 2 or even 100 such discs, then they have to be willing to provide them at cost.

 

1 minute ago, leadeater said:

 

The issue is primarily that to move this amount of data you need a sufficient data rate, and that is where the size comes from. You need enough servers to handle that data rate and you need a fast enough physical medium to read it from and write it to. So I don't think much space will be saved by doing this on a ~100 TB optical disc medium, since it would be so painfully slow it wouldn't be practical, which leads to requiring multiple servers connected to one or more optical drives, which still leads to needing a shipping container.

All of this is true now. The computers of 10 or 20 years from now are a different story. Remember, in the late 50's to early 60's an HDD was something only suitable for a mainframe with an IBM technician on staff to keep it working. 20 years later it was for high-end computers. 20 more years and every computer had it.

Take the LONG view of things, as I keep mentioning. This is something that young people in 10 or 20 years will look at as "revolutionary". Wow, a disc that can hold a whole Pb with no waiting to download like the "Millennials" used to.

 

1 minute ago, leadeater said:

 

Moving 100PB of data is simply difficult and the physical medium the data is written to is only one factor.

 

Have you considered how much computational resources would be required to actually read and write 125PB of data in say one month?

With current computers it would take a lot.  With future computers and further development we could see that kind of data being moved around more than we can imagine.  

 

1 TB is not enough.   There is no enough. 


1 hour ago, Uttamattamakin said:

Guess what country in the world actually has really cheap high speed internet, and has had it for longer than most of the US. 

Go on Guess. 

 

They took down a Blackhawk helicopter once and we made a movie about it. 

Heh, I mean, be it the Middle East or a bit further, some countries really have good internet. Meanwhile large parts of Eastern Europe are crap.


1 hour ago, Uttamattamakin said:

Every article about this says Petabit.  Not petabyte.    Converting from bits to bytes is so it can be expressed in terms we really use for storage media.   I used an online converter and gave you a screen shot of it.  I used a different one and gave a screen shot of that.  Write to Wolfram with your concerns.   The point of the article and this post is not harmed or inaccurate or misleading in any way. 

Congratulations on being an ignorant person on the internet.

 

The fact you wrote "write to Wolfram with your concerns" shows how ignorant you are and your lack of any level of understanding. Did I say Wolfram was wrong? No, it produced a correct result. Did the online tool you used produce a correct result? Yes. Why was there a difference? Because of the ambiguity in the meaning of the prefix. So, as the ANSI/IEEE 2000 standard had once stated, it's up to the writer to clarify which standard they are choosing to use. It's not the tool's fault you mixed conversions, by using their 125,000 GB followed by calculating in binary for 1 Pb to 1 TB. You mixed types in the same sentence, which is wrong.

 

Using bits instead of bytes as their headlining figure is misleading in their paper. Generally the public, and even storage media, use...you guessed it, BYTES to express capacity. Even in the comparison chart they use as a figure, they utilized bytes. Their paper is literally quoting a double-sided disc as well, so a better comparison when talking DVDs is stating their 800 Tb claim [100 TB].
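To make the unit point concrete, here is the headline 1 Pb figure expressed consistently in each convention (a sketch; the conversion factors are just the standard prefix definitions):

```
# The "1 petabit" headline expressed consistently in each convention.
bits = 1e15                      # 1 Pb, decimal prefix as used in the paper
bytes_total = bits / 8

print(bytes_total / 1e12, "TB")      # 125.0  TB  (decimal prefixes)
print(bytes_total / 2**40, "TiB")    # ~113.7 TiB (binary prefixes)
# Quoting "125,000 GB" and then dividing by 1024s mid-sentence is how the
# earlier back-and-forth about the conversion went sideways.
```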

 

Your prior posts 100% contained inaccurate/misleading statements.

On 2/21/2024 at 6:52 PM, Uttamattamakin said:

The other way around.   There are bits and bits are bits.  But BYTES can be 1000 or 1024.  Some OS's I think windows calles 1024 a MB.  While OSX and Linux call 1024 a MiB.  So calling it a petabit is good.  

You are either arguing in bad faith, or you really should not be making such confident statements like the above...because anyone who even googles around for 30 seconds would realize how terrible your statement was [and you used these fake facts as justification against a quote saying it wasn't misleading].

 

Let me say this again, the quote directly above is misleading, and factually WRONG.

 

 

1 hour ago, Uttamattamakin said:

(As an aside, consider that the above gif is 3.6 MB. There was a time when it would not have fit in RAM at the same time as one's OS. It may take a period as long as from 1993 to now for this to be used, but it's progress; I am confident it will be.)

You are being ignorant of the reality.

 

First, 3.6 MB for a gif is a terrible comparison. GIFs, even back in their day, were not exactly the best format to store things in (unless it was a comic with a specific 256-color palette). Animated GIFs were literally just GIFs layered on top of each other.

 

A modern day video file absolutely destroys the gif compression.  I have a video on my computer, ~2500x1500.  It has 236 frames (16 seconds), no audio...it's 1.7MB...and it's better than your gif.  Although if you were to convert your gif to a more modern format like webp, without loss in quality you get 0.56MB

 

You can't simply take what happened in the past and assume the same will happen in the future.

 

Like I said, Moore's law as it applies to things such as storage space is dead. It's been dead for a good long time now on the HDD medium, and honestly it's feeling like it's starting to be dead for SSDs. The only meaningful changes it could apply to would be at the enterprise level....because again, barring some revolutionary new technology, consumers won't be buying systems with much more than even 4 TB of storage in them.

 

7-8 years ago, if you bought a big box store PC; you would probably get 1-2 TB of storage.  Today, if you go in and buy a big box store PC you get...1 TB of storage (albeit they moved to SSD's, but clearly space isn't the concerning factor).  Yes there are some with more, but it's still only 4TB at the upper end.

 

No this technology will not help the consumer level...and by the time it could even be relevant other technologies will have had to have taken over.

 

1 hour ago, Uttamattamakin said:

One last thing: our beloved message board does not meet your standard of outrage at MB, Mb or mb being capitalized.

You are lacking critical thinking: when you are specifically comparing two things in a single sentence, it's a lot more important not to mess up the nomenclature than in, let's say, a web form (especially if you are the one who was blaming a conversion tool when it was a PEBKAC).

 

1 hour ago, leadeater said:

The issue is primarily that to move this amount of data you need a sufficient data rate, and that is where the size comes from. You need enough servers to handle that data rate and you need a fast enough physical medium to read it from and write it to. So I don't think much space will be saved by doing this on a ~100 TB optical disc medium, since it would be so painfully slow it wouldn't be practical, which leads to requiring multiple servers connected to one or more optical drives, which still leads to needing a shipping container.

 

Moving 100PB of data is simply difficult and the physical medium the data is written to is only one factor.

 

Have you considered how much computational resources would be required to actually read and write 125PB of data in say one month?

I agree with what you are saying, but I think one of the points that wasn't addressed is that when places get to the size of having PBs of storage that also get offsited, the cost of gigabit internet is a lot more trivial, and I'd assume if gigabit were a limiting factor it would just be a matter of getting a better connection.

 

CERN showed transferring the data at 6.25 Gbps.  Even LMG has 10 gig internet.  Google as an example has like a 1+ petabit/second connection between their backbones apparently.

 

Honestly, iirc 10 gig internet isn't really that expensive either

 

So yea, even if we are talking about scientific data that won't be able to have incremental changes, those experiments also have the funds to purchase internet speeds where it could be achievable.

 

 

Now for a bit of unrelated thing

Inbound some guesswork math:

If we think about it, we could also roughly calculate the speed at which it would take to read/write.

Given Sony's archival disc as the best analog, it uses a dual laser approach to write on a double sided disc (which this would presumably do).

 

You can get a maximum of 1.5 Gbps write, and 3 Gbps read [with the note, you can only spin a disc so fast and the likely measurement of 1.5 Gbps occurs at the edge of the disc instead of the center which travels slower].

 

Now if you are going off their claims it's at best ~1000x the areal density [26 Tb/square inch vs 29 Gb/square inch]. So take the most ideal scenario where speeds scale linearly with density (which they don't, as was first seen with Blu-ray having to start at slower speeds due to the consistency of reading data).

 

With that statement, the maximum speed would sit at around 1.5 Tbps and 3 Tbps.

 

So let's go with 1 Pb of data: at those rates that's ~11 minutes to write and ~5.5 minutes to read back, so at least ~17 minutes of data IO required for the transfer...plus the time it takes to physically move it between locations (so when you consider the logistics etc. that will still be an additional few hours...because there's no point in having a location so close).

 

Now sure, at 1 Gbps that will be 11 days, but at 10 Gbps that's only 1.1 days...and if you really had tons of data to consistently move you would likely just use 100 Gbps connections...so a few hours anyway. [Because even Blu-ray archival discs cost in the hundreds of dollars...if you have to burn discs that frequently to justify it, you likely would be able to afford a 100 Gbps connection between datacenters]
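A sketch of the guesswork math above (the 1.5/3 Tbps disc rates are the assumed best-case scaled figures from earlier in this post):

```
# Time to move 1 Pb via the hypothetical disc vs over a network link.
data_bits = 1e15                     # 1 Pb

write_tbps, read_tbps = 1.5, 3.0     # assumed scaled disc write/read rates
write_min = data_bits / (write_tbps * 1e12) / 60
read_min = data_bits / (read_tbps * 1e12) / 60
print(f"disc IO: ~{write_min:.0f} min write, ~{read_min:.1f} min read")

for gbps in (1, 10, 100):            # straight network transfer for comparison
    days = data_bits / (gbps * 1e9) / 86400
    print(f"{gbps:3d} Gbps link: {days:5.2f} days")
# ~11 min / ~5.6 min for the disc; ~11.6, ~1.16 and ~0.12 days for the links.
```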

 

It also doesn't really matter that much if it takes 1 or 2 days to do the initial transfer anyway; when you are talking about the number of days worked on the project, 1 day really doesn't justify moving to a new technology.


13 hours ago, wanderingfool2 said:

So while you say this shouldn't be a debate, I'd like to point out that computers are very much binary systems, and the term had meant 1024 in a lot of cases...it was even taught that way in schools (with the caveat that the storage industry just played by their own definition of megabyte). You are effectively suggesting forcing a standard onto a term that, for the 30 years prior to that standard, had a different meaning.

The only thing I'm asking for is that the units be more clear. The values themselves don't need to change - just how they're displayed to the user(Windows displaying storage space as actual binary, rather than portraying it as decimal for example).

"It pays to keep an open mind, but not so open your brain falls out." - Carl Sagan.

"I can explain it to you, but I can't understand it for you" - Edward I. Koch

Link to comment
Share on other sites

Link to post
Share on other sites

10 minutes ago, Godlygamer23 said:

The only thing I'm asking for is that the units be more clear. The values themselves don't need to change - just how they're displayed to the user(Windows displaying storage space as actual binary, rather than portraying it as decimal for example).

But that's the "debate" part of it.  On the software side of things, it makes a whole lot of sense to formalize things as powers of 2...which is why k, m, g, etc were first used as the binary prefixes.

 

So the question then becomes, why should Windows change?  Software wise you end up with simple bit shifting in order to accomplish things, and general referring to large number limits is better.  Had the ANSI/IEEE 1994 version been followed by hdd manufacturers we likely would have ended up with no ambiguity in this day and age.
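A minimal sketch of the "simple bit shifting" point, and of why an "8 TB" (decimal) drive shows up as roughly 7.3 in Windows; the formatting helper here is hypothetical, purely for illustration:

```
# Binary-prefix sizes are just bit shifts, which is why software leans on them.
KIB, MIB, GIB, TIB = 1 << 10, 1 << 20, 1 << 30, 1 << 40

def format_binary(n_bytes: float) -> str:
    """Format a byte count in binary (IEC) units, the way Windows computes sizes."""
    for unit, size in (("TiB", TIB), ("GiB", GIB), ("MiB", MIB), ("KiB", KIB)):
        if n_bytes >= size:
            return f"{n_bytes / size:.2f} {unit}"
    return f"{n_bytes:.0f} B"

# An "8 TB" drive as marketed (decimal) vs what Explorer reports (binary,
# but labelled "TB"):
print(format_binary(8 * 10**12))   # 7.28 TiB
```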

 

 

If you are asking for the units to be more clear, then back 24 years ago they should have just said kilo means 1024, like what was already defined in a standard that was conventionally followed.  You are saying it shouldn't be up for debate, but the way you wrote it you are implying as though Windows is in the wrong when they effectively stuck to the way it was originally created.

 

You stated that RAM manufacturers are wrong, but again that shows you effectively have a side picked already.  I'd much rather have an ambiguous system of measurements where it states 8 GB RAM, and 512 TB of SSD, than have the "truth" of 8 GiB RAM and 512 TB of SSD...because having Gi and G while talking about things like that will confuse regular people more so [and honestly will lead to additional misuse as people misattribute GiB to be GB and then people writing GiB instead of GB].

 

It's similar to the confusion about encryption, people hear that something isn't E2EE and people now assume there isn't encryption.

 


11 minutes ago, wanderingfool2 said:

But that's the "debate" part of it.  On the software side of things, it makes a whole lot of sense to formalize things as powers of 2...which is why k, m, g, etc were first used as the binary prefixes.

 

So the question then becomes, why should Windows change?  Software wise you end up with simple bit shifting in order to accomplish things, and general referring to large number limits is better.  Had the ANSI/IEEE 1994 version been followed by hdd manufacturers we likely would have ended up with no ambiguity in this day and age.

 

 

If you are asking for the units to be more clear, then back 24 years ago they should have just said kilo means 1024, like what was already defined in a standard that was conventionally followed.  You are saying it shouldn't be up for debate, but the way you wrote it you are implying as though Windows is in the wrong when they effectively stuck to the way it was originally created.

 

You stated that RAM manufacturers are wrong, but again that shows you effectively have a side picked already.  I'd much rather have an ambiguous system of measurements where it states 8 GB RAM, and 512 TB of SSD, than have the "truth" of 8 GiB RAM and 512 TB of SSD...because having Gi and G while talking about things like that will confuse regular people more so [and honestly will lead to additional misuse as people misattribute GiB to be GB and then people writing GiB instead of GB].

 

It's similar to the confusion about encryption, people hear that something isn't E2EE and people now assume there isn't encryption.

Of course I picked a side. I clearly demonstrated the side that I was on, but having clear units being used shouldn't be a debate. Windows and RAM manufacturers are not the ones at fault. It is the industry as a whole allowing the issue to propagate. 

 

You stated that you'd rather keep things ambiguous, than having to explain things to people. The problem is...you already have to explain things to people BECAUSE the units are not clear. I have seen many comments where they claim storage drive manufacturers are not giving them all the space they're advertising on their product page. 

 

Right now, if someone buys a hard drive that's 8TB, and they go into Windows and see 7.27TB, they think they lost space. To them, they lost close to 1TB of space they thought they had...all because Windows states TiB as TB, rather than clearly displaying the values as they actually are, allowing the user to Google "TiB" and then have an understanding of the unit, and maybe get frustrated that they used TiB instead of TB, but now they know, rather than claiming that storage drive manufacturers are falsely advertising their products.

 

Quite frankly, I think you're grasping at straws here. The adjustment time wouldn't be required if we had already totally switched over in the beginning in the first place, and people stopped being stubborn. 

 

Again why this is a debate is seriously beyond me. Humans will be humans, I guess.

"It pays to keep an open mind, but not so open your brain falls out." - Carl Sagan.

"I can explain it to you, but I can't understand it for you" - Edward I. Koch

Link to comment
Share on other sites

Link to post
Share on other sites

6 hours ago, Uttamattamakin said:

100 PB in a single run, no. Scientists don't do single runs for something like a particle physics experiment.

By single run I mean a single experiment run which will have multiple things going on and lots of different data collections etc.

 

The most CERN has done over an entire month, and this was archiving out to tape, is 15.8 PB. CERN's entire active data amount, aka on disk, in 2022 was 184.5 PB. This is everything they have that hasn't been archived out to tape; the archived amount is 424 PB.

 

I meant to give this info last post but forgot 🙃

 

6 hours ago, Uttamattamakin said:

All of this is true now.  The computers of 10 or 20 years from now are a different story.

20 years from now, consumer demands and computers won't be effectively 1 million times faster or using 1 million times the data capacity. 20 years ago the fastest supercomputer was 70 TFLOPs FP64 and another 398 systems were above 1 TFLOP; a Ryzen 7950X is ~0.9 TFLOP.

 

The progression curve is a lot flatter in the latter half of those 20 years, so we won't be having single CPUs or GPUs as fast as, say, the 10 fastest supercomputers of today, probably not even close.

 

None of this actually has much to do with how much data consumers will actually need to store and use though. All those super computers I mentioned, they still have vastly more storage than any desktop or laptop of today.

 

Not that I am saying this technology shouldn't have been developed or anything, but it simply lacks a use case and demand. It's going to take a lot more than just an optical disc with a capacity of 100 TB to make anyone interested in it, since IBM tapes can already do 150 TB. It's not even better than our best archive media of today.


3 hours ago, wanderingfool2 said:

Now sure, at 1 Gbps that will be 11 days, but at 10 Gbps that's only 1.1 days...and if you really had tons of data to consistently move you would likely just use 100 Gbps connections...so a few hours anyway. [Because even Blu-ray archival discs cost in the hundreds of dollars...if you have to burn discs that frequently to justify it, you likely would be able to afford a 100 Gbps connection between datacenters]

We have dual diverse 100 Gbps connections at each campus, as well as guaranteed 100 Gbps from NZ to AUS and 100 Gbps from NZ to North America. We'd be using that first before entertaining the idea of couriering physical media.

https://www.reannz.co.nz/the-network/reannz-network/


6 hours ago, Uttamattamakin said:

100 PB in a single run, no. Scientists don't do single runs for something like a particle physics experiment.

Just to make it more confusing, they don't express the data in bits and bytes in particle physics lingo; they use inverse femtobarns. (LOL, as if this wasn't all confusing enough.)

https://atlas.cern/updates/news/atlas-reaches-milestone-5-inverse-femtobarns-data

A "single run" at CERN is about 4 years long - we are currently in the middle of Run 3 of the LHC. So... yeah they would probably store that much in that amount of time.

 

Strictly speaking, femtobarns are a measure of area, not data. Inverse femtobarns are (roughly speaking) a measure of the number of particle collision events. They don't use bytes because events are what matter to them.

 

6 hours ago, leadeater said:

Those do create a lot of data, yes, but not on the order of 100+ PB in a single experimentation run. Further to that, they actually want to do something with that data, so it sits somewhere actually useful and readable so computation can be run against it; the data is generated and stored close to each other, and if it's moved it's across a very high-speed network designed to move that amount of data.

6 hours ago, Uttamattamakin said:

Now the question is how much data can one gather about each collision?  

As I recall they don't record all of it.  Triggering circuits are used to look for the expected physics ...but unexpected physics is also a thing so gathering more and more data is good.    Yeah if there is a cost effective medium that can store Pb and eventually PB's of data we can fill it.   Network speed may improve enough to make this a moot point but at some point we do want to store it 

 

The triggers they use are ML models running on ASICs using the data flowing directly out of the detectors, which identify whether or not a collision is 'interesting' and determine whether or not that data should be stored before it even reaches a CPU core. This is because the vast majority of collisions are completely useless. If you picture two bullets being shot at each other, we only care about the absolutely perfect, head-on collisions. The off-centre hits or glancing blows aren't interesting and so can be ignored.

 

There's then a massive server farm that performs more in-depth analysis of the information, discarding yet more interactions. This entire process is completed in ~0.2s, at which point the interaction is sent for long-term storage.

 

If everything was stored, the raw data throughput of the ATLAS detector alone would be approaching 100 TB (yes, terabytes) per second. After all the layers of processing, less than 1 percent of that data is stored. But that's only one of the dozens of experiments being performed at CERN.

 

Data that is stored is stored on magnetic tape, but this is mostly for archival purposes to my knowledge. I believe fresh data is stored on hard disks, so that it can be easily transmitted. They don't send anything out via tape anymore as far as I'm aware - certainly my lecturers never got any. They sell old tapes in the LHC giftshop for 10CHF though!

4 hours ago, wanderingfool2 said:

CERN showed transferring the data at 6.25 Gbps.  Even LMG has 10 gig internet.  Google as an example has like a 1+ petabit/second connection between their backbones apparently.

 

Honestly, iirc 10 gig internet isn't really that expensive either

 

So yea, even if we are talking about scientific data that won't be able to have incremental changes, those experiments also have the funds to purchase internet speeds where it could be achievable.

Big science experiments generally don't send things via the public internet. They use academic networks or NRENs, which are usually completely physically separate to public internet infrastructure.

 

In Europe these have all merged to form one massive network called GÉANT, although national ones like the UK's JANET still operate as a subset of that.

 

[Map of the GÉANT network]

 

GÉANT has a throughput of about 7PB/day and offers speeds far beyond those available commercially to big experiments like CERN. But basically every university in Europe has at least one connection onto the network - they claim to have over 10,000 institutions connected. There are also multiple 100Gbit connections going outside of Europe, including across the Atlantic.

 

So no, while data is stored on magnetic tapes for long-term offline storage, it is sent around the world at least predominantly via GÉANT. CERN has access to multiple (I believe 4?) 100Gb connections onto GÉANT as well as connections to the public internet. 10TB of data would only take about 10 minutes to send at that speed and will most definitely be throttled by the connection at the other end. Maybe it's different to institutions on the other side of the world, but at least within Europe, the days of sending CERN data by FedEx are long gone.


25 minutes ago, tim0901 said:

A "single run" at CERN is about 4 years long - we are currently in the middle of Run 3 of the LHC. So... yeah they would probably store that much in that amount of time.

We're probably all meaning something different by "run" 🙃 Anyway, I'm basically trying to convey the amount of data that is actually stored on time scales of a day or a month, to relate it back to how realistically useful a 100 TB optical disc would be. Not all that useful to them, essentially.

 

25 minutes ago, tim0901 said:

The triggers they use are ML models running on ASICs using the data flowing directly out of the detectors, which identify whether or not a collision is 'interesting' and determine whether or not that data should be stored before it even reaches a CPU core. This is because the vast majority of collisions are completely useless. If you picture two bullets being shot at each other, we only care about the absolutely perfect, head-on collisions. The off-centre hits or glancing blows aren't interesting and so can be ignored.

 

There's then a massive server farm that performs more in-depth analysis of the information, discarding yet more interactions. This entire process is completed in ~0.2s, at which point the interaction is sent for long-term storage.

Awesome point and info; it's quite easy in such a discussion to not consider that the vast majority of data is processed before actually being stored on physical media like HDDs. Even a huge-scale HDD-based distributed storage system like CERN's is woefully slow in comparison to memory, and since what CERN does is "real time", keeping it memory resident is a requirement, even between the ASICs and servers/nodes.

 

Quote

Within one year, when the LHC is running, more than one exabyte (the equivalent to 1000 petabytes) of data is being accessed (read or written).

I assume that would be the data that traverses through the HDD storage system.

