Two-thousand-year-old scroll decoded using machine learning

Summary

Several paragraphs of text have been decoded from the inside of a scroll that was buried during the eruption of Mount Vesuvius in AD 79. Scrolls like this one are too fragile to unroll physically, so researchers took high-resolution CT scans of them and released the data to the public. They offered $700,000 to the first team to decode 4 passages from the inside of the scroll using the scans before the end of 2023, along with a number of smaller prizes along the way.

Quotes

Quote

There was one submission that stood out clearly from the rest. Working independently, each member of our team of papyrologists recovered more text from this submission than any other. Remarkably, the entry achieved the criteria we set when announcing the Vesuvius Challenge in March: 4 passages of 140 characters each, with at least 85% of characters recoverable. This was not a given: most of us on the organizing team assigned a less than 30% probability of success when we announced these criteria! And in addition, the submission includes another 11 (!) columns of text — more than 2000 characters total.
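The grand prize threshold quoted above (4 passages of 140 characters, each at least 85% recoverable) can be written as a simple check. This is a minimal sketch based on one reading of the criteria, assuming each passage is scored as a count of characters the papyrologists could recover. `Passage`, `passage_qualifies` and `meets_grand_prize_criteria` are hypothetical names, not the challenge's actual judging code.

```python
from dataclasses import dataclass

# Thresholds quoted in the announcement: 4 passages of 140 characters,
# each at least 85% recoverable.
PASSAGES_REQUIRED = 4
PASSAGE_LENGTH = 140
MIN_RECOVERABLE_PERCENT = 85


@dataclass
class Passage:
    total_chars: int        # characters in the passage
    recoverable_chars: int  # characters the papyrologists could read


def passage_qualifies(p: Passage) -> bool:
    """A passage counts if it is long enough and at least 85% recoverable."""
    if p.total_chars < PASSAGE_LENGTH:
        return False
    # Integer comparison avoids floating-point rounding at the threshold.
    return p.recoverable_chars * 100 >= MIN_RECOVERABLE_PERCENT * p.total_chars


def meets_grand_prize_criteria(passages: list[Passage]) -> bool:
    """True if at least four passages individually meet the threshold."""
    qualifying = sum(1 for p in passages if passage_qualifies(p))
    return qualifying >= PASSAGES_REQUIRED


# For a 140-character passage, the 85% bar means at least 119 readable characters.
print(passage_qualifies(Passage(140, 119)))  # True
print(passage_qualifies(Passage(140, 118)))  # False
```

Scoring each passage separately (rather than pooling all 560 characters) is itself an assumption; a pooled score would let one very good passage compensate for a weak one.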

The results of this review were clear and unanimous: the Vesuvius Challenge Grand Prize of $700,000 is awarded to a team of three for their excellent submission. Congratulations to Youssef Nader, Luke Farritor, and Julian Schilliger!


The submission contains results from three different model architectures, each supporting the findings of the others, with the strongest images often coming from a TimeSformer-based model. Multiple measures prevent overfitting and hallucination, including results from multiple architectures, a study across input/output window sizes, label smoothing, and varying validation folds. Like with all our prizes, this ink detection code has been made public as open source (on GitHub), leveling up everyone in the community.
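Two of the safeguards listed above, label smoothing and varying validation folds, are standard training techniques. The sketch below is not the winning team's code (that is on GitHub); it only illustrates the general ideas in plain Python: softening binary ink / no-ink targets, and a k-fold split that holds out whole (hypothetical) segment IDs for validation. The smoothing factor of 0.1, the fold count and splitting by segment are illustrative assumptions.

```python
import math
import random


def smooth_labels(labels: list[int], epsilon: float = 0.1) -> list[float]:
    """Label smoothing for binary ink / no-ink targets.

    Hard targets 0 and 1 become epsilon/2 and 1 - epsilon/2, so the model
    is never pushed towards fully confident outputs on noisy hand labels.
    """
    return [y * (1.0 - epsilon) + epsilon / 2.0 for y in labels]


def binary_cross_entropy(probs: list[float], targets: list[float]) -> float:
    """Mean binary cross-entropy between predicted ink probabilities and (soft) targets."""
    clamp = 1e-7  # keep probabilities away from 0 and 1 to avoid log(0)
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, clamp), 1.0 - clamp)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)


def k_fold_splits(segment_ids: list[str], k: int = 5, seed: int = 0):
    """Yield (train, validation) lists of segment IDs, one pair per fold.

    Whole segments are held out, so neighbouring, highly correlated pixels
    never end up on both sides of the train/validation split.
    """
    ids = sorted(set(segment_ids))
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, validation
```

Rotating which fold is held out means every region is judged at least once by a model that never saw it during training, which makes it harder for memorised patterns to pass as recovered text.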

Quote

What does the scroll say?

To date, our efforts have managed to unroll and read about 5% of the first scroll. Our eminent team of papyrologists has been hard at work and has achieved a preliminary transcription of all the revealed columns. We now know that this scroll is not a duplicate of an existing work; it contains never-before-seen text from antiquity. The papyrology team are preparing to deliver a comprehensive study as soon as they can. You all gave them a lot of work to do! Initial readings already provide glimpses into this philosophical text. From our scholars:

The general subject of the text is pleasure, which, properly understood, is the highest good in Epicurean philosophy. In these two snippets from two consecutive columns of the scroll, the author is concerned with whether and how the availability of goods, such as food, can affect the pleasure which they provide.
Do things that are available in lesser quantities afford more pleasure than those available in abundance? Our author thinks not: “as too in the case of food, we do not right away believe things that are scarce to be absolutely more pleasant than those which are abundant.” However, is it easier for us naturally to do without things that are plentiful? “Such questions will be considered frequently.”
Since this is the end of a scroll, this phrasing may suggest that more is coming in subsequent books of the same work. At the beginning of the first text, a certain Xenophantos is mentioned, perhaps the same man — presumably a musician — also mentioned by Philodemus in his work On Music.

Philodemus, of the Epicurean school, is thought to have been the philosopher-in-residence of the villa, working in the small library in which the scrolls were found.

Quote

Vesuvius Challenge Stage 2

In 2023 we got from 0% to 5% of a scroll. In 2024 our goal is to go from 5% of one scroll, to 90% of all four scrolls we have scanned, and to lay the foundation to read all 800 scrolls.

The primary goal for 2024 is to read 90% of the scrolls, and we will issue the 2024 Grand Prize to the first team that is able to do this. More details on the exact grand prize judging criteria will be available in March.

My thoughts

This is an achievement that could only have been made with machine learning, and the technical feat here can't be overstated. It has taken a lot of work to get this far, and it's incredible to see what the community can achieve when it's given a goal like this. The $1M+ prize pool (donated by various, mostly wealthy, people) certainly helped to incentivise participation, and it will be interesting to see whether this model gets adopted for other projects in the future. If they achieve their goal of extending this technology to read all 800 scrolls, it will be a big breakthrough in our understanding of Ancient Rome, not to mention the other places where this technology could be used.

Sources

Official announcement: https://scrollprize.org/grandprize

Edited by colonel_mortis
Add details about what the scroll says

Is it just me, or does the scroll look a bit like shit? (literally)

Message me on discord (bread8669) for more help 

Quote me if you want me to get notified

 

Current parts list (PCPartPicker Part List)

CPU: AMD Ryzen 5 7600 3.8 GHz 6-Core Processor  (Purchased For £175.00) 
CPU Cooler: Thermalright Phantom Spirit 120 SE ARGB 66.17 CFM CPU Cooler  (Purchased For £0.00) 
Motherboard: MSI PRO B650M-A WIFI Micro ATX AM5 Motherboard  (Purchased For £144.99) 
Memory: Corsair Vengeance 32 GB (2 x 16 GB) DDR5-6000 CL30 Memory  (Purchased For £89.99) 
Storage: Crucial P5 Plus 500 GB M.2-2280 PCIe 4.0 X4 NVME Solid State Drive  (Purchased For £0.00) 
Storage: Kingston A400 960 GB 2.5" Solid State Drive  (Purchased For £0.00) 
Video Card: Gigabyte GAMING OC Radeon RX 7800 XT 16 GB Video Card  (Purchased For £448.99) 
Case: Lian Li LANCOOL 205M MESH MicroATX Mini Tower Case  (Purchased For £82.98) 
Power Supply: MSI MAG A850GL PCIE5 850 W 80+ Gold Certified Fully Modular ATX Power Supply  (Purchased For £99.00) 
Total: £1040.95

Damn, this space can fit a 5090 (just kidding, it needs more)

Really impressive! The pictures of the swirly mess that the researchers started with really show how big an achievement this is!

14 minutes ago, OhYou_ said:

well, what does the scroll say?

It says on the scroll website. Just scroll down a little.

https://scrollprize.org/grandprize

Quote

Multiple measures prevent overfitting and hallucination, including results from multiple architectures, a study across input/output window sizes, label smoothing, and varying validation folds.

If only all AI projects had the courtesy to double-check their results like this.
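One simple way to double-check results across architectures is to compare their per-pixel ink predictions and only trust ink that every model agrees on. This is a minimal, hypothetical sketch in plain Python; the model names, the 0.5 threshold and the unanimous-vote rule are illustrative assumptions, not what the Vesuvius Challenge winners actually did.

```python
from itertools import combinations


def to_ink_mask(probs: list[float], threshold: float = 0.5) -> list[bool]:
    """Threshold per-pixel ink probabilities into an ink / no-ink mask."""
    return [p >= threshold for p in probs]


def consensus_mask(outputs: dict[str, list[float]], threshold: float = 0.5) -> list[bool]:
    """Keep a pixel as ink only if every model predicts ink there."""
    masks = [to_ink_mask(p, threshold) for p in outputs.values()]
    return [all(pixel) for pixel in zip(*masks)]


def pairwise_agreement(outputs: dict[str, list[float]],
                       threshold: float = 0.5) -> dict[tuple[str, str], float]:
    """Fraction of pixels on which each pair of models gives the same label."""
    masks = {name: to_ink_mask(p, threshold) for name, p in outputs.items()}
    result = {}
    for a, b in combinations(sorted(masks), 2):
        same = sum(x == y for x, y in zip(masks[a], masks[b]))
        result[(a, b)] = same / len(masks[a])
    return result


# Hypothetical predictions from three architectures over the same four pixels.
outputs = {
    "model_a": [0.9, 0.2, 0.7, 0.1],
    "model_b": [0.8, 0.6, 0.4, 0.0],
    "model_c": [0.7, 0.1, 0.9, 0.3],
}
print(consensus_mask(outputs))  # [True, False, False, False]
```

A letter that only one architecture "sees" is a candidate hallucination; low pairwise agreement on a region is a hint to inspect it by hand rather than trust any single model.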

Amazing, it's philosopher beef.

I wonder if people 2000 years from now will be this excited about digging up hard drives full of Twitter arguments.

What the horse considers play, the monkey considers business...

But to Tom, it's all foolery. 

1 hour ago, da na said:

Amazing, it's philosopher beef.

 

I wonder if people 2000 years from now will be this excited about digging up harddrives with twitter arguments

I would still say that it would be a fascinating and humorous look into our past lol. I myself find it fascinating to look back, even within my own lifetime, and see how far things have come, especially in light of how my understanding of various topics has changed over the years.

Sometimes older people find my fascination offensive, almost as if I'm comparing them to cavemen, but in reality it's a genuine fascination with an era I wasn't part of.

"It pays to keep an open mind, but not so open your brain falls out." - Carl Sagan.

"I can explain it to you, but I can't understand it for you" - Edward I. Koch

"I didn't die! I performed a tactical reset!" - Apollolol

10 hours ago, TheLANguy said:

scroll website. Just scroll down a little.

Ba Dum Tss!


Can we appreciate how amazing CT scanning has gotten? It can resolve a 5 µm thick layer of ink rolled up inside a scroll.

The Declaration of Independence, once the charter of democracy, begins by saying that certain things are self-evident. If we were to trace the history of the American mind from Thomas Jefferson to William James, we should find that fewer and fewer things were self-evident, until at last hardly anything is self-evident. (G. K. Chesterton - Aug. 14 1926 (The Illustrated London News))

8 hours ago, FlyingPotato_is_taken said:

Can we appreciate how amazing CT have gotten? Being able to resolve 5µm thick layer of ink rolled up into a scroll.

I agree, that's the impressive feat here; the name of the algorithm they used to make sense of it seems less relevant.

The direction tells you... the direction

-Scott Manley, 2021

On 2/6/2024 at 1:19 PM, colonel_mortis said:

 

My thoughts

This is an achievement that could only have been done with machine learning, and the technical feat here can't be understated. It has taken a lot of work to get this far, it's incredible to see what the community can achieve when it's given a goal like this. The $1M+ prize pool (donated by various mostly rich people) certainly helped to incentivise people to participate, and it will be interesting to see if this model gets adopted for any other projects in the future. If they are able to achieve their goal of extending this technology to read all 800 scrolls, this will be a big breakthrough in our understanding of Ancient Rome, let alone the other potential places where this technology could be used.

 

Sources

Official announcement: https://scrollprize.org/grandprize

This is what AI should be focusing on: translation and transcription, because that is what helps us most. Any form of accessibility enabled by AI is a plus. There aren't enough people in the world who would be willing to learn some obscure language just to translate and transcribe long-dead documents.
