
Summary

Meta has illegally downloaded dozens of terabytes from Z-Library, Libgen and other similar sites to train its AI models.

 

Sources

Video from YouTuber Mental Outlaw.

 

And...

https://www.pcgamer.com/gaming-industry/court-documents-show-not-only-did-meta-torrent-terabytes-of-pirated-books-to-train-ai-models-employees-wouldnt-stop-emailing-each-other-about-it-torrenting-from-a-corporate-laptop-doesnt-feel-right/

 

https://arstechnica.com/tech-policy/2025/02/meta-torrented-over-81-7tb-of-pirated-books-to-train-ai-authors-say/

 

My thoughts

How much money does Meta make?

And yet they don't want to pay for the books... 😠

Another example of corporate greed. Why should the regular home user feel guilty about pirating something small when Meta has pirated millions of books?

Edited by Mumintroll
Added more sources, adjusted the layout

I usually edit my posts.

Refresh the page before replying to my post.


Sadly, nothing will happen.

CPU AMD 5800X_____Asus Crosshair VIII_____Asus Strix LC 360_____RAM Corsair Dominator Pro 2x8GB 3600MHz_____ASUS RTX 3080 Strix

PSU Corsair HX1000W_____CASE Lian Li O11 Dynamic (original choice right? w/9 UNI Fans)_____Keyboard Razer BlackWidow Chroma_____Mouse Razer DeathAdder Chroma_____Headphones Bose QC25_____Monitor (1) Acer Predator XB1 144Hz G-Sync  (2) BenQ 144Hz G-Sync

Microphone Blue Yeti Black

Razer Blade 14

Also an Xbox One S.

 

 


1 hour ago, Mumintroll said:

Meta has illegally downloaded dozens of terabytes from Z-Library, Libgen and other similar sites to train its AI models.

I don't have a problem with it as long as Facebook releases Llama models openly for EVERYONE to use. Taking, transforming and giving back is fair use as far as I am concerned.

I have a problem with OpenAI, which takes everything but gives nothing back.


The problem here is not what the material is used for.

The problem is the piracy of it.

 

The amount of literature that was torrented is huge, but a company as rich as Meta could have afforded to buy it from publishers.

But money is all that matters to them; that's how greed works.

 

Governments and the institutions that enforce the law expect us to follow it. They expect us to feel guilty and ashamed if we pirate a movie or a game, or share it with friends and family.

 

Some go even further, claiming that blocking ads on YouTube videos is piracy and that we should feel guilty about doing it.

 

Do the well-dressed suits at Meta feel guilty?

What will they get? A fine?

What could a person in the USA get for piracy? Up to 5 years in prison and a $250,000 fine?

 

A society based more on open source and free media would be preferable.

 

Edited by Mumintroll
Correction


9 minutes ago, Salted Spinach said:

 

Oh, I thought I read it as free open source. Did you edit?


20 hours ago, 05032-Mendicant-Bias said:

I don't have a problem with it as long as Facebook releases Llama models openly for EVERYONE to use. Taking, transforming and giving back is fair use as far as I am concerned.

If I "take" something to transform and reuse it, I still have to pay to "take" it in the first place. That's the issue with just pirating everything to use as training data, not what happens after that. People have to remember that these corpos could easily license all that material. They just choose not to, in the hope that nobody will notice or cause enough of a stink about it. They have already tried to get copyright protection for AI-generated slop, so they see the value in copyright only as long as they get to benefit from it.

And now a word from our sponsor: 💩

-.-. --- --- .-.. --..-- / -.-- --- ..- / -.- -. --- .-- / -- --- .-. ... . / -.-. --- -.. .

ᑐᑌᑐᑢ


    ▄██████                                                      ▄██▀

  ▄█▀   ███                                                      ██

▄██     ███                                                      ██

███   ▄████  ▄█▀  ▀██▄    ▄████▄     ▄████▄     ▄████▄     ▄████▄██   ▄████▄

███████████ ███     ███ ▄██▀ ▀███▄ ▄██▀ ▀███▄ ▄██▀ ▀███▄ ▄██▀ ▀████ ▄██▀ ▀███▄

████▀   ███ ▀██▄   ▄██▀ ███    ███ ███        ███    ███ ███    ███ ███    ███

 ██▄    ███ ▄ ▀██▄██▀    ███▄ ▄██   ███▄ ▄██   ███▄ ▄███  ███▄ ▄███▄ ███▄ ▄██

  ▀█▄    ▀█ ██▄ ▀█▀     ▄ ▀████▀     ▀████▀     ▀████▀▀██▄ ▀████▀▀██▄ ▀████▀

       ▄█ ▄▄      ▄█▄  █▀            █▄                   ▄██  ▄▀

       ▀  ██      ███                ██                    ▄█

          ██      ███   ▄   ▄████▄   ██▄████▄     ▄████▄   ██   ▄

          ██      ███ ▄██ ▄██▀ ▀███▄ ███▀ ▀███▄ ▄██▀ ▀███▄ ██ ▄██

          ██     ███▀  ▄█ ███    ███ ███    ███ ███    ███ ██  ▄█

        █▄██  ▄▄██▀    ██  ███▄ ▄███▄ ███▄ ▄██   ███▄ ▄██  ██  ██

        ▀███████▀    ▄████▄ ▀████▀▀██▄ ▀████▀     ▀████▀ ▄█████████▄

 


Come on, they would never do something like this.

| Ryzen 7 7800X3D | AM5 B650 Aorus Elite AX | G.Skill Trident Z5 Neo RGB DDR5 32GB 6000MHz C30 | Sapphire PULSE Radeon RX 7900 XTX | Samsung 990 PRO 1TB with heatsink | Arctic Liquid Freezer II 360 | Seasonic Focus GX-850 | Lian Li Lancool III | Zowie GTF-X | Mouse: Vaxee XE wired | Keyboard: Ducky One 3 TKL (Cherry MX Speed Silver) | Beyerdynamic MMX 300 (2nd Gen) | LG 32GS95UV-B OLED 4K 240Hz / 1080p 480Hz dual-mode | OS: Windows 11 |


10 hours ago, Mumintroll said:

The problem here is not what the material is used for.

The problem is the piracy of it.

 

The amount of literature that was torrented is huge, but a company as rich as Meta could have afforded to buy it from publishers.

But money is all that matters to them; that's how greed works.

 

Governments and the institutions that enforce the law expect us to follow it. They expect us to feel guilty and ashamed if we pirate a movie or a game, or share it with friends and family.

 

Some go even further, claiming that blocking ads on YouTube videos is piracy and that we should feel guilty about doing it.

 

Do the well-dressed suits at Meta feel guilty?

What will they get? A fine?

What could a person in the USA get for piracy? Up to 5 years in prison and a $250,000 fine?

 

A society based more on open source and free media would be preferable.

The general issue I have with all this is that by holding AI to copyright law when it comes to consuming media, we are effectively giving the money to the rich and powerful... or to companies that operate outside those copyright laws (e.g. in Japan it's perfectly legal to do what Meta has done).

 

What could happen is that once their models are good enough, these companies start lobbying for stricter enforcement of copyright law, and then you are stuck with the top players always being the top players.

 

Anyway, another issue is what you think should be paid. Let's assume an average e-book is 10 MiB (in reality it's smaller). 35 TiB is the low-end figure cited for the infringement... that implies 3,670,016 books. Fun fact as well: a lot of authors wouldn't allow their works to be consumed by AI for less than the book price (in fact, licenses for that kind of use can go well above that... like thousands). It could feasibly cost them billions... but those numbers also include scholarly journals, and some of those journals you can only get with something like $10,000 subscriptions. Realistically, that could eat up the bulk of their revenue.
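For anyone who wants to check the numbers, here is a minimal back-of-envelope sketch in Python. It assumes, as the post does, a flat 10 MiB per book and 35 TiB in total; the $20 per-book price is a purely hypothetical placeholder, not a figure from the articles or court documents.

```python
# Back-of-envelope estimate: how many e-books fit in the reported volume,
# assuming every book is 10 MiB (a deliberately generous size).
MIB = 1024 ** 2  # bytes in a mebibyte
TIB = 1024 ** 4  # bytes in a tebibyte


def estimated_book_count(total_tib: float, avg_book_mib: float) -> int:
    """Number of whole books of avg_book_mib that fit in total_tib."""
    return int(total_tib * TIB // (avg_book_mib * MIB))


def licensing_cost(book_count: int, price_per_book: float) -> float:
    """Total cost if every book were bought once at a flat price."""
    return book_count * price_per_book


books = estimated_book_count(35, 10)
print(books)  # 3670016
# Hypothetical flat price of $20 per book, for illustration only.
print(f"${licensing_cost(books, 20.0):,.0f}")  # $73,400,320
```

Because smaller real-world e-books mean more titles per terabyte, and per-work licensing fees for AI training could be far above retail, this flat-price figure is closer to a floor than an estimate.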

 

At the moment, AI isn't much of a money maker either, so the ROI is effectively zero for them. In this particular case, though, it all comes down to whether something falls under fair use... but the torrenting part is where they would get into the most trouble, since torrenting means they were technically distributing the data as well.

 

 

This is just my opinion, though: we should allow AI to be trained on any input (whether obtained through piracy or not). The exception is distillation, but that's a whole other subject. For regular material, I think AI should be allowed to train on it without paying for a copyright license (again, there are some exceptions, but I think things like books, source code, etc. should be fair game).

3735928559 - Beware of the dead beef


6 minutes ago, wanderingfool2 said:

This is just my opinion, though: we should allow AI to be trained on any input (whether obtained through piracy or not).

I disagree. Until copyright is completely overhauled and AI is mandatorily nationalized, no AI research in private hands should ever be allowed to train on anything those private hands haven't licensed.


1 hour ago, Avocado Diaboli said:

I disagree. Until copyright is completely overhauled and AI is mandatorily nationalized, no AI research in private hands should ever be allowed to train on anything those private hands haven't licensed.

The issue with taking such a stance is that you risk eroding the "transformative work" argument, which I believe this has a decent chance of falling under.

 

It's easy to go "Meta can pay, so they should pay" without considering the further implications of such a decision. 


1 hour ago, Avocado Diaboli said:

I disagree. Until copyright is completely overhauled and AI is mandatorily nationalized, no AI research in private hands should ever be allowed to train on anything those private hands haven't licensed.

Except that some countries already allow it, so by disallowing it you are in essence slowing down development locally.

 

Overall, there are also plenty of copyright holders who hold their work in too high a regard when it comes to having "AI" trained on it.

 

Take this analogy: should we forbid humans from learning by viewing artwork? What about humans learning by going to Stack Overflow and looking at how others solved things... or looking at GitHub repos to see how things were done? If a human is allowed to do it, why shouldn't an AI be allowed to? After all, you can have humans who look at something and create near recreations of it even after years of not seeing it (Mario's theme being a key example of this... a human essentially copied "Green Green" despite probably not having heard it for about a decade).

 

Generally, it raises the issue of where one draws the line on "learning". If someone created a dish of real neurons (looking at The Thought Emporium here), would that be allowed to consume works without a license? Overall, the media is still being transformed, so I think it should fall under fair use.


2 hours ago, LAwLz said:

The issue with taking such a stance is that you risk eroding the "transformative work" argument, which I believe this has a decent chance of falling under.

 

It's easy to go "Meta can pay, so they should pay" without considering the further implications of such a decision. 

 

AI research is transformative and permitted. It's the commercial use of the AI, after it has consumed content under the research permission, that should be forbidden. If you make an AI model that has consumed anything under fair use, then the model itself must be accessible under the same permissions under which it consumed the content in the first place. If you create a model using only content you created, then you're not obligated to release it.

 

Take a simple situation. Someone writes a joke book. It's on the market for 4 days before someone rips it and throws it on a piracy site, and an AI, not knowing it's a joke book, consumes it. When someone searches for a joke from it in an LLM search, the LLM somehow reproduces 100% of the joke while the book is still in print. Why would you ever buy the book?

 

More to the point, people are using LLMs to generate new "AI slop" books so Amazon will pay them, and they also make their own audiobook versions and stick them on YouTube to collect ad revenue, all reproducing material from this book verbatim without crediting the source.

 

That is the problem with LLMs. Citations and credit disappear, errors/hallucinations creep into the material so it's unreliable, and people end up using it to launder content.

 

 


The shadow that bred them can only mock, it cannot make: not real new things of its own. I don't think it gave life to the AIs, it only ruined them, and twisted them.

Caroline doesn't need to hear all this, she's a highly trained professional.


8 hours ago, LAwLz said:

The issue with taking such a stance is that you risk eroding the "transformative work" argument, which I believe this has a decent chance of falling under.

 

It's easy to go "Meta can pay, so they should pay" without considering the further implications of such a decision. 

7 hours ago, wanderingfool2 said:

Except that some countries already allow it, so by disallowing it you are in essence slowing down development locally.

 

Overall, there are also plenty of copyright holders who hold their work in too high a regard when it comes to having "AI" trained on it.

 

Take this analogy: should we forbid humans from learning by viewing artwork? What about humans learning by going to Stack Overflow and looking at how others solved things... or looking at GitHub repos to see how things were done? If a human is allowed to do it, why shouldn't an AI be allowed to? After all, you can have humans who look at something and create near recreations of it even after years of not seeing it (Mario's theme being a key example of this... a human essentially copied "Green Green" despite probably not having heard it for about a decade).

 

Generally, it raises the issue of where one draws the line on "learning". If someone created a dish of real neurons (looking at The Thought Emporium here), would that be allowed to consume works without a license? Overall, the media is still being transformed, so I think it should fall under fair use.

 

By that logic, I'm allowed to break every copyright in the world by accessing everything ever made for free, because I am the one using it in a transformative way. I'm learning by consuming, enriching my pool of knowledge and experience, which I then employ to shape the world around me with my thoughts and actions. This is where the analogy between training an AI and human learning, and with it the claim that training falls under fair use, completely breaks down, even before we get to the fact that access to these models gets monetized in many scenarios. If I want to learn things that aren't in the public domain, I need to license the media that contain that information. That's what I do when I buy books and movies: I license them for personal use, and whatever I do with the information afterward doesn't require any further compensation to the copyright holder of the work I licensed.


9 minutes ago, Avocado Diaboli said:

By that logic, I'm allowed to break every copyright in the world by accessing everything ever made for free, because I am the one using it in a transformative way. I'm learning by consuming, enriching my pool of knowledge and experience, which I then employ to shape the world around me with my thoughts and actions. This is where the analogy between training an AI and human learning, and with it the claim that training falls under fair use, completely breaks down, even before we get to the fact that access to these models gets monetized in many scenarios. If I want to learn things that aren't in the public domain, I need to license the media that contain that information. That's what I do when I buy books and movies: I license them for personal use, and whatever I do with the information afterward doesn't require any further compensation to the copyright holder of the work I licensed.

Using material for educational purposes is already codified in law, though. It's actually why teachers are technically allowed to, say, photocopy a diagram and distribute it to the class. There is nuance to it, but educational use is generally afforded greater leeway when it comes to using copyrighted material without obtaining the proper license for it.

 

E.g. under fair use, a person who uses a music clip in their video to explain it and teach about it is in the clear (although there have been some notable takedowns where people have filed Content ID claims... but that's separate from the DMCA).

 

Educational use is a bit murky, though, but yeah... generally, downloading something for the sole purpose of education can be a legal defense. The issue in this case, though, is that you won't find any judge who will accept that you used only as much as the purpose required.

 

Regulate the output, not the input.


1 hour ago, wanderingfool2 said:

Using material for educational purposes is already codified in law, though. It's actually why teachers are technically allowed to, say, photocopy a diagram and distribute it to the class. There is nuance to it, but educational use is generally afforded greater leeway when it comes to using copyrighted material without obtaining the proper license for it.

 

E.g. under fair use, a person who uses a music clip in their video to explain it and teach about it is in the clear (although there have been some notable takedowns where people have filed Content ID claims... but that's separate from the DMCA).

 

Educational use is a bit murky, though, but yeah... generally, downloading something for the sole purpose of education can be a legal defense. The issue in this case, though, is that you won't find any judge who will accept that you used only as much as the purpose required.

 

Regulate the output, not the input.

That's only true for using copyrighted materials to teach things to others, not for people learning on their own. Otherwise, university textbooks would be freely available instead of costing an arm and a leg, and piracy of those textbooks wouldn't get prosecuted.

 

Again, the core issue comes down to how copyright actually works. Right now, you have an incentive to create things because you can generate revenue from them and, equally importantly, stop others from simply distributing your work for profit themselves, and you can demand credit for your creations. If you grant AI companies an exception to train, for free, the models they intend to make money from on all that copyrighted content (which, nota bene, also threatens those creators once the model has been trained), you break that protection. So again, until copyright is completely overhauled and, ideally, we move away from capitalism altogether, private AI companies do not deserve an ounce of data they didn't get permission for.


23 hours ago, Avocado Diaboli said:

If I "take" something to transform and reuse it, I still have to pay to "take" it in the first place. That's the issue with just pirating everything to use as training data, not what happens after that. People have to remember that these corpos could easily license all that material. They just choose not to, in the hope that nobody will notice or cause enough of a stink about it. They have already tried to get copyright protection for AI-generated slop, so they see the value in copyright only as long as they get to benefit from it.

My ideological take on this goes as follows:

  • I can read a book in a public library, take inspiration and write a derivative book based on it. Even sell it if I like.
  • I can look at copyrighted images and films playing on demo screens and make derivative images and films based on them. Even sell them if I like.
  • I can look at and play an application on the Apple App Store, Play Store, Steam, etc. and make my own derivative version of it. Even sell it if I like.
  • I can click the button of a camera to snap a picture of something that is not mine, and I own the copyright to that picture.
  • I do all of that using tools: OpenOffice for writing, GIMP for image editing, Shotcut for video editing, etc.

Our civilization has agreed that all of the above is cool: it's fair use, vocational/professional training and creation, and it is celebrated. What's not cool is making a partial copy of someone's work and selling that.

 

What's happening now is that I can also improve the above tools to automate a portion of that workflow with an ML model. I can also use a tool that incorporates ML models developed by others to the same end. The result is transformative; it isn't an exact or partial copy of anything. It is fair use, it is worthy of copyright, and it can be sold if the creator wishes to do so.

The reason I'm fine with it is that everyone everywhere benefits from better automation, even the people in the industry that gets automated. It's a trend that has gone unbroken for millennia, and it's what propelled our civilization from hunter-gatherers to (hopefully) spacefaring.

 

My ONLY condition for the corporations in all this is that ML developed this way should be released openly: the training data, the weights and the censorship applied. Corpos took the sum total of everything our civilization has produced; they aren't entitled to keep it to themselves, and they aren't entitled to distill it into an ML model and sell access behind closed APIs, as far as I am concerned.

 

The world is growing increasingly complex. We sorely need tools to tackle the problems our civilization faces. E.g. game theory evidently prevents humanity from tackling climate change. ML models could help nail down the plasma physics and get us a viable fusion generator. Or what about ML models putting in a million years' worth of work exploring every photocatalyst to find one that is cheap, green and uses sunlight to scrub CO2? ML models have already done hundreds of millennia's worth of protein folding, predicting the structures of 200 million proteins, whereas the painstaking work of researchers had determined around 100,000 of them. And that experimental work was instrumental in folding everything else.

 

TLDR:


I'm cool with Facebook scraping human knowledge and giving me Llama 3.2 to run on my laptop.

 

I'm not cool with OpenAI scraping human knowledge and demanding a subscription to access that knowledge.

 

3 hours ago, wanderingfool2 said:

Using material for educational purposes is already codified in law, though. It's actually why teachers are technically allowed to, say, photocopy a diagram and distribute it to the class. There is nuance to it, but educational use is generally afforded greater leeway when it comes to using copyrighted material without obtaining the proper license for it.

1 hour ago, Avocado Diaboli said:

That's only true for using copyrighted materials to teach things to others, not for people learning on their own. Otherwise, university textbooks would be freely available instead of costing an arm and a leg, and piracy of those textbooks wouldn't get prosecuted.

Some universities have a copy shop nearby where you can get copies of the books you need 😉

 

I believe education is a great equalizer. More advanced countries with higher social mobility also have better education systems. Current LLM tools are having an earth-shattering impact on education; it's one of the areas where they help professors with large classes immensely. LLMs have infinite patience and reasonable accuracy on simpler topics. It's one of the reasons we need more and better lightweight, high-performance models that run locally, not fewer.

 


2 minutes ago, 05032-Mendicant-Bias said:
  • I can read a book in a public library, take inspiration and write a derivative book based on it. Even sell it if I like.

You paid for that through taxes. Libraries also license the material they get.

2 minutes ago, 05032-Mendicant-Bias said:
  • I can look at copyrighted images and films publicly and make derivative images and films based on them. Even sell them if I like.

People who display these things publicly have licensed them for that use.

2 minutes ago, 05032-Mendicant-Bias said:
  • I can look at and play an application on the Apple App Store, Play Store, Steam, etc. and make my own derivative version of it. Even sell it if I like.

You licensed that by either buying the app or subjecting yourself to ads.

2 minutes ago, 05032-Mendicant-Bias said:
  • I can click the button of a camera to snap a picture of something that is not mine, and I own the copyright to that picture.

But you don't own the copyright to what it depicts. Try photographing the Eiffel Tower at night with its lights on and selling that picture.

 

5 minutes ago, 05032-Mendicant-Bias said:

I'm cool with Facebook scraping human knowledge and giving me Llama 3.2 to run on my laptop.

 

I'm not cool with OpenAI scraping human knowledge and demanding a subscription to access that knowledge.

I'm not cool with either doing it. Private hands don't get the benefit of the doubt. 


6 minutes ago, Avocado Diaboli said:

You paid for that through taxes. Libraries also license the material they get

6 minutes ago, Avocado Diaboli said:

I'm not cool with either doing it. Private hands don't get the benefit of the doubt. 

I'm fine with nations using taxes to nationalize ML models and provide foundation models themselves.

As a matter of fact, I prefer that to trusting private firms to keep releasing GGUFs.

 

Right now it's propelled by private corpos because investors like burning money on Nvidia accelerators with no clear path to profitability. The result sits inside my laptop, so I'm cool with having VC-subsidized software :3

 

Hopefully that's what the European AI initiative will result in: national foundation ML models built with everything humanity has and knows, funded by taxes, and released free for the public to benefit from.


I don't think we're disagreeing on the "let it learn" angle, we're only disagreeing on what to do with it once it has learned.

 

Gating the AI model and the code to run it behind a paywall, means that you MUST license every single thing ingested by it, because that is a COMMERCIALLY PRODUCED model. You don't get to take other copyrighted works and just reproduce it by chance or on purpose. Which is what people are doing right now. "read this wikipedia article and generate a 300 page textbook for me to publish on amazon". "Read this novel about a travelling salesman and change his name from Joseph to Ted." The AI LLM's are very good at plagiarism. They however DO NOT create. They have no understanding of sarcasm, nuance, emotions, metaphors, onomatopoeia, etc. They don't know the practical reasons why all mammals can't be replaced with another. If someone wrote a novel about a Cat, and then the LLM was told to make it about a Dog, what would it actually write? It would merely substitute "Cat" with "Dog", but not know that "purr" and "bark" are animal specific sounds.

 

People keep thinking LLMs are fricken magic, and they aren't. They're dumb as rocks and require a lot of human intervention to be anything more. If we ever get to a point where we can train a GenAI by just feeding it human-made media (e.g. books that have to be read visually, videos that have to be watched and listened to, music that has to be listened to) without having to label it, then we will have a breakthrough. Till then, I don't think these "AIs" are commercially viable, and no company with them should be permitted to charge money for access to them if they contain even a single still-under-copyright work that was not licensed in perpetuity for that use. Let users run them locally on their own systems, and "BigAI" can rent out GPU time for customers to run these models on, but please, Meta, OpenAI, Google, Microsoft, don't insult me by trying to keep the AI models and software private.

 

 


4 hours ago, Kisai said:

They, however, DO NOT create. They have no understanding of sarcasm, nuance, emotions, metaphors, onomatopoeia, etc. They don't know the practical reasons why one mammal can't simply be replaced with another. If someone wrote a novel about a Cat, and then the LLM was told to make it about a Dog, what would it actually write? It would merely substitute "Cat" with "Dog", but not know that "purr" and "bark" are animal-specific sounds.

I would say that, in effect, they do have some forms of "sarcasm", "nuance", etc. You are giving a very simplified example, proclaiming that it will merely substitute, but even a basic test would have shown you that you are wrong about it just substituting "cat" with "dog".

 

Quote

My cat likes to purr at the sight of birds and is such a lovely cat.

Can you replace cat with dog

Quote

My dog likes to purr at the sight of birds and is such a lovely dog.

Would you like me to adjust "purr" to something more fitting for a dog, like "wag its tail" or "bark softly"?

 

But hey, it doesn't know that "purr" and "bark" are animal-specific.

 

If you actually gave it a full novel about a cat and said you wanted to change the character to a dog, I bet that in the majority of cases it would replace all those "animal-specific" things with appropriate variants.

 

Yes, it does "plagiarize", or come up with things that are really similar... but you know what, so do humans after they have been exposed to material. Again, one of the famous Mario themes was "plagiarized", and so was MGS' iconic theme (Sviridov's "Winter Road"). A lot of famous books still based much of their story on folklore, etc. Look at some of the Black Mirror episodes: other shows have effectively done very similar concepts as well.

 

I'd say that if you give an AI a specific thing to write about, it would often produce works that really wouldn't be considered copyright infringement.

 

 

Overall, because of the way it works, essentially building a web of words, it does to an extent have some "understanding" of things. While it might not act like a human, that shouldn't necessarily disqualify the fact that it can, to an extent, be "creative".

 

4 hours ago, Kisai said:

If we ever get to a point where we can train a GenAI by just feeding it human-made media (e.g. books that have to be read visually, videos that have to be watched and listened to, music that has to be listened to) without having to label it, then we will have a breakthrough

You can already present some models with images and video, and they will "understand" what is occurring, be able to explain it, and to an extent "adapt" to what is being presented.

3735928559 - Beware of the dead beef

