
33-46% of Amazon's Mechanical Turk workers estimated to use LLMs to automate their work.

Silverflame

Summary

In a paper submitted as a preprint (i.e. not yet peer-reviewed or published) last month, the authors estimate that 33-46% of Amazon's Mechanical Turk (MT) workers use LLMs of some form to be more efficient and/or perform more tasks [0]. MT is a platform where repetitive tasks can be "automated" by hiring a large number of human workers for very little money. This is often used for things like data classification for AI training set preparation, since humans need to label the data manually before it can be used to train machine learning (ML) models or large language models (LLMs).

While it is conceivable that some MT users already automated their work to some extent, the arrival of powerful, widely accessible LLMs like ChatGPT has probably greatly increased both the number of workers automating their MT tasks and the extent to which those tasks can be automated. (This point is from the more accessible TechCrunch article covering the paper [1].)

 

Quotes


With the widespread adoption of LLMs, human gold-standard annotations are key to understanding the capabilities of LLMs and the validity of their results. However, crowdsourcing, an important, inexpensive way to obtain human annotations, may itself be impacted by LLMs, as crowd workers have financial incentives to use LLMs to increase their productivity and income. [...] We reran an abstract summarization task from the literature on Amazon Mechanical Turk and [...] estimate that 33-46% of crowd workers used LLMs when completing the task.

 

My thoughts

"I used the AI to train the AI." (Oh and also somewhat: https://xkcd.com/2494/)
Flawed Data

But seriously though, this is simultaneously hilarious and worrying. The industry has relied on services like Mechanical Turk for human verification and labelling of data; if a significant proportion of such services' workers are now using some automated system to earn more (who can blame them, I'm sure it's not a well-paid or fun job), will the next generation of ML or LLM systems be less accurate?
The paper addresses concerns about detecting artificially generated text in more detail. Reassuringly, the academics behind it seem to have been aware of the poor accuracy of most tools that claim to do this, and used a custom detector which they double-checked afterwards. Assuming the 33-46% figure is mostly sound, then, that is a significant number of workers using LLMs. And as these systems get better, more people will undoubtedly use them unless Amazon finds some way to block them, which is going to be tough given the "work from anywhere, anytime" spirit of Mechanical Turk...
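To make that concrete, here is a minimal sketch of what that kind of detector looks like in Python: train a classifier on responses known to be human-written vs. known to be LLM-generated, then apply it to the collected submissions. This is my own simplification with placeholder data, not the authors' actual pipeline.

```python
# Minimal synthetic-text detector sketch (my simplification, not the
# paper's actual method). All text lists below are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

human_texts = ["placeholder human-written summary ..."]
llm_texts = ["placeholder LLM-generated summary ..."]

detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # word + bigram features
    LogisticRegression(max_iter=1000),
)
detector.fit(human_texts + llm_texts,
             [0] * len(human_texts) + [1] * len(llm_texts))

# Fraction of collected submissions flagged as LLM-generated:
submissions = ["placeholder collected MTurk response ..."]
print(detector.predict(submissions).mean())
```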

 

There is also the question of whether we should be worried at all. Recent studies on image classification and computer vision suggest that feeding generated data back in can actually improve the models further [2, 3, 4]. Could the same be true for LLMs?

 

Sources

[0]: Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks (Veselovsky et al., 2023; https://arxiv.org/abs/2306.07899)
[1]: https://techcrunch.com/2023/06/14/mechanical-turk-workers-are-using-ai-to-automate-being-human/
[2]: Training on Thin Air: Improve Image Classification with Generated Data (Zhou et al., 2023; https://arxiv.org/abs/2305.15316)
[3]: Leaving Reality to Imagination: Robust Classification via Generated Datasets (Bansal & Grover, 2023; https://arxiv.org/abs/2302.02503)
[4]: A data augmentation perspective on diffusion models and retrieval (Burg et al., 2023; https://arxiv.org/abs/2304.10253)


On the first part: a new tool can be used to make money, so it gets used to make money. That isn't too surprising.

 

As for the big picture, all iterative data models are going to be prone to becoming self-referential and losing "significance" with respect to the real-world task at hand. It's why statistical significance and relativity are so important. If you understand why they're so important, you also know why we're in the replication crisis right now.

 

Going forward, I would expect a lot of LLMs to go through the 1% Error Problem. (I think there's a technical name for it, but I would need to track it down.) Each iteration is going to pick up errors and encode them, getting better at being itself each time but slowly accumulating errors that will cause issues in further iterations. Your model will end up being exceptional at what you've fed it, not what you want it to do. And that's going to include information you didn't realize you'd given it. You also pick up error factors with each data set, since the data itself isn't error-free.
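A toy simulation (my own illustration, not from any source in this thread) shows the shape of the problem: each "generation" is fit to data produced by the previous generation plus a small systematic error, and the errors compound.

```python
import random

random.seed(0)
true_value = 1.0
model = true_value              # generation 0: trained on real data
for generation in range(1, 11):
    # This generation's training data comes from the previous model's
    # output, contaminated with a small (~1%) systematic bias.
    samples = [model * 1.01 + random.gauss(0, 0.05) for _ in range(1000)]
    model = sum(samples) / len(samples)   # "retraining" = fit the mean
    print(f"gen {generation:2d}: estimate = {model:.3f}")
# Each step looks nearly perfect on its own, but after 10 generations
# the estimate has drifted about 10% away from the true value.
```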

 

At some level, we're about to live through a global Garbage In, Garbage Out event in the AI/LLM/Tech space.  It's going to be hilarious. 


2 hours ago, Taf the Ghost said:

Going forward, I would expect a lot of LLMs to go through the 1% Error Problem. (I think there's a technical name for it, but I would need to track it down.)

Likely positive feedback loops

https://en.wikipedia.org/wiki/Positive_feedback

 

or error propagation:
https://en.wikipedia.org/wiki/Propagation_of_uncertainty


53 minutes ago, cmndr said:

Likely positive feedback loops [...] or error propagation

Or both at the same time. Last time I dealt with it was in some finance modeling discussions. Models are like SatNav: they're great right up until the moment they tell you to turn into a lake. haha


2 hours ago, Taf the Ghost said:

Or both at the same time. Last time I dealt with it was in some finance modeling discussions. Models are like SatNav: they're great right up until the moment they tell you to turn into a lake. haha

I want to say that error propagation is a specific type of feedback loop.
It's not necessarily the end of the world if the long-run pattern is to converge on the true value, but... if it's a positive feedback loop... eugh.

"the model is only off by +10% each time"

1.1 -> 1.21 -> 1.33 -> 1.46 -> 1.61 -> 1.77 -> 1.95 -> 2.14 -> 2.36 -> 2.59 -> 2.85 -> 3.14

 

The good news is that after enough iterations it looks like you get pi. The bad news is that it isn't pi, and it's supposed to be 1.
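That series is just a +10% relative error compounded per iteration, i.e. 1.1**n, which a couple of lines of Python confirm:

```python
# A +10% relative error compounded per iteration is just 1.1**n.
x = 1.0
for n in range(1, 13):
    x *= 1.1
    print(n, round(x, 2))  # ..., 12 -> 3.14: looks like pi, should be 1.0
```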


So they're basically proving that their jobs aren't needed. Brilliant. 


Humanity is moving closer to the goal where no one has to work, everything is done by AI and robots, and everyone receives the same universal income.

/s (is it though?)


On 7/7/2023 at 5:01 PM, Silverflame said:

But seriously though, this is simultaneously hilarious and worrying. The industry has relied on services like Mechanical Turk for human verification and labelling of data; if a significant proportion of such services' workers are now using some automated system to earn more (who can blame them, I'm sure it's not a well-paid or fun job), will the next generation of ML or LLM systems be less accurate?

In my opinion, the "next generation" of LLMs will have to get very creative if they hope to achieve significantly better performance. GPT-4 is already trained on a set so large that achieving the orders-of-magnitude larger set needed for a substantial breakthrough is going to be extremely hard, if not impossible.

 

Don't be worried though; regardless of how human-like these systems appear, you should always assume they are lying to you. An LLM does not know what it's talking about, and the size of the data set is not related to how "correct" the system ends up being.

On 7/8/2023 at 5:43 AM, dizmo said:

So they're basically proving that their jobs aren't needed. Brilliant. 

Their job is labeling text snippets to train LLMs, so it is quite important that a human does it if you want good results. The problem is that with this kind of outsourcing to underpaid workers, you can never be sure of the quality of those labels, and if you tried to check, it would just mean doing the whole job again yourself...


16 hours ago, Sauron said:

In my opinion, the "next generation" of LLMs will have to get very creative if they hope to achieve significantly better performance. GPT-4 is already trained on a set so large that achieving the orders-of-magnitude larger set needed for a substantial breakthrough is going to be extremely hard, if not impossible.

Yes, we'll need a paradigm shift in how we implement or approach LLMs before significant improvements over GPT-4 can be seen. IIRC, a WAN Show a month or two ago talked about how OpenAI had already said there were significantly diminishing returns with GPT-5...

And Meta and others have achieved incredible results with smaller sets and fewer parameters. So new technology of some sort seems to be the required next step. There are some interesting ideas out there, like Forward-Forward [0], but it's basically just a concept for now.
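For reference, the core idea of Forward-Forward is that each layer is trained locally: "positive" (real) inputs should produce high "goodness" (the sum of squared activations) and "negative" (fake) inputs low goodness, with no end-to-end backward pass. Below is a heavily simplified sketch; the layer sizes, threshold, and toy data are my own illustrative choices, not anything from the paper [0].

```python
import torch

class FFLayer(torch.nn.Module):
    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = torch.nn.Linear(d_in, d_out)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Pass only the direction of the activity vector upward; its
        # length (the goodness) stays local to this layer.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        # Goodness = sum of squared activations per sample.
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        # Logistic loss pushing positive goodness above the threshold
        # and negative goodness below it.
        loss = torch.nn.functional.softplus(
            torch.cat([self.threshold - g_pos,
                       g_neg - self.threshold])).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Detach so no gradient flows between layers: the next layer
        # trains on this one's output independently.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()

# Toy usage: two layers trained greedily on made-up "positive" data
# and corrupted "negative" data (purely illustrative).
x_pos = torch.randn(128, 20)
x_neg = x_pos[torch.randperm(128)] + torch.randn(128, 20)
layers = [FFLayer(20, 64), FFLayer(64, 64)]
for _ in range(200):
    h_pos, h_neg = x_pos, x_neg
    for layer in layers:
        h_pos, h_neg = layer.train_step(h_pos, h_neg)
```

Each layer has its own optimizer and loss, so there is no global backward pass, which is the property that makes the idea interesting.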

16 hours ago, Sauron said:

Don't be worried though; regardless of how human-like these systems appear, you should always assume they are lying to you. An LLM does not know what it's talking about, and the size of the data set is not related to how "correct" the system ends up being.

I mean, given that we as humans can't even agree on what "truth" is, it's hard to imagine how we'd teach/program an AI to distinguish it. Alignment and all that.
And at the point where an AI can reason about truth, correctness, nuance, etc., we probably have much bigger ethical questions concerning such a system ^^;;


[0]: The Forward-Forward Algorithm: Some Preliminary Investigations (Hinton, 2022; https://arxiv.org/abs/2212.13345)


18 minutes ago, Silverflame said:

I mean, given that we as humans can't even agree on what "truth" is, it's hard to imagine how we'd teach/program an AI to distinguish it. Alignment and all that.

You have alignment issues when you accidentally give your AI the wrong objective; in this case, however, the AI does do what it was built to do, which is to generate text that could pass as being written by a human, not necessarily to give you accurate information. It's like building a tractor and being surprised that it can't fly.

21 minutes ago, Silverflame said:

And at the point where an AI can reason about truth, correctness, nuance, etc., we probably have much bigger ethical questions concerning such a system ^^;;

I don't think it's inconceivable to build an AI system that is virtually always truthful; it would just need to take its information from known-good (or "good enough") sources. It probably wouldn't be as close to human writing patterns, though. It all depends on what you're looking for, but consider that instead of asking an AI chatbot something like "who was the president of the US in 1856", I could just open a Wikipedia page and get the same information with less ambiguity.


On 7/10/2023 at 5:02 PM, Stahlmann said:

Humanity is moving closer to the goal where no one has to work, everything is done by AI and robots, and everyone receives the same universal income.

This should become a circle of hell in addition to Dante's interpretation of hell in the Divina Commedia.


On 7/7/2023 at 5:01 PM, Silverflame said:

This is often used for things like data classification for AI training set preparation,

So humans are using AI to train AI.
Alrighty then.
