
33-46% of Amazon's Mechanical Turk workers estimated to use LLMs to automate their work.

Silverflame

Summary

In a paper submitted as a preprint (i.e. not yet peer-reviewed or published) last month, the authors estimate that 33-46% of Amazon's Mechanical Turk (MT) workers use LLMs of some form to be more efficient and/or perform more tasks [0]. MT is a platform where repetitive tasks can be "automated" by hiring a large number of human workers for very little money. This is often used for things like data classification for AI training set preparation, since humans need to label the data manually before it can be used to train machine learning (ML) models or large language models (LLMs).

While it is conceivable that some MT users already automated their work to some extent, the arrival of powerful, widely accessible LLMs like ChatGPT has probably greatly increased both the number of workers automating their MT tasks and the extent to which those tasks can be automated. (This point is from the more accessible TechCrunch article covering the paper [1].)

 

Quotes


With the widespread adoption of LLMs, human gold-standard annotations are key to understanding the capabilities of LLMs and the validity of their results. However, crowdsourcing, an important, inexpensive way to obtain human annotations, may itself be impacted by LLMs, as crowd workers have financial incentives to use LLMs to increase their productivity and income. [...] We reran an abstract summarization task from the literature on Amazon Mechanical Turk and [...] estimate that 33-46% of crowd workers used LLMs when completing the task.

 

My thoughts

"I used the AI to train the AI." (Oh and also somewhat: https://xkcd.com/2494/)
Flawed Data

But seriously though, this is simultaneously hilarious and worrying. The industry has relied on services like Mechanical Turk for human verification and labelling of data; if a significant proportion of such services' workers are now using some automated system to earn more (who can blame them, I'm sure it's not a well-paid or fun job), will the next generation of ML or LLM systems be less accurate?
The paper addresses concerns about detecting artificially generated text in more detail. Reassuringly, the academics behind it seem to have been aware of the poor accuracy of most tools that claim to do this, and used a custom detector which they double-checked afterwards. Assuming the 33-46% figure is mostly sound, then, that is a significant number of workers using LLMs. And as these systems get better, more people will undoubtedly use them unless Amazon finds some way to block them, which is going to be tough given the "work from anywhere, anytime" spirit of Mechanical Turk...
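To make that concrete, here is a minimal sketch of what that kind of detector looks like in Python: train a classifier on responses known to be human-written vs. known to be LLM-generated, then apply it to the collected submissions. This is my own simplification with placeholder data, not the authors' actual pipeline.

```python
# Minimal synthetic-text detector sketch (my simplification, not the
# paper's actual method). All text lists below are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

human_texts = ["placeholder human-written summary ..."]
llm_texts = ["placeholder LLM-generated summary ..."]

detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # word + bigram features
    LogisticRegression(max_iter=1000),
)
detector.fit(human_texts + llm_texts,
             [0] * len(human_texts) + [1] * len(llm_texts))

# Fraction of collected submissions flagged as LLM-generated:
submissions = ["placeholder collected MTurk response ..."]
print(detector.predict(submissions).mean())
```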

 

There is also the question of whether we should be worried at all. Recent studies on image classification and computer vision suggest that feeding generated data back in can actually improve the models further [2, 3, 4]. Could the same be true for LLMs?

 

Sources

[0]: Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks (Veselovsky et al., 2023; https://arxiv.org/abs/2306.07899)
[1]: https://techcrunch.com/2023/06/14/mechanical-turk-workers-are-using-ai-to-automate-being-human/
[2]: Training on Thin Air: Improve Image Classification with Generated Data (Zhou et al., 2023; https://arxiv.org/abs/2305.15316)
[3]: Leaving Reality to Imagination: Robust Classification via Generated Datasets (Bansal & Grover, 2023; https://arxiv.org/abs/2302.02503)
[4]: A data augmentation perspective on diffusion models and retrieval (Burg et al., 2023; https://arxiv.org/abs/2304.10253)


On the first part: a new tool can be used to make money, so it gets used to make money. That isn't too surprising.

 

As for the big picture, all iterative data models are going to be prone to becoming self-referential and losing "significance" with respect to the real-world task at hand. It's why statistical significance and relativity are so important. If you understand why they're so important, you also know why we're in the replication crisis right now.

 

Going forward, I would expect a lot of LLMs to go through the 1% Error Problem. (I think there's a technical name for it, but I would need to track it down.) Each iteration is going to pick up errors and encode them, getting better at being itself each time but slowly accumulating errors that will cause issues in further iterations. Your model will end up being exceptional at what you've fed it, not what you want it to do. And that's going to include information you didn't realize you'd given it. You also pick up error factors with each data set, since the data itself isn't error-free.
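A toy simulation (my own illustration, not from any source in this thread) shows the shape of the problem: each "generation" is fit to data produced by the previous generation plus a small systematic error, and the errors compound.

```python
import random

random.seed(0)
true_value = 1.0
model = true_value              # generation 0: trained on real data
for generation in range(1, 11):
    # This generation's training data comes from the previous model's
    # output, contaminated with a small (~1%) systematic bias.
    samples = [model * 1.01 + random.gauss(0, 0.05) for _ in range(1000)]
    model = sum(samples) / len(samples)   # "retraining" = fit the mean
    print(f"gen {generation:2d}: estimate = {model:.3f}")
# Each step looks nearly perfect on its own, but after 10 generations
# the estimate has drifted about 10% away from the true value.
```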

 

At some level, we're about to live through a global Garbage In, Garbage Out event in the AI/LLM/Tech space.  It's going to be hilarious. 


2 hours ago, Taf the Ghost said:

Going forward, I would expect a lot of LLMs to go through the 1% Error Problem. (I think there's a technical name for it, but I would need to track it down.)

Likely positive feedback loops

https://en.wikipedia.org/wiki/Positive_feedback

 

or error propagation:
https://en.wikipedia.org/wiki/Propagation_of_uncertainty


53 minutes ago, cmndr said:

Likely positive feedback loops [...] or error propagation

Or both at the same time. Last time I dealt with it was in some finance modeling discussions. Models are like SatNav: they're great right up until the moment they tell you to turn into a lake. haha


2 hours ago, Taf the Ghost said:

Or both at the same time. Last time I dealt with it was in some finance modeling discussions. Models are like SatNav: they're great right up until the moment they tell you to turn into a lake. haha

I want to say that error propagation is a specific type of feedback loop.
It's not necessarily the end of the world if the long-run pattern is to converge on the true value, but... if it's a positive feedback loop... eugh.

"the model is only off by +10% each time"

1.1 -> 1.21 -> 1.33 -> 1.46 -> 1.61 -> 1.77 -> 1.95 -> 2.14 -> 2.36 -> 2.59 -> 2.85 -> 3.14

 

The good news is that after enough iterations it looks like you get pi. The bad news is that it isn't pi, and it's supposed to be 1.
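That series is just a +10% relative error compounded per iteration, i.e. 1.1**n, which a couple of lines of Python confirm:

```python
# A +10% relative error compounded per iteration is just 1.1**n.
x = 1.0
for n in range(1, 13):
    x *= 1.1
    print(n, round(x, 2))  # ..., 12 -> 3.14: looks like pi, should be 1.0
```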


So they're basically proving that their jobs aren't needed. Brilliant. 


Humanity is moving closer to the goal where no one has to work, everything is done by AI and robots, and everyone receives the same universal income.

/s (is it though?)


On 7/7/2023 at 5:01 PM, Silverflame said:

But seriously though, this is simultaneously hilarious and worrying. The industry has relied on services like Mechanical Turk for human verification and labelling of data; if a significant proportion of such services' workers are now using some automated system to earn more (who can blame them, I'm sure it's not a well-paid or fun job), will the next generation of ML or LLM systems be less accurate?

In my opinion, the "next generation" of LLMs will have to get very creative if they hope to achieve significantly better performance. GPT-4 is already trained on a set so large that achieving the orders-of-magnitude larger set needed for a substantial breakthrough is going to be extremely hard, if not impossible.

 

Don't be worried though; regardless of how human-like these systems appear, you should always assume they are lying to you. An LLM does not know what it's talking about, and the size of the data set is not related to how "correct" the system ends up being.

On 7/8/2023 at 5:43 AM, dizmo said:

So they're basically proving that their jobs aren't needed. Brilliant. 

Their job is labeling text snippets to train LLMs, so it is quite important that a human does it if you want good results. The problem is that with this kind of outsourcing to underpaid workers, you can never be sure of the quality of those labels, and if you tried to check, it would just mean doing the whole job again yourself...


16 hours ago, Sauron said:

In my opinion, the "next generation" of LLMs will have to get very creative if they hope to achieve significantly better performance. GPT-4 is already trained on a set so large that achieving the orders-of-magnitude larger set needed for a substantial breakthrough is going to be extremely hard, if not impossible.

Yes, we'll need a paradigm shift in how we implement or approach LLMs before significant improvements over GPT-4 can be seen. IIRC, a WAN Show a month or two ago talked about how OpenAI had already said there were significantly diminishing returns with GPT-5...

And Meta and others have achieved incredible results with smaller sets and fewer parameters. So new technology of some sort seems to be the required next step. There are some interesting ideas out there, like Forward-Forward [0], but it's basically just a concept for now.
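For reference, the core idea of Forward-Forward is that each layer is trained locally: "positive" (real) inputs should produce high "goodness" (the sum of squared activations) and "negative" (fake) inputs low goodness, with no end-to-end backward pass. Below is a heavily simplified sketch; the layer sizes, threshold, and toy data are my own illustrative choices, not anything from the paper [0].

```python
import torch

class FFLayer(torch.nn.Module):
    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = torch.nn.Linear(d_in, d_out)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Pass only the direction of the activity vector upward; its
        # length (the goodness) stays local to this layer.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        # Goodness = sum of squared activations per sample.
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        # Logistic loss pushing positive goodness above the threshold
        # and negative goodness below it.
        loss = torch.nn.functional.softplus(
            torch.cat([self.threshold - g_pos,
                       g_neg - self.threshold])).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Detach so no gradient flows between layers: the next layer
        # trains on this one's output independently.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()

# Toy usage: two layers trained greedily on made-up "positive" data
# and corrupted "negative" data (purely illustrative).
x_pos = torch.randn(128, 20)
x_neg = x_pos[torch.randperm(128)] + torch.randn(128, 20)
layers = [FFLayer(20, 64), FFLayer(64, 64)]
for _ in range(200):
    h_pos, h_neg = x_pos, x_neg
    for layer in layers:
        h_pos, h_neg = layer.train_step(h_pos, h_neg)
```

Each layer has its own optimizer and loss, so there is no global backward pass, which is the property that makes the idea interesting.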

16 hours ago, Sauron said:

Don't be worried though; regardless of how human-like these systems appear, you should always assume they are lying to you. An LLM does not know what it's talking about, and the size of the data set is not related to how "correct" the system ends up being.

I mean, given that we as humans can't even agree on what "truth" is, it's hard to imagine how we'd teach/program an AI to distinguish it. Alignment and all that.
And at the point where an AI can reason about truth, correctness, nuance, etc., we probably have much bigger ethical questions concerning such a system ^^;;


[0]: The Forward-Forward Algorithm: Some Preliminary Investigations (Hinton, 2022; https://arxiv.org/abs/2212.13345)


18 minutes ago, Silverflame said:

I mean, given that we as humans can't even agree on what "truth" is, it's hard to imagine how we'd teach/program an AI to distinguish it. Alignment and all that.

You have alignment issues when you accidentally give your AI the wrong objective; in this case, however, the AI does do what it was built to do, which is to generate text that could pass as being written by a human, not necessarily to give you accurate information. It's like building a tractor and being surprised that it can't fly.

21 minutes ago, Silverflame said:

And at the point where an AI can reason about truth, correctness, nuance, etc., we probably have much bigger ethical questions concerning such a system ^^;;

I don't think it's inconceivable to build an AI system that is virtually always truthful; it would just need to take its information from known-good (or "good enough") sources. It probably wouldn't be as close to human writing patterns, though. It all depends on what you're looking for, but consider that instead of asking an AI chatbot something like "who was the president of the US in 1856", I could just open a Wikipedia page and get the same information with less ambiguity.


On 7/10/2023 at 5:02 PM, Stahlmann said:

Humanity is moving closer to the goal where no one has to work, everything is done by AI and robots, and everyone receives the same universal income.

This should become a circle of hell in addition to Dante's interpretation of hell in the Divina Commedia.


On 7/7/2023 at 5:01 PM, Silverflame said:

This is often used for things like data classification for AI training set preparation,

So humans are using AI to train AI.
Alrighty then.
