Don't Anthropomorphize Large Language Models (they hate that!)

Old man venting another pet peeve here.  I wish Linus and Luke (and basically most of the world) would stop referring to large language models (LLMs) producing non-useful text as "hallucinations".  It only contributes to the general public's misunderstanding that LLMs are anything approaching real artificial intelligence.

 

LLMs just produce text statistically similar to their lexicon.  They really aren't even knowledge bases, and they are certainly not logic engines.  It's no surprise when they produce seemingly accurate responses to very common questions, but it's also no surprise when they say bizarre things in response to questions that are rarely covered in that lexicon.  Calling these bad answers "hallucinations" makes it sound like we only need to make these AI models a bit more sane, or a bit more human-like.  But that's not the case at all.  They're simply incapable of producing useful responses to many questions.

 

Corporations selling LLMs as if they were natural-language knowledge bases fully know this truth (well, the mid-level engineers know it; the profit-driven top-level execs may or may not get it).  But they don't care if corner-case questions produce dangerous answers, because (I assume) they expect those bizarre corner-case answers will only injure or kill a very small number of people relative to the large profit they hope to make.

 

This just makes the use of the term "hallucinations" even worse.  Not only does it confuse people about the nature of LLMs, it also lets the giant corporations that are selling them off the hook for the entirely predictable results we are seeing.

45 minutes ago, Thomas A. Fine said:

referring to large language models (LLMs) producing non-useful text as "hallucinations"

but.. that is just the term chosen to express what it is doing. it is not any different from calling repeated brief dips in framerate "stuttering".

 

the idea behind using that term is that it makes it very easy to understand what is going wrong. you feed the LLM a query, and it produces something that has no basis in reality with a very high degree of confidence, in the same way that a person who is hallucinating might 'see things' with the same degree of confidence.

 

i'm gonna use an argument i've been using an awful lot recently... if you trip over the word "hallucination" being used by youtubers around LLMs.. you should've been far more upset about other things long before we got to shouting at LMG for their use of what is essentially an industry term.

the fact an LLM is being used in a search engine, the fact there are companies profiteering off LLM 'virtual girlfriends', the fact these are being trained with copyrighted materials from non-consenting parties, the fact the makers of some of these LLMs are the ones breeding said misunderstanding in their marketing materials from the very roots of the industry... in all of this, the word used to describe "the human brain producing garbage" being used to describe "software producing garbage" is the least of anyone's problems..

 

or.. perhaps instead of complaining you should offer up a better term to use instead.

2 hours ago, manikyath said:

or.. perhaps instead of complaining you should offer up a better term to use instead.

you could say the LLM is "assuming the correct answer (based on its available information)", but "hallucinating" rolls off the tongue better

Note: Users receive notifications after Mentions & Quotes. 

Feel free to ask any questions regarding my comments/build lists. I know a lot about PCs but not everything.

current PC:

Ryzen 5 5600 |16GB DDR4 3200Mhz | B450 | GTX 1080 ti

PCs I used before:

  1. Pentium G4500 | 4GB/8GB DDR4 2133Mhz | H110 | GTX 1050
  2. Ryzen 3 1200 3,5Ghz / OC:4Ghz | 8GB DDR4 2133Mhz / 16GB 3200Mhz | B450 | GTX 1050
  3. Ryzen 3 1200 3,5Ghz | 16GB 3200Mhz | B450 | GTX 1080 ti

2 hours ago, manikyath said:

Or.. perhaps instead of complaining you should offer up a better term to use instead.

how about "mistake".

 

eg - I asked Chat GPT what the moon was made from. It replied that it was made of cheese. It made a mistake.

 

Nah, that would be too easy and logical.

47 minutes ago, Blue4130 said:

I asked Chat GPT what the moon was made from. It replied that it was made of cheese. It made a mistake.

"Mistake" is fine for a simple wrong point but isn't very appropriate/descriptive for the usual case where "hallucination" is used though, where it spits out a detailed "explanation" that's wrong in so many different ways or a mishmash of completely unrelated things.

F@H
Desktop: i9-13900K, ASUS Z790-E, 64GB DDR5-6000 CL36, RTX3080, 2TB MP600 Pro XT, 2TB SX8200Pro, 2x16TB Ironwolf RAID0, Corsair HX1200, Antec Vortex 360 AIO, Thermaltake Versa H25 TG, Samsung 4K curved 49" TV, 23" secondary, Mountain Everest Max

Mobile SFF rig: i9-9900K, Noctua NH-L9i, Asrock Z390 Phantom ITX-AC, 32GB, GTX1070, 2x1TB SX8200Pro RAID0, 2x5TB 2.5" HDD RAID0, Athena 500W Flex (Noctua fan), Custom 4.7l 3D printed case

 

Asus Zenbook UM325UA, Ryzen 7 5700u, 16GB, 1TB, OLED

 

GPD Win 2

2 hours ago, Blue4130 said:

how about "mistake".

 

eg - I asked Chat GPT what the moon was made from. It replied that it was made of cheese. It made a mistake.

 

Nah, that would be too easy and logical.

but doesn't a "mistake" sound just as much like a "human" thing as "hallucination", while being less descriptive?

 

on that note.. calling it a mistake goes against one of the very core rules of IT: "software doesn't make mistakes, people do".

Should we not call the processing of a mountain of stolen content "training"?

I sold my soul for ProSupport.

3 minutes ago, Needfuldoer said:

Should we not call the processing of a mountain of stolen content "training"?

there's no rail vehicles involved, so this is obviously choo-morphization of digital theft.

Laughs in HAL 9000

| Ryzen 7 7800X3D | AM5 B650 Aorus Elite AX | G.Skill Trident Z5 Neo RGB DDR5 32GB 6000MHz C30 | Sapphire PULSE Radeon RX 7900 XTX | Samsung 990 PRO 1TB with heatsink | Arctic Liquid Freezer II 360 | Seasonic Focus GX-850 | Lian Li Lanccool III | Mousepad: Skypad 3.0 XL / Zowie GTF-X | Mouse: Zowie S1-C | Keyboard: Ducky One 3 TKL (Cherry MX-Speed-Silver)Beyerdynamic MMX 300 (2nd Gen) | Acer XV272U | OS: Windows 11 |

I've never liked it either but we are stuck with it now. "Hallucination" implies a departure from reality but there was never a "reality" to depart from. They hallucinate full time, and we can only hope to wrangle that hallucination into something that resembles reality as often as possible.

On 5/25/2024 at 11:14 PM, Thomas A. Fine said:

Old man venting another pet peeve here.  I wish Linus and Luke (and basically most of the world) would stop referring to large language models (LLMs) producing non-useful text as "hallucinations". 

This is the word that the LLM creators (e.g. OpenAI) have picked to describe incorrect information that "sounds like it could be right". English does not have a word for when a machine spits out a response that needs to be vetted by a real person before it can be trusted. Basically all we have are anthropomorphized words like "hallucination", "dream", "pretend", "imagination", none of which an AI actually does.

 

When people on a forum spit out a wrong answer, people go "uh huh, anyway" and move on, ignoring obviously stupid information from people who aren't trusted. When a trusted person spits out a clearly wrong answer, we go "Is it April Fools' Day? Have you been hacked? Is this satire?" Nobody assumes for a minute that the sudden behavior change is anything but intentional clowning with their friends.

 

But an "LLM" is supposed to just "know everything", so when it spits out something that is supposed to be an honest answer and is anything but, we've been calling it a hallucination, because that's something people on mind-altering drugs do.

 

Watching clips of AI VTuber Neuro-sama lately, I beg to differ. [attached screenshot]

Noelle best girl

 

PC specs:

CPU: AMD Ryzen 5 3600 3.6 GHz 6-Core Processor
CPU Cooler: Deepcool GAMMAXX 400 V2 64.5 CFM CPU Cooler
Motherboard: ASRock B450M Steel Legend Micro ATX AM4 Motherboard, BIOS P4.60
Memory: ADATA XPG 32GB GB (2 x 16GB) DDR4-3200 CL16 Memory
Storage: HP EX900 500 GB M.2-2280 PCIe 3.0 X4 NVME Solid State Drive, PNY CS900 1 TB 2.5" Solid State Drive
Video Card: Colorful iGame RTX 4060 Ti 16GB
Power Supply: Cooler Master MWE Bronze V2 650 W 80+ Bronze Certified ATX Power Supply
Operating System: Microsoft Windows 11 Pro
Monitor: Acer QG240Y S3 24.0" 1920 x 1080 180Hz Monitor

What's wrong with "error"?

The software has errors, or the output has the potential to contain errors.

mITX is awesome! I regret nothing (apart from when picking parts or have to do maintainance *cough*cough*)

6 hours ago, DeerDK said:

What's wrong with "error"?

The software has errors, or the output has the potential to contain errors.

because "error" is non-descriptive of *what* the error is. there are many kinds of errors; essentially everything that is not expected behavior is "an error".

 

imagine being at the service desk of literally anything, and your clients just go "there is an error" for every support call. giving people words they understand the meaning of to describe these errors is at the basis of average joe being able to recognize "an error" when it happens.

1 hour ago, manikyath said:

because "error" is non-descriptive of *what* the error is. there are many kinds of errors; essentially everything that is not expected behavior is "an error".

 

imagine being at the service desk of literally anything, and your clients just go "there is an error" for every support call. giving people words they understand the meaning of to describe these errors is at the basis of average joe being able to recognize "an error" when it happens.

True, but it is a computer program, is it not? It encounters errors, or produces faulty results. 

What do we usually call that? I ask in honesty, I'm not a native English speaker. 

I'd say, glitches, faults or errors, but I'm open to more technical language 

mITX is awesome! I regret nothing (apart from when picking parts or have to do maintainance *cough*cough*)

17 hours ago, DeerDK said:

True, but it is a computer program, is it not? It encounters errors, or produces faulty results. 

What do we usually call that? I ask in honesty, I'm not a native English speaker. 

I'd say, glitches, faults or errors, but I'm open to more technical language 

Error is correct but also too broad. Hallucination could be said to be an AI error. But, if you just say the AI had an error/bug/glitch, it does not explain what happened.

All of these things are errors:

AI outputs wrong information (assuming it is a fact-finding AI like ChatGPT)

AI outputs a refusal to answer a valid question

AI outputs nothing

AI outputs random characters

AI outputs an error code

AI interface crashes or freezes

 

Only one of those is "hallucination." It is one type of error.

 

I agree with the general sentiment but "hallucination" is a pretty good shorthand for non technical people to get what you're talking about.

2 minutes ago, thevictor390 said:

AI outputs wrong information (assuming it is a fact-finding AI like ChatGPT)

Arguably this isn't even an error, the software (i.e. the LLM) is working as designed by outputting text that resembles human writing. Returning true information isn't part of the system's goals... which makes it pretty worrying that it's what it keeps getting marketed for 😛

Don't ask to ask, just ask... please 🤨

sudo chmod -R 000 /*

1 minute ago, Sauron said:

I agree with the general sentiment but "hallucination" is a pretty good shorthand for non technical people to get what you're talking about.

Arguably this isn't even an error, the software (i.e. the LLM) is working as designed by outputting text that resembles human writing. Returning true information isn't part of the system's goals... which makes it pretty worrying that it's what it keeps getting marketed for 😛

It's the marketing that makes it a goal. It doesn't matter to the consumer how the goal is achieved (or not).

ChatGPT's tagline technically does not claim factual output. But it does claim assistance with writing and learning, among other things. So output that does not further that goal can be considered an error. And their actions certainly indicate that they do not want it outputting grossly factually incorrect information.

On 5/26/2024 at 3:09 AM, manikyath said:

the idea behind using that term is that it makes it very easy to understand what is going wrong. you feed the LLM a query, and it produces something that has no basis in reality with a very high degree of confidence, in the same way that a person who is hallucinating might 'see things' with the same degree of confidence.

It doesn't, though. Any forecast algorithm (which is what these programs essentially are) will produce a prediction when prompted. LLMs don't "have confidence", they just output text. You could in principle compute the probability distribution of that text being the intended answer, that would tell you the confidence level. A missed prediction is just a (forecast) error, which will happen and does not indicate what level of confidence was attached to it (I mean the actual confidence: asking ChatGPT "are you sure?" is just another query, not a statistical analysis of the current model producing the answers).
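
To make that concrete, here is a rough Python sketch with made-up numbers (a toy illustration, not any real LLM's internals): at each step the model turns its scores (logits) into a probability distribution, and the probability it assigned to the token it actually emitted is the closest thing there is to a "confidence" number.

import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Made-up logits for three generation steps over a tiny 5-token vocabulary.
steps = [np.array([2.1, 0.3, -1.0, 0.5, 0.0]),
         np.array([0.2, 0.1, 0.3, 0.0, -0.2]),   # nearly flat scores: low "confidence"
         np.array([4.0, -2.0, -1.5, 0.0, -0.5])]
emitted = [0, 2, 0]                               # indices of the tokens the model emitted

token_probs = [softmax(l)[i] for l, i in zip(steps, emitted)]
sequence_logprob = np.sum(np.log(token_probs))

print(token_probs)        # per-token probabilities (the flat step is only ~0.25)
print(sequence_logprob)   # log-probability of the whole emitted sequence

The answer you read is just the emitted tokens; that number stays internal, which is the point.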

 

 

I agree with OP that the excessive anthropomorphizing (?) of LLMs boosts marketing-speak and obscures the reality of what the software is, misrepresenting LLMs as lab-grown rats we are "observing as they evolve" or something. "Hallucination" creates this impression of an emergent phenomenon, when it's just the software working as intended.

 

 

On 5/26/2024 at 5:24 AM, podkall said:

you could say the LLM is "assuming the correct answer (based on it's available information)", but "hallucinating" rolls off the tongue better

It's not "assuming" anything, that's just more anthropomorphism. It's just spitting an answer, which is all the program does, and by design its answers will sometimes be helpful and sometimes be useless.

53 minutes ago, thevictor390 said:

Error is correct but also too broad. Hallucination could be said to be an AI error. But, if you just say the AI had an error/bug/glitch, it does not explain what happened.

All of these things are errors:

AI outputs wrong information (assuming it is a fact-finding AI like ChatGPT)

AI outputs a refusal to answer a valid question

AI outputs nothing

AI outputs random characters

All those are the same type of error: a "prediction error", since its output is essentially a prediction of what the answer should look like based on the prediction model and training data set used. It's the expected type of erroneous answer that will occur from time to time (the competition is on how to minimize it) when the program works as intended.

53 minutes ago, thevictor390 said:

AI outputs an error code

AI interface crashes or freezes

These are just bugs, i.e., the program not working at all. It's the same as the difference between me running a linear regression, predicting data points based on my estimates, and measuring the errors between predictions and actual data points, and me writing buggy code for the regression/prediction and having it crash.
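
To spell the analogy out, a minimal numpy sketch with made-up data: the regression working exactly as intended still produces prediction errors, while a bug is something else entirely (the code simply doesn't run).

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # made-up observations
y = np.array([2.1, 3.9, 6.2, 8.1, 9.7])

slope, intercept = np.polyfit(x, y, 1)      # "training": estimate the model
predictions = slope * x + intercept         # the model working exactly as intended
residuals = y - predictions                 # prediction errors: expected, never exactly zero

print(residuals)                            # the analogue of a wrong-but-fluent answer

# A bug is a different kind of failure; for example this would just crash:
# np.polyfit(x, y[:3], 1)                   # mismatched lengths -> exception, no prediction at all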

22 minutes ago, SpaceGhostC2C said:

These are just bugs, i.e., the program not working at all. It's the same difference as me running a linear regression, predicting data points based on my estimates, and measuring the errors between predictions and actual data points, and me writing a buggy code for the regression/prediction and having it crash.

The lack of distinction between program bugs and AI prediction errors was exactly my point.

1 hour ago, SpaceGhostC2C said:

It's not "assuming" anything, that's just more anthropomorphism. It's just spitting an answer, which is all the program does, and by design its answers will sometimes be helpful and sometimes be useless.

it's assuming an answer based on its info, just like if you only knew that heat rises, you'd assume that cold falls

Note: Users receive notifications after Mentions & Quotes. 

Feel free to ask any questions regarding my comments/build lists. I know a lot about PCs but not everything.

current PC:

Ryzen 5 5600 |16GB DDR4 3200Mhz | B450 | GTX 1080 ti

PCs I used before:

  1. Pentium G4500 | 4GB/8GB DDR4 2133Mhz | H110 | GTX 1050
  2. Ryzen 3 1200 3,5Ghz / OC:4Ghz | 8GB DDR4 2133Mhz / 16GB 3200Mhz | B450 | GTX 1050
  3. Ryzen 3 1200 3,5Ghz | 16GB 3200Mhz | B450 | GTX 1080 ti

2 hours ago, SpaceGhostC2C said:

LLMs don't "have confidence", they just output text. You could in principle compute the probability distribution of that text being the intended answer, that would tell you the confidence level.

afaik the process is selecting "the most likely" answer, and the way the "difference in how likely different answers are" would be the confidence level. again, it's a term used for describing things in humans being used for software.. but that is again because the term is the easiest way to describe what the software is doing, in a way average joe can understand.
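
roughly, as a toy sketch with made-up numbers (not any particular model's actual implementation): pick the token with the highest probability, and treat the gap to the runner-up as a crude "confidence" signal.

import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

logits = np.array([1.2, 1.1, 0.3, -0.5, -2.0])   # made-up scores for 5 candidate tokens
probs = softmax(logits)

best, runner_up = np.argsort(probs)[::-1][:2]
margin = probs[best] - probs[runner_up]

print(best, probs[best])   # the token that gets emitted ("the most likely answer")
print(margin)              # ~0.04 here: the winner barely beats the runner-up

the emitted text looks equally confident either way, which is the whole problem.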

 

the reason why IMO the term "hallucination" is very suitable to the issue at hand is because of the way an LLM works: it might get stuck in its own "fake story" if you keep prompting for more information about what it just said, the same way a hallucinating person is stuck in a fake reality.

 

to back up my own question of "give a better word" - a slightly less humanoid way of expressing the issue would be "it is stuck in storytelling mode", because that is essentially what the LLM is doing when it's "hallucinating". you ask for information related to the real world, but the LLM is stuck in the process of producing a fictional story.

however.. hallucinating is so much easier to say...

"hallucination" is also the term used by my engineer coworker who worked on and fine-tuned our AI model. what's wrong with that? AI hallucinates whenever it makes up stuff and says false things that aren't real and do not exist. 

Sudo make me a sandwich 

3 hours ago, podkall said:

it's assuming an answer based on its info, just like if you only knew that heat rises, you'd assume that cold falls

No it's not. An assumption is different from a fitted value. It is constructing an answer based on a model estimated or "trained" on a given information set. It doesn't "assume" what it doesn't know, it just ventures (predicts) an answer based on the information it does have, as scarce as it may be. If information is scarce enough, the "correct" answer (as understood by us) may be "I don't know", so providing any answer is an error, but it's no different from you saying it will be 15°C tomorrow when it turns out to be 18°C or whatever.

2 hours ago, manikyath said:

afaik the process is selecting "the most likely" answer, and the way the "difference in how likely different answers are" would be the confidence level.

Not exactly, but yes, kind of, or related to it. But notice that the "most likely" answer could have a 0.0003% probability, and everything else it can come up with is just worse. This is true of every AI answer: we just get the "point estimate" (the answer) but not the "confidence level" (the probability that the answer is correct). Then we differentiate between inaccuracies and "hallucinations" based on how they sound to us, but it isn't really something intrinsic to the process producing the answer; they are all the same type of error.
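
A quick way to see that "most likely" can still mean "barely likely at all" (made-up numbers, purely illustrative): spread nearly indistinguishable scores over a 50,000-token vocabulary and look at the probability of the argmax.

import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=50_000)          # made-up, nearly indistinguishable scores
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(probs.max())                        # well under 0.1%, yet it is still "the most likely" token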

 

2 hours ago, manikyath said:

the reason why IMO the term "hallucination" is very suitable to the issue at hand is because of the way an LLM works: it might get stuck in its own "fake story" if you keep prompting for more information about what it just said, the same way a hallucinating person is stuck in a fake reality.

Yes, that's a consequence of using its previous answers as part of the information set to predict the best next line (which is necessary for it to produce some sense of conversation).
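
As a toy sketch of that feedback loop (the generate() function below is just a hypothetical stand-in that echoes text, not a real model; the only point is that each reply gets appended to the context for the next turn):

def generate(context: str) -> str:
    # Stand-in for a real LLM call: in reality this would predict a continuation of `context`.
    return "more detail building on: " + context[-40:]

context = "user: tell me about the moon\n"
for _ in range(3):
    reply = generate(context)
    context += "assistant: " + reply + "\n"   # earlier output becomes input for later output
    context += "user: tell me more about that\n"

print(context)  # once a made-up claim lands in the context, later turns elaborate on it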

 

2 hours ago, manikyath said:

 

to back up my own question of "give a better word" - a slightly less humanoid way of expressing the issue would be "it is stuck in storytelling mode", because that is essentially what the LLM is doing when it's "hallucinating". you ask for information related to the real world, but the LLM is stuck in the process of producing a fictional story.

however.. hallucinating is so much easier to say...

you can always say it got jammed 😛 

7 hours ago, thevictor390 said:

Only one of those is "hallucination." It is one type of error.

A hallucination is not an error, it's an unintended consequence.

 

The best way to keep hallucinations from happening is to stop training LLMs on unfiltered data. What works for training ASR and TTS does not work for LLMs. Both ASR and TTS models are trained on LibriTTS, which is a set of random public-domain novels read by various people at various audio quality. But the novels are also read in a narrative way, which results in ASRs that can't recognize people with accents they haven't learned, and TTSs that all sound like they're reading a teleprompter. ASR and TTS are "good enough" when they do that, but they don't approach an accuracy level that people would regard as human-like.

 

LLMs cannot be trained on just public-domain data, because their usefulness would be basically zero. Not to mention we've moved the bar a lot on racism since then, never mind women's role in the home and in the workplace. Anyone who argues that only PD data can be used to train an AI completely misses that this isn't even how humans learn. Humans learn from everything, so restricting the AI to just PD materials means the AI is also pretty ass-backwards when it comes to social interactions.

 

Ask whether it's worse for the AI to learn to be a Nazi because it never learned about the effects WW2 had, since almost nothing about WW2 is in the public domain other than propaganda materials produced by the governments of the time. All those news publications are not in the public domain, and likewise a lot of fiction from the period was likely written from a neutral or even positive point of view on fascism, because the US didn't want to get involved until it found an excuse to.

 

Can you imagine for a moment if the rules on AI training had to adhere to the silly "pay me for every inconsequential use of music, every time" levels the music industry demands? What would happen is that people would donate materials into the public domain expressly to poison the AI, or bias it towards a certain opinion. Both of these things are happening right now.

 

I think we're already at the "peak usefulness" of tools like ChatGPT, because it's only going to be downhill from here as it starts turning into an AI ouroboros eating its own tail.

 

 
