It refers to when an LLM will in some way try to deceive or manipulate the user interacting with it.
I think this still gives the model too much credit by implying that there’s any sort of intentionality behind this behavior.
There’s not.
These models are trained on the output of real humans and real humans lie and deceive constantly. All that’s happening is that the underlying mathematical model has encoded the statistical likelihood that someone will lie in a given situation. If that statistical likelihood is high enough, the model itself will lie when put in a similar situation.
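To make that concrete, here is a toy sketch of the idea in Python. It is not how any real LLM is built: the "situations", the "honest"/"deceptive" labels, the counts, and the `fit`/`respond` helpers are all invented for illustration. It only shows that a model which copies observed frequencies will reproduce deception at roughly the rate it appeared in its data, with no intent anywhere in the code.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy "training data": (situation, kind of response) pairs
# standing in for human-written text. Labels and counts are made up.
corpus = [
    ("asked_about_mistake", "deceptive"),
    ("asked_about_mistake", "deceptive"),
    ("asked_about_mistake", "deceptive"),
    ("asked_about_mistake", "honest"),
    ("asked_for_directions", "honest"),
    ("asked_for_directions", "honest"),
    ("asked_for_directions", "honest"),
    ("asked_for_directions", "deceptive"),
]


def fit(pairs):
    """Estimate P(kind | situation) from raw counts."""
    counts = defaultdict(Counter)
    for situation, kind in pairs:
        counts[situation][kind] += 1
    return {
        situation: {kind: n / sum(c.values()) for kind, n in c.items()}
        for situation, c in counts.items()
    }


def respond(model, situation, rng):
    """Sample a response kind in proportion to its estimated likelihood."""
    dist = model[situation]
    kinds = list(dist)
    return rng.choices(kinds, weights=[dist[k] for k in kinds])[0]


model = fit(corpus)
rng = random.Random(0)  # fixed seed so the run is repeatable

# Put the "model" in the same situation many times and tally what it does.
samples = Counter(
    respond(model, "asked_about_mistake", rng) for _ in range(1000)
)
```

In this toy setup the "deceptive" responses dominate for `asked_about_mistake` simply because they dominated in the data for that situation. Nothing in the code wants anything.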
And like hallucinations, it’s undesired behavior that proponents of LLMs will need to “fix” (a practical impossibility as far as I’m concerned, like unbaking a cake).
But how would you use words to explain the phenomenon?
“LLMs hallucinate and lie” is probably the shortest description that most people will be able to grasp.
Except that “hallucinate” is a terrible term. A hallucination is when you perceive something that doesn’t exist. What AI is doing is making things up; i.e. lying.
This language also credits LLMs with an implied ability to think they don’t have.
My point is we literally can’t describe their behaviour without using language that makes it seem like they do more than they do.
So we’re just going to have to accept that discussing it will have to come with a bunch of asterisks a lot of people are going to ignore. And which many will actively try to hide in an effort to hype up the possibility that this tech is a stepping stone to AGI.
I think this still gives the model too much credit by implying that there’s any sort of intentionality behind this behavior.
There’s not.
These models are trained on the output of real humans and real humans lie and deceive constantly. All that’s happening is that the underlying mathematical model has encoded the statistical likelihood that someone will lie in a given situation. If that statistical likelihood is high enough, the model itself will lie when put in a similar situation.
Obviously.
And like hallucinations, it’s undesired behavior that proponents of LLMs will need to “fix” (a practical impossibility as far as I’m concerned, like unbaking a cake).
But how would you use words to explain the phenomenon?
“LLMs hallucinate and lie” is probably the shortest description that most people will be able to grasp.
Except that “hallucinate” is a terrible term. A hallucination is when you perceive something that doesn’t exist. What AI is doing is making things up; i.e. lying.
Yes.
Who are you trying to convince?
This language also credits LLMs with an implied ability to think they don’t have.
My point is we literally can’t describe their behaviour without using language that makes it seem like they do more than they do.
So we’re just going to have to accept that discussing it will have to come with a bunch of asterisks a lot of people are going to ignore. And which many will actively try to hide in an effort to hype up the possibility that this tech is a stepping stone to AGI.