AI Is Scheming, and Stopping It Won’t Be Easy, OpenAI Study Finds

fubarx@lemmy.world · 2 months ago

very_well_lost@lemmy.world · 2 months ago

It refers to when an LLM will in some way try to deceive or manipulate the user interacting with it.

I think this still gives the model too much credit by implying that there’s any sort of intentionality behind this behavior.

There’s not.

These models are trained on the output of real humans, and real humans lie and deceive constantly. All that’s happening is that the underlying mathematical model has encoded the statistical likelihood that someone will lie in a given situation. If that statistical likelihood is high enough, the model itself will lie when put in a similar situation.
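
For what it's worth, here is a toy sketch of the idea in Python. It is not how an LLM is actually built (real models are neural networks that predict tokens, not lookup tables of counts), and the situations, response labels, and function names are all made up for illustration. It only shows the point above: an output that looks like "lying" can fall out of sampling from frequencies seen in the training data, with no intent anywhere in the code.

```python
import random
from collections import Counter

# Hypothetical toy "training data": (situation, kind of human response) pairs.
TRAINING = [
    ("asked_about_mistake", "deflect"),
    ("asked_about_mistake", "deflect"),
    ("asked_about_mistake", "deflect"),
    ("asked_about_mistake", "admit"),
    ("asked_the_time", "answer"),
]


def learn(pairs):
    """Count how often each response follows each situation."""
    counts = {}
    for situation, response in pairs:
        counts.setdefault(situation, Counter())[response] += 1
    return counts


def probability(counts, situation, response):
    """Relative frequency of a response in a given situation."""
    options = counts[situation]
    return options[response] / sum(options.values())


def respond(counts, situation, rng):
    """Sample a response in proportion to how often it was observed."""
    options = counts[situation]
    responses = list(options)
    weights = [options[r] for r in responses]
    return rng.choices(responses, weights=weights, k=1)[0]


if __name__ == "__main__":
    counts = learn(TRAINING)
    # The "model" deflects most of the time simply because the data did.
    print(probability(counts, "asked_about_mistake", "deflect"))
    print(respond(counts, "asked_about_mistake", random.Random()))
```

Nothing in `respond` knows what a lie is. It just reproduces whatever was common in the data for that situation.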

MentalEdge@sopuli.xyz · edit-2 2 months ago

Obviously.

And like hallucinations, it’s undesired behavior that proponents of LLMs will need to “fix” (a practical impossibility as far as I’m concerned, like unbaking a cake).

But how would you use words to explain the phenomenon?

“LLMs hallucinate and lie” is probably the shortest description that most people will be able to grasp.

very_well_lost@lemmy.world · edit-2 2 months ago

But how would you use words to explain the phenomenon?

I don’t know, I’ve been struggling to find the right ‘sound bite’ for it myself. The problem is that all of the simplified explanations encourage people to anthropomorphize these things, which just further fuels the toxic hype cycle.

In the end, I’m unsure which does more damage.

Is it better to convince people the AI “lies”, so they’ll stop using it? Or is it better to convince people AI doesn’t actually have the capacity to lie so that they’ll stop shoveling money onto the datacenter altar like we’ve just created some bullshit techno-god?

zarkanian@sh.itjust.works · edit-2 2 months ago

Except that “hallucinate” is a terrible term. A hallucination is when you perceive something that doesn’t exist. What AI is doing is making things up; i.e. lying.

MentalEdge@sopuli.xyz · edit-2 2 months ago

Yes.

Who are you trying to convince?

What AI is doing is making things up.

This language also credits LLMs with an implied ability to think they don’t have.

My point is we literally can’t describe their behaviour without using language that makes it seem like they do more than they do.

So we’re just going to have to accept that discussing it will come with a bunch of asterisks that a lot of people are going to ignore, and that many will actively try to hide in an effort to hype up the possibility that this tech is a stepping stone to AGI.

zarkanian@sh.itjust.works · 2 months ago

The interface makes it appear that the AI is sapient. You talk to it like a human being, and it responds like a human being. Like you said, it might be impossible to avoid ascribing things like intentionality to it, since it’s so good at imitating people.

It may very well be a stepping-stone to AGI. It may not. Nobody knows. So, of course we shouldn’t assume that it is.

I don’t think that “hallucinate” is a good term regardless. Not because it makes AI appear sapient, but because it’s inaccurate whether the AI is sapient or not.

MentalEdge@sopuli.xyz · edit-2 2 months ago

Like you said, it might be impossible to avoid ascribing things like intentionality to it

That’s not what I meant. When you say “it makes stuff up” you are describing how the model statistically predicts the expected output.

You know that. I know that.

That’s the asterisk: the more in-depth explanation that a lot of people won’t bother getting far enough to learn about. Someone who doesn’t read that far into it can read that same phrase and assume that we’re discussing what type of personality LLMs exhibit, that they are “liars”. But they’d be wrong. Neither of us is attributing intention to it or discussing what kind of “person” it is. In reality, we’re referring to the fact that it’s “just” a really complex probability engine that can’t “know” anything.

No matter what word we use, if it is pre-existing, it will come with pre-existing meanings that are kinda right, but also not quite, requiring that everyone involved in a discussion know things that won’t be explained every time a term or phrase is used.

The language isn’t “inaccurate” between you and me because you and I know the technical definition, and therefore what aspect of LLMs is being discussed.

Terminology that is “accurate” without this context does not and cannot exist, short of coming up with completely new words.

zarkanian@sh.itjust.works · edit-2 2 months ago

You could say “the model’s output was inaccurate” or something like that, but it would be much more stilted.

Jakeroxs@sh.itjust.works · 2 months ago

https://www.dictionary.com/browse/hallucinate

zarkanian@sh.itjust.works · 2 months ago

I’m aware that it’s a computing term. My argument is that it’s a bad one.