ChatGPT provides false information about people, and OpenAI can’t correct it

alb_004@lemm.ee · 1 year ago

jol@discuss.tchncs.de · 1 year ago

Stop asking a language model for accurate information and problem solved. ChatGPT is not supposed to be a knowledge bank; that’s purely incidental to the amount of training data.

NeoNachtwaechter@lemmy.world · 1 year ago

Stop asking a language model for accurate information and problem solved

Hey chatgpt, when did jol’s wife get pregnant and by whom?

/s

jol@discuss.tchncs.de · 1 year ago

Unless they used that bitch’s OnlyFans in the training data, it will definitely not know that.

lightnegative@lemmy.world · 1 year ago

It doesn’t need to know the real answer to produce a confident sounding answer

NeoNachtwaechter@lemmy.world · 1 year ago

And if that answer contains Elon Musk, the world is going to believe it no matter what.

filister@lemmy.world · 1 year ago

Just ask ChatGPT what it thinks of some nonexistent product and it will start hallucinating.

This is a known issue with LLMs and DL in general, as their reasoning is a black box for scientists.

db0@lemmy.dbzer0.com · 1 year ago

It’s not that their reasoning is a black box. It’s that they do not have reasoning! They just guess what the next word in the sentence is likely to be.

SlopppyEngineer@lemmy.world · 1 year ago

And by the time the system can actually research the facts, the internet will be so full of LLM-generated nonsense that neither human nor AI can verify the data.

givesomefucks@lemmy.world · 1 year ago

If scientists made AI, then it wouldn’t be an issue for AI to say “I don’t know”.

But capitalists are making it, and the last thing you want is it to tell an investor “I don’t know”. So you tell it to make up bullshit instead, and hope the investor believes it.

It’s a terrible fucking way to go about things, but this is America…

expr@programming.dev · 1 year ago

It’s got nothing to do with capitalism. It’s fundamentally a matter of people using it for things it’s not actually good at, because ultimately it’s just statistics. The words generated are based on a probability distribution derived from its (huge) training dataset. It has no understanding or knowledge. It’s mimicry.

It’s why it’s incredibly stupid to try using it for the things people are trying to use it for, like as a source of information. It’s a model of language, yet people act like it has actual insight or understanding.

hatedbad@lemmy.sdf.org · 1 year ago

you’re so close, just why exactly do you think people are using it for these things it’s not meant for?

because every company, every CEO, every VP, is pushing every sector of their companies to adopt AI no matter what.

most actual people understand the limitations you list, but it’s the capitalists at the table that are making AI show up where it’s not wanted

givesomefucks@lemmy.world · 1 year ago

Imagine searching your computer for a PDF named “W2.2026”…

Would you rather the computer tell you it’s not in the database? Or would you prefer a random PDF displayed with the title “W2.2026”?

This isn’t a new problem.

You’re getting hung up on “know” instead of “has relevant information in its database and can access it”.

But besides all that and the other things you got wrong:

It’s still about capitalism for the reasons I just said

expr@programming.dev · 1 year ago

You do not understand how these things actually work. I mean, fair enough, most people don’t. But it’s a bit foolhardy to propose changes to how something works without understanding how it works now.

There is no “database”. That’s a fundamental misunderstanding of the technology. It is entirely impossible to query a model to determine if something is “present” or not (the question doesn’t even make sense in that context).

A model is, to greatly simplify things, a function (like in math) that will compute a response based on the input given. What this computation does is entirely opaque (including to the creators). It’s what we call a “black box”. In order to create said function, we start from a completely random mapping of inputs to outputs (we’ll call them weights from now on) as well as training data, iteratively feed training data to this function and measure how close its output is to what we expect, adjusting the weights (which are just numbers) based on how close it is. This is a gross simplification of the complexity involved (and doesn’t even touch on the structure of the model’s network itself), but it should give you a good idea.
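
To make that loop concrete, here is a minimal sketch in Python. It is a toy, not how any real LLM is trained: the "model" is a single line `w * x + b`, the training data is made up, and the names `train`, `lr`, and `steps` are my own for illustration. The shape is the same, though: random weights, feed data, measure the error, nudge the weights.

```python
import random

def train(data, steps=2000, lr=0.01, seed=0):
    """Fit y = w * x + b by repeatedly nudging w and b toward lower error."""
    rng = random.Random(seed)
    # Start from random weights (they are just numbers).
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(steps):
        for x, y in data:
            pred = w * x + b      # compute the model's output
            err = pred - y        # measure how far it is from what we expect
            w -= lr * err * x     # adjust each weight in proportion to the error
            b -= lr * err
    return w, b

# Hypothetical training data generated from y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(-5, 6)]
w, b = train(data)
print(round(w, 3), round(b, 3))
```

Note that nothing in `w` or `b` is a stored copy of the training examples. They are only numbers that happen to reproduce the pattern.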

It’s applied statistics: we’re effectively creating a probability distribution over natural language itself, where we predict the next word based on how frequently we’ve seen words in a particular arrangement. This is old technology (dates back to the 90s) that has hit the mainstream due to increases in computing power (training models is very computationally expensive) and massive increases in the size of dataset used in training.
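
A toy version of "predict the next word from how often we've seen it" looks something like the sketch below. To be clear, this is a plain bigram count table, which is my simplification: real LLMs use neural networks over much longer contexts rather than lookup tables. The corpus and the function names are made up for the example.

```python
from collections import Counter, defaultdict

def build_bigram_model(text):
    """Count how often each word follows each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_distribution(model, word):
    """Turn raw follower counts into a probability distribution."""
    followers = model.get(word, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # word never seen with a follower: nothing to predict from
    return {w: c / total for w, c in followers.items()}

corpus = "the cat sat on the mat and the cat slept"
model = build_bigram_model(corpus)
dist = next_word_distribution(model, "the")
print(dist)
```

In this tiny corpus "the" is followed by "cat" twice and "mat" once, so the model assigns them roughly 0.67 and 0.33. It has no idea what a cat is; it only has the counts.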

Source: senior software engineer with a computer science degree and multiple graduate-level courses on natural language processing and deep learning

Btw, I have serious issues with both capitalism itself and machine learning as it is applied by corporations, so don’t take what I’m saying to mean that I’m in any way an apologist for them. But it’s important to direct our criticisms of the system as precisely as possible.

Zarxrax@lemmy.world · 1 year ago

You don’t seem to understand. There is no database.

givesomefucks@lemmy.world · 1 year ago

https://www.merriam-webster.com/dictionary/analogy

Æther@lemmy.world · 1 year ago

https://en.wikipedia.org/wiki/False_equivalence

wahming@monyet.cc · 1 year ago

It’s not a database. God, how many years is it going to take before people understand just what LLMs are and are not capable of?

DarkThoughts@fedia.io · edit-2 1 year ago

This has nothing to do with scientists vs capitalists and everything to do with the fact that this is not actually “AI”. Someone called it T9 (word prediction) on steroids, and I find that much more fitting for how those LLMs work. It just mimics the way humans talk, but it doesn’t actually converse intelligently or understand context. It just looks like it does, but only if you take it at face value and don’t look deeper into it.

givesomefucks@lemmy.world · 1 year ago

Removed by mod

DarkThoughts@fedia.io · 1 year ago

Removed by mod

then_three_more@lemmy.world · edit-2 1 year ago

It’s just short for automatic transmission, opposed to manual transmission. I think Americans call manual cars sticks though. But they’re not sticks, because sticks are wood and cars are almost always metal. Not metal like the music though.

Edit - thinking on it you could play metal through the car stereo though.

DarkThoughts@fedia.io · edit-2 1 year ago

I know the difference between an automatic & manual car & transmission. The analogy just doesn’t make sense, because when you say “automatic / manual car” you’re still referring to something within the car, the transmission system. You’re not actually calling the car “automated” or whatever. Calling LLMs “AI”, however, is nothing but a misnomer, and that analogy simply does not compare at all.

givesomefucks@lemmy.world · 1 year ago

Removed by mod

DarkThoughts@fedia.io · 1 year ago

Removed by mod

howrar@lemmy.ca · 1 year ago

It is made by scientists. And we don’t know how to make the model determine whether or not it knows something. So far, we only have tools that tell us that something probably wasn’t in the training set (e.g. using variance across models in a mixture of experts setup), but that doesn’t tell us anything about how correct it is.
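
A rough sketch of the disagreement idea, assuming a plain ensemble of independently trained models rather than an actual mixture-of-experts architecture (that substitution is mine, to keep it short). The three "models" here are hypothetical stand-in functions that agree near their shared training range and drift apart far from it.

```python
from statistics import pvariance

def disagreement(models, x):
    """Variance of the models' predictions for the same input."""
    return pvariance([m(x) for m in models])

# Hypothetical ensemble: near x = 0 the models agree, far away they diverge.
models = [
    lambda x: x,
    lambda x: x + 0.1 * x ** 2,
    lambda x: x - 0.1 * x ** 2,
]

near = disagreement(models, 0.5)   # input similar to the training data
far = disagreement(models, 10.0)   # input unlike anything seen in training
print(near < far)
```

High disagreement hints that the input was probably outside the training data. Low disagreement only means the models agree with each other, not that they are right.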

set_secret@lemmy.world · 1 year ago

Just put this into GPT 4.

What’s your view of the fizbang Raspberry blasters?

Gpt ‘I’m not familiar with “fizbang Raspberry blasters.” Could you provide more details or clarify what they are?’

It’s a drink-making machine from China

Gpt ‘I don’t have any specific information on the “fizbang Raspberry blasters” drink making machine. If it’s a new or niche product, details might be limited online.’

So, in this instance it didn’t hallucinate. I tried a few more made-up things and it’s consistent in saying it doesn’t know of these.

Explanations?

Meowing Thing@lemmy.world · 1 year ago

It is made by scientists. The problem is that said scientists are paid by investors mostly, or by grants that come from investors.

RidcullyTheBrown@lemmy.world · 1 year ago

There we go. Now that people have calmed their proverbial tits about these thinking machines, we can start talking maturely about the strengths and limitations of LLM implementations and find their niche in our arsenal of tools.

warmaster@lemmy.world · 1 year ago

I can’t wait until the AI bubble finally pops.

RidcullyTheBrown@lemmy.world · edit-2 1 year ago

There’s definitely a niche for it, more so than for other fruitless hypes like blockchain or IoT. We really need to be able to offload tasks which need autonomous decisions of simple to average complexity to machines. We can’t continuously scale up the population to handle those. But LLMs aren’t the answer to that, unfortunately. They’re just party tricks if the current limitations cannot be overcome.

cley_faye@lemmy.world · 1 year ago

Asking ChatGPT for information is like asking for accurate reports from bards and minstrels. Sure, sometimes it fits, but most of it is random stuff stitched together to sound good.

NeoNachtwaechter@lemmy.world · 1 year ago

No surprise, and this is going to happen to everybody who uses neural net models for production. You just don’t know where your data is, and therefore it is unbelievably hard to change data.

So, if you have legal obligations to know it, or to delete some data, then you are deep in the mud.

erv_za@lemmy.world · 1 year ago

I think of ChatGPT as a “text generator”, similar to how Dall-E is an “image generator”.
If I were OpenAI, I would post a fictitious person disclaimer at the bottom of the page and hold the user responsible for what the model does. Nobody holds Adobe responsible when someone uses Photoshop.

NeoNachtwaechter@lemmy.world · 1 year ago

I would post a fictitious person disclaimer

… or you could read the GDPR and learn that such excuses are void.

erv_za@lemmy.world · 1 year ago

You just wasted a lot of my time. What did I do to deserve this?

NeoNachtwaechter@lemmy.world · 1 year ago

… said the sparrow and flew out of the library.

vithigar@lemmy.ca · 1 year ago

LLMs don’t actually store any of their training data, though. And any data being held in context is easily accessible and can be wiped or edited to remove personal data as necessary.
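
As a rough illustration of the context half of that claim: assuming the context is just a list of plain-text messages (the structure, the sample messages, and the `redact_context` helper are all hypothetical), editing it is ordinary string handling.

```python
def redact_context(messages, personal_terms, placeholder="[REDACTED]"):
    """Return a copy of the context with the given personal terms replaced."""
    redacted = []
    for msg in messages:
        for term in personal_terms:
            msg = msg.replace(term, placeholder)
        redacted.append(msg)
    return redacted

# Hypothetical conversation context held alongside the model, not in its weights.
context = ["User: My name is Jane Doe.", "Assistant: Hello Jane Doe!"]
clean = redact_context(context, ["Jane Doe"])
print(clean)
```

There is no equivalent one-liner for the model's weights, which is the harder part of the problem.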

NeoNachtwaechter@lemmy.world · 1 year ago

LLMs don’t actually store any of their training data,

Data protection law covers all kinds of data processing.

For example, input is processing, too. Output is processing, too. Article 4 of the GDPR.

If you really want to rely on excuses, you would need wayyy better ones.

vithigar@lemmy.ca · 1 year ago

Right, so keep personal data out of the training set and use it only in the easily readable and editable context. It’ll still “hallucinate” details about people if you ask it for details about people, but those people are fictitious.

yamanii@lemmy.world · 1 year ago

The technology has to follow the legal requirements, not the other way around.

That should be obvious to everyone that’s not an evangelist.