We have to stop ignoring AI’s hallucination problem

misk@sopuli.xyz · 4 months ago

Voroxpete@sh.itjust.works · 4 months ago

We not only have to stop ignoring the problem, we need to be absolutely clear about what the problem is.

LLMs don’t hallucinate wrong answers. They hallucinate all answers. Some of those answers will happen to be right.

If this sounds like nitpicking or quibbling over verbiage, it’s not. This is really, really important to understand. LLMs exist within a hallucinatory false reality. They do not have any comprehension of the truth or untruth of what they are saying, and this means that when they say things that are true, they do not understand why those things are true.

That is the part that’s crucial to understand. A really simple test of this problem is to ask ChatGPT to back up an answer with sources. It fundamentally cannot do it, because it has no ability to actually comprehend and correlate factual information in that way. This means, for example, that AI is incapable of assessing the potential veracity of the information it gives you. A human can say “That’s a little outside of my area of expertise,” but an LLM cannot. It can only be coded with hard blocks that trigger on certain keywords to stop it from answering and insert a stock response.
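To make the “hard blocks” point concrete, here is a minimal Python sketch of a keyword guardrail, assuming the block is a plain substring list checked before the model runs. `BLOCKED_KEYWORDS`, `STOCK_RESPONSE` and the `generate` callback are hypothetical stand-ins; real deployments may well use trained classifiers instead, so treat this as an illustration of the idea, not how any particular product works.

```python
from typing import Callable

# Hypothetical block list and canned reply, for illustration only.
BLOCKED_KEYWORDS = {"medical dosage", "legal advice"}
STOCK_RESPONSE = "I'm sorry, I can't help with that."


def guarded_reply(prompt: str, generate: Callable[[str], str]) -> str:
    """Return the stock response if the prompt contains a blocked keyword;
    otherwise pass the prompt through to the (hypothetical) generator."""
    lowered = prompt.lower()
    if any(keyword in lowered for keyword in BLOCKED_KEYWORDS):
        return STOCK_RESPONSE
    return generate(prompt)


def fake_model(prompt: str) -> str:
    """Stand-in for a model: always produces a confident-sounding answer."""
    return f"Confident-sounding answer to: {prompt}"


if __name__ == "__main__":
    print(guarded_reply("What is the right medical dosage of X?", fake_model))
    print(guarded_reply("What is the capital of France?", fake_model))
```

Note that the filter never evaluates whether an answer would be true; it only pattern-matches the text of the prompt.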

This distinction, that AI is always hallucinating, is important because of stuff like this:

> But notice how Reid said there was a balance? That’s because a lot of AI researchers don’t actually think hallucinations can be solved. A study out of the National University of Singapore suggested that hallucinations are an inevitable outcome of all large language models. **Just as no person is 100 percent right all the time, neither are these computers.**

That is some fucking toxic shit right there. Treating the fallibility of LLMs as analogous to the fallibility of humans is a huge, huge false equivalence. Humans can be wrong, but we’re wrong in ways that allow us the capacity to grow and learn. Even when we are wrong about things, we can often learn from how we are wrong. There’s a structure to how humans learn and process information that allows us to interrogate our failures and adjust for them.

When an LLM is wrong, we just have to force it to keep rolling the dice until it’s right. It cannot explain its reasoning. It cannot provide proof of work. I work in a field where I often have to direct the efforts of people who know more about specific subjects than I do, and part of how you do that is you get people to explain their reasoning, and you go back and forth testing propositions and arguments with them. You say “I want this, what are the specific challenges involved in doing it?” They tell you it’s really hard, you ask them why. They break things down for you, and together you find solutions. With an LLM, if you ask it why something works the way it does, it will commit to the bit and proceed to hallucinate false facts and false premises to support its false answer, because it’s not operating in the same reality you are, nor does it have any conception of reality in the first place.

dustyData@lemmy.world · 4 months ago

This right here is also the reason why AI fanboys get angry when they are told that LLMs are not intelligent or even thinking at all. They don’t understand that, in order for rational intelligence to exist, an LLM would need an internal, referential world of symbols to contrast external input (training data) against, one that is also capable of changing and molding itself to reality and to truth criteria. No, tokens are not what I’m talking about. I’m talking about an internally consistent and persistent representation of the world. An identity, which is currently antithetical to the information model used to train LLMs. Let me try to illustrate.

I don’t remember the details or technical terms, but essentially, animal intelligence needs to experience a lot of things first-hand in order to create an individualized model of the world, which is used to direct behavior (language is just one form of behavior, after all). This is very slow and labor-intensive, but it means that animals, once they get good, are extremely good at adapting those skills to a messy reality. LLMs are transactional: they rely entirely on correlating patterns in their input. As a result, they don’t need years of experience, as humans do, to develop skilled, intelligent responses. They can do it in hours of training input instead. But at the same time, they can never be certain of their results, and when faced with reality they crumble, because it’s harder for them to adapt intelligently and effectively to its mess.

LLMs are a solipsism experiment. A child is locked in a dark cave with nothing but a dim light and millions of pages of text. Assume immortality and no need for food or water. As there is nothing else to do but look at the text, they eventually develop the ability to understand how the symbols on the pages relate to each other, and how they are usually and typically assembled one next to the other. One day, a slit in the wall opens and the person receives a piece of paper with a prompt, a pencil, and a blank page. Out of boredom, the person looks at the prompt, recognizes the symbols and the pattern, and starts assembling symbols on the blank page with the pencil. They are just trying to continue the prompt with what they think would typically, or should, follow. The slit in the wall opens again, and the person intuitively pushes the paper they just wrote through the slit.

For the people outside the cave, leaving prompts and receiving the novel pieces of paper, it would look like an intelligent linguistic construction: it is grammatically correct, and the sentences are correctly punctuated and structured. The words even make sense, and it says intelligent things in accordance with the training text left inside and the prompt given. But once in a while it seems to hallucinate weird passages. What the people outside miss is that the person inside is not hallucinating; they just have no sense of reality. Their reality is just the text. When the cave is opened and the person trapped inside is let out into the light of the world, they would still be profoundly ignorant about it. When given the word sun, written on a piece of paper, they would have no idea that the word refers to the bright burning ball of gas above them. They would know the word, and how it is usually assembled into text next to other words. But they wouldn’t know what it is.

LLMs are just like that, except they aren’t even as intelligent as the person in this thought experiment, because there’s currently no way for LLMs to actually sense and correlate the real world, or to fuse several sensor sources into an internal, mentalese model. As I understand it, this is currently the crux of, and the biggest problem in, the field of AI.

Aceticon@lemmy.world · 4 months ago

That’s an excellent metaphor for LLMs.

feedum_sneedson@lemmy.world · 4 months ago

It’s the Chinese room thought experiment.

Aceticon@lemmy.world · 4 months ago

Hadn’t heard about it before (or maybe I did but never looked into it), so I just went and found it in Wikipedia and will be reading all about it.

So thanks for the info!

feedum_sneedson@lemmy.world · 4 months ago

No worries. The person above did a good job explaining it although they kind of mashed it together with the imagery from Plato’s allegory of the cave.

Cyberflunk@lemmy.world · 4 months ago

Wtf are you even talking about.

UnsavoryMollusk@lemmy.world · 4 months ago

They are right, though. LLMs, at their core, are just about determining what is statistically most probable to spit out.
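A toy illustration of “what is statistically most probable”: the sketch below counts which word follows which in a tiny made-up corpus and always picks the most frequent successor (greedy decoding over a bigram model). Real LLMs use neural networks over subword tokens and far longer context, so this is an analogy under that simplifying assumption, not how any actual model is implemented; the function names and corpus are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def train_bigrams(text: str) -> Dict[str, Counter]:
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    follows: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def most_probable_next(follows: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent successor of `word`, or None if never seen.
    Ties go to the successor encountered first (Counter.most_common order)."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def greedy_continue(follows: Dict[str, Counter], start: str, max_words: int = 5) -> List[str]:
    """Autocomplete-style generation: keep appending the likeliest next word."""
    out = [start.lower()]
    for _ in range(max_words):
        nxt = most_probable_next(follows, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out


corpus = "the sun is hot the sun is bright the moon is cold"
model = train_bigrams(corpus)
print(" ".join(greedy_continue(model, "the")))  # the sun is hot the sun
```

Nothing in this loop knows what the sun is; it only knows which words tended to follow which.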

Cyberflunk@lemmy.world · 4 months ago

Your 1 sentence makes more sense than the slop above.

UnpluggedFridge@lemmy.world · 4 months ago

How do hallucinations preclude an internal representation? Couldn’t hallucinations arise from a consistent internal representation that is not fully aligned with reality?

I think you are misunderstanding the role of tokens in LLMs and conflating them with internal representation. Tokens are used to generate a state, similar to external stimuli. The internal representation, assuming there is one, is the manner in which the tokens are processed. You could say the same thing about human minds, that the representation is not located anywhere like a piece of data; it is the manner in which we process stimuli.

dustyData@lemmy.world · 4 months ago

Not really. Reality is mostly a social construction. If there’s no other to check against and bring about meaning, there is no reality, and therefore no hallucinations. More precisely, everything is a hallucination. Since we cannot cross-reference reality with LLMs, and they cannot correct themselves to conform to our reality, they will always hallucinate and will only coincide with our reality by chance.

I’m not conflating tokens with anything; I explicitly said they aren’t an internal representation. They’re state and nothing else. LLMs don’t have an internal representation of reality. And they probably can’t, given their current way of working.

UnpluggedFridge@lemmy.world · 4 months ago

You seem pretty confident that LLMs cannot have an internal representation simply because you cannot imagine how that capability could emerge from their architecture. Yet we have the same fundamental problem with the human brain and have no problem asserting that humans are capable of internal representation. LLMs adhere to grammar rules, present information with a logical flow, express relationships between different concepts. Is this not evidence of, at the very least, an internal representation of grammar?

We take in external stimuli and perform billions of operations on them. This is internal representation. An LLM takes in external stimuli and performs billions of operations on them. But the latter is incapable of internal representation?

And I don’t buy the idea that hallucinations are evidence that there is no internal representation. We hallucinate. An internal representation does not need to be “correct” to exist.

dustyData@lemmy.world · 4 months ago

> Yet we have the same fundamental problem with the human brain

And LLMs aren’t human brains; they don’t even work remotely similarly. An LLM has more in common with an Excel spreadsheet than with a neuron. Read up on the learning models and pattern recognition theories behind LLMs; they are explicitly designed not to function like humans. So we cannot assume that the same emergent properties exist in an LLM.

UnpluggedFridge@lemmy.world · 4 months ago

Nor can we assume that they cannot have the same emergent properties.

dustyData@lemmy.world · 4 months ago

That’s not how science works. You’re the one claiming they do, so you have the burden of proof to show they have the same properties. Thus far, assuming they don’t, since they aren’t human, is the sensible, rational route.

???@lemmy.world · 4 months ago

I fucking hate how OpenAI and other such companies claim their models “understand” language or are “fluent” in French. These are human attributes. Unless they made a synthetic brain, they can take these claims and shove them up their square tight corporate behinds.

mamotromico@lemmy.ml · 4 months ago

I thought I would have an aneurysm reading their presentation page on Sora.

They are saying Sora can understand and simulate complex physics in 3D space to render a video.

How can such bullshit go unchallenged. It drives me crazy.

EatATaco@lemm.ee · 4 months ago

This is circular logic: only humans can be fluent, so the models can’t be fluent because they aren’t human.

And it’s universally upvoted… in a thread arguing that because AIs get things wrong, they can’t be doing anything but hallucinating.

And will you learn from this? Nope. I’ll just be downvoted and shouted at.

Danksy@lemmy.world · 4 months ago

It’s not circular. LLMs cannot be fluent because fluency comes from an understanding of the language. An LLM is incapable of understanding so it is incapable of being fluent. It may be able to mimic it but that is a different thing. (In my opinion)

EatATaco@lemm.ee · 4 months ago

You might agree with the conclusion, and the conclusion might even be correct, but the poster effectively argued ‘only humans can be fluent, and it’s not a human so it isn’t fluent’ and that is absolutely circular logic.

Danksy@lemmy.world · 4 months ago

If we ignore the other poster, do you think the logic in my previous comment is circular?

EatATaco@lemm.ee · 4 months ago

Hard to say. You claim they are incapable of understanding, which is why they can’t be fluent. However, really, the whole argument boils down to whether they are capable of understanding. You just state that as if it’s established fact, and I believe that’s an open question at this point.

So whether it is circular depends on why you think they are incapable of understanding. If it’s like the other poster, and it’s because that’s a human(ish) only trait, and they aren’t human…then yes.

???@lemmy.world · 4 months ago

This is not at all what I said. If a machine was complex enough to reason, all power to it. But these LLMs cannot.

el_bhm@lemm.ee · 4 months ago

> They do not have any comprehension of the truth or untruth of what they are saying, and this means that when they say things that are true, they do not understand why those things are true.

Which can be beautifully exploited with sponsored content.

See Google I/O '24.

nucleative@lemmy.world · 4 months ago

Well stated and explained. I’m not an AI researcher but I develop with LLMs quite a lot right now.

Hallucination is a huge problem we face when we’re trying to use LLMs for non-fiction. It’s a little bit like having a friend who can lie straight-faced and convincingly. You cannot distinguish whether they are telling you the truth or they’re lying until you rely on the output.

I think one of the nearest solutions to this may be the addition of extra layers or observer engines that are very deterministic and trained on only extremely reputable sources, perhaps only peer reviewed trade journals, for example, or sources we deem trustworthy. Unfortunately this could only serve to improve our confidence in the facts, not remove hallucination entirely.

It’s even feasible that we could have multiple observers with different domains of expertise (i.e., training sources) and voting capability to fact-check and subjectively rate the trustworthiness of the LLM’s output.

But all this will accomplish short term is to perhaps roll the dice in our favor a bit more often.
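A minimal sketch of that voting idea, assuming each observer can be reduced to a function that returns “supported”, “contradicted”, or “abstain” for a claim. The observers here are hypothetical lookup tables standing in for domain-specific checkers trained on trusted sources; the names and claims are made up for illustration.

```python
from typing import Callable, Iterable, Optional, Set

# A hypothetical "observer": given a claim, return True (supported),
# False (contradicted), or None (outside its domain, so it abstains).
Observer = Callable[[str], Optional[bool]]


def make_lookup_observer(known_true: Set[str], known_false: Set[str]) -> Observer:
    """Toy observer backed by a fixed list of claims its curators trust."""
    def observer(claim: str) -> Optional[bool]:
        if claim in known_true:
            return True
        if claim in known_false:
            return False
        return None
    return observer


def rate_trustworthiness(claim: str, observers: Iterable[Observer]) -> Optional[float]:
    """Fraction of non-abstaining observers that support the claim.
    Returns None when every observer abstains (no verdict at all)."""
    votes = [vote for vote in (obs(claim) for obs in observers) if vote is not None]
    if not votes:
        return None
    return sum(votes) / len(votes)


geography = make_lookup_observer({"paris is in france"}, set())
chemistry = make_lookup_observer({"water is h2o"}, {"water is h3o"})
skeptic = make_lookup_observer(set(), {"paris is in france"})  # a dissenting source

print(rate_trustworthiness("paris is in france", [geography, chemistry, skeptic]))  # 0.5
print(rate_trustworthiness("the moon is cheese", [geography, chemistry]))          # None
```

A claim no observer covers gets no verdict at all, which matches the caveat above: this can raise confidence in covered facts, but it cannot remove hallucination.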

The perceived results from the end users however may significantly improve. Consider some human examples: sometimes people disagree with their doctor so they go see another doctor and another until they get the answer they want. Sometimes two very experienced lawyers both look at the facts and disagree.

The system that prevents me from knowingly stating something as true, despite not knowing, without some ability to back up my claims is my reputation and my personal values and ethics. LLMs can only pretend to have those traits when we tell them to.

Voroxpete@sh.itjust.works · 4 months ago

> Consider some human examples: sometimes people disagree with their doctor so they go see another doctor and another until they get the answer they want. Sometimes two very experienced lawyers both look at the facts and disagree.

This actually illustrates my point really well, because the reasons those people disagree might be:

- Different awareness of the facts (lawyer A knows an important piece of information lawyer B doesn’t)
- Different understanding of the facts (lawyer A might have context lawyer B doesn’t)
- Different interpretation of the facts (this is the hardest to quantify, as it’s a complex outcome of everything that makes us human, including personality traits such as our biases)

Whereas you can ask the same question to the same LLM equipped with the same data set and get two different answers because it’s just rolling dice at the end of the day.
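The “rolling dice” here is sampling. A minimal sketch, assuming the model has already assigned a score (logit) to each candidate answer; the answers and scores below are made up. Chat interfaces typically sample with a nonzero temperature, which is why the same prompt can come back with different answers, while a temperature close to zero makes the pick nearly deterministic.

```python
import math
import random
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_answer(answers: Sequence[str], logits: Sequence[float],
                  temperature: float = 1.0,
                  rng: Optional[random.Random] = None) -> str:
    """Draw one answer at random, weighted by the softmax probabilities."""
    if rng is None:
        rng = random.Random()
    probs = softmax(logits, temperature)
    return rng.choices(list(answers), weights=probs, k=1)[0]


# Made-up scores a model might assign to three candidate answers.
answers = ["Yes", "No", "It depends"]
logits = [2.0, 1.5, 0.5]

# Same "question", same scores, different dice rolls:
for seed in range(5):
    print(sample_answer(answers, logits, temperature=1.0, rng=random.Random(seed)))
```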

If I sit those two lawyers down at a bar, with no case on the line, no motivation other than just friendly discussion, they could debate the subject and likely eventually come to a consensus, because they are sentient beings capable of reason. That’s what LLMs can only fake through smoke and mirrors.

Hello Hotel@lemmy.world · 4 months ago

Usually, what I see is that the REPL they are using is never introspective enough. The AI can’t, on its own, revert to a previous state or leave notes for itself, because for a chatbot it matters that the response is fast and produced in linear time. ChatGPT can make really cool stuff when you ask it to break its thought process into steps, including on tasks it usually fails spectacularly at. It was like pulling teeth to get it to actually do the steps and not just give the bad answer anyway.

5gruel@lemmy.world · 4 months ago

I’m not convinced about the “a human can say ‘that’s a little outside my area of expertise’, but an LLM cannot.” I’m sure there are a lot of examples in the training data set that contain qualification of answers and expression of uncertainty, so why would the model not be able to generate that output? I don’t see why it would require an “understanding” for that specifically. I would suspect that better human reinforcement would make such answers possible.

dustyData@lemmy.world · 4 months ago

Because humans can introspect, and think and reflect about our own knowledge against the perceived expertise and knowledge of other humans. There’s nothing in LLMs capable of doing this. An LLM cannot assess its own state, and even if it could, it has nothing to contrast it with. You cannot develop the concept of ignorance without an other to interact and compare with.

UnpluggedFridge@lemmy.world · 4 months ago

I think where you are going wrong here is assuming that our internal perception is not also a hallucination by your definition. It absolutely is. But our minds are embodied, thus we are able to check these hallucinations against some outside stimulus. Your gripe that current LLMs are unable to do that is really a criticism of the current implementations of AI, which are trained on some data, frozen, then restricted from further learning by design. Imagine if your mind was removed from all stimulus and then tested. That is what current LLMs are, and I doubt we could expect a human mind to behave much better in such a scenario. Just look at what happens to people cut off from social stimulus; their mental capacities degrade rapidly and that is just one type of stimulus.

Another problem with your analysis is that you expect the AI to do something that humans cannot do: cite sources without an external reference. Go ahead right now and from memory cite some source for something you know. Do not Google search, just remember where you got that knowledge. Now who is the one that cannot cite sources? The way we cite sources generally requires access to the source at that moment. Current LLMs do not have that by design. Once again, this is a gripe with implementation of a very new technology.

The main problem I have with so many of these “AI isn’t really able to…” arguments is that no one is offering a rigorous definition of knowledge, understanding, introspection, etc. in a way that can be measured and tested. Further, we just assume that humans are able to do all these things without any tests to see if we can. Don’t even get me started on the free will vs illusory free will debate that remains unsettled after centuries. But the crux of many of these arguments is the assumption that humans can do it and are somehow uniquely able to do it. We had these same debates about levels of intelligence in animals long ago, and we found that there really isn’t any intelligent capability that is uniquely human.

mindlesscrollyparrot@discuss.tchncs.de · 4 months ago

This seems to be a really long way of saying that you agree that current LLMs hallucinate all the time.

I’m not sure that the ability to change in response to new data would necessarily be enough. They cannot form hypotheses and, even if they could, they have no way to test them.

UnpluggedFridge@lemmy.world · 4 months ago

My thesis is that we are asserting the lack of human-like qualities in AIs that we cannot define or measure. Assertions should be made on data, not uneasy feelings arising when an LLM falls into the uncanny valley.

mindlesscrollyparrot@discuss.tchncs.de · 4 months ago

But we do know how they operate. I saw a post a while back where somebody asked the LLM how it was calculating (incorrectly) the date of Easter. It answered with the formula for the date of Easter. The only problem is that that was a lie. It doesn’t calculate. You or I can perform long multiplication if asked to, but the LLM can’t (ironically, since the hardware it runs on is far better at multiplication than we are).
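For contrast, actually calculating the date of Easter is a fixed sequence of integer operations that gives the same answer every time it runs. The minimal Python sketch below uses the Anonymous Gregorian algorithm (also known as the Meeus/Jones/Butcher algorithm) for Western Easter in the Gregorian calendar; it is an illustration, not necessarily the formula the LLM in that post recited.

```python
from datetime import date


def gregorian_easter(year: int) -> date:
    """Western (Gregorian) Easter Sunday via the Anonymous Gregorian algorithm."""
    a = year % 19                          # position in the 19-year lunar (Metonic) cycle
    b, c = divmod(year, 100)               # century, and year within the century
    d, e = divmod(b, 4)                    # century-based leap-year corrections
    f = (b + 8) // 25                      # auxiliary term for the lunar correction
    g = (b - f + 1) // 3                   # lunar correction
    h = (19 * a + b - d - g + 15) % 30     # tracks the Paschal full moon
    i, k = divmod(c, 4)                    # leap-year terms within the century
    weekday = (32 + 2 * e + 2 * i - h - k) % 7  # offset needed to land on a Sunday
    m = (a + 11 * h + 22 * weekday) // 451      # correction for rare edge cases
    month, day = divmod(h + weekday - 7 * m + 114, 31)
    return date(year, month, day + 1)


print(gregorian_easter(2024))  # 2024-03-31
```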

UnpluggedFridge@lemmy.world · 4 months ago

We do not know how LLMs operate. Similar to our own minds, we understand some primitives, but we have no idea how certain phenomena emerge from those primitives. Your assertion would be like saying we understand consciousness because we know the structure of a neuron.

EatATaco@lemm.ee · 4 months ago

> they do not understand why those things are true.

Some researchers compared the results of questions between ChatGPT 3 and 4. One of the questions was about stacking items in a stable way. ChatGPT 3 just, in line with what you are saying about “without understanding”, listed the items, saying to place them one on top of each other. No way it would have worked.

ChatGPT 4, however, said that you should put the book down first, put the eggs in a 3 x 3 grid on top of the book, trap them in place with the laptop so they don’t roll around, then put the bottle on top of the laptop standing up, and then balance the nail on top of it… even noting that you have to put the flat end of the nail down. This sounds a lot like understanding to me, and not just rolling the dice hoping to be correct.

Yes, AI confidently gets stuff wrong. But let’s all note that there is a whole subreddit dedicated to people being confidently wrong. One doesn’t need to go any further than Lemmy to see people confidently claiming to know the truth about shit they should know is outside of their actual knowledge. We’re all guilty of this. Including refusing to learn when we are wrong. Additionally, the argument that they can’t learn doesn’t make sense because models have definitely become better.

Now I’m not saying AI is conscious, I really don’t know, but humans are guilty of all the shortcomings you’ve listed, too. So using them as examples of why it’s always just a hallucination, or why our thoughts are not, doesn’t seem to hold much water to me.

AstralPath@lemmy.ca · 4 months ago

A source link to what you’re referring to would be nice.

EatATaco@lemm.ee · 4 months ago

https://www.businessinsider.com/chatgpt-open-ai-balancing-task-convinced-microsoft-agi-closer-2023-5

astreus@lemmy.ml · 4 months ago

“We invented a new kind of calculator. It usually returns the correct value for the mathematics you asked it to evaluate! But sometimes it makes up wrong answers for reasons we don’t understand. So if it’s important to you that you know the actual answer, you should always use a second, better calculator to check our work.”

Then what is the point of this new calculator?

Fantastic comment, from the article.

lateraltwo@lemmy.world · 4 months ago

It’s a nascent-stage technology that reflects the world’s words back at you in statistical order by way of parsing user-generated prompts. It’s a reactive system with no autonomy to deviate from a template upon reset. It’s no Roko’s Basilisk inherently, just because

tourist@lemmy.world · 4 months ago

Am I understanding correctly that it’s just a fancy random word generator?

Gigasser@lemmy.world · 4 months ago

Not random, more so probabilistic, which is almost the same thing, granted.

Logi@lemmy.world · 4 months ago

It’s like letting autocomplete always pick the next word in the sentence without typing anything yourself. But fancier.

Couldbealeotard@lemmy.world · 4 months ago

Yes, but it’s, like, really fancy.

elephantium@lemmy.world · 4 months ago

Some problems lend themselves to “guess-and-check” approaches. This calculator is great at guessing, and it’s usually “close enough”.

The other calculator can check efficiently, but it can’t solve the original problem.

Essentially this is the entire motivation for numerical methods.
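The guess-and-check loop can be illustrated with one of the oldest numerical methods. A minimal sketch, assuming we want a square root and use Newton’s method: each step makes a guess, a cheap check (squaring it and comparing with `n`) says how wrong it is, and that error drives the next guess. The function name and tolerance are illustrative choices; nothing here involves an LLM.

```python
def newton_sqrt(n: float, tolerance: float = 1e-12, max_steps: int = 100) -> float:
    """Approximate sqrt(n) by repeatedly guessing and checking the residual."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0.0
    guess = n if n >= 1 else 1.0           # any positive starting guess converges
    for _ in range(max_steps):
        residual = guess * guess - n       # the cheap "check": how wrong is the guess?
        if abs(residual) <= tolerance * n:
            return guess
        guess -= residual / (2 * guess)    # Newton update: refine the guess
    return guess


print(newton_sqrt(2.0))
```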

Aceticon@lemmy.world · 4 months ago

In my personal experience (that’s generally how I manage to shortcut a lot of labour-intensive intellectual tasks), using intuition to guess possible answers/results and then working backwards from them to determine which one is right, and even prove it, is generally faster. How often that holds depends on how good one’s intuition is in a given field, which in turn correlates with experience in it. It works because it’s usually faster to show that a result is correct than to arrive at it (and if it’s not, you just do it the old-fashioned way).

That said, it’s far from guaranteed to be faster, and for problems with more than one solution it might yield working but sub-optimal ones.

Further, the intuition step alone does not yield a result that can be trusted without validation.

Maybe, by being used the way intuition is in this process, LLMs can help accelerate the search for results in subjects where one doesn’t have enough experience to have good intuition, but does have enough experience (or there are ways or tools inherent to that domain) to do the “validation of possible results” part.

CaptainSpaceman@lemmy.world · 4 months ago

It’s not just a calculator, though.

Image generation requires no fact checking whatsoever, and some of the tools can do it well.

That said, LLMs will always have limitations and true AI is still a ways away.

pixel_prophet@lemm.ee · 4 months ago

The biggest disappointment in the image generation capabilities was the realisation that there is no object permanence there in terms of the components making up an image, so for any specificity you’re just playing whack-a-mole with iterations that introduce other undesirable shit, no matter how specific you make your prompts.

They are also now heavily nerfing the models to avoid lawsuits, by just ignoring anything relating to specific styles that may be considered trademarks. The problem is that those are often industry jargon, so now you’re having to craft more convoluted prompts and getting more mid results.

sudneo@lemm.ee · 4 months ago

It does require fact-checking. You might ask for a human and get someone with 10 fingers on one hand; you might ask for people in the background and get blobs merged into each other. Fact-checking images is absolutely necessary, and it consists of verifying that the generated image adheres to your prompt and that the objects in it match their intended real counterparts.

I do agree that it’s a different type of fact checking, but that’s because an image is not inherently correct or wrong, it only is if compared to your prompt and (where applicable) to reality.

elephantium@lemmy.world · 4 months ago

> Image generation requires no fact checking whatsoever

Sure it does. Let’s say IKEA wants to use midjourney to generate images for its furniture assembly instructions. The instructions are already written, so the prompt is something like “step 3 of assembling the BorkBork kitchen table”.

Would you just auto-insert whatever it generated and send it straight to the printer for 20000 copies?

Or would you look at the image and make sure that it didn’t show a couch instead?

If you choose the latter, that’s fact checking.

> That said, LLMs will always have limitations and true AI is still a ways away.

I can’t agree more strongly with this point!

catloaf@lemm.ee · 4 months ago

It doesn’t? Have you not seen any of the articles about AI-generated images being used for misinformation?

KubeRoot@discuss.tchncs.de · 4 months ago

That’s not really right, because verifying solutions is usually much easier than finding them. A calculator that can take in arbitrary sets of formulas and produce answers for variables, but is sometimes wrong, is an entirely different beast than a calculator that can plug values into variables and evaluate expressions to check if they’re correct.
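The point that checking is cheaper than solving can be made concrete. A minimal sketch, assuming the problem is a small system of linear equations and some unreliable “calculator” has proposed candidate values: verification is just substitution, one pass over the equations, with no need to trust how the candidate was produced. The function name and the example system are made up for illustration.

```python
from fractions import Fraction
from typing import Sequence, Union

Number = Union[int, Fraction]


def check_solution(coefficients: Sequence[Sequence[Number]],
                   constants: Sequence[Number],
                   candidate: Sequence[Number]) -> bool:
    """Substitute a candidate into each equation sum_j a_ij * x_j = b_i.

    Uses exact Fraction arithmetic so rounding can't hide a wrong answer.
    """
    if len(coefficients) != len(constants):
        raise ValueError("need one constant per equation")
    for row, b in zip(coefficients, constants):
        if len(row) != len(candidate):
            return False
        lhs = sum(Fraction(a) * Fraction(x) for a, x in zip(row, candidate))
        if lhs != b:
            return False
    return True


# 2x + 3y = 8 and x - y = -1, whose solution is x = 1, y = 2
A = [[2, 3], [1, -1]]
b = [8, -1]
print(check_solution(A, b, [1, 2]))  # True: a correct guess passes
print(check_solution(A, b, [2, 1]))  # False: a wrong guess is caught
```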

As a matter of fact, I’m pretty sure that argument would also make quantum computing pointless - because quantum computers are probability based and can provide answers for difficult problems, but not consistently, so you want to use a regular computer to verify those answers.

Perhaps a better comparison would be a dictionary that can explain entire sentences, but requires you to then check each word in a regular dictionary and make sure it didn’t mix them up completely? Though I guess that’s actually exactly how LLMs operate…

assassin_aragorn@lemmy.world · 4 months ago

It’s only easier to verify a solution than come up with a solution when you can trust and understand the algorithms that are developing the solution. Simulation software for thermodynamics is orders of magnitude faster than hand calculations, but you know what the software is doing. The creators of the software aren’t saying “we don’t actually know how it works”.

In the case of an LLM, I have to verify everything with no trust whatsoever. And that takes longer than just doing it myself. Especially because an LLM is writing something for me, it isn’t doing complex math.

Danksy@lemmy.world · 4 months ago

If a solution is correct then a solution is correct. If a correct solution was generated randomly that doesn’t make it less correct. It just means that you may not always get correct solutions from the generating process, which is why they are checked after.

KubeRoot@discuss.tchncs.de · 4 months ago

Except when you’re doing calculations, a calculator can run through an equation substituting the given answers and see that the values match… Which is my point of calculators not being a good example. And the case of a quantum computer wasn’t addressed.

I agree that LLMs have many issues, are being used for bad purposes, are overhyped, and we’ve yet to see if the issues are solvable - but I think the analogy is twisting the truth, and I think the current state of LLMs being bad is not a license to make disingenuous comparisons.

assassin_aragorn@lemmy.world · 4 months ago

It remains to be seen, then.

RecluseRamble@lemmy.dbzer0.com · 4 months ago

The problem is people thinking the tool is a “calculator” (or fact-checker or search engine) while it’s just a text generator. It’s great for generating text.

But even then it can’t keep a paragraph stable during the conversation. For me personally, the best antidote against the hype was to use the tool.

I don’t judge people believing it’s more than it is though. The industry is intentionally deceiving everyone about this and we also intuitively see intelligence when someone can eloquently express themselves. Seeing that in software seems magical.

We now have a great Star Trek-like human-machine interface. We only need real intelligence in the backend.

Nomecks@lemmy.ca · 4 months ago

deleted by creator

Zerfallen@lemmy.world · 4 months ago

It would be a great comment if it represented reality, but as an analogy it’s completely off.

LLM-based AI represents functionality that nothing other than the human mind and extensive research or singular expertise can replicate. There is no already existing ‘second, better calculator’ that has the same breadth of capabilities, particularly in areas involving language.

If you’re only using it as a calculator (which was never the strength of an LLM in the first place), for problems you could already solve with a calculator because you understand what is required, then uh… yeah i mean use a calculator, that is the appropriate tool.

ramirezmike@programming.dev · 4 months ago

do you know what an analogy is??

MentalEdge@sopuli.xyz · 4 months ago

Altman going “yeah we could make it get things right 100% of the time, but that would be boring” has such “my girlfriend goes to another school” energy it’s not even funny.

lectricleopard@lemmy.world · 4 months ago

The Chinese Room thought experiment is a good place to start the conversation. AI isn’t intelligent, and it doesn’t hallucinate. It’s not sentient. It’s just a computer program.

People need to stop using personifying language for this stuff.

TubularTittyFrog@lemmy.world · 4 months ago

that’s not fun and dramatic and clickbaity though

lectricleopard@lemmy.world · 4 months ago

I think it’s that, and something even worse as well. There are probably many well-meaning people working on these things who think they really are creating and guiding an intelligence. It’s an opportunity to feel like a god and a tech wizard at the same time.

TheDarksteel94@sopuli.xyz · 4 months ago

Technically, humans are just bio machines, running very complicated software. AI just isn’t there yet.

ClamDrinker@lemmy.world · 4 months ago

It will never be solved. Even the greatest hypothetical super intelligence is limited by what it can observe and process. Omniscience doesn’t exist in the physical world. Humans hallucinate too - all the time. It’s just that our approximations are usually correct, and then we don’t call it a hallucination anymore. But realistically, the signals coming from our feet take longer to process than those from our eyes, so our brain has to predict information to create the experience. It’s also why we don’t notice our blinks, or why we don’t see the blind spot our eyes have.

AI, being a more primitive version of our brains, will hallucinate far more, especially because it cannot verify anything in the real world and is limited to the data it has been given, which it has to treat as ultimate truth. The mistake was trying to turn AI into a source of truth.

Hallucinations shouldn’t be treated like a bug. They are a feature - just not one the big tech companies wanted.

When humans hallucinate on purpose (and not due to illness), we get imagination and dreams; fuel for fiction, but not for reality.

GoodEye8@lemm.ee · 4 months ago

I think you’re giving a glorified encyclopedia too much credit. The difference between us and “AI” is that we can approach knowledge from a problem-solving position. We do approximate the laws of physics, but we don’t blindly take our beliefs and run with them. We come up with a theory that then gets rigorously criticized, come up with ways to test that theory, are critical of the test results, and eventually reach a consensus that, based on our understanding, the thing is true. We’ve built entire frameworks to reduce our “hallucinations”. The reason we even know we have blind spots is that we’re so critical of our own “hallucinations” that we end up deliberately looking for our blind spots.

But the “AI” doesn’t do that. It can’t do that. The “AI” can’t solve problems; it can’t be critical of itself or of the information it’s giving out. All our current “AI” can do is word vomit itself into a reasonable answer. Sometimes the word vomit is factually correct, sometimes it’s just nonsense.

You are right that theoretically hallucinations cannot be solved, but in practice we ourselves have come up with solutions to minimize them. We could probably do something similar with “AI”, but not when the AI is just an LLM that fumbles into sentences.

ClamDrinker@lemmy.world · 4 months ago

I’m not sure where you think I’m giving it too much credit, because as far as I read it we already totally agree lol. You’re right, methods exist to diminish the effect of hallucinations. That’s what the scientific method is. Current AI has no physical body and can’t run experiments to verify objective reality. It can’t fact-check itself other than by being told by the humans training it what is correct (and humans are fallible), and even then, if it has gaps in what it knows, it will fill them with something probable - but which is likely going to be bullshit.

My only point was that truly fixing it would basically mean creating an omniscient being, which cannot exist in our physical world. It will always have to make some assumptions - just like we do.

Eranziel@lemmy.world · 4 months ago

The fundamental difference is that the AI doesn’t know anything. It isn’t capable of understanding, and it doesn’t learn in the same sense that humans learn. An LLM is a (complex!) digital machine that guesses the next most likely word based essentially on statistics, nothing more, nothing less.

It doesn’t know what it’s saying, nor does it understand the subject matter, or what a human is, or what a hallucination is or why it has them. They are fundamentally incapable of even perceiving the problem, because they do not perceive anything aside from text in and text out.
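To make the “guesses the next most likely word” description concrete, here is a toy sketch in Python. It is a word-level bigram counter, not how real LLMs work (they use neural networks over subword tokens and usually sample from a probability distribution), and the function names and tiny corpus are made up for illustration. What it does share with the description above is that each word is chosen because it is statistically likely, with no check on whether the result is true.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: dict[str, Counter], word: str) -> str | None:
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    # Ties go to the follower encountered first in the corpus.
    return counts.most_common(1)[0][0]


def generate(follows: dict[str, Counter], start: str, length: int) -> str:
    """Greedily append the likeliest next word, up to `length` extra words."""
    out = [start]
    for _ in range(length):
        nxt = predict_next(follows, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the cat ."
model = train_bigrams(corpus)
print(predict_next(model, "the"))
print(generate(model, "the", 5))
```

Here `generate(model, "the", 5)` yields “the cat sat on the cat”: every step is locally the most likely choice, yet the sentence never appears in the corpus, and nothing in the model can tell whether it is true.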

GoodEye8@lemm.ee · 4 months ago

It doesn’t need to verify reality, it needs to be internally consistent and it’s not.

For example, I was setting up a logging pipeline and one of the filters didn’t work. There was seemingly nothing wrong with the configuration itself, and after some more tests with dummy data I was able to get it working, but it still didn’t work with the actual input data. So I gave the working dummy example and the actual configuration to ChatGPT and asked why the actual configuration didn’t work. After some prompts going over what I had already tried, it ended up giving me the exact same configuration I had presented as the problem. Humans wouldn’t (or at least shouldn’t) make that error, because it would be internally inconsistent: the problem statement can’t be the solution.

But the AI doesn’t have internal consistency because it doesn’t really think. It’s not making sure what it’s saying is logical based on the information it knows, it’s not trying to make assumptions to solve a problem, and it can’t even deduce that something true is actually true. All it can do is predict what we would perceive as the answer.

bastion@feddit.nl · 4 months ago

Indeed. It doesn’t even trend towards consistency.

It’s much like the pattern-matching layer of human consciousness. Its function isn’t to filter for truth; its function is to match knowns and potentials to patterns in its environment.

AI has no notion of critical thinking. It is purely positive “thinking”, in a technical sense - it is positing based on what it “knows”, but there is no genuine concept of self, nor even of critical thinking, nor even a non-conceptual logic or consistency filter.

KillingTimeItself@lemmy.dbzer0.com · 4 months ago

ok so to give you an um ackshually here.

Technically, if we were to develop a real artificial general intelligence, it would be limited to the amount of knowledge that it has, but so is any given human. And its advantage would still be the scale of its operations compared to a human, since it could realistically operate on all known theoretical and practical information, whereas for a human that’s simply not possible.

Presumably, though, it would also be influenced to some degree by the AI posting we already have now; the question is how it responds to that, and how well it can tell the difference between that and real human posting.

The reason hallucinations are such a big problem currently is simply that it’s literally a predictive text model; it doesn’t know anything. That simply wouldn’t be true for an artificial general intelligence. Not that it couldn’t hallucinate, but it wouldn’t hallucinate to the same degree, and possibly with greater motives in mind.

A lot of the reason human biology tends to obfuscate certain things is simply the way it’s evolved, as well as its potential advantages in our life. The reason we can’t see our blind spots is that it would be much more difficult to process things otherwise. It’s the same reason our eyesight is flipped as well. It’s the same reason pain is interpreted the way that it is.

A big mistake you are making here is stating that it must be fed information that it knows to be true; this is not inherently true. You can train a model on all of the wrong things to do; as long as it has the capability to understand this, it shouldn’t be a problem.

For predictive models? This is probably the case, but you can also poison the well, so to speak, even when it comes to those.

ClamDrinker@lemmy.world · 4 months ago

Yes, a theoretical future AI that would be able to self-correct would eventually become more powerful than humans, especially if you could give it ways to run magnitudes more self-correcting mechanisms at the same time. But it would still be making ever so small assumptions when there is a gap in the information it has.

It could be humble enough to admit it doesn’t know, but it can still be mistaken and think it has the right answer when it doesn’t. It would feel nigh omniscient, but it would never truly be.

A roundtrip around the globe on glass fibre takes hundreds of milliseconds, so even if it has the truth on some matter, there’s no guarantee that it didn’t change in the milliseconds it needed to become aware that the truth had changed. True omniscience simply cannot exist, since information (and in turn the truth encoded by that information) also propagates at the speed of light.
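A quick back-of-the-envelope check of the fibre latency figure, as a Python sketch. The inputs are assumptions, not from the comment: an equatorial circumference of about 40,075 km, and a refractive index of about 1.47 for silica fibre, so light in the fibre travels at roughly c/1.47. It counts a single lap around the globe; real routes are longer and add switching delays.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBRE_REFRACTIVE_INDEX = 1.47      # assumed typical value for silica fibre
EARTH_CIRCUMFERENCE_KM = 40_075    # equatorial, approximate


def travel_time_ms(distance_km: float, refractive_index: float) -> float:
    """Time for light to cover distance_km in a medium with the given refractive index."""
    speed_km_s = SPEED_OF_LIGHT_KM_S / refractive_index
    return distance_km / speed_km_s * 1000


lap_ms = travel_time_ms(EARTH_CIRCUMFERENCE_KM, FIBRE_REFRACTIVE_INDEX)
print(f"One lap around the globe in fibre: about {lap_ms:.0f} ms")
```

Under these assumptions a lap comes out at roughly 200 ms, so the order of magnitude in the comment holds.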

A big mistake you are making here is stating that it must be fed information that it knows to be true; this is not inherently true. You can train a model on all of the wrong things to do; as long as it has the capability to understand this, it shouldn’t be a problem.

The dataset that encodes all wrong things would be infinite in size, and constantly changing. It can theoretically exist, but realistically it will never happen. And if it were incomplete, it would have to make assumptions at some point based on the incomplete data it has, which would open it up to being wrong, which we would call a hallucination.

KillingTimeItself@lemmy.dbzer0.com · 4 months ago

It could be humble enough to admit it doesn’t know, but it can still be mistaken and think it has the right answer when it doesn’t. It would feel nigh omniscient, but it would never truly be.

Yeah, and so are humans, so I mean, shit happens. Even then it’d likely be more accurate than a human, just based on the very fact that it knows more subjects than any given human, or even all humans alive, because its knowledge is based on the written works of the entirety of humanity, theoretically.

A roundtrip around the globe on glass fibre takes hundreds of milliseconds, so even if it has the truth on some matter, there’s no guarantee that it didn’t change in the milliseconds it needed to become aware that the truth had changed. True omniscience simply cannot exist, since information (and in turn the truth encoded by that information) also propagates at the speed of light.

Well, yeah, if we’re defining the ultimate truth as something that propagates through the universe at the highest known speed possible, that would be how that works. Since it’s likely a device of its own accord and/or responsive to humans, it likely wouldn’t matter, as it would just wait a few seconds anyway.

The dataset that encodes all wrong things would be infinite in size, and constantly changing. It can theoretically exist, but realistically it will never happen. And if it were incomplete, it would have to make assumptions at some point based on the incomplete data it has, which would open it up to being wrong, which we would call a hallucination.

At that scale, yes, but at this scale, with our current LLM technology, which is what I was talking about specifically, it wouldn’t matter. But even at that scale I don’t think it would classify as a hallucination, because a hallucination is a very specific type of being wrong. It’s literally pulling something out of thin air, and a theoretical general-intelligence AI wouldn’t be pulling shit out of thin air; at best it would elaborate on what it already knows, which might be everything or nothing, depending on the topic. But it shouldn’t just make something up out of thin air. It could very well be wrong about something, but that’s not likely to be a hallucination.

ClamDrinker@lemmy.world · 4 months ago

Yes, it would be much better at mitigating it and would beat all humans at truth accuracy in general. And truths which can be easily individually proven and/or remain unchanged forever can basically be right 100% of the time. But not all truths are that straightforward.

What I mentioned can’t really be unlinked from the issue, if you want to solve it completely. Have you ever found out later on that something you told someone else as fact turned out not to be so? Essentially, you ‘hallucinated’ a truth that never existed, but you were just that confident it was correct to share and spread it. It’s how we get myths, popular belief, and folklore.

For those other truths, we simply take the truth to be whatever has reached a likelihood at which we consider it certain. But ideas and concepts we have in our minds constantly float around on that scale. And since we cannot really avoid talking to other people (or intelligent agents) to ascertain certain truths, misinterpretations and lies can sneak in and cause us to treat as truth that which is not. Avoiding that would mean having to be pretty much everywhere to personally interpret the information straight from the source. But then things like how fast it can process those things come into play. Without making guesses about what’s going to happen, you basically can’t function in reality.

Queen HawlSera@lemm.ee · 4 months ago

You assume the physical world is all there is or that the AI has any real intelligence at all. It’s a damn Chinese room.

SolNine@lemmy.ml · 4 months ago

The simple solution is not to rely upon AI. It’s like a misinformed relative after a jar of moonshine, they might be right some of the time, or they might be totally full of shit.

I honestly don’t know why people are obsessed with relying on AI, is it that difficult to look up the answer from a reliable source?

force@lemmy.world · 4 months ago

is it that difficult to look up the answer from a reliable source?

With the current state of search engines and their content (almost completely unrelated garbage and shitty blogs made in like 3 minutes, with 1/4 of the content poorly copy-pasted out of context from Stack Overflow and most of the rest being pop-ups and ads), YES

SEO ““engineers”” deserve the guillotine

funkless_eck@sh.itjust.works · 4 months ago

Because some jobs have to quickly produce a bunch of bullshit text that no one will read, or else parse a bunch of bullshit text for a single phrase in the midst of it all and put it in a report.

ZILtoid1991@lemmy.world · 4 months ago

Sites like that can be blacklisted with web browser plugins. That vastly improved my DuckDuckGo experience for a while, but it’ll be a Whack-A-Mole game from both sides, and yet again my searches are littered with SEO garbage at best, and AI-generated SEO garbage full of made-up stuff at worst.

sebinspace@lemmy.world · 4 months ago

If it keeps me from going to stack and interacting with those degenerates, yes

KillingTimeItself@lemmy.dbzer0.com · 4 months ago

it’s only going to get worse, especially as datasets deteriorate.

With things like Reddit being overrun by AI, and also selling AI training data, I can only imagine what mess that’s going to cause.

Cyberflunk@lemmy.world · 4 months ago

Hallucinations, like depression, are a multifaceted issue. Training data is only a piece of it. Quantized models and overfitted models, which rely on memorization at the cost of obviously correct training data, also play a part. Poorly structured inferences can confuse a model.

Rest assured, this isn’t just training data.

KillingTimeItself@lemmy.dbzer0.com · 4 months ago

yeah there’s also this stuff as well, though i consider that to be a more technical challenge, rather than a hard limit.

vegetal@lemmy.world · 4 months ago

I think you are spot on. I tend to think the problems may begin to outnumber the potentials.

KillingTimeItself@lemmy.dbzer0.com · 4 months ago

and we haven’t even gotten into the problem of what happens when you have no more data to feed it, do you make more? That’s an impossible task.

Lmaydev@programming.dev · 4 months ago

Honestly I feel people are using them completely wrong.

Their real power is their ability to understand language and context.

Turning natural language input into commands that can be executed by a traditional software system is a huge deal.

Microsoft released an AI-powered autocomplete text box and it’s genius.

Currently you have to type an exact text match in an autocomplete box. So if you type cats but the item is called pets, you’ll get no results. Now the AI can find context-based matches in the autocomplete list.

This is their real power.

Also they’re amazing at generating non-fact-based things: stories, poems, etc.

noodlejetski@lemm.ee · 4 months ago

Their real power is their ability to understand language and context.

…they do exactly none of that.

breakingcups@lemmy.world · 4 months ago

No, but they approximate it. Which is fine for most use cases the person you’re responding to described.

FarceOfWill@infosec.pub · 4 months ago

They’re really, really bad at context. The main failure case isn’t making things up, it’s having text or image in part of the result not work right with text or image in another part because they can’t even manage context across their own replies.

See images with three hands, or bowstrings that mysteriously vanish, etc.

FierySpectre@lemmy.world · 4 months ago

New models are like really good at context, the amount of input that can be given to them has exploded (fairly) recently… So you can give whole datasets or books as context and ask questions about them.

Lmaydev@programming.dev · 4 months ago

They do it much better than anything you can hard code currently.

Blue_Morpho@lemmy.world · 4 months ago

So if you type cats but the item is called pets, you’ll get no results. Now the AI can find context-based matches in the autocomplete list.

Google added context search to Gmail and it’s infuriating. I’m looking for an exact phrase that I even put in quotes but Gmail returns a long list of emails that are vaguely related to the search word.

Lmaydev@programming.dev · 4 months ago

That is indeed a poor use. Searching traditionally first and falling back to it would make way more sense.

Blue_Morpho@lemmy.world · 4 months ago

It shouldn’t even automatically fallback. If I am looking for an exact phrase and it doesn’t exist, the result should be “nothing found”, so that I can search somewhere else for the information. A prompt, “Nothing found. Look for related information?” Would be useful.

But returning a list of related information when I need an exact result is worse than not having search at all.
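A minimal Python sketch of the behaviour described above, assuming a plain list of email subjects and case-insensitive substring matching as the “exact phrase” test. `exact_phrase_search` and `search` are hypothetical names, not Gmail’s API. The design choice is that an empty result is reported as such, and any looser “related” search only happens if the user opts in.

```python
from __future__ import annotations

NOT_FOUND_PROMPT = "Nothing found. Look for related information?"


def exact_phrase_search(emails: list[str], phrase: str) -> list[str]:
    """Return only the emails that contain the phrase verbatim (ignoring case)."""
    needle = phrase.lower()
    return [email for email in emails if needle in email.lower()]


def search(emails: list[str], phrase: str) -> list[str] | str:
    """Exact results if any exist; otherwise an explicit prompt, never a silent fallback."""
    hits = exact_phrase_search(emails, phrase)
    if hits:
        return hits
    return NOT_FOUND_PROMPT


inbox = [
    "Invoice for March attached",
    "Lunch on Friday?",
    "Re: question about the March invoice",
]
print(search(inbox, "invoice for march"))
print(search(inbox, "quarterly report"))
```

The third email mentions a March invoice but does not contain the exact phrase, so it is correctly left out rather than padded into the results.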

Apytele@sh.itjust.works · 4 months ago

deleted by creator

hedgehogging_the_bed@lemmy.world · 4 months ago

Searching with synonym matching is decades old at this point. I worked on it as an undergrad in the early 2000s, and it wasn’t new then, just complicated. Google’s version outperformed other search algorithms for a long time, and then Google trashed it by letting AI take over.

Lmaydev@programming.dev · 4 months ago

Google’s algorithm has pretty much always used AI techniques.

It doesn’t have to be a synonym. That’s just an example.

Typing diabetes and getting medical services as a result wouldn’t be possible with that technique unless you had a database of every disease to search against for all queries.

The point is AI means you don’t have to have a giant lookup of linked items as it’s trained into it already.
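For contrast, a minimal Python sketch of the “giant lookup of linked items” approach: query expansion through a hand-maintained related-terms table. The table contents, catalog, and function names are hypothetical. The limitation the comment points at shows up in the last call: any query nobody added to the table finds nothing.

```python
from __future__ import annotations

# Hypothetical hand-maintained table; every link has to be added by a person.
RELATED_TERMS: dict[str, set[str]] = {
    "cats": {"pets", "animals"},
    "diabetes": {"medical services", "endocrinology"},
}


def expand_query(query: str) -> set[str]:
    """The query itself plus any related terms listed for it."""
    key = query.lower()
    return {key} | RELATED_TERMS.get(key, set())


def lookup_search(items: list[str], query: str) -> list[str]:
    """Return catalog items whose name matches the query or one of its related terms."""
    wanted = expand_query(query)
    return [item for item in items if item.lower() in wanted]


catalog = ["Pets", "Medical services", "Garden tools"]
print(lookup_search(catalog, "cats"))      # matched via the table
print(lookup_search(catalog, "diabetes"))  # matched via the table
print(lookup_search(catalog, "asthma"))    # not in the table, so no results
```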

hedgehogging_the_bed@lemmy.world · 4 months ago

Yes, synonym searching doesn’t strictly mean the thesaurus. There are a lot of different ways to connect related terms and some variation in how they are handled from one system to the next. Letting machine learning into the mix is a very new step in a process that Library and Information Sci has been working on for decades.

Th4tGuyII@kbin.social · 4 months ago

Exactly. The big problem with LLMs is that they’re so good at mimicking understanding that people forget that they don’t actually have understanding of anything beyond language itself.

The thing they excel at, and should be used for, is exactly what you say - a natural language interface between humans and software.

Like in your example, an LLM doesn’t know what a cat is, but it knows what words describe a cat based on training data - and for a search engine, that’s all you need.

Voroxpete@sh.itjust.works · 4 months ago

That’s called “fuzzy” matching, it’s existed for a long, long time. We didn’t need “AI” to do that.

Lmaydev@programming.dev · 4 months ago

No it’s not.

Fuzzy matching is a search technique that uses a set of fuzzy rules to compare two strings. The fuzzy rules allow for some degree of similarity, which makes the search process more efficient.

That allows for mistyping, etc., but it doesn’t allow context-based searching at all. Cat doesn’t fuzz with pet. There is no similarity.

Also it is an AI technique itself.
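To illustrate the distinction drawn above, here is a minimal Python sketch of one common form of fuzzy matching, Levenshtein (edit) distance. The threshold of 1 and the function names are assumptions for illustration. It tolerates a typo like “diabtes”, but it only compares characters, so “diabetes” gets nowhere near “Medical services”.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def fuzzy_match(items: list[str], query: str, max_distance: int = 1) -> list[str]:
    """Items whose spelling is within max_distance edits of the query (ignoring case)."""
    q = query.lower()
    return [item for item in items if levenshtein(q, item.lower()) <= max_distance]


catalog = ["Diabetes", "Medical services", "Garden tools"]
print(fuzzy_match(catalog, "diabtes"))                # a typo still finds "Diabetes"
print(fuzzy_match(["Medical services"], "diabetes"))  # meaning is invisible to it
```

Raising the threshold does not add understanding; it only starts matching unrelated words that happen to share letters.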

hedgehogging_the_bed@lemmy.world · 4 months ago

Bullshit, fuzzy matching is a lot older than this AI LLM.

Lmaydev@programming.dev · 4 months ago

I didn’t say LLM. AI has existed since the 50s/60s. Fuzzy matching is an AI technique.

not_amm@lemmy.ml · 4 months ago

That’s why I only use Perplexity. ChatGPT can’t give me sources unless I pay, so I can’t trust the information it gives me, and it also hallucinated a lot when coding; it was faster to search the official documentation than to correct and debug code “generated” by ChatGPT.

I use Perplexity + SearXNG, so I can search a lot faster and cite sources, and it also makes summaries of your search, so it saves me time while writing introductions and so on.

It sometimes hallucinates too and cites weird sources, but it’s faster for me to correct and search for better sources given the context and more ideas. In summary, when/if you’re correcting the prompts and searching apart from Perplexity, you already have something useful.

BTW, I try not to use it a lot, but it’s way better for my workflow.

Hugin@lemmy.world · 4 months ago

Prisencolinensinainciusol is an Italian song that is complete gibberish but made to sound like an English-language song. That’s what AI is right now.

https://www.youtube.com/watch?v=RObuKTeHoxo

noughtnaut@lemmy.world · 4 months ago

Oh that is hilarious! Just on my first listen but I don’t quite get the lyrics - am deeply disappointed that the video doesn’t have subs. 🤭

Found it on Spotify. It’s so much worse with lyrics. Thank you for sharing a version without them! 🙏

Flying Squid@lemmy.world · 4 months ago

The Italians actually have a name for that kind of gibberish talking that sounds real. I did some VO work on a project being directed by an Italian guy and he explained what he wanted me to do by explaining the term to me first. I’m afraid it’s been way too long since he told me for me to remember it though.

Another example would be the La Linea cartoons, where the main character speaks a gibberish which seems to approximate Italian to my ears.

https://www.youtube.com/watch?v=ldff__DwMBc

yildolw@lemmy.world · 4 months ago

Jazz musicians have a name for gibberish talking that sounds real: scat

We have to stop ignoring AI’s scat problem

Gen Alpha has a name for gibberish talking that sounds real: skibidi toilet

We have to stop ignoring AI’s skibidi toilet problem

ALostInquirer@lemm.ee · 4 months ago

Why do tech journalists keep using the businesses’ language about AI, such as “hallucination”, instead of glitching/bugging/breaking?

superminerJG@lemmy.world · 4 months ago

hallucination refers to a specific bug (AI confidently BSing) rather than all bugs as a whole

Blackmist@feddit.uk · 4 months ago

Honestly, it’s the most human you’ll ever see it act.

It’s got upper management written all over it.

ALostInquirer@lemm.ee · 4 months ago

(AI confidently BSing)

Isn’t it more accurate to say it’s outputting incorrect information from a poorly processed prompt/query?

vithigar@lemmy.ca · 4 months ago

No, because it’s not poorly processing anything. It’s not even really a bug. It’s doing exactly what it’s supposed to do: spit out words in the “shape” of an appropriate response to whatever was just said.

ALostInquirer@lemm.ee · 4 months ago

When I wrote “processing”, I meant it in the sense of getting to that “shape” of an appropriate response you describe. If I’d meant this in a conscious sense I would have written, “poorly understood prompt/query”, for what it’s worth, but I see where you were coming from.

Danksy@lemmy.world · 4 months ago

It’s not a bug, it’s a natural consequence of the methodology. A language model won’t always be correct when it doesn’t know what it is saying.

vrighter@discuss.tchncs.de · 4 months ago

it never knows what it’s saying

Danksy@lemmy.world · 4 months ago

That was what I was trying to say, I can see that the wording is ambiguous.

TheDarksteel94@sopuli.xyz · 4 months ago

Oh, at some point it will lol

ALostInquirer@lemm.ee · 4 months ago

Yeah, on further thought and as I mention in other replies, my thoughts on this are shifting toward the real bug of this being how it’s marketed in many cases (as a digital assistant/research aid) and in turn used, or attempted to be used (as it’s marketed).

Danksy@lemmy.world · 4 months ago

I agree, it’s a massive issue. It’s a very complex topic that most people have no way of understanding. It is superb at generating text, and that makes it look smarter than it actually is, which is really dangerous. I think the creators of these models have a responsibility to communicate what these models can and can’t do, but unfortunately that is not profitable.

machinin@lemmy.world · 4 months ago

https://en.m.wikipedia.org/wiki/Hallucination_(artificial_intelligence)

The term “hallucinations” originally came from computer researchers working with image producing AI systems. I think you might be hallucinating yourself 😉

ALostInquirer@lemm.ee · 4 months ago

Fun part is, that article cites a paper mentioning misgivings with the terminology: AI Hallucinations: A Misnomer Worth Clarifying. So at the very least I’m not alone on this.

blazeknave@lemmy.world · 4 months ago

Ty. As soon as I saw the headline, I knew I wouldn’t be finding value in the article.

ALostInquirer@lemm.ee · 4 months ago

It’s not a bad article, honestly; I’m just tired of journalists and academics echoing the language of businesses and their marketing. “Hallucinations” isn’t an accurate term for this form of AI. These are sophisticated generative text tools, and in my opinion they lack any qualities that justify all this fluff terminology personifying them.

Also, frankly, I think students have one of the better applications for large-language model AIs than many adults, even those trying to deploy them. Students are using them to do their homework and to generate their papers, which is exactly one of their basic purposes. Too many adults are acting like these tools should be used in their present form as research aids, but the entire generative basis of them undermines their reliability for this. It’s trying to use the wrong tool for the job.

You don’t want any of the generative capacities of a large-language model AI for research help, you’d instead want whatever text-processing it may be able to do to assemble and provide accurate output.

SulaymanF@lemmy.world · 4 months ago

We also have to stop calling it hallucinations. The proper term in psychology for making stuff up like this is “Confabulations.”

xia@lemmy.sdf.org · 4 months ago

Yeah! Just like water’s “wetness” problem. It’s kinda fundamental to how the tech operates.

CrayonRosary@lemmy.world · 4 months ago

More importantly, we need to stop ignoring criminal case eyewitnesses’ hallucinatory testimony.

Possibly linux@lemmy.zip · 4 months ago

What do you think we are working on?

Cyberflunk@lemmy.world · 4 months ago

Holy shit. Dunning-Kruger is fully engaged in these post comments