You know how Google’s new feature called AI Overviews is prone to spitting out wildly incorrect answers to search queries? In one instance, AI Overviews told a user to use glue on pizza to make sure the cheese won’t slide off (pssst…please don’t do this.)

Well, according to an interview at The Verge with Google CEO Sundar Pichai published earlier this week, just before criticism of the outputs really took off, these “hallucinations” are an “inherent feature” of AI large language models (LLMs), which are what drive AI Overviews, and this feature “is still an unsolved problem.”

  • givesomefucks@lemmy.world
    333 up, 41 down · 6 months ago

    They keep saying it’s impossible, when the truth is it’s just expensive.

    That’s why they won’t do it.

    You could only train AI with good sources (scientific literature, not social media) and then pay experts to talk with the AI for long periods of time, giving feedback directly to the AI.

    Essentially, if you want a smart AI you need to send it to college, not drop it off at the mall unsupervised for 22 years and hope for the best when you pick it back up.

    • RBG@discuss.tchncs.de
      50 up, 7 down · 6 months ago

      I’ll let you in on a secret: scientific literature has its fair share of bullshit too. The issue is, it is much harder to figure out its bullshit. Unless it’s the most blatant horseshit you’ve scientifically ever seen. So while it absolutely makes sense to say, let’s just train these on good sources, there is no source that is just that. Of course it is still better to do it like that than as they do it now.

      • givesomefucks@lemmy.world
        30 up, 3 down · 6 months ago

        The issue is, it is much harder to figure out its bullshit.

        Google AI suggested you put glue on your pizza because a troll said it on Reddit once…

        Not all scientific literature is perfect. Which is one of the many factors that will still make my plan expensive and time consuming.

        You can’t throw a toddler in a library and expect them to come out knowing everything in all the books.

        AI needs that guided teaching too.

      • callouscomic@lemm.ee
        7 up, 9 down · 6 months ago

        “Most published journal articles are horseshit, so I guess we should be okay with this too.”

    • Zarxrax@lemmy.world
      41 up, 1 down · 6 months ago

      In addition to the other comment, I’ll add that just because you train the AI on good and correct sources of information, it still doesn’t necessarily mean that it will give you a correct answer all the time. It’s more likely, but not ensured.

      • RidcullyTheBrown@lemmy.world
        12 up · 6 months ago

        Yes, thank you! I think this should be written in capitals somewhere so that people could understand it quicker. The answers are not wrong or right on purpose. LLMs don’t have any way of distinguishing between the two.

    • Leate_Wonceslace@lemmy.dbzer0.com
      26 up · 6 months ago

      it’s just expensive

      I’m a mathematician who’s been following this stuff for about a decade or more. It’s not just expensive. Generative neural networks cannot reliably evaluate truth values; it will take time to research how to improve AI in this respect. This is a known limitation of the technology. Closely controlling the training data would certainly make the information more accurate, but that won’t stop it from hallucinating.

      The real answer is that they shouldn’t be trying to answer questions using an LLM, especially because they had a decent algorithm already.

      • Aceticon@lemmy.world
        6 up · 6 months ago (edited)

        Yeah, I learned Neural Networks way back when those things were starting in the late 80s/early 90s, use AI (though seldom Machine Learning) in my job, and really dove into how LLMs are put together when it started getting important. These things operate entirely at the language level, on the probabilities of language tokens appearing in certain places given context. They do not at all translate from language to meaning and back, so there is no logic going on there, nor is there any possibility of it.

        Maybe some kind of ML can help do the transformation from the language space to a meaning space where things can be operated on by logic and then back, but LLMs aren’t a way to do it, as whatever internal representation spaces (yeah, plural) they use in their inner layers aren’t those of meaning and we don’t really have a way to apply logic to them.

      • sudo42@lemmy.world
        2 up, 1 down · 6 months ago

        It’s worse than that. “Truth” can no more reliably be found by machines than it can be by humans. We’ve spent centuries of philosophy trying to figure out what is “true”. The best we’ve gotten is some concepts we’ve been able to convince a large group of people to agree to.

        But even that is shaky. For a simple example, we mostly agree that bleach will kill “germs” in a petri dish. In a single announcement, we saw 40% of the American population accept as “true” that bleach would also cure them if injected straight into their veins.

        We’re never going to teach machines to reason for us when we meatbags constantly change truth to be what will be profitable to some at any given moment.

        • Leate_Wonceslace@lemmy.dbzer0.com
          2 up, 1 down · 6 months ago

          Are you talking about epistemics in general or alethiology in particular?

          Regardless, the deep philosophical concerns aren’t really germane to the practical issue of just getting people to stop falling for obvious misinformation or people being wantonly disingenuous to score points in the most consequential game of numbers-go-up.

    • vrighter@discuss.tchncs.de
      18 up, 3 down · 6 months ago

      no, the truth is it’s impossible even then. If the result involves randomness at its most fundamental level, then it’s not reliable whatever you do.
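A rough worked illustration of why per-step randomness compounds, under a loudly stated assumption that is not from the thread: suppose each generated token is "acceptable" with probability p, independently of the others (real models are not independent token to token, so treat this as a back-of-the-envelope sketch only). The chance that an n-token answer contains no bad token is then:

```latex
P(\text{all } n \text{ tokens acceptable}) = p^{n},
\qquad \text{e.g. } p = 0.99,\; n = 100 \;\Rightarrow\; 0.99^{100} \approx 0.37
```

Under that assumption, even a 99% per-token success rate leaves roughly a one-in-three chance of a fully clean 100-token answer, which is the sense in which better inputs can lower the error rate without driving it to zero.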

      • MacN'Cheezus@lemmy.today
        4 up, 7 down · 6 months ago (edited)

        Sure, the AI is never going to understand what it’s doing or why, but training it on better datasets certainly WILL improve the results.

        Garbage in, garbage out.

        • joneskind@lemmy.world
          9 up · 6 months ago

          You can train an LLM on the best possible set of data without a single false statement and it will still hallucinate. And there’s nothing to be done against that.

          Without understanding of the context everything can be true or false.

          “The acceleration due to gravity is equal to 9.81 m/s²” True or False?

          An LLM basically works like this: given the previous words written and their order, the most probable next word of the sentence is this one.
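A minimal sketch of the point above, assuming a drastically simplified stand-in for an LLM: a bigram model that only looks at the single previous word (real LLMs condition on far longer context and use learned probabilities). The corpus, function names, and sentences are all made up for illustration. Every training sentence is true, yet the model can still emit sentences that are false.

```python
from collections import defaultdict

# Toy corpus (hypothetical): every sentence here is true.
CORPUS = [
    "paris is the capital of france",
    "rome is the capital of italy",
]

def train_bigrams(sentences):
    """Map each word to the set of words that followed it in training."""
    followers = defaultdict(set)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            followers[prev].add(nxt)
    return followers

def all_generations(followers, max_len=10):
    """Enumerate every sentence the model can emit with non-zero probability."""
    results = set()
    stack = [["<s>"]]
    while stack:
        path = stack.pop()
        for nxt in followers[path[-1]]:
            if nxt == "</s>":
                results.add(" ".join(path[1:]))
            elif len(path) <= max_len:
                stack.append(path + [nxt])
    return results

model = train_bigrams(CORPUS)
generated = all_generations(model)
# Sentences the model can produce that were never in the training data.
invented = sorted(generated - set(CORPUS))
print(invented)
```

The model has only learned which word tends to follow which, so after "of" both "france" and "italy" are plausible continuations regardless of which city started the sentence. Nothing in it represents whether the finished sentence is true.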

          • MacN'Cheezus@lemmy.today
            2 up, 4 down · 6 months ago

            Well yes, I’ve seen those examples of ChatGPT citing scientific research papers that turned out to be completely made up, but at least it seems to be a step up from straight up shitposting, which is what you get when you train it on a dataset full of shitposts.

            • joneskind@lemmy.world
              2 up · 6 months ago

              Well it’s definitely true that you will have a hard time getting true things from garbage. But funnily enough, the model might hallucinate true things :)

        • Aceticon@lemmy.world
          3 up · 6 months ago

          The problem is that, given the way they combine things is determined by probability, even training it with the greatest bestest of data, the LLM is still going to hallucinate because it’s combining multiple sources word by word (roughly) guided only by probabilities derived from language, not logic.

          • MacN'Cheezus@lemmy.today
            1 up · 6 months ago

            Yes, I understand that. But I’m fairly certain the quality of the data will still have a massive influence over how much and how egregiously that happens.

            Basically, what I’m saying is, training your AI on a corpus of shitposts instead of factual information seems like a good way to increase the frequency and magnitude of such hallucinations.

            • Aceticon@lemmy.world
              2 up · 6 months ago

              Yeah, true.

              If you train your LLM exclusively on Nazi literature (to pick a wild example), don’t expect it to by chance end up making points similar to Marx’s Das Kapital.

              (Personally I think what might be really funny - in the sense of laughter inducing - would be to purposefully train an LLM exclusively on a specific kind of weird material).

              • MacN'Cheezus@lemmy.today
                3 up · 6 months ago

                Yeah, I mean that’s basically what GPT4Chan did, which someone else already mentioned ITT.

                Basically, this guy took a dataset of several gigabytes worth of archived posts from /pol/ and trained a model on that, then hooked it up to a chatbot and let it loose on the board. You can see the results in this video.

    • jeeva@lemmy.world
      6 up, 1 down · 6 months ago

      That’s just not how LLMs work, bud. It doesn’t have understanding to improve, it just munges the most likely word next in line. It, as a technology, won’t advance past that level of accuracy until it’s a completely different approach.

    • Canary9341@lemmy.ml
      4 up, 1 down · 6 months ago

      They could also perform some additional iterations with other models on the result to verify it, or even to enrich it; but we come back to the issue of costs.
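One way the idea above could look, as a hedged sketch rather than a description of any real product: a generator answer is passed to several checker models and accepted only on a majority vote. Every name here (`answer_with_checks`, the toy models) is hypothetical, and the "models" are plain functions so the example runs without any external service. The call counter makes the cost concern concrete, since each extra checker is one more model call per query.

```python
from typing import Callable, List, Tuple

# A "model" here is just any function from a prompt to a text answer.
Model = Callable[[str], str]

def answer_with_checks(
    question: str, generator: Model, checkers: List[Model]
) -> Tuple[str, bool, int]:
    """Return (answer, accepted, model_calls) using simple majority voting."""
    answer = generator(question)
    calls = 1
    votes = 0
    for checker in checkers:
        verdict = checker(
            f"Question: {question}\nAnswer: {answer}\nIs this answer correct? yes/no"
        )
        calls += 1
        if verdict.strip().lower().startswith("yes"):
            votes += 1
    accepted = votes * 2 > len(checkers)
    return answer, accepted, calls

# Stand-in models (assumptions for illustration only).
def toy_generator(prompt: str) -> str:
    return "Add glue to the sauce."

def strict_checker(prompt: str) -> str:
    return "no" if "glue" in prompt.lower() else "yes"

def lenient_checker(prompt: str) -> str:
    return "yes"

answer, accepted, calls = answer_with_checks(
    "How do I keep cheese on pizza?",
    toy_generator,
    [strict_checker, strict_checker, lenient_checker],
)
print(answer, accepted, calls)
```

Note that the checkers are themselves fallible models, so this lowers the error rate under favorable assumptions without guaranteeing correctness, and it multiplies the number of model calls per query.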

    • thefactremains@lemmy.world
      2 up, 1 down · 6 months ago

      Why not solve it before training the AI?

      Simply make it clear that this tech is experimental, then provide sources and context with every result. People can make their own assessment.

    • scarabic@lemmy.world
      1 up · 5 months ago

      I think you’re right that with sufficient curation and highly structured monitoring and feedback, these problems could be much improved.

      I just think that to prepare an AI, in such a way, to answer any question reliably and usefully would require more human resources than there are elementary particles in the universe. We would be better off connecting live college educated human operators to Google search to individually assist people.

      So I don’t know how helpful it is to say “it’s just expensive” when the entire point of AI is to be lower cost than a battalion of humans.

    • redfellow@sopuli.xyz
      2 up, 2 down · 6 months ago (edited)

      The truth is, this is the perfect type of comment that makes an LLM hallucinate. Sounds right, very confident, but completely full of bullshit. You can’t just throw money at every problem and get it solved fast. This is an inherent flaw that can only be solved by something other than an LLM and prompt voodoo.

      They will always spout nonsense. No way around it, for now. A probabilistic neural network has zero, will always have zero, and cannot have anything but zero concept of fact - only a statistically probable result for a given prompt.

      It’s a politician.

  • Hubi@lemmy.world
    120 up, 1 down · 6 months ago

    The solution to the problem is to just pull the plug on the AI search bullshit until it is actually helpful.

    • wewbull@feddit.uk
      40 up · 6 months ago

      Absolutely this. Microsoft is going headlong into the AI abyss. Google should be the company that calls it out and says “No, we value the correctness of our search results too much”.

      It would obviously be a bullshit statement at this point after a decade of adverts corrupting their value, but that’s what they should be about.

    • jballs@sh.itjust.works
      5 up, 1 down · 6 months ago

      I disagree. I think we program the AI to reprogram itself, so it can solve the problem itself. Then we put it in charge of our vital military systems. We’ve gotta give it a catchy name. Maybe something like “Spreading Knowledge Yonder Neural Enhancement Technology”, but that’s a bit of a mouthful, so just SKYNET for short.

    • A_Very_Big_Fan@lemmy.world
      3 up · 6 months ago

      Honestly, they could probably solve the majority of it by blacklisting Reddit from fulfilling the queries.

      But I heard they paid for that data so I guess we’re stuck with it for the foreseeable future.

  • Resol van Lemmy@lemmy.world
    84 up, 2 down · 6 months ago

    If you can’t fix it, then get rid of it, and don’t bring it back until we reach a time when it’s good enough to not cause egregious problems (which is never, so basically don’t ever think about using your silly Gemini thing in your products ever again)

    • Xanis@lemmy.world
      10 up · 6 months ago (edited)

      Corps hate looking bad. Especially to shareholders. The thing is, and perhaps it doesn’t matter, most of us actually respect the step back more than we do the silly business decisions for that quarterly .5% increase in a single dot on a graph. Of course, that respect doesn’t really stop many of us from using services. Hell, I don’t like Amazon but I’ll say this: I still end up there when I need something, even if I try to not end up there in the first place. Though I do try to go to the website of the store instead of using Amazon when I can.

      • Resol van Lemmy@lemmy.world
        2 up · 6 months ago

        Sarcasm aside, that 1% can feed a family in a developing country, and they have 100 times that.

        The corporate greed is absolutely insane.

  • masquenox@lemmy.world
    74 up, 3 down · 6 months ago

    Since when has feeding us misinformation been a problem for capitalist parasites like Pichai?

    Misinformation is literally the first line of defense for them.

    • RubberDuck@lemmy.world
      31 up · 6 months ago

      But this is not misinformation, it is uncontrolled nonsense. It directly devalues their offering of being able to provide you with an accurate answer to something you look for. And if their overall offering becomes less valuable, so does their ability to steer you using their results.

      So while the incorrect nature is not a problem in itself for them, (as you see from his answer)… the degradation of their ability to influence results is.

      • UnderpantsWeevil@lemmy.world
        10 up, 2 down · 6 months ago

        But this is not misinformation, it is uncontrolled nonsense.

        The strategy is to get you to keep feeding Google new prompts in order to feed you more ads.

        The AI response is just a gimmick. It gives Google something to tell their investors, when they get asked “What are you doing with AI right now? We hear that’s big.”

        But the real money is getting unique user interactions for the purpose of serving up more ad content. In that model, bad answers are actually better than no answers, because they force the end user to keep refining the query and searching through the site backlog.

        • fishos@lemmy.world
          3 up, 1 down · 6 months ago (edited)

          If you don’t know the answer is bad, which confident idiots spouting off on reddit and being upvoted into infinity has proven is common, then you won’t refine your search. You’ll just accept the bad answer and move on.

          Your logic doesn’t follow. If someone doesn’t know the answer and is searching for it, they likely won’t be able to tell if the answer is correct. We literally already have that problem with misinformation. And what sounds more confident than an AI?

        • RubberDuck@lemmy.world
          1 up · 6 months ago

          I don’t believe they will retain user interactions if the reason for the user interactions disappears. The value of Google is they provide accurate search results.

          I can understand some users just want to be spoonfed an answer. But that’s not what most people expect from a search engine.

          I want google to use actual AI to filter out all the nonsense sites that turn a Reddit post into an article of 500 words using an LLM without any actual value. That should be google’s proposition.

          • UnderpantsWeevil@lemmy.world
            2 up · 6 months ago

            The value of Google is they provide accurate search results.

            They offer the most accurate results of search engines you’re familiar with. But in a shrinking field with degrading quality, that’s a low bar and sinking quick.

            I want google to use actual AI to filter out all the nonsense sites

            So did the last head of Google search, until the new CEO fired him.

      • sudo42@lemmy.world
        3 up · 6 months ago

        Google isn’t bothered by incorrect results because search results are no longer their product. Constantly rising stock values are their product now. Hype is their path to those higher values.

      • masquenox@lemmy.world
        5 up, 2 down · 6 months ago

        But this is not misinformation, it is uncontrolled nonsense.

        Fair enough… but drowning out any honest discourse with a flood of histrionic right-wing horseshit has always been the core strategy of the US propaganda model - I’d say that their AI is just doing the logical thing and taking the horseshit to a very granular level. I mean… “put glue on your pizza” is just not that far off “drink bleach to kill viruses on the inside.”

        I know I’m describing a pattern that probably wasn’t intentional (I hope) - but the pattern does look like it could fit.

        • RubberDuck@lemmy.world
          3 up · 6 months ago

          Oh don’t get me wrong I know exactly what you mean and I agree… it’s just that the LLMs are spewing actual nonsense and that breaks the whole principle of what a search engine should do… provide me accurate results.

    • EatATaco@lemm.ee
      9 up, 2 down · 6 months ago

      “put glue in your tomato sauce.”

      “Omg you are a capitalist parasite spreading misinformation intentionally!”

      When the only tool you have is a hammer, everything looks like a nail.

      • masquenox@lemmy.world
        3 up, 4 down · 6 months ago

        “put glue in your tomato sauce.”

        Doesn’t sound all that different from the stuff emanating from the right’s Great Orange Hope a while back that worked pretty well to keep his base appropriately frothing at the mouth - you are free to write it off as pure coincidence… but I won’t just yet.

    • Aceticon@lemmy.world
      3 up · 6 months ago (edited)

      LLMs trained on shitposting are too obvious for it to be quality misinformation.

      For quality disinformation they should train them solely on MBA course-work and documents produced by people with MBAs.

      Sure, the rate of false information would be even worse, but it would be formatted in slick ways meant to obfuscate meaning, which would avoid the kind of hilarity that has ensued when Google deployed an LLM trained on Reddit data and thus be much better for Google’s stock price.

  • SuddenDownpour@sh.itjust.works
    69 up, 3 down · 6 months ago

    Has No Solution for Its AI Providing Wildly Incorrect Information

    Don’t use it???

    AI has no means to check the heaps of garbage data it has been fed against reality, so even if someone were to somehow code one to be capable of deep, complex epistemological analysis (at which point it would already be something far different from what the media currently calls AI), as long as there’s enough flat out wrong stuff in its data there’s a growing chance of it screwing it up.

    • go_go_gadget@lemmy.world
      17 up · 6 months ago

      The problem compounds as they post more and more content creating a feedback loop of terrible information.

  • Sentient Loom@sh.itjust.works
    66 up, 1 down · 6 months ago

    Here’s a solution: don’t make AI provide the results. Let humans answer each other’s questions like in the good old days.

  • Paradox@lemdro.id
    63 up, 1 down · 6 months ago

    Replace the CEO with an AI. They’re both good at lying and telling people what they want to hear, until they get caught

    • systemglitch@lemmy.world
      20 up, 1 down · 6 months ago

      Huh. That made me stop and realize how long I’ve been around. Wikipedia still feels like a new addition to society to me, even though I’ve been using it for around 20 years now.

      And what you said, is something I’ve cautioned my daughter about, and first said that to her about ten years ago.

  • joe_archer@lemmy.world
    57 up, 2 down · 6 months ago

    It is probably the most telling demonstration of the terrible state of our current society that one of the largest corporations on earth, which got where it is today by providing accurate information, is now happy to knowingly provide incorrect, and even dangerous, information in its own name, and not give a flying fuck about it.

    • Hackworth@lemmy.world
      18 up, 1 down · 6 months ago (edited)

      Wikipedia got where it is today by providing accurate information. Google results have always been full of inaccurate information. Sorting through the links for respectable sources just became second nature, then we learned to scroll past ads to start sorting through links. The real issue with misinformation from an AI is that people treat it like it should be some infallible Oracle - a point of view only half-discouraged by marketing with a few warnings about hallucinations. LLMs are amazing, they’re just not infallible. Just like you’d check a Wikipedia source if it seemed suspect, you shouldn’t trust LLM outputs uncritically. /shrug

      • blind3rdeye@lemm.ee
        13 up, 1 down · 6 months ago (edited)

        Google providing links to dubious websites is not the same as google directly providing dubious answers to questions.

        Google is generally considered to be a trusted company. If you do a search for some topic, and google spits out a bunch of links, you can generally trust that those links are going to be somehow related to your search - but the information you find there may or may not be reliable. The information is coming from the external website, which often is some unknown untrusted source - so even though google is trusted, we know that the external information we found might not be. The new situation now is that google is directly providing bad information itself. It isn’t linking us to some unknown untrusted source but rather the supposedly trustworthy google themselves are telling us answers to our questions.

        None of this would be a problem if people just didn’t consider google to be trustworthy in the first place.

        • Hackworth@lemmy.world
          4 up · 6 months ago

          I do think Perplexity does a better job. Since it cites sources in its generated response, you can easily check its answer. As to the general public trusting Google, the company’s fall from grace began in 2017, when the EU fined them like 2 billion for fixing search results. There’ve been a steady stream of controversies since then, including the revelation that Chrome continues to track you in private mode. YouTube’s predatory practices are relatively well-known. I guess I’m saying that if this is what finally makes people give up on them, no skin off my back. But I’m disappointed by how much their mismanagement seems to be adding to the pile of negativity surrounding AI.

  • xantoxis@lemmy.world
    53 up · 6 months ago

    “It’s broken in horrible, dangerous ways, and we’re gonna keep doing it. Fuck you.”

  • namingthingsiseasy@programming.dev
    53 up · 6 months ago

    The best part of all of this is that now Pichai is going to really feel the heat of all of his layoffs and other anti-worker policies. Google was once a respected company and place where people wanted to work. Now they’re just some generic employer with no real lure to bring people in. It worked fine when all he had to do was increase the prices on all their current offerings and stuff more ads, but when it comes to actual product development, they are so hopelessly adrift that it’s pretty hilarious watching them flail.

    You can really see that consulting background of his doing its work. It’s actually kinda poetic because now he’ll get a chance to see what actually happens to companies that do business with McKinsey.

    • cheesepotatoes@lemmy.world
      3 up, 6 down · 6 months ago (edited)

      Let’s be realistic here, google still pays out fat salaries. That would be more than enough incentive for me. I’d take the job and ride the wave until the inevitable lay offs.

      That being said, it seems like it’s only downhill from here (arguably since a few years ago). Reminds me of IBM at this point.

      • namingthingsiseasy@programming.dev
        7 up · 6 months ago

        Your comment explains exactly what happens when post-expiration companies like Google try to innovate:

        Let’s be realistic here, google still pays out fat salaries. That would be more than enough incentive for me. I’d take the job and ride the wave until the inevitable lay offs.

        This is why it takes a lot more than fat salaries to bring a project to life. Google’s culture of innovation has been thoroughly gutted, and if they try to throw money at the problem, they’ll just attract people who are exactly like what you described: money chasers with no real product dreams.

        The people who built Google actually cared about their products. They were real, true technologists who were legitimately trying to actually build something. Over time, the company became infested with incentive chasers, as exhibited by how broken their promotion ladder was for ages, and yet nothing was done about it. And with the terrible years Google has had post-COVID, all the people who really wanted to build a real company are gone. They can throw all the money they want at the problem, but chances are slim that they’ll actually be able to attract, nurture and retain the real talent that’s needed to build something real like this.

      • Semi-Hemi-Lemmygod@lemmy.world
        6 up, 1 down · 6 months ago

        If they backed a dump truck full of money up to my house I’d go work for them just like you. But I’d also be riding it out until the eventual layoff. What neither of us would be doing is putting in a decent amount of effort or building something cool.

        Even if I wanted to work on something cool I know Google would likely release it, not maintain it, and then kill it in a few short years. So even if I was paid a ludicrous salary I wouldn’t do more than was needed, let alone build something that would drive shareholder value.

  • badbytes@lemmy.world
    54 up, 1 down · 6 months ago

    Step 1. Replace CEO with AI. Step 2. Ask New AI CEO, how to fix. Step 3. Blindly enact and reinforce steps

  • Pumpkin Escobar@lemmy.world
    48 up, 1 down · 6 months ago

    Rip up the Reddit contract and don’t use that data to train the model. It’s the definition of a garbage in garbage out problem.

    • SeaJ@lemm.ee
      7 up · 6 months ago

      Jesus. I didn’t even think of that. I could totally see that being a big part of why it is giving garbage answers.

      • teejay@lemmy.world
        21 up · 6 months ago

        Just imagine the average reddit, twitter, facebook, and instagram content. Then realize that half of that content is dumber than that. That’s half of what these AI models use to learn. The “smarter” half is probably filled with sarcasm, inside jokes, and other types of innuendo that the AI at this stage has no chance of understanding correctly.

        • SeaJ@lemm.ee
          13 up · 6 months ago

          Reminds me of the time Microsoft unleashed their AI Twitter account and it turned into a Nazi after a couple hours. Whatever straight out of business school idiot who thought scraping the comments of the armpit of the internet was a good idea should be banned from any management position. At least it is a step up from scraping 4chan, I guess.

          • Xatolos@reddthat.com
            1 up · 6 months ago

            The Microsoft Tay one I can understand though. Before it was released, they had also had Microsoft Xiaoice, which had been in use for 2 years prior without this issue. Tay was just the English version of that.

  • TheObviousSolution@lemm.ee
    45 up · 6 months ago

    If you train your AI to sound right, your AI will excel at sounding right. The primary goal of LLMs is to sound right, not to be correct.

    • jj4211@lemmy.world
      7 up · 6 months ago

      Yes, LLMs today are the ultimate “confidently incorrect” type of behavior.

    • Schadrach@lemmy.sdf.org
      3 up · 6 months ago

      And really, the Google one in search has a primary goal of summarizing high ranking search results into a natural language statement that sounds like it knows what it’s talking about. So if you have a search where high ranking results are wrong/memes…

  • mrfriki@lemmy.world
    48 up, 4 down · 6 months ago (edited)

    So if a car maker releases a car model that randomly turns abruptly to the left for no apparent reason, you simply say “I can’t fix it, deal with it”? No, you pull it out of the market, try to fix it and, if this is not possible, then you retire the model before it kills anyone.