• TranquilTurbulence@lemmy.zip · 10 days ago

    Since basically all data is now contaminated, there’s no way to get massive amounts of clean data for training the next generation of LLMs. This should make it harder to develop them beyond the current level. If an LLM wasn’t smart enough for you yet, there’s a pretty good chance that it won’t be for a long time.

    • artifex@piefed.social · 10 days ago

      Didn’t Elon breathlessly explain how the plan was to have Grok rewrite and expand on the current corpus of knowledge so that the next Grok could be trained on that “superior” dataset, which would forever rid it of the wokeness?

    • Xylight@lemdro.id · 10 days ago · edited

      A lot of LLMs are now trained partly on intentionally synthesized, AI-generated data. It doesn’t seem to affect them too adversely.
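
      For a concrete picture, here’s a toy sketch of what that kind of pipeline looks like (hypothetical throughout: `teacher()` stands in for a real model API call, and real labs add much heavier filtering and deduplication):

      ```python
      # Hypothetical synthetic-data pipeline: a "teacher" model generates
      # training examples, which are filtered and used to train the next model.
      import json
      import random

      SEED_TOPICS = ["sorting algorithms", "HTTP caching", "SQL joins"]

      def teacher(prompt: str) -> str:
          # Placeholder for a real LLM call (assumption, not a real API).
          return f"[generated answer about: {prompt}]"

      def make_synthetic_dataset(n: int) -> list[dict]:
          data = []
          for _ in range(n):
              topic = random.choice(SEED_TOPICS)
              question = f"Explain {topic} to a beginner."
              answer = teacher(question)
              # Real pipelines drop low-quality or duplicate generations here.
              if len(answer) > 10:
                  data.append({"prompt": question, "completion": answer})
          return data

      if __name__ == "__main__":
          dataset = make_synthetic_dataset(5)
          print(json.dumps(dataset[:2], indent=2))
          # The next model is then fine-tuned on `dataset` rather than on
          # scraped web text alone.
      ```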

    • Tollana1234567@lemmy.today · 10 days ago

      Law of diminishing returns: LLMs train on the AI slop of other LLMs, which were themselves trained on still other LLMs, all the way down to “normal human-written slop”.
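
      That feedback loop has a name in the research literature: model collapse. A toy numpy sketch shows the mechanism (a minimal statistical analogy, not a claim about any specific model): repeatedly fit a distribution to samples drawn from the previous fit, and the fitted variance drifts toward zero, i.e. the “data” loses diversity.

      ```python
      # Toy model-collapse demo: each generation is "trained" only on the
      # previous generation's output. The fitted variance tends to shrink,
      # so diversity drains away over generations.
      import numpy as np

      rng = np.random.default_rng(0)
      mu, sigma = 0.0, 1.0   # generation 0: the "human-written" distribution
      n = 50                 # samples per generation

      for gen in range(1, 1001):
          data = rng.normal(mu, sigma, n)      # sample from the previous model
          mu, sigma = data.mean(), data.std()  # fit the next model on that output
          if gen % 100 == 0:
              print(f"generation {gen:4d}: mu={mu:+.3f}, sigma={sigma:.4f}")
      ```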