• onslaught545@lemmy.zip
      10 days ago

      Not all LLMs are the same. You can absolutely take a neural network model and train it yourself on your own dataset that doesn’t violate copyright.

      • Mika@sopuli.xyz
        10 days ago

        I can almost guarantee that hundred-billion-parameter LLMs are not trained on that; they are trained on the whole web, scraped to the furthest extent possible.

        The only sane and ethical solution going forward is to force all LLMs to be open-sourced. If you use datasets generated by humanity, give back to humanity.

            • Mika@sopuli.xyz
              9 days ago

              The article directly complains about AI artwork. Do you even know what LLM means?

                • Mika@sopuli.xyz
                  9 days ago

                  Then you should probably know that image gen existed long before MLLMs and was already a menace to artists back then.

                  And that an MLLM is generally a layered combo of lots of preexisting tools, where the LLM is used as a medium that lets you attach OCR inputs and give more accurate instructions to the image-gen part.

        • Skullgrid@lemmy.world
          10 days ago

          The only sane and ethical solution going forward is to force all LLMs to be open-sourced.

          Jesus fucking Christ. There are SO GODDAMN MANY open source LLMs, even from fucking scumbags like Facebook. I get that there are subtleties to the pro-AI vs. anti-AI argument, but you guys just screech and scream.

          https://github.com/eugeneyan/open-llms

          • Mika@sopuli.xyz
            10 days ago

            even meta

            Lol, ofc Meta; they have the biggest big data out there, full of private data.

            Most of the open-source ones are recompilations of existing open-source LLMs.

            And the page you linked lists mostly <10B models, bar the LLMs with huge financing, which generally have either corporate or Chinese backing.

          • vrighter@discuss.tchncs.de
            8 days ago

            There are barely any. I can’t name a single one offhand. Open weights tell you absolutely nothing about the actual source of those weights.

      • Riskable@programming.dev
        10 days ago

        Training an AI is orthogonal to copyright since the process of training doesn’t involve distribution.

        You can train an AI with whatever TF you want without anyone’s consent. That’s perfectly legal fair use. It’s no different than if you copy a song from your PC to your phone.

        Copyright really only comes into play when someone uses an AI to distribute a derivative of someone’s copyrighted work. Even then, it’s really the end user that is even capable of doing such a thing by uploading the output of the AI somewhere.