TL;DR: The big tech AI companies' LLMs have gobbled up all of our data, but the damage they have done to open source and free culture communities is particularly insidious. By taking advantage of those who share freely, they destroy the bargain that made free software spread like wildfire.

  • melfie@lemy.lol
    1 day ago

    The LLMs are not distributing the GPL code; their weights are being trained on it. You can’t just have Copilot pump out something that works like the Linux kernel or Blender, but with different code that isn’t subject to the GPL. At best, the AI can learn from it and help humans develop a proprietary alternative. In that case, it’s not really much different from humans studying a GPL codebase and building a proprietary alternative without AI. Replicating the thing is going to cost a lot of money no matter what, so why not save that money, use the GPL code, and contribute back? Also, it’s going to be hard to sell your proprietary alternative, because why wouldn’t people just use the FOSS version?

    • yoasif@fedia.io (OP)
      1 day ago

      You can’t “train” on code you haven’t copied. That is kind of obvious, right? So did they have the right to copy and then reproduce the work without attribution?

      • melfie@lemy.lol
        1 day ago

        Yeah, I guess this is a bit of a gray area. With the GPL, you only have rights to code if it was distributed to you. Suppose some GPL code was distributed only to a few select people, none of whom passed it on to the general public, and GitHub still trained its models on that private repo. That would technically be a violation of the license. It’s a fairly niche scenario, though, since the intent is normally public distribution.