Those claiming AI training on copyrighted works is “theft” misunderstand key aspects of copyright law and AI technology. Copyright protects specific expressions of ideas, not the ideas themselves. When AI systems ingest copyrighted works, they’re extracting general patterns and concepts - the “Bob Dylan-ness” or “Hemingway-ness” - not copying specific text or images.

This process is akin to how humans learn by reading widely and absorbing styles and techniques, rather than memorizing and reproducing exact passages. The AI discards the original text, keeping only abstract representations in “vector space”. When generating new content, the AI isn’t recreating copyrighted works, but producing new expressions inspired by the concepts it’s learned.

This is fundamentally different from copying a book or song. It’s more like the long-standing artistic tradition of being influenced by others’ work. The law has always recognized that ideas themselves can’t be owned - only particular expressions of them.

Moreover, there’s precedent for this kind of use being considered “transformative” and thus fair use. The Google Books project, which scanned millions of books to create a searchable index, was held to be fair use in Authors Guild v. Google (2d Cir. 2015), despite protests from authors and publishers. AI training is arguably even more transformative.

While it’s understandable that creators feel uneasy about this new technology, labeling it “theft” is both legally and technically inaccurate. We may need new ways to support and compensate creators in the AI age, but that doesn’t make the current use of copyrighted works for AI training illegal or unethical.

For those interested, this argument is nicely laid out by Damien Riehl in FLOSS Weekly episode 744. https://twit.tv/shows/floss-weekly/episodes/744

  • FatCat@lemmy.world (OP)
    4 months ago

    It’s funny you mention the Katy Perry chord case, because Damien Riehl, who made the argument I referenced in my original post, actually talked about this exact case in the podcast I mentioned. He noted that Katy Perry was initially sued and a jury awarded $2.8 million over a very simple melody that appeared over 8,000 times in Riehl’s dataset of generated melodies. However, after Riehl gave his TED talk about his “All the Music” project in early 2020, the judge reversed the jury verdict, saying the melody was unoriginal and therefore uncopyrightable.

    • Capricorn_Geriatric@lemmy.world
      4 months ago

      Agreed.

      I didn’t listen to the podcast, so I wouldn’t know, but honestly, she was lucky. She’s popular, and her publishers had an interest in the case (they’d lose out on profits if she lost). And she initially did lose. It was only because of the publicity around the case that the verdict was overturned (although money helped as well).

      Unfortunately, this could’ve happened to any smaller artist, and it routinely happens with the patent trolls I pointed to. I don’t have a specific lawsuit I can point to, but given the volume, one surely exists.

      Also, it’s not as if I approve of the current state of copyright in the US (or the EU, for that matter).

      Copyright was originally meant to protect the rights of authors, but over time it was bastardised into the system we have today, where artists sign their rights over to publishers.

      So my proposal is this: if corporations like copyright, let them have it. I won’t watch Disney movies outside of Disney+ or […]. This is the system we’ve got and have to live with, so why not let the corporations feel it as well?

      Why should Google, which makes loads of money from those demonetizations on one side of the law, now be allowed to use other people’s copyrighted works for profit, while Internet users in the US get fined or have their service cut off for alleged copyright infringement, and those in Germany get a stern letter demanding a big fake fine?

      Big Tech shouldn’t get to profit both from false copyright infringement claims and from using actual copyrighted content to generate revenue.

      This whole AI copyright situation is just a symptom of an ailing global copyright policy that needs to be fixed, and slapping an AI free-for-all band-aid on top of it isn’t a fix.

      My train of thought is this: if we don’t let a simple AI exception onto the books, either training AI on copyrighted content stays illegal, or the entire system gets reimagined.

      If it stays the same, this won’t mean much. Piracy sites and torrenting exist despite the current state of copyright law, and I don’t see why AI couldn’t exist in the same way. This has the huge plus of keeping AI out of the hands of Big Tech. Hopefully, it would also make it harder for harmful uses of AI to be legal.

      Alternatively, we get a better copyright system for everyone, assuming it isn’t made to only benefit the corporations.