To help train its AI models, Meta (like others) has been using pirated versions of copyrighted books without the consent of authors or publishers. The company behind Facebook and Instagram faces an ongoing class-action lawsuit brought by authors including Richard Kadrey, Sarah Silverman, and Christopher Golden, in which it has already scored a major (and surprising) victory: the Californian court concluded last year that using pirated books to train its Llama LLM did qualify as fair use.

You’d think this case would be as open-and-shut as it gets, but never underestimate an army of high-priced lawyers. Meta has now come up with the striking defense that uploading pirated books to strangers via BitTorrent qualifies as fair use. It goes on to claim that this is doubly good, because it has helped establish the United States’ leading position in the AI field.

Meta further argues that every author involved in the class action has admitted they are unaware of any Llama LLM output that directly reproduces content from their books. It says that if the authors cannot provide evidence of such infringing output or of damage to sales, then this lawsuit is not about protecting their books but about arguing against the training process itself (which the court has ruled is fair use).

Judge Vince Chhabria now has to decide whether to allow this defense, a decision that will have consequences for not only this but many other AI lawsuits involving things like shadow libraries. The BitTorrent uploading and distribution claims are the last element of this particular lawsuit, which has been rumbling on for three years now, to be settled.

  • Paranoidfactoid@lemmy.world
    English
    arrow-up
    24
    ·
    2 hours ago

    So Meta gets to claim fair use with pure digital duplication, but archive.org doesn’t when they scan physical copies of books and only lend out the same number of copies as they own in warehouses. That’s piracy.

    Got it.

  • Dr. Moose@lemmy.world
    English
    arrow-up
    5
    ·
    2 hours ago

    Honestly I agree with Meta here but this should apply to everyone. I think most people here conflate their hate for Meta with the factual reality of intellectual property.

    • SpaceMan9000@lemmy.world
      English
      arrow-up
      6
      ·
      46 minutes ago

      I can hate both.

      People can also hate the fact that if you have enough money you can make everything legal.

  • Archangel1313@lemmy.ca
    English
    arrow-up
    68
    arrow-down
    1
    ·
    9 hours ago

    I absolutely love the fact that all these companies are laying the legal groundwork to destroy intellectual property rights altogether. If they win enough of these cases, then every pirate on the open seas sails under a flag of amnesty.

    • jabjoe@feddit.uk
      English
      arrow-up
      6
      ·
      2 hours ago

      Not all IP is self surviving. Even CopyRight isn’t always a bad thing, if you think of small artists, for example. My fear is mainly about CopyLeft, as I feel it’s been incredibly successful in pushing openness forward. The megacorps hating it tells you it is doing its job. One of the things they love about LLMs and code is that they can license-wash away CopyLeft.

  • TheObviousSolution@lemmy.ca
    English
    arrow-up
    20
    arrow-down
    1
    ·
    edit-2
    8 hours ago

    So we can pirate books as well as long as we aren’t able to reproduce them verbatim from memory as well?

    Judge Vince Chhabria either accepts whatever bribes and offers he’s probably getting offered and sides with Meta, or it will eventually go on to the Supreme Court where they most definitely will. That’s the part of this that will work the most under an administration of no accountability.

  • Iconoclast@feddit.uk
    English
    arrow-up
    2
    arrow-down
    18
    ·
    3 hours ago

    I’m getting the feeling that the average Lemming is a pro-piracy advocate only for as long as it’s them financially benefiting from it but the script interestingly flips when a company they don’t like does the same thing.

    If money wasn’t an issue, there’d be no reason to pirate anything. It’s a financial decision. There’s no practical difference between earning fifty bucks and saving that much - in both cases you’re left with 50 more bucks to spend.

    • kossa@feddit.org
      English
      arrow-up
      4
      ·
      2 hours ago

      I feel you have it the wrong way around. The “average Lemming” is pissed, because private piracy is prosecuted and punished while Meta’s is not.

      I, for one, couldn’t care less whether Meta pirates the shit out of all the books if I am allowed to do the same ¯\_(ツ)_/¯

    • ilinamorato@lemmy.world
      English
      arrow-up
      16
      arrow-down
      1
      ·
      3 hours ago

      There’s a pretty big difference in scale, and the perpetrator, and whether or not they’re benefiting monetarily, and much more.

      • Iconoclast@feddit.uk
        English
        arrow-up
        1
        arrow-down
        19
        ·
        3 hours ago

        Stealing is wrong whether it’s for personal or business use. Which one is more wrong is beside the point.

        • ilinamorato@lemmy.world
          English
          arrow-up
          1
          ·
          2 hours ago

          It’s not really beside the point, from most reasonable perspectives. A multi-billion-dollar company enriching itself on the backs of starving authors so that it can go on enriching itself on the backs of its users is significantly different from a small number of comparatively destitute individuals stealing some temporary enjoyment for themselves. They are both wrong, but the discussion is utterly useless if you don’t talk about the harm involved and who benefits.

    • sonofearth@lemmy.world
      English
      arrow-up
      5
      ·
      3 hours ago

      A person downloading a pirated copy of a book w/o any DRM for their own leisure use on their own device is different from a multi trillion dollar corporation who is using those books to train an LLM to make AI Slop and make money from it w/o even crediting the authors for their work.

      • Iconoclast@feddit.uk
        English
        arrow-up
        1
        arrow-down
        6
        ·
        3 hours ago

        The difference is only in scale. Stealing is stealing independent of if it’s for personal use or not.

        • sonofearth@lemmy.world
          English
          arrow-up
          1
          ·
          edit-2
          11 minutes ago

          Nothing is being stolen here. It’s just an illegal copy. Copies are made for varying reasons here and have different moral aspects.

        • CoolCat@lemmy.world
          English
          arrow-up
          1
          ·
          2 hours ago

          Scale is not the only difference. The companies who do this end up making money with something trained on someone else’s work. If a regular Joe Shmoe pirates a book, they don’t earn anything with it.

    • Paranoidfactoid@lemmy.world
      English
      arrow-up
      2
      ·
      2 hours ago

      Little guys who get caught get the book thrown at them. But oligarchs get to carve out a legal right to pirate for profit. It’s this disparity people are pissed off about here.

    • yabbadabaddon@lemmy.zip
      English
      arrow-up
      1
      ·
      2 hours ago

      I support piracy because I think:

      • consolidation ruins competition
      • I hate ads in services I pay for
      • copyright laws kill creativity
      • the USA is lobbying the entire world to adopt their view and fuck them
      • those laws are there to benefit big companies and not individuals
      • it is legal where I live because we pay state tax to have access to media
  • melfie@lemy.lol
    English
    arrow-up
    16
    ·
    edit-2
    10 hours ago

    Looking forward to Jellyfin getting an LLM to train locally on movie preferences so everyone’s library is fair use. Wait, is this why LLMs are being shoehorned into everything? 🤔

  • Goodlucksil@lemmy.dbzer0.com
    English
    arrow-up
    34
    ·
    13 hours ago

    Classic “the end justifies the means” (bad) defense. If ISPs can send letters for torrenting, and Facebook torrented a lot, Facebook deserves a fair punishment.

  • ☂️-@lemmy.ml
    English
    arrow-up
    8
    ·
    11 hours ago

    sure. thanks meta, anna’s archive will help me with my reading list, thanks.

  • ArbitraryValue@sh.itjust.works
    English
    arrow-up
    8
    ·
    12 hours ago

    We’re going to end up in a situation where whatever is necessary to train AI is permitted, and the main question is whether that will be through (re)interpretation of existing law or the passage of a new law.

    • ctrl_alt_esc@lemmy.ml
      English
      arrow-up
      6
      ·
      12 hours ago

      Good thing I have a local model running that’s constantly learning, for precisely this reason

        • XLE@piefed.social
          English
          arrow-up
          2
          ·
          10 hours ago

          If anything, this is proof you should be next in line for a large venture capital infusion!

  • ryathal@sh.itjust.works
    English
    arrow-up
    5
    arrow-down
    1
    ·
    12 hours ago

    Arguing that training models isn’t fair use is going to be a massive uphill battle; it’s basically reading the book but with a computer. It’s not actually a big deal to people, unless you hold the copyright to a ton of works and want to get a percentage of all the AI income these companies have made.

    Torrenting the books is likely absolutely copyright infringement, but that has relatively low payout compared to the money these companies are getting for their models. The training being fair use means that rights holders can’t try to take any money from the model’s use. The statutory limits for infringement even at per work levels aren’t significant compared to the legal cost of proving it happened.

    • OfCourseNot@fedia.io
      arrow-up
      5
      arrow-down
      1
      ·
      11 hours ago

      There’s an argument to be made that it is, in fact, not ‘reading’. The training of the model could be considered a lossy compression of the data. And streaming movies in a lossy compression format is not fair use, is it?

      • ryathal@sh.itjust.works
        English
        arrow-up
        2
        ·
        9 hours ago

        The model doesn’t stream out anyone’s content though. The article mentions that the plaintiffs have provided no examples of a prompt that creates anything substantial.

        Streaming a lossy compression would generally be infringement, but there is definitely a point where it becomes not infringement if it’s lossy enough.

        What a model generally stores is factual information that isn’t copyrightable in the first place. It’s storing word counts, sentence lengths, sentiment analysis, and so on.

      • Fatal@piefed.social
        English
        arrow-up
        3
        ·
        10 hours ago

        It’s not the storage of the information that matters as much as the presentation. Google’s search index stores a huge amount of copyrighted material, even losslessly. But they only present small snippets at a time, which is not considered copyright infringement. The question really is whether or not the information being presented by the models is in a format which is considered copyright infringement. So far, courts have not found that it is.

  • HaunchesTV@feddit.uk
    English
    arrow-up
    2
    arrow-down
    1
    ·
    10 hours ago

    Just spitballing…

    If you were to train a model on just one book, as long as you don’t prompt it to create an exact copy (maybe just some indiscernible differences) then presumably that’s fair use.

    Then, since we know AI generated work can’t be copyrighted, does that essentially create a copyright-free version of the text which can be freely distributed?

  • Grimy@lemmy.world
    English
    arrow-up
    3
    ·
    edit-2
    13 hours ago

    They didn’t say seeding is fair use, just that it’s inherently part of torrenting. Good thing Sarah Silverman has PC Gamer there to pander for her.