• QuadratureSurfer@lemmy.world
      English
      arrow-up
      60
      arrow-down
      2
      ·
      1 day ago

      To anyone reading this comment without having read the article: this ruling doesn’t mean that it’s okay to pirate material to build a model. Anthropic will still need to go to trial for that:

      But he rejected Anthropic’s request to dismiss the case, ruling the firm would have to stand trial over its use of pirated copies to build its library of material.

      • Artisian@lemmy.world
        English
        arrow-up
        18
        ·
        edit-2
        1 day ago

        I also read through the judgment, and I think it’s better for Anthropic than you describe. The judge distinguishes three issues:

        A) Use any written material they get their hands on to train the model (and the resulting model doesn’t just reproduce the works).

        B) Buy a single copy of a print book, scan it, and retain the digital copy for a company library (for all sorts of future purposes).

        C) Pirate a book and retain that copy for a company library (for all sorts of future purposes).

        A and B were found to be fair use on summary judgment, meaning this judge thinks they are clear-cut in Anthropic’s favor. C will go to trial.

  • BlameTheAntifa@lemmy.world
    English
    arrow-up
    12
    ·
    edit-2
    1 day ago

    Anakin: “Judge backs AI firm over use of copyrighted books”
    Padme: “But they’ll be held accountable when they reproduce parts of those works or compete with the work they were trained on, right?”
    Anakin: “…”
    Padme: “Right?”

  • the_q@lemmy.zip
    English
    arrow-up
    52
    arrow-down
    12
    ·
    1 day ago

    An 80 year old judge on their best day couldn’t be trusted to make an informed decision. This guy was either bought or confused into his decision. Old people gotta go.

        • FaceDeer@fedia.io
          arrow-up
          13
          arrow-down
          1
          ·
          1 day ago

          Is it this?

          First, Authors argue that using works to train Claude’s underlying LLMs was like using works to train any person to read and write, so Authors should be able to exclude Anthropic from this use (Opp. 16).

          That’s the judge addressing an argument that the Authors made. If anyone made a “false equivalence” here, it’s the plaintiffs; the judge is simply saying “okay, let’s assume their claim is true,” as is usual for a summary judgment like this.

          • MeaanBeaan@lemmy.world
            English
            arrow-up
            3
            ·
            23 hours ago

            Wait, the authors argued that? Why? That’s literally the opposite of the thing they needed to argue.

          • ag10n@lemmy.world
            English
            arrow-up
            6
            arrow-down
            11
            ·
            1 day ago

            On page 6, the judge writes that the LLM “memorized” the content and could “recite” it.

            Neither is true of the training or use of LLMs.

            • FaceDeer@fedia.io
              arrow-up
              12
              arrow-down
              1
              ·
              1 day ago

              The judge writes that the Authors told him that LLMs memorized the content and could recite it. He then said “for purposes of argument I’ll assume that’s true,” and even despite that he went ahead and ruled that LLM training does not violate copyright.

              It was perhaps a bit daring of Anthropic not to contest what the Authors claimed in that case, but as it turns out the result is an even stronger ruling. The judge gave the Authors every benefit of the doubt and still found that they had no case when it came to training.

            • Artisian@lemmy.world
              English
              arrow-up
              3
              arrow-down
              2
              ·
              1 day ago

              Depends on the content and the method. There are tons of ways to encrypt data, and under relevant law they may still count as copies. There are certainly weaker NN models where we can extract a lot of the training data, even if it’s not easy, from the model parameters (even if we can’t find a prompt that gets the model to regurgitate).

  • AbouBenAdhem@lemmy.world
    English
    arrow-up
    14
    arrow-down
    4
    ·
    1 day ago

    IMO the focus should have always been on the potential for AI to produce copyright-violating output, not on the method of training.

    • Sculptus Poe@lemmy.world
      English
      arrow-up
      14
      arrow-down
      2
      ·
      edit-2
      1 day ago

      If you try to sell “The New Adventures of Doctor Strange, Jonathan Strange and Magic Man,” existing copyright laws are sufficient and will stop it. Really, training should be regulated by the same laws as reading. If they can get the material through legitimate means it should be fine, but pulling data that is not freely accessible should be theft, as it is already.

      • Imgonnatrythis@sh.itjust.works
        English
        arrow-up
        6
        ·
        1 day ago

        I have a freely accessible document under a CC license that states it is not to be used commercially. This is commercial use. Your policy would allow that document to be used, though, since it is accessible. This kind of policy discourages me from sharing my works easily, since others profit from my efforts and my works are more likely to be attributed to a corporate beast I want nothing to do with than to me.

        I’m all for copyright reform and simpler copyright law, but these companies need to be held to standard copyright rules and not just made up modifications. I’m convinced a perfectly decent LLM could be built without violating copyrights.

        I’d also be OK with sharing works with a not-for-profit, open-source LLM, and I think others might be as well.

        • Sculptus Poe@lemmy.world
          English
          arrow-up
          7
          arrow-down
          1
          ·
          edit-2
          1 day ago

          It means what it means; “freely” pulls its own weight. I didn’t say “readily” accessible. Torrents could be viewed as “readily” accessible, but they couldn’t be viewed as “freely” accessible, because at the very least you bear the guilt of theft. Library books are “freely” accessible, and if the training somehow involved checking out books and returning them digitally, it should be fine. If it is free to read into neurons, it is free to read into neural systems. If payment for reading is expected, then it isn’t free.

          • Womble@lemmy.world
            English
            arrow-up
            5
            ·
            1 day ago

            Civil cases of copyright infringement are not theft, no matter what the MPAA has trained you to believe.

    • Artisian@lemmy.world
      English
      arrow-up
      4
      ·
      edit-2
      1 day ago

      The plaintiffs made that argument and the judge shot it down pretty hard: that kind of competition isn’t what copyright protects against. He makes an analogy with teachers teaching children to write fiction: they are using existing fantasy to create MANY more competitors in the fiction market. Could an author use copyright to challenge that use?

      Would love to hear your thoughts on the ruling itself (it’s linked by reuters).

      • Cort@lemmy.world
        English
        arrow-up
        1
        ·
        24 minutes ago

        Orcs and dwarves (with a v) are creations of Tolkien; if the fantasy stories include them, it’s a violation of copyright, the same as including Mickey Mouse.

        My argument would have been to ask the AI for the bass line to Queen & David Bowie’s Under Pressure, then refer to that as a reproduction of copyrighted material. But then again, AI companies probably have better lawyers than Vanilla Ice.

  • Grimy@lemmy.world
    English
    arrow-up
    25
    arrow-down
    25
    ·
    1 day ago

    80% of the book market is owned by 5 publishing houses.

    They want to create a monopoly around AI and kill open source. The copyright industry is not our friend. This is a win, not a loss.

        • Sentient Loom@sh.itjust.works
          English
          arrow-up
          2
          arrow-down
          3
          ·
          1 day ago

          used to train both commercial

          commercial training is, in this case, stealing people’s work for commercial gain

          and open source language models

          so, uh, let us train open-source models on open-source text. There’s so much of it that there’s no need to steal.

          ?

          I’m not sure why you added a question mark at the end of your statement.

          • gaylord_fartmaster@lemmy.world
            English
            arrow-up
            1
            arrow-down
            2
            ·
            1 day ago

            I’m not sure why you added a question mark at the end of your statement.

            I was questioning whether or not you would see that as a benefit. Clearly you don’t.

            Are you also against libraries letting people borrow books since those are also lost sales for the authors, or are you just a luddite?

            • Sentient Loom@sh.itjust.works
              English
              arrow-up
              3
              ·
              1 day ago

              libraries letting people borrow books

              This is so far from analogous that it’s almost a non sequitur.

              are you just a luddite?

              No, and you don’t even believe such nonsense. You’re grasping, ineffectively.

    • SonOfAntenora@lemmy.world
      English
      arrow-up
      8
      arrow-down
      2
      ·
      edit-2
      1 day ago

      Cool then, try to do some torrenting out there and don’t hide it. Tell us how it goes.

      The rules don’t change. This just means AI overlords can do it, not that you can do it too.

      • OfCourseNot@fedia.io
        arrow-up
        5
        arrow-down
        1
        ·
        1 day ago

        I’ve been pirating since Napster, never have hidden shit. It’s usually not a crime, except in America it seems, to download content, or even share it freely. What is a crime is to make a business distributing pirated content.

        • SonOfAntenora@lemmy.world
          English
          arrow-up
          1
          arrow-down
          1
          ·
          1 day ago

          I know, but you see what they’re doing with AI: a small server used for piracy and sharing is punished, in some cases, worse than a theft. AI businesses are making bank (or are they? There is still no clear path to profitability) on troves of pirated content. This (for small guys like us) is not going to change the situation. For instance, if we used the same dataset to train some AI in a garage, with no business or investor behind it, things would be different. We’re at a stage where AI is quite literally too important to fail for somebody out there. I’d argue that AI is, in fact, going to be shielded for this reason regardless of previous legal outcomes.