Apparently, stealing other people’s work to create product for money is now “fair use” as according to OpenAI because they are “innovating” (stealing). Yeah. Move fast and break things, huh?

“Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials,” wrote OpenAI in the House of Lords submission.

OpenAI claimed that the authors in that lawsuit “misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.”

  • Pete Hahnloser@beehaw.org
    link
    fedilink
    arrow-up
    18
    ·
    10 months ago

    Any reasonable person can reach the conclusion that something is wrong here.

    What I’m not seeing a lot of acknowledgement of is who really gets hurt by copyright infringement under the current U.S. scheme. (The quote is obviously directed toward the UK, but I’m reasonably certain a similar situation exists there.)

    Hint: It’s rarely the creators, who usually get paid once while their work continues to make money for others.

    Let’s say the New York Times wins its lawsuit. Do you really think the reporters who wrote the infringed-upon material will be getting royalty checks to be made whole?

    This is not OpenAI vs creatives. OK, on a basic level it is, but expecting no one to scrape blogs and forum posts rather goes against the idea of the open internet in the first place. We’ve all learned by now that what goes on the internet stays there, with attribution totally optional unless you have a legal department. What’s novel here is the scale of scraping, but I see some merit to the “transformational” fair-use defense given that the ingested content is not being reposted verbatim.

    This is corporations vs corporations. Framing it as millions of people missing out on what they’d have otherwise rightfully gotten is disingenuous.

    • lemmyvore@feddit.nl
      link
      fedilink
      English
      arrow-up
      11
      ·
      edit-2
      10 months ago

      This isn’t about scraping the internet. The internet is full of crap and the LLMs will add even more crap to it. It will shortly become exponentially harder to find the meaningful content on the internet.

      No, this is about dipping into high quality, curated content. OpenAI wants to be able to use all existing human artwork without paying anything for it, and then flood the world with cheap knockoff copies. It’s that simple.

      • towerful@programming.dev
        link
        fedilink
        arrow-up
        5
        ·
        10 months ago

        Shortly? It’s happening already. I notice it when using Google and Duckduckgo. There are always a few hits that are AI written blog spam word soup

        • lemmyvore@feddit.nl
          link
          fedilink
          English
          arrow-up
          4
          ·
          10 months ago

          Unfortunately you haven’t seen the full impact of LLMs yet. What you’re seeing now is stuff that’s already been going on for a decade. SEO content generators have been a thing for many years and used by everybody from small business owners to site chains pinching ad pennies.

          When the LLM crap will kick in you won’t see anything except their links. I wouldn’t be surprised if we’ll have to go back to 90s tech and use human-curated webrings and directories.

          • emptiestplace@lemmy.ml
            link
            fedilink
            arrow-up
            1
            ·
            10 months ago

            It’s especially amusing when you consider that it’s not even fully autonomous yet; we’re actively doing this to ourselves.

    • MudMan@kbin.social
      link
      fedilink
      arrow-up
      5
      ·
      10 months ago

      Yep. The effect of this as currently framed is that you get data ownership clauses in EULAs forever and only major data brokers like Google or Meta can afford to use this tech at all. It’s not even a new scenario, it already happened when those exact companies were pushing facial recognition and other big data tools.

      I agree that the basics of modern copyright don’t work great with ML in the mix (or with the Internet in the mix, while we’re at it), but people are leaning on the viral negativity to slip by very unwanted consequences before anybody can make a case for good use of the tech.