ChatGPT is full of sensitive private information and spits out verbatim text from CNN, Goodreads, WordPress blogs, fandom wikis, Terms of Service agreements, Stack Overflow source code, Wikipedia pages, news blogs, random internet comments, and much more.



Every time you load a webpage you are making a local copy of it for your own use, if it is on the open web you are implicitly given permission to make a copy of it for your own use. You are not given permission to then distribute those copies which is where LLMs may get into trouble, but making a copy for the purpose of training is not a breach of copyright as far as I can understand or have heard.
Yes, you do make a copy of a web page. And every time you load a video game you make a copy into RAM. However, this copying is all permitted under user license - you’re allowed to make minor copies as part of the process of running the software and playing the media.
Case in point, the UK courts ruled that playing pirated games was illegal, because when you load the game from a disc you copy it into RAM, and this copying is not licensed by the player.
OpenAI does not have any license for copying into its database. The terms and conditions of web pages say you’re allowed to view them, not allowed to take the data and use it for things. They don’t explicitly prohibit this (yet), but the lack of a prohibition does not mean a license is implied. OpenAI can only hope for a fair use exemption, and I don’t think they qualify because a) it isn’t really “research” but product development, and even if it is research b) it is purely for commercial gain.
Could you point to the judgement on playing copied games was illegal in the UK? I can only find articles about specifically DS copy cartridges which are very obviously intended to make/use unlicensed copies of games to distribute.
Even so, that again hinges on right to distribute, not right to make a copy for personal use. If a game is made freely available on the web for you to play it is not illegal to download that game to play offline or study it.
I’d have to go digging, sorry I don’t have the time right now. It was to do with piracy on the OG X-Box. It wasn’t the main part of the case, just a tangential point inside the judge’s ruling.
Downloading a game to play it would be copyright infringement. Downloading involves making a copy on your device. However one copy really isn’t worth the hassle of claiming against, so it never happens. This is why all the Napster cases inflated the counts of infringement by including everyone you connected to as if you had uploaded a complete copy to them, that’s the only way to make the claim worthwhile. Also in the US uploading to someone carried punitive damages, similar cases didn’t work so well in the UK with actual damages.
Downloading it to study is fair use under the research exemption, particularly if it’s a non-commercial activity.
Copyright infringement happens all the time, but the vast majority of cases aren’t worth prosecuting, and there’s no penalty for a rights holder not to prosecute. Meanwhile, with Trademarks, the rights holder absolutely can lose their rights if they don’t prosecute every infringement they become aware of.