Two authors are suing OpenAI for training ChatGPT with their books. Could they win?

sabbah@lemmy.world · 2 years ago

Ragnell@kbin.social · edit-2 2 years ago

I think there’s an argument that using someone’s art or writing to train an AI is like charging for a screening of a movie in your garage. You’re using their work and labor, without their permission, for something that will make a profit. It’s not like Fair Use for educational purposes: the AI isn’t a human being who can make a choice as to what they do with their education, it’s a mathematical prediction engine that is going to be used for industry purposes.

I can read someone else’s book. I can read someone else’s book to a child. I can’t post someone else’s book on my website and charge 5 bucks to read it. I can’t reprint someone’s book on my website with ads. So why can someone use someone else’s book to develop an LLM chatbot that will be placed on a website that gains ad revenue? Or that will be sold to software companies to write technical instructions or code?

With that in mind, the lawsuit here is based on COPYING the book to an internal database to train on, based on scanning it. They are arguing that the book was reproduced to gain a profit, basically the same thing as pirating a movie and selling tickets to a private screening.

Pamasich@kbin.social · edit-2 2 years ago

“I can’t post someone else’s book on my website and charge 5 bucks to read it.”

No, but you can read someone else’s book and then later write a book inspired by theirs and sell that.

Which is what AI does, as far as I know.

I’m not trying to argue with the rest of your comment, but that middle part looks like a false equivalence to me. “I can do this but not that, so why would AI developers be allowed to do this completely different thing” just has no logic to it.

The AI isn’t redistributing copies of even sections of the book, it just learnt from it. It’s like when you read books and gain an understanding of how they are structured and such and then you write your own book based on what you’ve learnt from reading books.

Ragnell@kbin.social · 2 years ago

Also, screw it. I’ll say it. If the LLM chatbot producing text from having scanned other books is the same as a person being inspired by reading books, then the LLM should get PAID.

If not, then it’s just a tool. And it’s a tool they built using uncompensated labor.

Zerfallen@lemmy.world · 2 years ago

If I learn from the internet (or observation in the real world: public art, street fashion, design, language, etc.), am I not allowed to use that knowledge in my job without compensating every source I used to gather my knowledge? We remix information we have seen to create something new, and it looks like ChatGPT just does the same, not a full reproduction that replaces the market for the original/source.

Ragnell@kbin.social · edit-2 2 years ago

Does it learn the same? Then why can ChatGPT not discern truth from fiction? Why can’t it use critical thinking principles to determine accuracy based on source?

It’s just binary math at the bottom of it, logic gates. Your brain is analog, fundamentally different. You’re interpreting sine wave signals, the computer is interpreting square wave signals. Square wave signals that have been rectified to the point that it appears to a human being that it’s sine wave signals, but when we get down to the basics of how the mind works it’s a sheer cliff in the computer and a gentle curve on the human. Things go down VERY differently.

We do more than just predict the average best word based on what we’ve heard before when we construct a sentence. We consider the true meaning of the word and whether it best represents our internal thoughts. ChatGPT has no internal thoughts.

And that’s where things break down. Because again, if it WAS comparable to a human then it is a PERSON and not a product, and NO ONE SHOULD BE SELLING IT in that case. But if it’s just a product, then it’s not comparable to you doing the work of forming a sentence. It’s basing its words on comparison to the training model, as narrowed down by its instructions. It is not comparing to its own original thoughts. The people who wrote the words in the training model contributed to the building of this tool, and should have been consulted before their words were used.

Ragnell@kbin.social · edit-2 2 years ago

An LLM is mathematically calculating the probability of the words being used. That is not inspiration.

I said right in the comment, it’s not like using the book to educate a child. A child will grow up and make their own decisions. The LLM has no ability to choose a different life path. The LLM is not getting IDEAS from the book. The LLM is a mathematical engine that will produce what has been asked for, and it will do that by calculating the most likely words to be used based on what has been fed to it.

The LLM is a machine used to make profit for its programmer, it is not an independent person creating out of inspiration.

Don’t believe the hype. They have NOT produced actual Artificial Intelligence.

intensely_human@lemm.ee · 2 years ago

Are the AIs reprinting? Seems like they are quoting, and when there’s not verbatim content whatever’s coming out is a derivative work transformed by combination with the rest of the training set and the prompt.

Like, have we seen a chatbot post a passage of a story or textbook without it being in a context like “hey quote me some of that story or textbook”?

Dylpickles@lemmy.world · 2 years ago

It would be cool to see some kind of legal or practical protection creators can place on their work that would prevent AIs from being able to use them for training.

MajorHavoc@lemmy.world · 2 years ago

It exists. It is copyright. We just haven’t seen the ends of the current batch of lawsuits just yet.

trafficnab@kbin.social · 2 years ago

I feel like things created by AI are transformative enough that it’s hard to argue that the resultant works inherently infringe on any copyrights by the very nature of how they were created.

Ragnell@kbin.social · 2 years ago

I really need you to read this: https://softwarecrisis.dev/letters/llmentalist/

snipgan@kbin.social · 2 years ago

I really think artists/authors/etc. are going about this the wrong way. ChatGPT and other trained models aren’t really the issue here. How the data is available and collected by other software and groups is.

What we should really be talking about is data privacy: who can access the data people put on the internet, and how easily.

tinwhiskers@kbin.social · edit-2 2 years ago

Well of course, putting it on the open internet is very intentionally making it available for everyone to see. If you don’t want everyone to see it, don’t put it on the open internet. The issue is what people do with it, not whether they can access it. Copyright forbids distributing copyrighted data. The entire point of it is that you can make your work available to be seen while it stays protected from people copying it. However, there is no distribution or storage of copyrighted material with an LLM - there is no copy. I think OpenAI will be OK, but these things are never certain when the big lawyers are let loose.

Distributing the training dataset, though, that could well be a problem.