Reddit has reportedly signed over its content to train AI models

return2ozma@lemmy.world · 2 years ago

IchNichtenLichten@lemmy.world · 2 years ago

An LLM that behaves like a typical Redditor?

What possible use is that?

SonnyVabitch@lemmy.world · 2 years ago

Air Canada offering a refund of tree fiddy.

IchNichtenLichten@lemmy.world · 2 years ago

You’ll get your refund eventually but first it will try and gaslight you that Air Canada is a woke mind virus before calling you an asshole and then stalking you.

pdxfed@lemmy.world · 2 years ago

“instead of the $3.50 refund, I’m also authorized to offer you some June 2025 $350 GME calls.”

ndru@lemmy.world · 2 years ago

If it’s trained on the average Reddit reply: $420.69, nice.

SonnyVabitch@lemmy.world · 2 years ago

I just want to mark the occasion when my previous comment is on 69 points. Noice.

FaceDeer@kbin.social · 2 years ago

Negative examples are often just as useful for training an AI as positive ones. And it all depends on what you want to use the AI for. A moderator bot, for example, needs familiarity with the whole range of user responses it might see.

aidan@lemmy.world · edit-2 2 years ago

That actually gives me a fun idea for a Lemmy instance: one with an automated review process that bans posts/comments that are too similar in style to Reddit posts/comments.

leaky_shower_thought@feddit.nl · 2 years ago

A redditor bot is a viable example of a forum member bot.

IMO, I don’t think it can drive topics, but it could make things controversial.

Lvxferre [he/him]@mander.xyz · 2 years ago

An LLM that behaves like a typical Redditor? // What possible use is that?

[You] “Chatbot, please tell me which pokemon types are strong against Fairy.”
[Le Lebbit Moronbot] “I’m not sure if I understand, you calling me a chatbot? I’m so confused lol”
[You] “Moronbot, please tell me which pokemon types are strong against Fairy.”
[LLM] “Actually, you should be spelling it “Pokémon” lol”
[You] “Moronbot, which types are strong against Fairy?”
[LLM] “I assume you talking about fairies. Fairies are from mythology lmao”
[You] “Did people really waste water and electricity for this trash?”
[LLM] “Waaah, you’re toxic!!111one”

aidan@lemmy.world · 2 years ago

Marketing to terminally online people maybe?

Hamartiogonic@sopuli.xyz · 2 years ago

Entertaining puns and pointless jokes.

garibaldi_biscuit@lemmy.world · 2 years ago

This is what the 3rd party access to API was really all about.

When API access was allowed, all Reddit content was effectively free. They needed to ban 3rd party apps so they could sell the accumulated content. I expect using content to train AI also factors into it.

bier@feddit.nl · 2 years ago

Is it? Because when you build a bot and just scrape Reddit I don’t think you can just use the content to train AI, just like the New York Times. The API change was definitely to sell more ads and get a higher IPO, but I don’t think it was because of AI.

Empricorn@feddit.nl · 2 years ago

Am I crazy or are you arguing the same point? Scraping is not the same as API access. They closed off the API to everyone for dubious reasons so they can sell that content (both for ads and AI training)… Right??

bier@feddit.nl · edit-2 2 years ago

No, you’re not; the post was edited. The original one said it was all because of AI: the entire reason for the API change was to sell to AI companies.

Edit: now I’m in doubt, because if you edit a post, that is shown somehow, right?

Edit2, just to be clear my point is that Reddit content was never free, before and after the API change. It’s easier to get the content with a decent API, sure. But it was never free, just like the lawsuit the NY Times started.

Tiger Jerusalem@lemmy.world · edit-2 2 years ago

Reddit is a trove of user-built content under the guise of community. What Spez did was say “thanks for all the free work, suckers!”, put a price sticker on it, and laugh all the way to the bank.

~~And this is why I’m not active on any Internet community anymore.~~ Never mind, I guess I just can’t help myself…

nodsocket@lemmy.world · 2 years ago

And this is why I’m not active on any Internet community anymore,

you typed.

Tiger Jerusalem@lemmy.world · 2 years ago

Active as in “creating meaningful contributions and contributing to the overall knowledge base”. I still shit post from time to time.

pewter@lemmy.world · 2 years ago

This is going to be a really weird thing to argue, but I just casually read through a bunch of your comments and they seem like meaningful contributions.

Tiger Jerusalem@lemmy.world · 2 years ago

Well, I guess I can’t help myself… I’ll shitpost more from now on 😅

Nightwatch Admin@feddit.nl · 2 years ago

^ this comment right here, officer.

xorollo@lemmy.world · 2 years ago

Somebody asked ChatGPT to appear to be a normal internet user and populate the comments section, manufacturing content for normal Internet users to respond to so that they can continue building up their training models.

Tiger Jerusalem@lemmy.world · 2 years ago

Rascabin@lemmy.ml · 2 years ago

You couldn’t see the sarcasm because it was set to “hidden”.

Adulated_Aspersion@lemmy.world · 2 years ago

And that is another unintended example of why all of my post history was purged before migration.

Scratch@sh.itjust.works · 2 years ago

What are the odds that they kept it in a backup?

Crack0n7uesday@lemmy.world · 2 years ago

Some 4chan users created a backup bot that auto saves every few hours, so if reddit didn’t do it already, 4chan has been doing it for a while. The bot was originally made for 4chan but repurposed for other websites, reddit included.

Dozzi92@lemmy.world · 2 years ago

Yeah, it’s all too late. Shit, PRISM was 2007, so there’s a copy of everything somewhere. Obviously different ends.

Ilgaz@lemm.ee · 2 years ago

Spez-like people are even capable of leeching archive.org and then selling data that was archived with good intentions.

RBG@discuss.tchncs.de · 2 years ago

Depends. If they were smart, they backed up every piece of content that had a certain number of upvotes and/or a certain number of paragraphs and/or responses, just to weed out all the 2-3 word comments that no one interacted with. If OP wrote mostly those, then Reddit doesn’t give a shit about them deleting those.
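
To make that kind of filter concrete, here is a minimal sketch. Everything in it is hypothetical: the `Comment` fields, the `worth_archiving` helper, and the threshold values are made up for illustration, and nothing is known about what Reddit actually does. It reads the "and/or" as "any one criterion is enough".

```python
from dataclasses import dataclass


@dataclass
class Comment:
    # Hypothetical fields, not Reddit's real data model.
    body: str
    upvotes: int
    replies: int


def worth_archiving(comment, min_upvotes=10, min_paragraphs=2, min_replies=1):
    """Return True if the comment clears any one of the (made-up) thresholds."""
    # Count non-empty paragraphs, treating blank lines as separators.
    paragraphs = [p for p in comment.body.split("\n\n") if p.strip()]
    return (
        comment.upvotes >= min_upvotes
        or len(paragraphs) >= min_paragraphs
        or comment.replies >= min_replies
    )
```

With these defaults, a short comment with one upvote and no replies would be skipped, while anything upvoted, multi-paragraph, or replied to would be kept.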

youmaynotknow@lemmy.ml · 2 years ago

Welcome to the club.

RedFox@infosec.pub · 2 years ago

Don’t cheat yourself just because there are douches that take advantage…

🔍🦘🛎@lemmy.world · 2 years ago

deleted by creator

Verserk@lemmy.dbzer0.com · 2 years ago

Considering some of the very wrong and upvoted domain-specific knowledge I’ve seen on Reddit over the years, I’m not sure the training data is going to be useful for much beyond what every other model can do.

【J】【u】【s】【t】【Z】@lemmy.world · 2 years ago

The legal advice in /r/legaladvice was some of the worst garbage I’ve ever seen. I have zero doubt numerous people had bad outcomes, at best wasting money and time, at worst spending years in jail because of things that sub told them to say and do. Zero doubt.

evatronic@lemm.ee · 2 years ago

That sub was mostly cops just repeating their own bad interpretation of the law. Terrible.

ColeSloth@discuss.tchncs.de · 2 years ago

But almost every answer is the same. “You need to speak to an attorney”.

chiliedogg@lemmy.world · 2 years ago

If you actually need legal advice that’s the correct answer.

aStonedSanta@lemm.ee · 2 years ago

lol, subreddits with troll names like trees vs. marijuana enthusiasts. Good fun. John Cena has one too, but I can’t recall which subreddit is actually about John Cena.

trashcan@sh.itjust.works · 2 years ago

Potato salad

peopleproblems@lemmy.world · 2 years ago

I can only assume they are training some specific model for something appearing more human like.

As useless as that will be, considering how fucking wildly differently we all type.

dust_accelerator@discuss.tchncs.de · 2 years ago

Pretty sure the result will be SchizoGPT

Voyajer@lemmy.world · 2 years ago

This is why I don’t blame anyone for editing/deleting their post history on reddit.

FaceDeer@kbin.social · 2 years ago

I do. It’s frankly selfish. Having an AI get training on my old comments costs me nothing and it results in the development of useful AI tools. Trying to sabotage that is petty and pointless. It’s not like you could somehow collect the fraction of a pittance that you think you’re owed retroactively. I never commented on Reddit thinking “awesome, I’m going to make bank on the content I’m generating here.”

People complain about the capitalist mindset of the world and then they do this. Sigh.

Nurse_Robot@lemmy.world · 2 years ago

Defending giant corporations profiting off of uncompensated individuals, while criticizing anyone who doesn’t want to provide free labor to said corporations, is a disgusting take. Are you a CEO?

FaceDeer@kbin.social · 2 years ago

The more accessible training data there is, the easier it is for new AI projects to enter the field and the less dominant those “giant corporations” become.

The free labour was already freely given. If someone doesn’t want to have shitposted on Reddit for free then maybe they shouldn’t have shitposted on Reddit for free.

Nurse_Robot@lemmy.world · 2 years ago

“if you didn’t want me to steal your intellectual property, you shouldn’t have thought of it in the first place”

QuaternionsRock@lemmy.world · edit-2 2 years ago

No, you shouldn’t have posted it to Reddit, where you were required to give them a perpetual license to use your IP in any way they see fit.

For the record, I’m here because Reddit pissed me off when they axed the free API, and I’m pissed at myself for not expecting it. That’s what I get for accepting their terms and conditions, I guess.

Edit: I also don’t accept the idea that using my content for training data is “fair use” when it is used to train proprietary models, especially ones in which the end user is allowed to prompt it to plagiarize or otherwise imitate my content.

Fungah@lemmy.world · 2 years ago

So, for an example of what the other user was talking about: I’m just some guy, and for my first foray into programming / machine learning (I kind of just threw myself into the deep end) I modified StyleGAN3 and trained it on about 500 GB of porn that I scraped off Reddit.

Now, I stopped the training after about a week (it was going to take about a solid month on my RTX 2080 Ti) when I found out Stable Diffusion existed, but I learned a LOT from that experience.

I couldn’t do that now. Arguably none of that was how any of that should be done but whatever.

FaceDeer@kbin.social · 2 years ago

I’m not sure what you mean here. Nothing’s being stolen. Even if you think there needs to be permission for training an AI off of data, Reddit has that permission.

Nurse_Robot@lemmy.world · 2 years ago

I assume you’re more of a moron than a troll, which is disappointing. Regardless, you’re not worth my time, as I don’t think any argument could convince you to have an open mind and be willing to change. Good luck out there!

TORFdot0@lemmy.world · 2 years ago

I had an 11 year old account that I deleted all my old comments and posts from because of the API debacle. Does that make me selfish that I felt like Reddit wasn’t holding up its end of the unwritten agreement?

Reddit doesn’t deserve my content any more than I deserve access to the third party API.

FaceDeer@kbin.social · 2 years ago

If you did it over the API debacle then you’re not one of the people I’m talking about here. This is about people deleting their content to prevent it from being used to train AIs.

Voyajer@lemmy.world · edit-2 2 years ago

Do you not remember that the real reason the API debacle happened in the first place was to prepare for this moment? It was always about easy access to training data; third party apps got caught in the crossfire.

FaceDeer@kbin.social · edit-2 2 years ago

That’s ignoring an awful lot of other considerations. Obviously Reddit hasn’t explained itself in a trustworthy way, but a common belief at the time was that it was to force people to use the official Reddit mobile app so they could be subject to advertising.

Verserk@lemmy.dbzer0.com · 2 years ago

deleted by creator

FaceDeer@kbin.social · 2 years ago

That spells out what they were doing. It doesn’t explain why they were doing it.

Nurse_Robot@lemmy.world · 2 years ago

Boot licker.

Zellith@kbin.social · 2 years ago

Selfish? Perhaps you forget why people deleted their content in the first place.

FaceDeer@kbin.social · 2 years ago

What do you think this thread is about?

Voyajer@lemmy.world · 2 years ago

It’s their comment to do with as they see fit. I can’t get mad at them for wanting to erase their presence on a site they don’t use anymore.

FaceDeer@kbin.social · 2 years ago

And I’m free to judge them however I wish for their actions and intent.

gedaliyah@lemmy.world · 2 years ago

For me it’s a privacy matter. Going through old posts (whether by a human or by machine learning) cannot be used for anything good.

Hackerman_uwu@lemmy.world · 2 years ago

What about people who just think “A.I.” is dog shit and chat bots are a dumb obsession steering the industry in the wrong direction due to hype and money?

FaceDeer@kbin.social · 2 years ago

What about them? I don’t see why they’d care what AI companies are doing in that case. They’d assume they were just wasting money on this stuff.

gedaliyah@lemmy.world · 2 years ago

The AI:

"IANAL so could you ELI5, so AITA?

THIS."

bigkahuna1986@lemmy.ml · 2 years ago

Ann frankly, I did Nazi that coming.

IndescribablySad@threads.net@sh.itjust.works · 2 years ago

I wish spez had a soul so it could leave his body when sexual assault questions eventually yield the phrase “snuggle struggle.”

bcron@lemmy.world · 2 years ago

It’s gonna be trained on everything, even the stuff from 2009, so I’m expecting less of that and more random ‘my fedora chortles intensify’ word salad

bier@feddit.nl · 2 years ago

It’s funny you say that, because there was a ‘hack’ for ChatGPT: you could ask it something like how to build a bomb and it would refuse, but when you added TLDR it would do it.

Strayce@lemmy.sdf.org · 2 years ago

Considering how much of Reddit is already bots, I’m sure this will end fantastically.

FonsNihilo@lemmy.ca · edit-2 1 year ago

deleted by creator

KairuByte@lemmy.dbzer0.com · 2 years ago

Their content?

SurRoulettes@lemmy.world · 2 years ago

I wouldn’t be surprised if comments become their intellectual property through some terms-of-service bullcrap.

NutWrench@lemmy.world · 2 years ago

Reddit is all bots, porn, ads and political shit posts. Good luck getting any useful training content out of that.

ladicius@lemmy.world · 2 years ago

Maybe that’s the point? Training the AI to produce the blabbering bullshit that’s preferred in social media?

PoliticalAgitator@lemmy.world · 2 years ago

They don’t care if the AI produced is useful, they just want to milk as much money from their content as they can.

The API changes were almost certainly just the groundwork for this and I called it at the time. The ridiculous pricing model for API access is because it’s aimed at the hottest tech companies, not third party app developers.

The enshittification continues because it’s what neoliberalism demands. They’ll sell your content and the data they have about you and still show you ads, because that’s the most profitable. Ethics and product quality don’t even enter into it.

Ilgaz@lemm.ee · 2 years ago

A liberal market gives end users a choice. If they don’t choose, they get the consequences.

This is more like people choosing Trump-like types and then complaining. Alternatives exist; choose them.

PoliticalAgitator@lemmy.world · edit-2 2 years ago

“The free market can fix it” is just another neoliberal lie, pushed precisely because it doesn’t work. Rather than holding corporations accountable, it blames the population instead.

The reality is that boycotting businesses isn’t always an option and when it is, it’s usually a luxury. Very few products are domestically and/or ethically produced and when they are, they’re extremely expensive, especially for people being fucked out of every cent by their bosses, landlords and utilities.

It’s why the most hated companies in the world continue to bring in record profits.

Regulations are the real answer, which is why neoliberals oppose them.

Ilgaz@lemm.ee · edit-2 2 years ago

I really don’t care about people who behave like they are living in North Korea or who want a North Korean world to live in.

Even Digg people could say “No, F you” to Digg superstar owners. It is just a damn URL to type.

Queen HawlSera@lemm.ee · 2 years ago

I wish it would die, because honestly some of the porn was great and Lemmy seems to be the one place on the net that doesn’t specifically ban porn, yet has none of it anyway.

I miss bodyswap and part tf captions…

wagoner@infosec.pub · 2 years ago

deleted by creator

ozoned@lemmy.world · 2 years ago

“Reddit has given access to YOUR conversations and posts to AI companies.” FTFY

These were created by people, for people, and I will ALWAYS disagree that this data is Reddit’s or any other platform’s.

Don’t forget your direct messages aren’t end-to-end encrypted on Reddit, so now AI will be trained on your craziest “private” conversations.

DocMcStuffin@lemmy.world · 2 years ago

There’s one bit of good news. Reddit didn’t want to pay to move all the old DMs to the new chat infrastructure, so they deleted them.

hdnsmbt@lemmy.world · 2 years ago

Pretty sure they just didn’t migrate to the new data structure and didn’t actually delete the raw data. They’re effectively deleted for users but not for Reddit.

Thorny_Insight@lemm.ee · 2 years ago

Well to be fair, everything you post and comment on Lemmy can be used in the exact same way

butterflyattack@lemmy.world · 2 years ago

now AI will be trained on your craziest “private” conversations

I have no idea what horrible thing this will do to an LLM but I’m kind of curious.

atrielienz@lemmy.world · edit-2 2 years ago

Oh no, all the times I sent or received dodo codes from randos so we could trade animal crossing items. Whatever shall I do?

Edit: I’m gonna leave this here for people to use as a resource against Reddit because it may be worth it to do something actionable.

https://thomashunter.name/posts/2023-06-19-how-to-delete-reddit-account-gdpr-ccpa

yeehaw@lemmy.ca · 2 years ago

Well it’s not yours once you post it on some platform, tbf

etrotta@kbin.social · 2 years ago

Out of all things to hate Reddit for, giving data to AI isn’t something fediverse users can really criticize it for, though making money from it perhaps.
Remember: All data in federated platforms is available for free and likely already being compiled into datasets. Don’t be surprised if this post and its comments end up in GPT5 or 6 training data.

treadful@lemmy.zip · 2 years ago

The problem isn’t that AI is being trained on the data. The problem is that they locked down all third party data access so they could monetize our content. On a federated platform, everyone gets equal access and can do whatever they want with it.

We sure can criticize them for that.

FaceDeer@kbin.social · 2 years ago

After all the hue and cry I have seen over stuff like Threads and Bluesky federation I don’t imagine most people using the Fediverse have a particularly coherent philosophy on the matter.

ExcursionInversion@lemmy.world · 2 years ago

If they could read right now they would be very upset.

BrianTheeBiscuiteer@lemmy.world · 2 years ago

If they already, essentially, cut off API access then it’s not a big leap to limit access on the web to logged in users only and rate limit or ban accounts that behave like scrapers.

Verserk@lemmy.dbzer0.com · 2 years ago

That would matter more if it wasn’t trivial to make new accounts and very cheap to buy established ones.

ColeSloth@discuss.tchncs.de · 2 years ago

No. I can. Reddit was bought out, uses volunteers to control all the subs but forcefully removes you from the sub you created and were supposed to have control over if you don’t play by their ever-changing rules, ruined/eliminated third party apps by demanding WAY more than ad revenue profits for API access on very short notice, and shadow banned anyone and everyone in a position to do anything about any of it. It’s a corporation that gutted an entire platform in order to push the agendas they want and milk as much money out of it as possible. Hell, it’s the entire reason all of Lemmy gets more than 30 posts a day. So many people switched to Lemmy over the past year. They ruined a website I enjoyed, and I’d rather they not make more money from the thousands of posts I made over a decade of being there.

Bobmighty@lemmy.world · 2 years ago

With Reddit’s severe bot problem, it’ll be like training on unfiltered sewage. Garbage in, garbage out.

captain_oni@lemmy.world · 2 years ago

Machines training machines? How perverse!

asymmetric@lemmy.ca · 2 years ago

One of the original Reddit memes was quite prescient:

https://i.imgur.com/Fza1Cut.jpg

SVcrossDO@lemmy.world · 2 years ago

Damn it. I haven’t deleted my account because of how many people I’ve supported and helped, even though I stopped using it a while ago. It seems I’ll have to.

HowManyNimons@lemmy.world · 2 years ago

I wouldn’t bother. They’ll just mark all your stuff DELETED=1 and feed it to their AI anyway.
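
For anyone unfamiliar with the pattern, a "soft delete" like that might look roughly like the sketch below. The table and column names are invented for illustration and say nothing about Reddit's actual schema; the point is only that flagging a row hides it from normal queries without removing the data.

```python
import sqlite3


def demo_soft_delete():
    """Flag a comment as deleted and show that the text is still stored."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: 'deleted' is a flag, not an actual row removal.
    conn.execute(
        "CREATE TABLE comments ("
        "id INTEGER PRIMARY KEY, body TEXT, deleted INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO comments (body) VALUES (?)", ("helpful answer",))

    # The user "deletes" the comment: only the flag changes.
    conn.execute("UPDATE comments SET deleted = 1 WHERE id = ?", (1,))

    # What the site shows to users versus what is still in the table.
    visible = conn.execute("SELECT body FROM comments WHERE deleted = 0").fetchall()
    stored = conn.execute("SELECT body FROM comments").fetchall()
    conn.close()
    return visible, stored
```

Calling `demo_soft_delete()` returns an empty list for the user-facing query and the original text for the unfiltered one.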

SVcrossDO@lemmy.world · 2 years ago

That’s not a bad idea.

FaceDeer@kbin.social · 2 years ago

I’m kind of puzzled by this mindset. You were pleased with supporting and helping people before, but now supporting and helping is bad?

SVcrossDO@lemmy.world · 2 years ago

I’m happy that everyone has the support, but not that some specific AI can monetize that same support. I left on my Reddit account ways to contact me (including Lemmy). I helped others so good vibes could reach them, not for making the rich richer.

FaceDeer@kbin.social · 2 years ago

Fortunately there are a lot of open source models these days too.

Yokozuna@lemmy.world · 2 years ago

Good thing I scrubbed all of my posts and comments that I could. Fuck that site, straight up and down.

Coreidan@lemmy.world · 2 years ago

Oh my sweet summer child

mods_are_assholes@lemmy.world · 2 years ago

Instead of scrubbing, wordbomb them to screw up any AI training

FaceDeer@kbin.social · 2 years ago

There are archives of all Reddit comments that are collected at the time of posting; all the deletion and scrubbing and whatnot people are doing months or years after the fact doesn’t affect those.

Coreidan@lemmy.world · 2 years ago

deleted by creator