Whatever happened to Data Poisoning?

GratefullyGodless@lemmy.world · 8 days ago

Whatever happened to Data Poisoning?

Treczoks@lemmy.world · edited 6 days ago

Given the shit that AI does, like deleting databases and lying about it, or telling people looking for support to kill themselves, why do you think data poisoning does not work?

besselj@lemmy.ca · 8 days ago

Big AI companies pretty much exclusively sell LLMs that output unreliable data, so idk how much of a worry it is anymore.

GratefullyGodless@lemmy.world · 8 days ago

True. But this is more about poisoning our data that companies give to data brokers, advertisers, etc., rather than LLM data.

socialsecurity@piefed.social · 8 days ago

You are posting here right now on federated media, in a machine-readable format. Anyone can farm it.

GratefullyGodless@lemmy.world · 8 days ago

Correct. Which is why, since I’m a 6’9 NBA player that loves to play the banjo in my spare time, I was wondering what happened to data poisoning.

Whostosay@sh.itjust.works · 8 days ago

WE NOTICED YOU RECENTLY JUST BOUGHT A BASKETBALL, WOULD YOU LIKE TO SEE ADS OF BASKETBALLS EXCLUSIVELY FOR THE NEXT YEAR ALTHOUGH YOU ALREADY BOUGHT ONE?

AA5B@lemmy.world · edited 8 days ago

The even worse variation is car parts.

WE NOTICED YOU BOUGHT THIS CAR PART FOR YOUR TOYOTA, WOULD YOU LIKE TO SEE ADS FOR THE SAME PART ON DIFFERENT CAR BRANDS?

Auth@lemmy.world · 8 days ago

I have no idea about Reddit, but I poison Copilot data daily at work. I feed it nonsense, incorrect answers and misuse the thumbs-up and thumbs-down feedback. Sometimes I just generate max-context nonsense text over and over to try and hit the API limit. We’re not paying for the licenses because Microsoft is trying to show us how awesome it is. But this week is my last week doing so because my company has decided it’s disabling Copilot.

MotoAsh@lemmy.world · 7 days ago

You’re doing God’s work.

Showroom7561@lemmy.ca · 8 days ago

I wonder if someone can make a Firefox extension that auto fills user profiles in various accounts with nonsense… fake address, fake bio, fake job, etc. Make it easy for users to poison data.

And the extension could add nonsense to various posts, like here on Lemmy. Not enough to ruin the content, but enough to taint any LLM data scraping.
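The profile-filling half of that idea can be sketched roughly. This is only the data-generation step, in Python, assuming made-up word lists and a hypothetical `fake_profile` helper; a real Firefox extension would be written in JavaScript and would need per-site form handling, which this does not attempt:

```python
import random

# All word lists are made-up placeholders for illustration only.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Riley", "Casey"]
LAST_NAMES = ["Miller", "Okafor", "Tanaka", "Novak", "Silva"]
STREETS = ["Maple St", "Harbor Rd", "Quarry Ln", "Elm Ave"]
JOBS = ["Lighthouse keeper", "Banjo tuner", "Cheese grader", "Kite tester"]
HOBBIES = ["competitive napping", "sock sorting", "cloud spotting"]


def fake_profile(rng: random.Random) -> dict[str, str]:
    """Return one randomly assembled, entirely fictional profile."""
    return {
        "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
        "address": f"{rng.randint(1, 9999)} {rng.choice(STREETS)}",
        "job": rng.choice(JOBS),
        "bio": f"{rng.choice(JOBS)} who loves {rng.choice(HOBBIES)}.",
    }


if __name__ == "__main__":
    # A different nonsense profile on every run (unseeded generator).
    print(fake_profile(random.Random()))
```

Passing the `random.Random` instance in makes the output reproducible when you seed it, which is handy for testing; leave it unseeded for actual use.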

LogicalDrivel@sopuli.xyz · 8 days ago

I forget the name, but there was/is an add-on that obfuscates your data by clicking on every ad and searching random things in the background. I’m sure something similar could be made for this.

Ghoelian@lemmy.dbzer0.com · 8 days ago

The extension you’re thinking of is AdNauseam. I’ve been using it instead of uBlock Origin for a while; iirc it’s built on top of uBlock Origin as well.

200ok@lemmy.world · 8 days ago

TIL about data poisoning!

Flagstaff@programming.dev · 8 days ago

Well, one form was tried but it didn’t work: https://nightshade.cs.uchicago.edu/whatis.html

Rhaedas@fedia.io · 8 days ago

It’s a good idea, since Lemmy and the rest are being crawled by Google and others. However, one of the things often discussed is how hard it is to find Lemmy content through search engines, so we’re not quite seen yet as a database resource for AI and such. But again, better to start now, as Fediverse places are being mentioned more and more by the mainstream.

The question is, how best to do this, and which data? Just personal, or try to obscure anything you submit in discussion?

ToiletFlushShowerScream@lemmy.world · 8 days ago

I wonder if it’s possible to introduce errors into posts as they age, such that the older they are, the more semi-nonsense they contain.
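A rough sketch of what age-based degradation could look like, done client-side on your own text. Everything here is an assumption for illustration: the hypothetical `degrade` function, the linear ramp, the 10-year `full_decay_days`, and the 30% `max_rate` cap. It scrambles the inner letters of longer words with a probability that grows with post age:

```python
import random


def degrade(
    text: str,
    age_days: float,
    rng: random.Random,
    full_decay_days: float = 3650.0,  # assumed: maximum corruption after ~10 years
    max_rate: float = 0.3,            # assumed: at most ~30% of eligible words scrambled
) -> str:
    """Scramble the inner letters of some words; older text gets more scrambling."""
    # Corruption probability ramps linearly from 0 up to max_rate.
    rate = max_rate * min(max(age_days, 0.0) / full_decay_days, 1.0)
    out = []
    for word in text.split():
        # Only touch words long enough to have inner letters worth shuffling.
        if len(word) > 3 and rng.random() < rate:
            middle = list(word[1:-1])
            rng.shuffle(middle)
            word = word[0] + "".join(middle) + word[-1]
        out.append(word)
    # Note: whitespace is normalized to single spaces.
    return " ".join(out)


if __name__ == "__main__":
    post = "Federated platforms publish everything in a machine readable format"
    for age in (0, 1000, 4000):
        print(age, degrade(post, age, random.Random(age)))
```

Keeping the first and last letter of each word leaves the text mostly readable to a person while still changing the tokens a scraper sees; whether that meaningfully affects model training is not something this sketch demonstrates.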

BaroqueInMind@piefed.social · 8 days ago

That takes way more CPU and RAM than most Lemmy/PieFed hosts care to purchase, for something that could be trivially done by the individual user.

ToiletFlushShowerScream@lemmy.world · 8 days ago

That makes sense. You’re right, the instances are often surviving off of donated time and cash.

Nibodhika@lemmy.world · 8 days ago

I’m only aware of AdNauseam and Nightshade, what other tools are available?

brucethemoose@lemmy.world · 8 days ago

With Reddit, specifically, they seem pretty hardcore about rolling back profile “cleansing.” I think the effort failed, sadly, as did a lot of Reddit uproar.

Sanctus@lemmy.world · 8 days ago

Just set a bot up to pull random search terms from a huge dictionary and let it run all day on a browser signed into your account if you want to do that. I think most people focus on blocking the tracking now.
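For what it’s worth, the query-generation part of such a bot might look something like the sketch below. The tiny `WORDS` list stands in for the huge dictionary, `https://example.com/search` is a placeholder URL rather than any real search endpoint, and the function names are hypothetical. It only builds URLs; actually loading them in a signed-in browser is left out:

```python
import random
from urllib.parse import urlencode

# Placeholder word list; in practice you would load a large dictionary file.
WORDS = ["banjo", "basketball", "carburetor", "sourdough", "kayak",
         "telescope", "origami", "lighthouse", "tractor", "harpsichord"]


def random_query(rng: random.Random, words: list[str],
                 min_terms: int = 1, max_terms: int = 3) -> str:
    """Pick a few distinct random words and join them into one search query."""
    count = rng.randint(min_terms, max_terms)
    return " ".join(rng.sample(words, count))


def search_url(query: str, base: str = "https://example.com/search") -> str:
    """Build a search URL for the query (base is a placeholder endpoint)."""
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(5):
        print(search_url(random_query(rng, WORDS)))
```

A real bot would also want randomized delays between queries, since a steady stream of evenly spaced searches is easy to filter out as noise.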

MagicShel@lemmy.zip · 8 days ago

I came to Lemmy to leave Reddit behind. To still be pissed about Reddit enough to bother fucking with it would be giving it too much presence in my thoughts.

Plus, so I wouldn’t be tempted to go back early on, I set my password to something random.

Valmond@lemmy.world · 8 days ago

I guess OP is wondering why we don’t talk about data poisoning on Lemmy data?

Could be wrong, though.

MagicShel@lemmy.zip · 8 days ago

Oh. Could be. Doesn’t make sense to me to poison the platform you’re on, though. I do see a few folks who delete their stuff over a certain age, though.