- cross-posted to:
- technology@lemmy.world
Burner accounts on social media sites can increasingly be analyzed with AI to identify the pseudonymous users behind them, according to research with far-reaching consequences for privacy on the Internet.
The finding, from a recently published research paper, is based on the results of experiments correlating specific individuals with accounts or posts across more than one social media platform. The success rate was far higher than that of classical deanonymization work, which relied on humans assembling structured data sets suitable for algorithmic matching or on manual work by skilled investigators. Recall (that is, how many users were successfully deanonymized) was as high as 68 percent. Precision (the rate of guesses that correctly identified the user) was up to 90 percent.
The findings have the potential to upend pseudonymity, an imperfect but often sufficient privacy measure used by many people to post queries and participate in sometimes sensitive public discussions while making it hard for others to positively identify the speakers. The ability to cheaply and quickly identify the people behind such obscured accounts opens them up to doxxing, stalking, and the assembly of detailed marketing profiles that track where speakers live, what they do for a living, and other personal information. In other words, this pseudonymity measure may no longer hold.
*if you’re fucking stupid and leak personal details across multiple accounts
you say this, but do I have to sacrifice being connected to online communities that are more local to my area? A huge privacy issue for me is just participating in online communities for my state and my city. I want to remain anonymous, but I also want to participate in these more local discussions. Just being subscribed to those communities narrows down their search by like 99%. Sure I could create a burner account to participate in those communities, but then I look like an astroturfing bot to other users because I don’t participate in any other conversations across reddit or lemmy or whatever.
How does one connect with their local community digitally without making a massive sacrifice to privacy? It feels unavoidable.
For community-specific stuff, maybe use a separate account. That way, your anonymous accounts leak less. In Jerboa, for example, it's easy to switch accounts. On PC, different accounts can be logged in on different instances.
Being subscribed to those communities on a single website.
If people would get the fuck off Reddit and decide it was ok to have multiple websites to log into, it would be harder. Internet centralization is a personal security risk.
Humans: invent groundbreaking technologies to share information and freely associate, breaking down multiple societal barriers and creating genuine goodness in the world
Also humans: immediately make it awful and use it to singularly subjugate nearly every living person on earth
Using words like “deanonymized” and “pseudonymity” probably doesn’t help, either (well, it helps them).
Anyway, bold of us to presume that they will even care about accuracy prior to deployment.
This is something we’re gonna see a lot more of, and I don’t mean specifically “LLMs doing privacy violations”, though that’ll probably be a lot of it.
LLMs are really good at taking unstructured data (e.g. all your social media posts, usernames, aliases, writing style, hints about your location, time of activity, etc) and turning it into structured data. (e.g. name=this, city=that, political preference=them, etc). Why do you think most early uses of LLMs that were quickly deployed were just article summarizer tools? Unstructured data (articles) > Structured data (bullet points)
This is really good for surveillance, because it means they can take all your activity and condense it down into something that’s easier to parse and correlate. Other tools have existed to do this for a long time, (mostly in the hands of intelligence agencies) but this just makes it more accessible and easy to use, and adds some complexity to how it can operate.
I think we’re gonna see a lot more use of LLMs for things like this. Taking something unstructured, and making it structured, because hallucinations and things like that are a lot less common when the task is just reorganizing existing information, rather than coming up with something new. (though of course, hallucinations will never go away, and are still gonna be pretty prevalent)
That could be deanonymizing your accounts, or it could just be things like looking through all your files to sort them into better predefined categories, or things like what Mozilla does with their tab groups where you can have it suggest other tabs that would fit into that group, and a local model figures out which tabs belong in which topic (with pretty good accuracy in my experience.)
Unfortunately, companies have very little interest in making your life easier by doing things like sorting your files for you. They're already quite uninterested in making their systems easy to use if it doesn't directly generate a profit (cough cough- Microslop), and have a much larger interest in doing things like tracking you to sell you some new crap.
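The unstructured-to-structured step described above can be illustrated with a toy sketch. A real pipeline would prompt an LLM to pull fields out of free text; here, simple regex heuristics stand in for the model, just to show the shape of the transformation. The field names and patterns are illustrative assumptions, not anything from the paper.

```python
import re

def extract_profile(posts):
    """Condense a list of free-text posts into a structured profile dict.

    This is a heuristic stand-in for what an LLM-based extractor would do:
    unstructured posts in, structured fields out.
    """
    profile = {"cities": set(), "communities": set()}
    for post in posts:
        # Location hints like "I live in Denver" or "here in Denver"
        for m in re.finditer(r"\b(?:live in|here in)\s+([A-Z][a-zA-Z]+)", post):
            profile["cities"].add(m.group(1))
        # Community mentions like technology@lemmy.world
        for m in re.finditer(r"\b([a-z]+@[a-z]+(?:\.[a-z]+)+)", post):
            profile["communities"].add(m.group(1))
    # Sort for stable, easy-to-correlate output
    return {k: sorted(v) for k, v in profile.items()}

posts = [
    "I live in Denver and bike to work.",
    "cross-posted to technology@lemmy.world",
    "the weather here in Denver is rough today",
]
print(extract_profile(posts))
```

The point of the toy: even crude extraction collapses scattered posts into a compact, correlatable record, and an LLM does the same thing far more flexibly across writing style, timing, and topic hints.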