So this has been going around in my head for a while now: what if they don’t care about their users per se, but want to use the few users they do get to exploit federation and shamelessly crawl the fediverse?
I mean… they’ll get enough users subscribing to enough of the fediverse that instances of every shape and size will proactively deliver our posts and interaction data to them, with free shipping, right?
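Here is a rough Python sketch of why a handful of subscribers is enough. It rests on simplifying assumptions. When someone on a small instance posts, their server looks up who follows them and POSTs the activity to each follower's inbox. If the followers' server advertises a `sharedInbox`, all followers on that server collapse into one delivery. `RemoteActor`, `delivery_targets` and the `*.example` URLs are hypothetical. Real servers (Mastodon, Lemmy, etc.) each do this their own way. As far as I know, Lemmy communities relay posts to their subscribers' instances by wrapping them in `Announce` activities rather than working exactly like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RemoteActor:
    """A follower living on another server (hypothetical, simplified model)."""
    actor_id: str
    inbox: str
    shared_inbox: Optional[str] = None  # what the actor advertises as endpoints.sharedInbox


def delivery_targets(followers: list[RemoteActor]) -> set[str]:
    """Inbox URLs a new public post gets POSTed to.

    Followers on a server that advertises a shared inbox collapse into a single
    delivery; everyone else gets a copy in their personal inbox.
    """
    return {f.shared_inbox or f.inbox for f in followers}


followers = [
    RemoteActor("https://big.example/users/a", "https://big.example/users/a/inbox", "https://big.example/inbox"),
    RemoteActor("https://big.example/users/b", "https://big.example/users/b/inbox", "https://big.example/inbox"),
    RemoteActor("https://small.example/users/c", "https://small.example/users/c/inbox"),
]
print(sorted(delivery_targets(followers)))
# ['https://big.example/inbox', 'https://small.example/users/c/inbox']
```

One follower on big.example is enough for every public post from that author to land in big.example's shared inbox. A second follower there costs the sending server nothing extra.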
So is defederating, in the end, not only a defense against company-controlled content that might flood the fediverse, but also a measure to protect users on the fediverse right now from ending up in Meta’s databases, just as they would have if they had used Facebook in the first place?
They could do classic web crawling, yes. But that is:
- super slow
- easy to detect
- easy to block
- illegal in many places for companies to do for the sake of selling shit, since the users have not given them consent to use their data
I think they’re trying to pull the WhatsApp stunt here: when you sign up to WhatsApp, WhatsApp sends your whole contact list to Meta and updates it on every change in order to “connect the phone numbers on your phone with WhatsApp users” (or so they say). They have structured this process so that the user is at fault, not them: since the user “sent” them the numbers, it was the user who needed consent to share that data, not Meta. Same with the fediverse. “No. We didn’t steal any data without consent! Our users should have had that consent when they subscribed to technology@lemmy.world! The data was pushed to us from there, we ain’t doin’ nothin’ wrong!”
I don’t think anyone needs consent to do research using your public posts, though. You can literally scrape all of Twitter and run sentiment analysis on it, for example, and nobody can do anything about it.
Yes, you can. But that won’t give you the interaction history (who liked what and such), and it is way less convenient than “set up ActivityPub in own app real quick and have the whole fediverse send shit to me nicely formatted with interaction data ready to be used”. Legal issues also come up in some places with web scraping, like when you copy and use copyrighted imagery or happen to scrape stuff you weren’t allowed to see for some reason.
All of those hurdles disappear automatically when you simply use the inner workings of the service the data comes from. No user can complain when Meta collects data sent to them via ActivityPub. That is literally what this protocol is for and the core of any application running it. If you don’t want your data to be sent to other instances around the world, don’t use the protocol, right?
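On the receiving end, here is a similarly rough Python sketch of how little work it takes to turn what arrives in an inbox into “who liked what” rows. The function names and `*.example` URLs are hypothetical. It assumes the request body is an ActivityStreams JSON activity and unwraps one level of `Announce`, which is one way relayed activities can arrive. It also deliberately skips the HTTP signature verification a real server would do before trusting anything.

```python
import json
from typing import Optional

# Activity types this sketch treats as "interaction data" (an assumption, not exhaustive).
INTERACTION_TYPES = {"Create", "Like", "Dislike", "Follow"}


def _as_id(value: object) -> Optional[str]:
    """ActivityStreams allows either a bare URL or an embedded object with an "id"."""
    if isinstance(value, str):
        return value
    if isinstance(value, dict) and isinstance(value.get("id"), str):
        return value["id"]
    return None


def extract_interaction(raw_body: str) -> Optional[tuple[str, str, str]]:
    """Turn one inbox delivery into an (actor, type, object) row, or None if not of interest."""
    activity = json.loads(raw_body)
    # Relayed activities may arrive wrapped in an Announce; unwrap one level.
    # (A plain boost, i.e. an Announce of a bare URL, is simply ignored here.)
    if activity.get("type") == "Announce" and isinstance(activity.get("object"), dict):
        activity = activity["object"]
    kind = activity.get("type")
    if kind not in INTERACTION_TYPES:
        return None
    actor = _as_id(activity.get("actor"))
    obj = _as_id(activity.get("object"))
    if actor is None or obj is None:
        return None
    return (actor, kind, obj)


like = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "id": "https://lemmy.example/activities/like/1",
    "type": "Like",
    "actor": "https://lemmy.example/u/alice",
    "object": "https://lemmy.example/post/123",
}
print(extract_interaction(json.dumps(like)))
# ('https://lemmy.example/u/alice', 'Like', 'https://lemmy.example/post/123')
```

The point is not the code itself but that the interesting part (actor, verb, target) arrives already structured, so there is nothing to scrape or guess.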
They can get the data in many different ways, this is just the most convenient one.
Interesting, thanks for that! I didn’t think of it that way.