Almost certainly this has nothing to do with scraping. As with Reddit, those with a stake in Twitter stand to benefit from AI and, as far as I know, there’s no mass reposting (retweeting?) effort to something like Mastodon.
That would be trivial to block anyway, since it would be easy to identify the service accounts and source IPs of the requests. No need to impact average users.
What’s more likely is he hasn’t paid the bill for his cloud infrastructure and no longer has the capacity to serve so many users.
IMO, that’s what you get when you fire half of your staff.
I’m not so sure. There are a lot of businesses and people training AI models right now, and sites like Reddit or Twitter are very attractive as huge collections of user-generated content. It’s not an outrageous assumption that they’ll try to get that data for free by scraping instead of paying for API access.
But also, hasn’t that boat already sailed for several AI companies? They’ve already trained on it, so there’s no need to scrape again; they can keep using what they collected last time for their core training. All they’re missing is the last couple of years or months.
The claim that this would be trivial to block is just ridiculously false. If you think it’s true, build a service that does this trivial thing for people, and become a millionaire overnight.
Funnily enough, I do. I’m an SRE myself.
Services like Akamai have tools that are literally designed to block requests from known bad locations and IP ranges.
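To make the IP-range blocking idea concrete, here is a minimal sketch of the core check such tools perform, using Python’s standard `ipaddress` module. The denylisted ranges below are placeholder documentation networks (TEST-NET blocks), not real scraper sources; a real deployment would feed in curated threat-intelligence feeds and would typically sit at the CDN or load-balancer layer, not in application code.

```python
import ipaddress

# Hypothetical denylist of scraper source ranges.
# These are RFC 5737 documentation networks, used here purely as stand-ins.
BLOCKED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),   # TEST-NET-3
    ipaddress.ip_network("198.51.100.0/24"),  # TEST-NET-2
]

def is_blocked(client_ip: str) -> bool:
    """Return True if the client IP falls inside any denylisted range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_RANGES)

print(is_blocked("203.0.113.42"))  # True: inside a denylisted range
print(is_blocked("192.0.2.7"))     # False: ordinary user traffic passes
```

The point of the sketch is that the check itself is cheap and targeted: traffic from identified ranges is rejected while average users are untouched, which is exactly why rate limits applied to everyone suggest a different cause.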