A purported leak of 2,500 pages of internal documentation from Google sheds light on how Search, the most powerful arbiter of the internet, operates.
The leaked documents touch on topics like what kind of data Google collects and uses, which sites Google elevates for sensitive topics like elections, how Google handles small websites, and more. Some information in the documents appears to conflict with public statements by Google representatives, according to Rand Fishkin and Mike King.
You mean hosting your own crawler/indexer? That doesn’t really sound like a thing you could do cost-effectively.
No problem, we crowdsource the crawling, torrent style.
We outsourced that to Google for reasonable performance reasons. But they shit the bed, so now there's no choice but to do it ourselves.
Ooh, that might be an interesting app to run on Veilid.
What is that, and how does it apply?
Source: https://en.wikipedia.org/wiki/Veilid
Federated bookmarks?
Federated directories. We're going back to Yahoo like it's 1995.
I loved Geocities!
Neocities is trying to be a modern reincarnation: https://neocities.org/
I’m so ready for something like this. I’ve cleaned up my bookmarks and been waiting for alternatives to search engines.
SearxNG
Right!
Ars
You could use Common Crawl; it's run by a nonprofit.
https://en.wikipedia.org/wiki/Common_Crawl
Look up the YaCy repo on GitHub.