A purported leak of 2,500 pages of internal documentation from Google sheds light on how Search, the most powerful arbiter of the internet, operates.
The leaked documents touch on topics like what kind of data Google collects and uses, which sites Google elevates for sensitive topics like elections, how Google handles small websites, and more. Some information in the documents appears to be in conflict with public statements by Google representatives, according to Fishkin and King.
Can’t wait for selfhosted web search to become better.
You mean hosting your own crawler/indexer? That doesn’t really sound like a thing you could do cost-effectively.
No problem we crowdsource the crawling torrent style.
We outsourced that to Google for performance reasons. But they shit the bed, so now there’s no choice but to do it ourselves.
ooh that might be an interesting app to run on veilid
What is that, and how does it apply?
Source: https://en.wikipedia.org/wiki/Veilid
Right!
Ars
Federated bookmarks?
Federated directories. We’re going back to Yahoo like it’s 1995
I loved GeoCities!
Neocities is trying to be a modern reincarnation https://neocities.org/
I’m so ready for something like this. I’ve cleaned up my bookmarks and been waiting for alternatives to search engines.
SearXNG
You could use Common Crawl; it’s run by a non-profit.
https://en.wikipedia.org/wiki/Common_Crawl
Look up the YaCy repo on GitHub.
If they’re taking tips from Google, why would they get better?
Google actually was good, so there’s probably some good information in this documentation. If nothing else we can perhaps figure out what “went wrong.”
Edit: I’ve been reading the blog post by the person the leak appears to have mainly been shared with, and there’s a lot of in-depth analysis there, but I’m not seeing a link to the actual documents. It’s a huge article, though, so I might be overlooking it.
That was an interesting read. Thanks for linking to it.
What are the current contenders?
Ars Technica this week: Bing outage shows just how little competition Google search really has
The referenced search engine comparison by Rohan “Seirdy” Kumar
Can’t emphasise too much that this piece is a very necessary read for anyone who wants to understand search. That’s not just because it says good things about us, but because of the depth of research that has gone into it. Most articles you encounter about indexes just take whatever a (meta)search engine says about itself, without even checking its privacy policy for “relationships with Microsoft” etc. or doing any comparative work.
YaCy, Mwmbl, Alexandria, Stract, Marginalia to name a few.