This is exactly what I came here to find. Thank you for posting it. If I may be so bold, selfhosters should really be leaning this way. SearXNG is great, but it still relies on big tech search engines for its results.
The other thing we need is a way to tell good crawling agents, or *smol agents, apart from corporate bots that just steal content.
If selfhosters can unite and build a good index, perhaps search can go back to the way it was, rather than being a vector to sell you more and collect your data.
Or how about YaCy? It's self-hostable, and you can build your own web index and run your own web crawler.
It's peer-to-peer too.
I personally love YaCy.