Anubis is awesome and I want to talk about it

SmokeyDope@piefed.social · edit-2 1 month ago


non_burglar@lemmy.world · 1 month ago

Anubis is an elegant solution to the AI bot scraper issue; I just wish the solution to everything wasn’t just spending compute everywhere. In a world where we need to rethink our energy consumption and generation, even on clients, this is a stupid use of computing power.

quick_snail@feddit.nl · 1 month ago

We have memory-hard cryptographic functions
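To make the memory-hard idea concrete, here is a minimal Python sketch that swaps SHA-256 for the standard library's `hashlib.scrypt`. The challenge format, the scrypt parameters (`n=2**10`, `r=8`, `p=1`, roughly 1 MiB of memory per hash) and the leading-zero-bits target are illustrative assumptions, not anything Anubis ships.

```python
import hashlib
import os


def scrypt_digest(challenge: bytes, nonce: int) -> bytes:
    # Demo parameters: memory use is about 128 * n * r bytes = 1 MiB per evaluation.
    return hashlib.scrypt(
        nonce.to_bytes(8, "big"), salt=challenge, n=2**10, r=8, p=1, dklen=32
    )


def leading_zero_bits(digest: bytes) -> int:
    # Number of zero bits before the first 1 bit in the digest.
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def solve(challenge: bytes, bits: int) -> int:
    # Client side: about 2**bits scrypt evaluations on average.
    nonce = 0
    while leading_zero_bits(scrypt_digest(challenge, nonce)) < bits:
        nonce += 1
    return nonce


def verify(challenge: bytes, nonce: int, bits: int) -> bool:
    # Server side: a single scrypt evaluation.
    return leading_zero_bits(scrypt_digest(challenge, nonce)) >= bits


if __name__ == "__main__":
    challenge = os.urandom(16)
    nonce = solve(challenge, bits=6)
    print(nonce, verify(challenge, nonce, bits=6))
```

Because every guess needs its own block of RAM, throwing more GPU cores at it helps less than with plain SHA-256; the trade-off is that a low-end phone pays that memory cost too.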

sudo@programming.dev · 1 month ago

I’ve repeatedly stated this before: Proof of Work bot management is really only Proof of JavaScript bot management. It is trivial for a headless browser to bypass. Proof of JavaScript does work and will stop the vast majority of bot traffic; that’s how Anubis actually works. You don’t need to punish actual users by abusing their CPU. POW is a far higher cost on your actual users than the bots.
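The cost asymmetry being argued about fits in a few lines. Below is a minimal hashcash-style sketch, assuming a challenge where the client must find a nonce whose SHA-256 hex digest starts with `difficulty` zeros; this is modeled loosely on how Anubis's challenge is usually described, and the exact string format is an assumption. Nothing in it requires a browser: a plain Python script solves it just as well.

```python
import hashlib
import secrets
from typing import Tuple


def digest(challenge: str, nonce: int) -> str:
    # Hex SHA-256 of the challenge string followed by the nonce.
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> Tuple[int, int]:
    """Client side: brute-force a nonce whose digest starts with `difficulty`
    hex zeros. Returns (nonce, hashes_tried)."""
    target = "0" * difficulty
    nonce = 0
    while not digest(challenge, nonce).startswith(target):
        nonce += 1
    return nonce, nonce + 1


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    # Server side: exactly one hash, whatever the difficulty.
    return digest(challenge, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    challenge = secrets.token_hex(16)
    nonce, tries = solve(challenge, difficulty=4)
    # Expected client work is 16**4 = 65,536 hashes on average; the server does 1.
    print(f"nonce={nonce} tries={tries} valid={verify(challenge, nonce, 4)}")
```

Verification costs the server one hash while the expected client work grows as 16**difficulty, which is the point of the design; the catch is that a scraper pays exactly the same per-page price as a phone.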

Last I checked, Anubis has a JavaScript-less strategy called “Meta Refresh”. It first serves you a blank HTML page with a <meta> tag instructing the browser to refresh and load the real page. I highly advise using the Meta Refresh strategy. It should be the default.
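A minimal sketch of how a meta-refresh check can work without JavaScript, assuming a hypothetical `/pass-challenge` route, a hypothetical signing key `SECRET`, and made-up query parameters; Anubis's real implementation differs. Following the refresh only proves the client parses HTML `<meta>` tags, so, like the JS check, it filters out bare `requests`/curl scrapers rather than determined bots.

```python
import hashlib
import hmac
import html
import secrets
import time
from urllib.parse import quote

SECRET = secrets.token_bytes(32)  # hypothetical per-server signing key


def sign(path: str, issued: int) -> str:
    return hmac.new(SECRET, f"{path}|{issued}".encode(), hashlib.sha256).hexdigest()


def meta_refresh_page(path: str, delay: int = 1) -> str:
    """Near-empty HTML page telling the browser to load a (hypothetical)
    /pass-challenge URL after `delay` seconds. No JavaScript involved."""
    issued = int(time.time())
    target = (
        f"/pass-challenge?redir={quote(path, safe='')}"
        f"&t={issued}&sig={sign(path, issued)}"
    )
    return (
        "<!doctype html><html><head>"
        f'<meta http-equiv="refresh" content="{delay}; url={html.escape(target)}">'
        "</head><body></body></html>"
    )


def check_pass(path: str, issued: int, sig: str, max_age: int = 30) -> bool:
    # Accept the follow-up request only if it is signed by us and recent.
    age = time.time() - issued
    return 0 <= age <= max_age and hmac.compare_digest(sig, sign(path, issued))
```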

I’m glad someone is finally making an open source and self hostable bot management solution. And I don’t give a shit about the cat-girls, nor should you. But Techaro admitted they had little idea what they were doing when they started and went for the “nuclear option”. Fuck Proof of Work. It was a Dead On Arrival idea decades ago. Techaro should strip it from Anubis.

I haven’t caught up with what’s new with Anubis, but if they want to get stricter bot-management, they should check for actual graphics acceleration.

SmokeyDope@piefed.social · edit-2 1 month ago

Something that hasn’t been mentioned much in discussions about Anubis is that it has a graded tier system of how sketchy a client is and changing the kind of challenge based on a weighted priority system.

The default bot policies it ships with are set up so that squeaky-clean regular clients pass straight through, only slightly weighted clients/IPs get the meta refresh, and only at the moderate-suspicion level does the JavaScript Proof of Work kick in. The bot policy, the weight triggers for these levels, the challenge action, and how long a client stays validated are all configurable.
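A toy version of the graded-tier idea described above: each matching rule adds weight, and the total picks the challenge. Anubis configures this declaratively in its bot policy file; the rule names, weights, thresholds and action labels below are invented for illustration and are not Anubis's defaults.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    user_agent: re.Pattern
    weight: int  # added to the suspicion score when the rule matches


# Invented rules and weights, for illustration only.
RULES = [
    Rule("generic-bot-ua", re.compile(r"bot|crawl|spider", re.I), 10),
    Rule("scripted-client", re.compile(r"python-requests|curl|wget", re.I), 20),
    Rule("browser-like-ua", re.compile(r"^Mozilla/"), -5),
]

# (minimum score, action), checked from strictest to most lenient.
TIERS = [(20, "proof-of-work"), (5, "meta-refresh"), (float("-inf"), "allow")]


def score(user_agent: str) -> int:
    return sum(rule.weight for rule in RULES if rule.user_agent.search(user_agent))


def action_for(user_agent: str) -> str:
    s = score(user_agent)
    for threshold, action in TIERS:
        if s >= threshold:
            return action
    return "allow"  # not reached thanks to the -inf tier; keeps the return type honest
```

With these made-up numbers a stock Firefox UA scores -5 and is allowed, a `Mozilla/... Googlebot` UA scores 5 and gets the meta refresh, and `python-requests` scores 20 and gets proof of work.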

It seems to me that the sites that heavy-hand proof of work on every client, with validity that only lasts 5 minutes, are the ones giving Anubis a bad rap. The default bot policy settings Anubis comes with don’t trigger PoW on the regular Firefox Android clients I’ve tried, including hardened IronFox. Meanwhile, other sites show the finger wag on every connection no matter what.

It’s understandable why some choose strict policies, but they give the impression that this is the only way it should be done, which is overkill. I’m glad there are config options to mitigate the impact on normal user experience.

sudo@programming.dev · 1 month ago

Anubis is that it has a graded tier system of how sketchy a client is and changing the kind of challenge based on a weighted priority system.

Last I checked that was just User-Agent regexes and IP lists. But that’s where Anubis should continue development, and hopefully they’ve improved since. Discerning real users from bots is how you do proper bot management. Not imposing a flat tax on all connections.
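For contrast with the traffic analysis described here, this is roughly all an IP-list check amounts to. The deny list below uses documentation ranges as a hypothetical placeholder; a real one would be loaded from published ranges. A bot on residential proxies simply isn't in the list.

```python
import ipaddress

# Hypothetical deny list using documentation ranges; a real one would be
# loaded from published crawler/cloud ranges and kept up to date.
DENY_NETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def ip_listed(addr: str) -> bool:
    """True if addr falls inside any listed network (IPv4 and IPv6 both work)."""
    ip = ipaddress.ip_address(addr)
    # Membership across address families is simply False, not an error.
    return any(ip in net for net in DENY_NETS)
```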

rtxn@lemmy.world · edit-2 1 month ago

POW is a far higher cost on your actual users than the bots.

That sentence tells me that you either don’t understand or consciously ignore the purpose of Anubis. It’s not to punish the scrapers, or to block access to the website’s content. It is to reduce the load on the web server when it is flooded by scraper requests. Bots running headless Chrome can easily solve the challenge, but every second a client is working on the challenge is a second that the web server doesn’t have to waste CPU cycles on serving clankers.

POW is an inconvenience to users. The flood of scrapers is an existential threat to independent websites. And there is a simple fact that you conveniently ignored: it fucking works.

sudo@programming.dev · 1 month ago

It’s like you didn’t understand anything I said. Anubis does work. I said it works. But it works because most AI crawlers don’t have a headless browser to solve the PoW. To operate efficiently at the high volume required, they use raw HTTP requests. The vast majority are probably using Python’s basic requests module.

You don’t need PoW to throttle general access to your site, and that’s not the fundamental assumption of PoW. PoW assumes (incorrectly) that bots won’t pay the extra flops to scrape the website. But bots are paid to scrape the website; users aren’t. They’ll just scale horizontally and open more parallel connections. They have the money.
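A back-of-the-envelope sketch of what “just scale horizontally” costs, assuming a SHA-256 leading-hex-zeros puzzle at difficulty 4, one challenge per page (a worst case for the scraper), and an assumed rate of 1,000,000 hashes per second per core. All three numbers are assumptions, not measurements.

```python
def expected_hashes(difficulty: int) -> int:
    # Each hex digit of a SHA-256 digest is "0" with probability 1/16.
    return 16 ** difficulty


def core_hours(pages: int, difficulty: int, hashes_per_sec: float) -> float:
    # Total expected CPU time to solve one challenge per page.
    return pages * expected_hashes(difficulty) / hashes_per_sec / 3600


if __name__ == "__main__":
    rate = 1_000_000  # assumed SHA-256 hashes per second per core
    print(f"one page: {expected_hashes(4) / rate:.3f} s")
    print(f"a million pages: {core_hours(1_000_000, 4, rate):.1f} core-hours")
```

Under those assumptions a visitor waits about 0.066 s per challenge and a million-page crawl costs about 18 core-hours, which is cheap for a funded operation.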

___qwertz___@feddit.org · 1 month ago

Funnily enough, PoW was a hot topic in academia around the late 90s / early 2000s, and it’s somewhat clear that the author of Anubis has not read much about the discussion back then.

There was a paper called “Proof of work does not work” (or similar, can’t be bothered to look it up) that argued that PoW cannot work for spam protection, because you have to support low-powered consumer devices while blocking spammers with heavy hardware. And that is a very valid concern. Then there was a paper arguing that PoW can still work, as long as you scale the difficulty in such a way that a legit user (e.g. only sending one email) has a low difficulty, while a spammer (sending thousands of emails) has a high difficulty.
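A sketch of the second paper's idea applied to web requests, assuming clients can be keyed by something like an IP address (itself an assumption bots can dodge by rotating addresses). The logarithmic scaling rule, class name and parameter values are made up for illustration.

```python
import math
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class DifficultyScaler:
    """Hypothetical policy: PoW difficulty grows with a client's recent request
    count, so a one-off visitor pays little and a bulk client pays a lot."""

    def __init__(self, base: int = 1, max_difficulty: int = 6, window: float = 60.0):
        self.base = base
        self.max_difficulty = max_difficulty
        self.window = window
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def difficulty(self, client: str, now: Optional[float] = None) -> int:
        now = time.monotonic() if now is None else now
        hits = self.hits[client]
        hits.append(now)
        # Forget requests older than the sliding window (the newest one always stays).
        while now - hits[0] > self.window:
            hits.popleft()
        # One extra hex digit (16x the expected work) per 10x more requests.
        extra = int(math.log10(len(hits)))
        return min(self.base + extra, self.max_difficulty)
```

The first request in a window gets difficulty 1, the tenth gets 2, the hundredth gets 3, capped at `max_difficulty`.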

The idea of blocking known bad actors is actually used in email quite a lot, in the form of DNS block lists (DNSBLs) such as Spamhaus (this has nothing to do with PoW, but such a distributed list could be used to determine PoW difficulty).
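A sketch of using a DNSBL to set PoW difficulty rather than to block outright. The query convention (reversed IPv4 octets plus the list's zone, answers in 127.0.0.0/24 meaning “listed”) is the standard DNSBL scheme; the `pow_difficulty` policy and its numbers are hypothetical. Spamhaus restricts free use and typically refuses queries arriving through large public resolvers, so check a list's terms before wiring this up.

```python
import ipaddress
import socket


def dnsbl_query_name(addr: str, zone: str = "zen.spamhaus.org") -> str:
    # DNSBLs are queried by reversing the IPv4 octets and appending the list's zone.
    octets = str(ipaddress.IPv4Address(addr)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(addr: str, zone: str = "zen.spamhaus.org") -> bool:
    try:
        answer = socket.gethostbyname(dnsbl_query_name(addr, zone))
    except socket.gaierror:
        return False  # NXDOMAIN (or lookup failure): treat as not listed
    # Listings conventionally answer inside 127.0.0.0/24; other answers are
    # treated as errors here rather than as listings.
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/24")


def pow_difficulty(addr: str, base: int = 2, penalty: int = 3) -> int:
    # Hypothetical policy: listed IPs get a much harder challenge instead of a hard block.
    return base + penalty if is_listed(addr) else base
```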

Anubis on the other hand does nothing like that and a bot developed to pass Anubis would do so trivially.

Sorry for long text.

Flipper@feddit.org · 1 month ago

At least in the beginning the scrapers just used curl with a different user agent. Forcing them to use a headless client is already a 100x increase in resources for them. That in itself is already a small victory and so far it is working beautifully.

sudo@programming.dev · 1 month ago

Well, in most cases it would be Python requests, not curl. But yes, forcing them to use a browser is the real cost. Not just in CPU time but in programmer labor. PoW is overkill for that though.

sudo@programming.dev · 1 month ago

Then there was a paper arguing that PoW can still work, as long as you scale the difficulty in such a way that a legit user

Telling a legit user from a fake user is the entire game. If you can do that you just block the fake user. Professional bot blockers like Cloudflare or Akamai have machine learning systems to analyze trends in network traffic and serve JS challenges to suspicious clients. Last I checked, all Anubis uses is User-Agent filters, which is extremely behind the curve. Bots are able to get down to faking TLS fingerprints and matching them with User-Agents.

quick_snail@feddit.nl · 1 month ago

Hashcash works great, what are you going on about?

sudo@programming.dev · 1 month ago

daniskarma@lemmy.dbzer0.com · edit-2 1 month ago

I don’t think you have a usecase for Anubis.

Anubis is mainly aimed at bad AI scrapers, plus some DDoS mitigation if you have a heavy service.

You are getting hit exactly the same; Anubis doesn’t put up a block list or anything. It just puts itself in front of the service. The load on your server and the risk you take are very similar with or without Anubis here. Most bots are not AI scrapers; they are just probing. So the hit on your server is the same.

What you want is to properly set up fail2ban or, even better, CrowdSec. That would actually block and ban bots that try to probe your server.
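The core counting logic of that detect-and-ban layer is small. fail2ban does it by tailing logs with regex filters and then adding firewall rules; the class below is only a sketch of the counting part, with hypothetical names (the parameters are named after fail2ban's `maxretry` and `findtime` options).

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set


class Jail:
    """Minimal fail2ban-style jail: ban an IP after `max_retry` failures within
    `find_time` seconds. Bans here never expire; fail2ban also applies a bantime."""

    def __init__(self, max_retry: int = 5, find_time: float = 600.0):
        self.max_retry = max_retry
        self.find_time = find_time
        self.failures: Dict[str, Deque[float]] = defaultdict(deque)
        self.banned: Set[str] = set()

    def record_failure(self, ip: str, now: float) -> bool:
        """Record a failed attempt (e.g. a bad login) and return True if the IP is banned."""
        if ip in self.banned:
            return True
        window = self.failures[ip]
        window.append(now)
        # Drop failures that fell out of the find_time window.
        while now - window[0] > self.find_time:
            window.popleft()
        if len(window) >= self.max_retry:
            # A real deployment would add a firewall rule here (iptables/nftables).
            self.banned.add(ip)
        return ip in self.banned
```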

If you are just self-hosting with Anubis, the only thing you are doing is diverting the log noise into Anubis’s logs and making your devices do a PoW every once in a while when you want to use your services.

Being honest, I don’t know what you are self-hosting. But unless it’s something that’s going to get DDoSed or AI-scraped, there’s not much point to Anubis.

Also, Anubis is not a substitute for fail2ban or CrowdSec. You need something to detect and ban brute-force attacks. Otherwise, the attacker would only need to solve the Anubis challenge, get the token for the week, and then they are free to attack your services as they like.
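Why a solved challenge isn't a defense by itself: once issued, a pass is a bearer token that works until it expires. A minimal sketch with a hypothetical HMAC-signed token (Anubis's real cookie format differs); the one-week TTL mirrors the comment, and `KEY` is a placeholder.

```python
import base64
import hashlib
import hmac
import json

KEY = b"placeholder-server-secret"  # hypothetical signing key
WEEK = 7 * 24 * 3600


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode().rstrip("=")


def _sign(payload: str) -> str:
    return _b64(hmac.new(KEY, payload.encode(), hashlib.sha256).digest())


def issue_token(subject: str, now: float, ttl: int = WEEK) -> str:
    # Issued once the client has passed the challenge.
    payload = _b64(json.dumps({"sub": subject, "exp": int(now + ttl)}).encode())
    return f"{payload}.{_sign(payload)}"


def token_valid(token: str, now: float) -> bool:
    # Note what is NOT checked: request rate, login failures, behaviour.
    # Anyone holding a valid token passes until it expires.
    try:
        payload, sig = token.split(".")
    except ValueError:
        return False
    if not hmac.compare_digest(sig, _sign(payload)):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    return now < claims["exp"]
```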

henfredemars@infosec.pub · 1 month ago

I appreciate a simple piece of software that does exactly what it’s supposed to do.

merc@sh.itjust.works · 1 month ago

The front page of the website is excellent. It describes what it does and its feature set in quick, simple terms.

I can’t tell you how many times I’ve gone to a website for some open-source software and had no idea what it was or how it was trying to do it. They often dive deep into the 300 different ways of installing it, tell you what the current version is and what features it has over the last version, but often they just assume you know the basics.

0_o7@lemmy.dbzer0.com · 1 month ago

I don’t mind Anubis but the challenge page shouldn’t really load an image. It’s wasting extra bandwidth for nothing.

Just parse the challenge and move on.

Allero@lemmy.today · 1 month ago

AFAIK, you can set it up to have no image at all, or a different one.

Voroxpete@sh.itjust.works · edit-2 1 month ago

It’s actually a brilliant monetization model. If you want to use it as is, it’s free, even for large corporate clients.

If you want to get rid of the puppygirls though, that’s when you have to pay.

frongt@lemmy.zip · 1 month ago

It’s open source, so you could always just patch it without paying too. But you should support the maintainers if you think they deserve it.

quick_snail@feddit.nl · 1 month ago

Kinda sucks how it makes websites inaccessible to folks who have to disable JavaScript for security.

WhyJiffie@sh.itjust.works · 1 month ago

there’s a fork that has non-js checks. I don’t remember the name but maybe that’s what should be made more known

quick_snail@feddit.nl · 1 month ago

Please share if you know.

The only way I know how to do this is running a Tor Onion Service, since the Tor protocol has built-in PoW support (without JS).

WhyJiffie@sh.itjust.works · edit-2 1 month ago

It’s this one: https://git.gammaspectra.live/git/go-away

the project name is a bit unfortunate to show for users, maybe change that if you will use it.

Some known privacy services use it too, including the Invidious instance at nadeko.net, so you can check there how it works. It’s one of the most popular Invidious servers, so I guess it can’t be bad, and they use multiple kinds of checks for each visitor.

WhyJiffie@sh.itjust.works · 1 month ago

ps: I was wrong it’s not a fork, but a different thing doing the same and more

Nate Cox@programming.dev · 1 month ago

Counterpoint: Anubis is not awesome: https://lock.cmpxchg8b.com/anubis.html

Cyberflunk@lemmy.world · 1 month ago

thank you! this needed said.

This post is a bit critical of a small well-intentioned project, so I felt obliged to email the maintainer to discuss it before posting it online. I didn’t hear back.

i used to watch the dev on mastodon, they seemed pretty radicalized on killing AI, and anyone who uses it (kidding!!) i’m not even surprised you didn’t hear back

great take on the software, and as far as I can tell, Playwright still works/completes the unit of work. At scale Anubis still seems to work if you have popular content, but it hasn’t stopped me from using Claude Code + virtual browsers.

I’m not actively testing it though. I’m probably very wrong about a few things, but I know Anubis isn’t hindering my personal scraping; it does fuck up Perplexity and ChatGPT bots, which is fun to see.

good luck Blue team!

Nate Cox@programming.dev · 1 month ago

For clarity: I didn’t write the article, it’s just a good reference.

SmokeyDope@piefed.social · 1 month ago

What use cases does perplexity do that Claude doesn’t for you?

A_norny_mousse@feddit.org · edit-2 1 month ago

At the time of commenting, this post is 8h old. I read all the top comments, many of them critical of Anubis.

I run a small website and don’t have problems with bots. Of course I know what a DDOS is - maybe that’s the only use case where something like Anubis would help, instead of the strictly server-side solution I deploy?

I use CrowdSec (it seems to work with caddy btw). It took a little setting up, but it does the job.
(I think it’s quite similar to fail2ban in what it does, plus community-updated blocklists)

Am I missing something here? Why wouldn’t that be enough? Why do I need to heckle my visitors?

Despite all that I still had a problem with bots knocking on my ports spamming my logs.

By the time Anubis gets to work, the knocking already happened so I don’t really understand this argument.

If the system is set up to reject a certain type of request, these are microsecond transactions that do no harm (DDoS excepted).

daniskarma@lemmy.dbzer0.com · edit-2 1 month ago

You are right. For most self-hosting usecases anubis is not only irrelevant, but it actually works against you. False sense of security and making your devices do extra work for nothing.

Anubis is, though, for public-facing services that may get DDoSed or AI-scraped by some untargeted bot (for a targeted bot it’s trivial to get past Anubis in order to scrape).

And it’s never a substitute for CrowdSec or fail2ban. Getting an Anubis token is just a matter of executing the PoW challenge. You still need a way to detect and ban malicious attacks.

quick_snail@feddit.nl · 1 month ago

With varnish and wazuh, I’ve never had a need for Anubis.

My first recommendation for anyone struggling with bots is to fix their cache.

kalleboo@lemmy.world · 1 month ago

Anubis was originally created to protect git web interfaces since they have a lot of heavy-to-compute URLs that aren’t feasible to cache (revision diffs, zip downloads etc).

After that I think it got adopted by a lot of people who didn’t actually need it, they just don’t like seeing AI scrapers in their logs.

quick_snail@feddit.nl · 1 month ago

Yes!

Also, another very simple solution is to authwall expensive pages that can’t be cached.

Miggi@discuss.tchncs.de · 1 month ago

I also used CrowdSec for almost a year, but as AI scrapers became more aggressive, CrowdSec alone wasn’t enough. The scrapers used distributed IP ranges and spoofed user agents, making them hard to detect and costing my Forgejo instance a lot in expensive routes. I tried custom CrowdSec rules but hit its limits.

Then I discovered Anubis. It’s been an excellent complement to CrowdSec — I now run both. In my experience they work very well together, so the question isn’t “A or B?” but rather “How can I combine them, if needed?”

SmokeyDope@piefed.social · edit-2 1 month ago

If CrowdSec works for you, that’s great, but it’s also a corporate product whose premium subscription tier starts at $900/month; not exactly a pure self-hosted solution.

I’m not a hypernerd, still figuring all this out among the myriad of possible solutions with different complexity and setup times. All the self hosters in my internet circle started adopting anubis so I wanted to try it. Anubis was relatively plug and play with prebuilt packages and great install guide documentation.

Allow me to expand on the problem I was having. It wasn’t just that I was getting a knock or two; it’s that I was getting 40 knocks every few seconds, scraping every page and probing for a bunch of nonexistent paths that would be exploit points on unsecured production VPS systems.

On a computational level the constant network activity of bytes from webpage, zip files and images downloaded from scrapers pollutes traffic. Anubis stops this by trapping them in a landing page that transmits very little information from the server side. By trapping the bot in an Anubis page, which it spams 40 times on a single open connection before giving up, it reduces overall network activity and data transferred (often billed as a metered resource) as well as log volume.

And this isn’t all or nothing. You don’t have to pester all your visitors, only those with sketchy clients. Anubis uses a weighted priority which grades how legit a browser client is. Most regular connections get through without triggering; weird connections get various grades of checks depending on how sketchy they are. Some checks don’t require proof of work or JavaScript.

On a psychological level, it gives me a bit of relief knowing that the bots are getting properly sinkholed and I’m punishing/wasting the compute of some asshole trying to find exploits in my system to expand their botnet. And a bit of pride knowing I did this myself on my own hardware without having to cop out to a corporate product.

It’s nice that people of different skill levels and philosophies have options to work with. One tool can often complement another, too. Anubis worked for what I wanted: filtering out bots that waste network bandwidth and giving me peace of mind where before I had no protection. All while not being noticeable for most people, because I can configure it not to heckle every client every 5 minutes like some sites do.

A_norny_mousse@feddit.org · edit-2 1 month ago

If CrowdSec works for you, that’s great, but it’s also a corporate product

It’s also fully FLOSS with dozens of contributors (not to speak of the community-driven blocklists). If they make money with it, great.

not exactly a pure self-hosted solution.

Why? I host it, I run it. It’s even in Debian Stable repos, but I choose their own more up-to-date ones.

Allow me to expand on the problem I was having. It wasn’t just that I was getting a knock or two; it’s that I was getting 40 knocks every few seconds, scraping every page and probing for a bunch of nonexistent paths that would be exploit points on unsecured production VPS systems.

Again, a properly set up WAF will deal with this pronto.
You should not have exploit points in unsecured production systems, full stop.

On a computational level the constant network activity of bytes from webpage, zip files and images downloaded from scrapers pollutes traffic. Anubis stops this by trapping them in a landing page that transmits very little information from the server side.

And instead you leave the computations to your clients. Which becomes a problem on slow hardware.
Again, with a properly set up WAF there’s no “traffic pollution” or “downloading of zip files”.

Anubis uses a weighted priority which grades how legit a browser client is.

And apart from the user agent and a few other responses, all of which are easily spoofed, this means “do some javascript stuff on the local client” (there’s a link to an article here somewhere that explains this well) which will eat resources on the client’s machine, which becomes a real pita on e.g. smartphones.

Also, I use one of those less-than-legit, weird and non-regular browsers, and I am being punished by tools like this.

edit: I feel like this part of OP’s argument needs to be pointed out, it explains so much:

All the self hosters in my internet circle started adopting anubis so I wanted to try it. Anubis was relatively plug and play with prebuilt packages

SmokeyDope@piefed.social · edit-2 1 month ago

why? I run it.

Mmm, how to say this. I suppose what I’m getting at is a philosophy of development and the known behaviors of corporate products.

So, here’s what I understand about CrowdSec. It’s essentially a centralized collection of continuously updated iptables rules and bot-scanning detectors that clients install locally.

In a way, its crowdsourcing is like a centralized mesh network: each client is a scanner node that phones home threat data to the corporate hub, which updates the shared data.

Notice the operative word: centralized. The company owns that central hub, and it’s their proprietary black box to do what they want with. And you know what for-profit companies like to do to their services over time? Enshittify them by:

adding subscription-tier price models,
putting once-free features behind paywalls,
changing data-sharing requirements as a condition for free access,
restricting free API access tighter and tighter to encourage paid tiers,
making paid tiers cost more to do less,
intentionally ruining features in one service to drive power users to a different one.

They can and do use these tactics to drive up profit or reduce overhead once a critical mass has been reached. I do not expect altruism and respect for users from corporations; I expect bean counters using altruism as a vehicle to attract users in the growth phase and then flipping the switch in their ToS to go full penny-pinching once they’re too big to fail.

CrowdSec’s pricing updates from last year:

CrowdSec updated pricing policy

Hi everyone,

Our former pricing model led to some incomprehensions and was sub-optimal for some use-cases.

We remade it entirely here. As a quick note, in the former model, one never had to pay $2.5K to get premium blocklists. This was Support for Enterprise, which we poorly explained. Premium blocklists were and are still available from the premium SaaS plan, accessible directly from the SaaS console.

Here are the updates:

Security Engine: All its embedded features (IDS, IPS and WAF) were, are and will remain free.

SAAS: The free plan offers up to three silver-grade blocklists (on top of receiving IP related to signals your security engines share). Premium plans can use any free, premium and gold-grade blocklists. Previously, we had a premium and an enterprise plan with more features. All features are now merged into a unique SaaS enterprise plan. The one starting at $31/month. As before, those are available directly from the SaaS console page: https://app.crowdsec.net

SUPPORT: The $2.5K (which were mostly support for Enterprise) are now becoming optional. Instead, a client can contract $1K for Emergency bug & security fixes and $1K for support if they want to.

BLOCKLISTS: Very specific (country targeted, industry targeted, stack targeted, etc.) or AI-enhanced are now nested in a different offer named “Platinum blocklists subscription”. You can subscribe to them, regardless of whether you use the FOSS Security Engine or not. They can be joined, tuned, and injected directly into most firewalls with regular automatic remote updates of their content. As long as you do not resell them (meaning you are the final client), you can use the subscription in any part of your company.

CTI DATA: They can be consumed through API keys with associated quotas. These are affordable and intended for use in tools like OpenCTI, MISP, The Hive, Xsoar, etc. Costs are in the range of hundreds of dollars per month. The Full CTI database can also be locally replicated at your place and constantly synced for deltas. Those are the largest plans we have, and they are usually destined to L/XL enterprises, governmental bodies, OEM & hardware vendors.

Safer together.

u/ShroomShroomBeepBeep · 1y ago

Whilst I’m pleased to see it made clearer, £290 a year for each security engine is still far too expensive for me to consider it.

u/GuitarEven · 1y ago

We get that £290 is too high for individual home labs. Those offers are made for companies.
Free tier features should cover homelabs correctly.

Features that are oriented for enterprise clients.
If a company cannot invest $300 yearly in its security, no judgment and the free tier will still be very helpful until it recovers some budget margins to strengthen its security posture.

[deleted] · 1y ago

Any idea why we don’t have any good free / freemium (max $5 per month) app yet? Reason I’m asking: AdGuard, uBlock Origin etc. had filters which match JS/domains and filter them out. The same logic can be applied at least for the IP lists, so that these IPs can be added to iptables to block. A lot of things are easy to make. The tough ones are things like scenarios and maybe SSH bw etc. I wonder why there’s no real competition.

u/GuitarEven · 1y ago

hi u/ElizabethThomas44

Well you actually do. To date, for free, you get:

the security engine (IDS/IPS/WAF)
all scenarios
the blocklist of IPs you are participating to detect when you use scenarios and share signals
the free tier of the console

The IPs you automatically get for free are already added to your nftables or iptables using the related remediation component.

<TL/DR> You already have it.

(damn, personal reddit account, sorry, this is Philippe@CrowdSec)

At the end of the day, it’s not the thousands of anonymous users contributing their logs or the FOSS volunteers on git getting a quarterly payout. They’re the product: free compute plus live-action pen-testing guinea pigs, no matter what PR they spin about how much they care about the security of the plebs using their network for free.

It’s always about maximizing the money with these people; your security can get fucked if they don’t get some use out of you. Expect that at some point the ToS will change so that anonymized data sharing is no longer optional for the free tier.

What happens if the company goes bankrupt? Does it just stop working when their central servers shut down? Does their open source security have the possibility of being forked and run from local servers?

It doesn’t have to be like this. Peer-to-peer decentralized mesh networks like YaCy already show it’s possible for a crowdsourced network of users to all contribute to an open database: something that can run entirely as a local node, which federates and updates the information in the global network. Something like that which updates a global iptables list would already be a step in the right direction. In that theoretical system there is no central monopoly; it’s like the fediverse, where everyone contributes to hosting the global network as a mesh, and altruistic hobbyists can contribute free compute on their own terms.

https://github.com/yacy/yacy_search_server

“I don’t see anything wrong with people getting paid” is something I see often in discussions. There’s nothing wrong with people who do work and make contributions getting paid. What’s wrong is that it isn’t the open source community on GitHub or the users contributing their precious data getting paid; it’s a for-profit centralized monopoly that controls access to the network, which the open source community built for free out of altruism.

The pattern is nearly always the same. The thing that once worked well and that you relied on gets slowly worse with each ToS update, while their pricing inches a dollar higher each quarter, and you get less and less control over how you get to use their product. It’s pattern recognition.

The only solution is to cut the head off the snake. If I can’t fully host all of the components, see the source code of the mechanisms at all layers, own a local copy of the global database, then its not really mine.

Again, it’s a philosophy thing. It’s very easy to look at all that, shrug, and go “whatever, not my problem, I’ll just switch if it becomes an issue”. But the problem festers the longer it’s ignored or enabled for convenience. The community needs to truly own the services it runs on every level; it has to be open, and for-profit bean counters can’t be part of the equation, especially for hosting. There are homelab hobbyists out there who will happily eat cents on an electric bill to serve an open service to a community; get 10,000 of them on a truly open source, decentralized mesh network and you can accomplish great things without fear of being the product.

sudoer777@lemmy.ml · edit-2 1 month ago

I host my main server on my own hardware, and a VPN on Hetzner because my shitty ISP doesn’t let me port forward. For the past year, bots were hitting my Forgejo instance hard. I forgot to disable registration and they generated hundreds of accounts with hundreds of repos with sketchy links, generating terrabytes of traffic from my VPS, costing me money in traffic. I disabled registration and deleted the spam, and bots still kept hitting my server for several months, which would cause memory leaks over time, crash it, and consume CPU, and still cost me money with terrabytes of traffic per month. A few weeks ago, I put Anubis on the VPS. Now, zero bots hit my Forgejo instance and I don’t pay for their traffic anymore. Problem solved.

Jason2357@lemmy.ca · 1 month ago

It’s always code forges and wikis that are affected by this, because the scrapers spider down into every commit or edit in your entire history, then come back the next day and check every “page” again to see if any changed. Consider just blocking commit-history pages at your reverse proxy.
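A sketch of that idea as Python WSGI middleware (the same rule is usually a short snippet in nginx or Caddy). The route patterns assume Gitea/Forgejo-style URLs and the cookie name is hypothetical; check your forge's actual routes before copying anything. Letting requests with a session cookie through also covers the authwall suggestion made earlier in the thread.

```python
import re
from http.cookies import SimpleCookie

# Assumed Gitea/Forgejo-style routes: /<owner>/<repo>/<kind>/...
EXPENSIVE = re.compile(r"^/[^/]+/[^/]+/(commit|commits|blame|compare|archive)(/|$)")
SESSION_COOKIE = "session"  # hypothetical cookie name; use your forge's real one


def block_history(app):
    """WSGI middleware: 403 for expensive history routes unless a session
    cookie is present (presence only; the app still validates it)."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")
        cookies = SimpleCookie(environ.get("HTTP_COOKIE", ""))
        if EXPENSIVE.match(path) and SESSION_COOKIE not in cookies:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"History pages require login.\n"]
        return app(environ, start_response)

    return wrapped
```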

LOLseas@lemmy.zip · 1 month ago

This is the first time I’ve ever seen it misspelled like that. It’s ‘terabyte/terabytes’. 1,024 GBs worth of data.

sudoer777@lemmy.ml · 1 month ago

Oops. Although, a terabyte is 1000 GB; 1024 GiB is a tebibyte.
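For reference, the two definitions side by side: SI prefixes are powers of 10, IEC binary prefixes are powers of 2.

```python
GB, GiB = 10**9, 2**30   # gigabyte (SI, decimal) vs gibibyte (IEC, binary)
TB, TiB = 10**12, 2**40  # terabyte (SI, decimal) vs tebibyte (IEC, binary)

print(TB // GB)           # 1000: a terabyte is 1000 gigabytes
print(TiB // GiB)         # 1024: a tebibyte is 1024 gibibytes
print(f"{TiB / TB:.4f}")  # 1.0995: a tebibyte is about 10% larger than a terabyte
```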

LOLseas@lemmy.zip · 1 month ago

Thanks friend. I only knew of the JEDEC terms, TIL.

WorldsDumbestMan@lemmy.today · 1 month ago

Nice ads people! Good job!

Helix 🧬@feddit.org · 1 month ago

So you think techaro paid them?

WorldsDumbestMan@lemmy.today · 1 month ago

No clue, but it sounds so ad-like…

TerHu@lemmy.dbzer0.com · 1 month ago

yes, please be mindful when using cloudflare. with them you’re possibly inviting in a much much bigger problem

https://www.devever.net/~hl/cloudflare

quick_snail@feddit.nl · edit-2 1 month ago

Great article, but I disagree about WAFs.

Try to secure a nonprofit’s web infrastructure as the only IT guy, with no budget for devs or security.

It would be nice if we could update servers constantly and patch unmaintained code, but sometimes you just need to front it with something that plugs those holes until you have the capacity to do updates.

But 100%, the WAF should be run locally, not as a MitM from an evil US corp in bed with DHS.

mrbn@lemmy.ca · 1 month ago

When I visit sites on my cellphone, Anubis often doesn’t let me through.

cmnybo@discuss.tchncs.de · 1 month ago

I’ve never had any issues on my phone using Fennec or Firefox. I don’t have many addons installed apart from uBlock Origin. I wouldn’t be surprised if some privacy addons cause issues with Anubis though.

mrbn@lemmy.ca · 1 month ago

Yeah, my setup is almost like yours; I’m also on Firefox with uBlock Origin, and the only difference is that I’m also using Privacy Badger.

perishthethought@piefed.social · 1 month ago

I don’t really understand what I am seeing here, so I have to ask – are these Security issues a concern?

https://github.com/TecharoHQ/anubis/security

I have a server running a few tiny web sites, so I am considering this, but I’m always concerned about the possibility that adding more things to it could make it less secure, versus more. Thanks for any thoughts.

SmokeyDope@piefed.social · 1 month ago

Security issues are always a concern; the question is how much. Looking at them, they seem at most to be ways to circumvent the Anubis redirect system to get to your page using very specific exploits. These are marked as low to moderate priority, and I do not see anything that implies system-level access, which is the big concern. Obviously do what you feel is best, but IMO it’s not worth sweating about. The nice thing about open source projects is that anyone can look through them and fix things; if this gets more popular, you can expect bug bounties and professional pen-testing submissions.

artyom@piefed.social · 1 month ago

This isn’t really a security issue as much as it is a DDOS issue.

Imagine you own a brick and mortar store. And periodically one thousand fucking people sprint into your store and start recording the UPCs on all the products, knocking over every product in the store along the way. They don’t buy anything, they’re exclusively there to collect information from your store which they can use to grift investors and burn precious resources, and if they fuck your shit up in the process, that’s your problem.

This bot just sits at the door and ensures the people coming in there are actually shoppers interested in the content of some items of your store.

panda_abyss@lemmy.ca · 1 month ago

I like the quirky SPH character

Arghblarg@lemmy.ca · edit-2 1 month ago

I have a script that watches apache or caddy logs for poison link hits and a set of bot user agents, adding IPs to an ipset blacklist, blocking with iptables. I should polish it up for others to try. My list of unique IPs is well over 10k in just a few days.
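A rough sketch of such a watcher's core, assuming Apache's combined log format (Caddy's default JSON logs would need a different parser), a hypothetical poison-link prefix `/trap/`, and an example list of AI-crawler user agents. It only produces `ipset` commands for review; a single `iptables -I INPUT -m set --match-set blacklist src -j DROP` rule then does the blocking. IPv6 addresses are skipped because a `hash:ip` set is IPv4 unless created with `family inet6`.

```python
import ipaddress
import re

# Apache combined format: ip ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)
POISON_PREFIX = "/trap/"  # hypothetical hidden link only crawlers follow
BOT_UA = re.compile(r"GPTBot|ClaudeBot|CCBot|Bytespider|Amazonbot", re.I)  # example list


def offending_ips(lines):
    """Yield each IPv4 address (once) that hit the poison link or sent a bot UA."""
    seen = set()
    for line in lines:
        m = COMBINED.match(line)
        if not m or not (m["path"].startswith(POISON_PREFIX) or BOT_UA.search(m["ua"])):
            continue
        try:
            ip = ipaddress.ip_address(m["ip"])
        except ValueError:
            continue  # e.g. a hostname when HostnameLookups is on
        if ip.version == 4 and str(ip) not in seen:
            seen.add(str(ip))
            yield str(ip)


def ipset_commands(ips, set_name="blacklist"):
    # -exist makes both commands idempotent, so the output can be re-run safely.
    yield f"ipset -exist create {set_name} hash:ip"
    for ip in ips:
        yield f"ipset -exist add {set_name} {ip}"
```

Feeding it a log is then a matter of something like `for cmd in ipset_commands(offending_ips(open("access.log"))): print(cmd)` and reviewing the output before piping it to a root shell.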

git repos seem to be real bait for these damn AI scrapers.

quick_snail@feddit.nl · 1 month ago

You just described what wazuh does ootb

pedroapero@lemmy.ml · 1 month ago

Hi, there are pre-made ipset lists also, ex: https://github.com/ktsaou/blocklist-ipsets

drkt@lemmy.dbzer0.com · 1 month ago

Stop playing whack-a-mole with these fucking people and build TARPITS!

Make it HURT to crawl your site illegitimately.
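A minimal tarpit sketch, under stated assumptions: it serves an endless tree of deterministic junk pages and drips each response out 16 bytes per second. Mount it only on paths disallowed in robots.txt (a hypothetical `/trap/`, say) and link to it invisibly, so well-behaved crawlers and humans never end up there. Host, port, word list and timings are arbitrary; start it with `asyncio.run(main())`.

```python
import asyncio
import hashlib
import html
import random

WORDS = "lorem ipsum dolor sit amet consectetur adipiscing elit sed do".split()


def maze_page(path: str, links: int = 5) -> str:
    """Deterministic junk page: the same path always yields the same text and
    links, each pointing one level deeper into an endless generated tree."""
    rng = random.Random(hashlib.sha256(path.encode()).digest())
    text = " ".join(rng.choice(WORDS) for _ in range(50))
    base = html.escape(path.rstrip("/"), quote=True)
    anchors = " ".join(
        f'<a href="{base}/{rng.getrandbits(32):08x}">{rng.choice(WORDS)}</a>'
        for _ in range(links)
    )
    return f"<html><body><p>{text}</p>{anchors}</body></html>"


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    request_line = await reader.readline()
    parts = request_line.decode("latin-1").split()
    path = parts[1] if len(parts) >= 2 else "/"
    # Discard the remaining request headers.
    while (await reader.readline()) not in (b"\r\n", b"\n", b""):
        pass
    body = maze_page(path).encode()
    writer.write(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nConnection: close\r\n\r\n")
    try:
        # Drip the body out 16 bytes per second to tie up the crawler's connection.
        for i in range(0, len(body), 16):
            writer.write(body[i:i + 16])
            await writer.drain()
            await asyncio.sleep(1.0)
    except ConnectionError:
        pass  # the client gave up
    finally:
        writer.close()


async def main(host: str = "127.0.0.1", port: int = 8081) -> None:
    server = await asyncio.start_server(handle, host, port)
    async with server:
        await server.serve_forever()
```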