While working on Tess, I noticed that my test instance was running horribly slow. The CPU was spiking, and Postgres was not happy: it was using pretty much all the available compute.
Investigating, I found the culprit to be some crawler or possibly malicious actor sending a massive number of unscoped requests to /api/v3/comment/list. By “unscoped” I mean requests that aren’t limited to a post ID. I’m not sure if this is a bug in Lemmy or if there’s a legit use for fetching comments outside of a post, but I digress, as that’s another discussion.
After disallowing unscoped requests to the comment list endpoint (see mitigation further down), no more issue.
The kicker seemed to be that this bot / jackass was sorting by “Old” and requesting pages thousands deep.
Requests looked like this: GET /api/v3/comment/list?limit=50&sort=Old&page=16413
Since I officially shut down Dubvee, I’m not keeping logs as long as I used to. I did see other page numbers in the access log, and they were all above 10,000. From the logs I have available, the requests seem to be coming from these 3 IP addresses, but I have insufficient data to confirm this is all of them (it probably isn’t).
- 134.19.178.167
- 213.152.162.5
- 134.19.179.211
Log Excerpt
Note that I log the query string as well as the URI. I’ve run a custom Nginx setup for so long, I actually don’t recall if the query string is logged by default or not. If you’re not logging the query string, you can still look for the 3 (known) IPs above making requests to /api/v3/comment/list and see if entries similar to these show up.
2025-09-21T14:31:59-04:00 {LB_NAME}: dubvee.org, https, {LB_IP}, 134.19.179.211, - , NL, Amsterdam, North Holland, 52.37590, 4.89750, TLSv1.3, TLS_AES_256_GCM_SHA384, "GET", "/api/v3/comment/list", "limit=50&sort=Old&page=16413"
2025-09-21T14:32:00-04:00 {LB_NAME}: dubvee.org, https, {LB_IP}, 134.19.179.211, - , NL, Amsterdam, North Holland, 52.37590, 4.89750, TLSv1.3, TLS_AES_256_GCM_SHA384, "GET", "/api/v3/comment/list", "limit=50&sort=Old&page=16413"
2025-09-21T14:32:01-04:00 {LB_NAME}: dubvee.org, https, {LB_IP}, 134.19.179.211, - , NL, Amsterdam, North Holland, 52.37590, 4.89750, TLSv1.3, TLS_AES_256_GCM_SHA384, "GET", "/api/v3/comment/list", "limit=50&sort=Old&page=16413"
2025-09-21T14:32:01-04:00 {LB_NAME}: dubvee.org, https, {LB_IP}, 134.19.179.211, - , NL, Amsterdam, North Holland, 52.37590, 4.89750, TLSv1.3, TLS_AES_256_GCM_SHA384, "GET", "/api/v3/comment/list", "limit=50&sort=Old&page=16413"
2025-09-21T14:32:12-04:00 {LB_NAME}: dubvee.org, https, {LB_IP}, 134.19.179.211, - , NL, Amsterdam, North Holland, 52.37590, 4.89750, TLSv1.3, TLS_AES_256_GCM_SHA384, "GET", "/api/v3/comment/list", "limit=50&sort=Old&page=16413"
2025-09-21T14:32:13-04:00 {LB_NAME}: dubvee.org, https, {LB_IP}, 134.19.179.211, - , NL, Amsterdam, North Holland, 52.37590, 4.89750, TLSv1.3, TLS_AES_256_GCM_SHA384, "GET", "/api/v3/comment/list", "limit=50&sort=Old&page=16413"
2025-09-21T14:32:13-04:00 {LB_NAME}: dubvee.org, https, {LB_IP}, 134.19.179.211, - , NL, Amsterdam, North Holland, 52.37590, 4.89750, TLSv1.3, TLS_AES_256_GCM_SHA384, "GET", "/api/v3/comment/list", "limit=50&sort=Old&page=16413"
2025-09-21T14:32:13-04:00 {LB_NAME}: dubvee.org, https, {LB_IP}, 134.19.179.211, - , NL, Amsterdam, North Holland, 52.37590, 4.89750, TLSv1.3, TLS_AES_256_GCM_SHA384, "GET", "/api/v3/comment/list", "limit=50&sort=Old&page=16413"
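For anyone whose logs don't split these fields out, a log format roughly like the one in the excerpt above could be sketched as follows. The format name lemmy_args and the log path are made up for illustration, and the geo fields are left out because they depend on extra modules. For what it's worth, Nginx's stock "combined" format records the full request line via $request, which includes the query string.

```nginx
# Goes in the http {} context.
# "lemmy_args" is a hypothetical format name; adjust the fields to taste.
log_format lemmy_args '$time_iso8601 $host, $scheme, $remote_addr, '
                      '$ssl_protocol, $ssl_cipher, '
                      '"$request_method", "$uri", "$args", "$http_user_agent"';

server {
    # Placeholder path; point this wherever your access logs live.
    access_log /var/log/nginx/lemmy_access.log lemmy_args;
}
```

With $uri and $args logged as separate quoted fields, searching for "/api/v3/comment/list" entries whose query string lacks post_id becomes a simple text match.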
Mitigation:
First, I blocked the IPs making these requests, but they would come back from a different one. Finally, I implemented a more robust solution.
My final mitigation was to simply reject requests to /api/v3/comment/list that did not have a post ID in the query parameters. I did this by creating a dedicated location block in Nginx that is an exact match for /api/v3/comment/list and doing the checks there.
I could probably add another check to see if the page number is beyond a reasonable number, but since I’m not sure what, if any, clients utilize this, I’m content just blocking unscoped comment list requests entirely. If you have more info / better suggestion, leave it in the comments.
# Basically an OR: resolves to 1 when the request has post_id, saved_only, or both.
# Despite the variable's name, 1 means the request is allowed and 0 means it is rejected.
map $has_post_id:$has_saved_only $comment_list_invalid {
    "1:0"   1;
    "0:1"   1;
    "1:1"   1;
    default 0;
}
server {
    ...
    location = /api/v3/comment/list {
        # You'll need the standard proxy_pass headers such as Host, etc. I load those from an include file.
        include conf.d/includes/http/server/location/proxy.conf;
        # Create variables to hold a 0/1 state. Both need an initial value;
        # an unset one is empty, and the map would then fall through to default.
        set $has_post_id 0;
        set $has_saved_only 0;
        # If the URL query string contains a non-empty 'post_id', set the variable to 1
        if ($arg_post_id) {
            set $has_post_id 1;
        }
        # Same for 'saved_only'
        if ($arg_saved_only) {
            set $has_saved_only 1;
        }
        # If the comment_list_invalid map resolves to 0, "send" a 444 response.
        # 444 is an Nginx-specific return code that immediately closes the connection
        # and wastes no further resources on the request
        if ($comment_list_invalid = 0) {
            return 444;
        }
        # Otherwise, proxy pass to the API as normal
        # (replace this with whatever your upstream name is for the Lemmy API).
        # No trailing slash here, so the original request path is passed through unchanged.
        proxy_pass "http://lemmy-be";
    }
}
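The page-depth check mentioned above could look something like this. It is only a sketch: the variable name $comment_page_too_deep is made up, and the cutoff of 1000 pages is an arbitrary assumption, since it's unclear what depth legitimate clients actually use.

```nginx
# Goes in the http {} context.
# Hypothetical cutoff: flag any page number of 1000 or higher
# (four or more digits, ignoring leading zeros).
map $arg_page $comment_page_too_deep {
    default                  0;
    "~^0*[1-9][0-9]{3,}$"    1;
}

server {
    location = /api/v3/comment/list {
        # Drop the connection for very deep pagination
        if ($comment_page_too_deep) {
            return 444;
        }
        # ... existing post_id / saved_only checks and proxy_pass go here ...
    }
}
```

Requests without a page parameter, or with a small one, hit the default and pass through to the other checks untouched.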
Can’t edit the post (Thanks Cloudflare! /s) but additional info:
- I truncated the log excerpts in the post. The user agent string in these requests isn’t shown here, but it is blank in the actual logs.
- This is for Lemmy admins only. It might apply to others in some form, but this seems to be specifically exploiting a Lemmy API endpoint.
- My Nginx solution may have room for improvement; I was just trying to block that behavior without breaking comments in posts and move on with my day. Suggestions for improvement are welcome.
Get a blocklist and set it up.
Literally all of these IPs have been known bots for up to 3 years:
- https://www.abuseipdb.com/check/134.19.178.167
- https://www.abuseipdb.com/check/213.152.162.5
- https://www.abuseipdb.com/check/134.19.179.211
Oh and maybe also a rate-limiter…
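A rate limiter on that endpoint could be sketched with Nginx's limit_req module along these lines. The zone name, zone size, rate, and burst values are placeholders picked for illustration, not tuned recommendations.

```nginx
# Goes in the http {} context: track clients by IP address.
# Zone name "comment_list", 10m of shared memory, and 2 requests/second are assumptions.
limit_req_zone $binary_remote_addr zone=comment_list:10m rate=2r/s;

server {
    location = /api/v3/comment/list {
        # Allow short bursts, reject anything beyond that with a 429
        limit_req zone=comment_list burst=10 nodelay;
        limit_req_status 429;
        # ... existing checks and proxy_pass go here ...
    }
}
```

If Nginx sits behind a load balancer or another proxy, $binary_remote_addr will be the proxy's address unless the real client IP is restored first (for example with the realip module), so the key may need adjusting.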
Thanks for sharing
FYI these are all on ASN 49453
The other (lazier) option is to block/challenge the ASN
That’s my normal go-to, but more than once I’ve accidentally blocked locations that Let’s Encrypt uses for secondary validation, so I’ve had to be more precise with my firewall blocks
This is for Lemmy I presume (or also for Piefed or Mbin)? You’ve modified yours heavily though, I thought, which could complicate matters. I wonder if you are having those bot scraping issues that semi-recently (a month or so ago?) started increasing in frequency. So many instances now have a human detector before letting you in whereas before it was not necessary.
Lemmy. I added a comment above since LW wouldn’t let me edit the post.
Mine’s only extended with some WAF rules and I’ve got a massive laundry list of bot user agents that it blocks, but otherwise it’s pretty bog standard.
If instances have Anubis set up correctly (i.e. not in front of /api/...) then that might not help them, since this is calling the API endpoint.
All of a sudden your edits went through - perhaps a delay caused by this same issue?
Also some related posts:
- another one reporting similar attack-like activities https://lemmy.world/post/36413045
- a month ago similarly https://lemmy.world/post/34310429
Things have been slow for me off and on in recent weeks. And today it’s quite slow.
Unfortunately, there are many, many reasons that could be the case. I’m just putting this out there since it’s easy to check for and mitigate against.
I appreciate the effort!
To everyone in this thread: if you notice a problem in Lemmy, please open an issue. We are only two developers and don’t have time to browse the Fediverse all day to come across such things. Only if we know about a problem can we actually fix it and make a new release.
For reference here are the issue and proposed fix: