Status update July 4th
Just wanted to let you know where we are with Lemmy.world.
Issues
As you might have noticed, things still won’t work as desired… we see several issues:
Performance
- Loading is mostly OK, but sometimes things take forever
- We (and you) see many 502 errors, resulting in empty pages etc.
- System load: The server is roughly at 60% cpu usage and around 25GB RAM usage. (That is, if we restart Lemmy every 30 minutes. Else memory will go to 100%)
Bugs
- Replying to a DM doesn’t seem to work. When hitting reply, you get a box with the original message which you can edit and save (which does nothing)
- 2FA seems to be a problem for many people. It doesn’t always work as expected.
Troubleshooting
We have many people helping us, with (site) moderation, sysadmin, troubleshooting, advise etc. There currently are 25 people in our Discord, including admins of other servers. In the Sysadmin channel we are with 8 people. We do troubleshooting sessions with these, and sometimes others. One of the Lemmy devs, @nutomic@lemmy.ml is also helping with current issues.
So, all is not yet running smoothly as we hoped, but with all this help we’ll surely get there! Also thank you all for the donations, this helps giving the possibility to use the hardware and tools needed to keep Lemmy.world running!
Cloud architect here— I’m sure someone’s probably already brought it up, but I’m curious if any cloud native services have been considered to take the place of what I’m sure are wildly expensive server machines. E.g. serve frontends from cloudfront, host the read-side API on Lambda@Edge so you can aggressively and regionally cache API responses, anything other than an SQL for the database — model it in DynamoDB for dirt cheap wicked speed, or Neptune for a graph database that’s more expensive but more featureful. Drop sync jobs for federated connections into SQS, have a lambda process that too, and it will scale as horizontally as you need to clear the queue in reasonable time.
It’s not quite as simple to develop and deploy as docker containers you can throw anywhere, but the massive scale you can achieve with that for fractions of the cost of servers or fargate with that much RAM is pretty great.
Or maybe you already tried/modeled this and discovered it’s terrible for you use case, in which case ignore me ;-)
You were so close until you mentioned trying to ditch SQL. Lemmy is 100% tied hard to it, and trying to replicate what it does without ACID and Joins is going to require a massive rewrite. More importantly - Lemmy’s docs suggest a docker-compose stack, not even k8s for now, it’s trying really hard not to tie into a single cloud provider and avoid having three cloud deployment scripts. Which means SQS, lambdas and cloudfront out in the short term. Quick question, are there any STOMP compliant vendors for SQS and lambda equivalent yet?
Also, the growth lemmy.world has seen has been far outside what any team could handle ime. Most products would have closed signups to handle current load and scale, well done to all involved!
Rust FTW
If Postgres becomes the bottleneck I wonder whether something like Citus could work to shard the data (relatively) transparently?
Staying cloud agnostic is very important and CDN services like cloudflare/cloudfront have inherrent privacy issues. IMO the stack should remain hostable on anyones home server environment.