Remember when the Internet was supposed to be decentralised for resilience?
No, sorry, I’m not that old :P
Remember: you’re never too young to have a Vietnam flashback!
Yes. Then these assholes came along…
Back during the Rant Radio days…
I remember SLAs including ‘five nines’ assurances. That meant 99.999% uptime, or an allowance of about 26 seconds of downtime a month. That would be unheard of nowadays because no cloud provider can guarantee that kind of uptime.
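For anyone who wants to check that arithmetic, here is a quick sketch. It assumes a 30-day month, and the function name is made up for illustration:

```python
def downtime_budget_seconds(availability_percent: float, days: float = 30.0) -> float:
    """Seconds of allowed downtime over `days` at the given availability."""
    total_seconds = days * 24 * 60 * 60
    return total_seconds * (1 - availability_percent / 100)


# Compare three, four, and five nines over an assumed 30-day month.
for nines in (99.9, 99.99, 99.999):
    print(f"{nines}% -> {downtime_budget_seconds(nines):.1f} s per 30-day month")
```

Five nines works out to roughly 25.9 seconds over 30 days, which is where the "26 seconds" figure comes from.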
Amazon has so much redundancy built into EC2 that I genuinely thought they’d be able to avoid this.
I may be mistaken, but I really could’ve sworn that a lot of the really strict SLA guarantees Amazon gives assume you are doing things across availability zones and/or regions. Like they’re saying “we guarantee 99.999% of uptime across regions” sort of thing. Take this with a grain of salt, it’s something I only half remember from a long time ago.
The problem comes in so many directions in real life though. Say your company has a very large database. Replicating it across regions means you’re paying for data ingress/egress and more than one region’s copy of that already sharded and/or duplicated database. It even applies when transferring data across AZs in a given region. Backing it up to S3 is expensive, backing it up to Glacier is cheaper, until you ever have to do a restore, and then you have to lay off half the staff to pay for it.
Other issues can arise, sometimes through your own fault and sometimes through Amazon’s, if traffic routing has a glitch and data gets routed to the wrong place. Either way, the onus is on your company to show Amazon the receipts if you expect to get credits for the overage. At larger scale, this could be hundreds of thousands of dollars in overage. It’s easy to torpedo a smaller company with one mistake.
They didn’t use to nickel-and-dime as hard as they do now, which doesn’t help, but history aside, they set up AWS to be the biggest slippery slope of wallet-deletion, as almost every move you make costs money. Entire companies exist to manage your AWS costs (for more money, of course), and other companies’ products that you host in your infra may accidentally delete your wallet if you don’t constantly monitor them.
Using AWS cost-efficiently is only accomplished by essentially day-trading your cloud resources like a high-frequency stock trader, capitalizing on unpopular/weird system types, and keeping your code as portable as possible.
…but if one didn’t care about cost, one would probably get pretty good reliability out of them, sure.
Hardware? Yes
Network misconfiguration? Welll…
The cloud is just someone else’s computer. And that computer is busy printing AI videos of the President pooping out of a fighter jet, so now your files are inaccessible
Can you imagine this sentence 1 year ago much less 5 years ago?
“Oh, the deep dream stuff? Yeah, those look so trippy. What do you mean poop though? Usually it’s just dogs.”
The President of course being a convicted felon and rapist, Donald J Trump.
That’s convicted felon, rapist and pedophile, Donald J Trump, to you, mr. Twopi.
One year ago? Easily.
Five years ago? Depends on whether I was visiting 4chan at the moment.
6 months ago, I would be surprised to hear this was done by the president’s administration.
If you properly divide your instances between providers and regions and use load balancing with a quorum-of-3 availability model, then zero downtime can be pretty much guaranteed.
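A minimal sketch of one reading of that quorum idea, assuming three deployments and a majority of two. The deployment names and both function names are hypothetical, and a real load balancer would get `health` from actual health checks:

```python
from typing import List, Mapping


def has_quorum(health: Mapping[str, bool], quorum: int = 2) -> bool:
    """True if at least `quorum` deployments report healthy."""
    return sum(1 for ok in health.values() if ok) >= quorum


def routable_targets(health: Mapping[str, bool], quorum: int = 2) -> List[str]:
    """Healthy deployments to send traffic to, or [] when quorum is lost."""
    if not has_quorum(health, quorum):
        return []
    return [name for name, ok in health.items() if ok]


# Hypothetical status: one of three deployments is down.
status = {
    "provider-a/region-1": True,
    "provider-b/region-1": False,
    "provider-c/region-2": True,
}
print(routable_targets(status))
```

With one of the three down, traffic still goes to the remaining two. With two down, this sketch returns nothing, and what to do then is a policy decision it does not try to answer.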
People be cheap and easy tho, so 🤷‍♂️
Yup. And I think I’ll add:
What do you mean we’ve blown our yearly budget in the first month.
Screw the compute budget, the tripled team size without shipping any more features is a bigger problem here.
Dividing between providers is not what people would be doing if cloud services were as resilient as the memes make out.
Doing so is phenomenally expensive.
Doing so is phenomenally expensive.
It’s demonstrably little more expensive than running more instances on the same provider. I only say -little- because there is a marginal administrative overhead.
Only if you engineered your stack using vendor neutral tools, which is not what each cloud provider encourages you to do.
Then the administrative overhead of multi-cloud gets phenomenally painful.
This is why OpenTofu exists.
Yeah, Terraform or its FOSS fork would be ideal, but many of these infrastructures are set up by devs, using the “immediately in front of them” tools that each cloud presents. Decoupling everything back to neutral is the same nightmare as migrating any stack to any other stack.
Definitely. I go through that same nightmare every time I have to onboard some new acquisition whose devops was the startup cfo’s nephew.
Infrastructure is there to be used by apps/services. It doesn’t matter how it’s created if infrastructure across providers does not provide the same API. You can’t use the GCP storage SDK to call AWS S3. Even if the API were the same, nothing guarantees consistent behavior. It’s just like JPA: it provides an API, but implementations and DB behavior are inconsistent.
You can use the S3 API to interop with basically every major provider. For most core components there are either interop APIs or libraries that translate into provider-native APIs.
It’s 100% doable to build a provider-agnostic stack from the iac all the way up to the application itself.
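As a rough sketch of what "provider-agnostic at the application level" can look like: the app only talks to a small interface, and each provider gets its own backend behind it. Everything named here (`ObjectStore`, `InMemoryStore`, `save_report`) is hypothetical, and the in-memory class just stands in for a real backend that would wrap a provider SDK or an S3-compatible endpoint:

```python
from typing import Dict, Protocol


class ObjectStore(Protocol):
    """The only storage surface the application is allowed to see."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend for illustration; not a real provider client."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def save_report(store: ObjectStore, name: str, body: str) -> str:
    """Application code: depends on the interface, not on any vendor."""
    key = f"reports/{name}.txt"
    store.put(key, body.encode("utf-8"))
    return key


store = InMemoryStore()
key = save_report(store, "q1", "hello")
print(key, store.get(key))
```

The interface itself is the cheap part. Keeping behavior consistent across the backends behind it is the hard part, and this sketch doesn't address that.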
It’s phenomenally expensive from a practical standpoint, it takes an immense amount of engineering and devops effort to make this work for non trivial production applications.
It’s egregiously expensive from an engineering standpoint. And most definitely more expensive from a cloud bill standpoint as well.
We’re doing this right now with a non trivial production application built for this, and it’s incredibly difficult to do right. It affects EVERYTHING, from the ground up. The level of standardization and governance that goes into just making things stable across many teams takes an entire team to make possible.
In my experience using containers has removed requirements for additional engineering cost to deploy between providers because a container is the same wherever it’s running, and all the providers will offer container hosting, and most offer cluster private networking.
Deployment is simplified using something like Octopus, which can deploy to many destinations in a blue-green fashion with easy rollback.
Yes, containers make your application logic work.
That’s the lowest hanging fruit on the tree.
Let’s talk about persistence logic, fail forwards, data synchronization, and write queues next.
Let’s also talk about cloud provider network egress costs.
Let’s also talk about specific service dependencies that may not be replicatable across clouds, or even regions.
Oh, also provider-specific deployment nuances, IAM differences, networking differences…etc
Containers are nice, but don’t really cover things like firewalls, network configuration, identity management, and a whole host of other things, the configuration of which varies between providers.
I mean, technically, you could containerize all the elements you need: firewall, load balancers, identity management, etc. But at that point you are creating your company’s own version of the cloud services, and not having to develop and maintain those systems yourself is one of the big draws of the cloud in the first place. Once you have made “aws lite” in container form, you can deploy it directly to the compute instances on any cloud provider. But now you need to maintain everything as if you were running on prem (i.e. more developers and network engineers again), all while paying a pretty penny to multiple cloud providers. And since your infrastructure containers need to run 24/7, instead of only running your compute resources on demand, your costs will skyrocket. At that point, why not just move back to on-prem hosting?
The administrative overhead, and the overhead of engineering everything to work with multiple vendors, is what is massive.
“But we have our load balancing with 3 different AWS buckets!!!”
Also requires AWS to do the same thing which they sometimes don’t …