I guess I should also point out that I’ve used AWS at extremely large scale in t...

cyberax · 2026-05-10T19:05:20 1778439920

IAM is NOT from any lineage. It has grown organically and is complicated, just as any other policy language. AWS even uses an automatic proof assistant to verify IAM policies.

However, the secret to IAM in AWS is to NOT use IAM. Just create separate AWS accounts for separate services and only share whatever resources are needed. Then you can have dead simple IAM policies because you won't need to do granular permissions ("AWS role X can access database Y").

dotwaffle · 2026-05-11T10:11:41 1778494301

> Just create separate AWS accounts for separate services

My understanding is that different AWS accounts have different mappings of availability zones, so it's very easy to suddenly find yourself with an unexpected bandwidth bill due to all the cross-az traffic.

I've been irritated at AWS (and the other large cloud providers) that they charge $0.01/GB for cross-az traffic. That's $3.24/Mbps -- about the same I was paying for internet transit (as in: from London to anywhere in the world) 20 years ago, and this is just between two datacenters in the same city controlled by the same organisation, markup must be 10,000x or more considering these places are cross-connected with massive bundles of fiber!

cyberax · 2026-05-11T16:30:17 1778517017

> My understanding is that different AWS accounts have different mappings of availability zones, so it's very easy to suddenly find yourself with an unexpected bandwidth bill due to all the cross-az traffic.

As far as I remember, accounts within the same organization will have the same mapping. You also can use stable zone names these days, instead of the regular mappings.

And yeah, egress traffic pricing is freaking insane at this point. It's the biggest reason to NOT use AWS.

dotwaffle · 2026-05-11T17:10:11 1778519411

Insanely high S3 storage charges too. $23/TB/month? Even with the insane HDD pricing that we see today, that's paying off a drive in 1 month (at retail) that will last for 50-100 months. Sure, there's probably some encoding overhead, but it's still mad.

cyberax · 2026-05-11T18:09:52 1778522992

S3 is pretty competitive if you want similarly-performing storage with consistent millisecond-level latency, high scalability, and at least 3x redundancy. Try looking at how much it's going to cost you in enterprise SSDs :)

dotwaffle · 2026-05-11T20:06:19 1778529979

Is it 3x redundancy forever? I always just kinda assumed it was RS encoded after a while, so only 30-50% larger than a single copy. Plus, almost all object storage is written to / read from hard disks, not to SSDs. Unless they're in a caching layer that is.

I know Azure has done a bunch of work around Pyramid Codes (essentially a locally repairable EC/RS variant), and Google obviously have the Colossus infrastructure that allows variable encodings, I'd be surprised if AWS is still triple-replicated everywhere.

cyberax · 2026-05-11T20:44:59 1778532299

Yes, and S3 is multiply redundant and is designed to survive a total AZ failure. So your data is physically replicated into at least 2 different AZs and might be multiply-redundant within them. They also provide a crazy SLA for data integrity, meaning that data must never be lost.

S3 also has a reduced redundancy tier and infrequent access tiers that are quite a bit cheaper.

It _is_ expensive, but once you crunch all the numbers, it's actually not unreasonable. I'd argue that using the real S3 is overkill for most scenarios that don't need infinite scalability.

dotwaffle · 2026-05-11T21:14:42 1778534082

GCS / Azure can survive a total AZ failure too, think of 3 x Replicas as RAID-1, whereas Erasure Coding is more like RAID-6. Only it's actually more resilient:

Let's say you have 10 data blocks, and you have 4 parity blocks. You can now lose 4 servers containing a block and still be able to repair the data, whereas in 3 x Replica you can only lose 2, and have to store everything 300% of size, instead of only 140%.

And yes, it is unreasonable how much they charge for both storage and inter-az bandwidth.

cyberax · 2026-05-11T23:56:53 1778543813

The problem with erasure coding is that if a disk fails, you suddenly need to read 3-5x more data to reconstruct the missing data from parity blocks. This is especially problematic if your replicas are split across zones. The inter-zonal bandwidth is large, but not infinite.

So I'm pretty sure that you need to have at least 2 full copies in different AZs, and then likely at least some additional redundancy within a single AZ (in the form of erasure codes or a full mirror).

So that's at least 3-4x the amount of data. 1Tb of NVMe SSD capacity is around $200 and with 3x redundancy that's $600, or about 2 years of AWS S3 storage. As I said, it's expensive but not unreasonable.

dotwaffle · 2026-05-12T12:04:58 1778587498

That's what Locally Recoverable Codes (including Pyramid Codes) are designed to address: you can repair a missing block using only a subset of the blocks, which you can make sure are placed in just one zone, eliminating the cross-az bandwidth requirements. Sure, if you lose multiple blocks, you are going to end up needing that extra bandwidth, but the chances of having two blocks offline at once is very low -- if it's just for a failure rather than extended maintenance, then you have probably already repaired the first block by the time the second fails.

In fact, the RS configuration is often in the >50 data blocks and >10 parity blocks range (albeit with an LRC/nested RS config) for object stores because it's more important to have that recoverability than repair efficiency. While one large provider I worked at did have a system whereby they did effectively have two copies of the RS-encoded data (so that 130% turned into 260%) across two AZs, they were actively in the process of swapping to the blocks being evenly distributed across the AZs, near-halving the total required disk space.

As I said before, most object storage is not on SSD, it's on hard disk: it's 20% of the price per TB, and most objects are read very infrequently. I can promise you that they're not paying $200 for 1TB of SSD either... I realise prices are higher than sensible in the last 6 months, but it was fairly easy to pick up SSDs for under $50/TB at retail pricing (and hard disks for under $10/TB) only a year ago.