I'm curious about what "substantially better than RDS" means. RDS has been good enough for me for quite a while. Does it only matter once you get to a certain scale?
Craig here from Crunchy, the company he's referring to. Not sure what he has in mind, but having built a lot of Heroku Postgres in the early days I definitely have thoughts on what can make a database great. There is a big gap between what most developers know and what you need to know to efficiently run Postgres. Without tipping too much of our hand, we're focused deeply on building an amazing developer experience for Postgres. Some things we're thinking about: how we can actively detect N+1 queries (common in almost every ORM: Rails, Django, etc.) and notify you about them. We already have some big differences, like shipping with connection pooling built in so you can easily scale to tens of thousands of connections. Really, any production Postgres setup should be running pgbouncer, whereas on a lot of providers it's either not an option or you're left to your own devices.
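To give a concrete sense of what built-in pooling looks like, here's a minimal pgbouncer.ini sketch (host, pool sizes, and file paths are illustrative, not any provider's actual config):

```ini
; minimal pgbouncer.ini sketch -- all values are illustrative
[databases]
; route "appdb" through the pooler to the real Postgres server
appdb = host=10.0.0.5 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling lets tens of thousands of client connections
; share a small number of actual server connections
pool_mode = transaction
max_client_conn = 10000
default_pool_size = 20
```

Clients then connect to port 6432 instead of 5432; the key knob is `pool_mode = transaction`, which hands a server connection back to the pool at each transaction boundary.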
Good enough may be absolutely fine for a lot of people, but no lock-in to a single cloud, better developer tooling, proactive alerts and recommendations, and quality support all feel like an opportunity to be better.
Kurt may have entirely other things in mind, and I'd be all ears if there is low-hanging fruit in terms of features or experience we could add to make Postgres even better for folks.
Some examples of things I've missed around developer experience for a database, which Craig and the team made possible at Heroku Postgres:
- fork: ever had one of those "why does this bug only exist in production?" problems? It was so trivial to fork the DB and run your tests/hypothesis/whatever without the risk of actually impacting production. Same thing for _really_ testing a migration script or load test.
- follow: a similarly easy way to get a read replica, which is super useful for reporting.
- dataclips: "hey, can you tell me X?" sure, and here's a URL to the results that you can refresh if you need an updated number in the future. So great for adhoc queries.
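For anyone who hasn't used these: fork and follow were both one-liners in the Heroku CLI. The app name and plan below are hypothetical, and the flag names are from memory of the heroku-postgresql addon:

```shell
# Fork: a new database restored from prod and caught up via WAL,
# then detached so you can experiment on it freely
heroku addons:create heroku-postgresql:standard-0 --fork DATABASE_URL -a my-app

# Follow: a new database that stays attached as a streaming read replica
heroku addons:create heroku-postgresql:standard-0 --follow DATABASE_URL -a my-app
```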
All of these are obviously doable with RDS and/or other solutions too. But the time taken to do any of the above was often measured in seconds, at most minutes. It's difficult to communicate just how impactful those kinds of improvements are to your workflow. It's like it subconsciously gives you permission to tackle whole new problems, build better solutions, get answers to questions you never thought to ask before. Because the barrier to entry is so low, you just do these things. You don't sit around wondering if you could.
A great developer experience around a database (one that goes beyond setup and basic ops) is a severely underappreciated thing IMO.
> - fork: ever had one of those "why does this bug only exist in production?" problems? It was so trivial to fork the DB and run your tests/hypothesis/whatever without the risk of actually impacting production. Same thing for _really_ testing a migration script or load test.
This sounds great! How does it work though? Is it using some special postgres feature or btrfs snapshots or something else completely?
Craig (the poster I jumped in to reply to) would know the specifics better than I ever did. My recollection is:
- restore from the latest snapshot (there was one whether you’d configured a custom backup schedule or not)
- replay the write-ahead log over the top to catch the restore up to the point in time you asked for (or when you ran the command). At least part of this process leveraged WAL-E, a tool largely developed by Heroku employees and open sourced.
This was a decade or more ago though. The state of the art of postgres has moved on and I assume the team would tackle it differently if they were doing it today.
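The two steps above map fairly directly onto Postgres's point-in-time recovery machinery. A sketch using WAL-E (bucket name, data directory, and target timestamp are made up; this uses the pre-Postgres-12 recovery.conf style, matching the era being described):

```shell
# 1. Restore the most recent base backup into an empty data directory
wal-e --s3-prefix=s3://my-wal-bucket/backups backup-fetch "$PGDATA" LATEST
```

```conf
# 2. recovery.conf: replay archived WAL on top of the base backup,
#    stopping at the requested point in time
restore_command = 'wal-e --s3-prefix=s3://my-wal-bucket/backups wal-fetch "%f" "%p"'
recovery_target_time = '2014-06-01 12:00:00'
```

On startup Postgres pulls each WAL segment via `restore_command` and stops replaying at `recovery_target_time`, which is what made "fork to an arbitrary point in time" possible.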
It's leveraging pretty native Postgres tooling that restores the base backup from within Postgres, then replays the WAL to the exact point in time you specify. With snapshots and other mechanisms you may get a database "up" sooner, but we've seen that with that approach the PG cache takes so long to warm up that you effectively still have a useless database even though it's "up". Further, depending on how you do it, Postgres itself will have to go through crash recovery, which I've seen take over 10 hours on some providers.
Doing the native approach in Postgres isn't perfect, but we've focused on getting the developer experience down so you can use your database and have it "just work", and if something goes wrong you understand how to roll back seamlessly.
RDS runs pretty well! It's just irritating to use.
The good DBaaS give me a lot more power. This is true for Heroku PG, PlanetScale, Supabase, and Crunchy Data. Some of them let me fork a DB to run a PR against, some give me app level features that save me code, etc.
Most modern hosted DBs also let you run your own replicas.
I'm not really complaining about how well RDS works when your app is connected to it and it doesn't failover/go down for maintenance/etc. It works fine as a DB backend. That's just a baseline I don't think is very valuable anymore.
We’re using aiven.io and are quite happy, although it’s hard for me to compare. You can port across clouds, which is reassuring if we ever need to switch. Their support was also helpful debugging a couple of DB issues (in our own code). I wonder how they compare in this matrix, if anyone knows?
aiven.io is quite good. They went broad instead of deep, so they're not as good at Postgres as the Postgres-specific companies. But they're probably a better Postgres than a PaaS could build by themselves.
That’s useful to know that others do Postgres better. We use their Redis as well, so in some ways breadth is useful for them and for us.
At the time we picked Aiven, we couldn’t find any Redis-specialized hosting with instances in GCP Europe, if I recall. So their breadth also helps with locating near the customer’s servers (important for latency).
You can easily use pg_dump to do a "vanilla" backup to S3. It's a managed DB service, but if you wanted to run your own you could extract your data and move to a new DB. The lock-in is not "complete"; you're acting like you can't even extract your data.
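For example, a plain logical backup streamed straight to S3 (bucket name and connection URL are placeholders):

```shell
# Dump the RDS database in pg_dump's custom format and stream it to S3
# without touching local disk; credentials come from the usual env vars
pg_dump --format=custom --no-owner "$DATABASE_URL" \
  | aws s3 cp - "s3://my-backup-bucket/app-$(date +%F).dump"

# Later, restore into any Postgres, managed or self-hosted:
#   aws s3 cp "s3://my-backup-bucket/app-2024-01-01.dump" - \
#     | pg_restore --no-owner -d "$NEW_DATABASE_URL"
```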
"Only export your data with pg_dump" is one of those misfeatures that makes RDS mediocre. They don't really expose much of the power of the underlying DB.
You cannot SSH to the RDS machine, so you need to spin up another EC2 machine and run pg_dump over the network. The connection breaks (yes, this has happened to us multiple times).
RDS makes it very inconvenient to do anything other than use their managed services.
Because RDS backup storage is VERY EXPENSIVE, even compared to S3. This is very deliberate.