Pluggable Storage Committed in Postgres

pella · on April 5, 2019

context:

"PLUGGABLE STORAGE IN POSTGRESQL" (2018 PGCONF.EU ; Andres Freund)

pdf: https://anarazel.de/talks/2018-10-25-pgconfeu-pluggable-stor...

""Why?

● ZHeap – UNDO based storage, address bloat and write amplification problems

● Columnar Storage

● Experiments

""

"PostgreSQL has long prided itself in being extensible. But for a number of years people have wished for the ability to not only introduce new datatypes and functions, but also to add new forms of storing data. The introduction of foreign data wrappers (FDWs) allowed to satisfy some of those use-cases. They however are not fully suitable for native data storage, quite fundamentally they don't allow for index creation, foreign keys, etc.

Over the last years people have on and off worked on making table storage pluggable. It looks like PostgreSQL 12 might finally get builtin support for that, based on work of Haribabu Kommi, myself and others.

This talk will go over the reasons why it is useful to make storage pluggable (my personal reason is to allow the introduction of zheap, a new undo based table storage that is nearly free of table bloat) and how the new APIs work, and what further use-cases the pluggable storage APIs have."

michelpp · on April 5, 2019

Thank for the context, this is great! I'm looking into this now to see if I can use it with my project pggraphblas, i was hoping to make an FDW where each column is backed by a matrix, making essentially a "property graph" that can be queries with straight SQL, but this looks much more interesting and also easier to use.

pella · on April 5, 2019

next step --> "ZHEAP" = a storage engine to provide better control over bloat:

* http://amitkapila16.blogspot.com/2018/03/zheap-storage-engin... ( 5 March 2018 )

* HN: https://qht.co/item?id=16526623

anarazel · on April 5, 2019

FWIW, in hindsight, this never should have been called pluggable storage, but rather pluggable table access methods, or at least pluggable table storage. But it was was in the subject on the first thread this patchset originated in (by Alvaro Herrera), and stuck...

At least the code doesn't call it pluggable storage...

nn3 · on April 5, 2019

Is that a good thing?

Instead of a single postgres this there will be a lot of subtle or very different variants, all needing completely different tuning, like it happened with MySQL. May well fragment the ecosystem significantly.

anarazel · on April 5, 2019

I think it's a significant danger.

But on the other hand our current table storage has some architectural issues that are hard to fix in an upward compatible manner (causing a number of issues around vacuum and write amplification due to hint bits etc). And even if there were an easy-ish incremental upgrade path, it's hard to e.g. move to an UNDO based MVCC implementation without regressing some workload (& introduce bugs) - allowing that development to happen in parallel and the adoption be by choice, that's much more realistic.

Additionally, the best type of table storage is also very workload dependent. E.g. for some analytics (or even just long term storage) it's quite useful to have a columnar store, but for lots of transactional workloads that's not appropriate. (Note that the current tableam interface would allow for some simple columnar store, but that'd still need a good bit of planner and executor smarts to be useful for anything but higher storage density.)

So I think on balance it's probably worth it.

greggyb · on April 5, 2019

From my perspective, columnar storage is an enormous win. One of the best things I've been able to do for clients on MSSQL has been to tell them simply to create a clustered columnstore index on their large tables for analytical workloads.

At one point we had a need to perform a diff between a yesterday-snapshot and a today-snapshot of a billion row table. If I recall correctly, this took 1 or 2 minutes with clustered columnstore vs nearly an hour for rowstore. The order of magnitude is certainly correct, even if my specific numbers are wrong.

If you're curious why we had to do a full diff between two billion-row tables, that is a story for a different time, but there were good reasons and we also removed those good reasons through other design decisions.

danudey · on April 5, 2019

MySQL's situation, as it has been for years, is:

1. Almost everyone uses InnoDB (or whatever the forks call it)

2. Customers who need to use other engines can do so if it makes sense for them

3. Tuning is basically always the same unless you're doing something extremely atypical (e.g. using TokuDB)

There's no fragmentation there in any real sense.

stubish · on April 5, 2019

I doubt it will fragment the ecosystem at all. Plugins such as PostGIS have been widely adopted by people who need it for years, and has fragmented nothing. Others end up integrated into core. If you have specialized storage requirements, you no longer need to fork, and that means less fragmentation of the ecosystem.

convolvatron · on April 5, 2019

given that postgres is free as in beer, i think in practice this just means the definition of less flexible and more polished tiers, up to and including db-as-a-service.

this seems like a nice model - more useful than the tens or hundreds of functionally equivalent linux distributions. you can start out with generic turnkey postgres-lite and unwrap as much as you like.

that said, despite quite a bit of progress in the last few years, postgres-lite could be easier to install and maintain.

dragonwriter · on April 5, 2019

> given that postgres is free as in beer

Postgres is free-as-in-speech.

Free-as-in-beer refers to free-of-charge, proprietary licensed software (e.g., SQL Server Express), not software under a permissive free software license.

_pgmf · on April 5, 2019

Sqlite has a concept called virtual tables, which sit somewhere between "pluggable storage" and postgres foreign-data-wappers. They are fairly easy to implement and can be incrementally improved by providing more sophisticated hints to the planner. I think the folks at tarantool have done some work hooking deeply into sqlite's pager/wal, presumably others (the folks who connected sqlite to lmdb?). And of course Oracle, who ripped out the sqlite btree and replaced it with BerkeleyDB.

I think sqlite4 (defunct) was going to have pluggable storage, but they ended up scrapping it. Presumably over performance concerns, but please correct me if I'm wrong.

Anyway, interested to see how this feature is adopted and to learn more.

jerrysievert · on April 5, 2019

huh. on one hand, this is awesome! something like this should be fantastic for things like citus’s columnar storage - a pluggable storage system can be a huge boon. on the other hand there have been some bugs popping up on pg-bugs like the parallel queries that look a little questionable. hopefully no negative impact for anyone upstream while everything gets hammered out. context: i’m The maintainer of a fairly popular pg extension

editing to note that the bugs popping up have nothing to do with this, just the hope that it doesn’t become ... overwhelming

macdice · on April 5, 2019

Hi Jerry, Yeah, parallelism is hard... but speaking as someone involved in several bug fixes in that area, I can tell you that we put a lot of effort into chasing those bugs down, and even made changes to allow extensions that were doing illegal things in _PG_init() to keep working under parallelism. Bugs are inevitable, and in complicated code it can take someone else's workload to uncover them. IMHO the important question is how well you deal with them, and I think we do a good job at providing work-arounds (primarily by providing GUCs so you can turn new stuff off it it's causing problems), and getting fixes into the tree ASAP. I think the worse recent case was the "DSM handle collision" one, which was hiding in code committed years ago but only discoverable with very particular timing and newer query plans. All currently known problems fixed in that department, but unfortunately those patches missed the February cut-off for 11.2 and will have to wait until May.

jerrysievert · on April 6, 2019

> but speaking as someone involved in several bug fixes in that area, I can tell you that we put a lot of effort into chasing those bugs down,

yup! i've definitely seen the efforts, and really appreciate the work that everyone is doing to solve them - it's great seeing that kind of commitment.

> IMHO the important question is how well you deal with them

you are absolutely correct! and i can't fault how quickly bugs in parallelism have been triaged and fixed. but that wasn't what i was getting at - just that there have been a lot of fairly large changes (i guess you can call it a big increase in velocity?) that still seem to have (or uncovered as you said) a lot more to do - parallelism was just an example of a big change that seems to have some pretty big impacts on stability and planning and adding another big change makes me wary since that has direct impact on code that i maintain. it wasn't an attack, and i hope that it didn't come across that way, just an observation ... the more major changes the more balls in the air (to borrow a juggling metaphor), and pg has a ton of them in the air right now.

> I think we do a good job at providing work-arounds (primarily by providing GUCs so you can turn new stuff off it it's causing problems)

GUC's are helpful, but there are times when direct access to postgresql.conf isn't really feasible ... and setting GUC's at the beginning of every session can become cumbersome/untenable.

also, GUC's aren't always feasible when some major changes have negative impact across whole stacks (the major changes in CTE's and how they impacted planning is one that i can think of off the top of my head).

> illegal things in _PG_init()

just curious, can you point to some of these? i want to do my own extension audits and considering that plv8 does/did a lot of things based on postgres/src/pl/* it would be good to make sure it's not doing any of those as well.

regardless, thanks for the response - it is appreciated.

macdice · on April 6, 2019

>> illegal things in _PG_init()

> just curious, can you point to some of these? i want to do my own extension audits and considering that plv8 does/did a lot of things based on postgres/src/pl/* it would be good to make sure it's not doing any of those as well.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...

joshberkus · on April 5, 2019

Wow, congrats PG devs! This has been a dream for Postgres for like 8 year now.

incognito2019 · on April 5, 2019

[flagged]

atombender · on April 5, 2019

You're casting aspersions on not just a single person here, but a team of developers who have been working on this together. Your criticisms are also so utterly lacking in detail that the person on the other end can't possibly defend themselves in any meaningful way. What are they going to do, tell you that they actually have a great track record? Also, sullying the reputation of a developer while hiding behind an alias is pretty low.

Whether the criticisms are true or not is irrelevant, of course.

anarazel · on April 5, 2019

Uh, thanks, I guess? I'm Andres, the committer mentioned.

Perhaps next time speak up before everything's committed?

petergeoghegan · on April 5, 2019

I am also a Postgres committer, and I think that that's not even remotely justified.

I have no idea who you are, but I am certain that you are doing this only because you have an axe to grind. You are a vindictive, petty person. Shame on you.

Gigablah · on April 5, 2019

Add “cowardly” to that list.

pella · on April 5, 2019

https://commitfest.postgresql.org/22/1283/

Authors: "Andres Freund (andresfreund), Álvaro Herrera (alvherre), Alexander Korotkov (smagen), Haribabu Kommi (haribabu)"

Reviewers: "Alexander Korotkov (smagen)"

iand675 · on April 5, 2019

On the one hand, you know that it's not appropriate to post things like this... on the other hand, you post it anyways?