Short answer: yes, column-level schema changes sync to Iceberg automatically[0].
Logical replication (pgoutput in v1) doesn't actually stream DDL statements. Instead, Postgres emits a fresh Relation message describing the table's current column layout right before the next change to that table. So we diff that against the last layout we knew and infer what changed.
From there we evolve the Iceberg schema in place: flush any buffered rows under the old schema first, then write a new metadata version with the change. What's handled today:
- ADD COLUMN — new field ID allocated; the column's Postgres DEFAULT is carried into Iceberg's initial-default/write-default, so existing rows read back correctly
- DROP COLUMN — removed from the current schema, existing data files untouched
- Type widening — int4→int8, float4→float8 (the changes Iceberg considers compatible)
- REPLICA IDENTITY changes
I’ve had very good experience with it last year. I used it at large scale with data that had been in iceberg previously and it worked flawlessly. It’s only improved since. Highly recommend.
I use them both depending on which feels more natural for the task, often within the same project. The interop is easy and very high performance thanks to Apache Arrow: `df = duckdb.sql(sql).pl()` and `result = duckdb.sql("SELECT * FROM df")`.
Data inlining is also currently limited to only the DuckDB catalog (ie it doesn't work with Postgres cataglogs)[0]. It's improving very quickly though and I'm sure this will be expanded soon.
Yes, currently it has its own /fetch endpoint that then makes S3 GET(s) internally. One potential gotcha depending on how you are using it, an exact byte "Range" header is always required so that the request can be mapped to page-aligned byte range requests on the S3 object. But with that constraint, it is feasible to add an S3 shim.
It is also possible to stop requiring the header, but I think it would complicate the design around coalescing reads – the layer above foyer would have to track concurrent requests to the same object.
Also, I recently started looking into olake[0] to serve the same purpose. What would you say differentiates Streambed?
[0] https://github.com/datazip-inc/olake
reply