Hacker Times | erikcw's comments

Thanks for releasing this! How do you handle DDL queries? Are table changes synchronized to the Iceberg table automatically?

Also, I recently started looking into olake[0] to serve the same purpose. What would you say differentiates Streambed?

[0] https://github.com/datazip-inc/olake


Thanks for the kind words!

Short answer: yes, column-level schema changes sync to Iceberg automatically[0].

Logical replication (pgoutput in v1) doesn't actually stream DDL statements. Instead, Postgres emits a fresh Relation message describing the table's current column layout right before the next change to that table. So we diff that against the last layout we knew and infer what changed.
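A minimal sketch of that diffing step. It assumes each Relation message has already been decoded into an ordered mapping from column name to type name. `Layout`, `Change`, and `diff_layouts` are hypothetical names for illustration, not Streambed's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical decoded form of a Relation message: column name -> Postgres type name.
# (A real decoder would work with type OIDs; names keep the sketch readable.)
Layout = Dict[str, str]


@dataclass(frozen=True)
class Change:
    kind: str  # "add", "drop", or "type_change"
    column: str
    old_type: Optional[str] = None
    new_type: Optional[str] = None


def diff_layouts(old: Layout, new: Layout) -> List[Change]:
    """Infer column-level changes by comparing the last known layout with a fresh one."""
    changes: List[Change] = []
    for col, typ in new.items():
        if col not in old:
            changes.append(Change("add", col, new_type=typ))
        elif old[col] != typ:
            changes.append(Change("type_change", col, old[col], typ))
    for col, typ in old.items():
        if col not in new:
            changes.append(Change("drop", col, old_type=typ))
    return changes
```

Because this diff is keyed by column name, a `RENAME COLUMN` would show up as a drop plus an add. That's a limitation of the sketch, not a statement about how Streambed treats renames.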

From there we evolve the Iceberg schema in place: flush any buffered rows under the old schema first, then write a new metadata version with the change. What's handled today:

  - ADD COLUMN — new field ID allocated; the column's Postgres DEFAULT is carried into Iceberg's initial-default/write-default, so existing rows read back correctly
  - DROP COLUMN — removed from the current schema, existing data files untouched
  - Type widening — int4→int8, float4→float8 (the changes Iceberg considers compatible)
  - REPLICA IDENTITY changes
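A minimal sketch of the evolve step, assuming a simplified in-memory schema model. `SchemaField`, `Schema`, `evolve`, the op tuples, and the `flush` callback are all hypothetical; this is not Streambed's code or any Iceberg library's API. It shows the ordering (flush under the old schema, then produce a new version) and the rules listed above: fresh field IDs on add, schema-only drops, and only the named widenings.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Only the promotions listed above; the Iceberg spec permits a few others
# (e.g. widening decimal precision) that this sketch leaves out.
ALLOWED_WIDENINGS = {("int4", "int8"), ("float4", "float8")}


@dataclass
class SchemaField:
    field_id: int
    name: str
    type_name: str
    default: Optional[object] = None  # stands in for initial-default/write-default


@dataclass
class Schema:
    schema_id: int = 0
    fields: List[SchemaField] = field(default_factory=list)
    last_field_id: int = 0


def evolve(schema: Schema, flush: Callable[[], None], ops: List[Tuple]) -> Schema:
    """Return the next schema version; the input schema is not mutated."""
    flush()  # write buffered rows under the OLD schema before anything changes
    new = copy.deepcopy(schema)
    new.schema_id += 1
    for op in ops:
        kind, name = op[0], op[1]
        if kind == "add":  # ("add", name, type_name, default)
            # Allocate a fresh field ID; the Postgres DEFAULT becomes the Iceberg default.
            new.last_field_id += 1
            new.fields.append(SchemaField(new.last_field_id, name, op[2], op[3]))
        elif kind == "drop":  # ("drop", name)
            # Only the schema changes; existing data files are left as they are.
            new.fields = [f for f in new.fields if f.name != name]
        elif kind == "widen":  # ("widen", name, new_type_name)
            target = next((f for f in new.fields if f.name == name), None)
            if target is None:
                raise KeyError(name)
            if (target.type_name, op[2]) not in ALLOWED_WIDENINGS:
                raise ValueError(f"{target.type_name} -> {op[2]} is not a compatible promotion")
            target.type_name = op[2]
        else:
            raise ValueError(f"unknown op {kind!r}")
    return new
```

The returned `Schema` is what would be written out as the new metadata version. Committing it to a catalog is left out here.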
[0] https://github.com/viggy28/streambed/pull/21

How does Lakebase compare to Ducklake[0]?

[0] https://ducklake.select/


Lakebase is for transactional use cases - this is more comparable to AWS Aurora.


Lakebase is OLTP.


I had a very good experience with it last year. I used it at large scale with data that had previously been in Iceberg, and it worked flawlessly. It’s only improved since. Highly recommend.


Simon Willison wrote a good post[0] about Dan Woods’ work on “Autoresearching Apple's ‘LLM in a Flash’ to run Qwen 397B locally”.

[0] https://simonwillison.net/2026/Mar/18/llm-in-a-flash/


I use them both depending on which feels more natural for the task, often within the same project. The interop is easy and very high performance thanks to Apache Arrow: `df = duckdb.sql(sql).pl()` and `result = duckdb.sql("SELECT * FROM df")`.


Link to the lectures?



Data inlining is also currently limited to the DuckDB catalog (i.e., it doesn't work with Postgres catalogs)[0]. It's improving very quickly, though, and I'm sure this will be expanded soon.

[0] https://ducklake.select/docs/stable/duckdb/advanced_features...


This looks really useful! Am I correct that there isn’t an S3 compatible API, just the “fetch” API?

Being able to set an S3 client’s endpoint to proxy traffic straight through this would be quite useful.


Yes, currently it has its own /fetch endpoint that then makes S3 GET(s) internally. One potential gotcha, depending on how you are using it: an exact byte "Range" header is always required, so that the request can be mapped to page-aligned byte range requests on the S3 object. With that constraint, though, it is feasible to add an S3 shim.
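A minimal sketch of mapping an exact Range header onto page-aligned reads, assuming a fixed page size and an inclusive `bytes=start-end` header. `PAGE_SIZE` and the helper names are hypothetical; the real service's page size and code aren't described here.

```python
import re
from typing import List, Tuple

PAGE_SIZE = 4 * 1024 * 1024  # hypothetical page size

_RANGE_RE = re.compile(r"^bytes=(\d+)-(\d+)$")


def parse_range(header: str) -> Tuple[int, int]:
    """Parse an exact 'bytes=start-end' Range header (inclusive end)."""
    m = _RANGE_RE.match(header.strip())
    if not m:
        raise ValueError(f"exact byte range required, got {header!r}")
    start, end = int(m.group(1)), int(m.group(2))
    if end < start:
        raise ValueError("range end precedes start")
    return start, end


def page_ranges(start: int, end: int, page_size: int = PAGE_SIZE) -> List[Tuple[int, int]]:
    """Inclusive, page-aligned byte ranges that together cover [start, end]."""
    first_page, last_page = start // page_size, end // page_size
    return [(p * page_size, (p + 1) * page_size - 1) for p in range(first_page, last_page + 1)]


def slice_response(pages: List[bytes], start: int, end: int, page_size: int = PAGE_SIZE) -> bytes:
    """Cut the requested bytes out of the fetched pages (given in order)."""
    base = (start // page_size) * page_size
    data = b"".join(pages)
    return data[start - base : end - base + 1]
```

Open-ended forms like `bytes=100-` are rejected on purpose, mirroring the "exact Range required" constraint. Because every read lands on whole pages, two requests that touch the same page map to the same cache key.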

It is also possible to stop requiring the header, but I think it would complicate the design around coalescing reads – the layer above foyer would have to track concurrent requests to the same object.
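To illustrate what "tracking concurrent requests to the same object" involves, here is a minimal "single-flight" sketch in asyncio. It assumes requests are already split into (object, page) keys. `PageCoalescer` and the `fetch` callable are hypothetical stand-ins, not foyer's API.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

PageKey = Tuple[str, int]  # (object key, page index): a hypothetical keying scheme


class PageCoalescer:
    """Single-flight reads: concurrent requests for the same page share one fetch."""

    def __init__(self, fetch: Callable[[str, int], Awaitable[bytes]]):
        self._fetch = fetch
        self._inflight: Dict[PageKey, asyncio.Task] = {}

    async def get(self, obj: str, page: int) -> bytes:
        key = (obj, page)
        task = self._inflight.get(key)
        if task is None:
            # First caller starts the fetch; later callers for the same key join it.
            task = asyncio.ensure_future(self._fetch(obj, page))
            self._inflight[key] = task

            def _forget(done: asyncio.Task, k: PageKey = key) -> None:
                # Drop the entry once finished, unless a newer fetch replaced it.
                if self._inflight.get(k) is done:
                    del self._inflight[k]

            task.add_done_callback(_forget)
        # shield() keeps one caller's cancellation from cancelling the shared fetch.
        return await asyncio.shield(task)
```

Keying on page index only works cleanly when every request maps to whole pages. Without the exact Range requirement, overlapping arbitrary ranges would need extra bookkeeping on top of this.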


I’ve started using Granian[0] recently with good results.

[0] https://github.com/emmett-framework/granian


I've used SikuliX[0] in the past for similar purposes. Unfortunately the author hasn't had much time to maintain it recently.

[0] https://github.com/RaiMan/SikuliX1

