erikcw · 2026-06-01T05:02:54 1780290174

Thanks for releasing this! How do you handle DDL queries? Are table changes synchronized to the Iceberg table automatically?

Also, I recently started looking into olake[0] to serve the same purpose. What would you say differentiates Streambed?

[0] https://github.com/datazip-inc/olake

vira28 · 2026-06-01T07:22:03 1780298523

Thanks for the kind words!

Short answer: yes, column-level schema changes sync to Iceberg automatically[0].

Logical replication (pgoutput in v1) doesn't actually stream DDL statements. Instead, Postgres emits a fresh Relation message describing the table's current column layout right before the next change to that table. So we diff that against the last layout we knew and infer what changed.

From there we evolve the Iceberg schema in place: flush any buffered rows under the old schema first, then write a new metadata version with the change. What's handled today:

  - ADD COLUMN — new field ID allocated; the column's Postgres DEFAULT is carried into Iceberg's initial-default/write-default, so existing rows read back correctly
  - DROP COLUMN — removed from the current schema, existing data files untouched
  - Type widening — int4→int8, float4→float8 (the changes Iceberg considers compatible)
  - REPLICA IDENTITY changes
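
To make the diffing step concrete, here is a minimal Python sketch of comparing the last known column layout against a fresh one. It is not Streambed's code: `Column`, `SAFE_WIDENINGS`, and `diff_layout` are hypothetical names, types are shown as names rather than the type OIDs a real Relation message carries, and matching columns by name is an assumption (under it, a rename looks like a drop plus an add).

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for one column of a Relation message.
@dataclass(frozen=True)
class Column:
    name: str
    pg_type: str  # e.g. "int4", "int8", "float4", "float8", "text"


# The two widenings named in the list above.
SAFE_WIDENINGS = {("int4", "int8"), ("float4", "float8")}


def diff_layout(old, new):
    """Return (kind, column_name) changes between two column layouts."""
    old_by_name = {c.name: c for c in old}
    new_by_name = {c.name: c for c in new}
    changes = []
    for name, col in new_by_name.items():
        if name not in old_by_name:
            changes.append(("add", name))
        elif old_by_name[name].pg_type != col.pg_type:
            pair = (old_by_name[name].pg_type, col.pg_type)
            # Anything outside the known-safe widenings is flagged, not applied.
            kind = "widen" if pair in SAFE_WIDENINGS else "unsupported"
            changes.append((kind, name))
    for name in old_by_name:
        if name not in new_by_name:
            changes.append(("drop", name))
    return changes


old = [Column("id", "int4"), Column("note", "text")]
new = [Column("id", "int8"), Column("email", "text")]
print(diff_layout(old, new))
# [('widen', 'id'), ('add', 'email'), ('drop', 'note')]
```

Each resulting change would then map to one Iceberg schema update, applied after the buffered rows for the old schema are flushed.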

[0] https://github.com/viggy28/streambed/pull/21

erikcw · 2026-05-10T19:54:22 1778442862

How does Lakebase compare to Ducklake[0]?

[0] https://ducklake.select/

jeremyjh · 2026-05-10T20:36:36 1778445396

Lakebase is for transactional use cases - this is more comparable to AWS Aurora.

nikita · 2026-05-10T20:16:23 1778444183

Lakebase is OLTP.

erikcw · 2026-04-22T18:17:45 1776881865

I had a very good experience with it last year. I used it at large scale with data that had previously been in Iceberg, and it worked flawlessly. It’s only improved since. Highly recommend.

erikcw · 2026-03-24T17:37:40 1774373860

Simon Willison wrote a good post about Dan Woods’ work on “Autoresearching Apple's "LLM in a Flash" to run Qwen 397B locally”[0].

[0] https://simonwillison.net/2026/Mar/18/llm-in-a-flash/

erikcw · 2026-01-17T02:49:05 1768618145

I use them both depending on which feels more natural for the task, often within the same project. The interop is easy and very high performance thanks to Apache Arrow: `df = duckdb.sql(sql).pl()` and `result = duckdb.sql("SELECT * FROM df")`.

erikcw · 2026-01-11T00:01:41 1768089701

Link to the lectures?

maxbond · 2026-01-11T00:14:59 1768090499

Presumably it's this course:

https://youtube.com/@jhupoker4850

https://hopkinspokercourse.com

erikcw · 2025-11-14T15:54:35 1763135675

Data inlining is also currently limited to the DuckDB catalog (i.e. it doesn't work with Postgres catalogs)[0]. It's improving very quickly though, and I'm sure this will be expanded soon.

[0] https://ducklake.select/docs/stable/duckdb/advanced_features...

erikcw · 2025-09-27T22:02:21 1759010541

This looks really useful! Am I correct that there isn’t an S3 compatible API, just the “fetch” API?

Being able to set an S3 client’s endpoint to proxy traffic straight through this would be quite useful.

shikhar · 2025-09-28T01:24:02 1759022642

Yes, currently it has its own /fetch endpoint that then makes S3 GET(s) internally. One potential gotcha, depending on how you are using it: an exact byte "Range" header is always required so that the request can be mapped to page-aligned byte range requests on the S3 object. But with that constraint, it is feasible to add an S3 shim.

It is also possible to stop requiring the header, but I think it would complicate the design around coalescing reads – the layer above foyer would have to track concurrent requests to the same object.
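
For readers wondering what mapping an exact range to page-aligned requests looks like, here is a rough Python sketch of the arithmetic. It is an illustration only, not the project's implementation: `PAGE_SIZE` and `page_aligned_ranges` are hypothetical, and the page size is an arbitrary placeholder.

```python
PAGE_SIZE = 16 * 1024 * 1024  # hypothetical page size, not taken from the project


def page_aligned_ranges(start, end, page_size=PAGE_SIZE):
    """Map an inclusive byte range [start, end] to the inclusive
    page-aligned ranges that cover it."""
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    first_page = start // page_size
    last_page = end // page_size
    return [
        (page * page_size, (page + 1) * page_size - 1)
        for page in range(first_page, last_page + 1)
    ]


# A request for bytes 100-5000 with 4 KiB pages touches the first two pages.
print(page_aligned_ranges(100, 5000, page_size=4096))
# [(0, 4095), (4096, 8191)]
```

Because every request resolves to the same fixed page boundaries, overlapping requests land on identical cache keys. A real version would also need to clamp the last page to the object's length, which this sketch leaves out.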

erikcw · on Dec 19, 2024

I’ve started using Granian[0] recently with good results.

[0] https://github.com/emmett-framework/granian

erikcw · on Dec 11, 2024

I've used SikuliX[0] in the past for similar purposes. Unfortunately the author hasn't had much time to maintain it recently.

[0] https://github.com/RaiMan/SikuliX1