In the video, some debris seemed to fly away from the explosion in a wavy path (top left). I thought things only moved like that in video games. What causes that kind of movement?
The parent reads more like "it works in practice but does it work in theory?" The innovations that have come out of the DuckDB team seem to always focus on "in practice" instead of focusing on how things are supposed to (or are expected to) be done.
Not understanding Greek, I ran that through Google Translate (Greek -> English) just to see what it might say.
> AND THIS … THE MAJOR … BUT THE BOTH … ARE MEANING … PEUSIS PAN GAR SEEMS.. OBVIOUS.. SPECTRA … OF SONGS SO … MAJOR … AND THEREFORE … NOT EVEN EARS .. NO LANGUAGE
What form of Greek would that be? (I don't know much more than "ancient Greek" vs "modern Greek".)
I think the Greek doesn't read «ΠΕΥΣΗ ΠΑΝ ΓΑΡ» but «ΠΕΥΣΗΙ ΠΑΝ ΓΑΡ» with «Ι», and «ΠΕΥΣΗΙ» = «πευσῃ» could be the 2nd sg. of the future of «πυνθάνομαι» ‹learn›. «ΠΑΝ» would be ‹all, every(thing)›, «ΓΑΡ» ‹namely, because (postponed)›. ‹… you will learn, everything namely …›? I don't know. – The «ΦΑΙΝΕΤΑΙ» ‹seems, appears› is on the next line, after some missing words.
Trivia: Claude Shannon proposed the idea of predicting the next token (letter) using statistics/probabilities in the training data corpus in 1950:
"Prediction and Entropy of Printed English"
https://languagelog.ldc.upenn.edu/myl/Shannon1950.pdf
It goes back a bit further than that. His 1948 “A Mathematical Theory of Communication” [1] already has (what we would now call) a Markov chain language model, page 7 onwards. AFAIK, this was based on his classified WWII work, so it was probably a few years older than that.
I was just reading Norbert Wiener's "The Human Use of Human Beings" (1950) and this quote gave me a good chuckle:
"One may get a remarkable semblance of a language like English by taking a sequence of words, or pairs of words, or triads of words, according to the statistical frequency with which they occur in the language, and the gibberish thus obtained will have a remarkably persuasive similarity to good English."
A letter is not a token, is it? Redundancy could hit 75% in long sentences, but Shannon was not predicting tokens or words, he was predicting letters (characters).
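For anyone curious what the Wiener quote looks like in practice, here is a minimal sketch of a word-level Markov chain generator. It is not Shannon's or Wiener's own procedure, just an illustration of the idea; the names `build_model` and `generate` and the `order` parameter are made up for this example. Swapping `text.split()` for `list(text)` would give the letter-level variant Shannon worked with.

```python
import random
from collections import defaultdict


def build_model(text, order=1):
    """Map each run of `order` words to the list of words that followed it.

    Followers are stored with repeats, so sampling from the list
    picks each word in proportion to how often it occurred.
    """
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        model[state].append(words[i + order])
    return dict(model)


def generate(model, length, seed=0):
    """Emit up to `length` extra words by walking the chain from a random state."""
    rng = random.Random(seed)
    state = rng.choice(sorted(model))
    out = list(state)
    for _ in range(length):
        followers = model.get(state)
        if not followers:
            break  # dead end: this state was never followed by anything
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)


corpus = "the cat sat on the mat and the dog sat on the cat"
print(generate(build_model(corpus, order=1), length=10))
```

With `order=1` this samples word pairs; `order=2` gives the "triads" from the quote, and the output gets more English-like as the corpus grows.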
For me the issue is that DuckLake's feature for flushing inlined data to Parquet is still in alpha. One of the main issues with Parquet is that writing small batches leaves you with a lot of small Parquet files that are inefficient to work with in DuckDB. To solve this, DuckLake inlines these small writes into the DBMS you choose (Postgres), but for a while it couldn't write them back to Parquet. Last I checked this feature didn't exist yet, and now it seems to be in alpha, which is nice to see, but I'd like better support before I consider switching some personal data projects over. https://ducklake.select/docs/stable/duckdb/advanced_features...
Data inlining is also currently limited to only the DuckDB catalog (i.e. it doesn't work with Postgres catalogs)[0]. It's improving very quickly though and I'm sure this will be expanded soon.
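To make the inline-then-flush idea concrete, here is a toy sketch using only the Python standard library. It is not how DuckLake implements it: SQLite stands in for the catalog database, a CSV file stands in for Parquet, and every table and function name (`inlined_rows`, `data_files`, `write_batch`, `flush_inlined`) is hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def open_catalog():
    """Create a toy catalog: one table for inlined rows, one listing data files."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE inlined_rows (id INTEGER, value TEXT)")
    con.execute("CREATE TABLE data_files (path TEXT, row_count INTEGER)")
    return con


def write_batch(con, rows):
    """Small writes land in the catalog database instead of creating a new file."""
    with con:  # commits on success, rolls back on error
        con.executemany("INSERT INTO inlined_rows VALUES (?, ?)", rows)


def flush_inlined(con, data_dir):
    """Move all inlined rows into a single data file and register it in the catalog."""
    rows = con.execute("SELECT id, value FROM inlined_rows ORDER BY id").fetchall()
    if not rows:
        return None
    n = con.execute("SELECT COUNT(*) FROM data_files").fetchone()[0]
    path = Path(data_dir) / f"part-{n}.csv"  # CSV stands in for Parquet here
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    with con:  # registering the file and clearing the rows happen in one transaction
        con.execute("INSERT INTO data_files VALUES (?, ?)", (str(path), len(rows)))
        con.execute("DELETE FROM inlined_rows")
    return path


con = open_catalog()
write_batch(con, [(1, "a")])
write_batch(con, [(2, "b"), (3, "c")])
print(flush_inlined(con, tempfile.mkdtemp()))
```

Two tiny writes end up as one file instead of two, which is the whole point; the missing piece the parent describes is the equivalent of `flush_inlined`.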
The DuckLake format has an unresolved built-in chicken-and-egg conflict: it requires a SQL database to represent its catalog. But this is what some people are running away from when they choose the Parquet format in the first place. Parquet = easy, SQL = hard, adding SQL to Parquet makes the resulting format hard. I would expect a catalog to be in Parquet format as well, then it becomes something self-bootstrapping and usable.
DuckLake is more comparable to Iceberg and Delta than to raw parquet files. Iceberg requires a catalog layer too, a file system based one at its simplest. For DuckLake any RDBMS will do, including fs-based ones like DuckDB and SQLite. The difference is that DuckLake will use that database with all its ACID goodness for all metadata operations and there is no need to implement transactional semantics over a REST or object storage API.
It is not a chicken and egg problem, it is just a requirement to have an RDBMS available for systems like DuckLake and Hive to store their catalogs in. Metadata is relatively small and needs to provide ACID r/w => great RDBMS use case.
On a Mac "Ü" is typed "option-u shift-u".