So I'm not super familiar with different databases, but I do understand the basics and know how to work with data with e.g. pandas, and I think I understand what DuckDB is useful for. What I'm still completely missing is: how do I get data into DuckDB? I.e. how did you get that data into DuckDB? Or: suppose I have a device producing sensor data. Normally I'd connect to some MySQL endpoint somehow and tell it to insert data. How does one do that with DuckDB? Or is the idea rather that you construct your DuckDB database first by getting data from somewhere else (like the MySQL db in my example)?
My experience has been that most of the time you don’t tell DuckDB to insert data. One is expected to point DuckDB to an existing data file (parquet, csv, json with this new release, etc.) and either import/copy the data into DuckDB tables, or issue SQL queries directly against the file.
Think of it as a SQL engine for ad-hoc querying larger-than-memory datasets.
You can do it both ways, but the latter is the more useful one. DuckDB is designed to read data very fast and to operate on it fast. So you load a csv/json/parquet file and then “create table”, and DuckDB lays out the data in a way that makes it fast to read.
But you (I) wouldn’t use it like a standard db where stuff gets constantly written in, rather like a tool to effectively analyze data that’s already somewhere.