Pardon the vague question, but KDB is very much institutional knowledge hidden from the outside world. People have built their livelihoods around it and use it as a hammer for all sorts of nails.
It's also extremely expensive and written in a language with origins so obtuse that its progenitor, APL, needed a custom keyboard laden with mathematical symbols.
Within my firm it's very hard to get an outside perspective: the KDB developers are true believers in KDB, and they obviously don't want to be professionally replaced either. So I'm asking the more forward-leaning HN crowd.
One nail in my job is KDB as a data lake, and it's driving me nuts. I write code in Rust that prices options. There's a lot of complex code involved in this; I use a mix of numeric simulations to calculate greeks and somewhat lengthy analytical formulas.
The data that I save to KDB is quite raw: I save the market data and derived volatility surfaces, which are themselves complex-ish models needing some carefully unit-tested code to convert into implied vols.
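To give a flavour of the kind of conversion code I mean (a toy sketch, not my actual library; the function names and a flat Black-Scholes world are assumptions for illustration):

```python
# Minimal Black-Scholes implied-vol solver via Newton-Raphson, the sort of
# numerics the vol-surface conversion has to get right. Pure stdlib.
from math import log, sqrt, exp, erf, pi

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(s: float, k: float, t: float, r: float, sigma: float) -> float:
    """Black-Scholes price of a European call."""
    d1 = (log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return s * norm_cdf(d1) - k * exp(-r * t) * norm_cdf(d2)

def implied_vol(price: float, s: float, k: float, t: float, r: float,
                sigma0: float = 0.2, tol: float = 1e-8) -> float:
    """Newton-Raphson on sigma; vega is d(price)/d(sigma)."""
    sigma = sigma0
    for _ in range(100):
        d1 = (log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
        vega = s * sqrt(t) * exp(-0.5 * d1 ** 2) / sqrt(2.0 * pi)
        diff = bs_call(s, k, t, r, sigma) - price
        if abs(diff) < tol:
            return sigma
        sigma -= diff / vega
    raise ValueError("implied vol did not converge")
```

Even this toy version needs unit tests around the edge cases (deep in/out of the money, short expiries); the real surface models are worse.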
Right now my desk has no proper tooling for backtesting against our own data. I'm constantly being asked to do something about it, and I don't know what to do!
I'm 99% sure KDB is the wrong tool for the job, because of three things:
- It's not horizontally scalable. A divide-and-conquer algorithm on N<{small_number} cores is pointless.
- I'm scared to run queries that return a lot of data. It's non-trivial to get even a day's worth of data: the query will often just freeze, and it doesn't even buffer. Even when I'm only trying to fetch what should be a logical partition, the wire format is really inefficient and uncompressed. I feel like I need to do engineering work for trivial things.
- The main thing is that I need to do complex math to convert my raw data, order-books and vol-surfaces into useful data to backtest.
I have no idea how to do any of this in KDB. My firm is primarily a spot desk, and while I respect my colleagues, their answer is:
> Other firms are really invested in KDB and use KDB for this, just figure it out.
I'm going nuts because, as far as I can tell, these other firms are far larger and have teams of KDB quants doing the actual research, while we have some quant traders who know a bit of KDB but work on the spot side with far simpler math.
I keep advocating for a Parquet-style data store with Spark/Dask/Arrow/Polars running on top of it. It can be scaled horizontally, and most importantly, with Polars I can write my backtests in Rust and leverage the libraries I've already written.
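The shape I have in mind is embarrassingly parallel over date partitions (sketched in Python for brevity; `backtest_one_day` is a stand-in for my Rust pricing library, and the daily-partition layout is an assumption):

```python
# Sketch of a divide-and-conquer backtest over date-partitioned data.
# backtest_one_day is a placeholder for the real (Rust/Polars) logic: in
# production each day would be a Parquet partition read by Polars, and the
# pool would be processes or cluster workers rather than threads.
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta

def backtest_one_day(day: date) -> dict:
    # Placeholder: load the day's partition, replay it, return per-day stats.
    return {"day": day, "pnl": 0.0}

def run_backtest(start: date, end: date, workers: int = 8) -> list:
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each partition is independent, so this scales out horizontally.
        return list(pool.map(backtest_one_day, days))
```

The point is that once the data is in an open columnar format, fanning out over partitions is trivial, which is exactly what I can't do today.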
I get shot down with "we use KDB here". I just don't know how I can deliver a maintainable solution to my traders on the current infrastructure. Bizarrely, for a financial firm, no one in a team of ~100 devs here has ever touched Spark-style tech other than me.
What should I do? Are my concerns overblown? Am I misunderstanding the power of KDB?
This feels like a ‘pick your poison’ situation. You’ve been told already you won’t be allowed to dump kdb; it’s probably embedded in your infra in a bunch of ways, and ripping it out is a no-go.
OK, so, you have data in kdb. What you’re doing right now (it sounds like) is using it as literally just a raw data store. That’s the worst way to use it; a lot of work went into making it very fast to run summarization/grouping/sorting/etc. right on the kdb servers, next to the data. Note that this is very unlike the Apache stack, where storage and compute are separate.
Unfortunately, you wrote a rust library that probably doesn’t really distinguish your kdb storage from, say, JSON files, so you are at a crossroads.
Option 1: Get some good data replication set up: clone the data over to your preferred generalized data lake tech, and run rust against it.
Option 2: Go through your rust code with a fine-tooth comb and figure out where exactly it’s doing things that cannot be done semantically in q/k. Start slimming down your Rust lib, or more exactly, rework what queries it’s sending and what shape of data it expects.
Option 3: dump your rust library and rewrite it in q or k.
Of these, I would be willing to bet that for an ‘ideal’ developer, meaning a 160+ IQ dev skilled in Rust, vs a 160+ IQ dev skilled in kdb, vs a 160+ IQ dev skilled in, say, Java + Spark, Option 3 is going to be by far the least resource-intensive in terms of deployed hardware, and the fastest / lowest latency.
That said, given where you’re at, as a principled Rustacean who’s looking at coming to grips with kdb realtime, I think I’d recommend you think hard about Option 2. By the end of Option 2, you will probably be like “Yeah, this could be all k, or nearly all,” but you’re likely going to have some learning to do.
Think of it this way, when you’re done, you’ll be on the other side of the cabal, and can double your base rate for your next gig. :)