Regarding your question: This idea has been sitting on my todo list [1] for over 3 years now. I don't usually work with large data files in my day job, but occasionally working with the odd CSV on macOS frustrated me enough to put it on the list. And now that spec-driven development has matured enough to be actually useful, there's no real excuse not to build this.
I genuinely see myself caring for this long-term. I'm comfortable with the scope, and there seems to be some real interest from the community.
Cell is very much tailored towards what you're looking for. My vision was "Excel but it's (Neo)Vim". Editing files should feel just as smooth as looking at the data. I believe Xan and Cell could actually pipe into each other quite nicely for rendering more complex data.
I'd really appreciate it if you took the time to report the bugs you encountered. Looking forward to them.
Thanks for trying it, and the spreadsheet repo is great prior art — I'll dig through it.
Drag-fill. Not yet, but the parts are mostly there. The formula layer already carries abs_col/abs_row through tokenization → AST → eval, so $A2 + B$1 parses correctly today; what's missing is the editing op that copies a formula across a range and shifts only the relative components. Opened #17 for it. The tricky part isn't the rewrite, it's the keybinding — Vim doesn't have an obvious idiom for "drag," so I'm leaning toward a visual-selection + fill-from-anchor key (Y is a candidate) or a :fill command. Open to suggestions if you have a feel for what works in a modal editor. It also needs to land on top of the bulk-undo work in #8/#9 so a fill is one undo step instead of N.
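Rough sketch of the rewrite step I have in mind (the struct here is illustrative, not the real formula types): a fill by (d_col, d_row) shifts only the relative components of each reference and leaves anchored ones alone.

```rust
#[derive(Clone, Copy)]
struct CellRef {
    col: i64,
    row: i64,
    abs_col: bool, // true for $A2 -> column anchored
    abs_row: bool, // true for B$1 -> row anchored
}

// Applied to every reference node in the copied formula's AST.
fn shift_for_fill(r: CellRef, d_col: i64, d_row: i64) -> CellRef {
    CellRef {
        col: if r.abs_col { r.col } else { r.col + d_col },
        row: if r.abs_row { r.row } else { r.row + d_row },
        ..r
    }
}
```

Filling `= $A2 + B$1` down one row would then produce `= $A3 + B$1`; filling it one column to the right produces `= $A2 + C$1`.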
bar() / inline visualizations. Love it, opened #18. The interesting design call is whether BAR returns a CellValue::Visual { … } that the renderer dispatches on (correct under column resizes, but ripples into CSV export, copy/paste, and how SUM treats a visual cell), or whether it just returns a string of block-drawing chars at eval time (trivial to add, but width gets baked in at compute time which is wrong). The first is the right answer; the second is a tempting MVP. SPARKLINE(range) is the natural follow-up once the abstraction settles.
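For the tempting MVP, the string-of-blocks version could be as small as this (function name and width handling are illustrative; the baked-in width is exactly the limitation described above):

```rust
// Illustrative MVP only: BAR(value, max) rendered as block characters at
// eval time, so the width is fixed at compute time rather than render time.
fn bar(value: f64, max: f64, width: usize) -> String {
    const EIGHTHS: [char; 8] = ['▏', '▎', '▍', '▌', '▋', '▊', '▉', '█'];
    let filled = (value / max).clamp(0.0, 1.0) * width as f64;
    let full = filled.floor() as usize;
    let mut s = "█".repeat(full);
    let rem = ((filled - full as f64) * 8.0).round() as usize;
    if rem > 0 && full < width {
        s.push(EIGHTHS[rem - 1]); // partial block for the fractional eighth
    }
    s
}
```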
Honestly, the current implementations are pretty naive — they pass the tests and feel snappy on the small sheets I work with, but they'd buckle pretty quickly under real load. Most of what you're asking about is already on the tracker; I opened a batch of issues citing your comment as the prompt.
Recalculation. Right now it's a full recalc on every edit: recalculate collects all formula cells, computes in-degrees across the whole formula set, topo-sorts, and evaluates top to bottom. The dirty flag gets propagated by mark_dirty but isn't actually used to prune work. It's also re-parsing every formula from its raw string on every pass. Two issues cover this: #8 introduces a batch boundary so paste/fill/CSV import trigger one recalc instead of N, and #7 adds criterion benches so we can actually tell whether the parser, the BFS, or the topo sort is the hotspot before optimizing. AST caching on Cell is the obvious next step once #7 confirms parsing dominates.
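The minimal caching version I'm picturing looks roughly like this (Expr and parse are stand-ins here, not the real types): parse once, invalidate on edit, so a recalc pass stops re-parsing every formula from its raw string.

```rust
#[derive(Clone)]
enum Expr {
    Num(f64),
    // Ref, Range, BinOp, Call, ... in the real thing
}

fn parse(src: &str) -> Result<Expr, String> {
    // stand-in for the real parser
    src.trim_start_matches('=')
        .trim()
        .parse::<f64>()
        .map(Expr::Num)
        .map_err(|e| e.to_string())
}

struct FormulaCell {
    raw: String,
    cached_ast: Option<Expr>, // None => needs re-parse
}

impl FormulaCell {
    fn set_raw(&mut self, raw: String) {
        self.raw = raw;
        self.cached_ast = None; // an edit invalidates the cache
    }

    fn ast(&mut self) -> Result<&Expr, String> {
        if self.cached_ast.is_none() {
            self.cached_ast = Some(parse(&self.raw)?);
        }
        Ok(self.cached_ast.as_ref().unwrap())
    }
}
```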
Dependency tracking. The bigger smell is in extract_deps — a range like SUM(A1:A1000) literally enumerates 1000 cell positions into the dep graph, with a HashSet per cell on each side. Fine at hundreds of cells, a disaster at hundreds of thousands. Range expansion is one of the bench cases in #7; the proper fix (interval-keyed deps so ranges stay first-class instead of fanning out) doesn't have its own issue yet — I should open one, since #7 only measures the problem.
Undo/redo. This is the worst offender right now. UndoEntry only had a single-cell variant until very recently; #12 added MultiCellEdit, but #13 tracks two destructive paths I missed — visual-mode d and p/P paste — that still don't push undo entries at all. #9 is the broader coalescing story (one dd = one undo, CSV import = one undo, etc.), tied to the batch mechanism from #8 so a single transaction produces a single undo entry. sort_by_column is also non-undoable today and belongs in that bucket.
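A sketch of what that coalescing could look like on top of the #8 batch boundary (names invented for illustration, not the current API): edits recorded inside an open batch collapse into a single MultiCellEdit entry.

```rust
struct CellEdit { col: u32, row: u32, before: String, after: String }

enum UndoEntry {
    SingleCellEdit(CellEdit),
    MultiCellEdit(Vec<CellEdit>),
}

#[derive(Default)]
struct UndoStack {
    entries: Vec<UndoEntry>,
    pending: Option<Vec<CellEdit>>, // Some(..) while a batch is open
}

impl UndoStack {
    fn begin_batch(&mut self) {
        self.pending = Some(Vec::new());
    }

    fn record(&mut self, edit: CellEdit) {
        match &mut self.pending {
            Some(batch) => batch.push(edit),            // part of a transaction
            None => self.entries.push(UndoEntry::SingleCellEdit(edit)),
        }
    }

    fn end_batch(&mut self) {
        if let Some(batch) = self.pending.take() {
            if !batch.is_empty() {
                // one dd / paste / CSV import => one undo step
                self.entries.push(UndoEntry::MultiCellEdit(batch));
            }
        }
    }
}
```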
Larger CSVs. Storage is HashMap<CellPos, Cell> — fine for sparse sheets, but it carries per-cell overhead; for very wide imports a column-oriented or arena layout would pay off. I haven't profiled it though, so this is speculative; the dependency-graph blowup will hurt before raw storage does. #7 includes a 100k-row CSV load case to put numbers on it.
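If profiling ever does point at storage, the kind of layout I'd reach for is something like this (purely illustrative, not what Cell does today): one sparse, ordered row map per column, so whole-column work iterates one map instead of scanning a global HashMap.

```rust
use std::collections::BTreeMap;

#[derive(Default)]
struct ColumnStore {
    // col index -> (row index -> raw cell contents)
    cols: BTreeMap<u32, BTreeMap<u32, String>>,
}

impl ColumnStore {
    fn set(&mut self, col: u32, row: u32, value: String) {
        self.cols.entry(col).or_default().insert(row, value);
    }

    fn get(&self, col: u32, row: u32) -> Option<&String> {
        self.cols.get(&col)?.get(&row)
    }
}
```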
And #10 is the meta-issue to lift all of this out of source comments and into actual architecture docs, which I probably should have done before posting.
So: nothing here scales today, but the architecture splits cleanly enough that none of it needs a rewrite — AST caching, dirty-set recalc, range-aware deps, and grouped undo are the four threads, and most have issues attached.
Range as first-class is the right priority. Pattern that works: keep ranges as single AST nodes (one dep edge per range, not N), then use interval trees on the reverse side so a cell change at C5 becomes "find intervals covering (C, 5)" instead of scanning all formulas. Pairs well with column-oriented storage if you go there.
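Sketching that reverse-side lookup with a plain scan standing in for the interval tree (types simplified for illustration): ranges stay first-class dep edges, and invalidation asks "which range deps cover the changed cell?" instead of enumerating every cell in the range.

```rust
type CellPos = (u32, u32); // (col, row)

struct RangeDep {
    col_lo: u32,
    col_hi: u32,
    row_lo: u32,
    row_hi: u32,
    formula_at: CellPos, // the cell whose formula reads this range
}

fn dependents_of(changed: CellPos, range_deps: &[RangeDep]) -> Vec<CellPos> {
    let (c, r) = changed;
    range_deps
        .iter()
        .filter(|d| d.col_lo <= c && c <= d.col_hi && d.row_lo <= r && r <= d.row_hi)
        .map(|d| d.formula_at)
        .collect()
}
```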
On the AST caching point, worth caching by structural hash of the parsed expression, not the source string. Copy-paste with relative references produces different strings but identical AST shape, which hits a lot in financial-model-style workbooks where parallel columns share structure.
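Concretely, one way to get that collision (types simplified, not Cell's actual AST): normalize relative references to offsets from the formula's own cell before hashing, so =A1+B1 in C1 and =A2+B2 in C2 land on the same cache key.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Hash)]
enum Expr {
    Num(u64),                        // f64 literal stored as its bit pattern, so the node stays hashable
    RelRef { d_col: i64, d_row: i64 }, // offset from the cell that owns the formula
    Add(Box<Expr>, Box<Expr>),
}

fn structural_hash(e: &Expr) -> u64 {
    let mut h = DefaultHasher::new();
    e.hash(&mut h);
    h.finish()
}
```

Absolute references ($A$1) would keep their absolute coordinates in the node, so only genuinely identical shapes share a cache entry.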
Also worth a look: the "calculation chain" docs in Microsoft's OOXML SpreadsheetML spec describe how they serialize the dep order in xlsx files. Different problem (persistence vs runtime) but the data model is informative for what level of granularity ends up being practical.
Use numerically stable algorithms for SUM and AVERAGE: https://github.com/garritfra/cell/issues/43
Keystrokes doubled in Windows Terminal: https://github.com/garritfra/cell/issues/44
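On the SUM/AVERAGE point, the standard fix is compensated (Kahan) summation — a minimal sketch, not necessarily what #43 should land on:

```rust
// Kahan summation: carry a running compensation term so small addends
// aren't swallowed by a large running sum.
fn kahan_sum(values: &[f64]) -> f64 {
    let mut sum = 0.0;
    let mut c = 0.0; // compensation for lost low-order bits
    for &v in values {
        let y = v - c;
        let t = sum + y;
        c = (t - sum) - y; // recovers the part of y that got rounded away
        sum = t;
    }
    sum
}

fn kahan_mean(values: &[f64]) -> f64 {
    if values.is_empty() { 0.0 } else { kahan_sum(values) / values.len() as f64 }
}
```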
Thanks!