Rust data structures with circular references (thegreenplace.net)
138 points by picture on Nov 13, 2021 | hide | past | favorite | 42 comments


I think there is a slight mistake in the article:

> So how can a reference in a child assume the same lifetime? It can't.

Well actually, it can, because the owning struct itself (Node) can outlive any borrow. You won't be able to mutate the parent though, as you can't mutate any value that has an active immutable borrow. That's the problem: you can't add the child node, as adding it would mutate the parent while there's an active immutable borrow (in the child node).

This reddit comment[0] has one possible solution[1] to this problem: not saving all of those parent references in the nodes themselves, only adding one when you're retrieving a node. It also seems a lot better, since there's less in-tree data to change if the structure itself changes.

[0] https://old.reddit.com/r/rust/comments/qstlto/rust_data_stru... [1] https://play.rust-lang.org/?version=stable&mode=debug&editio...


Thanks, you're making a good point. I like how you phrased the problem here; I'll ponder updating the wording of the post slightly to reflect that.


Similar to the second approach, you can have better ergonomics and performance by using a memory arena library like slotmap. A doubly linked list implemented using slotmap: https://github.com/orlp/slotmap/blob/ce6e1e02bb2c2074d8d581e...


Hi, author of slotmap here if anyone has any questions.

Slotmap is essentially the Vec + indices solution, while allowing memory to be reused. It is still completely safe, as it automatically versions each slot, and checks the version contained in the key when indexing.


Does slotmap shrink and reclaim allocated memory when most of the elements it stores get deleted?


SlotMap literally cannot, because it must keep track of versions of the used slots, and the slots include the memory space for the data.

DenseSlotMap sort of can. It still can't reclaim the memory used by empty slots for the same reason (at a cost of 8 bytes per slot), but can reclaim the memory used by the actual elements stored.

So if you store 1 million items at one point in a DenseSlotMap, and then clear them all and then shrink to fit (which is actually missing from the API, but I'll fix that soon) you will waste 7.6 megabytes on the empty slots but nothing on the actual data that used to be stored.


Yes, if you use the dense version of it (DenseSlotMap). It requires an additional indirection per lookup, but makes up for it by being much more cache-friendly. Looking at a simple benchmark (https://www.reddit.com/r/rust/comments/gfo1uw/benchmarking_s...) it seems that insertion/deletion takes a hit but accesses are faster (although YMMV on real-world use cases)


This reddit post is such a shame, and I wish people would stop referring to it. The benchmarking code is not apples to apples at all in many places.

E.g. to benchmark removals they used `container.remove(index)` for data structures that support indices but for slotmap they used `slotmap.remove(keys[index])` which unfairly adds another layer of indirection.


If there’s a lot of churn on the tree, does fragmentation rear its head? Or does the version in the key come into play here?


There is no fragmentation whatsoever in the traditional sense (unusable gaps left due to size mismatches), because a slotmap only stores a single type of value. Thus every slot is interchangeable and memory can always be reused.

For iteration however, there can be holes that need to be skipped, if the current number of elements in the slotmap is significantly lower than the maximum capacity. If iteration needs to be very fast (e.g. for game engines) I do have a solution for that, which is the DenseSlotMap. It uses one extra layer of indirection for random access, but stores the actual data values contiguously in a vector, thus iteration is always fast.


Yes, "holes" is a better term. I was wondering whether double pointers were used, and it looks like I was right. Thank you for the reply!


Slotmap seems to have a marketing problem. Why isn't it being promoted?


I don't know, only have some theories.

1. The name isn't particularly catchy or descriptive. It is the correct name for the data structure, but not too many people know the data structure.

2. People don't even know what they're missing. It's not a very Google-able problem to begin with. Slotmap provides an interesting solution to (circular) ownership and safe allocator / weak pointer design problems, but people don't recognize that they're having them or that slotmap could help.

As an example of this, the doubly linked list example (https://github.com/orlp/slotmap/blob/master/examples/doubly_...) can safely remove nodes from the linked list given their handle, in O(1), even from the middle, completely safely and correctly, even in the presence of double deletions or ABA memory re-use. You can't replicate this with just pointers, without introducing heavy refcounting solutions.


Thanks for the pointer!

My goal in the post was to avoid "just use this crate" approaches, because clearly there are dozens of crates that could help with this... even crates that implement BSTs :-) The goal was to look at the underlying implementation of such crates, and as the other comment pointed out it seemed like slotmap and others used approaches that mapped to one of the three I outlined in the post.


> My goal in the post was to avoid "just use this crate"

I very much appreciate this sentiment and it inspires me to actually think about blogging again. Thank you.


Another interesting variant is a cross between 1 and 2: centralize ownership of the nodes, but continue to use references rather than handles.

You can't do this with a Vec<Node> (unless you can create all the nodes up front before initializing them) because resizing the Vec would invalidate references into it.

But if you use a different "allocator" that leaves its contents in-place, you can make all the links into references with a single lifetime.

The remaining tricky part is, still, deallocation- you can't fully free a node in a structure like this because they all share a lifetime. But you can put it on a free list for future node allocations, with similar risks to the handle approach.


> But if you use a different "allocator" that leaves its contents in-place, you can make all the links into references with a single lifetime.

E.g., something like Bumpalo?


I lean towards an arena.


Tree and Node must own their subnodes, but a node only needs to point to its parent; it cannot own the parent.

  #[derive(Debug)]
  struct Node {
    data: i32,

    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
    // Non-owning back-pointer to the parent; dereferencing it requires unsafe.
    parent: Option<*mut Node>,
  }
  
  impl Node {
    pub fn new_root(data: i32) -> Self {
        Node { data, left: None, right: None, parent: None, }
    }

    pub fn new_leaf(data: i32, parent: *mut Node) -> Self {
        Node { data, left: None, right: None, parent: Some(parent), }
    }
  }

  #[derive(Debug)]
  struct Tree {
    root: Option<Node>,
  }

  impl Tree {
    pub fn new() -> Self {
        Tree { root: None }
    }

    pub fn insert(&mut self, data: i32) -> bool {
        match self.root {
            Some(ref mut node) => Tree::insert_at(node, data),
            None => {
                self.root = Some(Node::new_root(data));
                true
            }
        }
    }

    fn insert_at(current_node: &mut Node, data: i32) -> bool {
        match (
            current_node.left.as_deref_mut(),
            current_node.right.as_deref_mut(),
        ) {
            _ if data == current_node.data => false,

            (Some(left_node), _) if data < current_node.data => Tree::insert_at(left_node, data),

            (None, _) if data < current_node.data => {
                let new_node = Node::new_leaf(data, current_node);
                current_node.left = Some(Box::new(new_node));
                true
            }

            (_, Some(right_node)) => Tree::insert_at(right_node, data),

            (_, None) => {
                let new_node = Node::new_leaf(data, current_node);
                current_node.right = Some(Box::new(new_node));
                true
            }
        }
    }

  }

  pub fn main() {
    let mut tree = Tree::new();
    let data = [5, 3, 65, 123, 6, 11, 3, 1, 5, 42];

    for i in data {
        tree.insert(i);
    }

    println!("{:?}", tree);
  }


I had this approach in a draft as well! In fact, I tried this before making all links `*mut Node`. But coding the removal methods turned out to be really hard - I couldn't fight through the borrow checker. So I figured if I already have raw pointers and unsafe blocks, might as well go all the way :)


I tried to implement removal of elements from the tree, just as an exercise, and I see no problems with the borrow checker at all. I've used Rust since version 1.0, so the borrow checker is my friend.

Rebalancing the left branch is a bit complicated, so I implemented just a detach() method, which takes the element out of the tree together with its subbranches. From there, you can implement your favorite deletion method. In production I would just mark a node as deleted and compact the tree later, to get predictable timings.


Could you keep parent reference in Option<Rc<Node>> ?


`Rc` implies ownership, albeit a shared one. So if a heap location is held by an `Rc`, all references to it should be either `Rc` or `Weak`. You can't have an `Rc` and a `Box` for the same data, IIUC: a `Box` is unaware of the `Rc` and will drop its data when it is dropped itself.


In Haskell and other purely functional programming languages, one encounters exactly the same challenge. These communities have worked out sophisticated, well-documented solutions. Facing any problem, one should always ask "Who else has already encountered this problem?" To ignore their work is to reinvent the wheel. As a Haskell programmer exploring Rust, I was thrilled to recognize this parallel.

One gains persistent data structures. In many problems, "reuse" is far more efficient than carving out each new instance from scratch.

I've never understood the evangelists that claim one can't learn functional programming after any exposure to conventional programming, but perhaps the prejudices run deeper than I've recognized. How does one characterize a rejection out-of-hand of persistent data structures?

"One-use plastic bag syndrome"


Is it possible that you might be misunderstanding the "challenge" being faced here?

In Rust the "problem" is that the safe subset of the language enforces ownership rules, which make directly representing circular pointer structures impossible. Of course, you can use "unsafe" and it becomes as simple as in C. This tradeoff is well known since the inception of Rust.

In Haskell, the "problem" is that all functions need to be pure, so if you need to mutate a data structure, you actually have to create a new instance, sharing the bulk of the structure of the previous instance to be efficient.

This is not the same thing, but you seem to think it is. In Rust, persistent data structures still use reference counting to manage lifetime of the objects inside the structure (see rpds), which is also what one of the solutions listed here does.


> This is not the same thing, but you seem to think it is.

They are only different superficially.

The deeper problem here is that they both assume "a new, valid value can only reference old, existing values". (For the sake of simplicity, let's pretend for a moment that Haskell does not have laziness.) This is reasonable in almost any language because it enables easy inductive reasoning. Rust forbids circular references for exactly this reason; the whole borrow check is built on this principle: a new pointer becomes invalid when the old pointer becomes invalid. For the same reason, you couldn't create them in Haskell either if there were no laziness.

Techniques in Haskell, like tying the knot, work just because laziness makes it possible to reference values in the future. Unfortunately, it also creates possibilities for bottoms, which renders it unusable for a sound proof system -- Rust cannot do the same thing because borrow check should be sound and terminating.

It's easy in C++ because C++ does not guarantee the validity of objects. You know, NULL is _always_ an invalid pointer. Rust and Haskell both guarantee that a constructed object is always valid (of course, only where the escape hatches are not used).


I thought maybe Haskell also prohibited circular references, but looking it up they do something called “tying the knot”, which is enabled by the language’s laziness


You can't tie the knot in Rust. Tying the knot creates a self referential structure, which creates an ownership loop.


I come from a functional background (significant work in SML and Ocaml) and went into learning Rust excited to implement all my favorite persistent data structures, only to find myself foiled by borrow and lifetime madness similar to what’s described in the naive implementation in the OP. Where I could define a simple tree in one line of SML, I struggled to define one at all in Rust!

(I think my problem actually just came down to the lifetime on the object holding the tree itself and the way I was instantiating nodes; but when those lifetime compiler warnings popped up my eyes would glaze over and I knew it was time to put rust down for the day)

I imagine Haskell programmers have a step up because the Monad system is significantly less forgiving than SML’s look-the-other-way approach to side effects, and in this sense Haskell and Rust are more similar than many other functional languages.


Looking for precedents is always good advice, but the essential issue is that even if you don't re-invent the wheel, the "wheel" that works in functional programming territory, for certain tasks (e.g. the binary tree with parent pointers), is more complicated, more verbose, and comes with downsides.

People ignore existing work (at least at first), and assume they can make it simpler themselves, because they're surprised it _needs_ to be sophisticated.

EDIT: That said, the article does a great job of explaining the problems and the potential solutions.


What about using ghost cell? https://github.com/matthieu-m/ghost-cell


Has any ghost cell stuff gotten closer to production-ready in the last couple months? It seems like it's still pretty early on in its development lifecycle.


I haven't heard of it getting closer to production-ready, but I hope articles can cover this so more people can look into it.


I remember seeing a mature-ish project that was architectured around them. I think it was JS-related.


I hadn’t heard of this before. Is this a drop in replacement for RefCell that is compile time checked?


Very interesting. Rust is certainly an interesting language and this angle made me understand more of it.


Nice post! Might be worth mentioning that some data structure implementations already exist in std or as crates, e.g. std::collections::BTreeSet and petgraph. While it's good to understand how Rc/Weak/raw pointers work, I always check if there is a battle-tested version of what I need before reaching for them.


As a slight nit-pick: for the third option, instead of having raw pointers in the struct, I'd have `Option<NonNull<Node>>`. It compiles to the same thing thanks to niche optimizations, but it has much better safety and semantics.


By the title of the post, I was expecting to see an example of where to use Pin<> and was surprised to not even see it mentioned...


Rust really doesn't like pointers, and as such, modeling graph data structures beyond something like a DAG is a huge pain in the ass: there's barely any idiomatic way to make a mutable self-referential, undirected graph with std only.


yeah, that's by design. how do you check borrows in a mutable self-referential data structure? by definition, you don't.


I know. I've just been going through an algorithms book and trying to implement them, and the graphs were easily the hardest to implement in Rust, despite the pseudocode being simple as peanuts (and trivial in a GC language).



