Learning Rust with Entirely Too Many Linked Lists

Animats · on Oct 2, 2017

Rust needs back pointers as a primitive. A back pointer and a forward pointer are locked in an invariant relationship (A points to B which points back to A). The borrow checker needs to know about that to check back pointers properly. Then you could do trees, doubly-linked lists, and various other graphs safely.

(The two basic constructs that are hard to express safely in Rust are back pointers and partially initialized arrays. There's also some trouble with concurrency primitives. See [1])

[1] https://people.mpi-sws.org/~dreyer/papers/rustbelt/paper.pdf

dbaupp · on Oct 2, 2017

Do you have any thoughts on how it might even begin to be possible to do back pointers safely, in ways that don't already exist in Rust?

I find it very confusing that you often complain about Rust's complexity, and then also complain that it isn't complex enough (i.e. doesn't have the features required statically reason about complex ownership graphs at compile time). It's fair that maybe there's complexity budget spent in places that you think aren't quite right, but I haven't picked up any reasonable plan for modelling back pointers (from anyone), nor many specific pain points that could be solved in your suggested way while maintaining the low-level nature of Rust.

Animats · on Oct 2, 2017

Do you have any thoughts on how it might even begin to be possible to do back pointers safely, in ways that don't already exist in Rust?

First, you need to be able to talk about them. A forward ref/back ref pair has to be identified with some form of declaration at compile time. The forward pointer is an ordinary owning pointer. The back pointer is special. So add a variable attribute "back", to be used like much like "mut", but only on a reference field of a struct. Let's see if that's enough.

The invariants to be enforced are:

- If b.bak points to a, a.fwd must point to b.

- If a.fwd points to b, b.bak must be Option(None) or point to a.

These can be enforced by some simple update checks when a.fwd or b.bak is updated. The normal sequence of events is something like:

    a.fwd = &mut b;
    b.bak = &back a;

or, when breaking a link,

    b.bak = Option(None);
    a.fwd = None; // drop ownership of b, which deallocates it.

You have to clear the back reference before dropping the object, to avoid a dangling pointer situation. The compiler might optimize that out, observing that b is dead at deallocation.

If you try to change b.bak to anything but Option(None) or a, that's an error. If you try to change a.fwd while b.bak is not Option(None), that's an error. This can usually be checked at compile time.

Open questions:

- Can the compiler figure out which field of a is the forward ref? That's the ref that owns b. Or is syntax needed in the declaration of the a struct for that.

- What about arrays of forward refs?

dbaupp · on Oct 2, 2017

You've missed a far more fundamental question: how does this interact with aliasing XOR mutability? That's the core reason why back pointers are hard: they result in complicated relationships between things, in that an &mut pointer to a child may not be independent of its owner. I suspect one would need annotations for different ways to handle this, like single threaded checking of mutability (which already exists: Cell and RefCell) or multi-threaded (Mutex, RWLock), essentially meaning that this would be inserting into the language things things that don't need to be there.

In any case, even switching to the correct Option::None syntax, your example doesn't actually make sense: &mut isn't an owning pointer, so it doesn't make sense to try to construct an owning tree with it. Maybe something like

  a.fwd = Some(b);
  a.fwd.as_mut().unwrap().bak = Some(&back a);

is what you're trying to get at? In either example, the mutability question appears: in yours, b is mutated while an outstanding &mut borrow exists, and in mine, a is mutated while an outstanding &back borrow exists.

This all seems like it might be possible to solve for some special cases, but pointer aliasing is one of the hardest parts of static analysis. It's essentially impossible to do in the general case, and my intuition is that back pointers get you very close to the general case. It's not even obvious to me how one can get a binary tree with back pointers to work well.

> The compiler might optimize that out, observing that b is dead at deallocation.

Only if b doesn't have a destructor (or has a destructor that can be sufficiently inlined).

Animats · on Oct 3, 2017

That's the core reason why back pointers are hard: they result in complicated relationships between things.

That's why they need built-in checking, rather than hacks using "unsafe". Use of back references is going to have to be restricted so you can't get two mutable handles to the same object. The trick is making that work through restrictions which can be checked locally.

dbaupp · on Oct 4, 2017

That's not my point. A general built-in checking scheme that handles all the variations of linked lists and trees will likely have to have quite a few nobs to tweak for the various trade-offs one might want to make, and will likely have to be very general to handle all the variations, making that general infrastructure significantly more complicated (both to implement in the unsafe block that is the compiler, and, likely, for users to understand) than focused `unsafe` blocks, which can, and often are, packaged up into safe interfaces (like reference counting with weak pointers).

> Use of back references is going to have to be restricted so you can't get two mutable handles to the same object. The trick is making that work through restrictions which can be checked locally.

Talk is cheap, and I'm not sure this talk actually says anything of note. Those "ideas" seem to be the most obvious first considerations for working out how this feature might even work:

- disallowing two mutable handles is the fundamental rule of Rust,

- the ability to check locally is a strong convention for programming language implementations, and a very strong one for Rust.

Having some concrete idea that addresses just those two points (no need to worry about syntax or anything like that) would be an improvement on the current situation: I haven't see any for this other than "runtime checking" (already exists in Rust) and "no back references".

---

In other words, vague assertions like "Rust should support back pointers" with syntax ideas are jumping way ahead. The language typically tries to provide building blocks to allow things to be built in libraries, rather than lumping opinionated features into the language. These features will often then require unsafe to create safe abstractions, but this isn't bad: the compiler itself is essentially one large unsafe block.

The place to start would be trying to create a safe back-pointer interface in a library, and seeing if there's anything that is too hard to make truly safe. This has in fact already been done, with Rc/Weak and Arc/Weak, but for zero-overhead/non-reference-counted scenarios, I think a good place to start would be a smart pointer pair like Rc but with a constructor `fn make<T>(val: T) -> (Forward<T>, Back<T>)`.

Animats · on Oct 7, 2017

It's hard to make compile-time checks for consistency between two different places in the code using a library.

gradschool · on Oct 1, 2017

An old school c programmer wants to know if there's a Rust idiom for recovering gracefully from failed allocations (i.e., when malloc returns NULL). Otherwise, what good is a type safe program that crashes due to a heap overflow? If this is the wrong kind of question, I'm listening.

Animats · on Oct 2, 2017

That's a legitimate criticism of a systems programming language. You'd like your OS kernel to never panic on an out of memory condition. (The serious end of the embedded world insists on this.) There are people working on this problem.[1]

Core Rust avoids almost all implicit heap allocations that aren't triggered by operations the programmer sees as allocations.[2] But there are many library functions which allocate and panic if the allocation fails.

[1] https://internals.rust-lang.org/t/notes-interviews-on-fallib... [2] https://www.reddit.com/r/rust/comments/69awhl/how_to_know_wh...

steveklabnik · on Oct 1, 2017

Rust's standard library assumes that allocation will not fail. That's just a library; the language doesn't inherently know things about heap allocation. You can always call an alllocator directly and look at what it returns.

Furthermore, we're working on an allocator interface that you could use with stdlib stuff that would let you handle it.

The TL;DR is basically "if you want to, you can. It's going to be even easier in the future."

gpm · on Oct 2, 2017

I'd also imagine that a allocator that returned a Result type would be quite ergonomic. Though you'd have to re-implement any datastructures that you wanted to use.

FreeFull · on Oct 2, 2017

You would probably want to be really careful about avoiding any sort of allocation in the error case, though.

bryanlarsen · on Oct 2, 2017

Where are you using this? Embedded? By default Linux is set to over commit memory, so malloc always be succeeds even if you're out of memory.

jjnoakes · on Oct 2, 2017

Not if process limits are set.

SeanDav · on Oct 2, 2017

> " By default Linux is set to over commit memory, so malloc always be succeeds even if you're out of memory."

How does Linux handle 1 trillion mallocs of 1MB, on a 8GB system with 500GB hard drive, without failing?

There is nothing magic about Linux, it still cannot allocate and use memory which physically and virtually is simply not there.

steveklabnik · on Oct 2, 2017

When overcommit is set, it's not malloc that fails; the OS kills your process. So, checking the result of your allocations will never let you handle this kind of error.

rcxdude · on Oct 2, 2017

Though if the allocation is ludicrously oversized (not sure the threshold, but 2x present memory will do it) it will just straight up fail without over-committing.

dmpk2k · on Oct 2, 2017

There are other *nix OSes out there (that don't make the mistake of overcommit by default).

jwatt · on Oct 2, 2017

> what good is a type safe program that crashes due to a heap overflow

Being "safe" is not about preventing any type of crash. It's about preventing exploitable crashes and behavior.

emerged · on Oct 2, 2017

I feel that Rust is something of a religion at this point. Curious how many downvotes I might accrue for expressing that perspective. Might be wrong, but my sense has been that challenges with Rust are written off in an off-hand way and its features exaggerated.

Glad to see promising new languages pushing progress forward, though. I'm head high in C/C++/asm for decades so many of the complaints about those languages fall flat with me since I'm over the hurdles via sheer years of experience.

mikebenfield · on Oct 2, 2017

> challenges with Rust are written off in an off-hand way and its features exaggerated

I think people are pretty straightforward about Rust's challenges. The borrow checker can be a hurdle and probably needs more work (although it is getting a big boost soon with non-lexical lifetimes). Compile times are not great. The macro system is powerful but can be confusing. People from some mainstream languages may have trouble wrapping their heads around the type system.

Regarding its features, these attributes:

* it can do low level, fast code;

* it's not garbage collected, but is memory safe;

* it has zero cost abstractions (and, conversely, it doesn't have ridiculous, unnecessary, slow abstractions);

* it doesn't ignore the last thirty years of innovation in programming languages (by which I mean it has a modern type system, closures, variant types, and a few other things);

* it doesn't have 30-40 years of cruft;

* it has a standard and very high quality build/packaging system;

* its developers are thoughtful and have good taste;

* it has a reasonably big community and has a good chance of catching on even more widely;

are huge. You say you think its features are exaggerated, but to my mind, what's to exaggerate? That list looks pretty great to me.

posterboy · on Oct 2, 2017

half of these "features" are your own subjective opinion

erichdongubler · on Oct 2, 2017

And those are...?

It's hard to refute or agree with you when I don't know what you're talking about. :)

mikebenfield · on Oct 2, 2017

They are? I'll grant you the one about the taste of the developers. Which other one is opinion?

posterboy · on Oct 2, 2017

after zero cost abstractions you got carried away.

pornel · on Oct 2, 2017

As someone who's been using Rust in production for over a year, and C on and off for 17 years, I can assure you Rust really delivers what it promises.

The challenges are not exaggerated, but they do have solutions, so they're not deal-breakers in practice.

The learning curve is steep. It takes effort, but once you get it (and unlearn some C-isms), Rust is a quite productive language.

Some patters and data structures are not understood by borrow checker, so you have to fall back to coding as-safely-as-any-C program. However, thanks to the practice of wrapping risky bits in safe abstractions, they remain only a small fraction of any program.

It might feel that writing any "unsafe" code defeats the promise of Rust, but it doesn't. The concept of safe abstraction is quite powerful, and makes sense even for wrapping 100% C libraries, where you can encode informal constraints form the manual as code, and automatically prevent library API from being misused.

xiphias · on Oct 2, 2017

It's the first time since I'm alive a language that is higher level than C/C++ can compete with C/C++ in the benchmarks.

If you have experienc with large C++ code bases you probably know how impossible it is to write safe code. Also after you use only unique/linear types, single inharitance (which is advised when working with a big code base), you are writing something that's easier to express in Rust than in C/C++.

Inlined asm will stay of course, Rust doesn't try compete with it.

blub · on Oct 2, 2017

Safety is many shades of grey instead of being black and white and it is intertwined with the concept of risk.

The fact that large C and C++ code bases exist and the products they drive are successful proves that it is possible to write code in those languages that offers a good enough risk/reward balance.

This is what many Rust fans are missing: you want to sell 100% memory safety, but you're assuming that everyone wants 100% instead of 80 or 90% and that they're willing to pay the price of switching.

steveklabnik · on Oct 2, 2017

If you want to work in a memory-unsafe superset, you can; that's what unsafe is.

blub · on Oct 2, 2017

It looks like you didn't understand what I meant. Each company/project/person needs to make a decision:

* I can work in C and C++, which I can get to X% safe using the techniques and tools A, B, C at the cost of M.

OR

* I can invest in learning Rust and integrating it with my environment to get to 100% at the cost of N.

Is the difference from X% to 100% big enough to pay N instead of M? How big of an X do I need? Is it even possible for me to use A, B and C?

This is how an informed decision would look like, but Rust proponents just keep repeating that X < 100.

rcxdude · on Oct 2, 2017

Not saying there's a cost-benefit analysis to be done, but there's a pretty nonlinear jump from under 100% to 100%, however. In my experience existing tooling for static analysis on C and C++ tends to be it can point out a lot of bugs, but it can't prove that there's are not bugs (of a particular class). This helps me catch things I miss, but it doesn't free me up from worrying about them (X% safe doesn't been I don't have to worry about X% of the code, which would be great, it means all of the code has a (1 - X%) chance of containing a safety bug, which is far less useful).

This cost-benefit analysis also is substantially different between switching an existing codebase and starting a new one. The fact that Mozilla considers it worthwhile to switch an extremely large C++ codebase to Rust is worth noting.

steveklabnik · on Oct 2, 2017

Yeah, we mean slightly different things. I agree with what you're saying, but what I'm saying is about the cost of N; if you find yourself fighting with Rust's rules, you can drop back to unsafe, and use the rules you already know. Well, similar ones anyway. This reduces the cost of N.

blub · on Oct 3, 2017

N however can only be decreased by improving Rusts UX. If one is able to write unsafe code in Rust, they already paid the full N.

jdub · on Oct 2, 2017

> my sense has been that challenges with Rust are written off in an off-hand way and its features exaggerated

Not by the Rust development team, which is what matters. They are keenly aware of the challenges, and have done a huge amount of work (particularly over the last six to twelve months) working on ergonomics and developer experience.

It's a primary focus for the Rust community, now and into the future.

Many (perhaps most?) of the specifications lined up for the current "impl period" (a time for building rather than designing) are related to developer ergonomics in some form.

derefr · on Oct 2, 2017

Question: does a hash table with linked-list chaining count as a non-niche use-case for linked lists, or is the linked list just counted as an implementation detail rather than a separate data structure in that context?

(Though, Rust's own std::collections::HashMap just does robin-hood hashing instead of separate chaining, and everyone's just going to use that, so "what data structure should a separately-chained hash-table use in Rust" is mostly a moot question.)

PaulHoule · on Oct 1, 2017

Makes Rust look hard.

steveklabnik · on Oct 1, 2017

Getting this code right is hard in many languages in Rust's space; Rust just forces you to get it right, whereas they let you get it wrong.