binaries never contain references to anything else, and therefore can never be part of a reference loop, so it's safe to toss them out into a shared space and use reference counting to track them. you don't want to do it with small binaries, though: the overhead of shared allocation, locking, incrementing, and decrementing would probably be worse than just copying the small binary in the first place.
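a minimal sketch of that split (the names `Binary` and `SHARED_MIN`, and the 64-byte cutoff, are my own assumptions, not from any particular runtime): small payloads are copied inline, large ones go behind an atomically refcounted shared allocation, which is safe precisely because a binary holds no references and so can never sit in a cycle.

```rust
use std::sync::Arc;

const SHARED_MIN: usize = 64; // hypothetical size cutoff

#[derive(Clone)]
enum Binary {
    Inline(Vec<u8>),   // small: just copy the bytes on send
    Shared(Arc<[u8]>), // large: bump an atomic refcount instead
}

impl Binary {
    fn new(bytes: &[u8]) -> Self {
        if bytes.len() < SHARED_MIN {
            Binary::Inline(bytes.to_vec())
        } else {
            Binary::Shared(Arc::from(bytes))
        }
    }
}

fn main() {
    let big = Binary::new(&[0u8; 4096]);
    let sent = big.clone(); // "sending" a large binary only increments the count
    if let (Binary::Shared(a), Binary::Shared(b)) = (&big, &sent) {
        assert!(Arc::ptr_eq(a, b)); // same shared allocation, no copy made
        assert_eq!(Arc::strong_count(a), 2);
    }
}
```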
if you put complex reference-bearing data out in the void, whatever memory arena allocated it may still contain data that it references, so now you have to track those references and treat all of the shared items as new roots into your data. and if some other memory arena receives it and plucks out a value, do you copy it only then, or do you keep a reference back to the original? if you send a message back and form a loop, the gc for either process now has to trace through both, and you've destroyed your cheap gc cycles. and if you shunt all of the data into shared memory to avoid multi-gc reference loops, you've just created one big shared gc, with all the problems of big shared gcs.
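the alternative that avoids all of that is copy-on-send. a minimal sketch (all names here are mine, not from any real runtime): each process owns a private heap, and `send` deep-copies the value into the receiver's heap, so no value ever points across arenas, a reply can't form a cross-arena cycle, and each arena can be collected or dropped without looking at any other.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    List(Vec<Value>),
}

struct Process {
    heap: Vec<Value>, // stand-in for a per-process arena
}

impl Process {
    fn send(&self, idx: usize, to: &mut Process) {
        // full copy on message send: the receiver gets its own tree,
        // with no pointers back into the sender's arena
        to.heap.push(self.heap[idx].clone());
    }
}

fn main() {
    let a = Process { heap: vec![Value::List(vec![Value::Int(1), Value::Int(2)])] };
    let mut b = Process { heap: vec![] };
    a.send(0, &mut b);
    assert_eq!(a.heap[0], b.heap[0]); // equal contents...
    drop(a);                          // ...but b's copy survives a's arena
    assert_eq!(b.heap[0], Value::List(vec![Value::Int(1), Value::Int(2)]));
}
```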
the per-process small gcs with full copying between them mean you can do things like just dropping the arena when the process dies, without even checking liveness for anything in it. (you'll still need to run destructors if anything needs cleanup, like decrementing the refcounts on shared large binaries, but you can track the items that need that in a bitmap or something, and avoiding the trace is a win on its own.)
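a sketch of that "drop the arena, skip the trace" idea (the `Arena` shape is mine; a list of handles stands in for the bitmap): the arena records exactly which items need cleanup, so on process death only those destructors run, decrementing the shared refcounts, and everything else is freed wholesale with no liveness trace.

```rust
use std::sync::Arc;

struct Arena {
    bump: Vec<u8>,              // plain data: freed wholesale, never traced
    needs_drop: Vec<Arc<[u8]>>, // stand-in for the bitmap: only these need destructors
}

fn main() {
    let shared: Arc<[u8]> = Arc::from(&[0u8; 4096][..]);
    let arena = Arena {
        bump: vec![0; 1 << 16],
        needs_drop: vec![shared.clone()], // the arena holds one shared binary
    };
    assert_eq!(Arc::strong_count(&shared), 2);
    drop(arena); // runs only the recorded drops (decrementing counts), frees the rest outright
    assert_eq!(Arc::strong_count(&shared), 1);
}
```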
Minor nit: shared allocation, counter incrementing, and counter decrementing can all be done lock-free. They still need memory-fence operations (and retries under contention), with the associated performance hits, but no actual locking.
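A sketch of that point: each increment/decrement below is a single atomic read-modify-write with an ordering constraint, and there is no mutex anywhere. This mirrors what an atomically refcounted handle does internally.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let count = Arc::new(AtomicUsize::new(1));
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let c = Arc::clone(&count);
            thread::spawn(move || {
                // "acquire" and "release" a reference: atomic RMW ops, no lock taken
                c.fetch_add(1, Ordering::Relaxed);
                c.fetch_sub(1, Ordering::Release);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(count.load(Ordering::Acquire), 1);
}
```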
LWN.net just published an article where a comment (https://lwn.net/Articles/849239/) argued that there are no truly lock-free data structures on modern CPUs:
A couple of decades of writing concurrent algorithms has taught me that scalability is really defined by the frequency the concurrent algorithm accesses/updates shared data, not whether the software algorithm is considered "lockless" or not.
Nobody is claiming that memory fences are free or that livelock isn't possible with lock-free algorithms. (Except in corner cases, such as carefully constructed RISC-V code, where the architectural specification does guarantee forward progress for short, tight LL/SC loops with proper instruction alignment.)
The LWN comment mentions spinlocks improving the worst-case performance of some lock-free algorithms, but that criticism doesn't apply to atomic increments/decrements on multi-issue CPUs: the latency of a contended memory operation would completely hide the overhead of the add/subtract itself, making the atomic add/subtract perform no worse than a spinlock would. (And the comment doesn't apply at all where the algorithm can be written within an architecture's guaranteed-progress corner cases, where those exist.)