Ropes are great when constructing a large string, but can be much slower and mor...

TheLoneWolfling · on July 22, 2014

You can just fall back to a "rope" consisting of a single node for small strings.

One of the things I really like about what I call "pseudo-immutable" ropes (ropes that are not truly immutable, but to all appearances are - when you take a slice it modifies the original rope to maximize structure-sharing.) is that you get immunity to those subtle memory leaks on slicing "for free".

I agree fully on length-prefixed strings. They are a lot better than null-terminated strings, in a bunch of ways. Regexes using DFAs though can be... problematic. You really want a NFA that's converted to a DFA on-the-fly, with caching, or else you can end up with exponential blowup when constructing the DFA (a[<some large character class>]{1000}a, for example.). A DFA-based regex also doesn't handle a number of things.

As for slicing, I had a thought on that matter. Make slices reference the main array weakly, and have (weak) backreferences to any slices made of an array. When an array comes up for GC, if it has any slices made of it, copy out the slices to new arrays. There are a bunch of heuristics that can be tuned here, of course (don't do weakref / backref at all on smaller arrays, don't copy out slices if it won't reduce memory usage, etc.), but it prevents the worst case at least.

The big advantage of slices is that it means that programmers don't need to sacrifice readability for efficiency. It's much easier to be able to pass in arr[7:10000] to a function than to either have to modify the function to take index parameters (which you can't always do) or have to do a temporary copy. It also prevents the O(n^2) behavior associated with doing simple tokenizing (the change to Java's substring behaviour, for example)

I disagree strongly with "Encoding happens at program boundaries". The (major) problem with that is that there becomes no good way to pass strings through the language without going through an encoding pass. Try to implement cat in Python 3, for example. Encoding should happen when you ask for it.