I don't know the JIT internals, but setting them aside, I'd point the finger at setBit(x), which amounts to "load a byte from memory into a register, set one bit in the register, write the register back to memory". The new version is just "write a byte to memory", which is much faster at the cost of more storage (a factor of 32 or 64, probably 64). I am surprised, though, that caching doesn't mitigate this.
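To make the difference concrete, here is a minimal sketch, assuming the flags were packed into a long[] (64 per word) and then replaced by one array element per flag. The names FlagStores, setBit, getBit and setFlag are hypothetical and not the code from the question. The packed setBit has to load the word, OR in the bit and store the word back, while setFlag is a single store that never reads the old value.

```java
// Hypothetical sketch: packed bit flags (read-modify-write) vs. one element per flag (plain store).
public class FlagStores {

    // Packed: 64 flags per long. Setting a flag loads the word,
    // ORs in the bit, and writes the whole word back.
    static void setBit(long[] words, int index) {
        words[index >>> 6] |= 1L << (index & 63);
    }

    // Reads a packed flag by masking out its bit.
    static boolean getBit(long[] words, int index) {
        return (words[index >>> 6] & (1L << (index & 63))) != 0;
    }

    // Unpacked: one element per flag. Setting a flag is a single store; the old value is never read.
    static void setFlag(boolean[] flags, int index) {
        flags[index] = true;
    }

    public static void main(String[] args) {
        int n = 1000;
        long[] words = new long[(n + 63) / 64];
        boolean[] flags = new boolean[n];

        // Mark every third index in both representations.
        for (int i = 0; i < n; i += 3) {
            setBit(words, i);
            setFlag(flags, i);
        }

        // Both stores should hold exactly the same set of flags.
        int mismatches = 0;
        for (int i = 0; i < n; i++) {
            if (getBit(words, i) != flags[i]) {
                mismatches++;
            }
        }
        System.out.println("mismatches: " + mismatches); // prints "mismatches: 0"
    }
}
```

Both versions give the same answers; the difference is only in the memory traffic per write. Note that the element type (and so the exact storage ratio) depends on what the replacement code actually used; boolean[] is just one choice for the sketch.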