In my view, what qemu produces is perfectly acceptable. On the other hand, applying the optimization approach of Java's HotSpot or Python's Psyco to qemu might be interesting. (Something like that was already done by HP's Dynamo, which could "emulate" PA-RISC code on PA-RISC faster than direct execution.)
Architecture differences are still significant, but for what you really want to accomplish in this case, ARM is reasonably similar to i386.