Mozilla JS engineer here. It goes without saying that I don't speak for Mozilla here, or even for the rest of the JS team. And I may misremember stuff, so don't cite me too closely.

Benchmarking is very difficult, and not just in JS. I worked on PHP before, and benchmarking was crap there too. Same in Lua [0]. Same in Python until the Unladen Swallow team packaged up some real-life applications. Even in Java, things were bad: SPECjvm98 so mischaracterized real Java programs that the DaCapo group formed and built its own benchmark suite so research optimizations could be evaluated correctly.

What is an ideal web benchmark? Tough question. Parts of SunSpider and V8 tried to look at what the web was and distil it into a useful benchmark - that's a good start [5]. Kraken tried instead to look forward at what the web might be, even though people aren't doing that stuff just yet - that's good too. A benchmark of what the web is right now would be good as well, but we don't have anything like that [1] [2].

Add to this that benchmarking is inherently hard, so hard that it's its own research area. How do you weight various benchmarks to produce the really useful single number that we all want to see? That's kinda hard, and back when I read research on this, I didn't see a particularly useful answer.
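To make the weighting problem concrete, here's a hedged sketch of one common approach: take each benchmark's speedup relative to a baseline and collapse them with a geometric mean. The benchmark names and numbers below are made up for illustration.

```javascript
// Geometric mean of per-benchmark speedups vs. some baseline engine.
// This is one conventional choice, not a claim that it's the right weighting.
function geometricMean(values) {
  const logSum = values.reduce((acc, v) => acc + Math.log(v), 0);
  return Math.exp(logSum / values.length);
}

// Hypothetical speedups: 2x on two benchmarks, 2x slower on one.
const speedups = { regexp: 2.0, raytrace: 0.5, crypto: 2.0 };
const score = geometricMean(Object.values(speedups));
console.log(score.toFixed(2)); // ~1.26, vs. an arithmetic mean of 1.50
```

The geometric mean at least isn't dominated by a couple of large wins the way the arithmetic mean is, but it still bakes in an "all benchmarks matter equally" assumption, which is exactly the part nobody has a good answer for.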

Let's consider what we'd love to have. Run a web page for a little while, record it somehow, and package that up so that it can be reproduced in a useful way [3]. So what should go in it? I mainly use gmail, google reader, bugzilla, hacker news, and small amounts of other sites. What if you read nothing but Jezebel and Facebook? Will my benchmark help you? Will my benchmark even help me in 6 months after those sites are updated? Whose view of the web is the right one, at which time, and how will we coordinate that into a benchmark?

Which brings us back to the original post. That's benchmarking a decompressor, ported from C, which doesn't sound like a great real-world application to me. Its hot loop seems to be string appending, which Firefox is great at because of ropes [4]. We actually have a few benchmarks like this. One of the V8 benchmarks uses a Scheme-to-JS compiler to produce its code. And Emscripten, an LLVM-to-JS compiler, produces JS apps in the same way the OP did, and we measure these periodically to make sure we don't regress. But the OP's benchmark is hardly representative of real web usage, and which particular implementations did well here is at least partly down to luck.

So to summarize, benchmarking is hard, no-one is doing it right, I'm not even sure we know what right is, but most of us can agree that ported C apps are probably not representative of real browser usage.

And I leave you with a final reminder that I don't speak for Mozilla.

[0] This was true when my mate worked on Lua two years ago.

[1] If you can package Gmail and Facebook into a nice platform-independent package that we can run and get one single number representing its performance, we will happily use it and worship at your feet.

[2] Microbenchmarks (how fast is string appending, for example) aren't the answer either.
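As a made-up illustration of why a microbenchmark like this misleads: the timing loop below nominally measures string appending, but an optimizing JIT is free to notice the result is never used and skip the work, and even when it doesn't, a tiny isolated loop says nothing about how appends behave under a real application's memory and GC pressure.

```javascript
// Hypothetical string-append microbenchmark. The number it returns is
// real, but it doesn't measure what it appears to measure.
function benchAppend(iterations) {
  const start = Date.now();
  let s = '';
  for (let i = 0; i < iterations; i++) {
    s += 'x'; // result is never used, so a JIT may elide the loop entirely
  }
  return Date.now() - start; // a number, just not a meaningful one
}

console.log(benchAppend(100000));
```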

[3] On multiple platforms, while doing the same thing on each.

[4] Other posters commented that string-appending is an O(n^2) operation - this isn't quite true. The naive implementation is, but JS vendors haven't been naive since at least 2007.

[5] While not representative of the real web, these benchmarks helped us run our little JS performance war, right from slow-and-sad up to fast-and-awesome.
