I split the code into two binaries: "code" (qsort) and "code2" (std::sort()), and then ran both under a profiler based on Intel's performance counters.
It seems that qsort simply executes an order of magnitude more instructions than std::sort() to produce the same result. On the other hand, the std::sort() code, although faster, has more branch mispredictions.
The results are here [1] if you want to have a look.