In essence, a big reason the Intel team was able to do this is that they start from the optimized IR produced by GHC's front-end[0]. Their compiler (HRC) applies further optimizations and converts that into yet another IR (MIL), which is used by yet another compiler (FLRC) that Intel designed for arbitrary functional programming languages. FLRC does some further optimization and ultimately emits C, where Intel's C compiler does the rest of the work.
It's interesting that HRC produces significantly better-optimized programs (up to 2x), yet GHC's runtime (which HRC does not use) is so well optimized that the overall performance of HRC-compiled programs is roughly on par with those from GHC, even though the compiled code itself is faster.
I find stuff like this to be a testament to the practicality of great design in functional programming language ecosystems: Even the compilers are composable!
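To make the "composable" point concrete, here is a toy sketch of the pipeline shape described above. Everything in it is a made-up stand-in: the stage functions only wrap strings and do not reflect the real interfaces of GHC, HRC, FLRC, or Intel's C compiler. The only thing it shows is that each stage consumes the previous stage's output, so the stages chain like ordinary functions.

```python
from functools import reduce
from typing import Callable

# A "stage" here is just a function from one program representation to the next.
# Real compilers pass rich IR data structures; strings are a stand-in (assumption).
Stage = Callable[[str], str]


def compose(*stages: Stage) -> Stage:
    """Chain stages left to right: each stage's output feeds the next stage."""
    return lambda program: reduce(lambda ir, stage: stage(ir), stages, program)


# Hypothetical stand-ins for the stages named in the comment above.
def ghc_frontend(source: str) -> str:
    return f"Core({source})"      # Haskell source -> optimized Core


def hrc_optimize(core: str) -> str:
    return f"MIL({core})"         # Core -> MIL, with further optimization


def flrc_to_c(mil: str) -> str:
    return f"C({mil})"            # MIL -> C source


def c_compiler(c_source: str) -> str:
    return f"Binary({c_source})"  # C source -> executable


pipeline = compose(ghc_frontend, hrc_optimize, flrc_to_c, c_compiler)
result = pipeline("main.hs")
print(result)
```

Swapping or dropping a stage is just a different argument list to `compose`, which is roughly the property being praised here.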
[0]: For those unfamiliar with what GHC does/how it's designed, this talk gives a great overview of the Core language that is the optimized IR of GHC: https://www.youtube.com/watch?v=uR_VzYxvbxg
That first paragraph was a fun summary. It sounds like they're just throwing proven tools and tactics at a tough problem and getting some good results. They start with the output of one high-quality tool. Like prior projects, they find an intermediate point in the transition that lets them add or experiment with optimizations. Then they feed it into another high-quality tool to get even more out.
Looks like good research mixed with good engineering tradeoffs.