You would need high memory bandwidth and a good set of cache pre-population (prefetching) heuristics. Putting the compute directly on the memory is one way to get the bandwidth.
ML would benefit from both too, as would highly complex graphics and physics simulation. That said, cache pre-population is probably at odds with low-latency graphics.
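
To make the prefetching trade-off a bit more concrete, here is a toy sketch in Python. It is not a model of any real hardware: the LRU policy, the "next-line" heuristic, and the names `simulate` and `prefetch_depth` are all assumptions picked for illustration. It counts cache hits and how many lines get pulled from memory, which is a rough stand-in for bandwidth used.

```python
from collections import OrderedDict

def simulate(addresses, capacity, prefetch_depth=0):
    """Toy LRU cache with a hypothetical next-line prefetcher.

    On every miss, the missed line plus the next `prefetch_depth`
    lines are loaded. Returns (hit_rate, lines_fetched).
    """
    cache = OrderedDict()  # keys are cache-line numbers, ordered oldest to newest
    hits = 0
    fetched = 0

    def load(line):
        nonlocal fetched
        if line not in cache:
            fetched += 1          # a new line comes in from memory
            cache[line] = True
        cache.move_to_end(line)   # mark as most recently used
        while len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used line

    for line in addresses:
        if line in cache:
            hits += 1
            cache.move_to_end(line)
        else:
            load(line)
            for k in range(1, prefetch_depth + 1):
                load(line + k)    # speculative fetch of the following lines

    return hits / len(addresses), fetched

sequential = range(1000)          # streaming access, the friendly case
strided = range(0, 1600, 16)      # 100 accesses that skip 16 lines each time

print(simulate(sequential, capacity=64, prefetch_depth=0))
print(simulate(sequential, capacity=64, prefetch_depth=3))
print(simulate(strided, capacity=64, prefetch_depth=0))
print(simulate(strided, capacity=64, prefetch_depth=3))
```

In this toy setup the sequential stream goes from a 0% to a 75% hit rate for the same memory traffic, while the strided stream gets no hits at all and pulls in four times as many lines. When the heuristic guesses wrong, the speculative fetches just eat bandwidth, which is presumably part of why aggressive pre-population and latency-sensitive workloads don't mix well.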