I'd say that much of the M2's speed comes from its ultrawide RAM interface, which is only possible because the RAM is soldered basically directly to the CPU die(s). It can't scale to server sizes, but it makes perfect sense for a laptop or a smaller desktop.
Apple's wide RAM interface is an advantage, but they aren't doing anything exotic to achieve it. AMD and Intel could offer consumer CPUs with more memory channels, but they choose not to, likely for cost and market-segmentation reasons.
"On package" isn't the same thing as on die. Apple's M1 LPDDR memory setup isn't really any different from what you would find in a normal PC laptop. Putting the memory as close as possible to the CPU makes it easier to maintain signal integrity, but otherwise it's the same approach everyone else uses.
> While 243GB/s is massive, and overshadows any other design in the industry, it’s still quite far from the 409GB/s the chip is capable of. More importantly for the M1 Max, it’s only slightly higher than the 204GB/s limit of the M1 Pro, so from a CPU-only workload perspective, it doesn’t appear to make sense to get the Max if one is focused just on CPU bandwidth.
It's shared with the GPU, so limiting it to CPU-only doesn't seem very fair. In fact, I think not having to transfer data to the GPU is another big part of why, at least for casual gaming, it packs way more punch than it really has any right to.
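For anyone who wants to sanity-check the numbers in the quote, here is a rough sketch of the arithmetic: peak bandwidth is just bus width times transfer rate. The bus widths (256-bit for the M1 Pro, 512-bit for the M1 Max) and the LPDDR5-6400 transfer rate are my assumptions, not something stated in this thread, and the function name is made up for illustration.

```python
def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_sec: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_sec * 1e6 / 1e9


# Assumed configurations (not from the thread): LPDDR5 at 6400 MT/s.
m1_pro_peak = peak_bandwidth_gbs(256, 6400)
m1_max_peak = peak_bandwidth_gbs(512, 6400)

# CPU-only figure taken from the quoted passage.
cpu_measured_gbs = 243.0
cpu_fraction_of_peak = cpu_measured_gbs / m1_max_peak

print(f"M1 Pro peak: {m1_pro_peak:.1f} GB/s")
print(f"M1 Max peak: {m1_max_peak:.1f} GB/s")
print(f"CPU-only share of M1 Max peak: {cpu_fraction_of_peak:.0%}")
```

Under those assumptions the results line up with the 204GB/s and 409GB/s figures in the quote, and the CPU cluster alone reaches a bit under 60% of the Max's peak. The rest of the headroom is what the GPU and other blocks can draw on.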