With all these new AI models, Stable Diffusion and LLaMA especially, I'm considering switching to iPhone. I don't fully understand why iPhones and Macs are getting so many implementations, but it seems to be hardware-based.
llama.cpp doesn't actually use the GPU. What makes Apple's hardware that much faster is that the workload, whether it runs on CPU or GPU, is very memory-intensive, so it benefits greatly from fast RAM, and Apple pairs its chips with LPDDR5 running at very high clock speeds for its unified memory.
Most of these implementations aren't platform-specific. I've been running llama.cpp on x86_64 hardware and the performance is fine: the small models are fast, and a quantized 65B model generates about a token per second on a system with dual-channel DDR4, which isn't unusable.
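Since generation is memory-bound, you can sanity-check that ~1 token/sec figure with a rough upper bound: bandwidth divided by the bytes streamed per token (roughly the whole model once). The quantization overhead and bandwidth figures below are illustrative assumptions, not measurements.

```python
# Back-of-envelope: if token generation is memory-bandwidth-bound, an upper
# bound on tokens/sec is roughly (RAM bandwidth) / (model size in bytes),
# since each generated token streams the full set of weights once.

def tokens_per_sec(params_billions: float, bytes_per_param: float,
                   bandwidth_gb_s: float) -> float:
    model_gb = params_billions * bytes_per_param  # weights only, in GB
    return bandwidth_gb_s / model_gb

# Assumed: 65B model at ~4.5 bits/param (4-bit quant plus overhead) ≈ 36.6 GB
q4_bytes = 4.5 / 8
# Assumed: dual-channel DDR4-3200 peak = 2 * 25.6 GB/s = 51.2 GB/s
print(tokens_per_sec(65, q4_bytes, 51.2))  # ~1.4 tokens/sec upper bound
```

Real throughput lands below the bound (cache effects, compute overhead), which is consistent with the observed ~1 token/sec.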
The tough thing to find is something affordable that will run the unquantized 65B model at an acceptable speed. You can put 128GB of RAM in affordable hardware, but ordinary desktops aren't fast, and the things that are fast are expensive (I'd bet the Epyc 9000 series would do great). Apple doesn't get you there either: Apple Silicon isn't available with that much RAM, and if it were, it wouldn't be affordable (the 96GB MacBook Pro, which isn't enough to run the full model, is over $4,000).
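The memory arithmetic behind "128GB isn't enough, 96GB certainly isn't" is simple: at fp16 the weights alone are two bytes per parameter, before any KV cache or activations.

```python
# Rough weight-only memory footprint of a 65B-parameter model at a few
# precisions. Runtime use is higher: KV cache, activations, and the OS
# all need room on top of this.
params = 65e9
for name, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
    print(f"{name}: {params * bytes_per_param / 1e9:.1f} GB")
# fp16 comes out to 130 GB: too big for 96 GB, and tight even at 128 GB.
```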
If you want to spend $4,800 on just the computer, you can get a Mac Studio with 128GB of memory and 400GB/s of bandwidth. There are sparse reports out there of folks running 65B models on it, but I've seen no performance measurements.
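Absent measurements, the same bandwidth-bound reasoning gives a ceiling for the Mac Studio. The ~4.5 bits/param quantized size is an assumption for illustration.

```python
# Upper-bound estimate for a quantized 65B model on 400 GB/s unified memory,
# assuming generation is purely memory-bandwidth-bound. Not a measurement.
model_gb = 65 * 4.5 / 8   # ~36.6 GB of weights at 4-bit quant plus overhead
print(400 / model_gb)     # ~10.9 tokens/sec ceiling
```

Actual numbers would be lower, but even half of that would be very usable.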
It's interesting that they actually have it but the price is still silly.
SP5 system board: ~$1,000
Epyc 9124: $1,083
192GB registered DDR5 (12x16GB): ~$1,000
Case, power supply, modest storage: ~$300
That's 460GB/s of bandwidth from 12 memory channels and 50% more memory, for roughly $3,400 total, leaving you more than $1,000 compared with the Mac Studio. But over $3,000 is not a low price either, just a lower one.
iPhones leaned into "computational photography" a long time ago. Eventually Apple added custom hardware to handle all the matrix multiplies efficiently, and exposed some of it to apps through an API called Core ML. They've kept adding features on top of it, like on-device photo tagging, voice recognition, and VR stuff.
Google was the leader in computational smartphone photography; they released their Night Sight mode before Samsung and Apple had anything competitive.
Sure, and you can run Stable Diffusion on ordinary Snapdragon SoCs, and there's a very hacky way to get llama.cpp running on a Pixel phone (https://twitter.com/thiteanish/status/1635678053853536256), but I haven't seen any good apps yet.