With all these new AI models, Stable Diffusion and LLaMA especially, I'm considering switching to iPhone. I don't fully understand why iPhones and Macs are getting so many implementations, but it seems to be hardware-based.
llama.cpp doesn't actually use the GPU. What makes Apple's hardware that much faster is that the workload, whether it runs on CPU or GPU, is very memory-intensive, so it benefits greatly from fast RAM, and Apple pairs its chips with LPDDR5 running at very high clock speeds for its unified memory.
Most of these implementations aren't platform-specific. I've been running llama.cpp on x86_64 hardware and the performance is fine: the small models are fast, and a quantized 65B model generates about a token per second on a system with dual-channel DDR4, which isn't unusable.
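Since generation is memory-bound, you can sanity-check that ~1 token/sec figure with a rough upper bound: bandwidth divided by the bytes streamed per token (roughly the whole model once). The quantization overhead and bandwidth figures below are illustrative assumptions, not measurements.

```python
# Back-of-envelope: if token generation is memory-bandwidth-bound, an upper
# bound on tokens/sec is roughly (RAM bandwidth) / (model size in bytes),
# since each generated token streams the full set of weights once.

def tokens_per_sec(params_billions: float, bytes_per_param: float,
                   bandwidth_gb_s: float) -> float:
    model_gb = params_billions * bytes_per_param  # weights only, in GB
    return bandwidth_gb_s / model_gb

# Assumed: 65B model at ~4.5 bits/param (4-bit quant plus overhead) ≈ 36.6 GB
q4_bytes = 4.5 / 8
# Assumed: dual-channel DDR4-3200 peak = 2 * 25.6 GB/s = 51.2 GB/s
print(tokens_per_sec(65, q4_bytes, 51.2))  # ~1.4 tokens/sec upper bound
```

Real throughput lands below the bound (cache effects, compute overhead), which is consistent with the observed ~1 token/sec.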
The tough thing to find is something affordable that will run the unquantized 65B model at an acceptable speed. You can put 128GB of RAM in affordable hardware, but ordinary desktops aren't fast, and the things that are fast are expensive (I'd bet the Epyc 9000 series would do great). Apple doesn't get you there either: Apple Silicon isn't available with that much RAM, and if it were, it wouldn't be affordable (the 96GB MacBook Pro, which isn't enough to run the full model, is over $4,000).
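The memory arithmetic behind "128GB isn't enough, 96GB certainly isn't" is simple: at fp16 the weights alone are two bytes per parameter, before any KV cache or activations.

```python
# Rough weight-only memory footprint of a 65B-parameter model at a few
# precisions. Runtime use is higher: KV cache, activations, and the OS
# all need room on top of this.
params = 65e9
for name, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
    print(f"{name}: {params * bytes_per_param / 1e9:.1f} GB")
# fp16 comes out to 130 GB: too big for 96 GB, and tight even at 128 GB.
```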
If you want to spend $4,800 on just the computer, you can get a Mac Studio with 128GB of memory and 400GB/s of bandwidth. There are sparse reports out there of folks running 65B models on it, but I've seen no performance measurements.
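Absent measurements, the same bandwidth-bound reasoning gives a ceiling for the Mac Studio. The ~4.5 bits/param quantized size is an assumption for illustration.

```python
# Upper-bound estimate for a quantized 65B model on 400 GB/s unified memory,
# assuming generation is purely memory-bandwidth-bound. Not a measurement.
model_gb = 65 * 4.5 / 8   # ~36.6 GB of weights at 4-bit quant plus overhead
print(400 / model_gb)     # ~10.9 tokens/sec ceiling
```

Actual numbers would be lower, but even half of that would be very usable.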
It's interesting that they actually have it but the price is still silly.
SP5 system board: ~$1,000
Epyc 9124: $1,083
192GB registered DDR5 (12x16GB): ~$1,000
Case, power supply, modest storage: ~$300
That's 460GB/s of bandwidth from 12 memory channels and 50% more memory, for roughly $3,400 total, leaving you more than $1,000 compared with the Mac Studio. But over $3,000 is not a low price either, just a lower one.
iPhones leaned into "computational photography" a long time ago. Eventually Apple added custom hardware to handle all the matrix multiplies efficiently, and exposed some of it to apps through an API called Core ML. They've kept adding features on top of it, like on-device photo tagging, voice recognition, and VR stuff.
Google was the leader in computational smartphone photography; they released their Night Sight mode before Samsung and Apple had anything competitive.
Sure, and you can run Stable Diffusion on ordinary Snapdragon SoCs, and there's a very hacky way to get llama.cpp running on a Pixel phone (https://twitter.com/thiteanish/status/1635678053853536256), but I haven't seen any good apps yet.