Hacker News

It could be. But there's quite a bit of momentum behind CUDA. Plus, CUDA is just wicked fast. I wrote a WebGPU version of LLaMA inference and there's still a noticeable performance gap between WebGPU and CUDA. Admittedly, WebGPU can't access tensor cores, and I undoubtedly need to optimize further.



