I love everything about this direction except for the insane inference costs. I don’t mind the training costs, since models are commoditized as soon as they’re released. Although I do worry that if inference costs drop, the companies training the models will have no incentive to publish their weights, because inference revenue is where they recoup the training cost.
Either way… we badly need more innovation in inference price per performance, on both the software and hardware side. It would be great if software innovation unlocked inference on commodity hardware. That’s unlikely to happen, but today’s bleeding edge hardware is tomorrow’s commodity hardware so maybe it will happen in some sense.
If Taalas can pull off burning models into hardware with a two month lead time, that will be huge progress, but still wasteful because then we’ve just shifted the problem to a hardware bottleneck. I expect we’ll see something akin to gameboy cartridges that are cheap to produce and can plug into base models to augment specialization.
But I also wonder if anyone is pursuing even more radical ideas, like reverting to analog computing and leveraging voltage differentials in clever ways. It’s too big-brain for me, but intuitively it feels like wasting entropy to collapse a voltage spike to a 0 or 1.
> I love everything about this direction except for the insane inference costs.
If this direction holds true, the ROI math works out cheaper.
Instead of employing 4 people (Customer Support, PM, Eng, Marketing), you will have 3-5 agents, and the whole ticket flow might cost you ~$20.
But I hope we won't go this far, because when things fail every customer will be impacted, and there will be no one left who understands the system well enough to fix it.
I worry about the costs from an energy and environmental impact perspective. I love that AI tools make me more productive, but I don't like the side effects.
The environmental impact of AI is greatly overstated. The average person would make a bigger positive impact on the environment by reducing their meat intake by 25% than by giving up flying and AI use combined.
This is the wrong way to see it. If a technology gets cheaper, people will use more and more and more of it. If inference costs drop, you can throw far more reasoning tokens at a problem, plus a combination of many, many agents, to increase accuracy or creativity and so on.
No company at the moment has enough money to operate with 10x the reasoning tokens of their competitors, because they're bottlenecked by GPU capacity (or other physical constraints). Maybe in lab experiments, but not for generally available products.
And I sense you would have to throw orders of magnitude more tokens to get meaningfully better results (if anyone has access to experiments with GPT-5 class models geared up to use marginally more tokens with good results, please call me out though).
No, the key (novel) element here is the two-tiered approach to sandboxing and inter-agent communication. That’s why he spends most of the post talking about it and only a few sentences on which models he selected.
RFC 1459 originally stipulated that messages not exceed 512 bytes in length, inclusive of the trailing CR-LF, which meant the actual usable length for message text was less. When the protocol's evolution was re-formalized in 2000 via RFCs 2810-2813, the 512-byte limit was kept.
However, most modern IRC implementations support a subset of the IRCv3 protocol extensions, which allow up to 8192 bytes for "message tags" (i.e. metadata) and keep the 512-byte message length limit purely for historical and backwards-compatibility reasons, for old clients that don't support the v3 extensions to the protocol.
So the answer, strictly speaking, is yes. IRC does still have message length limits, but practically speaking it's because there's a not-insignificant installed base of legacy clients that will shit their pants if the message lengths exceed that 512-byte limit, rather than anything inherent to the protocol itself.
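In practice that means clients have to budget for the prefix the server prepends plus the trailing CR-LF and split long messages themselves. Here's a minimal sketch of that kind of chunking in plain Python (not any particular IRC library's API; the prefix budget is a made-up illustrative number):

    # Rough sketch: split a long PRIVMSG so each line stays under the classic
    # 512-byte limit, which counts "PRIVMSG <target> :", the text, and CRLF.
    # prefix_budget is a guess at the ":nick!user@host " the server prepends.
    MAX_LINE = 512

    def split_privmsg(target: str, text: str, prefix_budget: int = 100) -> list[str]:
        overhead = len(f"PRIVMSG {target} :\r\n") + prefix_budget
        payload = MAX_LINE - overhead
        data = text.encode("utf-8")
        chunks = []
        while data:
            piece, data = data[:payload], data[payload:]
            # Naive byte split; a real client would avoid cutting multi-byte chars.
            chunks.append(f"PRIVMSG {target} :" + piece.decode("utf-8", "ignore"))
        return chunks

    for line in split_privmsg("#channel", "x" * 1200):
        print(len(line.encode("utf-8")) + 2, "bytes on the wire (plus server prefix)")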
I bet there’s gonna be a banger of a Mac Studio announced in June.
Apple really stumbled into making the perfect hardware for home inference machines. Does any hardware company come close to Apple in terms of unified memory and single machines for high throughput inference workloads? Or even any DIY build?
When it comes to the previous “pro workloads,” like video rendering or software compilation, you’ve always been able to build a PC that outperforms any Apple machine at the same price point. But inference is unique because its performance scales with memory bandwidth, and you can’t assemble that by wiring together off-the-shelf parts in a consumer form factor.
It’s simply not possible to DIY a homelab inference server better than the M3+ for inference workloads, at anywhere close to its price point.
They are perfectly positioned to capitalize on the next few years of model architecture developments. No wonder they haven’t bothered working on their own foundation models… they can let the rest of the industry do their work for them, and by the time their Gemini licensing deal expires, they’ll have their pick of the best models to embed with their hardware.
> But inference is unique because its performance scales with high memory throughput, and you can’t assemble that by wiring together off the shelf parts in a consumer form factor.
Nvidia outperforms Mac significantly on diffusion inference and many other forms. It’s not as simple as the current Mac chips are entirely better for this.
Given that most of mine, and probably yours, and probably most of the world's computers are in fact made in China one way or another (some to a higher degree than others), I'm guessing most of us trust our hardware enough to continue using it.
Where are you gonna find Apple hardware with 128GB of memory at an enthusiast-compatible price?
The cheapest Apple desktop with 128GB of memory shows up as costing $3499 for me, which isn't very "enthusiast-compatible"; it's about 3x the minimum salary in my country!
Seems I misunderstood what an "enthusiast" is. I thought it was about someone "excited about something", but it seems the typical definition includes them having a lot of money too, my bad.
Right, I think maybe we're then talking about "upper class enthusiasts" or something in reality then? I understood that to just be about the person, not what economic class they were in; maybe I misunderstood.
A 128GB 2TB Dell Pro Max with Nvidia GB10 is about $4200, a Mac Studio with 128GB RAM and 2TB storage is $4100. So pretty comparable. I think Dell's pricing has been rocked more by the RAM shortage too.
From the spec sheets I’m looking at, it is not. I’m seeing models of the Dell Pro Max with 128 GB of DDR5-6400 as CAMM2, then a separate memory of up to 24 GB on the GPU. CAMM2 does not make the memory unified.
You're not looking at the right thing. Dell's naming is horrible. Dell Pro Max with GB10 (https://www.dell.com/en-us/shop/cty/pdp/spd/dell-pro-max-fcm...). It's a very different computer than what you're looking at and has 128GB LPDDR5X unified memory.
AFAIK, for the unified memory bandwidth, it depends mostly on the CPU: for M4 Max (I think it's the default today?) it does ~550 GB/s, while GB10 does ~270 GB/s, so about a 2x difference between the two. For comparison, RTX Pro 6000 does 1.8 TB/s, pretty much the same as a 5090, which is probably the fastest/best GPU a prosumer could reasonably get.
It has a HDMI port and its USB-C ports also support display out. But I believe most who buy it intend to use it headless. The machine runs Ubuntu 24.04 and has a slightly customised Gnome (green accents and an nvidia logo in GDM) as its desktop.
Jeff Geerling doing that 1.5TB cluster using 4 Mac Studios was pretty much all the proof needed to show that the Mac Pro is struggling to find any place any more.
But those Thunderbolt links are slower than modern PCIe. If there's actually an M5-based Mac Studio with the same Thunderbolt support, you'll be better off, e.g. for LLM inference, streaming read-only model weights from storage (as we've seen with recent experiments) than pushing the same amount of data via Thunderbolt. It's only if you want to go beyond local memory constraints (e.g. larger contexts) that the Thunderbolt link becomes useful.
Why everyone wants to live in dongle/external cabling/dock hell is beyond me. PCIe cards are powered internally with no extra cables. They are secure. They do not move or fall off of shit. They do not require cable management or external power supplies. They do not have to talk to the CPU through a stupid USB hub or a Thunderbolt dock. Crappy USB HDMI capture on my Mac led me to running a fucking PC with slots to capture video off of a 50 foot HDMI cable, that then streamed the feed to my Mac from NDI, because it was more reliable than the elgarbo capture dongle I was using. This shit is bad. It sucks. It's twice the price and half the quality of a Blackmagic Design capture card. But, no slots, so I guess I can go get fucked.
For anything that's even somewhat in the consumer space rather than pure workstation/professional, the main reason is that dongles can be used with a laptop but add-in cards can't. When ordinary consumer PCs (or even office PCs) are in the picture, laptops are a huge chunk of the target audience.
The market segments that can afford to ignore laptops and only target permanently-installed desktops are mostly those niches where the desktop is installed alongside some other piece of equipment that is much more expensive.
Wasn't streaming models from storage into limited memory a case where it was impressive that you could make the elephant dance at all?
If you want to get usable speeds from very large models that haven't been quantized to death on local machines, RDMA over Thunderbolt enables that use case.
Consumer PC GPUs don't have enough RAM, enterprise GPUs that can handle the load very well are obscenely expensive, Strix Halo tops out at 128 Gigs of RAM and is limited on Thunderbolt ports.
The bad performance you saw was with very limited memory and very large models, so streaming weights from storage was a huge bottleneck. If you gradually increase RAM, more and more of the weights are cached and the speed improves quite a bit, at least until you're running huge contexts and most of the RAM ends up being devoted to that. Is the overall speed "usable"? That's highly subjective, but with local inference it's convenient to run 24x7 and rely on non-interactive use. Of course scaling out via RDMA on Thunderbolt is still there as an option, it's just not the first approach you'd try.
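A rough back-of-envelope for that, assuming one full pass over the weights per token (all numbers are illustrative guesses, not benchmarks):

    # Time per token ~= bytes served from RAM / RAM bandwidth
    #                 + bytes streamed from storage / storage bandwidth.
    # Illustrative numbers: 100 GB of weights, ~500 GB/s RAM, ~7 GB/s NVMe.
    def tokens_per_sec(model_gb, cached_fraction, ram_gbps=500, ssd_gbps=7):
        in_ram = model_gb * cached_fraction
        from_disk = model_gb - in_ram
        return 1 / (in_ram / ram_gbps + from_disk / ssd_gbps)

    for frac in (0.5, 0.9, 0.99, 1.0):
        print(f"{frac:.0%} of weights cached -> ~{tokens_per_sec(100, frac):.1f} tok/s")

The storage term dominates until nearly all of the weights fit, which is why adding RAM helps so much.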
The proposition of a Mac Pro in the Apple Silicon world wasn't necessarily about performance, it was about the existence of the PCIe slots. I don't think AI becoming a workload for pro Macs means the Mac Pro doesn't have a place; people who were using Mac Pros for audio or video capture didn't stop doing that media work and switch to AI as a profession. That market just wasn't big enough to sustain the Mac Pro in the first place, and Apple has finally acknowledged that fact.
I had a U-Audio PCI card in a Mac Pro during the Intel era of Macs. It was a chip to run their software plugins, and the plugins are top of the line. I have a U-Audio box that runs over Thunderbolt now. I know there are people who need device slots, but they're vanishingly few. I'm disappointed that this category of machine is going away, but it stopped being for me in the Apple Silicon era.
So many peripherals now come in external boxes that communicate _incredibly quickly_ over Thunderbolt 4/5 that the need for PCIe is marginal, while the cost to support it is significant.
> Apple really stumbled into making the perfect hardware for home inference machines
For LLMs. For inference with other kinds of models, where the amount of compute needed relative to the amount of data transfer is higher, Apple is less ideal and systems with lower memory bandwidth but more FLOPS shine. And if things like Google’s TurboQuant work out for efficient KV-cache quantization, Apple could lose a lot of that edge for LLM inference too, since that would reduce the amount of data shuffling relative to compute.
I'm not a big fan of reducing computing as a whole to just inference. Apple has done quite a bit besides that and it deserves credit. Mac Pro disappearing from the product line is a testament to it, that their compact solutions can cover all needs, not just local inference, to a degree that an expandable tower is not required at all.
> Mac Pro disappearing from the product line is a testament to it
Apple removing or adding something from their product line tells us very little; for all we know, they have a new version ready to be launched next month, or whatever. Unless you work at Apple and/or have internal knowledge, this is all just guessing, not a "testament" to anything.
It's hilarious that not a single one of these has pricing listed anywhere public.
I don't think they expect anyone to actually buy these.
Most companies looking to buy these for developers would ideally have multiple people share one machine and that sort of an arrangement works much more naturally with a managed cloud machine instead of the tower format presented here.
Confirming my hypothesis, this category of devices is more or less absent from the used market. The only DGX workstation on eBay has a GPU from 2017, several generations ago.
'Important' people in organizations get them. They either ask for them, or the team that manages the shared GPU resources gets tired of their shit and they just give them one.
Nvidia doesn’t list prices because they don’t sell the machines themselves. If you click through each of those links, the prices are listed on the distributor’s website. For example the Dell Pro Max with GB10 is $4,194.34 and you can even click “Add to Cart.”
Because that's a different price point, that's getting near 100K, and the availability is very limited. I don't think they're even selling it openly, just to a bunch of partners...
The MSI workstation is the one that has some pricing floating around. Some distributors seem to be quoting USD 96K, with a wait time of 4 to 6 weeks [0]. Others say 90K and are also out of stock [1].
The interesting question is whether they'll lean into it intentionally (better tooling, more ML-focused APIs) or just keep treating it as a side effect of their silicon design.
I think we’ll see a much more robust ecosystem develop around MLX now that agentic coding has reduced the barrier of porting and maintaining libraries to it.
Agreed. I’m planning on selling my 512GB M3 Ultra Studio in the next week or so (I just wrenched my back so I’m on bed-rest for the next few days) with an eye to funding the M5 Ultra Studio when it’s announced at WWDC.
I can live without the RAM for a couple of months to get a good price for it, especially since Apple don’t sell that model (with the RAM) any more.
But it feels really good to have more RAM than you can think of a use for.
I have a faint memory of an interview ages ago with Stroustrup, I think, where he mentioned as an aside that he was using a workstation with 3.2 GB of storage and 4 GB of RAM :)
Just out of curiosity, where do you think is the best place to sell a machine like that with the lowest risk of being scammed, while still getting the best possible price?
> Just out of curiosity, where do you think is the best place to sell a machine like that with the lowest risk of being scammed, while still getting the best possible price?
There are none currently on eBay.co.uk, so I'm going to try there. I'll also try some of the reddit UK-specific groups.
As far as not being scammed - it's a really high value one-off sale, so it'll either be local pickup (and cash / bank-transfer at the time, which happens in seconds in the UK) or escrow.com (for non-eBay) with the buyer paying all the fees etc.
I'd prefer local pickup because then I have the money, the buyer can see it working, verify everything to their satisfaction etc. etc.
> Wish you a speedy recovery for your back!
Thank you :) It is a little better today. Sitting down is now tolerable for short periods... :)
Doesn't escrow.com charge a minimum fee of around 50 dollars/pounds?
I do know that Escrow.com is one of the most reputable escrow platforms. On a more personal note, I would love to know an escrow service where I could sell the spare domains I have (I got some .com/.net domains for $1 during a provider's deal). Is there a particular escrow service that doesn't charge a lot, so I could get a few dollars from selling them? Some of those domains aren't being used by me.
> Thank you :) It is a little better today. Sitting down is now tolerable for short periods... :)
I am wishing you speedy recovery as well. A cowboy gotta have a strong back :-)
According to the calculator, it’d be about £280 assuming the purchase cost was £11k. I think that’s probably an upper-bound on the sale-price, though I can see bids of $20k on eBay.com for the same model.
I sold a domain via escrow.com a long time ago now (20 years or so) but the buyer paid fees, so I don’t know what they charge for that. You could try the calculator they have though (https://www.escrow.com/fee-calculator)
As to a better or cheaper homelab: it depends on the build. AMD AI Max builds do exist, and they also use unified memory. I could argue the competition was, for a long time, selling much more affordable RAM, so you could get a better build outside Apple Silicon.
The typical inference workloads have moved quite a bit in the last six months or so.
Your point would have been largely correct in the first half of 2025.
Now, you're going to have a much better experience with a couple of Nvidia GPUs.
This is for two reasons: reasoning models require a pretty high number of tokens per second to do anything useful, and we are seeing small quantized and distilled reasoning models working almost as well as the ones needing terabytes of memory.
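The footprint math behind that is simple: weight memory is roughly parameter count times bits per weight, ignoring KV cache and activations (rough sketch, illustrative sizes):

    # Weight memory ~= params * bits_per_weight / 8 (KV cache and activations ignored).
    def weight_gb(params_billion, bits):
        return params_billion * bits / 8

    for params, bits in [(70, 16), (70, 4), (8, 4)]:
        print(f"{params}B @ {bits}-bit -> ~{weight_gb(params, bits):.0f} GB of weights")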
For now. In a few years it will be part of every day life, because people will see Apple users enjoying it without thinking about it. You won’t consider it a “home inference machine,” just a laptop with more capabilities than any other vendor offers without a cloud subscription.
> Apple really stumbled into making the perfect hardware for home inference machines
Apple are winning a small battle in a market that they aren’t very good in. If you compare the performance of a 3090 and above vs any Apple hardware, you would be insane to go with the Apple hardware.
When I hear someone say this it’s akin to hearing someone say Macs are good for gaming. It’s such a whiplash from what I know to be reality.
Or another jarring statement - Sam Altman saying Mario has an amazing story in that interview with Elon Musk. Mario has basically the minimum possible story to get you to move the analogue sticks. Few games have less story than Mario. Yet Sam called it amazing.
It’s a statement from someone who just doesn’t even understand the first thing about what they are talking about.
Sorry for the mini rant. I just keep hearing this apple thing over and over and it’s nonsense.
The Framework Desktop is quite cool, but those Ryzen Max CPUs are still a pretty poor competitor to Apple's chips if what you care about is running an LLM. Ryzen Max tops out at 256 GB/s of memory bandwidth, whereas an M4 Max can hit 560 GB/s.
So even if the model fits in the memory buffer on the Ryzen Max, you're still going to hit something like half the tokens/second just because the GPU will be sitting around waiting for data.
Personally, I'd rather have the Framework machine, but if running local LLMs is your main goal, the offerings from Apple are very compelling, even when you adjust for the higher price on the Apple machine.
128GB is the max RAM that the current Strix Halo supports, with ~250GB/s of bandwidth. The Mac Studio goes up to 256GB at that price point, with ~900GB/s of memory bandwidth. They are in different categories of performance; even performance-per-dollar is worse (~$2700 for the Framework Desktop vs $7500 for a Mac Studio M3 Ultra).
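A crude way to read those bandwidth numbers, assuming decode is memory-bandwidth-bound (one pass over the active weights per token; ignores compute, batching, and MoE sparsity, and the figures are just the rough ones from this thread):

    # Upper bound on decode speed: tokens/sec ~= bandwidth / bytes read per token.
    def max_tok_per_sec(bandwidth_gbps, model_gb):
        return bandwidth_gbps / model_gb

    for name, bw in [("Strix Halo ~250 GB/s", 250),
                     ("M4 Max ~550 GB/s", 550),
                     ("M3 Ultra ~900 GB/s (per above)", 900)]:
        print(f"{name}: ~{max_tok_per_sec(bw, 40):.0f} tok/s on a 40 GB model")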
That won’t work for the home hobbyist: 2.4 kW of GPUs alone, plus a 350 W Threadripper Pro with enough PCIe lanes to feed them. You’re looking at close to twice the capacity of an average US household electrical circuit just to run the machine under load.
A cluster of 4 of Apple’s M3 Ultra Mac Studios, by comparison, will consume around 1,100 W under load.
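For rough context on the circuit claim (assuming a standard US 15 A / 120 V branch circuit and the 80% continuous-load rule; a 20 A circuit changes the math):

    # Rough circuit arithmetic; assumptions only, not electrical advice.
    usable_w = 15 * 120 * 0.80          # ~1440 W continuous on a 15 A / 120 V circuit
    gpu_build_w = 2400 + 350            # GPU + Threadripper figures from above
    mac_cluster_w = 1100                # 4x M3 Ultra Studio figure from above
    print(gpu_build_w / usable_w)       # ~1.9x one circuit
    print(mac_cluster_w / usable_w)     # ~0.8x one circuit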
I don't think Apple just stumbled into it, and while I totally agree that Apple is killing it with their unified memory, I think we're going to see a pivot from NVidia and AMD. The biggest reason, I think, is: OpenAI has committed to an enormous amount of capex it simply cannot afford. It does not have the lead it once did, and most end-users simply do not care. There are no network effects. Anthropic at this point has completely consumed, as far as I can tell, the developer market: the one market that is actually passionate about AI. That's largely due to a huge advantage of the developer space: end users cannot tell if an "AI" coded it or a human did. That's not true for almost every other application of AI at this point.
If the OpenAI domino falls, and I'd be happy to admit if I'm wrong, we're going to see a near-catastrophic drop in RAM prices and in hyperscalers' demand to, well... scale. That massive drop will be completely and utterly OpenAI's fault, for attempting to bite off more than it can chew. In order to shore up demand, we'll see NVidia and AMD start selling directly to consumers. We, developers, are consumers and drive demand at the enterprises we work for based on what keeps us both engaged and productive... the end result being: the ol' profit flywheel spinning.
Both NVidia and AMD are capable of building GPUs that absolutely wreck Apple's best. A huge reason for this is Apple needs unified memory to keep their money maker (laptops) profitable and performant; and while, it helps their profitability it also forces them into less performant solutions. If NVidia dropped a 128GB GPU with GDDR7 at $4k-- absolutely no one would be looking for a Mac for inference. My 5090 is unbelievably fast at inference even if it can't load gigantic models, and quite frankly the 6-bit quantized versions of Qwen 3.5 are fantastic, but if it could load larger open weight models I wouldn't even bother checking Apple's pricing page.
tldr; competition is as stiff as it is vicious. Apple's "lead" in inference exists only because NVidia and AMD are raking in cash selling to hyperscalers. If that cash cow goes tits up, there's no reason to assume NVidia and AMD won't definitively pull the rug out from under Apple.
> A huge reason for this is Apple needs unified memory to keep their money maker (laptops) profitable and performant
None of the things people care about really get much out of "unified memory". GPUs need a lot of memory bandwidth, but CPUs generally don't and it's rare to find something which is memory bandwidth bound on a CPU that doesn't run better on a GPU to begin with. Not having to copy data between the CPU and GPU is nice on paper but again there isn't much in the way of workloads where that was a significant bottleneck.
The "weird" thing Apple is doing is using normal DDR5 with a wider-than-normal memory bus to feed their GPUs instead of using GDDR or HBM. The disadvantage of this is that it has less memory bandwidth than GDDR for the same width of the memory bus. The advantage is that normal RAM costs less than GDDR. Combined with the discrete GPU market using "amount of VRAM" as the big feature for market segmentation, a Mac with >32GB of "VRAM" ended up being interesting even if it only had half as much memory bandwidth, because it still had more than a typical PC iGPU.
The sad part is that DDR5 is the thing that doesn't need to be soldered, unlike GDDR. But then Apple solders it anyway.
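For reference, the bandwidth arithmetic is just bus width times transfer rate; ballpark numbers, not official specs:

    # Peak bandwidth ~= (bus width in bits / 8) * transfer rate in MT/s.
    def bandwidth_gbs(bus_bits, mtps):
        return bus_bits / 8 * mtps / 1000  # GB/s

    print(bandwidth_gbs(128, 6400))    # dual-channel desktop DDR5-6400: ~102 GB/s
    print(bandwidth_gbs(512, 8533))    # 512-bit wide bus, M4 Max class: ~546 GB/s
    print(bandwidth_gbs(512, 28000))   # 512-bit GDDR7, 5090 class: ~1792 GB/s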
> Not having to copy data between the CPU and GPU is nice on paper but again there isn't much in the way of workloads where that was a significant bottleneck.
Isn't that also because that's the world we've optimized workloads for?
If the common hardware had unified memory, software would have exploited that I imagine. Hardware and software is in a co-evolutionary loop.
> tldr; competition is as stiff as it is vicious-- Apple's "lead" in inference is only because NVidia and AMD are raking in cash selling to hyperscalers. If that cash cow goes tits up, there's no reason to assume NVidia and AMD won't definitively pull the the rug out from Apple.
These companies always try to preserve price segmentation, so I don’t have high hopes they’d actually do that. Consumer machines still get artificially held back on basic things like ECC memory, after all...
Can we also stop giving Apple some prize for unified memory?
It was the way of doing graphics programming on home computers, consoles and arcades, before dedicated 3D cards became a thing on PC and UNIX workstations.
Can we please stop treating this like some 2000s Mac vs PC flame war where you feel the need go full whataboutism whenever anyone acknowledges any positive attribute of any Apple product? If you actually read back over the comments you’re replying to, you’ll see that you’re not actually correcting anything that anyone actually said. This shit is so tiring.
> That boundary is deliberate: the public box has no access to private data.
Challenge accepted? It’d be fun to put this to the test by putting a CTF flag on the private box at a location nully isn’t supposed to be able to access. If someone sends you the flag, you owe them 50 bucks :)
The gap in your example is that a human had to realize the system is broken so that he could nudge the agent into fixing it. He can fix that gap by updating the agent to recognize when the system breaks. This now becomes the level at which he debugs… did the agent recognize the failure and self-heal, or not?
And at that point, if the autonomous system breaks, realized it’s broken, and fixes itself before you even notice… then do you need to care whether you learn from it? I suppose this could obfuscate some shared root cause that gets worse and worse, but if your system is robust and fault-tolerant _and_ self-heals, then what is there to complain about? Probably plenty, but now you can complain about one higher level of abstraction.
I’m from the US but reside indefinitely in the UK, and I’ve dodged all this crap (age verification, disabling advanced protection) by simply remaining in the US App Store. It has some downsides like I can’t download the Vodafone UK app but nowadays most apps are available globally.
And herein lies the absurdity of the whole legal framework in the first place. Does it apply to tourists? Residents? Citizens? Citizens traveling abroad?