More

deng · 2026-06-13T15:34:37 1781364877

I can understand the joy of running things yourself, and can also see the privacy aspect. However, I pay ~3$ per 1M/tokens for that model on Openrouter, and it's not even quantized. A refurbished 3090 and a 5080 will set you back well over 2k, not to mention the electricity to run them...

redfloatplane · 2026-06-13T15:43:14 1781365394

> I pay ~3$ per 1M/tokens for that model on Openrouter

I think the thing is, there's an unspoken "for now" at the end of that sentence and people running this locally are hedging against that "for now". Some people prefer to feel that they own the means rather than rent the means, even if the one they own is worse than the one they can rent. Especially with today's Fable news and the harsh realisation that the "for now" is dependent on very many unpredictable factors, where the one you have locally costs you capital today and a relatively predictable run-rate (made more predictable with on-prem solar for example), but should otherwise work predictably forever.

I'm not saying that you're wrong to do what you're doing, just that many people have their own lines in the sand where renting vs buying makes sense, and it doesn't only boil down to a rational (or irrational) financial decision.

jubilanti · 2026-06-13T16:17:20 1781367440

You're treating open weight inference providers the same as proprietary ones. They're fundamentally different business models. Proprietary companies have an incentive to subsidize actual inference and training costs in order to gain market share. The few dozen or so companies selling Qwen models by the token on openrouter are in a commodities market.

If suddenly the CCP declared a total digital embargo on Alibaba's Qwen models or even if for some reason all of mainland China (and Singapore) was completely unreachable from the rest of the world, the dozen or so companies selling Qwen by the token elsewhere in the world could continue business as usual.

bee_rider · 2026-06-13T18:36:35 1781375795

I don’t know anything about the open weight host business model. Do we know for certain that the folks selling inference by the token are really selling them in an upfront and profitable way? No subsidies from harvesting the info, to sell to the model trainers or anything like that?

usrusr · 2026-06-13T22:06:59 1781388419

Or subsidies from hopeful investors sweet-talked into not understanding the commodity nature of the business they are investing in. But that does not change much about the general assessment.

Chances are the typical story goes founders start fully believing that they would succeed with their own innovation but slip down a gradient towards commodity provider without really noticing themselves.

redfloatplane · 2026-06-13T16:23:28 1781367808

I was thinking of user-side regulations as well, not only provider-side ones. I could imagine a world where a government rules that you may not use LLMs for anything, which would be much easier to get around if you have local means.

alexjplant · 2026-06-13T17:51:48 1781373108

I've spent the past week trying to scheme a way to get affordable local inference of something useful (Qwen3.6-36B-A3B) for ~$500 and have come to the conclusion that it simply isn't viable. A pair of power-restricted P100s in a workstation gets close but the workstations themselves are expensive and rare as hen's teeth (not to mention loud and large). I think early '27 will be when things open up as the hardware market unclenches and further strides are made in small capable models.

mappu · 2026-06-13T23:33:30 1781393610

I'm running Qwen3.6-35B-A3B on a very ordinary desktop PC (32GB DDR5, 8GB Radeon 6600XT) and getting a useful 15-20 tok/sec out of it. The MoE architecture and auto offloading from system to VRAM is just fantastic. Unsloth Q4_K_XL.

The Qwen3.6-27B is unbearably slow as it doesn't fit in VRAM, though, i think the MoE is very easy to run.

It is also extremely nice that you can just `apt install llama.cpp libggml0-backend-vulkan` now too.

ozim · 2026-06-14T05:48:57 1781416137

I wonder what parent poster means with „useful” and what he actually tried? Feels like he was just comparing some benchmarks.

Yesterday I downloaded Gemma4-26B with Ollama on quite rusty desktop with 1070 8gb and 32gb of ram and Core i5-9400.

I drop photo of my water meter and tell it to read the value and serial number. It was far from instant but it was also easily under 3 minutes and result was correct.

Earlier like in February I was trying the same photo with Gemma3 on the same hardware and results were bad.

ThunderSizzle · 2026-06-13T16:11:40 1781367100

An R9700 is $1350 and can get 100 TPS running Qwen3.6-35B-A3B Q5 with 130k context window (with room to spare) with a bit of fine tuning llamacpp-vulkan, but llamacpp's repository instability and lack of real versioning frustrates me.

In terms of electricity, if you aren't using it, even with all the vram loaded, at most your wasting about 30 watts or so.

Prompt processing a large uncached context is annoying, which is why I forced a lower context window, but I don't know if it's any worse in performance than the cloud models I've used.

There's a niceness, to me, knowing I don't have to rent it anymore. If you rent it, the terms can change regularly.

rsync · 2026-06-13T19:26:47 1781378807

"An R9700 is $1350 and can get 100 TPS running Qwen3.6-35B-A3B Q5 with 130k context window ..."

How would that change (improve) if you had two R9700 in a similar configuration ?

vardalab · 2026-06-13T19:56:40 1781380600

better prompt processing like 1.5x+ and more kv but tg most likely lower like 0.8x or so but I am just going by memory for Qwen3.5 without mtp.

bertili · 2026-06-13T16:46:07 1781369167

Qwen 27b is a compute heavy dense model.

PeterStuer · 2026-06-13T16:48:28 1781369308

When they declare open models a 'security risk', his setup will be running, yours will not and even that 3090 will be way outside of your reach.

medfield · 2026-06-13T16:13:06 1781367186

I use local models to explore, hosted models to refine. I somewhat envy those who can sustain local models (q8 120b+) running as a hobby.... for me, the practical path is a better SearXNG setup and knowing my routes forward.

alexhans · 2026-06-13T18:34:01 1781375641

I think it's important to be able to do both so you can stay in control of the price to value created relationship.

In last year, some people were publishing aider /ollama/open router [1] and now thankfully people are publishing all around about pi/qwen/llama.cpp/openrouter. It's widespread.

[1] https://alexhans.github.io/posts/aider-with-open-router.html

sixothree · 2026-06-14T08:16:46 1781425006

You also aren't limited to LLMS. Vision, whisper, etc. You can even have claude farm out tasks to your local servers.

TSiege · 2026-06-13T15:38:55 1781365135

It’s a personal hobby project why should we care this is how someone chooses to spend their free time and money? Lots of hobbies are expensive and pointless if you think of commercially available offerings. That’s why it’s a hobby and not a small business

toyg · 2026-06-13T16:04:18 1781366658

Yeah but they can also be used to play games and do other stuff.

amelius · 2026-06-13T17:39:26 1781372366

You are paying with your privacy ...

pier25 · 2026-06-13T19:08:38 1781377718

> not to mention the electricity to run them...

And noise.

NicoJuicy · 2026-06-13T15:56:01 1781366161

Rtx 3090 24 gb set me back 390€ a year ago ( 2nd hand)

rirze · 2026-06-13T16:10:57 1781367057

Was it still in good condition? That price makes me wonder if it was used for crypto mining, which can wear down the hardware.

gsora · 2026-06-13T16:22:31 1781367751

Any sane crypto miner undervolted and underclocked their GPUs for efficiency's sake; if anything, they went through less wear than, say, regular gaming.

Der_Einzige · 2026-06-13T15:50:58 1781365858

Openrouter doesn't give you access to the models internals, i.e. complete control of logprobs, sampler stack, any PeFTs.

Openrouter fking sucks and I don't know why people here act like it's so great. Stop using it if you care about local AI and accept that the cost you'll pay for tokens is higher than you will when consumed via any cloud. That's the price for privacy, control, and better quality via inference time optimizations that otherwise aren't available.

jubilanti · 2026-06-13T16:24:28 1781367868

> Openrouter doesn't give you access to the models internals, i.e. complete control of logprobs, sampler stack, any PeFTs.

Openrouter gives you access to whatever the inference provider gives. They're just the middleman. Many providers give logprobs if you ask, it's in their API. And yeah, no Peft or Lora, but that's an entirely different product. And some of the inference providers do that directly.

> Openrouter fking sucks and I don't know why people here act like it's so great. Stop using it if you care about local AI

But the whole point of openrouter is that you can run models by the token and you don't have to care about local AI? Sounds like you're more upset that people aren't making the same calculation on privacy and local control vs cost and ease of use.

deng · 2026-06-13T12:42:31 1781354551

That engineer went on to create Brave, a browser that pays you Monopoly money for watching ads, injected affiliate links, installed their commercial VPN without asking, and leaked DNS traffic when using Tor in its "privacy" mode. I'd say Mozilla dodged a bullet there.

deng · 2026-06-03T06:58:52 1780469932

I wouldn't say these are "basic bugs". The first is specific to using 'rrsync', and the second is when using the rsync daemon, and I can't remember when I last saw a system using that one (yes, I'm aware there are still use cases for using the rsync protocol, but I would consider it pretty obscure nowadays).

You could argue that he should've bumped the version more and should've done a longer beta test, but on the other hand, these were mostly security fixes, and I can understand he wanted to get them out there rather sooner than later (also "doing a beta test" is easier said than done - how do you get people to run a test version of rsync?).

deng · 2026-06-01T10:32:47 1780309967

Nice post and technically impressive work. I agree we need to understand the build pipeline and be able to do things locally. However, depending on your electricity cost, it might not make sense financially. These old servers are not energy efficient at all (I'm guessing that old Xeon server will easily pull 200W on load), and that model is currently at 0.1$/0.3$ per 1M tokens (with 76 tps and 262k context) in Openrouter (also, these servers are LOUD).

EDIT: I stand corrected, 200W is apparently way too high of an estimate. I used to run a bunch of old Xeon servers and they slurped watts like crazy, but I can't remember which ones exactly those were.

toast0 · 2026-06-01T10:53:25 1780311205

2620v4 is not a power slurping beast. Depending on the server board, it might not be either. Servers are often loud, but it depends.

There's a lot of budget hosting built around chips like these, and they're suprisingly power efficient.

jansommer · 2026-06-01T10:36:34 1780310194

It should be closer to 85W on load. And it's incredibly silent on even a low end cooler. I rarely get above 50° Celcius.

deng · 2026-06-01T10:48:04 1780310884

OK, then you're in luck. I had a bunch of old 1U rack servers and even in the next room it was too annoying to run them (they had a bunch of 40mm fans which always ran at full speed, because in a server room, no one can hear you scream).

jansommer · 2026-06-01T11:01:35 1780311695

Could it just be really bad cooling? Looking at 9800X3D, it seems like it's running in a similar range wrt TDP unless you really push the 9800X3D. I'm comparing with desktop cpu's because that's what my workload is. cpu governor is set to performance (no schedutil). No audible change in fan speed during heavy compilation or gaming (very silent humming), and i don't have any fans beside cheap intake, cpu and exhaust fans (1 each) + an excessive amount of dust.

deng · 2026-06-01T11:15:01 1780312501

These servers had no fan control whatsoever, they always ran full blast. That's not untypical for rack servers, because as written: they are designed for server rooms, and you're supposed to wear ear protection there anyway... Yes, I could've modified them, but I ditched them because running them simply made no sense (especially the high idle power consumption was ridiculous).

jabroni_salad · 2026-06-01T16:52:18 1780332738

Yeah, 1u is gonna do that. Get something that can accommodate a big tower air cooler such as the Hyper 212 and your airflow will be quieter than the disks.

I don't run it anymore but my old server was a dual xeon (with two of those coolers crammed in) and I rarely heard a peep out of it.

irusensei · 2026-06-01T17:17:22 1780334242

Small fans need to spin faster so these can be very high pitch even if you stuff some Noctua 40mm fans into it.

consp · 2026-06-01T10:50:11 1780311011

Only when you remove it from the original server or enable low fan mode (if available). Most 1U/2U cases will happily blow at full speed well over 90db.

You likely need to replace the flow-through server chassis system with an active "normal" cooler to achieve a bit of silence.

85W might be about right. My old server CPU is in the same ballpark and compiling kernels it reached about 90w in power usage. If you want to keep it running: idle is not very low power unless you have one of the "low power" L versions, keep that in mind.

tjoff · 2026-06-01T11:09:44 1780312184

Get a 4U case, many options if you want to combine it with a NAS. Not hard to cool and keep somewhat quiet. If you can store it in a closet or something that helps too.

Well, you can use it for lots of other things as well.

Compared to the cloud you can probably save up to buy a new server every month. And don't underestimate the gains of having something to experiment on and play with.

ciupicri · 2026-06-01T12:26:51 1780316811

85W for the whole system?! The specifications for the CPU mention a TDP of 85W [1].

[1] https://www.intel.com/content/www/us/en/products/sku/92986/i...

actionfromafar · 2026-06-01T13:42:34 1780321354

But for LLM work the CPU is mostly idle, waiting for new data - so the CPU itself might not pull much power at all.

naasking · 2026-06-01T12:22:58 1780316578

These servers are loud if you're trying to fit them into a 1U or 2U, which requires high speed fans to generate the necessary static pressure to push air through the case. I run a similar setup in a 4U case with slow 120mm fans and it's fine.

deng · 2026-05-29T17:10:45 1780074645

Well, if Apple killed it, Lenovo killed it even more. I recently was looking for a laptop for a student. The Lenovo E14 Gen7 is 800 Euros here in Germany (where prices are always higher, the MacBook Neo is 700 Euros), it has 16GB of RAM, 1TB SSD, a 2.8k IPS display, a Intel Ultra5 12core CPU, and it has a repairability score of 9/10 from ifixit. Framework doesn't even come close to that package.

joe_mamba · 2026-05-29T17:20:45 1780075245

Same thought, as an owner of a similar Lenovo, that's top bang for the buck. Also, matte screen and hinge that opens 180 degrees is something the Neo and most Macs doesn't have.

Though I assume the Apple clientele is always different than those shopping for PCs, and doesn't care about specs, they just want MacOS and the Apple ecosystem, most likely they already have an iPhone or are planning to get one anyway so then a Macbook is the only thing on their radar. Those people aren't really shopping for PCs anyway unless they need some Windows/Linux exclusive apps like CAD/CAE.

But if you want to run linux and game then that Lenovo would be a good deal.

Similar to the Framework, it has its own niche clientele who values the company motto, tinkering and repairability aspects way more than the value proposition. Most likely they run Linux too.

There's something for everyone.

pseudosavant · 2026-05-29T17:24:03 1780075443

It is funny how Mac OS is a draw for some, when it is the main reason I don't use a Mac. Their hardware is excellent, but when I've tried using a Mac as my main machine, my productivity suffered. The only part of the Apple ecosystem I wish I could get on Windows is iMessage, and maybe FaceTime.

MBCook · 2026-05-29T21:17:27 1780089447

I’m way more productive on a Mac.

Different strokes.

Zak · 2026-05-29T18:35:59 1780079759

> The only part of the Apple ecosystem I wish I could get on Windows is iMessage, and maybe FaceTime.

It annoys me that these are such a draw. There are a dozen other viable messaging and video call apps, but there's always someone who feels like spending two minutes to install and activate one is a major imposition.

MBCook · 2026-05-29T21:20:26 1780089626

I like Apple hardware. I like the Apple integration. I like the hardware quality. I LOVE the silence of the M series machines.

But for me you’re right. More than anything, I’m not giving up Mac OS. Despite Tahoe, which I do severely dislike, I’m still far happier using it daily than Windows or Linux.

Until that changes, or the hardware gets bad enough (it’s going in the other direction), I’m not leaving. I don’t even look at other options for my real computers.

“Toy” computers that I want to throw Linux or BSD or something on just to play with, yeah of course. But not what I want to use all day every day.

Zak · 2026-05-29T17:26:11 1780075571

Framework is definitely premium-priced, but I don't think most people are cross-shopping the Framework 12 (a 12" convertible tablet) and the Thinkpad E14 (a 14" dedicated laptop).

NooneAtAll3 · 2026-05-29T18:05:41 1780077941

so it's competing with Framework 14?

Zak · 2026-05-29T18:30:29 1780079429

No such model exists. The Framework 13 comes closest, but a 13" screen and a premium shell would compete more directly with the Thinkpad X13.

Direct price comparisons get tricky because different buyers care about different details. I really like the Thinkpad's Trackpoint, for example, but I also like the Framework's 3:2 aspect ratio. I'd have a hard time choosing.

Rebelgecko · 2026-05-29T18:46:55 1780080415

Ymmv but the 16:10 screen on the framework punches a bit above its weight compared to 16:9 screens with a similar diagonal measurement

Zak · 2026-05-29T18:49:15 1780080555

The Framework 13 has an aspect ratio of 3:2, not 16:10. The Thinkpad X13 has an aspect ratio of 16:10.

hmstx · 2026-05-29T19:32:03 1780083123

Dammit. I got an IdeaPad of similar price in december 2024. It didn't have one of the fancier displays from the era but still a decent option, it has 16Gb and I thought I'd try a Ryzen mobile thing that time. Wish I'd gone for the Thinkpad E series had I known about it then : that lower-end IdeaPad feels like trash.

SSD IO is sluggish, fans always spin when plugged in, audio crackles if I so much as scroll a page while a youtube video is playing, the keyboard might be the worst I've touched in many, many years, the 3.5mm audio jack wore out into intermittent connectivity within a couple of months. At least the display still looks good. Went through the windows optimization motions with it too. My x230 with an i5 still has lower and more stable DPC latency and has remained my DJ laptop.

registeredcorn · 2026-05-29T19:13:15 1780081995

> and it has a repairability score of 9/10 from ifixit

Do you mean a 6/10? The only score I saw for the neo on iFixIt is here: https://www.ifixit.com/News/116152/macbook-neo-is-the-most-r...

I checked the "Laptop repairability scores" page and the Neo doesn't appear to be listed. https://www.ifixit.com/repairability/laptop-repairability-sc...

jerlam · 2026-05-29T19:24:53 1780082693

They mean the Lenovo laptop has a 9/10 repairability, not the Apple Neo.

registeredcorn · 2026-05-30T03:14:13 1780110853

Ah, my mistake. Thanks!

pjmlp · 2026-05-31T07:08:10 1780211290

This is what I keep telling, Neo outside US isn't a good deal, given the price/hardware combo.

throw1234567891 · 2026-05-29T17:22:12 1780075332

16GB of RAM? Good for browsing the internet and nothing else.

deng · 2026-05-27T21:43:55 1779918235

Looking at all the unmerged pull requests in ripgrep, you can see what's going on. I will not link him here, but for instance, there's a "Senior Software Engineer at Microsoft", whose agent created 260 PRs in 211 repos with trivial typo fixes in code comments(!). Almost all of them are rejected (including those in ripgrep), but of course, a few get merged and he now boasts he "contributed" to sqlalchemy, Nim and others... What a time to be alive.

stevekemp · 2026-05-28T07:34:33 1779953673

I used my human eyes to submit updates to Redis and Git, fixing typos in comments.

Sure it's low-hanging fruit, but if you're looking at the code it's good to have the comments be readable and not full of typos.

(That said this was a few years ago, and there were no LLMs at that point. I didn't go out of my way to make trivial contributions, but I figured since I saw the "problems" I should submit a patch to fix them.)

ramon156 · 2026-05-28T06:12:40 1779948760

Drive-by PRs have been an issue before, but with AI it's just getting disrespectful

deng · 2026-05-18T06:10:36 1779084636

You can't release this under MIT license, as it contains a ton of various different things under various different licenses, from GPL to proprietary.

tech4bot · 2026-05-18T06:24:19 1779085459

The project itself is MIT, meaning the scripts/docs I wrote. Everything else remains under its respective upstream license, GPL, vendor/proprietary blobs, debian packages, firmware, etc.

I did mention this in the license section, last line of the README.

deng · 2026-05-17T15:01:21 1779030081

But it's also plug&play for anyone stealing your laptop, see for instance

https://qht.co/item?id=39941021

deng · 2026-05-12T05:53:08 1778565188

Oh well, they really do their best to alienate people as well. They just completely overhauled their UX, and after that update, people at my company were so confused, they couldn't even open new issues anymore, because everything was somehow renamed to "work items". I kid you not, literally two decades of UX people were used to, just thrown out the window, it's absolutely mind-boggling. The feedback to this is devastating:

https://gitlab.com/gitlab-org/gitlab/-/work_items/590689

deng · 2026-05-12T05:41:40 1778564500

> Nicholas Carlini, a researcher at Anthropic, orchestrated 16 parallel Claude agents to write a production C compiler in Rust.

No he didn't. The compiler is bascially useless as it produces vastly inferior code than gcc/clang.