I don't think MLX supports similar 2-bit quants, so I never tried 397B with MLX.
However I did try 4-bit MLX with other Qwen 3.5 models and yes it is significantly faster. I still prefer llama.cpp due to it being a one in all package:
- SOTA dynamic quants (especially ik_llama.cpp)
- amazing web ui with MCP support
- anthropic/openai compatible endpoints (means it can be used with virtually any harness)
- JSON constrained output which basically ensures tool call correctness.
- routing mode
Why would they have that feature in claude code cli if it goes against the ToS? You can use Claude Code programatically. This is not the issue. The issue is that Anthropic wants to lock you in within their dev ecosystem (like Apple does). Simple as that.
allowed shell pipes doesn't necessarily mean they want loops running them.
One of the economic tuning features of an LLM is to nudge the LLM into reaching conclusions and spending the tokens you want it to spend for the question.
presumably everyone running a form of ralph loop against every single workload is a doomsday situation for LLM providers.
> allowed shell pipes doesn't necessarily mean they want loops running them.
insane that people apologize for this at all. we went from FOSS software being standard to a proprietary cli/tui using proprietary models behind a subscription. how quickly we give our freedom away.
Anthropic itself advertised their own implementation of agentic loop (Ralph plugin). Sure, it worked via their official plugin, but the end result for Anthropic would be the same.
There's nothing in TOS that prevents you from running agentic loops.
I don't know why this is downvoted, see my nephew (?) comment [0] for a longer version, but this is not at all clear IMHO. I'm not sure if a "claude -p" on a cron is allowed or not with my subscription, if I run it on another server is it? Can I parse the output of claude (JSON) and have another "claude -p" instance work on the response? It's only a hop, skip, and a jump over to OpenClaw it seems, which is _not_ allowed. But at what point did we cross the line?
It feels like the only safe thing to do is use Claude Code, which, thankfully, I find tolerable, but unfortunate.
Or can you? It's my understanding that you cannot use your subscription with the Agent SDK, that's what the docs say:
> Unless previously approved, Anthropic does not allow third party developers to offer claude.ai login or rate limits for their products, including agents built on the Claude Agent SDK. Please use the API key authentication methods described in this document instead.
Though there was that tweet [0] a while back by someone from Anthropic that just muddied the water. It's frustrating because I feel like the line between the Agent SDK and `claude -p` is not that large but one can use the subscription and one can't... or we don't know, the docs seem unambiguous but the tweet confuses things and you can find many people online saying you can, or you can't.
I'd love to play around with the Agent SDK and try out some automations but it seems I can only do that if I'm willing to pay for tokens, even though I could use Claude Code to write the code "for" the Agent SDK, but not "run" the Agent SDK.
Where is the line? Agent SDK is not allowed with subscription, but if I write a harness around passing data to and parsing the JSON response from `claude -p '<Your Prompt>' --output-format json` would that be allowed? If I run it on a cron locally? I literally have no idea and, not wanting my account to be banned, I'm not interested in finding out. I wish they would clarify it.
How is Peter "early in their career"? When he sold PSPDFKit for 100mio in 2020 he had been working on it for 13 years, and before that he'd worked as an engineer.
Is there a reliable way to run MLX models? On my M1 Max, LM Studio seems to output garbage through the API server sometimes even when the LM Studio chat with the same model is perfectly fine. llama.cpp variants generally always just work.
Instead of win key, you can press F3, or just set a hotkey that works for you in the System Preferences
Instead of clicking the red maximize button, you can double-click the window header / title. This will use an algorithm to try to resize the window to the best size for its content.
Technically it’s zoom, and how it functions is dependent on the app. In Finder it used to resize the window to a size that contained all the icons. Clicking it again would revert the window size.
The app still gets to decide though! Most programs do go full size with an alt+green click, but not all. A column-style Finder window, for example, seems to go taller but no wider.
I like doing side projects, I don't like wasting a day of work potential on any of these web apps: Google Cloud, AWS, Azure, Appstore Connect, Google's Android App Store, RevenueCat, Stripe, etc
I dread having to log in to these systems and waste hours achieving the simplest tasks.
This is what I'm using Claude for. E.g. I log in to AppStore connect, tell it what I need (3 subscription tiers), it will do all the clicking and editing and Apple's stupid UI, then I will ask it to create a summary for RevenueCat, and use another Claude session in there to click all the buttons to configure based on what just happened in Appstore connect.
Have you compared against MLX? Sometimes I’m getting much faster responses but it feels like the quality is worse (eg tool calls not working, etc)
reply