"Flash-Lite" is a different product from "Flash", which is more expensive. They couldn't be more confusing with their naming though, especially since they have 3.1 Pro and not 3.1 Flash non-lite.
It depends on the use case. Yes, 90% of the cost is cached tokens in agentic coding scenarios (actually 95% in my experience). But not when the model reasons for 200k+ tokens before answering a complex problem.
$0.15 / million tokens
$1.00 / million tokens per hour (storage price)
I much prefer the OpenAI/DeepSeek way of pricing caching where you don't have to think about storage price at all - you pay for cached tokens if you reuse the same prefix within a (loosely defined) time period.
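To make the difference concrete, here is a rough cost comparison. It assumes the $0.15 figure above is the per-million price for reading cached input tokens and the $1.00 figure is the hourly storage price per million cached tokens. The "implicit" function models the OpenAI/DeepSeek style described above and, purely for comparison, reuses the same read price. The prefix size, read count and duration are made-up inputs.

```python
# Assumed prices, taken from the figures quoted above.
CACHED_READ_PER_M = 0.15   # $ per million cached input tokens (assumed meaning of the $0.15 figure)
STORAGE_PER_M_HOUR = 1.00  # $ per million tokens per hour kept in an explicit cache


def explicit_cache_cost(prefix_tokens: int, reads: int, hours: float) -> float:
    """Explicit caching: pay for every cached read plus hourly storage."""
    millions = prefix_tokens / 1_000_000
    return millions * (CACHED_READ_PER_M * reads + STORAGE_PER_M_HOUR * hours)


def implicit_cache_cost(prefix_tokens: int, reads: int) -> float:
    """OpenAI/DeepSeek style: pay only for cached reads, no storage line item."""
    millions = prefix_tokens / 1_000_000
    return millions * CACHED_READ_PER_M * reads


# A hypothetical 100k-token prefix reused 20 times over one hour.
print(f"explicit: ${explicit_cache_cost(100_000, 20, 1):.2f}")
print(f"implicit: ${implicit_cache_cost(100_000, 20):.2f}")
```

With these assumed numbers, the storage term grows with wall-clock time rather than with usage, so it hurts most when a large prefix sits in the cache between infrequent reads.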
I confirmed this by running a bunch of prompts through Gemini 3.5 Flash without doing anything special to configure caching and noting that it comes back with a "cachedContentTokenCount" on many of the responses.
In our experience, caching is not very reliable with Google. We keep getting random cache misses that don't happen with other providers; OpenAI, Anthropic and Fireworks (which we use a lot) all give us higher cache hit rates. So it's not only about the cost of cached tokens but also about the cache hit rate you actually get.
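A small sketch of how one might measure this from raw responses. It assumes the REST response shape where `usageMetadata` carries `promptTokenCount` and, on a cache hit, `cachedContentTokenCount`, and that the prompt count already includes the cached tokens; check both against the current Gemini API reference. It reports two numbers: the share of requests with any cache hit, and the share of prompt tokens served from cache. The sample responses are illustrative, not real data.

```python
def cache_stats(responses: list[dict]) -> dict[str, float]:
    """Summarize cache behaviour across Gemini-style JSON responses."""
    usages = [r.get("usageMetadata", {}) for r in responses]
    if not usages:
        return {"request_hit_rate": 0.0, "token_hit_rate": 0.0}

    # cachedContentTokenCount is simply missing when nothing came from the cache.
    cached = [u.get("cachedContentTokenCount", 0) for u in usages]
    prompt_total = sum(u.get("promptTokenCount", 0) for u in usages)

    return {
        # Fraction of requests that hit the cache at all.
        "request_hit_rate": sum(1 for c in cached if c > 0) / len(usages),
        # Fraction of prompt tokens served from cache (assumes prompt count includes cached).
        "token_hit_rate": sum(cached) / prompt_total if prompt_total else 0.0,
    }


# Illustrative responses: one partial hit, one miss.
sample = [
    {"usageMetadata": {"promptTokenCount": 1000, "cachedContentTokenCount": 800}},
    {"usageMetadata": {"promptTokenCount": 1000}},
]
print(cache_stats(sample))
```

Running the same prompts through several providers and logging these two rates is one way to put numbers on hit-rate differences like the ones described above.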
In my experience Google is the flakiest provider in general, which is surprising considering the rock-solid history of their search and other products. It's just more likely not to respond at all, to give a response out of left field, or to handle the same error in 12 different ways at random (a rainbow of HTTP status codes and error messages), etc.
I agree. AI Studio (https://aistudio.google.com/) is shockingly bad. I'm not sure I've ever used such a flaky Google service before. It's so much worse than Gmail or Google Search, not to mention the ChatGPT, Claude, DeepSeek, Kimi or Midjourney web interfaces. There's the bizarre, janky integration with your Google Drive, and Gemini or NBPs randomly erroring out, often indefinitely. I've had sessions refresh themselves and just... disappear. Or you get frustrated with a buggy, dead session, hit 'new session', and have to wait minutes for 'saving...' to finish.
Exactly our experience too. We catch these errors and, on those status codes, send the request to OpenAI instead. Retrying the same query against Gemini has a high chance of returning more or less the same status code.
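A minimal sketch of that fallback pattern, assuming each provider client is wrapped in a callable that raises an exception carrying the HTTP status code. `ProviderError`, `complete_with_fallback` and the particular set of status codes are hypothetical choices for illustration, not part of any SDK. Following the observation above, it does not retry the primary provider before falling back.

```python
from typing import Callable


class ProviderError(Exception):
    """Hypothetical error type carrying the HTTP status code returned by a provider."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code


# Assumed set of "give up and switch providers" statuses; tune to what you actually see.
FALLBACK_STATUSES = frozenset({429, 500, 502, 503, 504})


def complete_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    fallback_statuses: frozenset[int] = FALLBACK_STATUSES,
) -> str:
    try:
        return primary(prompt)
    except ProviderError as err:
        # Client errors such as 400 usually mean a bad request, so surface them.
        if err.status_code not in fallback_statuses:
            raise
        # Retrying the primary tends to return the same status, so go straight to the fallback.
        return fallback(prompt)
```

In practice `primary` would wrap the Gemini call and `fallback` the OpenAI call; keeping them as plain callables makes the routing logic easy to test with stubs.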
Yeah, which means that in the end the input price plus the cache discount (10x) are the only values worth paying attention to. If Google started offering a 20x discount, it would make it twice as cheap while input and output prices stayed the same.
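The arithmetic behind this, as a small sketch: if a share of input tokens are cache reads billed at `input_price / discount`, the blended input price is computed below. Prices are normalized (list input price = 1.0), the 95% cache share is taken from the earlier comment, and output tokens are ignored. A bigger discount only halves the cached part, so going from 10x to 20x takes the blended price from about 0.145 to about 0.0975 (roughly a third cheaper); the saving only approaches 2x as the cache share approaches 100%.

```python
def blended_input_price(input_price: float, cache_share: float, discount: float) -> float:
    """Average price per input token when `cache_share` of tokens are discounted cache reads."""
    uncached = input_price * (1 - cache_share)
    cached = input_price * cache_share / discount
    return uncached + cached


for discount in (10, 20):
    price = blended_input_price(1.0, 0.95, discount)
    print(f"{discount}x discount: {price:.4f} of the list input price")
```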
The model is (like Composer 2) based on Kimi K2.5, and they claim SOTA performance at 1/10th of the cost. The tweet also mentions that they've started training a new model from scratch on Colossus 2 (the xAI/SpaceX cluster). Really impressive how they've made this jump from being called the VS Code fork with no moat just a couple of months ago.
I am not sure we should dismiss what they have today. Nobody has yet come close with a full-package IDE that works well for coding. Is that not a moat? It is easy for me to discount it in my head, thinking that I could build something myself, but between autocomplete and their workflow for agent use, it feels like they have some tangible moat emerging.
If we ignore cost (which is kinda hard to ignore), I feel Codex kinda does it for me. Sure, it's not really an editor, but I find I don't need that _that much_, and it's easy to launch an external editor (they actually have the feature).
The ironic thing is that half a year ago, after trying factory.ai, I thought a chat-first interface was a stupid idea that would never work.
A few years ago I tried Zed when it was still pretty early, but eventually settled on Cursor. I gave Zed another shot a few days ago because Cursor’s worktree support still feels pretty weak.
In my setup I use multiple agents like Claude Code and Codex, and Zed’s ACP support makes it pretty nice to manage them all as “threads” in one place. Worktree switching also feels much smoother.
Overall the experience was pretty good, but the way the agent and editor are integrated still feels a bit lacking, and tab completion is the big one for me. Cursor’s tab completion is still the best I’ve used.
So now I’m using both. For work that needs a lot of focus and careful iteration, I use Cursor. For things that are easy to split into worktrees and hand off to agents, I use Zed with Claude/Codex.
The tab completion is "faster than vim", coming from a long-time vimmer. It's at the point where a lot of the time I'll lead with the comment instead of the code:
`# now take the list and sort by x.lastName`
`<tab>`
...and it'll "do the thing" (with type hints, its own comments, etc.). Obviously, in this very simple, understandable, completely contrived example it's "trivial" (though 3 years ago it would have seemed like magic), but it also picks up on "continuation / more of the same" type edits. A comment like `# use random_utility to call the api and only accept matches which supplement addresses that have already been found` will (usually) autocomplete all the gobbledygook w.r.t. tokens, URLs, function names, etc., so it's effectively an "automatic omni-complete with simplistic post-processing".
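For illustration, this is the kind of line such a comment tends to expand into in Python. It is not actual Cursor output; the `Person` class and the sample data are hypothetical, and only the comment comes from the example above.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical record type; camelCase field names mirror the comment's x.lastName.
    firstName: str
    lastName: str


people = [Person("Ada", "Lovelace"), Person("Alan", "Turing"), Person("Grace", "Hopper")]

# now take the list and sort by x.lastName
people = sorted(people, key=lambda x: x.lastName)
```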
Example #2: I was just fixing some vibe-coded slop that called `click.echo(some_api.whatever_endpoint())`, where the "slop" part was that the endpoint literally returned `str('{ "A": 1, "B": 2 }')`, which the call then echoed directly.
On the command line I was running `blah whatever-endpoint --something | jq '.'` and got tired of the jq step, so I thought: "I'll just use `json.dumps(..., indent=2)`". But lo and behold, I got a dumb JSON string literal, not a pretty-printed object.
I start typing `json.loads(` to go from "str()" to "dict()"... and it autocompletes the whole fix (on that line). Then I move to `def some_other_endpoint` and it basically has the same edit queued up (i.e., it "knows" what I'm about to do).
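A self-contained reproduction of the double-encoding problem described above. `whatever_endpoint` is a hypothetical stand-in that returns an already-serialized JSON string, and `print` replaces `click.echo` so the snippet needs only the standard library. Dumping a `str` again yields a quoted string literal; parsing it with `json.loads` first (str to dict) is what lets `indent=2` take effect.

```python
import json


def whatever_endpoint() -> str:
    # Hypothetical stand-in for the "slop": returns pre-serialized JSON text, not a dict.
    return str('{ "A": 1, "B": 2 }')


raw = whatever_endpoint()

# json.dumps on a str just produces a quoted, escaped string literal; indent has no effect.
double_encoded = json.dumps(raw, indent=2)
print(double_encoded)

# Parse first (str -> dict), then dump: now indent=2 pretty-prints the object.
pretty = json.dumps(json.loads(raw), indent=2)
print(pretty)
```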
...so overall, "faster than vim", even with a high skill bar for repetition, motions, macros, sed-style edits, etc. You can't beat "<tab>", especially when it's lightly intelligent (i.e., knows when to use str vs. int, adapts to different function calls, etc.).
I've been using Cursor for over a year for my personal projects. At work I use Claude Code, so I've been wondering whether I'm missing something in the other agents.
Over the last week I tried two other agents on my personal projects, dirac and forgecode, after seeing impressive results from both on Terminal-Bench.
After a good amount of testing, and over $100 in OpenRouter spend, I'm back to Cursor.
I liked forgecode the best, and it feels better than Claude Code, but Cursor definitely feels best to me. Composer 2.5 is fast and effective, and that makes a huge difference. I was running `forge` with Opus, and it was taking dozens of minutes to do things; the feedback loop was so slow.
The previous version of Composer was also much faster, and that makes a difference. Maybe people like context switching, but I prefer to stay focused on the task in front of me and review the code carefully.
I think that's a pretty good moat. I was ready to end my subscription a week ago, and now I'm back after learning the grass is not necessarily greener on the other side of the fence.
They don't have the same quality and kind of data. For example, Claude Code might have general conversation-flow data for implementing feature X, but Cursor has users' individual editing actions AND the chat flow. Which line did the user manually edit after the agent did its thing? What's the commit message (if written manually)? Stuff like that is worth its weight in gold.
Honestly, the data itself is probably worth heaps even if the company itself collapses. Early attention engineering when humans were still in the loop!!!
> Early attention engineering when humans were still in the loop
Exactly. Cursor was the first product used by tons of devs on real codebases. Just the signal "acceptance rate" is huge and can't be easily captured w/ synthetic data.
How much the RL they are doing really improves Kimi K2.5 remains to be seen. So right now, the ground truth is that they combined what they had with a strong open-weights model. The RL improvement may be marginal (since many folks report strong results with vanilla K2.6), and it may mostly bias the model towards coding tasks: when a model like this is trained to be a generalist, there is a tension between being good at one thing and at another, in terms of both SFT and RL. You can see this in the DeepSeek v4 Flash training report, for instance, but it is a known fact. So if you have the GPUs and a decent RL pipeline that does not ruin the model, you can indeed specialize it a bit more for a given task, at the expense of tasks people will not do inside Cursor. But so far, the measurable reality is that Cursor uses an open-weights model, as most could, and the RL story could partially be a marketing move to justify calling it Composer 2.5 rather than a real, strong gain, given that there is no way to verify it and K2.5 was already strong. We also know that they had to partner to do the training, which is not good news either.
Aren't they still a VS Code fork with no moat? They lost about 70% of their users in half a year, which goes to show there isn't even the tiniest of moats.
I think it's going to be brutal for them to compete with OpenAI and Anthropic.
I switched to Claude Code because of usage. For $200 a month, I would run out of usage halfway through the month, and then be forced to use their Composer model or whatever slow, dumb model they served up in their "auto" mode.
For that same $200 a month, I could use claude code and basically never hit usage limits.
I don't understand what people who run into the limits on the Max 20x plan are doing. I NEVER have.
Since the frontier is only eight months ahead of DeepSeek, it's hard to see how model training can be a moat, as all the tricks are available from open labs in China. You really just need under $100M to bootstrap at this point.
>& now they're still losing all of their users to Claude Code and Codex.
Why pay for Cursor when I can use GLM 5.1, Kimi K2.6, MiniMax M2.7, Xiaomi MiMo V2.5 Pro and DeepSeek v4 for cheap, with whatever harness I want, including Claude Code?
It's not like Cursor harness is the best out there.
And even if I want to edit the code, I don't need to run the agent harness in an IDE.
These are in the trillion-parameter range. I'm not sure it's actually that cheap to run them at a reasonable speed without quality degradation and without, like... your own DGX B200.
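A back-of-the-envelope check on the hardware point, counting weight memory only (no KV cache, activations or runtime overhead). The 1-trillion-parameter figure is a round-number assumption for models in that range, and the precisions listed are common choices rather than what any specific provider actually serves.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


ASSUMED_PARAMS = 1e12  # round-number assumption: "trillion-parameter range"

for label, bits in [("BF16", 16), ("FP8", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(ASSUMED_PARAMS, bits):,.0f} GB")
```

Even at 4-bit, the weights alone come to roughly 500 GB, more than any single GPU holds, which is why serving these models at a reasonable speed usually means a multi-GPU node.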
It's still a VS Code fork, just now with a Kimi fine-tune, and still no moat...
I won't debate that it turns out none of this mattered when it came to being a successful company, though, and it kinda makes anyone who tried to roll their own instead of forking look a little silly.
The way I see it, it's so important to bundle the model with the right tooling.
Like a race car: having the best engine doesn't help if the rest of the car lacks the other winning properties (reliability, aerodynamics, etc.).
So Cursor, IMO, has put itself in a strong position by having both a solid IDE __and__ a solid, cost-efficient model. Those two working well in combination for the task they are designed to solve (coding) matters more than benchmarks.
Always thought of this as two cars driving faster than you on the road. After a certain distance it's clear both are faster than you, but really hard to say which one is the fastest.
I agree with the general sentiment here that this is the future of coding for a lot of tasks. But as a business case for your product, I'm really struggling to see how this beats the Claude Code GitHub Action, which integrates directly with GitHub at no additional cost and lets me use an OAuth token with my subscription.
Hey, Ishaan here (co-founder). Totally fair point, the Claude Code GitHub Action is great for GitHub workflows. We see Omnara fitting in when you want to keep a live session going across devices (terminal ↔ web ↔ mobile) and outside GitHub too.
I see, thanks for explaining and congrats on the launch! After re-reading the description, the ability to use other frameworks might become a USP too.
Just a random remark: what's annoying, and definitely a pain point in my workflow, is proper development environments for agents. Not just runtimes but also managing secrets, etc. Maybe an avenue to explore and use in marketing copy.
Haven't used Windsurf yet, but in other tools this is called 'Agent' mode. So you open up the chat modal to talk to an LLM, then select 'Agent' mode and send your prompt.
Cursor recently lost me as a customer. Too many updates that disturb my workflow and productivity, no easy way to roll back versions, super sparse changelogs, lots of magic in context building, and really opaque pricing on Max mode. I recently made the switch to Claude Code on the Max plan and I couldn't be happier. The only real thing I'm missing is the diff view across files, but I assume it's just a matter of time until that's properly implemented in Zed or VS Code.
I feel unstoppable with Claude Code Max. I never thought I'd pay $200 per month for any developer tool, yet here we are, and I also couldn't be happier with it.
Maybe in the US? I will never pay $100 for a subscription, and I despise that people normalized it by buying this stuff instead of saying "no, that's way too expensive".
Well, bucko, it’s time to open your wallet. There are creatives out there who spend at least $1000/month on subscriptions for tools, and without those tools they could never do most of the work they do. And some who buy physical gear, like photographers and videographers, pay way more than that for equipment.
Soon it will be the same for developers. Developers really are a spoiled bunch when it comes to paying for things; many will balk at paying $99/year just to publish stuff on an App Store. Everyone just wants free open-source stuff. As expectations on developer productivity rise, you will be paying for these AI tools no matter how expensive they get, or you will just be gentrified out of the industry. The choice is yours.
I work in a cleanroom to fabricate semiconductor devices and I spend hundreds of euros per hour to use specific tools which mostly just use electricity and maintenance. Should we complain that it’s too expensive or should we use them because they’re worth the price?
Things have a price for a reason. It’s up to you whether it’s worth paying that or not.
We are talking about personal use here, whereas in your case people don't pay out of their own pocket but out of the company's. At least I hope so, because otherwise it would be very dumb.
I’m also talking about personal use. These are research devices for my PhD. I’m obviously not paying out of pocket, but my funding agency does.
I’m trying to convey that if a tool increases your efficiency by more than it costs then it’s worth paying for it regardless of how expensive it is. It’s how the economy works.
There is no free lunch. Even if a company pays for it instead of you, its LLM costs per developer will be factored into what it is willing to offer as compensation. So one way or another, the end result is that you get paid less for the same amount of work today.
That “if” doesn’t apply to all of us, though. Not everybody is paid by the hour. I’d love to try something like Claude code, but $100 per month is way too expensive for me, and it probably wouldn’t even give me a single extra dollar of income. I think I’ll just wait for the time when local LLMs will be good enough to be a viable alternative.
By the time you can run good enough local LLMs without splurging on sufficiently powerful hardware, those LLMs will look like toys compared to whatever cloud based LLMs are available.
That's a great question. Probably not. IDK. I'm also only paying this much to maintain momentum on a personal project. I also know that in a year these LLM products will change drastically, pricing tiers will transform, etc. So I can't predict what will happen in a year, but things will probably be cheaper.
Edit: On the other hand, the state of the art tools will also be much better in a year, so might keep that high price point!
Am I rationalizing my purchase? Possibly. If I'm not using it daily, I will cancel it, I promise :)
I think there is definitely room to price AI tools way higher. Developers are being slowly boiled like frogs right now. Getting addicted to AI tools to the point they can’t work without them, that’s when you raise the price.
I see it as an investment in my future. I was able to make progress on a personal project with Claude Code where I had failed using other tools. Yes, I will pay (and apparently already have paid) multiple hundreds of dollars to get the project release-ready. But I need to keep in mind that I'm not going to work at that velocity all the time, which would make the $200 price point unjustifiable long term.
I just started with it, so still getting my feet wet, but it's been better than any other tool at really grokking my codebase and understanding my intent. The workflow feels better than a strict IDE integration, but it does get pricey really quickly, and you pretty much need at least the $100 Max subscription.
Luckily, it should be coming with the regular $20 Pro subscription in the near future, so it should be easier to demo and get a feel for it without having to jump in all the way.
The current max pricing is actually as transparent as it has ever been: It's 20% more to use Max than the APIs directly. I am not sure if your feedback is outdated/based on a previous version of reality?
I have a workflow that also uses micro commits.
I keep my older JetBrains IDE open at the same time.
Using feature branches liberally, any successful interaction between me and the LLM in Cursor results in a micro commit. I use the Cursor AI ‘generate commit message’ for speed.
Every so often, I switch over to JetBrains to use Git interactive rebase to tidy up the commits, as its diff viewer is unsurpassed. The micro commits then get renamed, reordered or squash-merged as required. All of this is possible from the Git CLI, of course, but the JetBrains Git experience is fantastic IMHO. All their free Community Edition IDEs have it.
Personally, I've been using Cursor since day 1, lately with Gemini 2.5 Pro. In the last couple of days I've also started experimenting with Zed and local models served via Ollama, unfortunately without good results so far.
Say I'm chatting in a Git project directory, `undici`. Here are a few ways I work with Codex.
1. Follow up with Codex.
`mct "fix bad response on h2 server" --model anthropic/claude-3.7-sonnet:thinking`
Machtiani will stream the answer, then also apply git patches suggested in the convo automatically.
Then I could follow up with codex.
`codex "See unstaged git changes. Run tests to make sure it works and fix any problems with the changes if necessary."`
2. Codex and MCT together
`codex "$(mct 'fix bad response on h2 server' --model deepseek/deepseek-r1 --mode answer-only)"`
In this case Codex will dutifully implement the changes suggested by mct, saving tokens and time.
The key in the second example is `--mode answer-only`. Without this flag, mct will try to apply the patches itself; with it, mct withholds the patches and Codex applies the changes instead.
3. Refer codex to the chat.
Say you ran this:
`mct "fix bad response on h2 server" --model gpt-4o-mini --mode chat`
Here I used `--mode chat`, which tells mct to stream the answer and save the chat conversation, but not to apply git changes (different from `--mode answer-only`).
mct will print out something like:
`Response saved to .machtiani/chat/fix_bad_server_resonse.md`
Now you can just tell codex.
`codex "See .machtiani/chat/fix_bad_server_resonse.md, and do this or that...."`
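If you prefer to script the second pattern (mct's answer feeding Codex) without worrying about shell quoting, a minimal Python sketch follows. It only uses the invocations shown above (`mct` with `--model` and `--mode answer-only`, and `codex` with a prompt argument); the helper names are hypothetical, and it assumes both CLIs are on `PATH`.

```python
import shutil
import subprocess


def mct_answer_only_argv(question: str, model: str) -> list[str]:
    # Same flags as example 2: --mode answer-only makes mct withhold its patches.
    return ["mct", question, "--model", model, "--mode", "answer-only"]


def mct_then_codex(question: str, model: str) -> None:
    for tool in ("mct", "codex"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")

    # Capture mct's answer, like $(...) does in the shell version.
    answer = subprocess.run(
        mct_answer_only_argv(question, model),
        capture_output=True,
        text=True,
        check=True,
    ).stdout.rstrip("\n")  # $(...) also strips trailing newlines

    # Hand the answer to Codex as its prompt; Codex then applies the changes.
    subprocess.run(["codex", answer], check=True)
```

Usage would be `mct_then_codex("fix bad response on h2 server", "deepseek/deepseek-r1")`. Passing an argument list instead of a shell string means the answer text is never re-parsed by a shell.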
*Conclusion*
The examples above should cover day-to-day use cases. There are other exciting workflows, but I should really post a video on those. You can do anything with the Unix philosophy!
6x the price of 3.1 Flash-Lite