Hacker Times | asar's comments

$1.5/m input tokens $9/m output tokens

6x the price of 3.1 flash lite


"Flash-Lite" is a different product from "Flash", which is more expensive. They couldn't be more confusing with their naming though, especially since they have 3.1 Pro and not 3.1 Flash non-lite.

I haven't used 3.5 at all yet, but previous Gemini (and Gemma) models are by far more token-light per task than any other model.

Cost per task is a more productive measure, but obviously a more difficult one to benchmark.


I wonder why they didn't discuss price in the post?

Compare to the GPT-5.5 announcement: https://openai.com/index/introducing-gpt-5-5/


I don't think input/output pricing matters, 90% of the cost is cache. $0.15 is pretty good, but still very expensive.

It depends on the use-case. yes, 90% of cost is cache in agentic coding scenarios (actually 95% in my experience). But not when the model reasons for 200k+ tokens before answering a complex problem.
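A rough sketch of that split, using the prices quoted in this thread ($1.50/M input, $9/M output, $0.15/M cached input). The token counts for both scenarios are invented for illustration, not measurements, and the sketch assumes reasoning tokens are billed at the output rate.

```python
# Prices per million tokens, as quoted in this thread.
INPUT_PER_M = 1.50
OUTPUT_PER_M = 9.00
CACHED_PER_M = 0.15


def cost_breakdown(uncached_in, cached_in, out):
    """Return (total cost in USD, share of that cost spent on cached tokens)."""
    uncached_cost = uncached_in / 1e6 * INPUT_PER_M
    cached_cost = cached_in / 1e6 * CACHED_PER_M
    output_cost = out / 1e6 * OUTPUT_PER_M
    total = uncached_cost + cached_cost + output_cost
    return total, cached_cost / total


# Hypothetical agentic session: a long prefix is re-read from cache every turn.
agentic = cost_breakdown(uncached_in=200_000, cached_in=50_000_000, out=100_000)

# Hypothetical single hard question: short prompt, 200k reasoning tokens.
reasoning = cost_breakdown(uncached_in=5_000, cached_in=0, out=200_000)

print(f"agentic:   ${agentic[0]:.2f}, cache share {agentic[1]:.0%}")
print(f"reasoning: ${reasoning[0]:.2f}, cache share {reasoning[1]:.0%}")
```

With these made-up numbers the cached tokens dominate the agentic bill, while the reasoning-heavy request is almost entirely output cost.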

Gemini models solve a problem with 80% fewer tokens, so that's something to think about.

Source?


Gemini caching is confusing though:

  $0.15 / million tokens
  $1.00 / 1,000,000 tokens per hour (storage price)

I much prefer the OpenAI/DeepSeek way of pricing caching where you don't have to think about storage price at all - you pay for cached tokens if you reuse the same prefix within a (loosely defined) time period.

As far as I can tell Gemini caching DOES work like OpenAI - see implicit caching here: https://ai.google.dev/gemini-api/docs/caching

I confirmed this by running a bunch of prompts through Gemini 3.5 Flash without doing anything special to configure caching and noting that it comes back with a "cachedContentTokenCount" on many of the responses.

The "storage price" quoted is for an optional Gemini feature that most people don't care about: https://ai.google.dev/gemini-api/docs/caching#explicit-cachi...
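A small sketch of that kind of check, working on already-parsed JSON responses. It assumes the response carries a `usageMetadata` object with `promptTokenCount` next to `cachedContentTokenCount`; only the latter name comes from the comment above, so check the rest against the API docs. The sample responses are made up.

```python
def cached_fraction(response: dict) -> float:
    """Fraction of prompt tokens served from cache for one parsed response."""
    usage = response.get("usageMetadata", {})
    prompt_tokens = usage.get("promptTokenCount", 0)
    # Assumption: the field is simply absent when nothing came from cache.
    cached_tokens = usage.get("cachedContentTokenCount", 0)
    if prompt_tokens == 0:
        return 0.0
    return cached_tokens / prompt_tokens


# Made-up responses: one implicit cache hit, one miss.
samples = [
    {"usageMetadata": {"promptTokenCount": 12000, "cachedContentTokenCount": 9000}},
    {"usageMetadata": {"promptTokenCount": 12000}},
]
fractions = [cached_fraction(r) for r in samples]
print(fractions)
```

Logging this per request is a cheap way to see whether implicit caching is actually kicking in for a given prompt layout.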


In our experience, caching is not very reliable with google. We always get random cache misses that don't happen with other providers. We find OpenAI, Anthropic and Fireworks (which we use a lot) all have higher cache hit rates. So it's not only about the costs of cached token but also what kind of cached hit rate you get.
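To make the hit-rate point concrete, here is a minimal sketch of the blended input price as a function of cache hit rate. The prices are the ones quoted in this thread; the two hit rates are hypothetical, not measured values for any provider.

```python
def effective_input_price(input_per_m, cached_per_m, hit_rate):
    """Blended price per million input tokens at a given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cached_per_m + (1.0 - hit_rate) * input_per_m


# Same list prices, two hypothetical hit rates.
high = effective_input_price(1.50, 0.15, hit_rate=0.95)
low = effective_input_price(1.50, 0.15, hit_rate=0.80)
print(f"95% hits: ${high:.4f}/M, 80% hits: ${low:.4f}/M")
```

Under these assumptions, dropping from a 95% to an 80% hit rate nearly doubles the effective input price even though the list prices are identical.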

In my experience Google is the most flaky in general, which is surprising considering the rock solid history of their search and other products. Just more likely not to respond at all, to give a response out of left field, to handle the same error in 12 different ways randomly (a rainbow of HTTP status codes and error messages), etc etc.

I agree. The https://aistudio.google.com/ is shockingly bad. I'm not sure I've ever used such a flaky Google service before. It's so much worse than Gmail or Google, not to mention ChatGPT or Claude or DeepSeek or Kimi or Midjourney web interfaces. The bizarre janky integration with your Google Drive, or Gemini or NBPs randomly erroring out, often indefinitely. I've had sessions refresh themselves and just... disappearing. Or when you get frustrated with a buggy dead session and hit 'new session' and have to wait minutes for 'saving...' to happen.

Exactly our experience too. Effectively we catch these, and on these status codes we send the request to OpenAI. Retrying the same query in Gemini has a high chance of giving more or less the same status code.
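A minimal sketch of that fallback pattern, not the commenter's actual code. `ProviderError`, the status-code set, and the two provider callables are all hypothetical stand-ins for whatever client wrappers are in use.

```python
from typing import Callable


class ProviderError(Exception):
    """Hypothetical error type carrying the HTTP status a provider returned."""

    def __init__(self, status_code: int):
        super().__init__(f"provider returned HTTP {status_code}")
        self.status_code = status_code


# Assumed set of statuses worth rerouting; tune to what you actually observe.
FALLBACK_STATUSES = frozenset({429, 500, 502, 503, 504})


def complete_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    fallback_statuses: frozenset = FALLBACK_STATUSES,
) -> str:
    """Call the primary provider; on a listed status, go to the fallback
    instead of retrying the primary with the same query."""
    try:
        return primary(prompt)
    except ProviderError as err:
        if err.status_code not in fallback_statuses:
            raise  # e.g. a 400 is our own bug; the fallback would not help
        return fallback(prompt)
```

Usage would look like `complete_with_fallback(prompt, call_gemini, call_openai)`, where both callables are your own wrappers that raise `ProviderError` on HTTP failures.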

10% of input pricing is standard especially compared to competition.

yah, which means that the input cost is the only value that should be paid attention to at the end + the cache discount (x10). If google would start offering x20 discount it would make it twice as cheap while input and output stayed the same.

[deleted]

Output cost is 3x from Gemini 3 flash.

The model is (like Composer 2) based on Kimi K2.5 and they claim SOTA performance for 1/10th of the cost. The tweet also mentions that they've started a new model from scratch on Colossus 2 (xAI/SpaceX Cluster). Really impressive how they've made this jump from being called the vscode fork with no moat just a couple of months ago.

> Really impressive how they've made this jump from being called the vscode fork with no moat just a couple of months ago.

Impressive, yes. But they still don't have a moat...


I am not sure we should dismiss what they have today. Nobody has yet come close with a full-package IDE that works well for coding. Is that not a moat? It is easy for me to discount it in my head, thinking that I could build something myself, but between autocomplete and their workflow for agent use, it feels like they have some tangible moat emerging.

If we ignore cost (which is kinda hard to ignore), I feel Codex kinda' does it for me. Sure it's not really an editor but I find I don't need that _that much_ and it's easy to launch an external editor (they actually have the feature).

The ironic thing is that half a year ago, after trying factory.ai I thought chat-first interface was a stupid idea that will never work.


Have you tried Zed?

I haven’t tried Cursor, so don’t know how they compare, but I like Zed a lot.

Anyway, would love to see a comparison from someone who has used a recent version of each.


A few years ago I tried Zed when it was still pretty early, but eventually settled on Cursor. I gave Zed another shot a few days ago because Cursor’s worktree support still feels pretty weak.

In my setup I use multiple agents like Claude Code and Codex, and Zed’s ACP support makes it pretty nice to manage them all as “threads” in one place. Worktree switching also feels much smoother.

Overall the experience was pretty good, but the way the agent and editor are integrated still feels a bit lacking, and tab completion is the big one for me. Cursor’s tab completion is still the best I’ve used.

So now I’m using both. For work that needs a lot of focus and careful iteration, I use Cursor. For things that are easy to split into worktrees and hand off to agents, I use Zed with Claude/Codex.


Interesting, is it that the tab completion is giving better results, or how it works is better?

The tab completion is "faster than vim" from a long-time vimmer. It's at the point where a lot of times i'll lead with the comment instead of the code:

    # now take the list and sort by x.lastName
    <tab>
...and it'll "do the thing" (w/ type hints, its own comments, etc). Obviously in this very simple, understandable, completely contrived example, it's "trivial" (but 3 years ago would have seemed like magic), but it'll also pick up on "continuation / more of the same" type edits. A comment like `# use random_utility to call the api and only accept matches which supplement addresses that have already been found` will (usually) autocomplete all the gobbledy-gook w.r.t. tokens, URL's, function names, etc. so it's effectively an "automatic omni-complete with simplistic post-processing"
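For illustration only, the kind of code such a comment might expand into could look like the following. The `Person` class and the sample data are invented here; this is not actual Cursor output.

```python
from dataclasses import dataclass


@dataclass
class Person:
    firstName: str
    lastName: str


people = [
    Person("Ada", "Lovelace"),
    Person("Alan", "Turing"),
    Person("Grace", "Hopper"),
]

# now take the list and sort by x.lastName
people_sorted = sorted(people, key=lambda x: x.lastName)
```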

Example #2: I was just fixing some vibe-coded slop, where it was taking `click.echo( some_api.whatever_endpoint() )` and the "slop" portion was literally emitting: `str('{ "A": 1, "B": 2 }')` and that function call was emitting it directly.

On the command line, I was doing `blah whatever-endpoint --something | jq '.'` and got tired of the JQ thing, so I'm like: "I'll just use `json.dumps(...,indent=2)`", but lo and behold, I'm getting a dumb JSON string literal, not a pretty printed object shape.

I start typing `json.loads(` to move from "str()" to "dict()" ... and it autocompletes the whole scenario (on that line), then I move to `def some_other_endpoint` and it basically has that same edit queued up. (ie: it "knows" what i'm about to do).
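A minimal reproduction of that string-versus-dict problem, using the literal from the example above. The variable names are made up for the sketch.

```python
import json

# What the endpoint wrapper hands back: a str that contains JSON, not a dict.
raw = '{ "A": 1, "B": 2 }'

# Dumping the str re-encodes it as one quoted JSON string literal;
# indent=2 has nothing to indent.
double_encoded = json.dumps(raw, indent=2)

# Parsing first yields a dict, which json.dumps can then pretty-print.
pretty = json.dumps(json.loads(raw), indent=2)

print(double_encoded)
print(pretty)
```

The first print shows a single escaped string on one line; the second shows the indented object, which is what piping through `jq '.'` was providing.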

...so overall, "faster than vim", even with high skill bar for repetition, motion, macros, sed-style edits, etc. You can't beat: "<tab>", especially when it's lightly intelligent (ie: knows when/what/str/int, adapts to different function calls, etc).


I've tried Zed and really didn't like it.

I like VS Code with the Claude Plugin, and sometimes with the Codex Plugin


Tried it and it’s fine but the AI integration is not tight enough for me.

I've been using cursor for over a year for my personal projects. At work, I use Claude Code, and so I've been wondering if I'm missing something in the other agents.

Over the last week, I tried out two other agents on my personal projects: dirac and forgecode, after seeing impressive results from both of them on terminal bench.

After a good amount of testing, and over $100 in open router spend, I'm back to cursor.

I really liked forgecode the best, and it feels better than claude code, but cursor definitely feels best to me. Composer 2.5 is fast and effective, and it makes a huge difference. I was running `forge` with Opus, and it was taking dozens of minutes to do things, and the feedback loop was so slow.

The previous version of composer was also much faster, and it makes a difference. Maybe people like context switching, but I prefer to stay focussed on the task in front of me, and I'm reviewing the code carefully.

I think that's a pretty good moat. I was ready to end my subscription a week ago, and now I'm back after learning the grass is not necessarily greener on the other side of the fence.


Isn't a large user base and the data collected from those users a moat of sorts?

A moat is when you have something others can't easily get.

Every MAG 7 / FAANG company already has more users and more data...

That's not a moat.

That's traction.


They don't have the same quality and kind of data. For example, Claude Code might have general conversation flow data for implementing feature X, but Cursor has users' individual editing actions AND the chat flow. Which line did the user manually edit after the agent did its thing? What's the commit message (if done manually)? Stuff like that is worth its weight in gold.

That's not X.

That's Y.


Been a bit out of the loop.

What's wrong with using very short sentences like 'That's not X. That's Y.'?


Commonly used phrase by LLMs. Gives people slop vibes these days.

"It's not X, it's Y" is a good way to illustrate a point. Same goes for many other common LLM phrases. It's used because it's effective.

Huh. I associate it with LinkedIn slop, which is probably 100% ai nowadays but they certainly didn't wait for llms.

Honestly the data itself is probably worth heaps even if the company itself collapses. Early attention engineering when humans were still in the loop!!!

> Early attention engineering when humans were still in the loop

Exactly. Cursor was the first product used by tons of devs on real codebases. Just the signal "acceptance rate" is huge and can't be easily captured w/ synthetic data.


And it's still just a vscode fork

Cursor 3 is a complete rewrite, it's no longer a fork.

It's still a VSCode fork. Even Cursor's own About window tells you it's VSCode.

  Cursor
  Version: 3.4.20
  VSCode Version: 1.105.1

I believe the agent view is a complete rewrite, and maybe the other parts but not the editor itself

How much the RL they are doing really improves Kimi K2.5 remains to be seen. So, right now, the ground truth is that they combined what they had with a strong open-weights model. The RL improvement may be marginal (since many folks report strong results with vanilla K2.6) and may mostly bias the model towards coding tasks: when a model like this is trained to be a generalist, there is a tension between being good at one thing and at another, in terms of SFT and RL. You can see this in the DeepSeek v4 Flash training report, for instance, but it is a known fact. So if you have the GPUs and a decent RL pipeline that does not ruin the model, you can indeed specialize it a bit more for a given task at the expense of tasks people will not do inside Cursor. But, so far, the measurable reality is that Cursor uses an open-weights model like most could do, and the RL story could be partially a marketing move to justify calling it Composer 2.5 more than a real strong gain, given that there is no way to verify and K2.5 was already strong. And we also know that they had to partner to do the training, which is also not good news.

They are still a vscode fork with no moat? Like they lost about 70% of users in half a year which goes to show how there is not even the tiniest of moat.

I feel like they've been targeting enterprise pretty hard. I know my company uses them, and the companies that hire us also use Cursor.

All enterprises I know use GitHub Copilot as they already have Office, Teams, … I wonder how it will change with the recent pricing changes

I can tell my company wants nothing to do with them.

Cursor will definitely win the enterprise for coding. Enterprises aren't going to trust a TUI

Why not? That makes no sense to me.

I think it's going to be brutal for them to compete with OpenAI and Anthropic.

I switched to claude code because of usage. For $200 a month, I would run out of usage halfway through the month. Then be forced to use their composer model or whatever slow, dumb model they served up in their "auto" mode.

For that same $200 a month, I could use claude code and basically never hit usage limits.

I don't understand what people are doing who run into the limits on that max x20 plan. I NEVER have.


Since the frontier is only 8 months ahead of DeepSeek, it is hard to see how model training can be a moat as all the tricks are available from open labs in China. You really just need <100m to bootstrap at this point.

This was the only way forward.

In my opinion cursor actually has one of the best harnesses again at the moment.

why is that part impressive specifically? they got purchased by SpaceX, they have access to infinite compute and cash now.

& now they're still losing all of their users to Claude Code and Codex.


>& now they're still losing all of their users to Claude Code and Codex.

Why pay for Cursor when I can use GLM 5.1, Kimi K2.6, MiniMax M2.7, Xiaomi MiMo V2.5 Pro and Deepseek v4 for cheap and use whatever harness I want, including Claude Code.

It's not like Cursor harness is the best out there.

And even if I want to edit the code, I don't need to run the agent harness in an IDE.


Not a cursor shill by any means, I do use it at work but that's because it's what they pay for.

But Cursor has a CLI harness.


these are in the trillion parameters range, not sure it's actually that cheap to have at a reasonable speed without quality degradation & without like.. your own DGX B200

I didn't say to run them at home. There are some cheap coding plans that get you plenty of usage for the Chinese models.

>Really impressive how they've made this jump from being called the vscode fork with no moat just a couple of months ago.

With so much money and compute from SpaceX, it is not so impressive.


One would hope the vscode fork with a $50B valuation and no moat, would wisely spend the money they raised to build a moat.

It's still a VsCode fork just now with a Kimi fine tune and still no moat...

I won't debate that it turns out none of this mattered when it came to being a successful company though, and it kinda makes anyone who tried to roll their own instead of forking look a little silly.


"No moat", well...

How I see this is that it's so important to bundle the model with the right tooling.

Like a racecar, having the best engine doesn't help if the rest of the car lacks other winning properties (reliability, aerodynamics etc).

So for Cursor: IMO, they put themselves in a strong position by having both a solid IDE __and__ a solid, cost-efficient model. Those two working great in combination for the task they are designed to solve (coding) is more important than benchmarks.


I doubt it's a brand new model. It's likely just Kimi K2.5 further trained on coding.

They didn't say it's a new model... in fact they said exactly what you just said.

Always thought of this as two cars driving faster than you on the road. After a certain distance it's clear both are faster than you, but really hard to say which one is the fastest.


In the same boat and ready to downgrade. But this must be on their radar, or they were/are losing money with opus...


I agree with the general sentiment here that this is the future of coding for a lot of tasks. But in terms of a business case for your product I'm really struggling to see how this beats Claude code action? Which integrates directly with GitHub, at no additional cost, and I can use an oauth token to use my subscription.


Hey Ishaan here (co-founder), totally fair point, claude code actions are great for GitHub workflows. We see Omnara fitting in when you want to keep a live session going across devices (terminal ↔ web ↔ mobile) and outside GitHub too


I see, thanks for explaining and congrats on the launch! After re-reading the description, the ability to use other frameworks might become a USP too.

Just a random remark, what's annoying and a pain point in my workflow are definitely proper development environments for agents . Not just runtimes but also managing secrets etc. Maybe an avenue to explore and use in marketing copy.


Haven't used Windsurf yet, but in other tools this is called 'Agent' mode. So you open up the chat modal to talk to an LLM, then select 'Agent' mode and send your prompt.


Cursor recently lost me as a customer. Too many updates that disturb my workflow and productivity, no easy way to roll back versions, super sparse changelogs, lots of magic in context building, really untransparent pricing on max mode. I recently made the switch to Claude Code on the Max plan and I couldn't be happier. The only real thing I'm missing is the diff view across files, but I assume it's just a matter of time until that's properly implemented in Zed or VSCode.


Since last week it’s possible to use Claude Code in the VSCode terminal where it now automatically installs a plugin to display the diffs.


thanks! i never set this up properly. did it now though, really cool!


I feel unstoppable with Claude Code Max. I never thought I'd pay $200 per month for any developer tool, yet here we are, and I also couldn't be happier with it.


Would you pay $400?


Don't give people ideas.


Other professions pay a lot for their tools, and developers are loaded with cash.


>developers are loaded with cash

Maybe in the US? I will never pay 100$ for a subscription and I despise that people normalized it by even buying this stuff instead of saying "no, that's way too expensive".


Well bucko it’s time to open your wallet. There are creatives out there who spend at least $1000/month in subscriptions for tools, but without those tools they could never do most of the work they do. And some who buy physical gear, like photographers and videographers, pay way more than that for equipment.

Soon it will be the same for developers. Developers really are a spoiled bunch when it comes to paying for things, many will balk at paying $99/year just to publish stuff on an App Store. Everyone just wants free open source stuff. As expectations on developer productivity rise, you will be paying for these AI tools no matter how expensive they get, or you will just be gentrified out of the industry. The choice is yours.


I work in a cleanroom to fabricate semiconductor devices and I spend hundreds of euros per hour to use specific tools which mostly just use electricity and maintenance. Should we complain that it’s too expensive or should we use them because they’re worth the price?

Things have a price for a reason. It’s up to you whether it’s worth paying that or not.


We are talking about personal use and then people don't pay for it out of their own pocket but the company's. At least I hope so because otherwise it would be very dumb.


I’m also talking about personal use. These are research devices for my PhD. I’m obviously not paying out of pocket, but my funding agency does.

I’m trying to convey that if a tool increases your efficiency by more than it costs then it’s worth paying for it regardless of how expensive it is. It’s how the economy works.


There is no free lunch. Even if a company pays for it instead of you, their LLM costs per developer will be factored in to what they are willing to provide as compensation. So one way or another, the end result is you get paid less for the same amount of work.


Why not? If you charge $50/hr and it saves you even just two hours a month, it’s a profitable trade.


That “if” doesn’t apply to all of us, though. Not everybody is paid by the hour. I’d love to try something like Claude code, but $100 per month is way too expensive for me, and it probably wouldn’t even give me a single extra dollar of income. I think I’ll just wait for the time when local LLMs will be good enough to be a viable alternative.


By the time you can run good enough local LLMs without splurging on sufficiently powerful hardware, those LLMs will look like toys compared to whatever cloud based LLMs are available.


I'm not paid by the hour, it's just basic math on what my time is worth


That's a great question. Probably not. IDK. I'm also only paying this much to maintain momentum on a personal project. I also know in a year, these LLM products will change drastically, pricing tiers will transform, etc.. So I can't predict what will happen in a year but things will probably be cheaper.

Edit: On the other hand, the state of the art tools will also be much better in a year, so might keep that high price point!

Am I rationalizing my purchase? Possibly. If I'm not using it daily, I will cancel it, I promise :)


I think there is definitely room to price AI tools way higher. Developers are being slowly boiled like frogs right now. Getting addicted to AI tools to the point they can’t work without them, that’s when you raise the price.


I see it as an investment into my future. I was able to make progress on a personal project with Claude Code which I failed at using other tools. Yes, I will pay, and apparently have paid, multiple hundreds of dollars to get the project release ready. But I definitely need to keep in mind that I'm not going to work at that velocity all the time, which would make the $200 price point not justifiable long term.


Can you elaborate? How is it better than Cursor?


I just started with it, so still getting my feet wet, but it's been better than any other tool at really grokking my codebase and understanding my intent. The workflow feels better than a strict IDE integration, but it does get pricey really quickly, and you pretty much need at least the $100 Max subscription.

Luckily, it should be coming with the regular $20 Pro subscription in the near future, so it should be easier to demo and get a feel for it without having to jump in all the way.


Try it


The current max pricing is actually as transparent as it has ever been: It's 20% more to use Max than the APIs directly. I am not sure if your feedback is outdated/based on a previous version of reality?


Yes, they've updated the docs since last week, I guess. Before, it didn't mention the 20% markup.


> The only real thing I'm missing is the diff view across files

You can commit checkpoints prior to each major prompt and use any IDE’s builtin visual diff versus last commit. Then just rebase when the task is done


I have a workflow that also uses micro commits. I keep my older JetBrains IDE open at the same time. Using feature branches liberally, any successful interaction between me and the LLM in Cursor results in a micro commit. I use the Cursor AI ‘generate commit message’ for speed. Every so often, I switch over to Jetbrains to use Git Interactive Rebase to tidy up the commits, as the diff viewer is unsurpassed. Then those micro commits get renamed, reordered, squash merged as required. All possible from Git CLI of course, but the Jetbrains Git experience is fantastic IMHO. All their free community edition IDEs have this.


Personally, I've been using Cursor since day 1. Lately with Gemini 2.5 Pro. I've also started experimenting with Zed and local models served via ollama in the last couple of days. Unfortunately, without good results so far.

I've created a list of self-hostable alternatives to cursor that I try to keep updated. https://selfhostedworld.com/alternative/cursor/


This sounds really cool. Can you explain your workflow in a bit more detail? i.e. how exactly you work with codex to implement features, fix bugs etc.


Say I'm chatting in a git project directory `undici`. I can show you a few ways how I work with codex.

1. Follow up with Codex.

`mct "fix bad response on h2 server" --model anthropic/claude-3.7-sonnet:thinking`

Machtiani will stream the answer, then also apply git patches suggested in the convo automatically.

Then I could follow up with codex.

`codex "See unstaged git changes. Run tests to make sure it works and fix any problems with the changes if necessary."`

2. Codex and MCT together

`codex "$(mct 'fix bad response on h2 server' --model deepseek/deepseek-r1 --mode answer-only)"`

In this case codex will dutifully implement the changes suggested by mct, saving tokens and time.

The key for the second example is `--mode answer-only`. Without this flagged argument, mct will itself try and apply patches. But in this case codex will do it as mct withholds the patches with the aforementioned flagged arg.

3. Refer codex to the chat.

Say you did this

`mct "fix bad response on h2 server" --model gpt-4o-mini --mode chat`

Here, I used `--mode chat`, which tells mct to stream the answer and save the chat convo, but not to apply git changes (different from `--mode answer-only`).

You'll see mct print out something like

`Response saved to .machtiani/chat/fix_bad_server_resonse.md`

Now you can just tell codex.

`codex "See .machtiani/chat/fix_bad_server_resonse.md, and do this or that...."`

*Conclusion*

The example concepts should cover day-to-day use cases. There are other exciting workflows, but I should really post a video on that. You could do anything with unix philosophy!


Amazing, really excited to try this out. And thanks for the time you took to write this up!


I love this analogy.

