cs702 · 2026-06-11T23:40:17 1781221217

A superior alternative to standard Muon and AdamW optimizers for training large models.

Fantastic work, instantly valuable, immediately usable.

A big THANK YOU to the authors:

Jack Zhang, Noah Amsel, Berlin Chen, and Tri Dao

cs702 · 2026-05-30T13:57:41 1780149461

There's an old saying, "in the land of the blind, the one-eyed man is king."

Here we have the opposite: In the land of the one-eyed, the blind are leading.

The blind in this case are all those executives and managers who don't understand much about AI's current potential and limitations, and so far have treated it like a magic button that will solve everything. The one-eyed are rank-and-file employees who maybe sort of know a little more about AI.

pocksuppet · 2026-05-30T15:16:11 1780154171

Executives and managers are the ones who correctly understood which game was being played. The game we are playing is not one of making good products, it's one of getting money from people who both have more money and are stupider than you. They're succeeding at that. We're also doing it, but we're not getting as much money.

bastawhiz · 2026-05-30T15:41:07 1780155667

In many cases, the people who have more money and are stupider than you are other executives. Sam Altman is arguably one of the executives who know how the game is played. OpenAI is at the front. Microsoft's executives are an example of the ones who got played.

cs702 · 2026-05-26T18:26:09 1779819969

Over the past two years, I have found Uber and Lyft rides getting more expensive than taxis in several large US cities, including Boston, Chicago, NY, and LA. Taxis are now 10-50% cheaper in my experience.

When I do take Uber and Lyft rides, I ask the drivers how much they're getting paid, and the amounts they tell me are often 30% to 60% less than what I paid, which is a bit shocking to me.

At some point, Uber and Lyft stopped being service providers that charged riders a fee for value provided. They have become market makers that squeeze as much trading profit as possible by arbitraging the prices riders are willing to pay and the rates drivers are willing to accept. I imagine they are capturing most of the value in each ride today. It's perfectly legal, but let's call it what it is.

I'm not surprised about the ride-share driver union.

pessimizer · 2026-05-26T18:59:01 1779821941

> At some point, Uber and Lyft stopped being service providers that charged riders a fee for value provided.

They were only ever this for about 30 seconds between when they were dumping investor cash to sell dollars for 75¢, and when they realized finally that no one even knew what a taxi was anymore, or how to find one. What literally every "cynic" said would happen.

cs702 · 2026-05-03T17:28:33 1777829313

Was this written by an LLM? It def reads like it!

pierrekin · 2026-05-03T17:33:05 1777829585

Yes, it was created by a tool called Sourcery.

Sourcery Show HN: https://qht.co/item?id=47996426

The project is currently private, I'd love to have access to its source.

jubilanti · 2026-05-03T17:42:11 1777830131

Oh so some random user with no credentials, reputation, or real name just typed "do deep research on the AI infrastructure financial bubble and write a report" then submitted it to HN?

Why should I bother reading what may or may not be a pile of unverified hallucinations?

pierrekin · 2026-05-03T23:21:56 1777850516

No, but close. Someone like that built the infrastructure tooling to do deep research, wrote up their process doing that, and then did what you said after, which I consider to be different.

I didn’t read it in full but I spot checked one or two citations and I found them compelling.

gizajob · 2026-05-03T17:40:47 1777830047

An LLM who can’t format html so crams it in an ugly spread across a pdf.

cs702 · 2026-04-20T23:06:31 1776726391

How I imagine the Nash equilibrium in chatbot ads, driven by profit-seeking in a race to the bottom:

User: "What's the best way to fix this problem I have?"

Chatbot: "I recommend buying this shiny thing here." (Next to it, there's a near-invisible light-gray "ad" notice.)

Let's hope I'm wrong.

GolfPopper · 2026-04-20T23:44:42 1776728682

Oh, given what I've seen from LLM companies, I suspect you are wrong. It will be more like:

Buried in LLM click-through: By interacting with our LLM, you agree that you are consenting to make all your interactions with us advertising-driven to an extent that you will never know, but that we will determine based on whatever makes us the most money in the least time.

cryptoegorophy · 2026-04-20T23:21:53 1776727313

Look at Google in the 2000s. If you travel back in time you would’ve never thought Google would do something like it is doing today. Now pretend you travelled back in time to 2026. You would’ve never thought OpenAI (open source non profit company) would do something crazy that it just did in 2030 or 2040 or where you came from.

operatingthetan · 2026-04-20T23:26:32 1776727592

I think pretty much everyone expects OpenAI to do the bad thing in the future given their track record.

PullJosh · 2026-04-20T23:40:18 1776728418

I can’t believe they haven’t already

eswdd · 2026-04-21T00:16:41 1776730601

Too early to do it. You have to wait until people's behaviour is set in stone to the point they need to be compensated heavily to switch.

This isn't rocket science, it's basic game-playing on the economic behaviour of humans.

yunwal · 2026-04-21T01:29:14 1776734954

I don’t think they’ve been successful enough at monopolizing to get away with this to an egregious extent like Google has. Anthropic and Google both have debatably better models with ad-free platforms (so far). And open models are not so far behind.

ipdashc · 2026-04-21T05:21:05 1776748865

> If you travel back in time you would’ve never thought Google would do something like it is doing today.

I'm not exactly Google's biggest fan, but what does this refer to?

They still just... show ads on search results, no? (Not that most people I know ever see them, thanks to adblockers.) The disclaimers have gotten less prominent, but I think anyone could have expected that. Are there other major things they're doing that couldn't have been expected at all in the 2000s?

johanyc · 2026-04-21T09:26:09 1776763569

Yeah I'm confused too. Google is pretty much doing the same thing as it did when they started monetizing search.

KumaBear · 2026-04-20T23:17:23 1776727043

You think it will advise it is an ad. I’m hoping you are right but then again… Wonder if we will also be charged the token usage to generate said ad.

Imagine you have it coding for you and it injects an ad into your product.

nemomarx · 2026-04-21T00:23:03 1776730983

Why inject just an ad? Maybe it'll automatically decide to use a sponsored library in the code, or build in a whole ad network who's paid openai for the placement...

DrewADesign · 2026-04-21T00:57:55 1776733075

Frankly ads are the most benign shitty thing that could come of this. I’m a hell of a lot more worried about what they’re going to sell to data brokers.

JimsonYang · 2026-04-21T00:31:47 1776731507

Tbh it doesn’t even need that. Just a way for advertisers to say “I want to target people who have bought peanut butter in the last 2 weeks”(I’m a jelly seller). That alone would beat FB and Google.

ChatGPT is collecting your data fs so advertisers can go ultra niche targeting

eswdd · 2026-04-21T00:53:18 1776732798

Advertisers on Google and Meta et al are not really paying for visibility - they are paying to achieve some objective (e.g. sales) that is directly tied to a campaign. That's why digital advertising is so much more powerful than non-digital.

The question is, will LLMs as an interface be worth the spend in relation to converting without throwing users of ChatGPT off over time, all whilst doing it within the regulatory frameworks? That's difficult to say. OAI will face a lot of scrutiny in the EU for sure.

JimsonYang · 2026-04-21T05:16:25 1776748585

There’s a misunderstanding. I’m not talking about AEO

It’s about how Meta and Google provide good data about audiences but I need more detailed info about a person (their exact shopping habits)

As the person responsible for GTM, I would gladly pay $60CPM if I can say “I would like to target all people who said they love crunchy peanut butter and consistently ask ChatGPT for peanut butter ideas”

I have no idea what they’re trying to pitch with the “we’re at the last step of the transaction” idea-but I also understand the regulatory issues with what advertisers like me want

cs702 · 2026-04-11T14:59:57 1775919597

Thank you. Apologies in advance for nitpicking, but I think the correct spelling is "pardoned" (a quick search on Google confirms it).

SpyCoder77 · 2026-04-11T15:02:17 1775919737

Most likely that domain was already taken.

vidluther · 2026-04-11T18:43:27 1775933007

It's a play on Donald Trump, after watching a Liz Oyer video linking a very plausible pardon for sale scheme, I wanted to initially build a site that showcased pardons just by Trump, but I realized that would be partisan and not as useful.

cs702 · 2026-04-12T00:33:22 1775954002

Ah, OK. Sorry I didn't get it right away!

SpaceL10n · 2026-04-11T15:04:50 1775919890

Pardon me, but this is a list of pardons given to pardoned people.

ceejayoz · 2026-04-11T15:23:03 1775920983

I'd presumed this was a wordplay on Donald Trump.

vidluther · 2026-04-11T18:24:44 1775931884

correct.

cs702 · 2026-04-07T23:57:24 1775606244

The core developers need buy-in from nodes controlling > 50% of the computing power in the network to make any fundamental change to the network.

cs702 · 2026-04-06T13:27:15 1775482035

Thank you for coming on HN and offering to answer questions.[a]

This is a fantastic piece, very timely, evidently well-researched, and also well-written. Judging by the little that I know, it's accurate. Thank you for doing the work and sharing it with the world.

OpenAI may be in a more tenuous competitive position than many people realize. Recent anecdotal evidence suggests the company has lost its lead in the AI race to Anthropic.[b]

Many people here, on HN, who develop software prefer Claude, because they think it's a better product.[c]

Is your understanding of OpenAI's current competitive position similar?

---

[a] You may want to provide proof online that you are who you say you are: https://en.wikipedia.org/wiki/On_the_Internet%2C_nobody_know...

[b] https://www.latimes.com/business/story/2026-04-01/openais-sh...

[c] For example, there are 2x more stories mentioning Claude than ChatGPT on HN over the past year. Compare https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=tru... to https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=tru...

ronanfarrow · 2026-04-06T17:17:08 1775495828

Thank you for this, very much appreciate the thoughtful response.

The piece captures some of the anxieties within OpenAI right now about their competitive position. This obviously ebbs and flows but of late there has been much focus on Anthropic's relative position. We of course mention the allegations of "circular deals" and concerns about partners taking on debt.

cs702 · 2026-04-06T19:41:43 1775504503

Thank you. Yes, I saw that. The company's always been surrounded by endless talk about insane hype, speculative bubbles, and financial engineering. I wasn't asking so much about that.

I was asking more about your informed view on how OpenAI's technology, products, and roadmap are perceived, particularly by customers and partners, in comparison to those of competitors.

If you have an opinion about that, everyone here would love to hear about it.

cs702 · 2026-04-09T20:47:23 1775767643

UPDATE: Well-regarded people on HN are saying OpenAI's most recent GPT-5x codex model is better than Claude 5x for certain coding tasks:

https://qht.co/item?id=47707494

globalnode · 2026-04-07T08:08:19 1775549299

at this point even googles ai search results are better than gpt - obv. this is not for full programs but if you know what youre doing and just want a snippet, thats all you need.

embedding-shape · 2026-04-07T11:05:59 1775559959

Wild how different the experiences people have can be. Both Google's models and Anthropic's hallucinate a lot for me, even when I try the expensive plans and with web searches, for some reason, and none of them come close to the accuracy and hallucination-free responses of ChatGPT Pro, which to me still is SOTA and has been since it was made available. But people keep having opposite experiences apparently, I just can't make sense of it.

ethbr1 · 2026-04-07T14:07:49 1775570869

Kagi (assistant.kagi.com) with Kimi K2.5 (their current default) has worked great for me in scenarios where the search result data is more important than the model.

I.e. what I used to use Google for and when I don't want an AI to overly summarize / editorialize result data.

globalnode · 2026-04-07T17:21:15 1775582475

oh thats probably because im a cheap-skate and just use the free garbo models. im sure the pro version is quite good.

irishcoffee · 2026-04-07T00:25:09 1775521509

My guess is that the answer to your question, fantastic question, is that nobody knows. I remember having the same thoughts when Covid was first “arriving” if you will: we wanted people in the know to throw us a nugget of information, and they just didn’t know.

As it turns out, and what I’m kind of going with for this LLM shit, is that it’ll play out exactly how you think it will. The companies are all too big to fail, with billionaire backers who would rather commit fraud than lose money.

philipallstar · 2026-04-07T08:38:32 1775551112

How would fraud help here? Don't they just need scale of lots of customers paying a little bit? How do you fraud your way into that?

kelvinjps10 · 2026-04-07T13:40:45 1775569245

they don't need customers when the customers are each other's companies, for example the deals OpenAI, Nvidia, and Oracle made

philipallstar · 2026-04-07T15:56:00 1775577360

That's not fraud, and it's not sustainable. They aren't going to just keep doing that. It only makes sense if an AI company wants to pay for GPUs with stock, and - more importantly - the GPU company agrees to sell in exchange for stock.

irishcoffee · 2026-04-07T23:10:40 1775603440

s/fraud/corrupt, illegal $something.

If you're picking on my vocabulary, that's fair. Fraud wasn't the point, I think you're smart enough to realize that.

philipallstar · 2026-04-08T18:15:55 1775672155

I appreciate the implication that either you're right or I'm stupid, but maybe you should write the comment you meant to write.

Trading shares for GPUs is not corrupt either.

Ericson2314 · 2026-04-07T00:37:10 1775522230

Ronan Farrow's expertise is investigations into elite amorality, not evaluating technical products. Why are you asking this question?

cs702 · 2026-04-07T01:11:30 1775524290

I didn't ask him to evaluate them. I asked him how customers and partners perceive them.

He's had so many conversations that he likely has a sense of how perceptions of the company and its offerings have changed.

I'm curious.

bloppe · 2026-04-07T04:15:02 1775535302

Much of the article and general palace intrigue is predicated on the idea that OpenAI has a singularly revolutionary product. If it later turns out to be a commodity, or OpenAI is simply outcompeted nonetheless, then the idea that Sam Altman's personal shortcomings are something to stress about would seem quaint. Just another hubristic tech billionaire acting in bad faith doesn't really pry attention the same way as someone "controlling your future".

keepamovin · 2026-04-07T09:07:00 1775552820

If you were in charge of deciding what should be done with Sam Altman, what would you choose?

giancarlostoro · 2026-04-07T16:44:32 1775580272

I mean, it's a fair question, though it does make some wonder how extreme the answers could be, so I could see why you're being downvoted.

The problem is sometimes on paper everything people like Sam Altman do is legal, despite it harming so many. We've literally had a major RAM producer pull out of the consumer RAM market. I feel like Sam Altman should be investigated and heavily scrutinized. He kind of is the biggest bubble in the AI bubble, we're letting him fester too far into it too, and these circular deals have seemingly somewhat stopped for now, but it might only get worse.

keepamovin · 2026-04-08T03:20:47 1775618447

Totally. Lying about others can be so harmful. But lying to hostiles in order to protect? Acceptable.

I guess my question was more, if the article author was the judge of fate or morality, what should happen?

As to AI and Sam, I think it’s too early to tell what effects will be. So we should adopt non judgement, build good ourselves and see what unfolds.

unsupp0rted · 2026-04-06T21:26:54 1775510814

Many of us prefer OpenAI's Codex, because we think it's a better product.

No comment on the CEO: I just find the product superior in everything but UI/UX and conversation. It's better at quality code.

mliker · 2026-04-06T21:35:10 1775511310

Who is “us”? It does seem that some scientists prefer Codex for its math capabilities but when it comes to general frontend and backend construction, Claude Code is just as good and possibly made better with its extensive Skills library.

Both codex and Claude code fail when it comes to extremely sophisticated programming for distributed systems

keldaris · 2026-04-07T00:56:29 1775523389

As a scientist (computational physicist, so plenty of math, but also plenty of code, from Python PoCs to explicit SIMD and GPU code, mostly various subsets of C/C++), I can confirm - Codex is qualitatively better for my usecases than Claude. I keep retesting them (not on benchmarks, I simply use both in parallel for my work and see what happens) after every version update and ever since 5.2 Codex seems further and further ahead. The token limits are also far more generous (and it matters, I found it fairly easy to hit the 5h limit on max tier Claude), but mostly it's about quality - the probability that the model will give me something useful I can iterate on as opposed to discard immediately is much higher with Codex.

For the few times I've used both models side by side on more typical tasks (not so much web stuff, which I don't do much of, but more conventional Python scripts, CLI utilities in C, some OpenGL), they seem much more evenly matched. I haven't found a case where Claude would be markedly superior since Codex 5.2 came out, but I'm sure there are plenty. In my view, benchmarks are completely irrelevant at this point, just use models side by side on representative bits of your real work and stick with what works best for you. My software engineer friends often react with disbelief when I say I much prefer Codex, but in my experience it is not a close comparison.

Scene_Cast2 · 2026-04-07T13:32:20 1775568740

Have you tried the latest (3.1 pro) Gemini? In my experience, it's notably better for similar types of problems than Opus 4.6. However, I don't really use OpenAI products to compare.

keldaris · 2026-04-07T21:25:33 1775597133

I actually haven't - I tried Gemini 3.0 Pro in Antigravity and was disappointed enough that I didn't pay much attention to the 3.1 release, it was notably worse than Opus and GPT at the time, and much more prone to "think" in circles or veer off into irrelevant tangents even with fairly precise instruction. I'll give 3.1 a try tomorrow, see what happens.

physicsguy · 2026-04-07T07:58:23 1775548703

I've tried both against similar and haven't found it such a clear cut difference. I still find neither are able to fully implement a complex algorithm I worked on in the past correctly with the same inputs. Not sharing exactly the benchmark I'm using but think about something for improving performance of N^2 operations that are common in physics and you can probably guess the train of thought.

keldaris · 2026-04-07T21:22:25 1775596945

I've had reasonable success using GPT for both neighbor list and Barnes-Hut implementations (also quad/oct-trees more generally), both of which fit your description, haven't tried Ewald summation or PME / P3M. However, when I say "reasonable success", I don't mean "single shot this algo with a minimal prompt", only that the model can produce working and decently optimized implementations with fairly precise guidance from an experienced user (or a reference paper sometimes) much faster than I would write them by hand. I expect a good PME implementation from scratch would make for a pretty decent benchmark.

physicsguy · 2026-04-08T08:04:13 1775635453

Think another level of complexity of algorithm, different expansion bases plus a mix of input sources. Also not trying to one-shot it.

tirutiru · 2026-04-07T20:20:57 1775593257

I can roughly guess the train of thought and I am a bit surprised that Claude is failing you.

That said, I am puzzled at the algorithms that Claude & GPT "get" and ones that they do not.

(former physicist here. would love to know the kind of things you're working on. email on my profile)

ricksunny · 2026-04-07T02:48:47 1775530127

>As a scientist (computational physicist,

Is there one that you prefer for, i dunno, physics?

zeroxfe · 2026-04-06T22:13:28 1775513608

I'm in that camp -- I have the max-tier subscription to pretty much all the services, and for now Codex seems to win. Primarily because 1) long horizon development tasks are much more reliable with codex, and 2) OpenAI is far more generous with the token limits.

Gemini seems to be the worst of the three, and some open-weight models are not too bad (like Kimi k2.5). Cursor is still pretty good, and copilot just really really sucks.

the__alchemist · 2026-04-07T01:41:35 1775526095

Claude Code, Codex, and Cursor are old news. If you're having problems, it's because you're not using the latest hotness: Cludge. Everyone is using it now - don't get left behind.

outside1234 · 2026-04-07T03:34:57 1775532897

Cludge has been left behind by Clanker, that’s the new hotness. 45B valuation!

p-t · 2026-04-07T13:07:59 1775567279

ive heard that poob has it for you!

unsupp0rted · 2026-04-06T21:47:34 1775512054

Us = me and say /r/codex or wherever Codex users are. I've tried both, liked both, but in my projects one clearly produces better results, more maintainable code and does a better job of debugging and refactoring.

sampullman · 2026-04-06T21:54:28 1775512468

That's interesting, I actively use both and usually find it to be a toss up which one performs better at a given task. I generally find Claude to be better with complex tool calls and Codex to be better at reviewing code, but otherwise don't see a significant difference.

SOLAR_FIELDS · 2026-04-06T23:45:29 1775519129

If you want to find an advocate for Codex that can give a pretty good answer as to why they think it's better, go ask Eric Provencher. He develops https://repoprompt.com/. He spends a lot of time thinking in this space and prefers Codex over Claude, though I haven't checked recently to see if he still has that opinion. He's pretty reachable on Discord if you poke around a bit.

hirako2000 · 2026-04-07T08:25:53 1775550353

Quite irrelevant what factions think. This or that model may be superior for these and those use cases today, and things will flip next week.

Also, RLHF means that models spit out according to certain human preferences, so it depends on what set of humans provided the feedback and what mood they were in when they did.

SOLAR_FIELDS · 2026-04-07T14:31:00 1775572260

On the contrary, I very much care about what the other factions think because I want to know if things have already flipped and the easiest way to do so is just ask someone who's been using the tool. Of course the correct thing to do is to set up some simple evals, but there is a subjective aspect to these tools that I think hearing boots on the ground anecdata helps with.

tharkun__ · 2026-04-08T04:41:04 1775623264

Haven't done it in a while, but I've done some tasks with both Codex and Claude to compare. In all cases I asked both to put their analysis and plans for implementation into a .md file. Then I asked the other agent to analyze said file for comparison.

In general, Claude was impressed by what Codex produced and noted the parts where it (i.e. Claude) had missed something vs. Codex "thinking of it".

From a "daily driver" perspective I still use Claude all the time as it has plan mode, which means I can guarantee that it won't break out and just do stuff without me wanting it to. With Codex I have to always specify "Don't implement/change, just tell me" and even then it sometimes "breaks out" and just does stuff. Not usually when I start out and just ask it to plan. But after we've started implementation and I review, a simple question of "Why did you do X?" will turn into a huge refactoring instead of just answering my question.

To be fair, that's what most devs do too (at least at first), when you ask them "Why did you do X" questions. They just assume that you are trying to formulate a "Do Y instead of X" as a question, when really you just don't understand their reasoning but there really might be a good reason for doing X. But I guess LLMs aren't sure of themselves, so any questioning of their reasoning obliterates their ego and just turns them into submissive code monkeys (or rather: exposes them as such) vs. being software engineers that do things for actual reasons (whether you agree with them or not).

cher88 · 2026-04-08T14:04:48 1775657088

Codex has plan mode too - /plan

aswanson · 2026-04-06T22:31:26 1775514686

Any difference in performance on mobile development?

sampullman · 2026-04-07T00:07:03 1775520423

For that I'm not so sure. I tried both early 2025 and was disappointed in their ability to deal with a TCA based app (iOS) and Jetpack compose stuff on Android, but I assume Opus 4.6 and GPT 5.4 are much better.

rocketpastsix · 2026-04-06T23:44:51 1775519091

yea Im not in this "us" you speak of.

Finbel · 2026-04-07T06:52:52 1775544772

Of course you're not one of "us" if you're one of "them".

zem · 2026-04-06T23:18:12 1775517492

I've found claude startlingly good at debugging race conditions and other multithreading issues though.

josephg · 2026-04-06T23:47:31 1775519251

My rule of thumb is that it's good for anything "broad", and weaker for anything "deep". Broad tasks are tasks which require working knowledge of lots of random stuff. It's bad at deep work - like implementing a complex, novel algorithm.

LLMs aren't able to achieve 100% correctness of every line of code. But luckily, 100% correctness is not required for debugging. So it's better at that sort of thing. It's also (comparatively) good at reading lots and lots of code. Better than I am - I get bogged down in details and I exhaust quickly.

An example of broad work is something like: "Compile this C# code to webassembly, then run it from this go program. Write a set of benchmarks of the result, and compare it to the C# code running natively, and this python implementation. Make a chart of the data and add it to this latex code." Each of the steps is simple if you have expertise in the languages and tools. But a lot of work otherwise. For me to do that, I'd need to figure out C# webassembly compilation and go wasm libraries. I'd need to find a good charting library. And so on.

I think it's decent at debugging because debugging requires reading a lot of code. And there's lots of weird tools and approaches you can use to debug something. And it's not mission critical that every approach works. Debugging plays to the strengths of LLMs.

DeathArrow · 2026-04-07T07:26:44 1775546804

Many paying customers say that Anthropic degraded the capability of Opus and Claude Code in the last months and the outcomes are worse. There are even discussions on HN about this.

Last one is from yesterday: https://qht.co/item?id=47660925

lhl · 2026-04-07T06:10:38 1775542238

As some other people mentioned, using both/multiple is the way to go if it's within your means.

I've been working on a wide range of relatively projects and I find that the latest GPT-5.2+ models seem to be generally better coders than Opus 4.6, however the latter tends to be better at big picture thinking, structuring, and communicating so I tend to iterate through Opus 4.6 max -> GPT-5.2 xhigh -> GPT-5.3-Codex xhigh -> GPT-5.4 xhigh. I've found GPT-5.3-Codex is the most detail oriented, but not necessarily the best coder. One interesting thing is for my high-stakes project, I have one coder lane but use all the models to do independent review and they tend to catch different subsets of implementation bugs. I also notice huge behavioral changes based on changing AGENTS.md.

In terms of the apps, while Claude Code was ahead for a long while, I'd say Codex has largely caught up in terms of ergonomics, and in some things, like the way it lets you inline or append steering, I like it better now (or where it's far, far ahead - the compaction is night and day better in Codex).

(These observations are based on about 10-20B/mo combined cached tokens, human-in-the-loop, so heavy usage and most code I no longer eyeball, but not dark factory/slop cannon levels. I haven't found (or built) a multi-agent control plane I really like yet.)

kasey_junk · 2026-04-07T11:37:35 1775561855

Codex won me over with one simple thing. Reliability. It crashed less, had less load shedding and its configuration is well designed.

I do regular evaluation of both codex and Claude (though not to statistical significance) and I’m of the opinion there is more in group variance on outcome performance than between them.

baq · 2026-04-07T08:35:49 1775550949

This is the way. Eg. IME Gemini is really damn good at sql.

Razengan · 2026-04-07T14:39:29 1775572769

I have been using Codex AND Claude side by side for the same project*, with the same prompts.

Codex has been consistently better on almost every level.

* (an open source framework for 2D games in Godot 4.6 GDScript, mostly using AI to review existing code)

7thpower · 2026-04-06T23:00:21 1775516421

Not a scientist and use codex for anything complex.

I enjoy using CC more and use it for non coding tasks primarily, but for anything complex (honestly most of what I do is not that complex), I feel like I am trading future toil for a dopamine hit.

baq · 2026-04-07T08:33:12 1775550792

I’m one of those ‘us’, Claude’s outputs require significant review and iteration effort (to put it bluntly they get destroyed by gpt and Gemini). I’m basically using sonnet to do code search and write up since it is a better (more human-like) writer than gpt and faster and more reliable than gemini, but that’s about it.

bko · 2026-04-07T00:26:01 1775521561

I also find Codex much more generous in terms of what you get with a Pro ($20/mo) subscription. I use it pretty much non-stop and I have yet to hit a limit. Weekly reset is much better as well.

DeathArrow · 2026-04-07T07:38:36 1775547516

I prefer GLM 5.1 and MiniMax 2.7. With a better harness like Forge Code, I have better results for way less money than by using GPT and Opus.

jbergqvist · 2026-04-07T11:55:44 1775562944

Usage limits are more generous and GPT 5.4 is a good model, but yes, UI/UX lags behind Claude Code. Currently I'm especially missing /rewind with code restoration and proper support for plugin marketplaces

KaiserPro · 2026-04-07T08:14:49 1775549689

GPT/claude/gemini is pretty interchangeable at this point.

baq · 2026-04-07T08:37:04 1775551024

Absolutely not the case. They're complementary.

shevy-java · 2026-04-07T07:12:44 1775545964

Does this work for people? To me having a "better product" would be completely irrelevant if the use cases are evil.

thaoanh404 · 2026-04-07T08:56:10 1775552170

i find myself being more productive with codex/copilot on coding tasks, but claude does seem to be better at planning

MrSkelter · 2026-04-11T10:28:26 1775903306

Here’s a reality check.

There are two types of vibe coders. Those who review the code generated and those who don’t.

Either because they don’t understand code at all, or because they don’t have time and don’t care.

Code quality is only one factor. Naive vibe coders, who don’t code otherwise, rate performance based on output alone.

aaa_aaa · 2026-04-07T04:45:01 1775537101

Shill talk

brightbeige · 2026-04-06T13:46:51 1775483211

He’s replying on this twitter thread - perhaps someone with an account can ask there and link his comment here?

https://xcancel.com/RonanFarrow/status/2041127882429206532#m

jamiequint · 2026-04-06T18:19:55 1775499595

Here is the actual link, not a link to some weird third-party site that can't be trusted.

https://x.com/RonanFarrow/status/2041127882429206532

rounce · 2026-04-06T23:33:14 1775518394

FYI xcancel is just a mirror that allows reading replies without needing an account.

SwellJoe · 2026-04-06T20:09:30 1775506170

Whereas X can be trusted?

jamiequint · 2026-04-07T00:09:17 1775520557

Yes? It's the data source, not a third-party. How is this even a question?

minimaxir · 2026-04-07T02:52:20 1775530340

There's pedantic, and then there's needlessly pedantic.

xcancel is a valid workaround for X links on Hacker News and is sufficient for original attribution.

SwellJoe · 2026-04-07T02:51:13 1775530273

X restricts what you can view without logging in. Many folks don't want to log in to X, for obvious reasons. Posting an xcancel link is kinda like folks posting various `archive` URLs to bypass paywalls, work around overloaded servers, etc. That's an extremely common practice here that usually goes without comment.

jamiequint · 2026-04-07T22:03:27 1775599407

What is an "obvious reason" one might not want to log into X? I can't think of any rational reason.

SwellJoe · 2026-04-13T05:05:08 1776056708

You can't think of even one reason?

https://www.theguardian.com/technology/2026/feb/12/elon-musk...

ed · 2026-04-06T22:46:24 1775515584

It's worth noting Codex has 2x more stories than Claude https://hn.algolia.com/?query=codex

cloverich · 2026-04-07T03:49:59 1775533799

But by page 5, those stories have around 50-60 karma, while claude page five is still 500+

(i found your comment surprising based on my daily hn reading recollection - i mostly read top N daily and feel i only occasionally see codex stories).

ATMLOTTOBEER · 2026-04-07T01:48:30 1775526510

Yeah we moved to Claude a few months ago, mostly because the devs kept using it anyway. Altman stuff is interesting but at the end of the day you just go with whatever tool works

cableshaft · 2026-04-07T14:13:55 1775571235

Personally, I prefer Claude for coding, but I still prefer ChatGPT for hashing out ideas for my projects (which tend to be game designs). So I use both.

lasky · 2026-04-10T06:54:30 1775804070

I’m assuming this is all sarcasm.

georgemcbay · 2026-04-06T16:30:47 1775493047

> You may want to provide proof online that you are who you say you are

Unfortunately it probably doesn't even matter here on HN considering how brigaded down this story is predictably getting.

But yeah, it was a fantastic piece.

dang · 2026-04-06T21:29:14 1775510954

It wasn't getting "brigaded down" - it set off a software penalty called the flamewar detector. I turned that off as soon as I saw it.

cs702 · 2026-04-07T11:56:07 1775562967

Thank you for keeping HN sane :-)

ronanfarrow · 2026-04-06T17:19:21 1775495961

Fair request, here you go: https://x.com/RonanFarrow/status/2041203911697068112

cs702 · 2026-04-03T20:41:34 1775248894

Profit-seeking at society's expense.

Also known as rent-seeking: "The act of growing one's existing wealth by manipulating public policy or economic conditions without creating new wealth. Rent-seeking activities have negative effects on the rest of society. They result in reduced economic efficiency through misallocation of resources, stifled competition, reduced wealth creation, lost government revenue, heightened income inequality, heightened debt levels, risk of growing corruption and cronyism, decreased public trust in institutions, and potential national decline."[a]

Sigh.

---

[a] https://en.wikipedia.org/wiki/Rent-seeking

cs702 · 2026-03-25T12:44:17 1774442657

That's peanuts.[a]

[a] https://dictionary.cambridge.org/us/dictionary/english/peanu...