Context: I've been using agents (both Claude Code and Codex) for my daily work and for personal projects, but always in domains where I have some knowledge, and I'm currently happy with them.
I tried using Claude Code to build an RPG game with Godot and GDScript, using free-to-use assets: a total failure :/
The game was supposed to take many implementation steps, but I asked Claude to first produce a one-area demo, so I could test the assets and choose the ones I liked. First it produced some garbage, using the assets randomly. Then it tried to copy from an existing demo, but it had no idea where a door or a path was, and at a certain point it even admitted it with something like: "I can't design a usable and nice area: I either make it functional and ugly, or I copy and adapt the existing demo but will have no clue about what is what."
I've never even attempted to develop games before, so I'm sure I don't even know the basic concepts, but this use case definitely didn't work for me.
Maybe it could generate the code of the game if I provided the full design?
That's exactly the failure mode this project exists to solve. The core issue is that Claude Code has no way to see what it's producing: code compiles fine, but assets are floating, paths lead nowhere, layouts are garbage. It even told you as much.
Godogen closes that loop: after writing code, it captures screenshots from the running engine and a vision model evaluates them. That's the difference between "compiles but broken" and "actually playable."
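If it helps to picture it, here's a minimal sketch of that loop in Python (not Godogen's actual code). It assumes a small GDScript autoload in the project that saves a frame to shots/area_demo.png on startup, and a vision-capable model behind an OpenAI-compatible endpoint; all names here are illustrative:

```python
# Minimal sketch of a render-then-review loop. Assumptions: the project
# contains an autoload script that saves a screenshot before --quit-after
# stops the engine, and an OpenAI-compatible vision model is available.
import base64
import subprocess
from pathlib import Path

from openai import OpenAI

client = OpenAI()

def capture_screenshot(project_dir: str) -> None:
    # Run the project for ~120 frames; the assumed autoload writes the frame.
    subprocess.run(
        ["godot", "--path", project_dir, "--quit-after", "120"],
        check=True,
        timeout=120,
    )

def review_screenshot(png_path: str) -> str:
    # Ask the vision model for structured, actionable feedback on the frame.
    image_b64 = base64.b64encode(Path(png_path).read_bytes()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # stand-in: use whatever vision model you prefer
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": (
                    "Critique this game screenshot: are assets placed "
                    "sensibly? Are doors and paths usable? Reply as a "
                    "bullet list of concrete fixes.")},
                {"type": "image_url", "image_url": {
                    "url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

capture_screenshot("my_game")
print(review_screenshot("my_game/shots/area_demo.png"))
# Feed this feedback into the coding agent's next prompt.
```

The valuable part is that last step: the feedback is about what the game looks like, not whether it compiles.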
And yes — providing design docs helps a lot. The pipeline generates those automatically (visual reference, architecture, task plan), but you can provide your own and customize the skills to match your vision.
It would be a hit if you packaged that loop as an MCP. Opus can make really pretty 3D models even using three.js primitives, but they tend to have serious issues (like facial features inside the head). Being able to have it automatically generate a set of screenshots, then have Gemini scrutinize them and provide structured feedback, would be a time saver. Curiously, I could not get Gemini 3.1 Pro to ever generate anything even remotely passable.
And it's exactly what I was trying to do manually :D
I accept the limitation and admit that making a video game is probably not for me, but it's nice that a solution exists.
Question: how can you find the exact session you are looking for, among hundreds of them? I had a look at my ~/.claude/projects/*/ and I couldn't even find my last session.
I had exactly this problem and didn't see anything good out there (claude --resume only searches session names and auto-created titles), so I got a tool built that uses a Rust/Tantivy full-text search index. It's part of the aichat command suite, called "aichat search".
It brings up a nice TUI for filtering and further actions. There's also a --json flag so agents can use it as a CLI search tool to find context about any past work. There's a plugin that provides a corresponding session-searcher agent that knows to use this tool to search sessions.
I have hundreds/thousands of past sessions and this has been a life saver; I can just ask the main agent, “use the session searcher agent to get the details of how we built the tmux-cli tool so we can add some features”.
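For agent-side use, the wiring can be as simple as shelling out and parsing the JSON. A sketch under assumptions: the query is a plain positional argument and the output is a JSON array (check aichat search --help for the real shape).

```python
# Illustrative only: calling `aichat search` with the --json flag mentioned
# above so another tool can consume the hits. The positional-query shape and
# the result fields are assumptions, not the tool's documented API.
import json
import subprocess

result = subprocess.run(
    ["aichat", "search", "--json", "tmux-cli"],
    capture_output=True, text=True, check=True,
)
for hit in json.loads(result.stdout):
    print(hit)  # e.g. session path, title, matching snippet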
Ha, good question. Short answer: I often let Claude Code find it.
Sessions are grouped by the folder where you ran Claude Code (e.g. ~/.claude/projects/Users-<user>-<path>), so if you don’t run everything from the same directory, it’s usually easy to narrow down.
They’re also plain JSONL files, so grep works well if you remember part of a prompt.
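If you'd rather script it than grep, a rough sketch under the same assumptions (plain JSONL files under ~/.claude/projects/, one file per session):

```python
# Quick-and-dirty session search over the layout described above. It just
# substring-matches raw JSONL lines, so no field names are assumed.
from pathlib import Path

def search_sessions(needle: str):
    for path in Path.home().glob(".claude/projects/*/*.jsonl"):
        with open(path, errors="replace") as f:
            if any(needle.lower() in line.lower() for line in f):
                yield path

for session in search_sessions("tmux-cli"):
    print(session)
```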
That said, it might be nice for claude-replay to add a helper command to list or search recent sessions.
I must have missed something: why are people moving away from OpenAI? Since they released gpt-5.3-codex I've been using it alongside Claude with opus-4.6, and Codex has always been better: more accurate, less prone to hallucinations. I can do more with a $20 OpenAI plan than with a Claude Max $100.
More specifically, OpenAI has agreed to its models being used for domestic mass surveillance and for autonomous (no human in the loop anywhere) drone attacks. ChatGPT will decide which building to destroy, and then it will be destroyed.
I'm only waiting for OpenAI to provide an equivalent ~100 USD subscription to ditch Claude entirely.
Opus has gone downhill continuously in the last week (and before you start flooding the replies: I've been testing Opus and Codex in parallel for the last week, and I have plenty of examples of Claude going off track, then apologising, then saying "now it's all fixed!" and then only fixing part of it, while Codex nailed it on the first shot).
I can accept specific model limits, but not swings in reliability. And don't even get me started on how bad the Claude client has become. Others are finally catching up, and gpt-5.3-codex is definitely better than opus-4.6.
Everyone else (Codex CLI, Copilot CLI, etc.) is going open source; they are going closed. Others (OpenAI, Copilot, etc.) explicitly allow using OpenCode; they explicitly forbid it.
We’re still in the mid-late 2020s. Once we really get to the late 2020s, attention spans won’t be long enough to even finish reading your comment. People will be speaking (not typing) to LLMs and getting distracted mid-sentence.
Opus 4.6 genuinely seems worse than 4.5 was in Q4 2025 for me. I know everyone always says this, and anecdote != data, but this is the first time I've really felt it with a new model, to the point where I still reach for the old one.
Huh… I've seen this comment a lot in this thread, but I've really been impressed with both Anthropic's latest models and latest tooling (plugins like /frontend-design mean it actually designs real front ends instead of the vibe-coded purple-gradient look). And I see it doing more planning and making fewer mistakes than before. I have to do far less oversight and debugging of broken code these days.
But if people really like Codex better, maybe I’ll try it. I’ve been trying not to pay for 2 subscriptions at once but it might be worth a test.
> And I see it doing more planning and making fewer mistakes than before
Anecdotally, maybe this is the reason? It does seem to spend a lot more time “thinking” before giving what feels like equivalent results, most of the time.
Probably eats into the gambling-style adrenaline cycles.
Heh, I find Codex to be a far, far smarter model than Claude Code.
And there's a good reason the most "famous" vibe coders, including the OpenClaw creator, all moved to Codex: it's just better.
Claude writes a lot more code to do anything: tons of redundant code, repeated code, etc. Codex is the only model I've seen that occasionally removes more code than it writes.
Funnily enough, I've been using Codex 5.3 on maximum thinking for bug hunting and code reviews, and it's been really good at it (it just seems to have a completely different focus than Opus).
I generally don't like the way Codex approaches coding itself, so I just feed its review comments back into Claude Code and off we go.
I just created an OpenCode skill where these two models talk to each other and discuss bug-finding approaches.
In my experience, two different models together work much better than one; that's why this subscription banning is distressing: I won't be able to use a tool that can use both models.
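Roughly, that kind of skill boils down to piping one model's findings into the other. A sketch, assuming both CLIs expose a non-interactive single-prompt mode (codex exec and claude -p at the time of writing; verify against your installed versions):

```python
# Two-model bug-hunting loop, sketched with subprocess calls. The exact
# CLI flags are assumptions; adjust to your installed versions.
import subprocess

def ask(cmd: list[str], prompt: str) -> str:
    out = subprocess.run(cmd + [prompt], capture_output=True, text=True,
                         check=True)
    return out.stdout

findings = ask(["codex", "exec"],
               "Review src/ for concurrency bugs and list concrete findings.")
verdict = ask(["claude", "-p"],
              "A second reviewer reported these issues:\n"
              f"{findings}\n"
              "Which are real? Propose fixes only for the real ones.")
print(verdict)
```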
It is (slower), especially at the xhigh setting. But if I have to redo things three times and keep confirming trivial stuff (Claude Code seems to keep changing the commands it uses to read code: once it uses "bash-read", once "tree", once "head", and I have to keep confirming permissions), I definitely waste more time than by giving a command to Codex (or, in my case, OpenCode + the Codex model) and coming back after 10 minutes.
I was underwhelmed by Opus 4.6. I didn't get a sense of significant improvement, but the token usage was excessive, to the point that I dropped the subscription for Codex. I suspect that all the models are so glib that they can create a quagmire for themselves in a project. I have not yet found a satisfying strategy for non-destructive resets when the system's own comments and notes poison new output. Fortunately, deleting and starting over is cheap.
No offense, but this is the most predictable outcome ever. The software industry at large does this over and over again, and somehow we're surprised: provide a thing for free or for cheap, then slowly draw back availability once you have dominant market share or find yourself needing money (ahem).
The providers want to control what AI does in order to make money or dominate an industry, so they don't have to make their money back right away. This was inevitable; I do not understand why we trust these companies, ever.
Well, yes. They know what they are doing. They know that, given the option, the consumer makes the affordable choice. I just don't have to like or condone their practices. Maybe instead of taking on billions of dollars of debt they should have thought about a business model that makes sense first? Maybe the collective "we" (consumers and investors, but especially investors) should keep it in our pants until the product is proven and sustainable?
It will be really interesting if the haters are right and this technology is not the breakthrough the investors assume it to be AFTER it is already sewn into everyone's workflows. Everyone keeps talking about how jobs will be displaced, yet few are asking what happens when a dependency is swept out from underneath the industry as a whole if/when this massive gamble doesn't pay off.
Whatever. I am squawking into the void as we just repeat history.
Or the companies can be transparent about their product roadmap. I can guarantee this enshittification was on the roadmap way before we knew about it. They let us operate under false information, that's just weak behavior.
First, we are not talking about a cheap service here. We are talking about a monthly subscription which costs 100 USD or 200 USD per month, depending on which plan you choose.
Second, it's like selling me a pizza and then demanding that I only eat it while sitting at your table. I want to eat the pizza at home. I'm not taking 2-3 extra pizzas; I'm still getting the same pizza everyone else gets.
It's the most overrated model there is. I do Elixir development primarily, and the model sucks in comparison to Gemini and GPT-5.x. But the Claude fanboys will swear by it and will attack you if you ever say even something remotely negative about their god-sent model. It fails miserably even in basic chat and research contexts and constantly goes off track. I wired it up to fire off some tasks; it kept hallucinating and swearing it had done them when it hadn't even attempted them. It was so unreliable I had to revert to Gemini.
It might simply be that it was not trained as much in Elixir RL environments as Gemini and GPT were.
I use it for both TS and Python and it's certainly better than Gemini. Against Codex, it depends on the task.
Claude has gotten a lot of popular media attention in the last few weeks, and the influx of users is constraining compute/memory on an already compute-heavy model.
So you get all the suspected "tricks" like quantization, shorter thinking, KV cache optimizations.
It feels like the same thing that happened to Gemini 3, and it's something you can even feel throughout the day (the models seem smartest at 12am).
Dario, in his interview with Dwarkesh last week, also lamented the same refrain as other lab leaders: compute is constrained and there are big tradeoffs in how you allocate it. It feels safe to reason, then, that they will use any trick they can to free up compute.
I regularly run the same prompts twice and through different models, particularly when making changes to agent metadata like agent files or skills.
At least weekly I run a set of prompts to compare Codex and Claude against each other. This is quite easy: the prompt sessions are just text files that get saved.
The problem is doing it enough for statistical significance and judging the output as better or not.
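Concretely, the replay harness can be tiny: run each saved prompt file through both models and keep the outputs side by side for later judging. A sketch (the CLI invocations are assumptions; check your installed versions):

```python
# Replay each saved prompt through both models and store outputs for
# side-by-side judging. CLI flags are assumptions, not documented APIs.
import subprocess
from pathlib import Path

MODELS = {"claude": ["claude", "-p"], "codex": ["codex", "exec"]}
Path("results").mkdir(exist_ok=True)

for prompt_file in sorted(Path("prompts").glob("*.txt")):
    prompt = prompt_file.read_text()
    for name, cmd in MODELS.items():
        run = subprocess.run(cmd + [prompt], capture_output=True, text=True)
        Path("results", f"{prompt_file.stem}.{name}.txt").write_text(run.stdout)
```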
I suspect you may not be writing code regularly...
If I have to ask Claude the same things three times and it keeps saying "You are right, now I've implemented it!" and the code is still missing 1 out of 3 things or worse, then I can definitely say the model has become worse (since this wasn't happening before).
I haven't experienced this with gpt-5.3-codex (xhigh), for example. Opus/Sonnet usually work well when just released, then degrade quite regularly. I know the prompts are not the same every day, or even across the day, but if the types of problems are always the same (at least in my case) and a model starts doing stupid things, then something is wrong. Everyone I know who uses Claude regularly has the same experience whenever I notice it degrade.
When I use Claude daily (both professionally and personally, with a Max subscription), there are things it does differently between 4.5 and 4.6. It's hard to point to any single conversation, but in aggregate I'm finding that certain tasks don't go as smoothly as they used to. In my view, Opus 4.6 is a lot better at long-running conversations (which has value), but does worse with critical details within smaller conversations.
A few things I've noticed:
* 4.6 doesn't look at certain files that it used to
* 4.6 tends to jump into writing code before it's fully understood the problem (annoying but promptable)
* 4.6 is less likely to do research, write to artifacts, or make external tool calls unless you specifically ask it to
* 4.6 is much more likely to ask annoying (blocking) questions that it could reasonably figure out on its own
* 4.6 is much more likely to miss a critical detail in a planning document after being explicitly told to plan for that detail
* 4.6 needs to more proactively write its memories to file within a conversation to avoid going off track
* 4.6 is a lot worse about demonstrating critical details. I'm so tired of it explaining something conceptually without thinking through how it would implement the details.
Just hit a situation where 4.6 is driving me crazy.
I'm working through a refactor and I explicitly told it to use a block (as in Ruby blocks), and it completely overlooked that. Totally missed it as something I asked it to do.
Same! I personally released a couple of CLIs (written using Claude Code) that I regularly use for my work: logbasset (to access Scalyr logs) and sentire (to access Sentry issues). I never use them manually; I wrote them to be used by LLMs. I think they are lighter than an MCP.
I don't think that retention part was clear at all; it was separate from the opt-out. I assume I'm now opted out, but that they'll keep the data for five years anyway.
Same, except with 64GB and an M3 Max, smh... it takes literally minutes to open the "Labels" popup and make a PR... it's completely unacceptable for a product like this...
1) I tried to use it on an existing project, asking: "Analyse the project and create a GEMINI.md". It fumbled some nonsense for 10-15 minutes, then said it was done, but it had only analysed a few files in the root and hadn't generated anything at all.
2) Despite finding a way to log in with my workspace account, it then asks me for GOOGLE_CLOUD_PROJECT, which doesn't make any sense to me.
3) It's not clear AT ALL if and how my data and code will be used to train the models. Until this is made clear, it's a no-go for me.
P.S.: it feels like a promising project that has been rushed out too quickly :/