> I am talking about Opus 4.6 and Codex 5.3 here, at high+ effort settings
So you have to burn tokens at the highest available settings to even have a chance of ending up with code that's not completely terrible (and then only in very specific domains), but of course you then have to review it all and fix all the mistakes it made. So where's the gain exactly? The proper goal is for those 500 lines to be almost always truly comparable to what a human would've written, and not turn into an unmaintainable mess. And AI's aren't there yet.
So you have to burn tokens at the highest available settings to even have a chance of ending up with code that's not completely terrible (and then only in very specific domains), but of course you then have to review it all and fix all the mistakes it made. So where's the gain exactly? The proper goal is for those 500 lines to be almost always truly comparable to what a human would've written, and not turn into an unmaintainable mess. And AI's aren't there yet.