You have to hold the AI's hand for it to do even simple vanilla JS correctly, or stick to framework code that is well documented all over the net. I love AI and use it for programming a lot, but the limitations are real.
That is exactly my experience with Claude Code too. It can create a lot of impressive stuff, but with LOTS more code than necessary. It’s not really effective in the end. I have more than 35 years of coding experience and always dig into the newest stuff. Quality-wise it’s still not more than junior-dev stuff even with the latest models, sorry. And I know how to talk to these machines.
I don't have as many years of professional experience as you do, but IMO code pissing is one of the areas LLMs and "agentic tools" shine the least.
In both personal projects and $dayjob tasks, the highest time-saving AI tasks were:
- "review this feature branch" (containing hand-written commits)
- "trace how this repo and repo located at ~/foobar use {stuff} and how they interact with each other, make a Mermaid diagram"
- "reverse engineer the attached 50MiB+ unstripped ELF program, trace all calls to filesystem functions; make a table with filepath, caller function, overview of what caller does" (the table is then copy-pasted to Confluence)
- basic YAML CRUD
Also while Anthropic has more market share in B2B, their model seems optimized for frontend, design, and literary work rather than rigorous work; I find it to be the opposite with their main competitor.
Claude writes code rife with safety issues/vulns all the time, or at least more than other models.
The other day I (well, the AI) just wrote a Rust app to merge two huge tables (GBs of data) by discovering columns with data in common based on text distance (Levenshtein and Dice). It worked beautifully.
And I have NEVER written one line of Rust.
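For anyone curious, the core of the idea is small. Here's a minimal hand-written sketch of it (illustrative names and toy data, not the actual generated app): score candidate column pairs by Levenshtein distance and the Sørensen-Dice bigram coefficient over sampled values, and keep the best matches.

```rust
// Classic dynamic-programming Levenshtein distance, O(len(a) * len(b)).
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j + 1] + 1).min(curr[j] + 1).min(prev[j] + cost);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

// Sørensen-Dice coefficient over character bigrams: 2*|A ∩ B| / (|A| + |B|).
fn dice(a: &str, b: &str) -> f64 {
    use std::collections::HashMap;
    let bigrams = |s: &str| -> HashMap<(char, char), usize> {
        let cs: Vec<char> = s.chars().collect();
        let mut m = HashMap::new();
        for w in cs.windows(2) {
            *m.entry((w[0], w[1])).or_insert(0) += 1;
        }
        m
    };
    let (ba, bb) = (bigrams(a), bigrams(b));
    let total: usize = ba.values().sum::<usize>() + bb.values().sum::<usize>();
    if total == 0 {
        // Degenerate case: strings shorter than two chars have no bigrams.
        return if a == b { 1.0 } else { 0.0 };
    }
    let overlap: usize = ba
        .iter()
        .map(|(k, &v)| v.min(*bb.get(k).unwrap_or(&0)))
        .sum();
    2.0 * overlap as f64 / total as f64
}

fn main() {
    // Toy "columns": in the real task these would be sampled values from a
    // candidate column in each table.
    let col_a = ["ACME Corp.", "Globex LLC", "Initech Inc"];
    let col_b = ["acme corp", "globex llc", "initech"];
    let avg_dice: f64 = col_a
        .iter()
        .zip(&col_b)
        .map(|(x, y)| dice(&x.to_lowercase(), &y.to_lowercase()))
        .sum::<f64>()
        / col_a.len() as f64;
    let avg_lev: f64 = col_a
        .iter()
        .zip(&col_b)
        .map(|(x, y)| levenshtein(&x.to_lowercase(), &y.to_lowercase()) as f64)
        .sum::<f64>()
        / col_a.len() as f64;
    println!("avg Dice = {avg_dice:.2}, avg Levenshtein = {avg_lev:.2}");
}
```

The real app obviously also had to stream gigabytes and decide which column pairs were even worth trying, but the similarity scoring itself is about this small.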
I don't understand the naysayers; to me the state of gen AI is like the Simpsons quote, "worst day so far". Look where we are within 5 years of the first real GPT/LLM. The next 5 years are going to be crazy exciting.
The "programmer" position will become a "builder". When we've got LLMs that generate Opus-quality text at 100x speed (think ASIC-based models), things will get crazy.
Human minds are built to find patterns, and you should be careful not to assume the rate of improvement will continue forever based on nothing but a pattern.
Just the fact that even retail-quality hardware is still improving significantly at running local LLMs is a great sign.
If AI quality remained the same, and the cost for local hardware dropped to $1000, it would still be the greatest thing since the internet IMO.
So even if the worst happens and all progress stops, I'm still very happy with what we got.
I'm not all that impressed with "AI". I often "race" the AI by giving it a task to do, and then I start coding my own solution in parallel. I often beat the AI, or deliver a better result.
Artificial Intelligence is like artificial flavoring. It's cheap and tastes passable to most people, but real flavors are far better in every way even if it costs more.
At their current stage, this feels like the wrong way to use them. I use them fully supervised (despite the fact that it feels like I’m fighting the tools), which is kind of the best of both worlds. I review every line of code before I allow the edit, and if something is wrong, I tell it to fix it. It learns over time, especially as I set rules in memories, and so the process has sped up, to the point that this goes way faster than if I had done it myself. Not all tasks are appropriate for LLMs at all, but when they are, this supervised mode is quite fast, and I don’t believe the output to be slop, but anyways I feel like I still own every line of code.
The happy path for me is with Erlang: due to the concurrency model, the blast radius of an error is exceptionally small, so the programming style is to let things crash if they go wrong. So really you are writing the happy-path code only (most of the time). Combine this approach with some very robust tests (does this thing pass the tests / behave how we need it to?) and you’re close to the point of not really caring about the implementation at all.
Of course, I still do, but I could see not caring becoming possible down the road with such architectures.
Homemade food is better than anything you can buy too. I'm 40, but I still drive 30 minutes to my parents' once a week for dinner, because the food they make feels like the elixir of life compared to the slop I can buy at Trader Joe's, Costco, or most restaurants.
The overall trend in AI performance will still be up and to the right, like everything else in computing over the past 50 years; improvement doesn't have to be linear.
Because if you don't know the language or problem space, there are footguns in there that you can't find; you won't know what to look for to find them. Not until you try to actually use this in a production environment will the issues become evident. At that point, you'll have to either know how to read and diagnose the code, or keep prompting till you fix it, which may introduce another footgun that you didn't know that you didn't know.
This is what gets me. The tools can be powerful, but my job has become a thankless effort in pointing out people's ignorance. Time and again, people prompt something in a language or problem space they don't understand, it "works", and then it hits a snag because the AI just muddled over a very important detail, and then we're back to the drawing board because that snag turned out to be an architectural blunder that didn't scale past "it worked in my very controlled, perfect-circumstances test run." It is getting really frustrating seeing this happen on repeat, and instead of people realizing they need to get their hands dirty, they just keep prompting more and more slop, making my job more tedious. I am basically at the point where I'm looking for new avenues for work. I say let the industry just run rampant with these tools. I suspect I'll be getting a lot of job offers a few years from now as everything falls apart and their $10k-a-day prompting fixes one bug only to cause multiple regressions elsewhere. I hope you're all keeping your skills sharp for the energy crisis.
Before LLMs, I've watched in horror as colleagues immediately copy-paste-ran Stack Overflow solutions in terminal, without even reading them.
LLM agents are basically the same, except now everyone is doing it. They copy-paste-run lots of code without meaningfully reviewing it.
My fear is that some colleagues are getting more skilled at prompting but less skilled at coding and writing. And the prompting skills may not generalize much outside of certain LLMs.
I don't want exciting. I want a stable, well-paying job that allows me to put food on the table, raise a family with a sense of security and hope, and have free time.
I seem to remember doing it in SQL (EDIT_DISTANCE) 20ish years ago. While I wouldn't say it worked beautifully, I also didn't need to write a single line of Rust :) also, no more than two lines of SQL were needed.
EDIT_DISTANCE uses pure Levenshtein, which is quadratic, so for tables of 500k rows and 20+ columns each it will slow down to a crawl. Without going into a lot of detail, I needed this to work for datasets of that size, so a lot of "trick" optimization and pre-processing had to be done (sketched below).
Otherwise, simple merges in pandas or SQL/DuckDB would have sufficed.
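To give a flavour of the kind of trick I mean (an illustrative sketch, not my actual pipeline): work on a deduplicated, capped sample of each column's values instead of every row, and use the length difference as a free lower bound on Levenshtein so most pairs never reach the quadratic DP at all.

```rust
use std::collections::HashSet;

// Normalize, deduplicate, and cap each column's values so the all-pairs
// comparison stays tractable. (Illustrative only, not the actual pipeline.)
fn sample_distinct(values: &[&str], max: usize) -> Vec<String> {
    let mut seen = HashSet::new();
    values
        .iter()
        .map(|v| v.trim().to_lowercase())
        .filter(|v| !v.is_empty() && seen.insert(v.clone()))
        .take(max)
        .collect()
}

// |len(a) - len(b)| is a lower bound on the Levenshtein distance, so pairs
// failing this check can be skipped without running the O(n*m) comparison.
fn worth_comparing(a: &str, b: &str, max_dist: usize) -> bool {
    a.chars().count().abs_diff(b.chars().count()) <= max_dist
}

fn main() {
    let raw = ["ACME Corp.", "acme corp", "", "Globex LLC", "ACME Corp."];
    let sample = sample_distinct(&raw, 1_000);
    // Only pairs that survive these cheap filters would get the expensive
    // Levenshtein/Dice scoring.
    let candidates: Vec<(&String, &String)> = sample
        .iter()
        .flat_map(|a| sample.iter().map(move |b| (a, b)))
        .filter(|&(a, b)| a < b && worth_comparing(a, b, 3))
        .collect();
    println!("{} value pairs survive the cheap filters", candidates.len());
}
```

Filters like these are what keep 500k-row tables from grinding the whole thing to a halt.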
Years of school (reading, calculus, etc.) to get to the point of learning the basics of set theory.
One day to learn basic SQL based on understanding set theory.
Maybe a few weeks of using SQL at work for ad hoc queries to be proficient enough (the query itself wasn't really complex).
For the domain itself I was consulting experts to see what matters.
I'm not sure the time it would take to know what to prompt and to verify the results would be much different.
Fun fact: management decided that the SQL solution wasn't enterprisey enough, so they hired external consultants to build a system doing essentially that, but in Java, and formed an 8-person internal team to guide them. I heard they finished 2 years later with a lot of manual matching.
Let me explain the naysayers: they know "programmer" has always meant "builder", and just because search is better and you can copy and paste faster doesn't mean you've built anything. First thing people need to realize is that no proprietary code is in those databases, and using AI will ultimately just get you regurgitated things people don't really care about. Use it all you want; you won't be able to do anything interesting, because they aren't giving you valuable things for free. Anything of value will still take time and knowledge. The marketing hype is to reduce wages and prevent competition. Go for it.
I must say, I do love how this comment has provoked such varying responses.
My own observations about using AI to write code is that it changes my position from that of an author to a reviewer. And I find code review to be a much more exhausting task than writing code in the first place, especially when you have to work out how and why the AI-generated code is structured the way it is.
There's a very wide range of programming tasks of differing difficulty that people are using / trying to use it for, and a very wide range of intelligence amongst the people that are using / trying to use it, and who are evaluating its results. Hence, different people have very different takes.
LLMs can't lie nor can they tell the truth. These concepts just don't apply to them.
They also cannot tell you what they were "thinking" when they wrote a piece of code. If you "ask" them what they were thinking, you just get a plausible response, not the "intention" that may or may not have existed in some abstract form in some layer when the system selected tokens*. That information is gone at that point and the LLM has no means to turn that information into something a human could understand anyways. They simply do not have what in a human might be called metacognition. For now. There's lots of ongoing experimental research in this direction though.
Chances are that when you ask an LLM about their output, you'll get the response of either someone who now recognized an issue with their work, or the likeness of someone who believes they did great work and is now defending it. Obviously this is based on the work itself being fed back through the context window, which will inform the response, and thus it may not be entirely useless, but... this is all very far removed from what a conscious being might explain about their thoughts.
The closest you can currently get to this is reading the "reasoning" tokens, though even those are just some selected system output that is then fed back to inform later output. There's nothing stopping the system from "reasoning" that it should say A, but then outputting B. Example: https://i.imgur.com/e8PX84Z.png
* One might say that the LLM itself always considers every possible token and assigns weights to them, so there wouldn't even be a single chain of thought in the first place. More like... every possible "thought" at the same time at varying intensities.
That is fine. You should, and you'll get the best results doing so.
>LLMs can't lie nor can they tell the truth. These concepts just don't apply to them
Nobody really knows exactly what concepts do and don't apply to them. We simply don't have a great enough understanding of the internal procedures of a trained model.
Ultimately this is all irrelevant. There are multiple indications that the same can be said for humanity, that we perform actions and then rationalize them away even without realizing it. That explanations are often if not always post-hoc rationalizations, lies we tell even ourselves. There's evidence for it. And yet, those explanations can still be useful. And I'm sure OP was trying to point out that is also the case for LLMs.
I’m not anthropomorphizing. I’ve been in many situations where the AI wrote some code some way and I had to ask why; it told me why, and then we moved on to better solutions as needed. It's better if it just wrote the code and its reasoning is still in context, but even if it’s not, it can usually reverse engineer what it wrote well enough. Then it’s a conversation about whether there is a better, clearer way to do it, and the code improves.
It sounds like you either have access to bad models or you are just imagining what it’s like to use an LLM in this way and haven’t actually tried asking it why it wrote something. The only judgement you need to make is whether the explanation makes sense or not, not some technical or theoretical argument about where the tokens in the explanation come from. You just ask questions until you can easily verify things for yourself.
Also, pretending that the LLM is still just token predicting and isn’t bringing in a lot of extra context via RAG and using extra tokens for thinking to answer a query is just way out there.
You just steamrolled on, pretty much ignoring the comment you are replying to, made unkind assumptions, and put words in my mouth to boot. I don't mind some aggressive argumentation, but this misses the mark so completely that I have really no idea how to have a constructive conversation this way.
> where the AI wrote some code some way and I had to ask why, it told me why
I just explained that it cannot tell you why. It's simply not how they work. You might as well tell me that it cooked you dinner and did your laundry.
> the code improves.
We can agree on this. The iterative process works. The understanding of it is incorrect. If someone's understanding of a hammer superficially is "tool that drives pointy things into wood", they'll inevitably try to hammer a screw at some point - which might even work, badly.
> It sounds like you either have access to bad models or you are just imagining what it’s like to use an LLM in this way
Quoting this is really enough. You may imagine me sighing.
> Also, pretending that the LLM is still just token predicting
Strawman.
Overall your comment is dancing around engaging with what is being said, so I will not waste my time here.
Human code is still easier to review. Also, I program 80% of the time and review PRs 20% of the time. With AI, that becomes: I review 80% of the time, and write markdown and wait 20% of the time.
This is not my experience either. If you put the work in upfront to plan the feature, write the test cases, and then loop until they pass... you can build a lot of high quality software quickly. The difference between a junior engineer using it and a great architect using it is significant. I think of it as an amplifier.
> If you put the work in upfront to plan the feature, write the test cases, and then loop until they pass...
It can be exhausting and time-consuming front-loading things so deeply though; sometimes I feel like I would have been faster cutting all that out and doing it myself, because in the doing you discover a lot of missing context (in the spec) anyways...
That's just not even remotely my experience, and I am ~20k hours into my programming career. AI makes most things so much faster that it is hard to justify ever doing large classes of things yourself (as much as this hurts my aesthetic sensibilities, it simply is what it is).
I've never seen a human estimate their "programming career" in kilohours. Is that supposed to look more impressive than years? So, you've been programming only about 7 years? I guess I'm at about "170 kilohours".
Besides the peer comment's point about Gladwell (10k hours is considered the point at which you've mastered a skill), it's also a far more honest metric of how much time you've spent actually programming.
Maybe you were writing code, making design choices, and debugging 8 hours a day.
Maybe you were primarily doing something else and only writing code for an hour a day.
Who would be the better programmer? The first guy with one year of experience or the second guy with 7 years?
I personally would only measure my experience in years, because it's approaching 3 decades full-time in industry (plus an additional decade of cutting my teeth during school and university), but I can certainly see that earlier on in a career it's a useful metric in comparison to the 10,000 hours.
> Maybe you were writing code, making design choices, and debugging 8 hours a day. Maybe you were primarily doing something else and only writing code for an hour a day. Who would be the better programmer? The first guy with one year of experience or the second guy with 7 years?
So your logic is that the grandparent specified hours because they spent that many hours specifically programming, and not by just multiplying the number of years by the number of hours in a year?
I don't know exactly how they arrived at their 20k hours figure; all I'm saying is that it didn't seem a controversial way of expressing their experience level, and I assumed it was intended as a comparison to the typical 10k hours needed for mastery of a craft.
Part of this depends on if you care that the AI wrote the code "your way." I've been in shops with rather exotic and specific style guides and standards which the AI would not or will not conform to.
Yeah, I also highly value consistency in my projects, which forces me to keep an eye on the LLM and steer it often. This limits my overall velocity, especially on larger features. But I'm still much faster with the agent. Recent example, https://github.com/igor47/csheet/pull/68 -- this took me a couple of hours pairing with Claude Code, which is insane given the size of the work here. Though this PR creates a bunch of tables, routes, and services, it's not just greenfield CRUD work. We're figuring out how to model a complicated domain, integrating with existing code, and thinking through complex integrations, including with LLMs at run time. Claude is writing almost all the code; I'm just steering.
AI-assisted code can't even stick to the API documentation, especially if the data structures are not consistent and have evolved over time. You would see Claude literally pulling function after function from thin air, desperately trying to fulfill your complicated business logic, and even when it's complete, it doesn't look neat at all. Yes, it will have test coverage, but one more feature request will probably break the camel's back. And if you raise that PR to the rest of your team, good luck trying to summarise it all to your colleagues.
However if you just have an easy project, or a greenfield project, or don't care about who's going to maintain that stuff in 6 months, sure, go all in with AI.
I definitely wonder if the people going all-in on AI harnessing are working on greenfield projects, because it seems overwhelming to try to get that set up on a brownfield codebase where the patterns aren't consistent and the code quality is mixed.
So just iterate on it? Your complaint is that the model isn't one shotting the problem and reading your mind about style. It's like any coding workflow, make it work, then make it nice.
No, I never expect AI to one-shot (if I see such a miracle, it's usually because I needed a one-liner or something really simple and well documented, which I can also write on the whiteboard from memory).
Try iterating over well-known APIs where the response payloads are already gigantic JSONs, there are multiple ways to get certain data, and they are all inconsistent; Claude spits out function after function, laying waste to your codebase. I found that no amount of style guideline documents resolves this issue.
I'd rather read the documentation myself and write the code by hand rather than reviewing for the umpteenth time when Claude splits these new functions between e.g. __init__.py and main.py and god knows where, mixing business logic with plumbing and transport layers as an art form. God it was atrocious during the first few months of FastMCP.
Most of this thread is debating whether models are good or bad at writing code... however, I think a more important question is what we feed the AI with because that dramatically determines the quality of the output.
When your agent explores your codebase trying to understand what to build, it reads schema files, existing routes, UI components, etc... easily 50-100k tokens of implementation detail. It's basically reverse-engineering intent from code. With that level of ambiguous input, no wonder the results feel like junior work.
When you hand it a structured spec instead including data model, API contracts, architecture constraints etc., the agent gets 3-5x less context at much higher signal density. Instead of guessing from what was built it knows exactly what to build. Code quality improves significantly.
I've measured this across ~47 features in a production codebase, with a median ratio of 4x less context with specs vs. random agent code exploration. For UI-heavy features it's 8-25x. The agent reads 2-3 focused markdown files instead of grepping through hundreds of KB of components.
To pick up @wek's point about planning from above: devs who get great results from agentic development aren't better prompt engineers... they're better architects. They write the spec before the code, which is what good engineering always was... AI just made the payoff for that discipline 10x more visible.