More

ajkdhcb2 · on March 27, 2025

My experiences have all been like this too. I am puzzled by how some people say it works for them

simonw · on March 27, 2025

I wrote this article precisely for people who are having trouble getting good results out of LLMs for coding: https://simonwillison.net/2025/Mar/11/using-llms-for-code/

mplanchard · on March 28, 2025

I’ve followed your blog for a while, and I have been meaning to unsubscribe because the deluge of AI content is not what I’m looking for.

I read the linked article when it was posted, and I suspect a few things that are skewing your own view of the general applicability of LLMs for programming. One, your projects are small enough that you can reasonably provide enough context for the language model to be useful. Two, you’re using the most common languages in the training data. Three, because of those factors, you’re willing to put much more work into learning how to use it effectively, since it can actually produce useful content for you.

I think it’s great that it’s a technology you’re passionate about and that it’s useful for you, but my experience is that in the context of working in a large systems codebase with years of history, it’s just not that useful. And that’s okay, it doesn’t have to be all things to all people. But it’s not fair to say that we’re just holding it wrong.

simonw · on March 28, 2025

"my experience is that in the context of working in a large systems codebase with years of history, it’s just not that useful."

It's possible that changed this week with Gemini 2.5 Pro, which is equivalent to Claude 3.7 Somnet in terms of code quality but has a 1 million token context (with excellent scores on long context benchmarks) and an increased output limit too.

I've been dumping hundreds of thousands of times of codebase into it and getting very impressive results.

mplanchard · on March 28, 2025

See this is one of the things that’s frustrating about the whole endeavor. I give it an honest go, it’s not very good, but I’m constantly exhorted to try again because maybe now that Model X 7.5qrz has been released, it’ll be really different this time!

It’s exhausting. At this point I’m mostly just waiting for it to stabilize and plateau, at which point it’ll feel more worth the effort to figure out whether it’s now finally useful for me.

simonw · on March 28, 2025

Not going to disagree that it's exhausting! I've been trying to stay on top of new developments for the past 2.5 years and there are so many days when I'll joke "oh, great, it's another two new models day".

Just on Tuesday this week we got the first widely available high quality multi-modal image output model (GPT-4o images) and a new best-overall model (Gemini 2.5) within hours of each other. https://simonwillison.net/2025/Mar/25/

jjani · on March 28, 2025

> One, your projects are small enough that you can reasonably provide enough context for the language model to be useful. Two, you’re using the most common languages in the training data. Three, because of those factors, you’re willing to put much more work into learning how to use it effectively, since it can actually produce useful content for you.

Take a look at the 2024 StackOverflow survey.

70% of professional developer respondents had only done extensive work over the last year in one of:

JS 64.6% SQL 54.1% JTML/CSS 52.9% PY 46.9% TS 43.4% Bash/Shell 34.2% Java 30%

LLMs are of course very strong in all of these. 70% of developers only code in languages LLMs are very strong at.

If anything, for the developer population at large, this number is even higher than 70%. The survey respondents are overwhelmingly American (where the dev landscape is more diverse), and self-select to those who use niche stuff and want to let the world know.

Similar argument can be made for median codebase size, in terms of LOC written every year. A few days ago he also gave Gemini Pro 2.5 a whole codebase (at ~300k tokens) and it performed well. Even in huge codebases, if any kind of separation of concerns is involved, that's enough to give all context relevant to the part of the code you're working on. [1]

[1] https://simonwillison.net/2025/Mar/25/gemini/

mplanchard · on March 28, 2025

What’s 300k tokens in terms of lines of code? Most codebases I’ve worked on professionally have easily eclipsed 100k lines, not including comments and whitespace.

But really that’s the vision of actual utility that I imagined when this stuff first started coming out and that I’d still love to see: something that integrates with your editor, trains on your giant legacy codebase, and can actually be useful answering questions about it and maybe suggesting code. Seems like we might get there eventually, but I haven’t seen that we’re there yet.

simonw · on March 28, 2025

We hit "can actually be useful answering questions about it" within the last ~6 months with the introduction of "reasoning" models with 100,000+ token contest limits (and the aforementioned Gemini 1 million/2 million models).

The "reasoning" thing is important because it gives models the ability to follow execution flow and answer complex questions that down many different files and classes. I'm finding it incredible for debugging, eg: https://gist.github.com/simonw/03776d9f80534aa8e5348580dc6a8...

I built a files-to-prompt tool to help dump entire codebases into the larger models and I use it to answer complex questions about code (including other people's projects written in languages I don't know) several times a week. There's a bunch of examples of that here: https://simonwillison.net/search/?q=Files-to-prompt&sort=dat...

jjani · on March 28, 2025

How much lines of context and understanding can we, human developers, keep in our heads, taken into account and refer to when implementing something?

Whatever the amount may be, it definitely fits into 300k tokens.

mplanchard · on March 29, 2025

After more than a few years working on a codebase? Quite a lot. I know which interfaces I need and from where, what the general areas of the codebase are, and how they fit together, even if I don’t remember every detail of every file.

simoncion · on March 28, 2025

> But it’s not fair to say that we’re just holding it wrong.

<troll>Have you considered that asking it to solve problems in areas it's bad at solving problems is you holding it wrong?</troll>

But, actually seriously, yeah, I've been massively underwhelmed with the LLM performance I've seen, and just flabbergasted with the subset of programmer/sysadmin coworkers who ask it questions and take those answers as gospel. It's especially frustrating when it's a question about something that I'm very knowledgeable about, and I can't convince them that the answer they got is garbage because they refuse to so much as glance at supporting documentation.

Workaccount2 · on March 28, 2025

LLMs need to stay bad. What is going to happen if we have another few GPT-3.5 to Gemini 2.5 sized steps? You're telling people who need to keep the juicy SWE gravy train running for another 20 years to recognize that the threat is indeed very real. The writing is on the wall and no one here (here on HN especially) is going to celebrate those pointing to it.

bluefirebrand · on March 28, 2025

I don't think people really realize the danger of mass unemployment

Go look up what happens in history when tons of people are unemployed at the same time with no hope of getting work. What happens when the unemployed masses become desperate?

Naw I'm sure it will be fine, this time will be different

codegangsta · on March 28, 2025

Just wanted to chime in and say how appreciative I’ve been about all your replies here, and overall content on AI. Your takes are super reasonable and well thought out.

ajkdhcb2 · on March 27, 2025

True. People use completely unjustified anthropomorphised terminology for marketing reasons and it bothers me a lot. I think it actually holds back understanding how it works. "Hallucinate" is the worst - it's an error and undesired result, not a person having a psychotic episode

ajkdhcb2 · on March 18, 2025

Soft skills aren't underrated. On the contrary, people talk about them all the damn time to the point that it dominates hiring practices and the interview process

ajkdhcb2 · on Dec 5, 2024

The whitepaper shows it was intended to be "peer to peer electronic cash". It did clearly fail at that goal since it has none of the privacy of cash and so on.

aryan14 · on Dec 5, 2024

How did it fail at being P2P e-cash?

You can use bitcoin completely/as anonymously as cash if you wanted to, the same way you’d have to put in a bit of effort to have a completely anonymous cash transaction.

Monero (XMR) which is a fork of bitcoin, is much closer to actual cash

ajkdhcb2 · on Dec 5, 2024

Monero is fine. But how can you even use bitcoin anonymously now? Even wasabi and samourai wallet are shut down as far as i know. Seems highly risky to use any tool like that since i dont know what the consequences will be in the future. Joinmarket? I'm sure it is possible but it is error-prone, takes so much effort, time and fees that i see it as a failure

aryan14 · on Dec 6, 2024

A very easy anonymous way is to simply use a self custody wallet like Exodus.

You can buy ETH, then you can use a TOR crypto exchange to swap it from ETH -> BTC into your wallet.

Just an idea though

polotics · on Dec 5, 2024

well you can use Bisq right?

ajkdhcb2 · on Dec 5, 2024

I have used it, but i dont see how it is related to being able to transact privately. As soon as you do anything linked to your name with those coins puchased on bisq, the whole batch is linked to your identity

stackingsaunter · on Dec 5, 2024

Lightning is private

ajkdhcb2 · on Dec 5, 2024

Not based on everything ive read. Happy to be shown evidence of otherwise

ajkdhcb2 · on Dec 5, 2024

It is scarce, novel, transferable, passed critical mass of popularity long ago, uncensorable to some degree for now. That's all that's needed, it makes it better money than perhaps anything else

ajkdhcb2 · on Dec 5, 2024

It isn't fungible at all since every coin has a history which in some cases makes it unspendable, so it is highly risky to accept transfers without a third party chain analysis report, otherwise i agree

throwme_123 · on Dec 5, 2024

Report which you don't get from centralized exchanges. So you have to trust them and they should somehow be liable for the provenance of what they sell to you.

Probably most CEX don't allocate UTXOs upon buying, but only for withdrawals

ajkdhcb2 · on Dec 5, 2024

I am aware of a couple of cases where people received flagged coins from centralised exchanges and had a lot of issues

ajkdhcb2 · on Aug 31, 2024

I can't read reddit anymore because I always get "Your request has been blocked due to a network policy. Try logging in or creating an account here to get back to browsing."

Any way to bypass this?

doublerabbit · on Aug 31, 2024

https://reddit.garudalinux.org/r/BuyBorrowDieExplained/comme...

or host a private instance.

https://github.com/redlib-org/redlib

icegreentea2 · on Aug 31, 2024

Did you try other browsers? For some reason for my home IP address, only Firefox (desktop) is blocked. Chrome and Edge and even mobile Firefox work fine.

synthoidzeta · on Aug 31, 2024

You can access via tor (they have an onion address as well) — or run individual links through an archive service

bentley · on Aug 31, 2024

For the “old” Reddit frontend: https://old.reddittorjg6rue252oqsxryoxengawnmo46qy4kyii5wtqn...

For the new, crappier Reddit frontend (which has more dark patterns funneling people to the app): https://reddittorjg6rue252oqsxryoxengawnmo46qy4kyii5wtqnwfj4...

However, the TLS certificates on both have been expired since August 28.

ajkdhcb2 · on June 19, 2024

>I want my web browser to just be a web browser, no unnecessary add-ons.

When i try to consider what that really means, it leads me to think it actually encompasses a large and ever-increasing scope of features that have just become the norm. Browsers are becoming almost a general purpose OS.

Also, some of those features clearly require network effect to function which entails having it officially supported and suggested to the users (or even making it default).

ajkdhcb2 · on June 18, 2024

That's such an outrageously absurd statement that i find it hard to not read it as sarcasm

ajkdhcb2 · on March 16, 2024

The Ministry of Truth.