Hacker News | frumiousirc's comments

> There are better techniques for hyper-parameter optimisation, right?

Yes, for example "swarm optimization".

The difference with "autoresearch" (restricting just to the HPO angle) is that the LLM may (at least we hope) beat conventional algorithmic optimization by making better guesses for each trial.

For example, perhaps the problem has an optimization manifold that has been studied in the past, and the LLM either has that study in its training set or finds it via search and learns the relative importance of all the HP axes. Given that, it "knows" not to vary the unimportant axes much and to focus on the important ones. Someone else did the hard work of understanding the problem in the past, and the LLM exploits that (again, we may hope).
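For reference, the conventional baseline is simple enough to sketch. Here is a minimal particle-swarm optimizer in Python on a toy two-axis objective (both the objective and the bounds are made up for illustration); note how the dumb algorithm spends just as many samples on the unimportant axis as the important one, which is exactly the waste an LLM with prior knowledge could avoid:

```python
import random

def pso_minimize(f, bounds, n_particles=20, iters=100):
    """Minimal particle swarm optimization over a box-bounded space.

    f: objective taking a list of floats; bounds: list of (lo, hi) per axis.
    """
    dim = len(bounds)
    rand = random.Random(0)  # fixed seed for reproducibility
    pos = [[rand.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]               # per-particle best positions
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5                 # inertia, cognitive, social weights
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rand.random(), rand.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy "hyper-parameter" surface: axis 0 matters a lot, axis 1 barely at all.
best, best_val = pso_minimize(lambda x: (x[0] - 3.0) ** 2 + 0.01 * x[1] ** 2,
                              bounds=[(-10, 10), (-10, 10)])
```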


> Copy paste is hijacked

FWIW, in Kitty on Linux, SHIFT + mouse-select copies and SHIFT + middle-mouse-button pastes. This SHIFT modifier, combined with otherwise standard Unix-style copy/paste, is common across TUIs (e.g., weechat).


A blanket follow-up of "are you sure this is the best way to do it?"

frequently returns, "Oh, you are absolutely correct, let me redo this part better."


You should start a new session for the code review to make sure the context window is not polluted with the work on implementation itself.

At the end of the day it’s an autocomplete. So if you ask “are you sure?” then “oh, actually” is a statistically likely completion.


> You should start a new session for the code review to make sure the context window is not polluted with the work on implementation itself.

I'm just a sample size of one, but FWIW I didn't find that this noticeably improved my results.

Not having to completely recreate all the LLM context necessary to understand the code and the spectrum of possible solutions (which the LLM still "knows" before you clear the session) saves a lot of time and tokens.


Interesting, I definitely see better results on a clean session. On a “dirty” session it’s more likely to go with “this is what we implemented, it’s good, we could improve it this way”, whereas on a clean session it’s a lot more likely to find actual issues or things that were overlooked in the implementation session.

This (reverse proxy) is essentially what "tailscale serve" does.


The average native Japanese speaker knows more English than the average native English speaker knows Japanese.


It depends.

Google's AI that gloms on to search is not particularly good for programming. I don't use any OpenAI stuff, but talking to those that do, their models are not good for programming compared to equivalent ones from Anthropic or Google.

I have good success with free Gemini used either via the web UI or with aider. That can handle some simple software dev. The new Qwen3.5 is pretty good considering its size, though multi-$k of local GPU is not exactly "free".

But, this also all depends on the experience level of the developer. If you are gonna vibe code, you'll likely need to use a paid model to achieve results even close to what an experienced developer can achieve with lesser models (or their own brain).


Set up mmap properly and you can evaluate small/medium MoE models (such as the recent A3B from Qwen) on most ordinary hardware; they'll just be very slow. But if you're willing to wait you can get a feel for their real capabilities, then invest in what it takes to make them usable. (Usually running them on OpenRouter will be cheaper than trying to invest in your own homelab: even if you're literally running them 24/7, the break-even point compared to a third-party service is unrealistically far off.)
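The mmap part is just the OS paging model doing the work. A toy Python sketch (1 MiB placeholder file, obviously not a real model) of why memory-mapped weights never have to fit in RAM all at once; IIRC llama.cpp mmaps GGUF files this way by default:

```python
import mmap
import os
import tempfile

# Stand-in "weights" file; a real MoE checkpoint would be tens of GB.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(os.urandom(1 << 20))  # 1 MiB placeholder

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Only the pages actually touched are read into memory;
        # untouched tensors/experts stay on disk until needed.
        first_page = mm[:4096]
        size = len(mm)
```

Reads are demand-paged, so an idle expert costs nothing until the router actually selects it; the price is that every cold access is a disk read, hence "very slow".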


Subjectively, but with tests using identical prompts, I find the quality of Qwen3.5 122B below Claude Haiku by as much as Claude Haiku is below Claude Sonnet for software design planning tasks. I have yet to try a like-for-like test on coding.


> But, this also all depends on the experience level of the developer. If you are gonna vibe code,

Where I find it struggles is when I prompt it with things like this:

> I'm using the latest version of Walker (app launcher on Linux) on Arch Linux from the AUR, here is a shell script I wrote to generate a dynamic dmenu based menu which gets sent in as input to walker. This is working perfectly but now I want to display this menu in 2 columns instead of 1. I want these to be real columns, not string padding single columns because I want to individually select them. Walker supports multi-column menus based on the symbol menu using multiple columns. What would I need to change to do this? For clarity, I only want this specific custom menu to be multi-column not all menus. Make the smallest change possible or if this strategy is not compatible with this feature, provide an example on how to do it in other ways.

This is something I tried hacking on for an hour yesterday and it led me down rabbit hole after rabbit hole of incorrect information, commands that didn't exist, flags that didn't exist and so on.

I also sometimes have oddball problems I want to solve where I know awk or jq can do it pretty cleanly but I don't really know the syntax off the top of my head. It fails so many times here. Once in a while it will work but it involves dozens of prompts and getting a lot of responses from it like "oh, you're right, I know xyz exists, sorry for not providing that earlier".

I get no value from it if I know the space of the problem at a very good level because then I'd write it unassisted. This is coming at things from the perspective of having ~20 years of general programming experience.

Most of the problems I give it are 1 off standalone scripts that are ~100-200 lines or less. I would have thought this is the best case scenario for it because it doesn't need to know anything beyond the scope of that. There's no elaborate project structure or context involving many files / abstractions.

I don't think I'm cut out for using AI because if I paid for it and it didn't provide me the solution I was asking for then I would expect a refund in the same way if I bought a hammer from the store and the hammer turned into spaghetti when I tried to use it, that's not what I bought it for.


What LLM are you using? What you describe should be no problem for Gemini free or Claude Haiku and above. Other models, I dunno.


Both ChatGPT's anonymous one as well as Google's "AI mode" on their search page, which brings you to a dedicated page to start prompting. I'm not sure if that's Gemini proper because if I go to https://gemini.google.com/app it doesn't have my history.


The "AI mode" in Google search is pretty bad for programming. It is not Gemini.

I don't have direct experience with ChatGPT but those that do that I've talked to place it behind Gemini and Claude models.

Try free Claude or Gemini on the web and see if you have a better experience. Claude free is better than Gemini free (actually, Gemini free has seemed extra dumb lately).


Thanks, I tried both and the results were not good IMO.

I gave them the same prompts. Both failed to give a working solution. I lost track of how many times it said "This is the guaranteed to work final solution" which still had the same problem as the 5 previous failures.

I gave up after around 40 failed prompts in a row where it was "Absolutely certain" it will work and is the "final boss" of the solution.


Or, "The I in LLM stands for intelligence."


I'm partial to "The AI is more A than I"


> Note: my main complaint is the sheer number of markdown files over time, but I haven't gotten around to (or needed to) automate this yet, as sometimes these historic planning/debug files are useful for future changes.

FWIW, what you describe maps well to Beads. Your directory structure becomes dependencies between issues, and/or parent/children issue relationship and/or labels ("epic", "feature", "bug", etc). Your markdown moves from files to issue entries hidden away in a JSONL file with local DB as cache.

Your current file-system "UI" vs Beads command line UI is obviously a big difference.

Beads provides a kind of conceptual bottleneck which I think helps when using with LLMs. Beads is more self-documenting, while a file system can be "anything".


> these tools are actually just slot machines

Slot machines that are biased toward producing jackpots.

And "jackpots" are a metaphor for "training distribution".


Yeah. You always know you are doing something pretty unique when the LLM can conceptualize it (produce the right English output) but not put it into code.


    MY_API_KEY=$(pass my/api/key | head -1) python manage.py runserver

