Hacker Times | nibbleyou's comments

What does it contain?

Because a human fails in a known way. If a human does not have expertise in domain X or tech Y, they will fail there and the expectation is that they will fail.

With an LLM you never know where it can fail. There is no domain expertise for an LLM. It can fail in a miserable way in the same domain it worked spectacularly for.


Humans fail in infinitely more complicated ways than LLMs. They can have a difficult personality, a medical issue, family stress, a hangover, sleep deprivation, or they can just wake up on the wrong side of the bed. On any given day, you never know if you will get an expert in domain X or a sleep-deprived version of the same person who accidentally drops a database.

Indeed, if you remember before AI took the world by storm, HN used to be chock-full of articles about how the hiring process is broken for both employers and candidates, where you can never tell if what you see is what you get.

When I run a local LLM I get none of that. I hit the intelligence walls or buggy behaviour, but it doesn't matter if it's 8am or 8pm, the model behaves exactly the same. If something doesn't work as I wish, I can retry as many times as I want without the model getting angry at me.


Damned squishy humans, with their feelings and moods...

Indeed. It's like saying "the strongest human on their best day can support the roof of this tent for hours, how dare you criticise them for being squishy humans" when someone says "why don't we make an A-frame out of wood?"

LLMs don't make a good A-frame, nor would I classify them as wood-like. People propose LLMs as solutions as if they're wooden when they're teetering contraptions of metal rods, aluminum extrusions, rubber bands, and duct tape. That can do the trick. It can't be relied on to fail reliably like a single solid material like wood.

> masterful piece of financial engineering

Love how we assign positive adjectives to unethical practices by corporates


I think the op was being a bit satirical

I don't think so, considering a substantial proportion of their comments on this site seems to be fanboying for SpaceX in particular and anything AI in general.

s/think/hope ?

I wouldn't class 'masterful' as a positive adjective personally.

EDIT: Downvotes? Not sure why. I would say Darth Vader is masterful of the force, and even that Donald Trump is masterful at being provocative. Masterful is not definitively positive or negative, it just describes being very good at something.


I don't understand this logic. Does whole market mean scamming companies too?

Fun fact, both Enron and Lehman Brothers were in the S&P 500 when they went bankrupt. So yes, the whole market or even the market of the largest companies, includes some that may not be great companies. The beauty of the index is you don't have to know or care, since it'll take care of itself over time.

>The beauty of the index is you don't have to know or care, since it'll take care of itself over time

As long as there are active investors in the market conducting price discovery. Which there always will be, just pointing out that someone has to care, even if you don’t


> As long as there are active investors in the market conducting price discovery. Which there always will be,

Passive funds dominate now, don't they?


Depends on what you consider passive. I think index funds specifically are only 20%, but if you add other low-cost ETFs it’s probably about half the market. I don’t think there’s any way to know for sure at what point passive funds become distortionary, but it should be self-correcting to some degree. If active funds are able to provide a substantially better return than passive funds, even with management fees, people will migrate back to them.

> it'll take care of itself over time

At least until it doesn't. If this spacex venture succeeds because it got propped up by index funds, then that's a decent indicator that more will follow.

It stands to reason that active investing will be more valuable as a result


Yes. That’s what passive investing is. You give money to the passive fund, the passive fund buys the market. No regard to price or any other metric.
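To make "buys the market" concrete, here is a minimal sketch of a cap-weighted allocation, assuming the fund tracks a market-cap-weighted index. The function name, company names, and numbers are hypothetical and purely illustrative; real funds also deal with float adjustments, rebalancing, and fees, which are ignored here.

```python
def cap_weighted_allocation(market_caps, cash):
    """Split `cash` across companies in proportion to market capitalization.

    No valuation metric is consulted: the only input is each company's
    share of the total market cap.
    """
    total = sum(market_caps.values())
    if total <= 0:
        raise ValueError("total market cap must be positive")
    return {name: cash * cap / total for name, cap in market_caps.items()}


# Hypothetical three-company market (caps in billions) and 1000 units of cash.
caps = {"A": 600, "B": 300, "C": 100}
allocation = cap_weighted_allocation(caps, 1000)
```

The larger a company's market cap, the more of each incoming unit of cash it receives, regardless of whether that cap is justified.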

> If one wants IP and rule of law (incl contracts) to be respected, one should not violate others' rights when it is convenient.

Yes that's what should be said to OpenAI. Now they should not cry about their T&Cs not being respected when they never cared about others' copyrights.


I think a lot of us would be fine for AA to be a for-profit enterprise earning money from donations and deals with companies. The service it provides is invaluable - free and DRM-free access to millions of titles in the world.


I have only worked in startups and I have been an early engineer in both of them. I would always get high privileges within a short time where I would have the access to create and delete resources. I don't think it's that uncommon.


I would never have these privileges granted directly to my account.

Indeed it’s a good practice to use roles where supported (AWS has them) and explicitly switch when needed


The problem with agents is they regularly sidestep the guardrails and do what they want with a script anyway. The number of times I’ve seen Claude try to escape the folder it’s working in, and then write a Python script that does exactly what I told it it’s not allowed to do, supports that.

If you use SSO and have an AWS config that Claude is allowed to see to get the correct role in the first place, it will just pick the role and plough on anyway.


And this is why it is the height of irresponsibility to run LLMs on your system. We know they are unreliable and just make things up; it's extremely foolish to go "yeah I'm going to let that run commands".


It's not _really_ any different to running an undocumented third party binary. Is it the height of irresponsibility to run Windows, or VSCode, or Spotify?

I think the model we've got now is wrong, and the harnesses should be OS-level sandboxed, and the agents should be running in harness managed sandboxes.


But the correct way to do it is to have a separate account with more privileges, and only give AI access to your standard developer account


I have personally seen AI bypass this multiple times.


Sounds like they're still giving the model the keys to the kingdom, which is my point: stop giving the model an avenue to make catastrophic mistakes. It makes no sense.


If your message is in response to me, which I think it is, I deliberately don’t give access to credentials and env variables. I’ve worked to create restrictions and seen AI models use very interesting methods to bypass them.

Even now my prompt says the AI must verify the path of the files it intends to edit, and get permission before editing one file at a time and only after permission. I stop it from ignoring those rules once a day at least.
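A path check like the one described can also be enforced in the harness instead of the prompt. The following is a minimal sketch, assuming the harness sees every file path before an edit is applied; `is_inside_workspace` is a hypothetical helper, not part of any existing tool, and a check like this does not replace OS-level sandboxing.

```python
from pathlib import Path


def is_inside_workspace(candidate, workspace):
    """Return True only if `candidate` resolves to a path within `workspace`.

    Relative paths are interpreted against the workspace root. Resolving
    first collapses ".." segments and follows symlinks, so traversal
    tricks are judged by where they actually end up.
    """
    root = Path(workspace).resolve()
    target = (root / candidate).resolve()
    return target == root or root in target.parents
```

A harness would call this before every write and refuse the edit (or ask for permission) when it returns False.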


This is not privilege separation/sandboxing. A separate virtual machine for an agent with limited credentials is a reasonably safe approach.


I built www.propelcode.app with separate Linux containers. Unless you disconnect the container and your computer from the internet, the models can escape the sandbox and get information off your machine.

I am open to being corrected and learning from you if you have a better method of sandboxing


The best way to use LLMs is via tmux where it's running on a disposable VM. 0 chance of it getting information from your local machine.


I am using tmux but not a disposable VM. I have thought about something like that, but honestly some of the debugging work makes ephemeral environments hard to work with. How are you doing that in your workflow?


We kinda need to architect things with the assumption that all token-output from an LLM can be unpredictably sneaky and malicious.

Alas, humans suck at constant vigilance, we're built to avoid it whenever possible, so a "reverse centaur" future of "do what the AI says but only if you see it's good" is going to suck.


I built my own IDE to replace vscode / cursor so I could design the harness and ensure that the model tool access was secure and limited. But the rest of the industry is YOLO


That's one way to do it, how about backup to a remote location every hour? There's more than one way to be careful.


The first thing I do for any meaningful side project is to set up RDS with snapshots. So any startup that doesn't do this one basic step already deserves to fail, in my opinion.

Beyond that, I've used AI agents like crazy; we even have linked MCP servers that let them query the dev database. I haven't seen one try deleting everything a single time. I haven't seen any agent try to do anything destructive. Ever. Perhaps it's just reflecting an outrageously bad engineer and nothing else.


I too have felt the same around me. There is this lack of faith in the institutions now, a feeling of distrust. Someone on HN called this the era of shamelessness and I kind of agree with it. The top has gotten shameless and the people at the bottom are trying to scrabble for whatever they can to become one of them so that they can escape this hellhole that has been created.


Definitely the fish stinks from the head.

I'm also a bit confused about how the people on the top think this will play out.

A long time ago there was a French saying, "noblesse oblige", or the German counterpart, "Wohlstand verpflichtet".


> I'm also a bit confused about how the people on the top think this will play out.

I don't know if they are really capable of thinking of the second and third order effects of what they're doing. There is something psychologically broken about many of the ultra-rich today where their behavior comes across as compulsive.

When you have a hole in your soul that can't be filled with a billion dollars, it simply can't be filled, and that black hole drives much of their behavior. You look at people like Trump and Musk, and they seem... miserable. Like, have you ever heard Trump have a genuine laugh of joy? Not the sort of sneering snicker of a bully, but one that comes from delight? Because I haven't.

We are all at the mercy of their actions, but it's almost like they're at the mercy of their irrational compulsions too.

Not that I'm saying they are deserving of sympathy or aren't responsible for their actions. But if we're looking for someone to pump the brakes on the crazy that's happening these days, it's sure as hell not going to be those hollow men.


I don't like being conspiratorial, but it genuinely feels like the people at the top know some major catastrophe is coming and are just grabbing whatever resources they can while they can before retreating to their bunkers. Even the White House is trying to build a massive underground bunker using the ballroom on top as an excuse. I don't see why else they would all be willingly destroying society as they are right now unless they don't think it matters.


Everyone knows a major catastrophe is coming. Scientists have been talking about the tipping point for like five decades now.

It's a done deal, we were too stupid.


There's also a tool to automatically push it to multiple repos: https://github.com/prashantsengar/GitEcho

Disclaimer: the author is a colleague of mine

Though to be fair, what the parent meant by federated forges is different than this approach.



Curious to know what kind of problems you are talking about here


I don't want to give away too much due to anonymity reasons, but the problems are generally in the following areas (in order from hardest to easiest):

- One problem on using quantum mechanics and C*-algebra techniques for non-Markovian stochastic processes. The interchange between the physics and probability languages often trips the models up, so pretty much everything tends to fail here.

- Three problems in random matrix theory and free probability; these require strong combinatorial skills and a good understanding of novel definitions, requiring multiple papers for context.

- One problem in saddle-point approximation; I've just recently put together a manuscript for this one with a masters student, so it isn't trivial either, but does not require as much insight.

- One problem pertaining to bounds on integral probability metrics for time-series modelling.


Regarding the first problem: are you looking at NCP maps for non-Markovian processes given you mention C*-algebra? Or is it more of a continuous weak monitoring of a stochastic system that results in dynamics with memory effects?

I'd be very curious to know how any LLMs fare. I completely understand if you don't want to continue the discussion because of anonymity reasons.


More of the latter. It's a pet project of mine, and all of the LLMs tend to utterly fail at getting anywhere with it, at least in chats. In an agentic setup, it can chip away at some aspects, but it needs serious guidance on relevant language, notation, and concepts. To me, it demonstrates that the LLMs are not particularly good at crossing literatures, but then again, humans rarely seem to be good at that either...


By agentic, do you mean that you run these models through a harness in the CLI? If yes, which one? Thanks for sharing.


It would be wonderful to have a deeper insight, but I understand that you can't disclose your identity (I understand that you work in an applied research field, right?)


Yes, I do mostly applied work, but I come from a background in pure probability so I sometimes dabble in the fundamental stuff when the mood strikes.

Happy to try to answer more specific questions if anyone has any, but yes, these are among my active research projects so there's only so much I can say.


Thanks a lot for your kind and detailed answer. I’m no longer in the research field, but you gave me good ideas to work on.

