not2b · 2026-05-28T05:54:47 1779947687

Cool. So we benefit from prediction markets surfacing insider information about Trump's plans in the Iran conflict, and unknown insiders making hundreds of millions on that information, with massive trades minutes before each announcement, benefited the people watching prices in the oil market? That doesn't seem right.

not2b · 2026-05-28T05:27:34 1779946054

If the result is statistically significant, it just barely makes it. 84.8% isn't that much higher than 80.8% and they had only 250 prompts, if I'm reading this right.

tgv · 2026-05-28T05:35:29 1779946529

In a field where progress is measured in tenths of a percentage point, that's not true. Think of it this way: the error rate drops from 19% to 15%, or from 1 in 5 to 1 in 6.

danparsonson · 2026-05-28T11:28:24 1779967704

Statistical significance is about whether an effect can reliably be said to have been measured at all; it's not about whether or not the effect itself would be significant in the sense of moving some other needle.

The ~5% improvement reported here might just be an artefact of the data collection or random variation, rather than a consistent repeatable change.

tgv · 2026-05-28T17:52:24 1779990744

I know what significance means, and I also know that getting it from a p-value is nonsensical.

> The ~5% improvement reported here might just be an artefact of the data collection or random variation, rather than a consistent repeatable change.

You're questioning method or data representativeness, not significance. 250 samples is just about enough for a 5% difference in NHST (stddev is around .4, so 1.64 sigma is .4/15.8*1.64 = 0.04 for single-sided testing).
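
The arithmetic in parentheses can be checked with a short sketch. It assumes each prompt is an independent pass/fail trial at the 80.8% baseline. The function names are illustrative, and the second function adds a comparison the thread does not settle: treating the two scores as independent runs of 250 prompts each.

```python
import math

def one_sided_margin(p: float, n: int, z: float = 1.64) -> float:
    """Smallest gain over baseline rate p that clears a one-sided test at z sigma."""
    stddev = math.sqrt(p * (1.0 - p))   # per-prompt Bernoulli stddev, about .4 here
    return z * stddev / math.sqrt(n)    # z times the standard error of the mean

def two_sample_z(p1: float, p2: float, n: int) -> float:
    """z statistic if both scores come from independent samples of size n (assumption)."""
    pooled = (p1 + p2) / 2.0
    se = math.sqrt(pooled * (1.0 - pooled) * 2.0 / n)
    return (p1 - p2) / se

margin = one_sided_margin(0.808, 250)
z_independent = two_sample_z(0.848, 0.808, 250)
print(f"one-sided margin at n=250: {margin:.3f}")
print(f"observed gain: {0.848 - 0.808:.3f}")
print(f"z if the two runs were independent: {z_independent:.2f}")
```

Under the first reading the observed 4-point gain sits right at the margin. If the two runs were independent samples, the standard error of the difference is larger and z falls below 1.64. Which reading applies depends on how the evaluation was run.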

not2b · 2026-05-29T02:19:38 1780021178

Yes, it looks just barely significant. Results that are on the edge like that often aren't reproducible.

not2b · 2026-05-13T18:22:56 1778696576

Rather, several missing, useful APIs that were hard to emulate efficiently have been added. That's not "Windows in the Linux kernel".

MisterTea · 2026-05-14T03:44:03 1778730243

> several missing, useful APIs

Windows APIs.

> That's not "Windows in the Linux kernel".

How is that not?

not2b · 2026-05-12T00:55:10 1778547310

That would matter if we were asking the AI to generate code open-loop: someone probably already wrote something close to what you asked for in Python. But if the agent generates code, tries to compile it, sees the detailed error messages, and acts on those messages to refine the code, it's going to produce a higher quality result. rustc produces really good diagnostics. And there's a lot of Rust code online now, even if there's so much more Python and JavaScript/TypeScript.
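
A minimal sketch of that closed loop, in Python. `generate` and `check` are hypothetical stand-ins for a model call and a compiler run, and the toy checker and its message are made up for the demo; they are not rustc or its output.

```python
from typing import Callable, Optional, Tuple

def refine_until_clean(
    generate: Callable[[str], str],
    check: Callable[[str], Optional[str]],
    task: str,
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Closed-loop generation: feed checker diagnostics back into the next prompt.

    generate: hypothetical model call (prompt -> source code).
    check: hypothetical compiler run (source -> diagnostics, or None if clean).
    Returns the last candidate and the rounds used; the candidate may still
    be broken if max_rounds is exhausted.
    """
    prompt = task
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = generate(prompt)
        diagnostics = check(code)
        if diagnostics is None:
            return code, round_no
        # The diagnostics become part of the context for the next attempt.
        prompt = f"{task}\n\nPrevious attempt:\n{code}\n\nCompiler output:\n{diagnostics}"
    return code, max_rounds

def fake_generate(prompt: str) -> str:
    # Toy "model": only gets it right once it has seen the diagnostic.
    if "missing semicolon" in prompt:
        return "let x = 1;"
    return "let x = 1"

def fake_check(code: str) -> Optional[str]:
    # Toy "compiler": accepts anything that ends with a semicolon.
    return None if code.endswith(";") else "toy-error: missing semicolon"

final_code, rounds = refine_until_clean(fake_generate, fake_check, "bind x to 1")
print(final_code, rounds)
```

The open-loop case is the same function with `max_rounds=1`: whatever the first attempt produces is what you get.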

ambicapter · 2026-05-12T01:25:06 1778549106

LLMs don't actually semantically parse the error messages. They will generate the most likely sequence resulting from the error message based on their training data, so you're back to the training data argument.

not2b · 2026-05-12T05:22:06 1778563326

They process those error messages in the same way that they process your instructions about what code to generate. It is just more commands.

neutronicus · 2026-05-12T01:29:44 1778549384

Perhaps the training data about what compiler diagnostics mean is particularly semantically rich training data.

Tarq0n · 2026-05-12T04:03:36 1778558616

Of course they do, error messages get tokenized and put into the context window just like anything else. This isn't a Markov chain.

hansvm · 2026-05-12T05:41:20 1778564480

Except the presence of errors, mistakes, contradictions, and doubling-back causes LLMs to have substantially worse output, especially without dedicated sub-agents who have been instructed about that deficiency and know to process that kind of crap into better prompts to insert into a different LLM with pristine, error-free context. Without hard numbers we're both just pissing into the wind, but it's entirely plausible that the higher rate of errors matters more than the fact that those errors are more ergonomic. Anecdotally, my LLM work is a _lot_ more productive when I have it draft the thing in Python and translate it into Rust since it wastes so much time on the tiniest of syntactic mistakes.

not2b · 2026-05-05T02:37:31 1777948651

The strongest evidence that something like MOND isn't the answer is that in some galaxy collisions, the visible matter and the dark matter appear to separate: the collision disrupts the visible matter and the dark matter appears to pass right through, uninterrupted, and we see galaxy remnants that look like they don't have dark matter. If MOND or some other modification of gravity were the answer we'd never see this kind of sorting.

See, for example, https://www.caltech.edu/about/news/dark-matter-flies-ahead-o...

not2b · 2026-04-30T23:50:56 1777593056

If by some miracle someone managed to create this, and a critical mass of people somehow discovered it and used it, at some point they'd burn out, sell it, and it would turn into the same shit that we see everywhere else.

wizardforhire · 2026-04-30T23:56:02 1777593362

Not if you organize it as a non-profit with a stated purpose that explicitly addresses exactly that… and is run as a public service for the public good.

stephenhuey · 2026-05-01T00:14:44 1777594484

Might have better success with a Public Benefit Corporation instead of a nonprofit. I’ve considered starting some myself.

stack_framer · 2026-05-01T00:18:52 1777594732

Now do OpenAI...

not2b · 2026-04-30T17:13:00 1777569180

That is why egcs was launched, to get around the inability of the old team to do gcc releases. The issues had little to do with ideology and were about fixing a broken process and replacing it with something that had a hope of working.

not2b · 2026-04-14T18:52:45 1776192765

I looked at it and it is impressively lightweight. It would help if it could collapse duplicate notifications; right now the notifications page is filled with repeats even though I'm not all that popular on fedi.

not2b · 2026-04-14T05:07:37 1776143257

If the navigation simulates what would happen if we follow links to SPA#pos1, SPA#pos2, etc., so that if I do two clicks within the SPA and then hit Back three times I'm back to whatever link I followed to get to the SPA, I guess it's OK and follows user expectations. But if it is used as an excuse to trap the user in the SPA unless they kill the tab, not OK.

bonesss · 2026-04-14T05:20:12 1776144012

From the browser's perspective those are the same thing though. It’s a paradigm boundary.

The real answer is to have desktop applications that work like applications (buttons do what feels right), and websites that work like websites.

An SPA is a page application. Pages aren’t applications, applications aren’t pages. AutoCAD is an app, the Robotech Encyclopedia is content.

lxgr · 2026-04-14T11:52:03 1776167523

> From the browser's perspective those are the same thing though.

If the browser only allows adding at most one history item per click, I should be able to go back to where I entered a given site with at most that many back button clicks.

At first glance, this doesn't seem crazy hard to implement? I'm probably missing some edge cases, though.
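
As a rough illustration of the one-entry-per-click rule, here is a toy model in Python. It is not browser code: `GatedHistory` and its methods are made-up names, and it deliberately ignores the edge cases (redirects, iframes, and so on).

```python
class GatedHistory:
    """Toy session history that accepts at most one scripted entry per user click."""

    def __init__(self, entry_url: str) -> None:
        self.entries = [entry_url]  # the page the user arrived from
        self._budget = 0            # scripted pushes still allowed

    def navigate(self, url: str) -> None:
        """A real link navigation: always recorded."""
        self.entries.append(url)
        self._budget = 0

    def user_click(self) -> None:
        """A user gesture grants exactly one scripted push; grants do not stack."""
        self._budget = 1

    def push_state(self, url: str) -> bool:
        """Scripted push: recorded only if a click paid for it."""
        if self._budget == 0:
            return False
        self._budget -= 1
        self.entries.append(url)
        return True

    def back(self) -> str:
        """Drop the current entry (if any remain) and return where we land."""
        if len(self.entries) > 1:
            self.entries.pop()
        return self.entries[-1]

h = GatedHistory("referrer")
h.navigate("SPA")
spam = [h.push_state(f"SPA#trap{i}") for i in range(5)]  # no click yet: all rejected
h.user_click()
h.push_state("SPA#pos1")
h.push_state("SPA#pos1-extra")                           # second push on one click: rejected
h.user_click()
h.push_state("SPA#pos2")
landed = [h.back() for _ in range(3)][-1]
print(spam, landed)
```

Two clicks inside the SPA plus the entry navigation make three entries, so three presses of Back return to the referring page no matter how many pushes the script attempts.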

mock-possum · 2026-04-14T05:12:00 1776143520

Of course, but programmatically, how do you enforce that?

JoshTriplett · 2026-04-14T05:24:24 1776144264

Some browser APIs (such as playing video) are locked behind a user interaction. Do the same for the history API: make it so you can't add any items to history until the user clicks a link, and then you can only add one.

That's not perfect, and it could still be abused, but it might prevent the most common abuses.

EDIT: apparently Chrome tried that and it wasn't sufficient: https://qht.co/item?id=47761349

not2b · 2026-03-31T04:12:24 1774930344

Clearview again. ICE is using it too, and their people think it is an oracle that is always correct, so that when someone shows a passport card or a RealID showing that they are someone else, a US citizen or permanent resident, they are usually accused of having a fake ID. It's a flawed tool and it misidentifies people sometimes.