krisbolton · 2026-05-29T13:35:13 1780061713

It'll be a legal thing. You're reporting on behalf of yourself / a legal entity, so another system or entity can't say for you. I get it, but it really is a waste of time.

krisbolton · 2026-05-26T06:37:19 1779777439

I think Apple has become much better at security in recent years. One example which I think is indicative of their approach to security: they bothered to add a hardware microphone disconnect that triggers when a MacBook's lid is closed. Source: https://support.apple.com/en-gb/guide/security/secbbd20b00b/...

krisbolton · 2026-05-22T20:12:23 1779480743

This is different though, right? He found one (? we don't know who you're referring to - post sources for a higher quality discussion) vulnerability, he already knew it was there, etc. Anthropic didn't claim no other model can find vulnerabilities, nor that it's impossible with smaller models. They're claiming Mythos is a step-change in ability for end-to-end vulnerability discovery and exploit creation, and that other frontier models are close behind.

krisbolton · 2026-05-22T20:06:35 1779480395

There is independent research out there on frontier model security capability. AI Security Institute (UK) put out their paper comparing Mythos to other frontier models in early April. They've been tracking frontier model security capability since early 2023, so it's a decent dataset. https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos...

krisbolton · 2026-05-17T14:40:20 1779028820

I read it as the author is / was going through the vulnerability disclosure process with Microsoft and they're annoyed for unclear reasons and decided to publicly disclose, rather than being an insider.

mr_mitm · 2026-05-17T15:16:54 1779031014

How would that leave them homeless?

866-RON-0-FEZ · 2026-05-17T17:42:35 1779039755

Many brilliant people have serious mental health issues that preclude their ability to regulate their emotions and act maturely in serious situations e.g. responsible vulnerability disclosure.

I've watched genius-level IQ people get fired time and again because they don't know how to work with others at a basic kindergarten level.

wolvoleo · 2026-05-17T19:01:35 1779044495

To be honest if I got fired in a mean or unfair way I'd definitely hit back at my employer in such a manner if I'd have the ability to. I'm unlikely to have that though as I'm not aware of any saucy company secrets. But if this is what happened I think it's pretty justified.

The secret here seems to be that Microsoft caches the key somewhere even when it's supposed to be only in the TPM! That's a pretty big revelation IMO.

mananaysiempre · 2026-05-17T19:24:50 1779045890

> The secret here seems to be that Microsoft caches the key somewhere even when it's supposed to be only in the TPM!

Not what happened here (I reserve my judgment wrt the promised TPM+PIN exploit).

In the default TPM-only mode of BitLocker, the secret is in fact in the TPM, which will (as instructed by Windows upon key creation) release it to the correct OS running on the correct computer. Notably not in the picture is any user-provided data: measured boot is the only protection. It is only the correct programming of the OS that makes it request an account password (completely unrelated to the disk-encryption cryptography) before letting the user poke at the disk, which the OS can at that point already decrypt.

Well, turns out the programming is such that if you ask politely it’ll just pop an Administrator(?) shell.
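
To make the measured-boot point concrete, here is a toy Python sketch of the sealing idea. It is not the TPM API or BitLocker's actual implementation: `ToyTpm`, `extend`, `measure_boot`, `seal` and `unseal` are hypothetical names, the component strings are made up, and a single SHA-256 register stands in for the real PCR banks. It only illustrates that release of the key depends on what was measured at boot, not on anything the user types.

```python
import hashlib
from typing import Optional, Sequence


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """PCR-style extend: the new value hashes the old value plus the new measurement."""
    return hashlib.sha256(pcr + hashlib.sha256(measurement).digest()).digest()


def measure_boot(components: Sequence[bytes]) -> bytes:
    """Fold every boot component into one register, starting from all zeros."""
    pcr = bytes(32)
    for component in components:
        pcr = extend(pcr, component)
    return pcr


class ToyTpm:
    """Hypothetical stand-in for a TPM: holds one secret bound to one PCR value."""

    def __init__(self) -> None:
        self._sealed: Optional[tuple] = None

    def seal(self, secret: bytes, expected_pcr: bytes) -> None:
        self._sealed = (expected_pcr, secret)

    def unseal(self, current_pcr: bytes) -> Optional[bytes]:
        # Release depends only on the measured boot state; no user input is involved.
        if self._sealed is None:
            return None
        expected_pcr, secret = self._sealed
        return secret if current_pcr == expected_pcr else None


good_chain = [b"firmware v1", b"bootloader v1", b"os kernel"]
tpm = ToyTpm()
tpm.seal(b"volume-key", measure_boot(good_chain))

# Same boot chain: the key comes out. Changed bootloader: it does not.
same_boot = tpm.unseal(measure_boot(good_chain))
tampered_boot = tpm.unseal(measure_boot([b"firmware v1", b"other bootloader", b"os kernel"]))
```

No password or PIN appears anywhere in this model: once the boot chain measures correctly the key is released, and whatever the OS does afterwards (login screen, recovery environment) is ordinary software policy rather than cryptography.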

wolvoleo · 2026-05-17T20:37:09 1779050229

> Not what happened here (I reserve my judgment wrt the promised TPM+PIN exploit).

Yes this is the one I'm referring to.

I have noticed it myself, it has happened to me that my system rebooted to install updates and it did not pass through the blue TPM pin entry screen at that point. That was a big red flag for me. A normal reboot always does that, even a 'hot' reboot.

anonymars · 2026-05-18T02:47:21 1779072441

BitLocker can be suspended, and will be unprotected until the next reboot. Then it will resume (and presumably re-lock to the current state).

A good or corporate BIOS/etc. updater will do this to avoid requiring a recovery at the next boot.

ranger_danger · 2026-05-18T04:49:34 1779079774

> BitLocker can be suspended

But the files on the disk must still be decrypted somehow. The key must be stored somewhere.

According to this: https://windowsforum.com/threads/pause-bitlocker-before-bios...

> BitLocker is now suspended, which means the drive remains encrypted, but Windows temporarily stores the unlock information so firmware changes won’t immediately trigger recovery.

mananaysiempre · 2026-05-17T20:56:37 1779051397

> A normal reboot always [forces the TPM pin entry screen], even a 'hot' reboot.

In TPM-only mode, I only see the screen—which asks for a recovery key that serves as an alternative to the TPM-borne secret, not for whatever you are calling the “TPM PIN” here—whenever I update the firmware or the bootloader (the latter from the other side of the dual-boot setup). Otherwise it boots straight to the login screen, which meshes with the measured-boot-only theory of operation I’ve described above. There’s nothing nefarious in this part, even if I think it exposes an unwisely large attack surface (e.g. the USB stack). I suspect you simply reboot so rarely you’re never hitting the happy path.

wolvoleo · 2026-05-18T04:39:47 1779079187

No, I have the explicit PIN turned on. That means it requires a PIN entry on each boot. It's not the recovery screen, though it looks similar. It's also not a password that's then hashed. It unlocks the TPM with a short PIN; the number of attempts is limited by the TPM itself so that it can't be brute-forced.

This is not a standard option, I think it can only be set through a group policy.
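
A toy sketch of why a short PIN can still hold up when the chip itself counts failures. This is only an illustration in Python, not how a real TPM implements its lockout: `ToyPinGate`, `LockedOut`, the PIN value and the limit of 5 tries are all made-up assumptions.

```python
import hmac
from typing import Optional


class LockedOut(Exception):
    """Raised once the failure counter has reached its limit."""


class ToyPinGate:
    """Hypothetical PIN gate that keeps its own failure counter."""

    def __init__(self, pin: str, secret: bytes, max_tries: int = 5) -> None:
        self._pin = pin.encode()
        self._secret = secret
        self.max_tries = max_tries
        self.failures = 0

    def unlock(self, attempt: str) -> Optional[bytes]:
        # The counter is checked before the PIN, so a locked gate reveals nothing.
        if self.failures >= self.max_tries:
            raise LockedOut("too many wrong PINs")
        if hmac.compare_digest(attempt.encode(), self._pin):
            return self._secret
        self.failures += 1
        return None


# Naive brute force over all 4-digit PINs, starting at 0000.
gate = ToyPinGate(pin="4821", secret=b"volume-key")
locked = False
for guess in range(10_000):
    try:
        if gate.unlock(f"{guess:04d}") is not None:
            break
    except LockedOut:
        locked = True
        break
```

The point of the design is that the counter lives with the secret rather than in the OS, so an attacker cannot simply copy the disk and guess offline.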

jrflowers · 2026-05-17T22:32:24 1779057144

> To be honest if I got fired in a mean or unfair way I'd definitely hit back at my employer in such a manner if I'd have the ability to.

I knew a contractor that developed a habit of not paying his workers for a short time. After people started walking off job sites with his tools and showing up at his house demanding to get paid, he magically found the money to pay them.

It’s pretty unsurprising how vindictive regular people rapidly become when they’ve been ripped off.

txrx0000 · 2026-05-17T20:04:51 1779048291

Reporting wrongdoing to the ones doing it doesn't work. Perhaps they relied on Microsoft a bit too much for their livelihood and are just beginning to reevaluate their decisions. It's not so rare for brilliant people to live a life of the mind and not pay enough attention to their material conditions. But defining that as "serious mental health issues" is such a cheap shot.

866-RON-0-FEZ · 2026-05-17T20:09:51 1779048591

> Reporting wrongdoing to the ones doing it doesn't work.

Most large companies — including Microsoft [1] — have an internal affairs call center where you can anonymously report issues of malfeasance — assuming that's what happened here.

[1] https://www.microsoft.com/en-us/legal/compliance/sbc/report-...

gusfoo · 2026-05-17T18:07:45 1779041265

There is, sadly, no place for non-standard ICs in corpos nowadays. HR will enforce that.

david-gpu · 2026-05-17T18:14:13 1779041653

Emotionally immature people tend to be a liability, not an asset. Therapy can help, but they first need a willingness to do better.

wolvoleo · 2026-05-17T19:14:42 1779045282

Yeah I'm getting a lot of pressure to be a "team player" lately. I've told them over and over I'm not capable of that and that has never been a problem before. But we have a hipster new VP who is really pushy and wants to generalise everything.

stackghost · 2026-05-17T19:35:51 1779046551

If you worked for me and you said you're not capable of being part of a team I'd immediately start looking to replace you.

You might be a 100x rockstar developer. You might even be the best software engineer in the world.

But the vast majority of good software is built by teams of people. It doesn't matter how good you are if you can't play nice with others.

I'd rather have a team of "merely" good engineers than one "rockstar" creating a toxic work culture. Fuck that noise.

bayindirh · 2026-05-17T20:09:48 1779048588

"Not being a team player" doesn't mean the person is a nuisance, but they can be an introvert who has a limited interaction budget and can work silently and efficiently otherwise.

This generally means the person might not leave their cubicle much or give feedback frequently enough, but this doesn't mean they are not motivated to help others or share knowledge. One can approach and ask a question and get tons of help immediately.

How do I know? That's me. I look like a cave dweller from a distance, but I'm not. The only difference is that human interaction sometimes drains me a lot, so I just concentrate and work, yet everybody gets help immediately if they need it.

Also, no, I don't bite or belittle people. On the contrary.

Assuming the worst in others is bad. If I worked with you, I'd be looking for somewhere else the moment I found out how you think about me.

Remember. People don't leave bad jobs, but bad managers.

stackghost · 2026-05-18T04:51:53 1779079913

I have worked with lots of introverts and my empirical observation is that the introversion/extraversion axis is completely orthogonal to whether or not someone can be a team player.

array_key_first · 2026-05-17T20:13:29 1779048809

You require both team players and "rockstar" individuals. It's not one or the other or a competition, because they do different things.

Yes, if you put someone who can't work on a team on a team and expect teamwork, then that will not work. But that's obvious, so then don't do that. Expecting a homogeneous workforce isn't realistic or optimal.

stackghost · 2026-05-18T03:06:51 1779073611

>You require both team players and "rockstar" individuals.

Hard disagree.

Some of the best, most successful, and most impactful projects I've ever been on had no "rockstars" at all.

People referring to themselves as rockstars is a huge red flag.

wolvoleo · 2026-05-17T20:28:24 1779049704

I'm not a software engineer at all. And I tend to take on projects nobody else wants because they are too complicated or esoteric.

And I didn't say I'm not capable of being part of a team. Just that I need to have my own responsibilities within a team. I can't deal with micromanagement or excessive coordination like 'standups' every day.

account42 · 2026-05-20T10:09:05 1779271745

In other words, you want replaceable cogs rather than human beings.

gremlinunderway · 2026-05-17T20:18:33 1779049113

Yeah you've completely misread this. The phrase "not being a team player" is a euphemism for someone not willing to do ethically dubious or illegal things (or things that go against internal company policy) in support of a low-level supervisor's or manager's wishes. Or more favourably, someone who's unwilling to do things outside of what he's actually paid for, or to do things unpaid (or outside working hours etc.). Also known as wage theft.

The guy saying that he has been accused of "not being a team player" isn't literally quoting his management here. He's summarizing that his immediate supervisors don't like him because he's unwilling to enter in some patronage like relationship with them.

The fact that you gave the benefit of the doubt to some faceless employer here instead of an actual person recounting his experiences is really sad, and maybe ought to be reason for you to rethink the bias that made you jump to the conclusion that this guy is a toxic loner. Sounds like you're projecting hard here from some other experience.

wolvoleo · 2026-05-17T20:31:56 1779049916

That is also a thing yeah. It's not really unethical or illegal but our VP has a huge preference for snazzy glitzy projects and never wants to tackle the problems that cause real pain in the organisation because they are not spectacular and don't make him look good. And yes I bring that up whenever it comes into play. I'm definitely not an order-follower.

coderjames · 2026-05-17T19:56:21 1779047781

> I've told them over and over I'm not capable of that

I can relate and empathize. And also provide this suggestion based on my own similar experience: if you can't provide evidence (e.g. doctor's diagnosis) that you are "special" or "not capable of that", then they don't have to care and will take steps to force you out. I wish you all the best.

wolvoleo · 2026-05-17T20:46:52 1779050812

Here in Europe it's different, we have more rights. Unfortunately I don't have an official diagnosis but I'm definitely neurodivergent. I've been meaning to get one but it is difficult.

abawany · 2026-05-17T20:34:48 1779050088

I was once (12 years ago) told: "they debate, they decide, we deliver" along with other "teamwork" pablum. This evil has been with us for a very long time, unfortunately.

gpvos · 2026-05-17T19:55:05 1779047705

IC = Independent contractor (I assume?)

fg137 · 2026-05-17T20:09:14 1779048554

https://www.indeed.com/career-advice/finding-a-job/what-is-a...

WaitWaitWha · 2026-05-17T20:10:42 1779048642

individual contributor. Someone who has no one reporting to them.

greekrich92 · 2026-05-17T20:04:47 1779048287

Individual contributor i.e., non-management

hatsix · 2026-05-17T19:31:13 1779046273

Nonsense. There are way more accommodations for people who wouldn't have had a place 20 years ago... those accommodations have changed what a "standard IC" is. There never was a place for run-of-the-mill geniuses who couldn't be bothered to spend a few hours researching P2P (Person to Person) protocols. They were always pushed off to small companies where the risk was much lower. This hasn't, won't, and shouldn't change. If that makes you salty, I got some things I'd recommend you research.

jrflowers · 2026-05-17T19:19:44 1779045584

Adults pay rent in money, not feelings. The answer to “how could Microsoft leave you homeless?” is “by not paying you”, not some bizarre “by making you feel so bad you lose your house, which you pay for with good feelings”

BoorishBears · 2026-05-17T18:14:34 1779041674

This is an oddly passive-aggressive comment when a much more likely read is they were relying on the funding and the large tech company did what large tech companies do and started moving slowly.

And I can see others already blaming them for relying on the vulnerability for living expenses, but if we can hold the hyper-rationalization for a second, we shouldn't be against the person who expected an organization with more money than God to uphold a deal for relative peanuts, right?

Like yes we all get that large orgs make spending $5 very hard, many claps for being the in-group, but their frustration would be understandable.

866-RON-0-FEZ · 2026-05-17T18:22:00 1779042120

I'm supposed to feel bad that Microsoft didn't immediately wire him an advance on the bounty before validating anything? Have you ever tried to get anything corrected with a corporate payroll department? Try three months minimum.

It's like suggesting someone was relying on a lottery ticket to payout to survive.

BoorishBears · 2026-05-17T18:33:53 1779042833

I tried to be as coddling with my language as possible.

Acknowledged how orgs work, separated blaming the org from sympathizing with their reaction, tried to separate the prudence of their actions from the sticky situation they'd still be left in by the org's actions...

But it was for naught: people are really ingrained in a weird "might-makes-right" model of corporate operations. "Larry Ellison is a lawnmower" was supposed to be a jeremiad but now it's more like a guiding principle that we browbeat anyone for questioning.

array_key_first · 2026-05-17T20:15:08 1779048908

Yes and that's bad. Saying it's bad doesn't make it not-bad, it just makes it still bad but now we know it's bad.

antonvs · 2026-05-17T19:49:49 1779047389

> we shouldn't be against the person who expected an organization with more money than God to uphold a deal for relative peanuts, right?

You're assuming that there was a deal that wasn't upheld. I don't think we have enough information to assess that. This person's blog posts do read as being somewhat unstable. There's even someone in the comments seemingly genuinely trying to be helpful: "Just wondering if you’re BiPolar (like me) and see a different reality than what is real. Been there."

allset_ · 2026-05-17T15:30:57 1779031857

Presumably, not paying out for these bugs which often take weeks of research to find.

mr_mitm · 2026-05-17T15:43:11 1779032591

Who in their right mind bets on bug bounties to cover their basic needs? They should be highly employable with these kinds of skills.

michaelt · 2026-05-17T16:11:45 1779034305

> Who in their right mind bets on bug bounties to cover their basic needs?

Someone with a vulnerability worth as much as a two bedroom apartment?

brudgers · 2026-05-17T17:35:00 1779039300

If you take the statement at face value, that does not appear to be the case. If you don’t take it at face value, the underlying presumptions might be a lot of why they may not be employable.

etchalon · 2026-05-17T16:26:55 1779035215

Someone who doesn't have better options?

cortesoft · 2026-05-17T16:35:27 1779035727

If you have those sorts of skills with a computer, you will have other options

0x3f · 2026-05-17T16:42:51 1779036171

Really depends on your background doesn't it? You could have convictions, be sanctioned, have visa problems, or all kinds of things that are not easily solvable.

qingcharles · 2026-05-17T17:14:05 1779038045

Indeed, and this guy's personality seems a little "difficult" which might make the interview process short. I've known people with insane skills who have such weird personalities that they never get hired. Doing remote bug bounty stuff is a blessing for them.

squigz · 2026-05-17T17:08:31 1779037711

To say nothing of mental health issues.

brudgers · 2026-05-17T17:36:27 1779039387

Or poverty. Or addiction.

Or that entire holy trinity.

mfro · 2026-05-17T16:47:29 1779036449

Please let me know when finding a job in software engineering in 2026 is feasible for everyone with ‘computer skills’.

echoangle · 2026-05-17T16:56:54 1779037014

The guy doesn’t just have „computer skills“ if he found this.

formerly_proven · 2026-05-17T17:43:41 1779039821

Good luck convincing a HR automaton not looking at your resume for the job unposting of that.

echoangle · 2026-05-17T17:54:53 1779040493

Come on, with these skills you could convince someone to give you a job if you’re on the streets otherwise. You might not be a senior engineer in the exact thing you want but you won’t be on the streets.

pocksuppet · 2026-05-17T19:50:33 1779047433

It's not about your skills. It's about how well you can play the HR metagame. This inversely correlates with actual job skills.

gpvos · 2026-05-17T20:00:53 1779048053

Convincing someone, especially an HR person, has very little to do with computer skills.

GolfPopper · 2026-05-17T17:15:07 1779038107

Good with computers and good with people/job search/finances are not the same thing, and are often inversely correlated.

866-RON-0-FEZ · 2026-05-17T18:18:17 1779041897

King Terry was living proof this is not true.

super256 · 2026-05-17T19:42:12 1779046932

Oh hell, no. Does anyone remember Sandboxescaper/Polarbear? Very skilled researcher, but also crashouts and mental problems.

Had a job at MSFT once, but is now struggling to earn money at all and is posting heartbreaking stuff on Twitter. https://x.com/WeirdQuadratic

Hope she finds a way out and a more stable and fun job in the future.

MrDarcy · 2026-05-17T17:07:51 1779037671

Then you pay him since you see the value he’s creating so clearly.

cortesoft · 2026-05-17T18:11:40 1779041500

This is a strange argument. I don't have the capital, desire, or skills to employ this guy, or anyone really.

Me not hiring someone doesn't mean the skills aren't valuable.

estimator7292 · 2026-05-17T17:00:29 1779037229

We are, quite notably, in a huge hiring crisis where vast numbers of programmers and researchers can't even get interviews. It really is not that simple

cowpig · 2026-05-17T16:05:01 1779033901

people with values different from yours, presumably

dpark · 2026-05-17T16:13:22 1779034402

This is one of those answers that seems on the surface like it contains insight but on closer inspection is vacuous.

This could be rewritten as “because they aren’t you”, which is true but not a meaningful or educational answer.

panflute · 2026-05-17T16:41:17 1779036077

Sure sounds like rhetorical questions or attacking the messenger. Someone can think the bounty industry is going to reward them for actually being exceptional and not look soon enough for other options then pivot to a stance that should give them some quick job offers. If I thought I found an intentional back door I would not engage with an embargo system from the same vendor but I am also not them.

dpark · 2026-05-17T16:58:31 1779037111

> Someone can think the bounty industry is going to reward them for actually being exceptional and not look soon enough for other options then pivot to a stance that should give them some quick job offers

Sure. And that’s a meaningful answer to the question.

“people with values different from yours, presumably” is a condescending nonanswer.

breppp · 2026-05-17T17:43:49 1779039829

This entire thread is generally weird.

If someone has this kind of exploit and can't get a bug bounty for it, and desperately needs the money, he can sell it for 100k+ in a shady black market

LastTrain · 2026-05-17T19:28:52 1779046132

It was about as meaningful as the question it was answering.

zingababba · 2026-05-17T19:27:47 1779046067

https://github.com/BigPolarBear1/The_story

I've been pretty convinced this is SandboxEscaper for a while now.

krisbolton · 2026-05-16T13:47:40 1778939260

I don't think they "forgot" about processors. It was out of scope. Creating the pipeline to end up with a fully "sovereign" system end-to-end is a decades long process and hundreds of billions of euros. As others have pointed out, in this context "sovereign" meant data processing. This is also a fairly paranoid take. Not to say hardware isn't targeted, but there are other methods. So spending hundreds of billions and several decades to build the fabs to gain assurance... it's a waste of time.

Fnoord · 2026-05-16T13:59:55 1778939995

Exactly, it was out of scope. You cannot in one go have a full-blown scope. It won't work.

This is a sensationalist headline (CBA to RTFA). It isn't a case of all or nothing, it is about becoming less dependent. A country like China follows the same industry, and besides, in a globalist economy like ours we are dependent on each other. So, for example, a lot of hardware components come from China, and assembly happens there as well. That counts for EU (DE, FR, ...), US, CA, RU, UA, CN, IN, etc. But as the talk at 39C3 has shown [1]: we can DIY.

[1] https://media.ccc.de/v/39c3-in-house-electronics-manufacturi...

hinata08 · 2026-05-16T14:06:25 1778940385

France's Scaleway already offers RISC-V bare metal servers. It's a first step that brings most of the value with close to no cost, as RISC-V is cheap nowadays

This cloud provider is a for profit company, not a research institute, so they can see short term commercial value if they do it

krisbolton · 2026-05-15T21:51:15 1778881875

Did that risk materialise? I suppose it would be only the same as credit cards. With a valid warrant authorities can gain access to information. But that's within a legal system designed by an elected parliament. I'm more concerned about ensuring the legal powers are checked and balanced, and stay that way.

like_any_other · 2026-05-15T22:05:34 1778882734

Warrants aren't all you think they are (this is for the USA, but the UK is not exactly a beacon of liberty in comparison, so I doubt it's much better): https://web.archive.org/web/20140718122350/https://www.popeh...

> But that's within a legal system designed by an elected parliament.

Ah well if it's an elected government then the risk of it turning hostile to its people is zero, of course!

And ask "did that risk materialize?" to the people in China, or North Korea, or Russia, or Belarus, or Germany [1], or USA [2]. There are countless examples of the dangers of surveillance, in the present and in history - you don't need a specific example of exactly Oyster cards being used, to know they are a danger.

[1] https://www.theguardian.com/commentisfree/2025/apr/03/german...

[2] https://www.timesofisrael.com/us-administration-argues-it-ca...

jolmg · 2026-05-15T23:01:07 1778886067

> I suppose it would be only the same as credit cards.

The cards seem to accept cash

chadgpt3 · 2026-05-16T06:12:10 1778911930

He means the tracking potential is the same. Is the Oyster card anonymous?

tkocmathla · 2026-05-16T06:47:00 1778914020

Yes, they can be anonymous in the sense that you can buy one in person and top it up without an ID [1].

[1] https://tfl.gov.uk/fares/ways-to-pay/where-to-buy-tickets-an...

jolmg · 2026-05-17T22:34:16 1779057256

> He means the tracking potential is the same.

If you can buy and use with cash, then it isn't.

krisbolton · 2026-04-20T19:42:39 1776714159

They're referring to Pete Hegseth's decision to designate Anthropic a supply chain risk back in early March.

https://www.politico.com/news/2026/03/05/pentagon-tells-anth...

krisbolton · 2026-03-14T12:16:50 1773490610

The framing of A/B testing as a "silent experimentation on users" and invoking Meta is a little much. I don't believe A/B testing is an inherent evil, you need to get the test design right, and that would be better framing for the post imo. That being said, vastly reducing an LLM's effectiveness as part of an A/B test isn't acceptable, which appears to be the case here.

SlinkyOnStairs · 2026-03-14T12:35:31 1773491731

> I don't believe A/B testing is an inherent evil, you need to get the test design right, and that would be better framing for the post imo.

I disagree in the case of LLMs.

AI already has a massive problem in reproducibility and reliability, and AI firms gleefully kick this problem down to the users. "Never trust its output".

It's already enough of a pain in the ass to constrain these systems without the companies silently changing things around.

And this also pretty much ruins any attempt to research Claude Code's long term effectiveness in an organisation. Any negative result can now be thrown straight into the trash because of the chance Anthropic put you on the wrong side of an A/B test.

> That being said, vastly reducing an LLM's effectiveness as part of an A/B test isn't acceptable, which appears to be the case here.

The open question here is whether or not they were doing similar things to their other products. Claude Code shitting out a bad function is annoying but should be caught in review.

People use LLMs for things like hiring. An undeclared A/B test there would be ethically horrendous and a legal nightmare for the client.

sfn42 · 2026-03-14T17:20:42 1773508842

Anyone who trusts LLMs to do anything has shit coming. You can not trust them. If you do, that's on you. I don't care if you want to trust it to manage hiring, you can't. If you do anyway then the ethical problems are squarely on you.

People keep complaining about LLMs taking jobs, meanwhile others complain that they can't take their jobs and here I am just using them as a useful tool more powerful than a simple search engine and it's great. No chance it'll replace me, but it sure helps me do my job better and faster.

DoctorOetker · 2026-03-14T22:49:11 1773528551

Would you have a problem with the following scheme?

Every client is free and encouraged to feed back its financial health: profit for that hour/day/month/...

The AB(-X) test run by the LLM provider uses the correlation of a client's profit with its AB(-X) test, so that participating with the testing improves your profit statistically speaking (sometimes up sometimes down, but on average up).

You may say, what about that hiring decision? One thing is certain: when companies make more profit they are more likely to seek and accept more employees.

986aignan · 2026-03-15T12:49:13 1773578953

That sounds like a good way to get extreme short-term optimization.

Say a particular finetune prioritizes profits right now and makes recommendations like "cut down on maintenance, you can make up for it later with your increased profits and their interest". It produces more profits, and wins the AB test. Later the chickens come home to roost.

You can reduce the problem by using long-term indicators, but then each AB test is very slow.

londons_explore · 2026-03-14T13:13:37 1773494017

I think you would be hard pushed to find any big tech company which doesn't do some kind of A/B testing. It's pretty much required if you want to build a great product.

wavefunction · 2026-03-14T14:36:49 1773499009

A responsible company develops an informed user group they can test new changes with and receive direct feedback they can take action on.

londons_explore · 2026-03-14T16:32:18 1773505938

A big tech company has ~10k experiments running at once. Some engineers will be kicking off a few experiments every day. Some will be minor things like font sizes or wording of buttons, whilst others will be entirely new features or changes in rules.

Focus groups have their place, but cannot collect nearly the same scale of information.

rkomorn · 2026-03-14T16:39:54 1773506394

I think a lot of people (myself included) would just like to not be constantly part of some sort of revenue optimization effort.

I don't care, at all, about the "scale of information" for the company's sake.

londons_explore · 2026-03-14T20:38:32 1773520712

Often the experiments are not for revenue - many of them will be optimizing user experience metrics - i.e. load time or user dropoff rate.

They are clearly good for both user satisfaction and the company's bottom line.

jjj123 · 2026-03-14T21:35:56 1773524156

As someone who works in these orgs, only a small fraction are about user experience metrics. 90+% are extracting more short term value with unknown second order effects on usability.

wavefunction · 2026-03-16T15:14:14 1773674054

Big tech companies are not serving their "users" but advertisers, it's a common mistake.

franktankbank · 2026-03-14T19:48:07 1773517687

If you have 10k experiments running then you are probably p-hacking.
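
A back-of-the-envelope sketch of the arithmetic behind that claim, in Python. The assumptions are mine, not the parent's: every experiment is an independent test at a significance level of 0.05, and none of the changes has any real effect. The function names are made up for illustration.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of 'significant' results when no change has a real effect."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests: int, alpha: float) -> float:
    """Chance that at least one of n independent null tests comes out significant."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests: int, alpha: float) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


n_experiments = 10_000
alpha = 0.05

false_positives = expected_false_positives(n_experiments, alpha)
any_false_positive = prob_at_least_one_false_positive(n_experiments, alpha)
corrected_alpha = bonferroni_alpha(n_experiments, alpha)
```

Under those assumptions you would expect about 500 spurious "wins" out of 10k experiments. Running that many tests is not p-hacking by itself; it becomes p-hacking when the wins are picked out afterwards without any correction for how many tests were run.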

embedding-shape · 2026-03-14T13:33:12 1773495192

Yeah, that's why we didn't have anything anyone could possibly consider as a "great product" until A/B testing existed as a methodology.

Or, you could, you know, try to understand your users without experimenting on them, like countless others have managed to do before, and still shipped "great products".

ryandrake · 2026-03-14T22:17:35 1773526655

I know this is a salty take, but reliance on A/B testing to design products is indicative of product deciders who don't know what they are doing and don't know what their product should be. It's like a chef saying, I want to make a pancake, but trying 50 different combinations of ingredients until one of them ends up being a pancake. If you have to test whether a product works / is good / is profitable, then you didn't know what you were doing in the first place.

Using A/B tests to safely deploy and test bug fixes and change requests? Totally different story.

coldtea · 2026-03-14T13:49:23 1773496163

A/B testing is the child of profit maximization, engagement farming, and enshittification. Not of "great product building".

steve-atx-7600 · 2026-03-14T13:17:57 1773494277

Long term effectiveness? LLMs are such a fast moving target. Suppose Anthropic reached out to you and gave you a model id you could pin down for the next year to freeze any a/b tests. Would you really want that? Next month a new model could be released to everyone else - or by a competitor - that’s a big step difference in performance in tasks you care about. You’d rather be on your own path learning about a state of the world that doesn’t exist anymore? Nov-ish 2025 and after, for example, seemed like software engineering changed forever because of improvements in Opus.

coldtea · 2026-03-14T13:47:29 1773496049

>Suppose anthropic reached out to you and gave you a model id you could pin down for the next year to freeze any a/b tests. Would you really want that?

Where can I sign up?

steve-atx-7600 · 2026-03-14T13:25:29 1773494729

If you really want to keep non-determinism down, you could try to (1) see if you can fix the installed version of the Claude Code client app (I haven’t looked into the details of preventing auto-updating... because bleeding edge person) and (2) pin to a specific model version, which you’d think would have to reduce a/b test exposure to some extent https://support.claude.com/en/articles/11940350-claude-code-...

Edit: how to disable auto updates of the client app https://code.claude.com/docs/en/setup#disable-auto-updates

maleldil · 2026-03-15T03:26:45 1773545205

> Suppose anthropic reached out to you and gave you a model id you could pin down for the next year to freeze any a/b tests. Would you really want that?

Yes. I'd like some guarantee that my results are reproducible for some reasonable amount of time. New versions can also introduce regressions. A prompt that works well with today's model might not work with tomorrow's, even if the latter is "better".

garciasn · 2026-03-14T12:50:23 1773492623

> And this also pretty much ruins any attempt to research Claude Code's long term effectiveness in an organisation. Any negative result can now be thrown straight into the trash because of the chance Anthropic put you on the wrong side of an A/B test.

LLMs are non-deterministic anyway, as you note above with your comment on the 'reproducibility' issue. So any sort of research into CC's long-term effectiveness would already have taken into account that you can run it 15x in a row and get a different response every time.

johnisgood · 2026-03-14T12:55:26 1773492926

Then do not use LLMs for hiring, or use a specific LLM, or self-host your own!

airza · 2026-03-14T13:27:29 1773494849

Isn’t the horrendous ethical and legal decision delegating your hiring process to a black box?

vova_hn2 · 2026-03-14T13:37:56 1773495476

> ethical and legal decision

These are two very different things. I suspect that in some cases pointing a finger at a black box instead of explaining your decisions can actually shield you from legal liability...

paulryanrogers · 2026-03-14T13:41:45 1773495705

For some proponents, AI is liability washing

raw_anon_1111 · 2026-03-14T13:00:03 1773493203

Would you rather they change things for everyone at once without testing?

aeinbu · 2026-03-14T13:40:37 1773495637

That is not the only other alternative.

You can do A/B testing by splitting up your audience into groups, having some of the audience use A and others use B - all the time.

I think the article’s author is frustrated over sometimes getting A and at other times B, and not knowing when he is on either.
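A minimal sketch of that kind of "sticky" assignment, assuming each user has a stable ID and each experiment has a name (`assign_bucket` and the example identifiers are hypothetical, not anything Anthropic is known to use):

```python
import hashlib


def assign_bucket(user_id: str, experiment: str, buckets=("A", "B")) -> str:
    """Deterministically map a user to one bucket for a given experiment.

    Hashing the experiment name together with the user ID means the same
    user always lands in the same bucket for that experiment, while
    different experiments split the audience independently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(buckets)
    return buckets[index]


if __name__ == "__main__":
    # The same user gets the same answer on every call.
    print(assign_bucket("user-123", "plan-mode-prompt"))
    print(assign_bucket("user-123", "plan-mode-prompt"))
```

Because nothing here depends on time or session state, a user stays on either A or B for the life of the experiment instead of flipping between them.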

simianwords · 2026-03-14T14:01:25 1773496885

Strange! You benefitted from all the previous a/b experiments to give you a somewhat optimal model now. But now it’s too inconvenient for you?

plussed_reader · 2026-03-14T14:13:33 1773497613

Informed consent for a paying user is inconvenient?

doc_ick · 2026-03-14T14:49:15 1773499755

Did you read the TOS?

mrgoldenbrown · 2026-03-14T19:29:04 1773516544

Hiding something like this in the TOS rather than explicitly asking users to opt in is a dark pattern. You can't gain the moral high ground by cackling that someone should have read the fine print.

doc_ick · 2026-03-15T13:30:59 1773581459

This is technology, some politics, capitalism, and math that is trained on curiously gained data. Where does non-selfish morality come in?

MadnessASAP · 2026-03-14T20:43:29 1773521009

All TOS essentially boil down to "we owe you nothing and can change the product at any time to anything we want at our sole discretion".

Obviously it would be unreasonable to accept such terms without further context. The further context in this case being that Anthropic will maintain Claude as an AI agent and seek to improve its performance. What is at the heart of this issue is whether or not Anthropic's recent A/B testing violated that context. Not whether or not they violated the TOS (they didn't, obviously).

doc_ick · 2026-03-15T13:29:05 1773581345

Ultimately that just sounds like, within their own TOS, they were just working on getting the best operational results.

If you want something more deterministic, write it yourself or get it verified; all hosted LLMs, as far as I know, do neither.

plussed_reader · 2026-03-17T03:13:37 1773717217

I read the article as saying they were testing service changes on paying users without the knowledge or explicit consent of the user, and that the user had to test and determine why they perceived their service had changed.

That is a dark pattern to inflict on users expecting consistent output.

jdbernard · 2026-03-14T14:56:29 1773500189

Does anyone?

xg15 · 2026-03-14T19:57:26 1773518246

We actually didn't, if much of that A/B testing was to find the optimally "engagement maximizing" i.e. maximally addictive UI design.

ramoz · 2026-03-14T12:18:58 1773490738

I apologize for doing this - and I agree. I will revise

s3p · 2026-03-14T14:38:38 1773499118

I still think you have a point here. Doing this kind of testing on unwitting users is unethical in my opinion.

everdrive · 2026-03-14T13:00:34 1773493234

>I don't believe A/B testing is an inherent evil,

Evil might be a stretch, but I really hate A/B testing. Some feature or UI component you relied on is now different, with no warning, and you ask a coworker about it, and they have no idea what you're talking about.

Usually, the change is for the worse, but gets implemented anyway. I'm sure the teams responsible have "objective" "data" which "proves" it's the right direction, but the reality of it is often the opposite.

cosmic_cheese · 2026-03-14T13:54:46 1773496486

> I'm sure the teams responsible have "objective" "data" which "proves" it's the right direction, but the reality of it is often the opposite.

In my experience all manner of analytics data frequently gets misused to support whatever narrative the product manager wants it to support.

With enough massaging you can make “objective” numbers say anything, especially if you do underhanded things like bury a previously popular feature three modals deep or put it behind a flag. “Oh would you look at that, nobody uses this feature any more! Must be safe to remove it.”

hollow-moe · 2026-03-14T13:28:49 1773494929

Tech companies really have issues with "informed and conscious consent", don't they?

mschuster91 · 2026-03-14T13:05:25 1773493525

> The framing of A/B testing as a "silent experimentation on users" and invoking Meta is a little much.

No. Users aren't free test guinea pigs. A/B testing cannot be done ethically unless you actively point out to users that they are being A/B tested and offer them a way to opt out, but that in turn ruins a large part of the promise behind A/B tests.

bcrl · 2026-03-14T17:35:53 1773509753

Please name a computer science program that has an ethics component.

Yes, I wish software developers were more like actual engineers in this regard.

gnabgib · 2026-03-14T17:37:08 1773509828

All Computer Engineering & Systems Engineering programs in Canada require two ethics components (once at graduation, once at P.Eng)

ryandrake · 2026-03-14T22:14:35 1773526475

Sadly, in the USA, I believe most engineering ethics classes are optional electives, and it shows when you look at the graduating student body today.

saltcured · 2026-03-14T17:41:38 1773510098

Yeah, and if you don't already have an IRB, your organization probably isn't ready to be doing such things responsibly...

mschuster91 · 2026-03-15T00:58:23 1773536303

Meta has had an IRB for well over a decade (following a scandal where they used their users as lab rats) and that didn't stop them from doing any of the BS they have done ever since.

tomalbrc · 2026-03-14T12:23:37 1773491017

Would love to know why you would consider invoking Meta “a little much”. Sounds more than appropriate.

krisbolton · 2026-03-14T12:59:52 1773493192

Not to start an internet argument -- I don't think it is appropriate in this context. A/B testing the features of a web app is not unexpected or unethical. So invoking the memory of Cambridge Analytica (etc) is disproportionate. It's far more legitimate to just discuss how much A/B testing should negatively affect a user. I don't have an answer and it's an interesting and relevant question.

mschuster91 · 2026-03-14T13:09:03 1773493743

> A/B testing the features of a web app is not unexpected or unethical.

It's not "unexpected" but it is still unethical. In ye olde days, you had something like "release notes" with software, and you could inform yourself what changed instead of having to question your memory "didn't there exist a button just yesterday?" all the time. Or you could simply refuse to install the update, or you could run acceptance tests and raise flags with the vendor if those tests revealed issues with your workflow.

Now with everything and their dog turning SaaS for that sweet sweet recurring revenue and people jerking themselves off over "rapid deployment", with the one doing the most deployments a day winning the contest? Dozens if not hundreds of "releases" a day, and in the worst case, you learn the new workflow only for it to be reverted without notice again. Or half your users get the A bucket, the other half gets the B bucket, and a few users get the C bucket, so no one can answer issues that users in the other bucket have. Gaslighting on a million people scale.

It sucks and I wish everyone doing this only debilitating pain in their life. Just a bit of revenge for all the pain you caused to your users in the endless pursuit for 0.0001% more growth.

xg15 · 2026-03-14T19:55:59 1773518159

> It's far more legitimate to just discuss how much A/B testing should negatively affect a user. I don't have an answer and it's an interesting and relevant question.

You don't have an answer on "how much should A/B testing negatively affect a user"? So "a lot" would be on the table?

xg15 · 2026-03-14T19:51:11 1773517871

> The framing of A/B testing as a "silent experimentation on users"

Sorry, but how is A/B testing not exactly that? The experiments may be on non-disruptive things like button color, but they're experiments no less.

The users are also rarely informed about the experiment taking place, let alone on the motivation or evaluation criteria.

cyanydeez · 2026-03-14T15:22:36 1773501756

Relying on a paid service for anything significant is basically accepting the Company Store feudal serfdom.

Enshittification is coming for AI.

krisbolton · 2026-02-11T20:53:41 1770843221

The article even says "[...] some Nest devices record event histories and store them on-device. The third-gen wired Nest Doorbell can save up to 10 seconds of clips, while the first and second-gen wired doorbells can save up to three hours of event history, all without a subscription.".

kbelder · 2026-02-12T00:13:18 1770855198

>The third-gen wired Nest Doorbell can save up to 10 seconds of clips, while the first and second-gen wired doorbells can save up to three hours of event history

Wow, the march of progress.

plagiarist · 2026-02-12T01:26:09 1770859569

It's like you can see the moment they started gaining market share and could deliver shittier products.

lazide · 2026-02-12T12:02:40 1770897760

It’s all optimization.