It'll be a legal thing. You're reporting on behalf of yourself / a legal entity, so another system or entity can't say for you. I get it, but it really is a waste of time.
I think Apple became much better at security in recent years. One example which I think is indicative of their approach to security - they bothered to add a hardware microphone disconnect when a macbook is closed. Source: https://support.apple.com/en-gb/guide/security/secbbd20b00b/...
This is different though right? He found one (? we don't know who you're referring to - post sources for a higher quality discussion) vulnerability, he already knew it was there, etc. Anthropic didn't claim no other model can find vulnerabilities, nor that it's impossible with smaller models. They're claiming Mythos is a step-change in ability for end-to-end vulnerability discover and exploit creation. And that other frontier models are close behind.
There is independent research out there on frontier model security capability. AI Security Institute (UK) put out their paper comparing Mythos to other frontier models in early April. They've been tracking frontier model security capability since early 2023, so it's a decent dataset. https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos...
I read it as the author is / was going through the vulnerability disclosure process with Microsoft and they're annoyed for unclear reasons and decided to publicly disclose, rather than being an insider.
Many brilliant people have serious mental health issues that preclude their ability to regulate their emotions and act maturely in serious situations e.g. responsible vulnerability disclosure.
I've watched genius-level IQ people get fired time and again because they don't know how to work with others at a basic kindergarten level.
To be honest if I got fired in a mean or unfair way I'd definitely hit back at my employer in such a manner if I'd have the ability to. I'm unlikely to have that though as I'm not aware of any saucy company secrets. But if this is what happened I think it's pretty justified.
The secret here seems to be that Microsoft caches the key somewhere even when it's supposed to be only in the TPM! That's a pretty big revelation IMO.
> The secret here seems to be that Microsoft caches the key somewhere even when it's supposed to be only in the TPM!
Not what happened here (I reserve my judgment wrt the promised TPM+PIN exploit).
In the default TPM-only mode of BitLocker, the secret is in fact in the TPM, which will (as instructed by Windows upon key creation) release it to the correct OS running on the correct computer. Notably not in the picture is any user-provided data: measured boot is the only protection. It is only the correct programming of the OS that makes it request an account password (completely unrelated to the disk-encryption cryptography) before letting the user poke at the disk, which the OS can at that point already decrypt.
Well, turns out the programming is such that if you ask politely it’ll just pop an Administrator(?) shell.
> Not what happened here (I reserve my judgment wrt the promised TPM+PIN exploit).
Yes this is the one I'm referring to.
I have noticed it myself, it has happened to me that my system rebooted to install updates and it did not pass through the blue TPM pin entry screen at that point. That was a big red flag for me. A normal reboot always does that, even a 'hot' reboot.
> BitLocker is now suspended, which means the drive remains encrypted, but Windows temporarily stores the unlock information so firmware changes won’t immediately trigger recovery.
> A normal reboot always [forces the TPM pin entry screen], even a 'hot' reboot.
In TPM-only mode, I only see the screen—which asks for an recovery key that serves an alternative to the TPM-borne secret, not for whatever you are calling the “TPM PIN” here—whenever I update the firmware or the bootloader (the latter from the other side of the dual-boot setup). Otherwise it boots straight to the login screen, which meshes with the measured-boot-only theory of operation I’ve described above. There’s nothing nefarious in this part, even if I think it exposes an unwisely large attack surface (e.g. the USB stack). I suspect you simply reboot so rarely you’re never hitting the happy path.
No I have the explicit PIN turned on. That means it requires a Pin entry on each boot. It's not the recovery screen though it looks similar. It's also not a password that's then hashed. It unlocks the TPM with a short pin, the number of attempts is limited by the TPM itself so that it doesn't get brute forced.
This is not a standard option, I think it can only be set through a group policy.
> To be honest if I got fired in a mean or unfair way I'd definitely hit back at my employer in such a manner if I'd have the ability to.
I knew a contractor that developed a habit of not paying his workers for a short time. After people started walking off job sites with his tools and showing up at his house demanding to get paid, he magically found the money to pay them.
It’s pretty unsurprising how vindictive regular people rapidly become when they’ve been ripped off.
Reporting wrongdoing to the ones doing it doesn't work. Perhaps they relied on Microsoft a bit too much for their livelihood and are just beginning to reevaluate their decisions. It's not so rare for brilliant people to live a life of the mind and not pay enough attention to their material conditions. But defining that as "serious mental health issues" is such a cheap shot.
> Reporting wrongdoing to the ones doing it doesn't work.
Most large companies — including Microsoft [1] — have an internal affairs call center where you can anonymously report issues of malfeasance — assuming that's what happened here.
Yeah I'm getting a lot of pressure to be a "team player" lately. I've told them over and over I'm not capable of that and that has never been a problem before. But we have a hipster new VP who is really pushy and wants to generalise everything.
"Not being a team player" doesn't mean the person is a nuisance, but they can be an introvert who has a limited interaction budget and can work silently and efficiently otherwise.
This generally means the person might not leave their cubicle much or give feedback frequent enough, but this doesn't mean they are not motivated to help others or share knowledge. One can approach and ask a question and get tons of help immediately.
How I know? That's me. I look like a cave dweller from a distance, but I'm not. The only difference I have is human interaction sometimes drains me a lot, so I just concentrate and work, yet everybody get their help immediately if they need them.
Also, no, I don't bite or belittle people. On the contrary.
Assuming the worst in others is bad. If I worked with you, I'd be looking for somewhere else the moment I found out how you think about me.
Remember. People don't leave bad jobs, but bad managers.
I have worked with lots of introverts and my empirical observation is that the introversion/extraversion axis is completely orthogonal to whether or not someone can be a team player.
You require both team players and "rockstar" individuals. It's not one or the other or a competition, because they do different things.
Yes if you put a someone who can't work on a team on a team and expect team work then that will not work. But that's obvious, so then don't do that. Expecting a homogeneous workforce isn't realistic or optimal.
I'm not a software engineer at all. And I tend to take on projects nobody else wants because they are too complicated or esoteric.
And I didn't say I'm not capable of being part of a team. Just that I need to have my own responsibilities within a team. I can't deal with micromanagement or excessive coordination like 'standups' every day.
Yeah you've completely misread this. The phrase "not being a team player" is a euphemism for someone not willing to do dubiously unethical or illegal (or things that go against internal company policy) things in support of a low level supervisor or manager's wishes. Or more favourably, someone who's unwilling to do things outside of what he's actually paid for or to do things unpaid (or outside working hours etc.). Also known as wage theft.
The guy saying that he has been accused of "not being a team player" isn't literally quoting his management here. He's summarizing that his immediate supervisors don't like him because he's unwilling to enter in some patronage like relationship with them.
The fact that you gave the benefit of the doubt to some faceless employer here instead of an actual person recounting his experiences is really sad and maybe ought to be reason for you to rethink your biases to jump to the conclusion that this guy is a toxic loner. Sounds like you're projecting hard here from some other experience.
That is also a thing yeah. It's not really unethical or illegal but our VP has a huge preference for snazzy glitzy projects and never wants to tackle the problems that cause real pain in the organisation because they are not spectacular and don't make him look good. And yes I bring that up whenever it comes into play. I'm definitely not an order-follower.
> I've told them over and over I'm not capable of that
I can relate and empathize. And also provide this suggestion based on my own similar experience: if you can't provide evidence (e.g. doctor's diagnosis) that you are "special" or "not capable of that", then they don't have to care and will take steps to force you out. I wish you all the best.
Here in Europe it's different, we have more rights. Unfortunately I don't have an official diagnosis but I'm definitely neurodivergent. I've been meaning to get one but it is difficult.
I was once (12 years ago) told: "they debate, they decide, we deliver" along with other "teamwork" pablum. This evil has been with us for a very long time, unfortunately.
Nonsense. there are way more accommodations for people who wouldn't have had a place 20 years ago... those accommodations have changed what a "standard IC" is. There never was a place for run-of-the-mill geniuses who couldn't be bothered to spend a few hours researching P2P (Person to Person) protocols. They were always pushed off to small companies where the risk was much lower. This hasn't, won't, and shouldn't change. If that makes you salty, I got some things I'd recommend you research.
Adults pay rent in money, not feelings. The answer to “how could Microsoft leave you homeless?” is “by not paying you”, not some bizarre “by making you feel so bad you lose your house, which you pay for with good feelings”
This is an oddly passive-aggressive comment when a much more likely read is they were relying on the funding and the large tech company did what large tech companies do and started moving slowly.
And I can see others already blaming them for relying on the vulnerability for living expenses, but if we can hold the hyper-rationalization for a second, we shouldn't be against the person who expected an organization with more money than God to uphold a deal for relative peanuts, right?
Like yes we all get that large orgs make spending $5 very hard, many claps for being the in-group, but their frustration would be understandable.
I'm supposed to feel bad that Microsoft didn't immediately wire him an advance on the bounty before validating anything? Have you ever tried to get anything corrected with a corporate payroll department? Try three months minimum.
It's like suggesting someone was relying on a lottery ticket to payout to survive.
I tried to be as coddling with my language as possible.
Acknowledged how orgs work, separated blaming the org from sympathizing with their reaction, tried to separate the prudence of their actions from the sticky situation they'd still be left in by the orgs actions...
But it was for naught: people are really ingrained in a weird "might-makes-right" model of corporate operations. "Larry Ellison is a lawnmower" was supposed to be a jeremiad but now it's more like a guiding principle that we browbeat anyone for questioning.
> we shouldn't be against the person who expected an organization with more money than God to uphold a deal for relative peanuts, right?
You're assuming that there was a deal that wasn't upheld. I don't think we have enough information to assess that. This person's blog posts do read as being somewhat unstable. There's even someone in the comments seemingly genuinely trying to be helpful: "Just wondering if you’re BiPolar (like me) and see a different reality than what is real. Been there."
If you take the statement at face value, that does not appear to be the case. If you don’t take it at face value, the underlying presumptions might be a lot of why they may not be employable.
Really depends on your background doesn't it? You could have convictions, be sanctioned, have visa problems, or all kinds of things that are not easily solvable.
Indeed, and this guy's personality seems a little "difficult" which might make the interview process short. I've known people with insane skills who have such weird personalities that they never get hired. Doing remote bug bounty stuff is a blessing for them.
Come on, with these skills you could convince someone to give you a job if you’re on the streets otherwise. You might not be a senior engineer in the exact thing you want but you won’t be on the streets.
We are, quite notably, in a huge hiring crisis where vast numbers of programmers and researchers can't even get interviews. It really is not that simple
Sure sounds like rhetorical questions or attacking the messenger. Someone can think the bounty industry is going to reward them for actually being exceptional and not look soon enough for other options then pivot to a stance that should give them some quick job offers. If I thought I found an intentional back door I would not engage with an embargo system from the same vendor but I am also not them.
> Someone can think the bounty industry is going to reward them for actually being exceptional and not look soon enough for other options then pivot to a stance that should give them some quick job offers
Sure. And that’s a meaningful answer to the question.
“people with values different from yours, presumably” is a condescending nonanswer.
If someone has this kind of exploit and can't get a bug bounty for it, and desperately needs the money, he can sell it for 100k+ in a shady black market
I don't think they "forgot" about processors. It was out of scope. Creating the pipeline to end up with a fully "sovereign" system end-to-end is a decades long process and hundreds of billions of euros. As others have pointed out, in this context "sovereign" meant data processing. This is also a fairly paranoid take. Not to say hardware isn't targeted, but there are other methods. So spending hundreds of billions and several decades to build the fabs to gain assurance... it's a waste of time.
Exactly, it was out of scope. You cannot in one go have a full-blown scope. It won't work.
This is a sensationalist headline (CBA to RTFA). It isn't a case of all or nothing, it is about becoming less dependent. A country like China follows the same industry, and besides, in a globalist economy like ours we are dependant on each other. So, for example, a lot of hardware components come from China, and assembly happens there as well. That counts for EU (DE, FR, ...), US, CA, RU, UA, CN, IN, etc. But as the talk on 39c3 has shown [1]: we can DIY.
France's Scaleway already offers RISC-V bare metal servers.
It's a first step that brings most of the value with close to no cost, as RISC-V is cheap nowadays
This cloud provider is a for profit company, not a research institute, so they can see short term commercial value if they do it
Did that risk materialise? I suppose it would be only the same as credit cards. With a valid warrant authorities can gain access to information. But that's within a legal system designed by an elected parliament. I'm more concerned about ensuring the legal powers are checked and balanced, and stay that way.
> But that's within a legal system designed by an elected parliament.
Ah well if it's an elected government then the risk of it turning hostile to its people is zero, of course!
And ask "did that risk materialize?" to the people in China, or North Korea, or Russia, or Belarus, or Germany [1], or USA [2]. There are countless examples of the dangers of surveillance, in the present and in history - you don't need a specific example of exactly Oyster cards being used, to know they are a danger.
The framing of A/B testing as a "silent experimentation on users" and invoking Meta is a little much. I don't believe A/B testing is an inherent evil, you need to get the test design right, and that would be better framing for the post imo. That being said, vastly reducing an LLMs effectiveness as part of an A/B test isn't acceptable which appears to be the case here.
> I don't believe A/B testing is an inherent evil, you need to get the test design right, and that would be better framing for the post imo.
I disagree in the case of LLMs.
AI already has a massive problem in reproducibility and reliability, and AI firms gleefully kick this problem down to the users. "Never trust it's output".
It's already enough of a pain in the ass to constrain these systems without the companies silently changing things around.
And this also pretty much ruins any attempt to research Claude Code's long term effectiveness in an organisation. Any negative result can now be thrown straight into the trash because of the chance Anthropic put you on the wrong side of an A/B test.
> That being said, vastly reducing an LLMs effectiveness as part of an A/B test isn't acceptable which appears to be the case here.
The open question here is whether or not they were doing similar things to their other products. Claude Code shitting out a bad function is annoying but should be caught in review.
People use LLMs for things like hiring. An undeclared A-B test there would be ethically horrendous and a legal nightmare for the client.
Anyone who trusts LLMs to do anything has shit coming. You can not trust them. If you do, that's on you. I don't care if you want to trust it to manage hiring, you can't. If you do anyway then the ethical problems are squarely on you.
People keep complaining about LLMs taking jobs, meanwhile others complain that they can't take their jobs and here I am just using them as a useful tool more powerful than a simple search engine and it's great. No chance it'll replace me, but it sure helps me do ny job better and faster.
Would you have a problem with the following scheme?
Every client is free and encouraged to feed back its financial health: profit for that hour/day/month/...
The AB(-X) test run by the LLM provider uses the correlation of a client's profit with its AB(-X) test, so that participating with the testing improves your profit statistically speaking (sometimes up sometimes down, but on average up).
You may say, what about that hiring decision? One thing is certain: when companies make more profit they are more likely to seek and accept more employees.
That sounds like a good way to get extreme short-term optimization.
Say a particular finetune prioritizes profits right now and makes recommendations like "cut down on maintenance, you can make up for it later with your increased profits and their interest". It produces more profits, and wins the AB test. Later the chickens come home to roost.
You can reduce the problem by using long-term indicators, but then each AB test is very slow.
I think you would be hard pushed to find any big tech company which doesn't do some kind of A B testing. It's pretty much required if you want to build a great product.
A big tech company has ~10k experiments running at once. Some engineers will be kicking off a few experiments every day. Some will be minor things like font sizes or wording of buttons, whilst others will be entirely new features or changes in rules.
Focus groups have their place, but cannot collect nearly the same scale of information.
As someone who works in these orgs, only a small fraction are about user experience metrics. 90+% are extracting more short term value with unknown second order effects on usability.
Yeah, that's why we didn't have anything anyone could possibly consider as a "great product" until A/B testing existed as a methodology.
Or, you could, you know, try to understand your users without experimenting on them, like countless of others have managed to do before, and still shipped "great products".
I know this is a salty take, but reliance on A/B testing to design products is indicative of product deciders who don't know what they are doing and don't know what their product should be. It's like a chef saying, I want to make a pancake, but trying 50 different combinations of ingredients until one of them ends up being a pancake. If you have to test whether a product works / is good / is profitable, then you didn't know what you were doing in the first place.
Using A/B tests to safely deploy and test bug fixes and change requests? Totally different story.
Long term effectiveness? LLMs are such a fast moving target. Suppose anthropic reached out to you and gave you a model id you could pin down for the next year to freeze any a/b tests. Would you really want that? Next month a new model could be released to everyone else - or by a competitor - that’s a big step difference in performance in tasks you care about. You’d rather be on your own path learning about the state of the world that doesn’t exist anymore? nov-ish 2025 and after, for example, seemed like software engineering changed forever because of improvements in opus.
>Suppose anthropic reached out to you and gave you a model id you could pin down for the next year to freeze any a/b tests. Would you really want that?
If you really want to keep non-determinism down, you could try (1) see if you can fix the installed version of the clause code client app (I haven’t looked into the details to prevent auto-updating..because bleeding edge person) and (2) you can pin to a specific model version which you think would have to reduce a/b test exposure to some extent https://support.claude.com/en/articles/11940350-claude-code-...
> Suppose anthropic reached out to you and gave you a model id you could pin down for the next year to freeze any a/b tests. Would you really want that?
Yes. I'd like some guarantee that my results are reproducible for some reasonable amount of time. New versions can also introduce regressions. A prompt that works well with today's model might not work with tomorrow's, even if the latter is "better".
> And this also pretty much ruins any attempt to research Claude Code's long term effectiveness in an organisation. Any negative result can now be thrown straight into the trash because of the chance Anthropic put you on the wrong side of an A/B test.
LLMs are non-deterministic anyway, as you note above with your comment on the 'reproducibility' issue. So; any sort of research into CC's long-term effectiveness would already have taken into account that you can run it 15x in a row and get a different response every time.
These are two very different things. I suspect that in some cases pointing finger at a black box instead of actually explaining your decisions can actually shield you from legal liability...
Hiding something like this in the TOC rather than explicitly asking users to opt in is a dark pattern. You can't gain the moral highground by cackling that someone should have read the fine print.
All TOS essentially boil down to "we owe you nothing and can change the product at anytime to anything we want at our sole discretion"
Obviously it would be unreasonable to accept such terms without further context. The further context in this case being that Anthropic will maintain Claude as an AI agent and seek to improve it's performance. What is at the heart of this issue is whether or not Anthropics recent A/B testing violated that context. Not whether or not they violated the TOS (they didn't, obviously)
I read the article saying they were testing service changes on paying users without knowledge or explicit consent of the user, that the user had to test and determine why they perceived their sevice changed.
That is a dark pattern to inflict on users expecting consistent output.
Evil might be a stretch, but I really hate A/B testing. Some feature or UI component you relied on is now different, with no warning, and you ask a coworker about it, and they have no idea what you're talking about.
Usually, the change is for the worse, but gets implemented anyway. I'm sure the teams responsible have "objective" "data" which "proves" it's the right direction, but the reality of it is often the opposite.
> I'm sure the teams responsible have "objective" "data" which "proves" it's the right direction, but the reality of it is often the opposite.
In my experience all manner of analytics data frequently gets misused to support whatever narrative the product manager wants it to support.
With enough massaging you can make “objective” numbers say anything, especially if you do underhanded things like bury a previously popular feature three modals deep or put it behind a flag. “Oh would you look at that, nobody uses this feature any more! Must be safe to remove it.”
> The framing of A/B testing as a "silent experimentation on users" and invoking Meta is a little much.
No. Users aren't free test guinea pigs. A/B testing cannot be done ethically unless you actively point out to users that they are being A/B tested and offering the users a way to opt out, but that in turn ruins a large part of the promise behind A/B tests.
Meta has had an IRB for well over a decade (following a scandal where they used their users as lab rats) and that didn't stop them from doing any of the BS they did ever since.
Not to start an internet argument -- I don't think it is appropriate in this context. A/B testing the features of a web app is not unexpected or unethical. So invoking the memory of cambridge analytica (etc) is disproportionate. It's far more legitimate to just discuss how much A/B testing should negatively affect a user. I don't have an answer and it's an interesting and relevant question.
> A/B testing the features of a web app is not unexpected or unethical.
It's not "unexpected" but it is still unethical. In ye olde days, you had something like "release notes" with software, and you could inform yourself what changed instead of having to question your memory "didn't there exist a button just yesterday?" all the time. Or you could simply refuse to install the update, or you could run acceptance tests and raise flags with the vendor if your acceptance tests caused issues with your workflow.
Now with everything and their dog turning SaaS for that sweet sweet recurring revenue and people jerking themselves off over "rapid deployment", with the one doing the most deployments a day winning the contest? Dozens if not hundreds of "releases" a day, and in the worst case, you learn the new workflow only for it to be reverted without notice again. Or half your users get the A bucket, the other half gets the B bucket, and a few users get the C bucket, so no one can answer issues that users in the other bucket have. Gaslighting on a million people scale.
It sucks and I wish everyone doing this only debilitating pain in their life. Just a bit of revenge for all the pain you caused to your users in the endless pursuit for 0.0001% more growth.
> It's far more legitimate to just discuss how much A/B testing should negatively affect a user. I don't have an answer and it's an interesting and relevant question.
You don't have an answer on "how much should A/B testing negatively affect a user"? So "a lot" would be on the table?
The article even says "[...] some Nest devices record event histories and store them on-device. The third-gen wired Nest Doorbell can save up to 10 seconds of clips, while the first and second-gen wired doorbells can save up to three hours of event history, all without a subscription.".
>The third-gen wired Nest Doorbell can save up to 10 seconds of clips, while the first and second-gen wired doorbells can save up to three hours of event history
reply