Curious what the A/B test actually changed -- the article mentions tool confirmation dialogs behaving inconsistently, which lines up with what I noticed last week. Would be nice if Anthropic published a changelog or at least flagged when behavior is being tested.
Could you provide the details of the complete verification?
*In the original story you only showed Claude-like responses, not how you dug into the binary.