Biggest trap of Simpson's paradox is the results can change with every level of ...

TheMrZZ · on Aug 12, 2024

To back up my answer, I actually wrote a Z3 program to prove it! The 3-variable version takes too long to solve, but I got results for the 2-variable version (tumor size + gender):

Results can be found in this GSheet: https://docs.google.com/spreadsheets/d/1tsBhElTgXjVTeas8quar...

Code is here: https://gist.github.com/TheMrZZ/c33927ca2cc917997a67d7f84b82...

I'm currently running the 3-variable version; hopefully I'll get results this afternoon.

We can clearly see the same problems that arise in the 1-variable Simpson's paradox (widely different population sizes).
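For readers who can't open the links, here is a minimal sketch of how a single stratifying variable can flip the result when group sizes are very unequal. The counts are illustrative only and are not taken from the spreadsheet or gist above; the stratum labels and the `rate` and `best` helpers are made up for this example.

```python
from fractions import Fraction

# Illustrative (recovered, treated) counts per stratum.
# Assumption: these are NOT the numbers from the linked sheet.
counts = {
    "small tumor": {"A": (81, 87), "B": (234, 270)},
    "large tumor": {"A": (192, 263), "B": (55, 80)},
}

def rate(recovered, treated):
    """Exact recovery rate, to avoid floating-point ties."""
    return Fraction(recovered, treated)

def best(arms):
    """Return the treatment with the highest recovery rate."""
    return max(arms, key=lambda t: rate(*arms[t]))

# Winner inside each stratum.
per_stratum = {stratum: best(arms) for stratum, arms in counts.items()}

# Pool the strata and pick the winner again.
totals = {}
for arms in counts.values():
    for treatment, (recovered, treated) in arms.items():
        r0, n0 = totals.get(treatment, (0, 0))
        totals[treatment] = (r0 + recovered, n0 + treated)

overall = best(totals)
print(per_stratum, overall)
```

Treatment A wins in both strata, yet B wins in the pooled data, because A was mostly given to the harder (large tumor) cases and B to the easier ones.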

Narhem · on Aug 11, 2024

Like, tumors are higher-dimensional objects than what human brains are trained to perceive.

gradschoolfail · on Aug 12, 2024

For pedagogues and practitioners alike: there is a subtle connection between Simpson’s paradox and the wild geometry of relative entropy. This might be partly why effect sizes are also contentious.

Besides Ellenberg’s mind-altering discussion of that link[1], see hints on the second page of:

https://www.qeios.com/read/XB1N2A/pdf

[1] "[the point of Simpson’s paradox] isn't really to tell us which viewpoint to take but to insist that we keep both the parts and the whole in mind at once."

Ellenberg, from Shape: The Hidden Geometry of Information, Biology, Strategy, Democracy, and Everything Else (2021)

oh_my_goodness · on Aug 11, 2024

If the previous comment is right, then this one is plainly also true in some sense. I'm disappointed to see downvotes.

TheMrZZ · on Aug 12, 2024

> If the previous comment is right

I actually wrote a Z3 program to prove it! The 3-variable version takes too long to solve, but I got results for the 2-variable version (tumor size + gender):

Results can be found in this GSheet: https://docs.google.com/spreadsheets/d/1tsBhElTgXjVTeas8quar...

Code is here: https://gist.github.com/TheMrZZ/c33927ca2cc917997a67d7f84b82...

I'm currently running the 3-variable version; hopefully I'll get results this afternoon.

We can clearly see the same problems that arise in the 1-variable Simpson's paradox (widely different population sizes).

mb7733 · on Aug 12, 2024

I think the real-world resolution to this problem is straightforward though. You should look at the finest level of granularity available, and pick the best treatment in the relevant subpopulation for the patient.

jefftk · on Aug 12, 2024

Unfortunately, our level of certainty generally falls off as we increase the granularity. For example, imagine the patient is a 77yo Polish-American man, and we're lucky enough to have one historical result for 77yo Polish-American men. That man got treatment A and did better than expected. But say that if we widen the group to 70-79y white men we have 1,000 people, of whom 500 got treatment A and generally did significantly worse than the 500 who got treatment B. While the more granular category gives us a little information, the sample size is so small that we would be foolish to discard the less granular information.
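To put rough numbers on that, here is a small sketch comparing the uncertainty from one patient with that from a 500-person arm. It assumes the outcome is reduced to a binary "did well / did not", and the 200-of-500 figure is hypothetical; the comment above gives no actual rates. The interval used is the standard Wilson score interval for a binomial proportion.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# One historical patient who did well on treatment A.
single = wilson_interval(1, 1)

# Hypothetical: 200 of the 500 patients on treatment A did well.
group = wilson_interval(200, 500)

print("n=1:   %.2f to %.2f" % single)
print("n=500: %.2f to %.2f" % group)
```

With a single patient the interval covers most of the range from 0 to 1, so it barely constrains the true success rate, while the 500-person arm pins it down to within a few percentage points.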

mb7733 · on Aug 12, 2024

This is all true. I originally added a disclaimer to my post that said "assuming you have enough data to support the level of granularity" but I removed it for brevity because I thought it was implied -- small sample size isn't part of Simpson's paradox. My apologies for being unclear.
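One way to read that disclaimer is as a fallback rule: start at the finest level and move to a coarser one until every treatment arm has enough patients. The sketch below is only an illustration of that idea; `pick_treatment`, the `min_n=30` threshold, and the 200/500 and 300/500 counts are all hypothetical, and a real analysis would need something more careful than a hard cutoff.

```python
def pick_treatment(levels, min_n=30):
    """Pick the best treatment at the finest level with enough data.

    levels: list of (label, {treatment: (successes, n)}), finest first.
    min_n:  hypothetical minimum patients per arm to trust a level.
    """
    for label, arms in levels:
        enough_arms = len(arms) >= 2
        enough_data = all(n >= min_n for _, n in arms.values())
        if enough_arms and enough_data:
            best = max(arms, key=lambda t: arms[t][0] / arms[t][1])
            return label, best
    return None  # no level has enough data to compare treatments

levels = [
    # Finest level: a single historical patient, treatment A only.
    ("77yo Polish-American men", {"A": (1, 1)}),
    # Coarser level: hypothetical success counts out of 500 per arm.
    ("70-79y white men", {"A": (200, 500), "B": (300, 500)}),
]

choice = pick_treatment(levels)
print(choice)
```

Here the finest level is skipped because it has one patient and only one arm, so the rule falls back to the coarser group and picks B.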

throwawaystress · on Aug 12, 2024

The smaller the subpopulation, the higher the variance, and the less significant the result.