I'm not sure how you'd reject that flaky test. We're talking <3%, so first let's assume that you don't even see the failure until 10 other PRs have been merged. Not only do you not know which change caused the failure, it could be that the failing test has been in the codebase for ages and the new code breaks its assumptions or initial environment.
Sometimes you can't just point at one thing and say reject this or revert that without a long investigation.
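To put rough numbers on that, here's a quick sketch. It assumes every CI run hits the flaky test independently with a fixed 3% failure rate (the upper bound mentioned above); real flakes often aren't that well behaved, and the function names are just made up for illustration.

```python
def chance_flake_stays_hidden(failure_rate: float, runs: int) -> float:
    """Probability that the test passes on every one of `runs` CI runs.

    Assumption: each run fails independently with probability `failure_rate`.
    """
    return (1.0 - failure_rate) ** runs


def expected_runs_until_first_failure(failure_rate: float) -> float:
    """Mean number of runs up to and including the first failure (geometric distribution)."""
    return 1.0 / failure_rate


if __name__ == "__main__":
    p = 0.03  # assumed failure rate, taken from the "<3%" figure above
    for runs in (1, 10, 50):
        hidden = chance_flake_stays_hidden(p, runs)
        print(f"{runs:>2} runs: {hidden:.1%} chance the flake never shows up")
    print(f"expected runs until first failure: {expected_runs_until_first_failure(p):.1f}")
```

Under those assumptions the flake stays invisible through 10 consecutive runs roughly three times out of four, and on average it takes about 33 runs to see it once. By then plenty of unrelated changes have landed on top of whatever introduced it.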
If the test failure is detected (so, you get super lucky) you should immediately reject the code, including the tests failing before some other fixups that didn't affect that test... Oftentimes it will take a long time to surface these, but I'm of the opinion that a broken build is a show-stopper until it's resolved. That doesn't mean a 5-alarm fire where all devs rush to the scene, but it does mean a free person picking it up as their next task or, barring that, bumping someone off of feature or other work to address the issue.
It may take a while... but while that flaky test exists in your codebase it will levy a constant cost on all of your developers.
Which code? When you randomly run into the flaky test, in most cases the failure isn't coming from the change that was just tested. You'd reject some random, unrelated PR.