
> Most of the time it's the test itself that is flaky

I recently went through a heavy de-flaking on a suite of Selenium tests. I found this comment to be true in my case; it was reasonable-seeming assumptions in the tests that caused flakiness more often than anything else. The second most common cause was timing or networking issues with the Selenium farm.

Spending the time actually de-flaking the tests was quite enlightening and led to some new best practices, both for writing tests and for spinning up Selenium instances.

Because of that experience, I'm not sure I would agree with giving up on tests that fail less than 3% of the time, because fixing one of those cases can sometimes fix all of them. Learning the root causes of test failures led me to implement some fixes that increased the stability of the entire test suite. Sometimes there's only one problem, but it causes all the tests in a group to fail one at a time, infrequently and seemingly randomly.



I'm having similar issues with Selenium tests at the moment. Can you give any insights into de-flaking tests that fail on timing or networking issues? (I'm guessing it's a case-by-case basis, but generic tips might work for some problems.) These are by far the two biggest reasons the suite fails for us, so any help would be really...er...helpful! :D


For me, the main class of flaky Selenium tests was tests expecting CSS-related changes to appear on-screen in the same frame or one frame later, when sometimes they take more than one frame. The main result was to rarely/never execute multiple actions in a row without waiting; the tests changed considerably into pairs of [ action -> wait for specific CSS change on screen ] throughout. I also had some sleep calls for a few hard-to-test things, even though I knew it was bad practice, and I spent time eliminating all of them: figuring out the fundamental reason each thing was hard to measure, and/or adding something to the front-end code to signal via CSS that a long-running operation was complete.
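The [ action -> wait ] pairing can be sketched as a generic polling helper; the `.upload-done` selector below is a hypothetical example of a CSS signal the front end might set, not something from the original post. Selenium's own Python bindings ship this pattern as `WebDriverWait`, shown in the comment:

```python
import time

def wait_for(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns a truthy value, or raise after `timeout`."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError("condition not met within %.1fs" % timeout)

# With Selenium's Python bindings the same idea is built in; a sketch
# (locator and element names are hypothetical):
#
#   from selenium.webdriver.support.ui import WebDriverWait
#   from selenium.webdriver.support import expected_conditions as EC
#   from selenium.webdriver.common.by import By
#
#   button.click()                                   # action
#   WebDriverWait(driver, 5).until(                  # wait for the CSS signal
#       EC.presence_of_element_located((By.CSS_SELECTOR, ".upload-done")))
```

The key property is that the wait is tied to a specific observable change rather than a fixed sleep, so it finishes as soon as the change appears and only fails after a real timeout.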

On the setup side, the main networking issue for me was ssh. This might be different for you if you use either a Selenium cloud service or a proper VPN setup. I was spinning up a virtual network on AWS for a Selenium farm in a fairly ad-hoc and manual way using ssh. The tests were sometimes (only very occasionally) starting before the ssh connection was actually ready, so I changed the setup script to wait until a packet had actually been received before launching the tests. I used netcat for that.
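The netcat-style readiness check can also be done directly from a setup script in Python; this is a sketch of that idea, not the poster's actual script, and the host/port values are placeholders:

```python
import socket
import time

def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Stand-in for the netcat check: block setup until sshd (or any TCP
    service, e.g. a Selenium node on port 4444) is actually accepting
    connections, instead of racing the tests against it.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection completes the TCP handshake, so success
            # means the remote service really accepted a connection.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

This only proves the port is open, not that the service behind it is fully initialized, so some services may still need an application-level check afterwards.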


My approach has been to write a layer into our testing system that retries `find_element_by_*` for n seconds on failure (5 works for most of our tests) before reporting problems. I mean to eventually generalize and open-source it (I will try to contribute it back, but since I only do this for Python, it might be rejected).

Just having this means a lot less time finding those problems for me.
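A minimal sketch of such a retry layer, assuming a decorator-based design (the original layer's actual shape isn't described); with Selenium you would pass `selenium.common.exceptions.NoSuchElementException` as `exceptions`:

```python
import time
from functools import wraps

def retry_for(seconds=5.0, interval=0.25, exceptions=(Exception,)):
    """Decorator: retry the wrapped call until it succeeds or `seconds` elapse.

    On each failure matching `exceptions`, sleep `interval` and try again;
    once the deadline passes, re-raise the last failure so the test still
    reports a real problem instead of hanging forever.
    """
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + seconds
            while True:
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if time.monotonic() >= deadline:
                        raise
                    time.sleep(interval)
        return wrapper
    return decorator

# Hypothetical usage wrapping a Selenium lookup:
#
#   @retry_for(seconds=5, exceptions=(NoSuchElementException,))
#   def find_submit_button(driver):
#       return driver.find_element_by_css_selector("#submit")
```

Because the last exception is re-raised unchanged, a genuinely missing element still fails the test with the normal Selenium error, just 5 seconds later.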



