Hacker Timesnew | past | comments | ask | show | jobs | submitlogin

Executing an NFA on search text takes linear time and space, so what I said is true. ;-) In practice, it is hard to make NFA execution as fast as backtracking engines. (PCRE famously implements an NFA, calls it a DFA, and uses that to declare that the DFA engine is slow, which is incredibly misleading.[1] Thank you, Mr. Friedl. sigh)

Production grade regex engines with a DFA (like GNU grep, RE2 and Rust's) do conversion lazily. By doing it lazily, at most one new DFA state is added for each byte in the input in the worst case, which maintains the linear time bound. Unfortunately, this can result in memory growth proportional to the search text, which is why all such implementations use a fixed-size cache of states that is flushed once it's full. It works well in practice, but can slow down dramatically (to about the speed of an NFA) if the cache of states needs to be flushed frequently. The most common provoker of such behavior is large counted repetitions, e.g., `\pL{100}`.

[1] - http://pcre.org/current/doc/html/pcre2matching.html#SEC4



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: