Hacker Timesnew | past | comments | ask | show | jobs | submitlogin

This. I wrote something that did this a few years ago. It took n patterns (not regex, simpler) and turned them into one DFA state table with a function ptr stored for each final state. Then it had a tiny runtime. Never thought of using it for route tables however.

Nice idea.

Edit: I did consider rewriting the runtime as a forth style interpreter as well but the time never appeared.



How long does it take to build the DFA when a new route is added? That seems like the major time performance problem point if new routes are added often or regularly.


To be honest I didn't time it and it wasn't thread safe and I don't have the source any more so I can't answer that. It doesn't do a lot so I imagine it would be relatively cheap.

The function it performed to rebuild the DFA was to create an NFA for the new pattern stream (it worked on char *) and then walk both the existing master NFA and the merge them. Then it was converted from NFA to DFA via dragon book copy and paste algorithm (my discrete math isn't great). Both NFA and DFA were represented as a bunch of C structs tied together by pointers. Then a table generator walked the new DFA and generated a new state table. It was very naive but covered the common case where large parts of the byte stream to be matched were similar.

I've got MacVim up and writing it again. May post it as a Show HN if I get the time to finish it.


Worst case is absurd, yes. But most (all?) added routes that exhibit that behavior are routes that would exhibit that behavior when run as a regex (and hence converted into a DFA) on their own. (I.e. generally if you would shoot yourself in the foot with this method you've already shot yourself in the foot.)

Your worst case is O(nd + O(constructing the new route's DFA)), where "n" is the number of nodes in the previous DFA and "d" is the number of nodes in the DFA of the new route.

Note that constructing r's DFA can be done beforehand, and hence the actual time the router will be offline is only O(nd).

(Assuming a simple 2 DFA -> 1 NFA -> DFA merge. Your worst case is that every possible pair from the two, and singleton of the two is present in the final DFA, i.e. ({node from old, node from new} -> nd + {node from old} -> n + {node from new} -> d) -> O(nd + n + d) -> O(n*d). Note that this isn't the classic O(2^n) behavior, because we know that, due to the old DFA and new route DFA being DFAs, only one node in n and one node in d can be present at a time in the final stateset.

A better bet may be to use the classic "lazily-constructed DFA" approach. In other words, keep it as an NFA while only converting (sets of) nodes to DFA nodes when necessary. If you are worried about memory, you can even do this only for "hot" sections of the NFA, or only for those sections of the NFA that have the most performance gain for the number of nodes added.


I'm not sure I can think of a case where new routes should be added dynamically, is there ?


That's the exact thing that caused Netflix's problem. Read the article.




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: