Some regular expressions take a very long time to compile (at run time)
Posted: 04/11/10 16:29:36

I have been converting some code that parses HTML pages, using regular expressions to look for tags within <> angle brackets. After changing from Phobos to Tango, I have noticed that compiling regular expressions that include the < and > characters as literals takes quite a long time. By "compile" I mean instantiating the Regex object at run time.
For example, on my machine the following 5 Regex lines take 94ms, 359ms, 563ms, 2734ms and 7438ms respectively at run time:
    Regex re1;
    re1 = new Regex(r"a\<([^>]*)\>\<([^>]*)\>\<([^>]*)\>");
    re1 = new Regex(r"aaaaaaaa\<([^>]*)\>\<([^>]*)\>\<([^>]*)\>");
    re1 = new Regex(r"aaaaaaaaaaaa\<([^>]*)\>\<([^>]*)\>\<([^>]*)\>");
    re1 = new Regex(r"a{40}\<([^>]*)\>\<([^>]*)\>\<([^>]*)\>");
    re1 = new Regex(r"a{80}\<([^>]*)\>\<([^>]*)\>\<([^>]*)\>");

I am escaping the < and > characters so that they are literal and not the look ahead or look behind operators. The only difference between these regular expressions is the number of leading 'a' characters.
The rapid increase in compile time for such a minor increase in regular expression complexity seems excessive. Am I missing something?
I am using release 5428 of Tango v0.99.9 with DMD v1.056 on Windows.
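
In case anyone wants to reproduce this, here is a minimal sketch of how such timings could be taken. It assumes tango.time.StopWatch (whose stop returns elapsed seconds) and uses a hypothetical helper, timeCompile, that is not part of Tango. It is not necessarily how the figures above were measured, and the numbers will differ from machine to machine.

```d
import tango.io.Stdout;
import tango.text.Regex;
import tango.time.StopWatch;

// Hypothetical helper: times a single Regex construction and prints
// the elapsed time in milliseconds next to the pattern.
void timeCompile(char[] pattern)
{
    StopWatch timer;
    timer.start;
    auto re = new Regex(pattern);   // the "compile" step being measured
    double seconds = timer.stop;    // assumed: elapsed seconds since start
    Stdout.formatln("{} ms  {}", seconds * 1000, pattern);
}

void main()
{
    // Same five patterns as above; only the leading 'a' run differs.
    timeCompile(r"a\<([^>]*)\>\<([^>]*)\>\<([^>]*)\>");
    timeCompile(r"aaaaaaaa\<([^>]*)\>\<([^>]*)\>\<([^>]*)\>");
    timeCompile(r"aaaaaaaaaaaa\<([^>]*)\>\<([^>]*)\>\<([^>]*)\>");
    timeCompile(r"a{40}\<([^>]*)\>\<([^>]*)\>\<([^>]*)\>");
    timeCompile(r"a{80}\<([^>]*)\>\<([^>]*)\>\<([^>]*)\>");
}
```

Timing each construction separately makes it easier to see how the cost grows with the number of leading 'a' characters.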