FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

Compile-time regexp: getting it right

 
Post new topic   Reply to topic     Forum Index -> DDL - D Dynamic Libraries
View previous topic :: View next topic  
Author Message
Don Clugston



Joined: 05 Oct 2005
Posts: 91
Location: Germany (expat Australian)

PostPosted: Mon Jul 03, 2006 1:30 am    Post subject: Compile-time regexp: getting it right Reply with quote

I've been giving some thought to compile-time regular expressions.
There are three benefits to compile-time regexps, over the run-time approach used in (say) std.regexp:
(a) catch errors at compile time instead of throwing an exception at run-time;
(b) eliminate the cost of compiling the regexp in the first call;
(c) performance improvement by generating more customized execution code.

The approaches we've tried to date, involving recursive templates expansion or mixins, have tried to accomplish all of these simultaneously. This seems to be too ambitious. In particular, I think that in the general case, with the current limitations of mixins, we have no chance of out-performing the "state-machine-string with giant switch statement" approach.

Instead, I think it's better to split the problem up. The compile-time regexp should only compile the regexp string into a state-machine string, detecting errors in the process. Then, it can classify the string, to choose which run-time should be used. There is probably a small but important subset of regexps for which it's worth trying to generate optimal code.


With this approach to compile-time regexps, the development approach should be:
* Get it working for run-time regexps first. Initially, the compile-time regexps would just pass the string to the run-time parser, optimiser and engine. Make sure we have a good run-time regexp, but using the compile-time syntax.
* Duplicate the run-time compilation into the compile-time parser. (Note: even the optimisation step could be left at run-time; initially, we only need to get benefit (a)).
* Only move things to compile-time when they have a demonstrable speed benefit.

I also think that the Perl6 regexp syntax is well worth considering. I think Larry makes some excellent points about how ugly traditional regexp syntax is (and it seems similar to the philosophy of D vs C++); we should consider grabbing some ideas from there, especially, the stuff about capturing vs non-capturing brackets.

http://dev.perl.org/perl6/doc/design/apo/A05.html

Comments?
Back to top
View user's profile Send private message
kris



Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

PostPosted: Mon Jul 03, 2006 3:15 pm    Post subject: Reply with quote

Note that (b) can be offset when using std.Regex by stashing pre-constructed Regex instances. In other words, you don't have to compile the pattern at the point of usage.
Back to top
View user's profile Send private message
Don Clugston



Joined: 05 Oct 2005
Posts: 91
Location: Germany (expat Australian)

PostPosted: Tue Jul 04, 2006 12:28 am    Post subject: Reply with quote

Quote:
Note that (b) can be offset when using std.Regex by stashing pre-constructed Regex instances. In other words, you don't have to compile the pattern at the point of usage.


That's true. And if it turned out that searching for the regexp in a list of previously compiled ones was an expensive operation, a compile-time hash function could be used (and that's trivial to write).
This could be a lot less work than I thought.[/quote]
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic     Forum Index -> DDL - D Dynamic Libraries All times are GMT - 6 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group