root/trunk/scregexp.d

Revision 25, 99.3 kB (checked in by mp4, 14 years ago)

added printclasscode and getclasscode functions

1 // scregexp
2 // Statically compiled / compile-time Perl-compatible regular expression parser, version 1.0
3 // Author: Marton Papp, borrowing heavily from the regexp 2.0 parser written by Don Clugston.
4 // This version uses CTFE for simplicity, flexibility, higher performance and easy debugging
5 //
6 // It is a top-down recursive descent regexp parser with backtracking.
7 // The license is BSD-like; see the end of this file.
8 //
9 //
10 // Terminology:
11 // a regex "sequence" is a set of consecutive "term"s,
12 // each of which consists of a "naked term", optionally followed by
13 // a "quantifier" (*,+,?, {m}, {m,} or {m,n}).
14 // A "naked term" is either a "sequence" or an "atom".
15
16 /*
17 Regular expression formats:
18 "/regex/options" e.g. "/.+/s"
19 "regex/options" e.g. ".+/s"
20 "/regex/" e.g. "/.+/"
21 "regex" e.g. ".+"
22 possible options: m s x i (they behave as in Perl)
23 x is now fully implemented: # comments out the rest of the line
24
25 Features currently supported:
26    it matches as if /../sm were used in Perl.
27    * match previous expression 0 or more times
28    + match previous expression 1 or more times
29    ? match previous expression 0 or 1 times
30    {m,n}  match previous expression between m and n times
31    {m,} match previous expression m or more times
32    {,n} match previous expression between 0 and n times
33    {n} match previous expression exactly n times.
34    . match any character if s option is used.
35    . match any non-\n character if s is not set.
36    other characters match themselves
37    a|b   match regular expression a or b
38    (?: ) uncaptured grouping
39    ( ) captured grouping
40    (?> ) independent subexpression
41    (?= ) lookahead subexpression
42    (?! ) negative lookahead subexpression
43    (?<name>) or (?'name') named captured grouping
44    \k<name> or \k'name'  match the text of a previously captured named group
45    ^  anchor,start of line if m option is used
46    ^  anchor,start of string if no option is used
47    $  anchor,end of line (at \n and the end of string) if m option is used
48    $  anchor,the end of string or before \n at the end of string if no m is used
49    \A start of string
50    \z end of string
51    [abc]  match any character in character class abc
52    [^abc] match any character not in character class abc
53    @n  match string variables passed into the functions as extra parameters. (this is a non-standard extension).
54    escape characters
55    \d,\s,\w,\D,\S,\W
56    [\d\s\w\D\S\W] in character classes too
57    All matches are greedy unless an additional ? is used.
58    Use ? after quantifiers (*,+,{n,m},?) to make matches non-greedy/lazy.
59    Use + after quantifiers (*,+,{n,m},?) to make matches possessive.
60    \1..\9 to match previously captured subsequences
61    These characters have to be escaped to match them: ()[]?*+.{}^$|@
62    / needs to be escaped if it is used at the start or at the end of the regex
63   
64 Compile time error handling:
65    redundant ) is found
66    redundant ] is found
67    Regexp must not end with \\
68    Unmatched parenthesis
69    unsupported quantifier
70    unmatched { in regular expression
71    if \1..\9 references a group that is not accessible or not defined
72    use of + or {m,n} on any and this capturing group is not supported
73    use of * or {0,} on any and this capturing group is not supported
74    start of range of a character range is bigger than ending range
75    \k must be followed by 'name' or <name>
76    Closing > missing
77    Closing ' missing
78    unmatched [ in regular expression
79   
80 Run time error handling:
81    if \1..\9 references a group not captured
82   
83 Limits:
84    length of input string : max of int
85    number of groups: max of int
86    regular expression size: limited by char[] size and stack
87    number of backreferences: 9 (\1..\9)
88   
89 Functions:
90
91    bool test!(char[] regexp)(char[] stringtosearch)
92    short form: t
93    returns true if regexp matches the beginning of stringtosearch 
94
95    char[] search!(char[] regexp)(char[] stringtosearch)
96    short form: s
97    finds the first match of regexp
98    returns the substring found
99    returns null if nothing found
100   
101    int index!(char[] regexp)(char[] stringtosearch, int indextostartat)
102    short form: i
103    finds the first match of regexp
104    returns its starting index in stringtosearch
105    returns -1 if nothing found
106   
107    indexrec index2!(char[] regexp)(char[] stringtosearch, int indextostartat)
108    short form: i2
109    finds the first match of regexp
110    returns its start and end (last char+1) in stringtosearch
111    returns indexrec(-1,-1) if nothing found
112   
113    indexrec[] indexall!(char[] regexp)(char[] stringtosearch, int indextostartat,..)
114    short form: ia
115    finds all occurrences of the regexp in stringtosearch from left to right
116    Matches follow each other. No overlaps are possible. It is similar to Perl's / /g
117    returns the start and end (last char+1) of found strings in stringtosearch
118    returns an empty array if nothing found
119   
120    char[][] searchall!(char[] regexp)(char[] stringtosearch,..)
121    short form: sa
122    finds all occurrences of the regexp in stringtosearch from left to right
123    Matches follow each other. No overlaps are possible. It is similar to Perl's / /g
124    returns found strings
125    returns an empty array if nothing found
126   
127    grouprec [] indexgroups!(char[] regexp)(char[] stringtosearch,..)
128    short form: ig
129    finds the first match of regexp, the captured groups are returned
130    returns the start/end indexes of captured groups and the whole string matched(at index 0)
131    returns null if nothing found
132    grouprec(something,-1) means that given group was not captured
133   
134    grouprec [][] indexgroupsall!(char[] regexp)(char[] stringtosearch,..)
135    short form: iga
136    finds all occurrences of the regexp in stringtosearch from left to right
137    Matches follow each other. No overlaps are possible. It is similar to Perl's / /g
138    returns the start and end (last char+1) of found groups of all matches
139    returns an empty array if nothing found
140   
141    char [] group(char[] stringtosearch,grouprec g,int groupno)
142    can be used to have access to the results of searchgroups in an easier way
143    g should come from searchgroups
144    returns groupno-th captured group
145    if groupno is 0, returns the whole match
146    returns []/null if no match for given group
147   
148    void searchgroupstest(char[]reg)(char[]input)
149    for testing, prints found matches,groups
150    reg regular expression to use
151    input input string to parse
152   
153    void searchgroupsalltest(char[]reg)(char[]input)
154    for testing,prints all solutions found
155    reg regular expression to use
156    input input string to parse
157   
158    void searchgroupstest2(char[]reg)(char[]input, char[][] target)
159    for testing,compares found matches with target, it stops the program in case of failure
160    reg regular expression to use
161    input input string to parse
162    target expected matches/groups
163   
164    void printcode(char[] reg)
165    prints the generated D code from reg
166    char[] tr(char[]reg)(char[] convertable)   - fast transliteration through switches
167         e.g. tr!("/12345/abced/")(str)
168         
169    void printclasscode(char [] reg)
170    prints screg class as it should look after mixins are processed   
171   
172    void getclasscode(char [] reg)
173    returns a string containing screg class as it should look after mixins are processed       
174
175 Class:
176   screg - another way to search in strings
177   bool match(char[] searchstrin); - match searchstrin , returns true if match is found
178                                     if it is called again, it finds the next match
179                                     It is similar to search without using anchors
180   bool gmatch(char[] searchstrin); - match as if \G(?:searchstrin) , returns true if match is found
181                                     if it is called again, it finds the next match
182                                     
183   char[] _(int groupno) - returns given group matched (_(0) returns the whole string matched)
184   char[] [int groupno] - used as an array
185                            returns given group matched ([0] returns the whole string matched)
186   char[] ismatched(int groupno) - returns whether the given group matched
187   char[] exists(int groupno) - returns whether the given group exists, not necessarily matched
188   int pos() - return current position from where matching is attempted (as in Perl)   
189   void pos(int pin) - set current position where matching is attempted
190   void restart() - restart, set position to 0
191 Example:
192   auto reg1=new screg!("/ab/"); // regular expression to use
193   while (reg1.match("abxabxab"))
194   {
195     writefln(reg1._(0));
196   }   
197  
198   auto reg3=new screg!("/(?<day>monday)|(?<day>tuesday)|(?<day>wednesday)/");
199   if (reg3.match("wednesday"))
200   {
201      // reg3.groupname.day gives back the group number
202      writefln(reg3._(reg3.groupname.day)); //prints the matched day
203      writefln(reg3.getday());//prints the matched day in another way
204   }
205 */
206
207 // Points of interest:
208 // * The parser is able to treat all 'quantifier's in a single mixin function, while still applying
209 //   optimisations (eg, there's absolutely no difference between {1,} and "+").
210 // * There is absolutely no parameter passing inside the regexp engine. Even functions which
211 //   can't be inlined will have very low calling cost.
212 // * Consequently, the speed is excellent. The main unnecessary operations are the checks to see whether we
213 //   are at the end of the string.
214 //   This could be greatly improved by precalculating the minimum length required for a match,
215 //   at least for subsequences of fixed length.
216 // * Since each mixin can be given access to any desired runtime or compile-time parameters,
217 //   the scheme is extremely flexible.
218
219 module scregexp;
220 version(Tango)
221 {}
222 else
223 { version = Phobos;}
224 version(Phobos)
225 import std.string;
226 version(Tango)
227 {
228   import tango.text.Ascii;
229   alias toUpper toupper;
230   alias toLower tolower;
231   alias icompare icmp;
232   import tango.io.Stdout;
233 }
234 //---------------------------------------------------------------------
235 // Part 0 : Functions from the meta library
236 //---------------------------------------------------------------------
237
238 /******************************************************
239  *  uint atoui(char [] s, uint result = 0, int indx = 0);
240  *
241  *  Converts the leading decimal digits of an ASCII string to a uint.
242  */
243 uint atoui(char [] s, uint result = 0, int indx = 0)
244 {
245   if (s.length == indx)
246     return result;
247   else if (s[indx]<'0' || s[indx]>'9')
248       return result;
249     else
250       return atoui(s, result * 10 + s[indx] - '0', indx + 1);
251 }
252 char[] tostring(uint i)
253 {
254   uint i2 = i / 10;
255   uint digit = i - i2 * 10;
256   if (i >= 10)
257   {
258     char[] s;
259     s = tostring(i2)~cast(char)(digit + 48);
260     return s;
261   }
262   else
263   {
264     char[] s;
265     s = ""~cast(char)(digit + 48);
266     return s;
267   }
268 } 
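
// Illustrative sketch (not part of the original source): atoui and tostring are
// plain functions, so they can be evaluated at compile time with static assert.
// Assumptions: a D1 compiler, like the rest of this file. The version identifier
// ScregexpExamples is hypothetical; build with -version=ScregexpExamples to
// enable the checks.
version (ScregexpExamples)
{
  static assert(atoui("123") == 123);
  static assert(atoui("12,5}") == 12);   // parsing stops at the first non-digit
  static assert(atoui("") == 0);         // empty input yields the start value
  static assert(tostring(0) == "0");
  static assert(tostring(42) == "42");
}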
269
270 //---------------------------------------------------------------------
271 // Part I : Functions for parsing a regular expression string literal.
272 //---------------------------------------------------------------------
273 // None of these generate any code.
274
275 // returns the index of the first char in regstr which equals ch, or -1 if not found
276 // escaped chars are ignored
277 int unescapedFindFirst(char [] regstr, char ch, int indx = 0)
278 {
279   if (regstr.length <= indx)
280     return - 1; // not found
281   else if (regstr[indx] == ch) return indx;
282     else if (regstr[indx] == '\\')
283         // if it's escaped, prevent it from matching.
284         return unescapedFindFirst(regstr, ch, indx + 2);
285       else return unescapedFindFirst(regstr, ch, indx + 1);
286 }
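
// Illustrative sketch (not part of the original source) of how escapes affect
// the search: a backslash makes the scan skip the following character.
// Assumptions: a D1 compiler; ScregexpExamples is a hypothetical version
// identifier used only to keep these checks out of normal builds.
version (ScregexpExamples)
{
  static assert(unescapedFindFirst("a|b", '|') == 1);
  static assert(unescapedFindFirst("a\\|b|c", '|') == 4); // the \| at index 1..2 is skipped
  static assert(unescapedFindFirst("a\\|", '|') == -1);   // only an escaped | is present
  static assert(unescapedFindFirst("abc", '|') == -1);
}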
287
288 int sizeOfComment(char [] regstr)
289 {
290   int indx;
291   if (regstr.length <= indx)
292     return - 1;
293   while (regstr.length > indx )
294   {
295     if (regstr[indx] == 13)
296     {
297       indx++;
298       if (regstr.length>indx && regstr[indx] == 10)
299         return indx;
300       return indx;
301     }
302     if (regstr[indx] == 10)
303     return indx;
304     indx++;
305   }
306   return indx;
307 } 
308
309 // Returns the number of chars at the start of regstr which are made up by
310 // a repetition expression (+, *, ?, {,} )
311 int quantifierConsumed(char [] regstr)
312 {
313   if (regstr.length == 0) return 0;
314   else if (regstr[0] == '+' || regstr[0] == '*' || regstr[0] == '?') return 1;
315     else if (regstr[0] == '{') {
316         if (unescapedFindFirst(regstr, '}') ==  - 1) {
317           assert(0, "\nError: unmatched { in regular expression");
318             //writefln("Error: unmatched { in regular expression");
319             //assert(0);
320         } else return 1 + unescapedFindFirst(regstr, '}');
321       } else return 0;
322 }
323
324 int quantifiergreedinessConsumed(char [] regstr)
325 {
326   if (regstr.length == 0) return 0;
327   else if (regstr[0] == '?') return 1;
328     else return 0;
329 }
330
331 int quantifierpossessivenessConsumed(char [] regstr)
332 {
333   if (regstr.length == 0) return 0;
334   else if (regstr[0] == '+') return 1;
335     else return 0;
336 }
337
338 // The minimum allowable number of repetitions for this quantifier
339 uint quantifierMin(char [] regstr)
340 {
341   if (regstr[0] == '*' || regstr[0] == '?') return 0;
342   else if (regstr[0] == '+') return 1;
343     else {
344       assert (regstr[0] == '{') ;
345       return atoui(regstr[1..$]);
346   }
347 }
348
349 // The maximum allowable number of repetitions for this quantifier
350 uint quantifierMax(char [] regstr)
351 {
352   if (regstr[0] == '*' || regstr[0] == '+') return uint.max;
353   else if (regstr[0] == '?') return 1;
354     else if (regstr[0] == '{') {
355         if (unescapedFindFirst(regstr, ',') ==  - 1) // "{n}"
356           return quantifierMin(regstr);
357         else if (regstr[$ - 2] == ',') // "{n,}"
358             return uint.max;
359           else // "{n,m}"
360             return atoui(regstr[ 1 + unescapedFindFirst(regstr, ',') .. $]);
361       } else {
362         assert(0, "\nError: unsupported quantifier " ~ regstr);
363        
364   }
365 }
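
// Illustrative sketch (not part of the original source) of the quantifier helpers.
// quantifierConsumed measures a quantifier at the front of a longer string, while
// quantifierMin and quantifierMax appear to expect exactly the quantifier text
// (quantifierMax inspects regstr[$ - 2]).
// Assumptions: a D1 compiler; ScregexpExamples is a hypothetical version identifier.
version (ScregexpExamples)
{
  static assert(quantifierConsumed("{2,5}abc") == 5);
  static assert(quantifierConsumed("?x") == 1);
  static assert(quantifierConsumed("abc") == 0);     // no quantifier at the front
  static assert(quantifierMin("{2,5}") == 2);
  static assert(quantifierMax("{2,5}") == 5);
  static assert(quantifierMax("{3,}") == uint.max);  // open-ended upper bound
  static assert(quantifierMax("{4}") == 4);          // {n} means min == max
  static assert(quantifierMin("+") == 1);
  static assert(quantifierMax("?") == 1);
}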
366
367 bool quantifierGreediness(char [] regstr)
368 {
369   if (regstr.length == 0) {
370     return true;
371   }
372   else if (regstr[0] == '?') {
373       return false;
374     } else {
375       return true;
376   }
377 }
378
379 bool quantifierPossessiveness(char [] regstr)
380 {
381   if (regstr.length == 0) {
382     return false;
383   }
384   else if (regstr[0] == '+') {
385       return true;
386     } else {
387       return false;
388   }
389 }
390
391 // find the index of the first |, or -1 if not found.
392 // ignores escaped items, and anything in parentheses.
393 int findUnion(char [] regstr, bool isx, int indx = 0, int numopenparens = 0)
394 {
395   int findUnionc;
396   if (indx >= regstr.length)
397     findUnionc =  - 1;
398   else if (numopenparens == 0 && regstr[indx] == '|')
399       findUnionc = indx;
400     else if (regstr[indx] == ')')
401         findUnionc = findUnion(regstr, isx, indx + 1, numopenparens - 1);
402       else if (indx + 2<regstr.length && regstr[indx..indx + 3] == "(?:")
403           findUnionc = findUnion(regstr, isx, indx + 3, numopenparens + 1);
404         else if (indx + 2<regstr.length && regstr[indx..indx + 3] == "(?>")
405             findUnionc = findUnion(regstr, isx, indx + 3, numopenparens + 1);
406           else if (indx + 2<regstr.length && regstr[indx..indx + 3] == "(?=")
407               findUnionc = findUnion(regstr, isx, indx + 3, numopenparens + 1);
408             else if (indx + 2<regstr.length && regstr[indx..indx + 3] == "(?!")
409                 findUnionc = findUnion(regstr, isx, indx + 3, numopenparens + 1);
410               else if (indx + 2<regstr.length && regstr[indx..indx + 3] == "(?<")
411                   findUnionc = findUnion(regstr, isx, indx + 3, numopenparens + 1);
412                 else if (indx + 2<regstr.length && regstr[indx..indx + 3] == "(?'")
413                     findUnionc = findUnion(regstr, isx, indx + 3, numopenparens + 1);
414   //      else if (indx + 2<regstr.length && regstr[indx..indx + 2] == "(?"
415         //         && regstr[3]>="1" && regstr[3]<="9")
416       //      findUnionc = findUnion(regstr, indx + 3, numopenparens + 1);
417                   else if (regstr[indx] == '(')
418                       findUnionc = findUnion(regstr, isx, indx + 1, numopenparens + 1);
419                     else if (regstr[indx] == '\\') // skip the escaped character
420                         findUnionc = findUnion(regstr, isx, indx + 2, numopenparens);
421                       else
422                       {
423                         if (regstr[indx] == '[')
424                         {
425                           int brsize = unescapedFindFirst(regstr[indx..$], ']');
426                           if (brsize == - 1)
427                             assert(0, "\nError: unmatched [ in regular expression:"~regstr);
428                           findUnionc = findUnion(regstr, isx, indx + 1 + brsize, numopenparens);
429                         }
430                         else
431                           if (isx && regstr[indx] == '#')
432                             findUnionc = findUnion(regstr, isx, indx + 1 + sizeOfComment(regstr[indx..$]), numopenparens);
433                           else
434                             findUnionc = findUnion(regstr, isx, indx + 1, numopenparens);
435   }
436   return findUnionc;
437 }
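
// Illustrative sketch (not part of the original source): findUnion only reports
// a | that sits at nesting depth 0, outside escapes and character classes.
// Assumptions: a D1 compiler; ScregexpExamples is a hypothetical version identifier.
version (ScregexpExamples)
{
  static assert(findUnion("ab|cd", false) == 2);
  static assert(findUnion("a(b|c)d|e", false) == 7);  // the | inside ( ) is ignored
  static assert(findUnion("[a|b]c", false) == -1);    // the | inside [ ] is ignored
  static assert(findUnion("a\\|b", false) == -1);     // the escaped | is ignored
}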
438
439 // keeps going until the number of ) parens equals the number of ( parens.
440 // All escaped characters are ignored.
441 // BUG: what about inside [-] ?
442 int parenConsumed(char [] regstr, int numopenparens = 0)
443 {
444   if (regstr.length == 0) {
445        // pragma(msg, "Unmatched parenthesis");
446     assert(0,"\nUnmatched parenthesis");
447        // assert(0);
448   } else if (regstr[0] == ')') {
449       if (numopenparens == 1) return 1; // finished!
450       else return 1 + parenConsumed(regstr[1..$], numopenparens - 1);
451     } else if (regstr.length>2 && regstr[0..3] == "(?:") {
452         return 3 + parenConsumed(regstr[3..$], numopenparens + 1);
453       } else if (regstr.length>2 && regstr[0..3] == "(?>") {
454           return 3 + parenConsumed(regstr[3..$], numopenparens + 1);
455         } else if (regstr.length>2 && regstr[0..3] == "(?=") {
456             return 3 + parenConsumed(regstr[3..$], numopenparens + 1);
457           } else if (regstr.length>2 && regstr[0..3] == "(?!") {
458               return 3 + parenConsumed(regstr[3..$], numopenparens + 1);
459             } else if (regstr.length>2 && (regstr[0..3] == "(?<" || regstr[0..3] == "(?'") ) {
460                 uint namesize = groupnameConsumed(regstr[3..$]);
461                 if (regstr.length == 3 + namesize || regstr[0..3] == "(?<" && regstr[3 + namesize] != '>'
462                     || regstr[0..3] == "(?'" && regstr[3 + namesize] != '\'')
463                 {
464                   if (regstr[0..3] == "(?<")
465                     assert(0,"\nClosing > missing in regular expression:"~regstr);
466                   else
467                     assert(0,"\nClosing ' missing in regular expression:"~regstr);
468                 }
469                 uint start = 3 + namesize + 1;
470                 return start + parenConsumed(regstr[start..$], numopenparens + 1);
471               } else if (regstr[0] == '(') {
472                   return 1 + parenConsumed(regstr[1..$], numopenparens + 1);
473                 } else if (regstr[0] == '[') {
474                     uint brsize = 1 + unescapedFindFirst(regstr, ']');
475                     return brsize + parenConsumed(regstr[brsize..$], numopenparens);;
476                   } else if (regstr[0] == '\\' && regstr.length>1)
477         // ignore \(, \).
478                       return 2 + parenConsumed(regstr[2..$], numopenparens);
479                     else
480                       return 1 + parenConsumed(regstr[1..$], numopenparens);
481 }
482
483 // the naked term, with no quantifier. Either an atom, or a subsequence.
484 int atomConsumed(char [] regstr)
485 {
486   int atomConsumedc;
487    //pragma(msg,"atom consumed " ~ regstr);
488   if (regstr.length>2 && regstr[0..3] == "(?:") atomConsumedc = parenConsumed(regstr);
489   else if (regstr.length>2 && regstr[0..3] == "(?>") atomConsumedc = parenConsumed(regstr);
490     else if (regstr.length>2 && regstr[0..3] == "(?=") atomConsumedc = parenConsumed(regstr);
491       else if (regstr.length>2 && regstr[0..3] == "(?!") atomConsumedc = parenConsumed(regstr);
492         else if (regstr.length>2 && regstr[0..3] == "(?<") atomConsumedc = parenConsumed(regstr);
493           else if (regstr.length>2 && regstr[0..3] == "(?'") atomConsumedc = parenConsumed(regstr);
494             else if (regstr[0] == '(') atomConsumedc = parenConsumed(regstr);
495               else if (regstr[0] == '[') atomConsumedc = 1 + unescapedFindFirst(regstr, ']');
496                 else if (regstr[0] == ')') {assert(0, "\nError: ) encountered without an opening ( in regular expression"~regstr);}
497                   else if (regstr[0] == ']') {assert(0, "\nError: ] encountered without an opening [ in regular expression"~regstr);}
498                     else if (regstr[0] == '\\') { // escape character
499                         if (regstr.length>1) {
500                           if (regstr[1] == 'k')
501                           {
502                             if (regstr.length>2)
503                             {
504                               if (regstr[2] == '<')
505                               {
506                                 uint namesize = groupnameConsumed(regstr[3..$]);
507                                 if (regstr.length == 3 + namesize || regstr[3 + namesize] != '>')
508                                 {
509                                   assert(0,"\nClosing > missing in regular expression"~regstr);
510                                 }
511                                 return atomConsumedc = 3 + namesize + 1;
512                               }
513                               if (regstr[2] == '\'')
514                               {
515                                 uint namesize = groupnameConsumed(regstr[3..$]);
516                                 if (regstr.length == 3 + namesize || regstr[3 + namesize] != '\'')
517                                 {
518                                   assert(0,"\nClosing ' missing in regular expression"~regstr);
519                                 }
520                                 return atomConsumedc = 3 + namesize + 1;
521                               }
522                               assert(0,"\nError: \\k must be followed by 'name' or <name> in regular expression"~regstr);
523                             }
524                             else
525                               assert(0, "\nError: \\k must be followed by 'name' or <name> in regular expression"~regstr);
526                           }
527                           else
528                             atomConsumedc = 2;
529                         } else {
530                           assert(0, "\nError: Regexp must not end with \\ in regular expression"~regstr);
531            // writefln("Error: Regexp must not end with \\ ");
532            // assert(0);
533                         }
534                       } else if (regstr[0] == '@') { // NONSTANDARD: referenced parameter
535                           atomConsumedc = 2;
536                         } else atomConsumedc = 1; // match single char
537   return atomConsumedc;
538 }
539
540 int groupnameConsumed(char [] regstr)
541 {
542   int pp = 0;
543   if (regstr.length>0 && (
544                           regstr[pp] >= 'a' && regstr[pp] <= 'z' || regstr[pp] >= 'A' && regstr[pp] <= 'Z'))
545   {
546     while (regstr.length>pp && (regstr[pp] >= 'a' && regstr[pp] <= 'z'
547                                || regstr[pp] >= 'A' && regstr[pp] <= 'Z'
548                                || regstr[pp] >= '0' && regstr[pp] <= '9'
549                                ))
550     {
551       pp++;
552      
553     }
554     return pp;
555   }
556   else
557     assert(0, "\nError: Name of the group is missing after (?< in regular expression"~regstr);
558   return 0;
559 }
560
561 int atomCharacterConsumed(char [] regstr, bool isx, out int whitespaceno)
562 {
563   int atomConsumedc;
564  // if (options["x"])
565  // {
566   whitespaceno = 0;
567  
568   if (regstr[0] ==' ' || regstr[0] == '\t' || regstr[0] == '\n')
569   {
570     whitespaceno = 1;
571   }
572   if (isx && regstr[0] == '#')
573   return 0;
574  // }
575    //pragma(msg,"atom consumed " ~ regstr);
576   if (regstr[0] == '\\') { // escape character
577     if (regstr.length>1) {
578       if (!((regstr[1] >= '0' && regstr[1] <= '9') || regstr[1] == 's'
579             || regstr[1] == 'd' || regstr[1] == 'w'
580             || regstr[1] == 'S' || regstr[1] == 'k'
581             || regstr[1] == 'D' || regstr[1] == 'W'
582             || regstr[1] == 'A' || regstr[1] == 'z' ))
583         atomConsumedc = 2;
584       else
585         atomConsumedc = 0;
586     } else {
587       assert(0, "\nError: Regexp must not end with \\");
588            // writefln("Error: Regexp must not end with \\ ");
589            // assert(0);
590     }
591   } else if (regstr[0] == '@' || regstr[0] == '$' || regstr[0] == '^'
592              || regstr[0] == '.' || regstr[0] == '[' || regstr[0] == '(' ) { // NONSTANDARD: referenced parameter
593       atomConsumedc = 0;
594     } else
595       atomConsumedc = 1; // match single char
596   return atomConsumedc;
597 }
598
599 // parses a term from the front of regstr (which must not be empty).
600 // consisting of an atom, optionally followed by a quantifier.
601 int termConsumed(char [] regstr)
602 {
603  /*   int atomC = atomConsumed(regstr);
604     int quantifierC= quantifierConsumed(regstr[atomC..$]);
605     int quantifiergreedinessC=quantifiergreedinessConsumed(regstr[atomC + quantifierC ..$]);
606     int termConsumed = atomC + quantifierC+quantifiergreedinessC;
607     return termConsumed;*/
608   uint ac = atomConsumed(regstr);
609   return ac +
610   quantifierConsumed(regstr[ac..$]) +
611   quantifiergreedinessConsumed(regstr[ac + quantifierConsumed(regstr[ac..$]) ..$]) +
612   quantifierpossessivenessConsumed(regstr[ac + quantifierConsumed(regstr[ac..$]) ..$]);
613 }
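
// Illustrative sketch (not part of the original source): a term is one atom plus
// an optional quantifier and an optional lazy (?) or possessive (+) suffix.
// The expected lengths below were worked out by hand from the functions above.
// Assumptions: a D1 compiler; ScregexpExamples is a hypothetical version identifier.
version (ScregexpExamples)
{
  static assert(termConsumed("a*?bc") == 3);          // a, *, lazy ?
  static assert(termConsumed("(ab)+c") == 5);         // (ab), +
  static assert(termConsumed("\\d+") == 3);           // \d, +
  static assert(termConsumed("[a-z]{2,3}+x") == 11);  // [a-z], {2,3}, possessive +
}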
614
615 //parses a character sequence without quantifiers
616 int characterSequenceConsumed(char []regstr, bool [char[]] options, out int realchars)
617 {
618   realchars = 0;
619   if (regstr.length == 0)
620   {
621     return 0;
622   }
623   int whitespaceno;
624   int atomC = atomCharacterConsumed(regstr, options["x"], whitespaceno);
625   if (atomC>0)
626   {
627     int quantifierC = quantifierConsumed(regstr[atomC..$]);
628     if (quantifierC == 0)
629     {
630       if (options["x"] && whitespaceno>0)
631       {
632       }
633       else
634       {
635         realchars = 1;
636       }
637       int crealchars;
638       atomC += characterSequenceConsumed(regstr[atomC..$], options, crealchars);
639       realchars += crealchars;
640     }
641     else
642     {
643       atomC = 0;
644     }
645   }
646   return atomC;
647 }
648
649 //---------------------------------------------------------------------
650 // Part II: mixins which generate the final code
651 //---------------------------------------------------------------------
652 // Unlike most regexp engines, which turn the pattern string into a table-based state machine,
653 // this one generates a binary tree of nested functions. Each node in the tree corresponds to
654 // a D template, and is generated as a mixin.
655
656 // At compile time, each ctfe is passed a substring of the regexp string.
657 // It generates a member function bool aname(), which updates a pointer p,
658 // and returns true if a match was found.
659
660 // Each ctfe has access to the following values:
661 // At compile time:
662 //     fullpattern -- the complete, unparsed regular expression string
663 // At run time:
664 //     searchstr (read only) -- the string being searched
665 //     p --- the first character in searchstr which is not yet matched.
666 //     param[0..8] -- the quasi-static parameter strings @1..@9 to match.
667
668 // Additional variables or constants can be added as desired.
669
670 // Most of the complexity in the regexp engine comes from the optional quantifiers.
671 // In general, they can only determine how far to match by testing if the entire remainder
672 // of the pattern can be matched.
673 //
674 // Each ctfe also receives a function 'next'. This has a member bool fn() which
675 // returns true if the remainder of the regexp match is successful.
676 // All regexps must ensure that next is called.
677
678 // Note that unless p is reset to 0, it will automatically behave as a global search,
679 // continuing from the last place it left off.
680
681
682 int findOptions(char [] pattern)
683 {
684   for (int i = 0;i<pattern.length;i++)
685   {
686     if (pattern[pattern.length - 1 - i..pattern.length - i] == "/" )
687     {
688       if (i + 2 <= pattern.length && pattern[pattern.length - 2 - i] == '\\')
689       {
690         return pattern.length;
691       }
692       return pattern.length - i - 1;
693     }
694   }
695   return pattern.length;
696 }
697 char[] removeGroupCode(char[] code)
698 {
699   char[] code2;
700   for (int p = 0;p<code.length - 10;p++)
701   {
702     if (code[p..p + 9] == "//Regexp:")
703     {
704       while (code[p] != 10 && code[p] != 13)
705       {
706         code2~=code[p];
707         p++;
708       }
709     }
710     if (code[p..p + 10] == "/*gstart*/")
711     {
712       p += 10;
713       while (code[p..p + 8] != "/*gend*/")
714       {
715         p++;
716       }
717       p += 7;
718     }
719     else
720       code2~=code[p];
721   }
722   code2~=code[code.length - 10..code.length];
723   return code2;
724 }
725
726 char[] parseRegexp(char [] pattern, bool getcode = false)
727 {
728   char[] endSequence = alwaysTrue() ;
729   int groupno = 0;
730   char[] pattern2;
731   if (pattern.length>0 && pattern[0] == '/' )
732   {
733     pattern2 = pattern[1..$];
734   }
735   else
736   {
737     pattern2 = pattern;
738   }
739   int opt = findOptions(pattern2);
740   bool [char[]] options = ["i":false, "x":false, "s":false, "m":false];
741    // options["i"]=false;
742  
743   if (opt != pattern2.length)
744   {
745     foreach(c;pattern2[opt + 1..$])
746     {
747       options[""~c] = true;
748     }
749     //opt--;
750 //      assert(0,pattern~tostring(opt));
751   }
752   int globalfuncno;
753   char[][] groupnames;
754   char[] groupdcl;
755   char[] code = "//Regexp:"~toLiteralString(pattern2)~ "\n"~
756   endSequence~regSequence("engine", groupno, pattern2[0..opt],
757                           options, "next_alwaystrue", groupnames, groupdcl, globalfuncno);
758   char[] decl;
759   /*for (int i = 1;i <= groupno;i++)
760   {
761     decl~="int bracketend"~tostring(i)~"=-1;\n";
762   }*/
763   if (groupno >= 1)
764     decl~="int bracketend["~tostring(groupno + 1)~"]=-1;\n";
765  
766  
767  // if (groupnames.length>0)
768   {
769     decl~="struct groupnamerec {\n"~groupdcl;
770  /* foreach (gname; groupnames.keys)
771   {
772     
773   }*/
774   /*foreach (gname,gno; groupnames)
775   {
776     decl~="uint "~gname~"="~tostring(gno)~";\n";
777   }*/
778     decl~="}\ngroupnamerec groupname;\n";
779   }
780   if (getcode && groupdcl.length>5)
781   for (int i = 0;i<groupdcl.length - 5;i++)
782   {
783     if (groupdcl[i..i + 5] =="uint ")
784     {
785      
786       for (int j = i + 5;j<groupdcl.length;j++)
787       {
788         if (groupdcl[j] == '=')
789         {
790           decl~="char[] get"~
791           groupdcl[i + 5..j]~"(){ return _(groupname."~groupdcl[i + 5..j]~");}\n";
792          
793         }
794       }
795     }
796   }
797   //assert(0,"at the end , the groupno is "~tostring(groupno));
798   if (groupno == 0) //remove group related code
799   {
800     code = removeGroupCode(code);
801   }
802   return decl~code;
803 }
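// Worked example (not part of the original file; assumes findOptions does
// nothing beyond the backward scan for the last '/' visible at its end):
//   pattern  = "/.+/s"
//   pattern2 = ".+/s"                 (leading '/' dropped)
//   opt      = 2                      (index of the closing '/')
//   regex    = pattern2[0..opt]       = ".+"
//   flags    = pattern2[opt + 1..$]   = "s", so options["s"] becomes true
// For a pattern without a closing '/', e.g. ".+", opt equals pattern2.length
// and all four options stay false.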
804
805 char[] alwaysTrue() // used to mark the end of a sequence
806 {
807   return "bool next_alwaystrue () { return true; }\n";
808 }
809 private struct retSequence
810 {
811   char[] code;
812   int groupno;
813 } 
814
815 bool getFirstChar(char [] regstr,char c)
816 {
817  return false; // stub: no first-character analysis yet, so regSequence never takes the switch path
818 }  
819
820
821 // regstr is a sequence of productions, possibly containing a union
822 char[] regSequence(char[] fnname, ref int groupno, char [] regstr,
823                    bool[char[]] options, char[] next,
824                    ref char[][] groupnames, ref char[] groupdcl, ref int globalfuncno)
825 {
826   char [] code = "";
827   int fu = findUnion(regstr, options["x"]);
828    //Stdout("regSequence:"~regstr).newline;
829   if (fu ==  - 1) {
830         // No unions to worry about
831    
832     code = regNoUnions(fnname, groupno, regstr, options, next, groupnames, groupdcl, globalfuncno);
833   } else {
834     int[][] cases;
835     bool[] validcase;
836     for (int j=0;j<256;j++)
837     {
838       validcase~=false;
839       cases~=[];
840     }
841     int[] altgroupno;
842     int altno = 1;
843     int tofu;
844     tofu = - 1;
845     int newfu = fu;
846     char[] controllercode;
847     int nocases = 0;
848     altgroupno~=0;//dummy
849        controllercode~= "
850             int oldp = p;\n";
851               code ~="
852         bool "~ fnname~ "() { //regSequence options
853           
854           ";
855     do
856     {
857       fu = tofu + 1;
858       tofu = newfu+tofu + 1;
859       char c;
860       bool firstchar = false;
861       nocases++;
862       altgroupno~=groupno;
863      // Stdout.format("{} {}\n",fu,tofu);
864       if (getFirstChar(regstr[fu..tofu], c))
865       {
866         cases[c]~=altno;
867         validcase[c] = true;
868       }
869       else
870       { // wrap it up
871         if (nocases<4) // 2+ 1
872         {
873           for (int j = nocases - 1;j >= 1 ;j--)
874           {
875             controllercode~= "
876             if (option"~tostring(altno - j)~"()) return true;
877             p = oldp;/*gstart*/
878             group.length="~tostring(altgroupno[altno - j] + 1)~";/*gend*/
879                  ";
880           }
881         }
882         else //more than 2 consecutive cases go into a switch
883         {
884           controllercode~="
885                         if (searchstr.length>p)
886                         switch(searchstr[p]) {";
887           for (int cc = 0;cc<256 ;cc++)
888           {
889             if (validcase[cc])
890             {
891               controllercode~="case "~tostring(cc)~":"; // label is the character code
892               for (int j = 0;j<cases[cc].length;j++)
893               {
894                 controllercode~= "
895             if (option"~tostring(cases[cc][j])~"()) return true;
896             p = oldp;/*gstart*/
897             group.length="~tostring(altgroupno[cases[cc][j]] + 1)~";/*gend*/
898                  ";
899                
900               }
901               controllercode~="break;\n";
902              
903              validcase[cc]=false;
904                          cases[cc]=[]; 
905             }
906           }
907           controllercode~="default: break;\n}\n"; // close the generated switch
908         }
909         nocases=0;
910         controllercode~= "
911             if (option"~tostring(altno)~"()) return true;
912             p = oldp;/*gstart*/
913             group.length="~tostring(altgroupno[altno] + 1)~";/*gend*/
914                  ";
915                
916       }
917      
918      
919       code ~=regSequence("option"~tostring(altno), groupno, regstr[fu..tofu],
920                          options, next, groupnames, groupdcl, globalfuncno);
921      
922       altno++;
923      
924       newfu = findUnion(regstr[tofu + 1..$], options["x"]);
925     } while (newfu != - 1);
926    
927     code~=
928     regSequence("option"~tostring(altno), groupno, regstr[tofu + 1..$], options, next, groupnames, groupdcl, globalfuncno) ;
929         controllercode~="
930             if (option"~tostring(altno)~"()) return true;
931             p = oldp;/*gstart*/
932             group.length="~tostring(groupno + 1)~";/*gend*/
933            // writefln(\"regSequence\",~"~toLiteralString(regstr)~");
934             return false;
935         }";
936     code~=controllercode;
937   }
938   return code;
939 }
940
941 int countGroups(char [] regstr)
942 {
943   if (regstr.length == 0)
944   {
945     return 0;
946   }
947   else if (regstr.length >2 && regstr[0..3] == "(?:")
948   {
949     return 0;
950   }
951   else if (regstr.length >2 && regstr[0..3] == "(?>")
952   {
953     return 0;
954   }
955   else if (regstr.length >2 && regstr[0..3] == "(?=")
956   {
957     return 0;
958   }
959   else if (regstr.length >2 && regstr[0..3] == "(?!")
960   {
961     return 0;
962   }
963   else if (regstr[0] == '(')
964   {
965     return 1;
966   }
967   else
968   {
969     return 0;
970   }
971 }
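// Usage sketch (not part of the original file): countGroups only inspects the
// opening of regstr, so it reports whether the string starts with a capturing
// group. Assumes a D1 compiler, where string literals convert to char[].
unittest
{
  assert(countGroups("") == 0);
  assert(countGroups("ab") == 0);
  assert(countGroups("(?:ab)") == 0);   // non-capturing group
  assert(countGroups("(?=ab)") == 0);   // lookahead
  assert(countGroups("(ab)") == 1);     // plain capturing group
  assert(countGroups("(?<n>ab)") == 1); // named capturing group
}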
972
973 int findLastLetter(char[] fnname)
974 {
975   int i = 0;
976   foreach (c;fnname)
977   {
978     if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')))
979     {
980       return i;
981     }
982     i++;
983   }
984   return i;
985 } 
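// Usage sketch (not part of the original file): despite its name,
// findLastLetter returns the index of the first non-letter, i.e. the length
// of the leading run of ASCII letters. Assumes a D1 compiler, where string
// literals convert to char[].
unittest
{
  assert(findLastLetter("") == 0);
  assert(findLastLetter("next") == 4);
  assert(findLastLetter("option12") == 6);
  assert(findLastLetter("1abc") == 0);
}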
986
987 // regstr is a sequence of terms, all of which must be matched
988 // Does not contain any unions
989 char[] regNoUnions(char[] fnname, ref int groupno, char [] regstr,
990                    bool [char []] options, char[] next,
991                    ref char[][] groupnames, ref char[] groupdcl, ref int globalfuncno)
992 {
993   char[] code;
994   int skipcs;
995   int skip;
996    // assert(0,"no way "~regstr);
997   //  writefln("regNoUnions "~regstr);
998   //if (regstr.length>0)
999  // {
1000   if (regstr == "")
1001   {
1002     return "bool "~fnname~"(){ return "~next~"();}";
1003   }
1004   code =
1005   "//regNoUnions "~toLiteralString(regstr[0..termConsumed(regstr)]) ~"\n";
1006  // }
1007   if (options["x"])
1008   {
1009     int i = 0;
1010     while (i<regstr.length && (regstr[i] ==' ' || regstr[i] == '\t' || regstr[i] == '\n'))
1011     {
1012       i += 1;
1013     }
1014     if (i<regstr.length && regstr[i] == '#')
1015     {
1016       int soc = sizeOfComment(regstr[i..$]);
1017       if (soc>0)
1018         i += 1 + soc;
1019     }
1020     if (i>0)
1021     {
1022       if (i<regstr.length)
1023       {
1024         return regNoUnions(fnname, groupno, regstr[i..$], options, next, groupnames, groupdcl, globalfuncno);
1025       }
1026       else
1027       {
1028         return "bool "~fnname~"(){ return "~next~"();}";
1029       }
1030     }
1031   }
1032  //assert((regstr.length ==0),"regstr.length cannot be zero " ~ regstr );
1033   if (regstr.length == termConsumed(regstr)) {
1034         // there's only a single item (possibly including a quantifier)
1035        // pragma(msg, "\nhere at the moment3");
1036     code~= regTerm(fnname, groupno, regstr, options, next, groupnames, groupdcl, globalfuncno);
1037     //  pragma(msg, "\nhere at the moment4");
1038   } else {
1039     int realskipcs;
1040     skipcs = characterSequenceConsumed(regstr, options, realskipcs);
1041     if (regstr.length == skipcs)
1042     {
1043       code~= regCharacterSequence(fnname, groupno, regstr, options, next);
1044     }
1045     else
1046     {
1047    // writefln(skipcs);
1048     //int skip;
1049       char[] second;
1050       if (realskipcs>1)
1051       {
1052         skip = skipcs;
1053        
1054       }
1055       else
1056       {
1057         skip = termConsumed(regstr);
1058       }
1059 //    int g=groupno+countGroups(regstr);
1060       int g = groupno;
1061       char[] newnextname = "next"~tostring(globalfuncno);
1062       globalfuncno++;
1063       if (realskipcs>1)
1064       {
1065     //   assert(0,"hi"~tostring(skip)~regstr);
1066         second = regSequence(newnextname, groupno, regstr[skip..$],
1067                              options, next, groupnames, groupdcl, globalfuncno);
1068         code~= "bool "~fnname~"() { 
1069                       
1070             "~second~regCharacterSequence("first", g, regstr[0..skip], options, newnextname)~" // regTerm
1071             return first();
1072         }
1073       ";
1074       }
1075       else
1076       {
1077         int oldgroupno = groupno;
1078         int glfno = 0;
1079         char[][] groupnamesdummy;
1080         char[] groupdcldummy;
1081         //regTerm("", groupno, regstr[0..skip], options, "", groupnames,groupdcl,glfno); //just to get groupno
1082         char[] first = regTerm("first", groupno, regstr[0..skip], options, newnextname, groupnames, groupdcl, globalfuncno);
1083         second = regSequence(newnextname, groupno, regstr[skip..$],
1084                              options, next, groupnames, groupdcl, globalfuncno);
1085         code~="
1086         bool "~fnname~"() {           
1087             "~second~first~" // regTerm
1088             return first();
1089         }
1090       ";
1091       }
1092     }
1093   }
1094   return code;
1095 }
1096 char[] regStop(char[] fnname, int groupno, char[] next)
1097 { 
1098   return "
1099      bool "~fnname~"() { //regStop
1100      bracketend["~tostring(groupno)~"]=p;
1101      return "~next~"();   
1102      }";
1103 }
1104
1105 char[] regCharacterSequence(char[] fnname, int groupno, char [] regstr, bool [char []] options, char[] next)
1106 {
1107   int i = 0;
1108   int matchsize = 0;
1109   char [] match = "";
1110   char[] code;
1111   while (i<regstr.length)
1112   {
1113     if (i + 1 < regstr.length && regstr[i] == '\\') { // an escape needs a following character
1114       match~=toLiteralString(regstr[i + 1]);
1115       i += 2;
1116       matchsize++;
1117     } else {
1118       if (options["x"] && (regstr[i] ==' ' || regstr[i] == '\t' || regstr[i] == '\n'))
1119       {
1120       }
1121       else
1122       {
1123         match~=toLiteralString(regstr[i]);
1124         matchsize++;
1125       }
1126       i++;
1127         // match single character
1128     }
1129    
1130   }
1131   if (!options["i"])
1132   {
1133   // writefln("match "~match);
1134     code ="bool "~fnname~"() {             
1135                 if (p+"~tostring(matchsize)~">searchstr.length || searchstr[p..p+"~tostring(matchsize)~"]!=\""~match~"\") return false;
1136                 p+="~tostring(matchsize)~";
1137                 return "~next~"();
1138             }
1139           ";
1140   }
1141   else
1142   {
1143     code ="bool "~fnname~"() {
1144                 if (p+"~tostring(matchsize)~">searchstr.length || icmp(searchstr[p..p+"~tostring(matchsize)~"],\""~match~"\")!=0) return false;
1145                 p+="~tostring(matchsize)~";
1146                 return "~next~"();
1147             }
1148           ";
1149   }
1150   return code;
1151 }
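// Illustration (not part of the original file): for fnname "first",
// regstr "abc", next "next0" and the i option unset, the returned string is
// roughly the following. This assumes toLiteralString leaves plain letters
// unchanged and tostring renders decimal digits; both are defined elsewhere
// in this file, and the exact whitespace differs.
//
//   bool first() {
//       if (p+3>searchstr.length || searchstr[p..p+3]!="abc") return false;
//       p+=3;
//       return next0();
//   }
//
// With the i option set, the comparison becomes
// icmp(searchstr[p..p+3],"abc")!=0 instead.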
1152
1153
1154 // the term without a quantifier. Here we deal with embedded subsequences.
1155 char[] regSingleTerm(char[] fnname, ref int groupno, char [] regstr, bool [char []] options,
1156                      char[] next, ref char[][] groupnames, ref char[] groupdcl, int globalfuncno)
1157 {
1158   char[] code;
1159  
1160   if (regstr.length>2 && regstr[0..3] == "(?:") {
1161             // A sequence always calls next.
1162     code = regSequence(fnname, groupno, regstr[3..$ - 1], options, next, groupnames, groupdcl, globalfuncno);
1163   }
1164   else if (regstr.length>2 && regstr[0..3] == "(?>") {
1165             // A sequence always calls next.
1166       code = regSequence("independent", groupno, regstr[3..$ - 1],
1167                          options, "next_alwaystrue",
1168                          groupnames, groupdcl, globalfuncno);
1169       code ="bool "~fnname~"() {\n"~code ;
1170       code~="return independent() && "~next~"();}\n";
1171     }
1172     else if (regstr.length>2 && regstr[0..3] == "(?=") {
1173             // A sequence always calls next.
1174         code = regSequence("lookahead", groupno, regstr[3..$ - 1], options,
1175                            "next_alwaystrue", groupnames, groupdcl, globalfuncno);
1176         code ="bool "~fnname~"() {\n"~code ;
1177         code~=" int oldp=p;
1178               if (!lookahead()) return false;
1179               p=oldp;
1180                       return "~next~"();           
1181             }\n";
1182       }
1183       else if (regstr.length>2 && regstr[0..3] == "(?!") {
1184             // A sequence always calls next.
1185           code = regSequence("negativelookahead", groupno, regstr[3..$ - 1], options,
1186                              "next_alwaystrue", groupnames, groupdcl, globalfuncno);
1187           code ="bool "~fnname~"() {\n"~code ;
1188           code~="   int oldp=p;
1189               if (negativelookahead()) return false;
1190               p=oldp;
1191                       return "~next~"();           
1192             }\n";
1193         }
1194         else if (regstr[0] == '(' || regstr.length>2 && (regstr[0..3] == "(?<" || regstr[0..3] == "(?'")) {
1195             // A sequence always calls next.
1196             groupno++;
1197             char[] stop = regStop("stop"~tostring(groupno), groupno, next);
1198             char[] bracketvar = "bracketend["~tostring(groupno)~"]";
1199             int cgroupno = groupno;
1200             uint seqstart = 1;
1201             char [] setgroupname;
1202             char[] savegroupnamevalue;
1203             char[] restoregroupnamevalue;
1204             if (regstr.length>2 && (regstr[0..3] == "(?<" || regstr[0..3] == "(?'"))
1205             {
1206               char[] name = regstr[3..3 + groupnameConsumed(regstr[3..$])];
1207          //groupnames[name]=groupno; does not work as ctfe
1208               int groupexists;
1209               groupexists = getGroupno(groupnames, name);
1210              
1211          /*if (groupdcl.length>5)
1212          for (int i=0;i<groupdcl.length-5;i++)
1213          {
1214              if (groupdcl[i..i+5]=="uint ")
1215              {
1216                 
1217                  for (int j=i+5;j<groupdcl.length;j++)
1218                  {
1219                      if (groupdcl[j]=='=')
1220                      {
1221                         if (groupdcl[i+5..j]==name)
1222                         {
1223                           groupexists=true;
1224                           break;
1225                         } 
1226                      } 
1227                     }   
1228              }
1229          }*/
1230               seqstart = 3 + groupnameConsumed(regstr[3..$]) + 1;
1231               setgroupname = "groupname."~name~"="~tostring(groupno)~";\n";
1232               savegroupnamevalue ="int tempgroupname=groupname."~name~";\n";
1233               restoregroupnamevalue = "groupname."~name~"=tempgroupname;\n";
1234               if (groupexists == - 1)
1235               {groupdcl~="uint "~name~"="~tostring(groupno)~";\n";
1236                   //if (groupnames.length<=groupno)
1237                   //  groupnames.length= groupno;
1238                   //groupnames[groupno]=name;
1239                 groupnames~=name~"="~tostring(groupno);
1240               }
1241             }
1242            
1243            
1244             code =" //single term
1245             bool "~fnname~"() { // (
1246              "~stop~"
1247               //int "~bracketvar~";
1248             "~regSequence("a", groupno, regstr[seqstart..$ - 1], options, "stop"~tostring(groupno), groupnames, groupdcl, globalfuncno)~ "
1249               int oldp=p;
1250               if (group.length<="~tostring(cgroupno)~")
1251               {
1252                  group.length="~tostring(cgroupno + 1)~";
1253               }           
1254               group["~tostring(cgroupno)~"]=grouprec(oldp,-1);
1255               "~savegroupnamevalue~"
1256               "~setgroupname~"
1257               bool r=a();
1258               if (r) 
1259               {             
1260               group["~tostring(cgroupno)~"]=grouprec(oldp,"~bracketvar~");
1261            //   writefln(\"grouprec1 \",oldp,\" \",p);
1262               }
1263               else
1264               {
1265               "~restoregroupnamevalue~"
1266               group.length="~tostring(cgroupno)~"; 
1267               } 
1268              return r;
1269               }
1270             ";
1271           } else {
1272             // A simple atom doesn't call next, so we need to do it here.
1273             code ="bool "~fnname~"() {
1274                 "~regAtom(groupno, regstr, groupnames, options) ~ "
1275                 return fn() && "~next~"();
1276             }
1277           ";
1278   }
1279   return code;
1280 }
1281
1282 // Evaluate one term (without quantifier).
1283 // This generated helper function has two purposes:
1284 // (1) to restore the 'p' pointer when we return.
1285 // (2) ensure that at least one character was consumed
1286 char[] regSequenceDontUpdateP(char[] fnname, int groupno, char [] regstr, bool [char [] ] options, ref int globalfuncno)
1287 {
1288   //int globalfuncno=0;
1289   char[][] groupnames; //dummy
1290   char[] groupdcl;
1291   return "bool "~fnname~"() { //regSequenceDontUpdateP
1292         " ~regSequence("x", groupno, regstr, options, "next_alwaystrue",
1293                        groupnames, groupdcl, globalfuncno)~"
1294         // It's only a successful match if _something_ was consumed
1295         if (p==theinitialp) return false;     
1296         int oldp = p;
1297         if (!x()) return false;
1298         p = oldp; 
1299         return true;               
1300     }
1301   ";
1302 }
1303
1304 //  Calls the naked term twice, but only updates 'p' after the first one.
1305 // Evaluate the term, knowing that what comes after will be the same as this.
1306 char[] regTermTwice(char[] fnname, int groupno, char [] regstr, bool [char[]] options, int t, ref int globalfuncno)
1307 {
1308   char[] code;
1309   char[] groupdcl;
1310   char[][] groupnames; //dummy
1311   if (regstr.length>2 && (regstr[0..3] == "(?:" || regstr[0..3] == "(?>"
1312                           || regstr[0..3] == "(?=" || regstr[0..3] == "(?!")) {
1313    
1314     char [] suddendeath = regSequenceDontUpdateP("suddendeath", groupno, regstr[3..t - 1],
1315                                                  options, globalfuncno); //groupno may be incorrect but does not matter
1316    
1317     code ="
1318         bool "~fnname~"()
1319         {
1320             // While evaluating this first sequence, if this is a sequence
1321             // that potentially has zero length (ie, everything is a *, ? or {m,} term),
1322             // each term should attempt to consume at least one character if possible.
1323             int theinitialp = p; 
1324          " ~suddendeath ~regSequence("a", groupno, regstr[3..t - 1], options, "suddendeath",
1325                                      groupnames, groupdcl, globalfuncno)~
1326     "return a();
1327        }
1328        ";
1329   }
1330   else if (regstr[0] == '(') {
1331       char []suddendeath = regSequenceDontUpdateP("suddendeath", groupno + 1, regstr[1..t - 1], options, globalfuncno);
1332       int g = groupno + 1;
1333       code ="
1334         bool "~fnname~"()
1335         {
1336             // While evaluating this first sequence, if this is a sequence
1337             // that potentially has zero length (ie, everything is a *, ? or {m,} term),
1338             // each term should attempt to consume at least one character if possible.
1339             int theinitialp = p;
1340            "~suddendeath~ regSequence("a", g, regstr[1..t - 1], options, "suddendeath",
1341                                       groupnames, groupdcl, globalfuncno)~ "
1342              int oldp=p;             
1343               bool r=a(); /*gstart*/
1344               if (group.length<="~tostring(groupno + 1)~")
1345               {
1346                  group.length="~tostring(groupno + 2)~";
1347               }
1348               group["~tostring(groupno + 1)~"]=grouprec(oldp,-1);
1349               if (r)
1350               {
1351           //    writefln(\"grouprec \",oldp,\" \",p);
1352               }
1353               else
1354               {
1355               group.length="~tostring(groupno + 1)~"; 
1356               } /*gend*/
1357              return r;
1358             //return a.fn();
1359        }
1360        ";
1361     } else {
1362       code ="
1363             bool "~fnname~"() {
1364                 // It's easy with atoms, because we know they always eat something.
1365                 // BUG: Maybe this will fail when null @n strings are passed in?               
1366                 "~regAtom(groupno, regstr, groupnames, options)~"
1367                 return fn();
1368             }
1369     ";
1370   }
1371   return code;
1372 }
1373
1374
1375 // the atom, optionally followed by a quantifier.
1376 // Here we deal with all kinds of repetition,
1377 // but we make different optimisations depending on the allowable repeats.
1378 char[] regTerm(char[] fnname, ref int groupno, char [] regstr, bool [char []] options,
1379                char[] next, ref char[][] groupnames, ref char[] groupdcl, ref int globalfuncno)
1380 {
1381   char [] code;
1382   if (atomConsumed(regstr) == regstr.length) {
1383    
1384             // there is no quantifier, just use the naked term
1385     code = regSingleTerm(fnname, groupno, regstr, options, next, groupnames, groupdcl, globalfuncno);
1386   } else {
1387     int t = atomConsumed(regstr);
1388     uint qc = quantifierConsumed(regstr[t..$]);
1389     uint repmin = quantifierMin(regstr[t..$]);
1390     uint repmax = quantifierMax(regstr[t..$]);
1391     uint greedy = quantifierGreediness(regstr[t + qc..$]);
1392     uint possess = quantifierPossessiveness(regstr[t + qc..$]);
1393     uint cg = countGroups(regstr);
1394    
1395     code ="
1396         bool "~fnname~"(){
1397           
1398             
1399             // HORRENDOUSLY inefficient! In some cases, we generate the quantified term THREE TIMES!
1400             // The first one contains the rest of the search tree.
1401             // This is used when we think we can do (atom).(next) for an early exit
1402             "~regTerm("atomAndNext", groupno, regstr[0..t], options, next, groupnames, groupdcl, globalfuncno) ~ "
1403
1404         //    debug writefln(fullpattern, \" Quantifier \",regstr ,  \" starting at \", searchstr[p..$]);
1405             ";
1406     if (possess)
1407     {
1408       code~=regTerm("atom", groupno, regstr[0..t], options, "next_alwaystrue", groupnames, groupdcl, globalfuncno);
1409     }
1410     if (repmin == 0 && repmax == 1) {
1411       code~="
1412                 // \"?\", or \"{0,1}\". Worth optimising separately
1413                 int oldp = p;
1414                 ";
1415       if (possess)
1416       {
1417         code~=" // possessive
1418                 if (atom())
1419                   {
1420                   if (!"~next~"())
1421                     return false;
1422                    return true;
1423                   }
1424                 p = oldp;
1425                 if ("~next~"())
1426                 {
1427                   return true;       
1428                 } 
1429                 p = oldp;
1430                 return false; 
1431                 }";
1432       }
1433       else
1434         if (!greedy)
1435         {
1436           if (next == "next_alwaystrue")
1437           {
1438             code~="return true;}";
1439           }
1440           else
1441             code~="if ("~next~"()) { return true; }
1442                 p = oldp;
1443                 return atomAndNext();
1444                 }";
1445         }
1446         else //greedy
1447         {
1448           if (next == "next_alwaystrue")
1449           {
1450             code~="// pragma(msg, \"?greedy\");
1451                 if (atomAndNext())
1452                   {
1453                     return true;
1454                   }
1455                 p = oldp;
1456                 return true;                     
1457                 }";
1458           }
1459           else
1460             code~=" // pragma(msg, \"?greedy\");
1461                 if (atomAndNext())
1462                   {
1463                     return true;
1464                   }
1465                 p = oldp;
1466                 if ("~next~"())
1467                 {
1468                   return true;       
1469                 } 
1470                 p = oldp;
1471                 return false; 
1472                 }
1473               ";
1474       }
1475     } else {
1476       code~="
1477                 // Here's where we generate the redundant term.
1478                 // If we can't do (atom).(next), we must be able to do
1479                 // (atom).(atom) to stay in the game.                       
1480                   "~regTermTwice("atomonly", groupno, regstr[0..t], options, t, globalfuncno) ;
1481       if (repmin == 0 && repmax == uint.max) {
1482                     // optimise for \"*\", \"{0,}\"
1483         if (cg>0)
1484         {
1485           assert(0,"\nError: use of * or {0,} on the capturing group "~regstr[0..t]~" is not supported in regular expression "~regstr);
1486         }
1487         if (possess) // * or {0,}
1488         {
1489           code~=" // possessive
1490                      int oldp;
1491                      int veryoldp=p;
1492                      int newp=-1;           
1493                    // p=oldp;
1494                     do {
1495                       // Can we do (atom)?
1496                       oldp = p;
1497                       if (atom())
1498                       { newp=p;
1499                       }         
1500                      } while (p != oldp);
1501                      if (newp!=-1)
1502                      {
1503                       p=newp;
1504                             
1505                       return "~next~"();
1506                      } //nothing is matched
1507                     p = veryoldp;
1508                     if ("~next~"()) return true; // success but we want longer ones
1509                     return false;
1510                     }";
1511         }
1512         else if (!greedy)
1513           {
1514            
1515             if (next == "next_alwaystrue")
1516               code~=" // optimise for non-greedy\"*\", \"{0,}\"                 
1517                     return true; // We can finish right now.                 
1518                    }
1519                      ";
1520             else
1521               code~=" // optimise for non-greedy\"*\", \"{0,}\"
1522                     int oldp=p;
1523                     if ("~next~"()) return true; // We can finish right now.
1524                     p=oldp;
1525                     do {
1526                       // Can we do (atom).(next) ?
1527                       oldp = p;
1528                       if (atomAndNext()) { return true; }
1529                       p = oldp;
1530                       // We need to do (atom).(atom) to have any chance of continuing.                 
1531                       // also, it must have consumed at least one character, or there is no hope.                 
1532                     } while (atomonly() && p != oldp);
1533                     return false;
1534                     }
1535                      ";
1536           }
1537           else // optimise for greedy\"*\", \"{0,}\"
1538           {
1539             //atom cannot contain capturing groups
1540            
1541             if (next == "next_alwaystrue")
1542               code~=" // greedy
1543                      int oldp;
1544                      int veryoldp=p;
1545                      int newp=-1;           
1546                    // p=oldp;
1547                     do {
1548                       // Can we do (atom).(next) ?
1549                       oldp = p;
1550                       if (atomAndNext())
1551                       { newp=p;
1552                       }
1553                       p = oldp;
1554                       // We need to do (atom).(atom) to have any chance of continuing.                 
1555                       // also, it must have consumed at least one character, or there is no hope.                 
1556                     } while (atomonly() && p != oldp);
1557                      if (newp!=-1)
1558                      {
1559                       p=newp;
1560                             
1561                       return true;
1562                      }
1563                     p = veryoldp;
1564                     return true; // success but we want longer ones               
1565                     }";
1566             else
1567               code~=" // greedy
1568                      int oldp;
1569                      int veryoldp=p; /*gstart*/   
1570                      grouprec [] savegroups; /*gend*/
1571                      int newp=-1;           
1572                    // p=oldp;
1573                     do {
1574                       // Can we do (atom).(next) ?
1575                       oldp = p;
1576                       if (atomAndNext())
1577                       { newp=p; /*gstart*/
1578                         int sl=group.length-("~tostring(groupno + 1)~");
1579                         if (sl>0)
1580                         {
1581                         savegroups.length=sl; 
1582                         savegroups[0..$]=group["~tostring(groupno + 1)~"..$];
1583                         }   
1584                         group.length="~tostring(groupno + 1)~"; /*gend*/
1585                       }
1586                       p = oldp;
1587                       // We need to do (atom).(atom) to have any chance of continuing.                 
1588                       // also, it must have consumed at least one character, or there is no hope.                 
1589                     } while (atomonly() && p != oldp);
1590                      if (newp!=-1)
1591                      {
1592                       p=newp; /*gstart*/
1593                       group.length="~tostring(groupno + 1)~"+savegroups.length;
1594                       if (savegroups.length>0)
1595                       {
1596                         group["~tostring(groupno + 1)~"..$]=savegroups[0..$];
1597                       }  /*gend*/
1598                             
1599                       return true;
1600                      }
1601                     p = veryoldp; /*gstart*/
1602                     group.length="~tostring(groupno + 1)~"; /*gend*/
1603                     if ("~next~"()) return true; // success but we want longer ones
1604                     return false;
1605                     }";
1606         }
1607       } else { // \"+\", or \"{m,n}\"
1608         if (cg>0)
1609         {
1610           assert(0,"\nError: use of + or {m,n} on capturing group "~regstr[0..t]~" is not supported in regular expression "~regstr);
1611         }
1612         if (possess)
1613         {
1614   // possessive start
1615           code~="//possessive\"+\", or \"{m,n}\"
1616                     int numreps=0; // how many repeats have we found?
1617                     int oldp;
1618                     int newp=-1; \n";
1619           if (repmin == 0)
1620           {
1621             code~=" newp=p;                 
1622                     ";
1623           }
1624           code~="
1625                     do {
1626                         oldp = p;
1627                         numreps++;
1628                         if (atom() && numreps>="~tostring(repmin)~") { // always try the atom, so repeats below the minimum still advance p
1629                           newp=p;                         
1630                         }
1631                       
1632                         ";
1633           if (repmax<uint.max) {
1634             code~=" // optimise for \"+\", \"{n,}\"
1635                             if (numreps == "~tostring(repmax)~") break;
1636                             ";
1637           }
1638           code~="
1639                      } while (p!=oldp);
1640                      if (newp!=-1)
1641                      {
1642                       p=newp;
1643                       return "~next~"();
1644                      }
1645                     return false;
1646                   }
1647                 ";
1648          
1649     // possessive end       
1650         }
1651         else
1652           if (!greedy)
1653           {
1654             code~="// non-greedy\"+\", or \"{m,n}\"
1655                     int numreps=0; // how many repeats have we found?
1656                     int oldp;\n";
1657             if (repmin == 0)
1658             {
1659               if (next == "next_alwaystrue")
1660                 code~=" return true;
1661                     ";
1662               else
1663                 code~="
1664                     oldp=p;
1665                     if ("~next~"()) return true;
1666                     p = oldp;
1667                     ";
1668             }
1669             code~="
1670                     do {
1671                         oldp = p;
1672                         numreps++;
1673                         if (numreps>="~tostring(repmin)~" && atomAndNext()) return true;
1674                         p = oldp;
1675                     ";
1676             if (repmax<uint.max) {
1677               code~=" // optimise for \"+\", \"{n,}\"
1678                             if (numreps == "~tostring(repmax)~") return false;
1679                             ";
1680             }
1681             code~="
1682                      } while (atomonly() && p!=oldp);
1683                     return false;
1684                     }
1685 ";
1686           }
1687           else // greedy
1688           {
1689             code~="//greedy\"+\", or \"{m,n}\"
1690                     int numreps=0; // how many repeats have we found?
1691                     int oldp;
1692                     int newp=-1; /*gstart*/
1693                     grouprec [] savegroups;/*gend*/\n";
1694             if (repmin == 0)
1695             {
1696               code~="
1697                     oldp=p;
1698                     if ("~next~"())
1699                     {
1700                       newp=p; /*gstart*/                   
1701                         int sl=group.length-"~tostring(groupno + 1)~";
1702                         if (sl>0)
1703                         {
1704                         savegroups.length=sl;
1705                         savegroups[0..$]=group["~tostring(groupno + 1)~"..$];
1706                         } 
1707                         group.length="~tostring(groupno + 1)~";  /*gend*/           
1708                     }
1709                     p = oldp;
1710                     ";
1711             }
1712             code~="
1713                     do {
1714                         oldp = p;
1715                         numreps++;
1716                         if (numreps>="~tostring(repmin)~" && atomAndNext()) {
1717                           newp=p;
1718                         /*gstart*/
1719                         int sl=group.length-("~tostring(groupno + 1)~");
1720                         if (sl>0)
1721                         {
1722                         savegroups.length=sl;
1723                         savegroups[0..$]=group["~tostring(groupno + 1)~"..$];
1724                         } 
1725                         group.length="~tostring(groupno + 1)~"; /*gend*/     
1726                         }
1727                         p = oldp;
1728                         ";
1729             if (repmax<uint.max) {
1730               code~=" // optimise for \"+\", \"{n,}\"
1731                             if (numreps == "~tostring(repmax)~") break;
1732                             ";
1733             }
1734             code~="
1735                      } while (atomonly() && p!=oldp);
1736                      if (newp!=-1)
1737                      {
1738                       p=newp; /*gstart*/
1739                       group.length="~tostring(groupno + 1)~"+savegroups.length;
1740                       if (savegroups.length>0)
1741                       {
1742                         group["~tostring(groupno + 1)~"..$]=savegroups[0..$];
1743                       } /*gend*/
1744                       return true;
1745                      }
1746                     return false;
1747                   }
1748                 ";
1749         }
1750       }
1751     }
1752    
1753   }
1754   return code;
1755 }
1756 char[] slashimp(char[] abbr, bool[char[]] options)
1757 {
1758   return "
1759         bool fn() { // character class
1760                 if (p<searchstr.length && ("~charMatches(abbr, "searchstr[p]", options)~"))
1761                 {
1762                   p++;
1763                   return true;
1764                 } 
1765                 return false;
1766             }
1767           ";
1768 } 
1769 char [] Backreference(int maxgroupno, int groupno, char[] groupname)
1770 {
1771  
1772   if (groupno>maxgroupno)
1773   {
1774     assert(0,"\nmax group number at this point:"~
1775            tostring(maxgroupno) ~ " bad \\"~tostring(groupno)~
1776            ":group "~tostring(groupno)
1777            ~" cannot be referenced as it is not available at this point");
1778   }
1779        // assert(0,"bad \\"~regstr[1]~ ":group "~regstr[1]~" cannot be referenced as it is not captured");
1780   char[] groupcode;
1781   char[] groupid;
1782   if (groupname != "")
1783   {
1784     groupcode = "groupname."~groupname;
1785     groupid = groupname;
1786   }
1787   else
1788   {
1789     groupcode = tostring(groupno);
1790     groupid = groupcode;
1791   }
1792   return " bool fn() {   
1793        int gs=group["~ groupcode~"].start;
1794        int ge=bracketend["~ groupcode~"];
1795        int gsize=ge-gs;
1796        // writefln(`backref group `,gs,\" \",ge,`searchstr `,\" \",p,\" \",p+gsize,\""~ groupcode~"\","~ groupcode~");
1797         //writefln(`backref group `,searchstr[gs..ge],`searchstr `,searchstr[p..p+gsize]);
1798        if (gsize<0)
1799        {
1800          throw new Exception(\"bad \\\\"~groupid~ ":group "~groupid~" cannot be referenced as it is not captured\");
1801           // assert(0,\"bad \\\\"~tostring(groupno)~ ":group "~tostring(groupno)~" cannot be referenced as it is not captured\");
1802       }
1803        if (p+gsize<=searchstr.length && searchstr[gs..ge]==searchstr[p..p+gsize])
1804                {
1805                  p+=gsize;
1806                  return true;
1807                 }
1808              return false;
1809             }
1810           ";
1811  
1812 } 
1813
1814 int getGroupno(ref char[][] groupnames, ref char[] nametofind)
1815 {
1816   int f = - 1;
1817  
1818   for (int i = 0;i<groupnames.length;i++)
1819   {
1820     char[] findthis = nametofind;
1821     findthis~="=";
1822     if (groupnames[i].length>nametofind.length
1823         &&
1824         groupnames[i][0..nametofind.length + 1] == findthis)
1825     {
1826       f = atoui(groupnames[i][nametofind.length + 1..$]);
1827       break;
1828     }
1829   }
1830   return f;
1831  
1832 } 
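// Illustrative sketch (not part of the original source): getGroupno expects
// each entry of groupnames to have the form "name=number" and returns the
// number, or -1 if the name is absent. This assumes atoui (defined elsewhere
// in this file) converts a decimal string to its numeric value.
unittest
{
  char[][] names;
  names ~= "year=1".dup;
  names ~= "month=2".dup;
  char[] wanted = "month".dup;   // the parameters are ref, so lvalues are needed
  assert(getGroupno(names, wanted) == 2);
  char[] missing = "day".dup;
  assert(getGroupno(names, missing) == -1);
}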
1833 // generate a parser for an atom
1834 // IN: regstr is a valid atom, without a repeat
1835 // OUT: if atom is matched, return true, and update p.
1836 //      if atom is not matched, return false, and leave p unchanged.
1837 char[] regAtom(int groupno, char [] regstr, char[][] groupnames, bool[char[]] options)
1838 { 
1839   if (regstr[0] == '[') {
1840     if (regstr[1] == '^')
1841     {
1842       return "
1843         bool fn() { // inverse character class 
1844               if (p<searchstr.length && (!"~charMatches(regstr[2..$ - 1], "searchstr[p]", options)~"))
1845                 {
1846                   p++;
1847                   return true;
1848                 } 
1849                 return false;
1850          } ";
1851     } else {
1852       return "
1853         bool fn() { // character class
1854                 if (p<searchstr.length && ("~charMatches(regstr[1..$ - 1], "searchstr[p]", options)~"))
1855                 {
1856                   p++;
1857                   return true;
1858                 } 
1859                 return false;
1860             }
1861           ";
1862     }
1863   } else if (regstr[0] == '.') { // match any
1864       if (options["s"])
1865       {
1866         return "
1867         bool fn() { //.
1868             if (p==searchstr.length) return false;
1869             p++;
1870             return true;
1871         }
1872       ";
1873       }
1874       else
1875       {
1876         return "
1877         bool fn() { //.
1878             if (p==searchstr.length || searchstr[p]=='\\n' ) return false;
1879             p++;
1880             return true;
1881         }
1882       ";
1883        
1884       }
1885     } else if (regstr.length>1 && regstr[0..2] == "\\w") { // match a word letter
1886         bool [char[]] toptions = ["i":false];
1887         return slashimp("\\w", toptions);
1888       }
1889       else if (regstr.length>1 && regstr[0..2] == "\\s") // match whitespace
1890           return slashimp("\\s", options);
1891         else if (regstr.length>1 && regstr[0..2] == "\\d") // match numbers
1892             return slashimp("\\d", options);
1893           else if (regstr.length>1 && regstr[0..2] == "\\D") // match non-digits
1894               return slashimp("\\D", options);
1895             else if (regstr.length>1 && regstr[0..2] == "\\W") // match non-word characters
1896                 return slashimp("\\W", options);
1897               else if (regstr.length>1 && regstr[0..2] == "\\S") // match non-whitespace
1898                   return slashimp("\\S", options);
1899                 else if (regstr.length>1 && regstr[0] == '\\' && (regstr[1] == 'k' ))
1900                   {
1901                     if (regstr.length>2 && (regstr[2] == '<' || regstr[2] == '\''))
1902                     {
1903                       uint namesize = groupnameConsumed(regstr[3..$]);
1904                       char[] name = regstr[3..3 + namesize];
1905                       int f = getGroupno(groupnames, name);
1906                       if (f == - 1)
1907                         assert(0,"\nError:bad group name "~regstr[3..3 + namesize]~ ":it does not exist");
1908                       return Backreference(groupno, f, name);
1909                     }
1910                     assert(0, "\nError:internal");
1911                    
1912                   }
1913                   else if (regstr.length>1 && regstr[0] == '\\' && (regstr[1] >= '1' && regstr[1] <= '9' )) {
1914                       return Backreference(groupno, atoui(""~regstr[1]), "");
1915                     }
1916                     else if (regstr[0] == '@') { // NONSTANDARD: referenced parameter
1917                         return regParameter(atoui(regstr[1..$]) - 1, options);
1918                       } else if (regstr[0] == '^') { // start of line
1919                           if (options["m"])
1920                           {
1921                             return "
1922         bool fn() {
1923             return (p==0 || searchstr[p-1]=='\\n');
1924         }
1925         ";
1926                           }
1927                           else
1928                           {
1929                             return "
1930         bool fn() {
1931             return (p==0);
1932         }
1933         ";
1934                           }
1935                         } else if (regstr[0] == '$') { // end of line
1936                             if (options["m"])
1937                             {
1938                               return "
1939         bool fn() {
1940             return (p==searchstr.length || searchstr[p]=='\\n');
1941         }   
1942         ";
1943                             }
1944                             else
1945                             {
1946                               return "
1947         bool fn() {
1948             return (p==searchstr.length || (p==searchstr.length-1 && searchstr[p]=='\\n'));
1949         }   
1950         ";
1951                             }
1952                           } else if (regstr.length>1 && regstr[0] == '\\' && regstr[1] == 'A') { // start of string
1953                               return "
1954         bool fn() {
1955             return (p==0);
1956         }   
1957         ";
1958                             } else if (regstr.length>1 && regstr[0] == '\\' && regstr[1] == 'z') { // end of string
1959                                 return "
1960         bool fn() {
1961             return (p==searchstr.length);
1962         }   
1963         ";
1964                               } else if (regstr.length>1 && regstr[0] == '\\') { // escaped char
1965                                   char[] regstr1 = toLiteralChar(regstr[1]);
1966                                   if (!options["i"])
1967                                   {
1968                                     return "
1969          bool fn() {
1970             if (p==searchstr.length || searchstr[p]!="~regstr1~") return false;
1971             p++;
1972             return true;
1973         }
1974       ";
1975                                   }
1976                                   else
1977                                   {
1978                                     return "
1979          bool fn() {
1980             if (p==searchstr.length || icmp([searchstr[p]],["~regstr1~"])!=0) return false;
1981             p++;
1982             return true;
1983         }
1984       ";
1985                                    
1986                                    
1987                                   }
1988                                 } else {
1989         // match single character
1990                                   char[] regstr0 = toLiteralChar(regstr[0]);
1991                                   if (!options["i"])
1992                                   {
1993                                     return "
1994         bool fn() {
1995             if (p==searchstr.length || searchstr[p]!="~regstr0~") return false;
1996             p++;
1997             return true;
1998         }
1999       ";
2000                                   }
2001                                   else
2002                                   {
2003                                     return "
2004          bool fn() {
2005             if (p==searchstr.length || icmp([searchstr[p]],["~regstr0~"])!=0) return false;
2006             p++;
2007             return true;
2008         }
2009       ";
2010                                   }
2011   }
2012 }
2013
2014 // match a variable string, which will be passed as a parameter.
2015 char[] regParameter(int parmnum, bool [char[]]options)
2016 {
2017   if (!options["i"])
2018   {
2019     return "
2020     bool fn() {
2021         if (p + param["~tostring(parmnum)~"].length > searchstr.length) return false;
2022         if (searchstr[p..p+param["~tostring(parmnum)~"].length] != param["~tostring(parmnum)~"]) return false;
2023         p+=param["~tostring(parmnum)~"].length;
2024         return true;
2025     }
2026   ";
2027   }
2028   else
2029   {
2030     return "
2031     bool fn() {
2032         if (p + param["~tostring(parmnum)~"].length > searchstr.length) return false;
2033         if (icmp(searchstr[p..p+param["~tostring(parmnum)~"].length],param["~tostring(parmnum)~"])!=0) return false;
2034         p+=param["~tostring(parmnum)~"].length;
2035         return true;
2036     }
2037   ";
2038    
2039   }
2040 }
2041
2042 //"a-zA-Z0-9_"
2043
2044 char[] toLiteralString(char[] s)
2045 {
2046   char[] sout;
2047   foreach(c;s)
2048   {
2049     sout~=toLiteralString(c);
2050   }
2051   return sout;
2052 }
2053
2054 char[] toLiteralString(char c)
2055 {
2056   if (c == '\'')
2057   {
2058     return "\\\'";
2059   }
2060   else if (c == '\"')
2061     {
2062       return "\\\"";
2063     }
2064     else if (c == '\\')
2065       {
2066         return "\\\\";
2067       }
2068       else if (c == '\n')
2069         {
2070           return "\\n";
2071         }
2072         else if (c == '\r')
2073           {
2074             return "\\r";
2075           }
2076           else if (c == '\t')
2077             {
2078               return "\\t";
2079             }
2080             else
2081             {
2082               return ""~c;
2083   }
2084 }
2085
2086
2087 char[] toLiteralChar(char c)
2088 {
2089   return "\'"~toLiteralString(c)~"\'";
2090  
2091 }
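// Illustrative sketch (not part of the original source): the helpers above
// turn a character into D source text that can be pasted into generated code.
unittest
{
  assert(toLiteralChar('a') == "'a'");
  assert(toLiteralChar('\n') == "'\\n'");   // newline becomes the two characters \ and n
  assert(toLiteralChar('\'') == "'\\''");   // a quote is escaped inside the quotes
  assert(toLiteralString('\\') == "\\\\");  // a backslash is doubled
}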
2092 // Intersect the closed ranges [ps1,pe1] and [ps2,pe2] into [is1,ie1]; requires ps1<=pe1 and ps2<=pe2.
2093 bool intersectPeriods(int ps1, int pe1, int ps2, int pe2, out int is1, out int ie1)
2094 {
2095   if (pe1<ps2)
2096   {
2097     return false;
2098   }
2099   if (pe2<ps1)
2100   {
2101     return false;
2102   }
2103   if (pe1 <= pe2)
2104   {
2105     if (ps1 <= ps2)
2106     {
2107       is1 = ps2;
2108       ie1 = pe1;
2109       return true;
2110     }
2111     else
2112     {
2113       is1 = ps1;
2114       ie1 = pe1;
2115       return true;
2116     }
2117   }
2118   else
2119   {
2120     if (ps1 <= ps2)
2121     {
2122       is1 = ps2;
2123       ie1 = pe2;
2124       return true;
2125     }
2126     else
2127     {
2128       is1 = ps1;
2129       ie1 = pe2;
2130       return true;
2131     }
2132    
2133   }
2134 }
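// Illustrative sketch (not part of the original source): intersectPeriods
// treats both ranges as inclusive and reports the overlap through is1/ie1.
unittest
{
  int s, e;
  // [a,z] and [c,f] overlap in [c,f]
  assert(intersectPeriods('a', 'z', 'c', 'f', s, e) && s == 'c' && e == 'f');
  // [A,M] and [H,Z] overlap in [H,M]
  assert(intersectPeriods('A', 'M', 'H', 'Z', s, e) && s == 'H' && e == 'M');
  // [0,9] and [a,z] do not overlap at all
  assert(!intersectPeriods('0', '9', 'a', 'z', s, e));
}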
2135
2136
2137 // return D source for a boolean expression that is true if the char expression ch is matched by the character class regstr.
2138 char[] charMatches(char [] regstr, char []ch, bool [char[]] options)
2139 {
2140   char[] code;
2141   if (regstr.length == 0) return "false";
2142   else if (regstr.length >= 3 && regstr[1] == '-' && regstr[0] != '\\' ) {
2143       if (regstr[0]>regstr[2])
2144       {
2145         assert(0, "Error: in character range >"~regstr[0..3]~"< the start of the range is greater than the end");
2146       }
2147       if (!options["i"])
2148       {
2149         char[] regstr0 = toLiteralChar(regstr[0]);
2150         char[] regstr2 = toLiteralChar(regstr[2]);
2151         return "("~ch~">="~regstr0~" && "~ch~"<="~regstr2~") || "~charMatches(regstr[3..$], ch, options);
2152       }
2153       else
2154       {
2155         int is1, ie1;
2156         char is2, ie2;
2157         bool i1, i2;
2158         char[] code1, code2;
2159         i1 = intersectPeriods('a', 'z', regstr[0], regstr[2], is1, ie1);
2160         if (i1)
2161         {
2162           int isi, iei;
2163           char uis, uie;
2164           uis = toupper([is1])[0] ; uie = toupper([ie1])[0] ;
2165           code1 = "("~ch~">="~toLiteralChar(uis)~" && "~ch~"<="~toLiteralChar(uie)~") || ";
2166           if (intersectPeriods(uis, uie, regstr[0], regstr[2], isi, iei))
2167           {
2168             if (isi == uis && iei == uie) //UPPER-UPPER is in regstr[0]-regstr[2]
2169             {
2170               code1 = "";
2171             }
2172           }
2173         }
2174         i1 = intersectPeriods('A', 'Z', regstr[0], regstr[2], is1, ie1);
2175         if (i1)
2176         {
2177           int isi, iei;
2178           char lis, lie;
2179           lis = tolower([is1])[0] ; lie = tolower([ie1])[0] ;
2180           code2 = "("~ch~">="~toLiteralChar(lis)~" && "~ch~"<="~toLiteralChar(lie)~") || ";
2181           if (intersectPeriods(lis, lie, regstr[0], regstr[2], isi, iei))
2182           {
2183             if (isi == lis && iei == lie) // lower-case range is already covered by regstr[0]-regstr[2]
2184             {
2185               code2 = "";
2186             }
2187           }
2188         }
2189         char[] regstr0 = toLiteralChar(regstr[0]);
2190         char[] regstr2 = toLiteralChar(regstr[2]);
2191         return code1~ code2~ "("~ch~">="~regstr0~" && "~ch~"<="~regstr2~") || "~charMatches(regstr[3..$], ch, options);
2192       }
2193     }
2194     else if (regstr.length >= 2 && regstr[0..2] == "\\w") {
2195         return charMatches("a-zA-Z0-9_", ch, options) ~ " || "~charMatches(regstr[2..$], ch, options);
2196       }
2197       else if (regstr.length >= 2 && regstr[0..2] == "\\d") {
2198           return charMatches("0-9", ch, options)~" || "~charMatches(regstr[2..$], ch, options);
2199         }
2200         else if (regstr.length >= 2 && regstr[0..2] == "\\s") {
2201             return charMatches(" \t\n\r", ch, options)~" || "~charMatches(regstr[2..$], ch, options);
2202           }
2203           else if (regstr.length >= 2 && regstr[0..2] == "\\W") {
2204               return "!("~charMatches("a-zA-Z0-9_", ch, options) ~ ") || "~charMatches(regstr[2..$], ch, options);
2205             }
2206             else if (regstr.length >= 2 && regstr[0..2] == "\\D") {
2207                 return "!("~charMatches("0-9", ch, options)~") || "~charMatches(regstr[2..$], ch, options);
2208               }
2209               else if (regstr.length >= 2 && regstr[0..2] == "\\S") {
2210                   return "!("~charMatches(" \t\n\r", ch, options)~") || "~charMatches(regstr[2..$], ch, options);
2211                 }
2212                 else if (regstr[0] == '\\')
2213                   {
2214                     if (regstr.length == 1)
2215                     {
2216                       assert(0,"\nError:a character is missing after \\");
2217                     }
2218                     char[] regstr0 = toLiteralChar(regstr[1]);
2219                     return "("~ch~"=="~regstr0~") || "~charMatches(regstr[2..$], ch, options);
2220                   }
2221                   else
2222                   {
2223                     char[] regstr0 = toLiteralChar(regstr[0]);
2224                     return "("~ch~"=="~regstr0~") || "~charMatches(regstr[1..$], ch, options);}
2225 }
2226
2227
2228 //---------------------------------------------------------------------
2229 // Part III: the public interface of the regexp engine
2230 //---------------------------------------------------------------------
2231
2232 // Does the pattern match searchstr, starting at index 0?
2233 template test(char [] fullpattern)
2234 {
2235  
2236   bool test(char [] searchstr, char [][] param...) {
2237     int p = 0; // start at the beginning of the string
2238     grouprec [] group;
2239     mixin (parseRegexp(fullpattern));
2240     return engine();
2241   }
2242 }
2243
2244 class screg(char [] fullpattern)
2245 {
2246   private:
2247   int p; // next index to test
2248         //int groupno=0;
2249   int x;
2250  
2251   char[] searchstr;
2252   char[][9] param;
2253   public:
2254   grouprec [] group;
2255   this()
2256   {
2257     p = 0;
2258   }
2259   this(int startp, char[][]parameters...)
2260   {
2261     p = startp;
2262     int i;
2263     foreach(par;parameters)
2264     param[i++] = par;
2265   }
2266   mixin (parseRegexp(fullpattern, true));
2267   alias match matches;
2268   bool match(char[] searchstrin)
2269   {
2270     group.length = 0;
2271     searchstr = searchstrin;
2272     //pragma(msg,"screg.match:"~fullpattern);
2273     for (int x = p; x<searchstr.length;++x) {
2274       p = x;
2275       if (engine()) {
2276         if (group.length == 0)
2277         {
2278           group.length = 1;
2279         }
2280         group[0] = grouprec(x, p);
2281         return true;
2282       }
2283     }
2284     return false;
2285   }
2286   bool gmatch(char[] searchstrin)
2287   {
2288     group.length = 0;
2289     searchstr = searchstrin; x = p; // remember where this attempt starts, for group[0]
2290     if (engine()) {
2291       if (group.length == 0)
2292       {
2293         group.length = 1;
2294       }
2295       group[0] = grouprec(x, p);
2296       return true;
2297     }
2298     return false;
2299   }
2300   char[] _(int groupno)
2301   {
2302     return .group(searchstr, group, groupno);
2303   }
2304  
2305   char[] opIndex(int groupno)
2306   {
2307     return .group(searchstr, group, groupno);
2308   }
2309   bool exists(int groupno)
2310   {
2311     // a group exists if its number is a valid index into the group array
2312     if (groupno<0)
2313       return false;
2314     return (groupno < group.length);
2315   }
2316   alias ismatched defined;
2317   bool ismatched(int groupno)
2318   {
2319     if (groupno >= group.length)
2320       return false;
2321     return (group[groupno].end !=  - 1);
2322   }
2323   int pos()
2324   {
2325     return p;
2326   }
2327   void pos(int pin)
2328   {
2329     p = pin;
2330   }
2331   void restart()
2332   {
2333     p = 0;
2334   }
2335 }
2336
2337 /// Return first substring which matches the pattern.
2338 /// Note that some patterns will return an empty string as a valid result.
2339 //template search
2340 //{
2341 char [] search(char [] fullpattern)(char [] searchstr, char [][] param...) {
2342   int p; // next index to test
2343         //int groupno=0;
2344   grouprec [] group;
2345   //pragma(msg,parseRegexp(fullpattern));
2346   mixin (parseRegexp(fullpattern));
2347   for (int x = 0; x<searchstr.length;++x) {
2348     p = x;
2349     if (engine()) return searchstr[x..p];
2350   }
2351   return null; // no match
2352 }
2353 //}
2354
2355 //simple version, escapes not supported
2356 char [] parsetr(char[] fullpattern)
2357 {
2358   if (fullpattern.length<3 || fullpattern[0..1] != "/")
2359     assert(0,"tr has syntax error");
2360   int i;
2361   char[] code ="char c2;\nswitch(c){";
2362   int s1, e1, s2, e2;
2363   s1 = 1;
2364   for (i = s1;i<fullpattern.length;i++)
2365   {
2366     if (fullpattern[i] == '/')
2367     {
2368       e1 = i;
2369       break;
2370     }
2371   }
2372   if (e1 == 0)
2373   assert(0,"tr has syntax error,/ missing");
2374   s2 = e1 + 1;
2375  
2376   for (i = s2;i<fullpattern.length;i++)
2377   {
2378     if (fullpattern[i] == '/')
2379     {
2380       e2 = i;
2381       break;
2382     }
2383   }
2384  
2385   if (e2 == 0)
2386   assert(0,"tr has syntax error,last / missing");
2387   if (e2 - s2 != e1 - s1)
2388     assert(0,"tr: input character set is not the same length as output character set");
2389   for (i = s1;i<e1;i++)
2390   {
2391     char[] c = fullpattern[i..i + 1];
2392     if (c == "'")
2393       c = "\\'";
2394     char[] c2 = fullpattern[s2..s2 + 1];
2395     s2++;
2396     if (c2 == "'")
2397       c2 = "\\'";
2398     code~="case '"~c~"': c2='"~c2~"';break;\n";
2399   }
2400  
2401   code~="default: c2=c;}";
2402  
2403   char[]code2 = "char[] convert(){
2404         char[] outstr;
2405         foreach(c;convertable)
2406         {
2407       "~code~"
2408         outstr~=c2;
2409         }
2410     return outstr;}
2411     ";
2412     //assert(0,code2);
2413   return code2;
2414 }
2415
2416 char [] parselistofwords(char[] list)
2417 {
2418  
2419   int i;
2420   char[] code = "switch(str){";
2421   int s1, e1, s2, e2;
2422   s1 = 1;
2423   char[][] strs;
2424   while (i<list.length)
2425   {
2426     int j = i;
2427     while (list[j] >= '0' && list[j] <= '9' ||
2428            list[j] >= 'a' && list[j] <= 'z' ||
2429            list[j] >= 'A' && list[j] <= 'Z' || list[j] == '_' )
2430     {
2431       j++;
2432       if (j >= list.length)
2433         break;
2434     }
2435     if (i != j)
2436     {
2437             //str[list[i..j]]=true;
2438       code~="case \""~list[i..j]~"\": found=true;break;\n";
2439       i = j;
2440     }
2441     else
2442       i++;
2443   }
2444  
2445   code~="default: found=false;}";
2446  
2447   char[]code2 = "bool ismatched(){
2448         bool found;
2449       "~code~"
2450     return found;}
2451     ";
2452     //assert(0,code2);
2453   return code2;
2454 }
2455 // usage: matchlist!("foo bar baz")(str) is true if str equals one of the listed words
2456 bool matchlist(char [] list)(char [] str) {
2457  // int something;
2458   //pragma(msg,parseRegexp(fullpattern));
2459   mixin (parselistofwords(list));
2460   //char[] o;//=convert_();
2461   return ismatched;
2462 }
2463
2464
2465
2466 // usage: tr!("/a/b/")(str)
2467 char [] tr(char [] fullpattern)(char [] convertable) {
2468  // int something;
2469   //pragma(msg,parseRegexp(fullpattern));
2470   mixin (parsetr(fullpattern));
2471   //char[] o;//=convert_();
2472   return convert; // the translated string
2473 }
2474
2475
2476
2477
2478
2479
2480 //template searchgroups(char [] fullpattern)
2481 //{
2482 grouprec [] indexgroups(char [] fullpattern)(char [] searchstr, char [][] param...) {
2483   int p; // next index to test
2484         //int groupno=0;
2485   grouprec [] group;
2486          // pragma(msg,"here 3");
2487   mixin (parseRegexp(fullpattern));// engine;
2488  
2489   for (int x = 0; x<searchstr.length;++x) {
2490     p = x;
2491     if (engine()) {
2492       if (group.length == 0)
2493       {
2494         group.length = 1;
2495       }
2496       group[0] = grouprec(x, p);
2497       return group;
2498     }
2499   }
2500   return null; // no match
2501 }
2502 //}
2503 grouprec [][] indexgroupsall(char [] fullpattern)(char [] searchstr, int startindex = 0, char [][] param = []) {
2504   int p; // next index to test
2505   grouprec [] group;
2506   mixin (parseRegexp(fullpattern));
2507   grouprec [][] solutions;
2508   int soli = 0;
2509   for (int x = startindex; x<searchstr.length;++x) {
2510     p = x;
2511     if (engine())
2512     {
2513       if (group.length == 0)
2514       {
2515         group.length = 1;
2516       }
2517       group[0] = grouprec(x, p);
2518       solutions.length = soli + 1;
2519       solutions[soli] = group;
2520       soli++;
2521       x = p - 1;
2522     }
2523   }
2524   return solutions; // return all
2525 }
2526
2527 char [] group(char[] str, grouprec [] g, int groupno)
2528 {
2529   if (g[groupno].end ==  - 1)
2530   {
2531     return [];
2532   }
2533   return str[g[groupno].start..g[groupno].end];
2534 } 
2535
2536
2537
2538 template index(char [] fullpattern)
2539 {
2540   int index(char [] searchstr, int startindex, char [][] param...) {
2541     int p; // next index to test
2542     grouprec [] group;
2543     int rp;
2544     mixin (parseRegexp(fullpattern));
2545     for (int x = startindex; x<searchstr.length;++x) {
2546       p = x;
2547       rp = 0;
2548       if (engine()) return x;
2549     }
2550     return - 1; // no match
2551   }
2552 }
2553
2554 struct indexrec
2555 {
2556   int start;
2557   int end; //as for array slices, last char + 1
2558 }
2559
2560 struct grouprec
2561 {
2562   int start;
2563   int end =  - 1; //as for array slices, last char + 1
2564   char[] toString()
2565   {
2566     version(Tango)
2567     return Stdout.layout.convert("[{},{}]", start, end);
2568     else
2569       return format("[%d,%d]", start, end);
2570   }
2571 }
2572
2573 template index2(char [] fullpattern)
2574 {
2575   indexrec index2(char [] searchstr, int startindex, char [][] param...) {
2576     int p; // next index to test
2577     int rp;
2578     grouprec [] group;
2579     mixin (parseRegexp(fullpattern));
2580     for (int x = startindex; x<searchstr.length;++x) {
2581       p = x;
2582       rp = 0;
2583       if (engine()) return indexrec(x, p);
2584     }
2585     return indexrec( - 1, - 1); // no match
2586   }
2587 }
2588
2589 template indexall(char [] fullpattern)
2590 {
2591   indexrec[] indexall(char [] searchstr, int startindex = 0, char [][] param = []) {
2592     int p; // next index to test
2593     grouprec [] group;
2594     mixin (parseRegexp(fullpattern));
2595     indexrec[] solutions;
2596     int soli = 0;
2597     for (int x = startindex; x<searchstr.length;++x) {
2598       p = x;
2599       if (engine())
2600       {
2601         solutions.length = soli + 1;
2602         solutions[soli] = indexrec(x, p);
2603         soli++;
2604         if (p > x) x = p - 1; // resume after the match; an empty match (p == x) must still advance
2605       }
2606     }
2607     return solutions; // return all
2608   }
2609 }
2610
2611 template indexalloverlapping(char [] fullpattern)
2612 {
2613   indexrec[] indexalloverlapping(char [] searchstr, int startindex = 0, char [][] param = []) {
2614     int p; // next index to test
2615     grouprec [] group;
2616     mixin (parseRegexp(fullpattern));
2617     indexrec[] solutions;
2618     int soli = 0;
2619     for (int x = startindex; x<searchstr.length;++x) {
2620       p = x;
2621       if (engine())
2622       {
2623         solutions.length = soli + 1;
2624         solutions[soli] = indexrec(x, p);
2625         soli++;
2626       }
2627     }
2628     return solutions; // return all
2629   }
2630 }
2631
2632
2633
2634
2635 template searchall(char [] fullpattern)
2636 {
2637   char[][] searchall(char [] searchstr, char [][] param...) {
2638     int p; // next index to test
2639     grouprec [] group;
2640        //  pragma(msg, "here"~parseRegexp(fullpattern));
2641     mixin (parseRegexp(fullpattern));
2642     char[][] solutions;
2643     int soli = 0;
2644     for (int x = 0; x<searchstr.length;++x) {
2645       p = x;
2646             //writefln("starting ",x);
2647       if (engine())
2648       {
2649         solutions.length = soli + 1;
2650         solutions[soli] = searchstr[x..p];
2651               //writefln(">>", searchstr[x..p], "<<");
2652         soli++;
2653         if (p > x) x = p - 1; // resume after the match; an empty match (p == x) must still advance
2654       }
2655     }
2656     return solutions; // return them
2657   }
2658 }
2659
2660 template searchalloverlapping(char [] fullpattern)
2661 {
2662   char[][] searchalloverlapping(char [] searchstr, char [][] param...) {
2663     int p; // next index to test
2664     grouprec [] group;
2665        //  pragma(msg, "here"~parseRegexp(fullpattern));
2666     mixin (parseRegexp(fullpattern));
2667     char[][] solutions;
2668     int soli = 0;
2669     for (int x = 0; x<searchstr.length;++x) {
2670       p = x;
2671             //writefln("starting ",x);
2672       if (engine())
2673       {
2674         solutions.length = soli + 1;
2675         solutions[soli] = searchstr[x..p];
2676               //writefln(">>", searchstr[x..p], "<<");
2677         soli++;
2678       }
2679     }
2680     return solutions; // return them
2681   }
2682 }
2683 version(none)
2684 //version(shortscregexpfuncnames)
2685 {
2686   alias test t;
2687   alias index i;
2688   alias index2 i2;
2689   alias search s;
2690   alias searchall sa;
2691   alias indexall ia;
2692   alias indexgroups ig;
2693   alias indexgroupsall iga;
2694 }
2695
2696
2697
2698 //---------------------------------------------------------------------
2699 //                         EXAMPLE
2700 //---------------------------------------------------------------------
2701 version(Phobos)
2702 import std.stdio;
2703 version(test)
2704 {
2705   void main()
2706   {
2707     version(Phobos)
2708       writefln("BEGINNING UNIT TESTS\n");
2709     else
2710       Stdout("BEGINNING UNIT TESTS\n").newline;
2711     assert(search!("ab")("aaab") == "ab");
2712    
2713    
2714     assert(search!("a*b")("aaab") == "aaab");
2715     assert(search!("a*(b)")("aaab") == "aaab");
2716    
2717     assert(search!("((a*b))")("aaab") == "aaab");
2718     assert(search!("(a*)b")("aaab") == "aaab");
2719    
2720     assert(search!("(?:b*a*)*b")("aaab") == "aaab");
2721    
2722     assert(search!("b+cd")("acdbbcabbcdaaab") == "bbcd");
2723    
2724     assert(search!("b?cd")("abcacbacdb") == "cd");
2725    
2726     assert(search!("(ab)?abc")("aababcab") == "ababc");
2727     assert(search!("(?:ab)*abc")("aababcab") == "ababc");
2728     assert(search!("((?:a)*|xyz)b")("aaab") == "aaab");
2729    
2730     assert(search!("(?:ab)*(abb)")("bababb") == "ababb");
2731     assert(search!("e?(?:ab)*b+?")("eaaababbbbaac") == "ababb");
2732     assert(search!("(?:ab*)*c")("bbbababbaaabaaaabbbbc") == "ababbaaabaaaabbbbc");
2733     char [] quasistatic = "m";
2734     assert(search!("(@1.*?@1)")("they said D can't do metaprogramming?", quasistatic) == "metaprogram");
2735     assert(search!("[h-za]*g")("metaprogramming") == "taprog");
2736     assert(search!("(?:a*)*b")("cacaaab") == "aaab");
2737     assert(search!("(?:a*b*)*c")("dababdaabababbaaabbbcab") == "aabababbaaabbbc");
2738     assert(search!("(?:(?:a*b*)|da)*b")("fasdaaab") == "daaab");
2739     assert(search!("aaab??")("aaabbb") == "aaa");
2740     assert(search!("aaab?")("aaabbb") == "aaab");
2741     //assert(search!("aaa?")("aa") == "aa");
2742    
2743     char [] qq;
2744     version(Phobos)
2745     writefln("=========");
2746     version(Tango)
2747     Stdout("=========").newline;
2748     qq = search!("(?:(?:a*b*)|da)*b")("fasdaaab");
2749     version(Phobos) writefln("Result: ----", qq, "---");
2750     version(Tango) Stdout("Result: ----", qq, "---").newline;
2751     version(Phobos) writefln("END OF UNIT TESTS\n");
2752     version(Tango) Stdout("END OF UNIT TESTS\n").newline;
2753     version(Phobos) writefln("All tests are passed if you set -debug");
2754     version(Tango) Stdout("All tests are passed if you set -debug").newline;
2755    
2756   }
2757 }
2758 //-------------------------------------------------------------------------------
2759 /+
2760
2761 // NOT CURRENTLY USED
2762
2763 // Finds the number of instances of 'ch' in str which aren't preceded by a backslash
2764 // ch must not be a backslash.
2765 template unescapedCount(char [] str, char ch)
2766 {
2767     static if (str.length==0) const int unescapedCount = 0;
2768     else static if (str[0]=='\\' && str.length>1) const int unescapedCount = unescapedCount!(str[2..$], ch);
2769     else static if (str[0]==ch) const int unescapedCount = 1 + unescapedCount!(str[1..$], ch);
2770     else const int unescapedCount = unescapedCount!(str[1..$], ch);
2771 }
2772
2773 +/
2774
2775
2776
2777 void searchgroupstest(char[]reg)(char[]input)
2778 {
2779   version(Phobos)
2780   writefln("reg:%s", reg," input:%s", input);
2781   version(Tango)
2782   Stdout.format("reg:{} input:{}", reg, input).newline;
2783   int groupno = 0;
2784   foreach(e;indexgroups!(reg)(input))
2785   {
2786     if (e.start <= e.end)
2787     {
2788       version(Phobos)
2789       writefln("group %d:>>%s<<", groupno, input[e.start..e.end]);
2790       version(Tango)
2791       Stdout.format("group {}:>>{}<<", groupno, input[e.start..e.end]).newline;
2792     }
2793     else
2794     {
2795       version(Phobos)
2796       writefln("group %d:", groupno, "[", e.start, ",", e.end, "]");
2797       version(Tango)
2798       Stdout.format("group {}:", groupno, "[", e.start, ",", e.end, "]").newline;
2799     }
2800     groupno++;
2801   }
2802   version(Phobos) writefln("-------------------");
2803   version(Tango) Stdout("-------------------").newline;
2804 } 
2805
2806 void searchgroupsalltest(char[]reg)(char[]input)
2807 {
2808   int sno = 0;
2809  
2810   version(Phobos)
2811   writefln("reg:%s", reg," input:%s", input);
2812   version(Tango)
2813   Stdout.format("reg:{} input:{}", reg, input).newline;
2814   foreach(s;indexgroupsall!(reg)(input))
2815   {
2816     version(Phobos) writefln("solution ", sno++);
2817     version(Tango) Stdout.format("solution {}", sno++).newline;
2818     foreach(e;s)
2819     {
2820       if (e.start <= e.end)
2821       {
2822         version(Phobos) writefln("group >>%s<<", input[e.start..e.end]);
2823         version(Tango)Stdout.format("group >>{}<<", input[e.start..e.end]).newline;
2824       }
2825       else
2826       {
2827         version(Phobos) writefln("group %s", e.start," ", e.end);
2828         version(Tango)Stdout.format("group {} {}", e.start, e.end).newline;
2829       }
2830     }
2831   }
2832   version(Phobos) writefln("-------------------");
2833   version(Tango)Stdout("-------------------").newline;
2834 } 
2835 void searchgroupstest2(char[]reg)(char[]input, char[][] target)
2836 {
2837   version(Phobos) writefln("reg:%s", reg," input:%s", input);
2838   version(Tango)Stdout.format("reg:{} input:{}", reg, input).newline;
2839   pragma(msg, "searchgroupstest2:"~reg);
2840   int i = 0;
2841   int oi = 0;
2842   int failed = 0;
2843   foreach(e;indexgroups!(reg)(input)) {
2844     bool compare = true;
2845     if (e.start <= e.end)
2846     {
2847       version(Phobos) writef("group", i," >>%s<<", input[e.start..e.end]);
2848       version(Tango)Stdout.format ("group {} >>{}<<", i, input[e.start..e.end]);
2849     }
2850     else
2851     {
2852       version(Phobos) writef("group", i," ","not matched [", e.start," ", e.end, "]");
2853       version(Tango)Stdout.format ("group {} not matched [{} {}]", i, e.start, e.end);
2854     }
2855     if (i >= target.length)
2856     {
2857       //writefln("\nnumber of groups failed");
2858       failed++;
2859       //writefln("\n");
2860       compare = false;
2861     }
2862  //   writefln(">range ",e.start," ",e.end);
2863     if (e.start>e.end )
2864     {
2865    //   writefln(">range ",e.start," ",e.end);
2866       if (target.length>i && target[i] is null)
2867       {
2868         version(Phobos) writefln(" passed");
2869         version(Tango)Stdout (" passed").newline;
2870       }
2871       else
2872       {
2873         version(Phobos) writefln(" failed");
2874         version(Tango)Stdout (" failed").newline;
2875         failed++;
2876       }
2877     }
2878     else if (compare && input[e.start..e.end] == target[i])
2879     {
2880       version(Phobos) writefln(" passed");
2881       version(Tango)Stdout (" passed").newline;
2882     }
2883     else
2884     {
2885       if (compare)
2886       {
2887         version(Phobos) writefln(" failed expected %s", target[i]);
2888         version(Tango)Stdout.format (" failed expected {}", target[i]).newline;
2889       }
2890       failed++;
2891     }
2892     oi++;
2893     i++;
2894   }
2895   if (oi != target.length)
2896   {
2897     version(Phobos) writefln("number of groups failed");
2898     version(Tango)Stdout ("number of groups failed").newline;
2899     failed++;
2900   }
2901   else
2902   {
2903     version(Phobos) writefln("number of groups passed");
2904     version(Tango)Stdout ("number of groups passed").newline;
2905   }
2906   version(Phobos) writefln("-------------------");
2907   version(Tango)Stdout ("-------------------").newline;
2908   if (failed>0)
2909   {
2910     version(Phobos) writefln("test failed");
2911     version(Tango)Stdout ("test failed").newline;
2912     assert(0);
2913   }
2914   else
2915   {
2916     version(Phobos) writefln("test ok");
2917     version(Tango)Stdout ("test ok").newline;
2918   }
2919 } 
2920
2921 void printcode(char[] reg)
2922 {
2923   version(Phobos) writefln("%s", parseRegexp(reg));
2924   version(Tango)Stdout.format ("{}", parseRegexp(reg)).newline;
2925 }
2926
2927 void printclasscode(char[] reg)
2928 {
2929  version(Phobos) writefln("%s", getclasscode(reg));
2930  version(Tango)Stdout.format ("{}", getclasscode(reg)).newline;   
2931 }
2932
2933 char[] getclasscode(char[] reg)
2934 {
2935   char [] code;
2936 code="
2937 version(Tango)
2938 {}
2939 else
2940 { version = Phobos;}
2941 version(Phobos)
2942 import std.string;
2943 version(Tango)
2944 {
2945   import tango.text.Ascii;
2946   alias toUpper toupper;
2947   alias toLower tolower;
2948   alias icompare icmp;
2949   import tango.io.Stdout;
2950 }
2951 struct grouprec
2952 {
2953   int start;
2954   int end =  - 1; //as for array slices, last char + 1
2955   char[] toString()
2956   {
2957     version(Tango)
2958     return Stdout.layout.convert(\"[{},{}]\", start, end);
2959     else
2960       return format(\"[%d,%d]\", start, end);
2961   }
2962 }
2963 char [] group(char[] str, grouprec [] g, int groupno)
2964 {
2965   if (g[groupno].end ==  - 1)
2966   {
2967     return [];
2968   }
2969   return str[g[groupno].start..g[groupno].end];
2970 }
2971  
2972 class screg
2973 {
2974   private:
2975   int p; // next index to test
2976         //int groupno=0;
2977   int x;
2978  
2979   char[] searchstr;
2980   char[][9] param;
2981   public:
2982   grouprec [] group;
2983   this()
2984   {
2985     p = 0;
2986   }
2987   this(int startp, char[][]parameters...)
2988   {
2989     p = startp;
2990     int i;
2991     foreach(par;parameters)
2992     param[i++] = par;
2993   }"~
2994   parseRegexp(reg, true)~
2995   "
2996   alias match matches;
2997   bool match(char[] searchstrin)
2998   {
2999     group.length = 0;
3000     searchstr = searchstrin;
3001     //pragma(msg,\"screg.match:\"~fullpattern);
3002     for (int x = p; x<searchstr.length;++x) {
3003       p = x;
3004       if (engine()) {
3005         if (group.length == 0)
3006         {
3007           group.length = 1;
3008         }
3009         group[0] = grouprec(x, p);
3010         return true;
3011       }
3012     }
3013     return false;
3014   }
3015   bool gmatch(char[] searchstrin)
3016   {
3017     group.length = 0;
3018     searchstr = searchstrin; x = p; // x records where this anchored attempt starts (used for group[0])
3019     if (engine()) {
3020       if (group.length == 0)
3021       {
3022         group.length = 1;
3023       }
3024       group[0] = grouprec(x, p);
3025       return true;
3026     }
3027     return false;
3028   }
3029   char[] _(int groupno)
3030   {
3031     return .group(searchstr, group, groupno);
3032   }
3033  
3034   char[] opIndex(int groupno)
3035   {
3036     return .group(searchstr, group, groupno);
3037   }
3038   bool exists(int groupno)
3039   {
3040     if (groupno < 0)
3041       return false;
3042     // a group exists only if its index lies inside the group array
3043     return groupno < group.length;
3044   }
3045   alias ismatched defined;
3046   bool ismatched(int groupno)
3047   {
3048     if (groupno >= group.length)
3049       return false;
3050     return (group[groupno].end !=  - 1);
3051   }
3052   int pos()
3053   {
3054     return p;
3055   }
3056   void pos(int pin)
3057   {
3058     p = pin;
3059   }
3060   void restart()
3061   {
3062     p = 0;
3063   }
3064 }
3065 "; 
3066   return code;
3067 } 
3068
3069 //-------------
3070 // unit tests
3071 //-------------
3072 version (testmeta) {
3073   static assert(quantifierConsumed("{456}345") == 5);
3074   static assert(parenConsumed("(45(6)4)5") == 8);
3075   static assert(parenConsumed(`(45\(6)45`) == 7);
3076 }
3077
3078 /*
3079 Copyright (c) 2006 Walter Bright
3080 (basic framework, regular expression engine, basic documentation)
3081 Copyright (c) 2007-2009 Marton Papp
3082 (added /w,/s,/d,extended character classes, added groups and backreferences,
3083  options (msxi), extended documentation, non-greedy constructs
3084  converted testing functions into Tango
3085 )
3086 Copyright (c) 2008  (yidabu  g m a i l at com) All rights reserved
3087 *modified by yidabu to make it work with Tango
3088 ( D Programming Language China : http://www.d-programming-language-china.org/ )       
3089 All rights reserved.
3090
3091 Redistribution and use in source and binary forms, with or without
3092 modification, are permitted provided that the following conditions
3093 are met:
3094 1. Redistributions of source code must retain the above copyright
3095    notice, this list of conditions and the following disclaimer.
3096 2. Redistributions in binary form must reproduce the above copyright
3097    notice, this list of conditions and the following disclaimer in the
3098    documentation and/or other materials provided with the distribution.
3099 3. The name of the author may not be used to endorse or promote products
3100    derived from this software without specific prior written permission.
3101
3102 THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
3103 IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
3104 OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
3105 IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
3106 INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
3107 NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
3108 DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
3109 THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
3110 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
3111 THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
3112 */