// scregexp
// Statically compiled / compile-time Perl-compatible regular expression parser, version 1.0
// Author: Marton Papp, borrowing heavily from the regexp 2.0 parser written by Don Clugston.
// This version uses CTFE for simplicity, flexibility, higher performance and easy debugging.
//
// It is a top-down recursive descent regexp parser with backtracking.
// The license is BSD-like, see the end of this file.
//
//
// Terminology:
// a regex "sequence" is a set of consecutive "term"s,
// each of which consists of a "naked term", optionally followed by
// a "quantifier" (*, +, ?, {m}, {m,} or {m,n}).
// A "naked term" is either a "sequence" or an "atom".

/*
Regular expression formats:
    "/regex/options"    e.g. "/.+/s"
    "regex/options"     e.g. ".+/s"
    "/regex/"           e.g. "/.+/"
    "regex"             e.g. ".+"
Possible options: m s x i (they behave as in Perl).
x is now implemented fully. # comments out one line.

Features currently supported:
    it matches as if /../sm were used in Perl.
    *        match previous expression 0 or more times
    +        match previous expression 1 or more times
    ?        match previous expression 0 or 1 times
    {m,n}    match previous expression between m and n times
    {m,}     match previous expression m or more times
    {,n}     match previous expression between 0 and n times
    {n}      match previous expression exactly n times
    .        match any character if s option is used
    .        match any non-\n character if s is not set
    other characters match themselves
    a|b      match regular expression a or b
    (?: )    uncaptured grouping
    ( )      captured grouping
    (?> )    independent subexpression
    (?= )    lookahead subexpression
    (?! )    negative lookahead subexpression
    (?<name>) or (?'name')    named captured grouping
    \k<name> or \k'name'      insert previously captured named group
    ^        anchor, start of line if m option is used
    ^        anchor, start of string if no option is used
    $        anchor, end of line (at \n and the end of string) if m option is used
    $        anchor, the end of string or before \n at the end of string if no m is used
    \A       start of string
    \z       end of string
    [abc]    match any character in character class abc
    [^abc]   match any character not in character class abc
    @n       match string variables passed into the functions as extra parameters
             (this is a non-standard extension)
    escape characters
    \d,\s,\w,\D,\S,\W
    [\d\s\w\D\S\W] in character classes too
    All matches are greedy unless an additional ? is used.
    Use ? after quantifiers (*,+,{n,m},?) to make matches non-greedy/lazy.
    Use + after quantifiers (*,+,{n,m},?) to make matches possessive.
    \1..\9 to match previously captured subsequences
    These characters have to be escaped to match them: ()[]?*+.{}^$|@
    / needs to be escaped if it is used at the start or at the end of regex

Compile time error handling:
    redundant ) is found
    redundant ] is found
    Regexp must not end with \\
    Unmatched parenthesis
    unsupported quantifier
    unmatched { in regular expression
    if \1..\9 references a group not accessible or not defined
    use of + or {m,n} on any and this capturing group is not supported
    use of * or {0,} on any and this capturing group is not supported
    start of range of a character range is bigger than ending range
    \k must be followed by 'name' or <name>
    Closing > missing
    Closing ' missing
    unmatched [ in regular expression

Run time error handling:
    if \1..\9 references a group not captured

Limits:
    length of input string: max of int
    number of groups: max of int
    regular expression size: limited by char[] size and stack
    number of backreferences: 9 (\1..\9)

Functions:

bool test!(char[] regexp)(char[] stringtosearch)
    short form: t
    returns true if regexp matches the beginning of stringtosearch

char[] search!(char[] regexp)(char[] stringtosearch)
    short form: s
    finds the first match of regexp
    returns the substring found
    returns null if nothing found

int index!(char[] regexp)(char[] stringtosearch, int indextostartat)
    short form: i
    finds the first match of regexp
    returns its starting index in stringtosearch
    returns -1 if nothing found

indexrec index2!(char[] regexp)(char[] stringtosearch, int indextostartat)
    short form: i2
    finds the first match of regexp
    returns its start and end (last char+1) in stringtosearch
    returns indexrec(-1,-1) if nothing found

indexrec[] indexall!(char[] regexp)(char[] stringtosearch, int indextostartat,..)
    short form: ia
    finds all occurrences of the regexp in stringtosearch from left to right
    Matches follow each other. No overlaps are possible. In Perl: / /g
    returns the start and end (last char+1) of found strings in stringtosearch
    returns an empty array if nothing found

char[][] searchall!(char[] regexp)(char[] stringtosearch,..)
    short form: sa
    finds all occurrences of the regexp in stringtosearch from left to right
    Matches follow each other. No overlaps are possible. It is similar to Perl: / /g
    returns found strings
    returns an empty array if nothing found

grouprec[] indexgroups!(char[] regexp)(char[] stringtosearch,..)
    short form: ig
    finds the first match of regexp, the captured groups are returned
    returns the start/end indexes of captured groups and the whole string matched (at index 0)
    returns null if nothing found
    grouprec(something,-1) means that the given group was not captured

grouprec[][] indexgroupsall!(char[] regexp)(char[] stringtosearch,..)
    short form: iga
    finds all occurrences of the regexp in stringtosearch from left to right
    Matches follow each other. No overlaps are possible. In Perl: / /g
    returns the start and end (last char+1) of found groups of all matches
    returns an empty array if nothing found

char[] group(char[] stringtosearch, grouprec g, int groupno)
    can be used to have access to the results of searchgroups in an easier way
    g should come from searchgroups
    returns groupno-th captured group
    if groupno is 0, returns the whole match
    returns []/null if no match for given group

void searchgroupstest(char[] reg)(char[] input)
    for testing, prints found matches, groups
    reg    regular expression to use
    input  input string to parse

void searchgroupsalltest(char[] reg)(char[] input)
    for testing, prints all solutions found
    reg    regular expression to use
    input  input string to parse

void searchgroupstest2(char[] reg)(char[] input, char[][] target)
    for testing, compares found matches with target, it stops the program in case of failure
    reg     regular expression to use
    input   input string to parse
    target  expected matches/groups

void printcode(char[] reg)
    prints the generated D code from reg
char[] tr(char[] reg)(char[] convertable) - fast transliteration through switches
    e.g. tr!("/12345/abced/")(str)

void printclasscode(char[] reg)
    prints screg class as it should look after mixins are processed

void getclasscode(char[] reg)
    returns a string containing screg class as it should look after mixins are processed

Class:
screg - another way to search in strings
    bool match(char[] searchstrin);  - match searchstrin, returns true if match is found
                                       if it is called again, it finds the next match
                                       It is similar to search without using anchors
    bool gmatch(char[] searchstrin); - match as if \G(?:searchstrin), returns true if match is found
                                       if it is called again, it finds the next match

    char[] _(int groupno)         - returns given group matched (_(0) returns the whole string matched)
    char[] [int groupno]          - used as an array
                                    returns given group matched ([0] returns the whole string matched)
    char[] ismatched(int groupno) - returns if the given group matched
    char[] exists(int groupno)    - returns if the given group exists, not necessarily matched
    int pos()                     - return current position from where matching is attempted (as in Perl)
    void pos(int pin)             - set current position where matching is attempted
    void restart()                - restart, set position to 0
Example:
    auto reg1 = new screg!("/ab/"); // regular expression to use
    while (reg1.match("abxabxab"))
    {
        writefln(reg1._(0));
    }

    auto reg3 = new screg!("/(?<day>monday)|(?<day>tuesday)|(?<day>wednesday)/");
    if (reg3.match("wednesday"))
    {
        // reg3.groupname.day gives back the group number
        writefln(reg3._(reg3.groupname.day)); // prints the matched day
        writefln(reg3.getday());              // prints the matched day in another way
    }
*/

// Points of interest:
// * The parser is able to treat all 'quantifier's in a single mixin function, while still applying
//   optimisations (eg, there's absolutely no difference between {1,} and "+").
// * There is absolutely no parameter passing inside the regexp engine. Even functions which
//   can't be inlined will have very low calling cost.
// * Consequently, the speed is excellent. The main unnecessary operations are the checks to see whether we
//   are at the end of the string.
//   This could be greatly improved by precalculating the minimum length required for a match,
//   at least for subsequences of fixed length.
// * Since each mixin can be given access to any desired runtime or compile-time parameters,
//   the scheme is extremely flexible.

module scregexp;
version(Tango)
{}
else
{ version = Phobos; }
version(Phobos)
    import std.string;
version(Tango)
{
    import tango.text.Ascii;
    alias toUpper toupper;
    alias toLower tolower;
    alias icompare icmp;
    import tango.io.Stdout;
}
//---------------------------------------------------------------------
// Part 0 : Functions from the meta library
//---------------------------------------------------------------------

/******************************************************
 * uint atoui(char [] s);
 *
 * Converts the leading decimal digits of an ASCII string to a uint.
 * Parsing stops at the first non-digit character.
 */
uint atoui(char [] s, uint result = 0, int indx = 0)
{
    if (s.length == indx)
        return result;
    else if (s[indx] < '0' || s[indx] > '9')
        return result;
    else
        return atoui(s, result * 10 + s[indx] - '0', indx + 1);
}
// Converts a uint to its decimal ASCII representation.
char[] tostring(uint i)
{
    uint i2 = i / 10;
    uint digit = i - i2 * 10;
    if (i >= 10)
    {
        char[] s;
        s = tostring(i2) ~ cast(char)(digit + 48);
        return s;
    }
    else
    {
        char[] s;
        s = "" ~ cast(char)(digit + 48);
        return s;
    }
}

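// Illustrative sketch, not part of the original module: a few checks showing
// how atoui and tostring above behave. It assumes D1 semantics, which the
// module itself relies on (string literals convert to char[]).
unittest
{
    assert(atoui("123") == 123);
    assert(atoui("42,7}") == 42);   // parsing stops at the first non-digit
    assert(atoui("") == 0);         // empty input yields the initial result
    assert(tostring(0) == "0");
    assert(tostring(305) == "305");
}
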
//---------------------------------------------------------------------
// Part I : Functions for parsing a regular expression string literal.
//---------------------------------------------------------------------
// None of these generate any code.

// returns index of first char in regstr which equals ch, or -1 if not found
// escaped chars are ignored
int unescapedFindFirst(char [] regstr, char ch, int indx = 0)
{
    if (regstr.length <= indx)
        return -1; // not found
    else if (regstr[indx] == ch) return indx;
    else if (regstr[indx] == '\\')
        // if it's escaped, prevent it from matching.
        return unescapedFindFirst(regstr, ch, indx + 2);
    else return unescapedFindFirst(regstr, ch, indx + 1);
}

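// Illustrative sketch, not part of the original module: how escaping affects
// unescapedFindFirst. Assumes D1 string literals (char[]); in the literals
// below, "\\" is a single backslash character.
unittest
{
    assert(unescapedFindFirst("ab}c", '}') == 2);
    assert(unescapedFindFirst("a\\}b}", '}') == 4);  // the escaped } at index 2 is skipped
    assert(unescapedFindFirst("abc", '}') == -1);    // not found
    assert(unescapedFindFirst("a}b}", '}', 2) == 3); // search starts at indx
}
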
// Scans a comment starting at regstr[0] up to its line terminator.
// Returns the index of the '\n' that ends the line (also for "\r\n"),
// the index just past a '\r' that is not followed by '\n',
// regstr.length if there is no line terminator, or -1 if regstr is empty.
int sizeOfComment(char [] regstr)
{
    int indx;
    if (regstr.length <= indx)
        return -1;
    while (regstr.length > indx)
    {
        if (regstr[indx] == 13)
        {
            indx++;
            if (regstr.length > indx && regstr[indx] == 10)
                return indx;
            return indx;
        }
        if (regstr[indx] == 10)
            return indx;
        indx++;
    }
    return indx;
}

// Returns the number of chars at the start of regstr which are made up by
// a repetition expression (+, *, ?, {,} )
int quantifierConsumed(char [] regstr)
{
    if (regstr.length == 0) return 0;
    else if (regstr[0] == '+' || regstr[0] == '*' || regstr[0] == '?') return 1;
    else if (regstr[0] == '{') {
        if (unescapedFindFirst(regstr, '}') == -1) {
            assert(0, "\nError: unmatched { in regular expression");
            //writefln("Error: unmatched { in regular expression");
            //assert(0);
        } else return 1 + unescapedFindFirst(regstr, '}');
    } else return 0;
}

int quantifiergreedinessConsumed(char [] regstr)
{
    if (regstr.length == 0) return 0;
    else if (regstr[0] == '?') return 1;
    else return 0;
}

int quantifierpossessivenessConsumed(char [] regstr)
{
    if (regstr.length == 0) return 0;
    else if (regstr[0] == '+') return 1;
    else return 0;
}

// The minimum allowable number of repetitions for this quantifier
uint quantifierMin(char [] regstr)
{
    if (regstr[0] == '*' || regstr[0] == '?') return 0;
    else if (regstr[0] == '+') return 1;
    else {
        assert (regstr[0] == '{');
        return atoui(regstr[1..$]);
    }
}

// The maximum allowable number of repetitions for this quantifier
uint quantifierMax(char [] regstr)
{
    if (regstr[0] == '*' || regstr[0] == '+') return uint.max;
    else if (regstr[0] == '?') return 1;
    else if (regstr[0] == '{') {
        if (unescapedFindFirst(regstr, ',') == -1) // "{n}"
            return quantifierMin(regstr);
        else if (regstr[$ - 2] == ',') // "{n,}"
            return uint.max;
        else // "{n,m}"
            return atoui(regstr[1 + unescapedFindFirst(regstr, ',') .. $]);
    } else {
        assert(0, "\nError: unsupported quantifier " ~ regstr);

    }
}

bool quantifierGreediness(char [] regstr)
{
    if (regstr.length == 0) {
        return true;
    }
    else if (regstr[0] == '?') {
        return false;
    } else {
        return true;
    }
}

bool quantifierPossessiveness(char [] regstr)
{
    if (regstr.length == 0) {
        return false;
    }
    else if (regstr[0] == '+') {
        return true;
    } else {
        return false;
    }
}

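// Illustrative sketch, not part of the original module: how the quantifier
// helpers above decompose a quantifier. Assumes D1 string literals (char[]).
// Note that quantifierMax inspects regstr[$ - 2], so it is given exactly the
// quantifier text (ending in '}'), not the rest of the pattern.
unittest
{
    assert(quantifierConsumed("{2,5}x") == 5);  // "{2,5}" is 5 chars long
    assert(quantifierConsumed("x") == 0);       // no quantifier present
    assert(quantifierMin("{2,5}") == 2 && quantifierMax("{2,5}") == 5);
    assert(quantifierMin("{3,}") == 3 && quantifierMax("{3,}") == uint.max);
    assert(quantifierMin("{4}") == 4 && quantifierMax("{4}") == 4);
    assert(quantifierMin("?") == 0 && quantifierMax("?") == 1);
    assert(!quantifierGreediness("?"));         // trailing ? makes it lazy
    assert(quantifierPossessiveness("+"));      // trailing + makes it possessive
}
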
// find the index of the first |, or -1 if not found.
// ignores escaped items, and anything in parentheses.
int findUnion(char [] regstr, bool isx, int indx = 0, int numopenparens = 0)
{
    int findUnionc;
    if (indx >= regstr.length)
        findUnionc = -1;
    else if (numopenparens == 0 && regstr[indx] == '|')
        findUnionc = indx;
    else if (regstr[indx] == ')')
        findUnionc = findUnion(regstr, isx, indx + 1, numopenparens - 1);
    else if (indx + 2 < regstr.length && regstr[indx..indx + 3] == "(?:")
        findUnionc = findUnion(regstr, isx, indx + 3, numopenparens + 1);
    else if (indx + 2 < regstr.length && regstr[indx..indx + 3] == "(?>")
        findUnionc = findUnion(regstr, isx, indx + 3, numopenparens + 1);
    else if (indx + 2 < regstr.length && regstr[indx..indx + 3] == "(?=")
        findUnionc = findUnion(regstr, isx, indx + 3, numopenparens + 1);
    else if (indx + 2 < regstr.length && regstr[indx..indx + 3] == "(?!")
        findUnionc = findUnion(regstr, isx, indx + 3, numopenparens + 1);
    else if (indx + 2 < regstr.length && regstr[indx..indx + 3] == "(?<")
        findUnionc = findUnion(regstr, isx, indx + 3, numopenparens + 1);
    else if (indx + 2 < regstr.length && regstr[indx..indx + 3] == "(?'")
        findUnionc = findUnion(regstr, isx, indx + 3, numopenparens + 1);
    // else if (indx + 2<regstr.length && regstr[indx..indx + 2] == "(?"
    //     && regstr[3]>="1" && regstr[3]<="9")
    //     findUnionc = findUnion(regstr, indx + 3, numopenparens + 1);
    else if (regstr[indx] == '(')
        findUnionc = findUnion(regstr, isx, indx + 1, numopenparens + 1);
    else if (regstr[indx] == '\\') // skip the escaped character
        findUnionc = findUnion(regstr, isx, indx + 2, numopenparens);
    else
    {
        if (regstr[indx] == '[')
        {
            int brsize = unescapedFindFirst(regstr[indx..$], ']');
            if (brsize == -1)
                assert(0, "\nError: unmatched [ in regular expression:" ~ regstr);
            findUnionc = findUnion(regstr, isx, indx + 1 + brsize, numopenparens);
        }
        else
            if (isx && regstr[indx] == '#')
                findUnionc = findUnion(regstr, isx, indx + 1 + sizeOfComment(regstr[indx..$]), numopenparens);
            else
                findUnionc = findUnion(regstr, isx, indx + 1, numopenparens);
    }
    return findUnionc;
}

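// Illustrative sketch, not part of the original module: findUnion only reports
// a top-level, unescaped | that is outside parentheses, character classes and
// (with the x option) comments. Assumes D1 string literals (char[]).
unittest
{
    assert(findUnion("ab|cd", false) == 2);
    assert(findUnion("(a|b)c|d", false) == 6);    // the | inside ( ) is skipped
    assert(findUnion("[a|b]c", false) == -1);     // | inside [ ] is a literal
    assert(findUnion("a\\|b", false) == -1);      // escaped | is a literal
    assert(findUnion("a#x|y\nb|c", true) == 7);   // | inside an x-mode comment is skipped
}
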
// keeps going until the number of ) parens equals the number of ( parens.
// All escaped characters are ignored.
// BUG: what about inside [-] ?
int parenConsumed(char [] regstr, int numopenparens = 0)
{
    if (regstr.length == 0) {
        // pragma(msg, "Unmatched parenthesis");
        assert(0, "\nUnmatched parenthesis");
        // assert(0);
    } else if (regstr[0] == ')') {
        if (numopenparens == 1) return 1; // finished!
        else return 1 + parenConsumed(regstr[1..$], numopenparens - 1);
    } else if (regstr.length > 2 && regstr[0..3] == "(?:") {
        return 3 + parenConsumed(regstr[3..$], numopenparens + 1);
    } else if (regstr.length > 2 && regstr[0..3] == "(?>") {
        return 3 + parenConsumed(regstr[3..$], numopenparens + 1);
    } else if (regstr.length > 2 && regstr[0..3] == "(?=") {
        return 3 + parenConsumed(regstr[3..$], numopenparens + 1);
    } else if (regstr.length > 2 && regstr[0..3] == "(?!") {
        return 3 + parenConsumed(regstr[3..$], numopenparens + 1);
    } else if (regstr.length > 2 && (regstr[0..3] == "(?<" || regstr[0..3] == "(?'")) {
        uint namesize = groupnameConsumed(regstr[3..$]);
        if (regstr.length == 3 + namesize || regstr[0..3] == "(?<" && regstr[3 + namesize] != '>'
            || regstr[0..3] == "(?'" && regstr[3 + namesize] != '\'')
        {
            if (regstr[0..3] == "(?<")
                assert(0, "\nClosing > missing in regular expression:" ~ regstr);
            else
                assert(0, "\nClosing ' missing in regular expression:" ~ regstr);
        }
        uint start = 3 + namesize + 1;
        return start + parenConsumed(regstr[start..$], numopenparens + 1);
    } else if (regstr[0] == '(') {
        return 1 + parenConsumed(regstr[1..$], numopenparens + 1);
    } else if (regstr[0] == '[') {
        uint brsize = 1 + unescapedFindFirst(regstr, ']');
        return brsize + parenConsumed(regstr[brsize..$], numopenparens);
    } else if (regstr[0] == '\\' && regstr.length > 1)
        // ignore \(, \).
        return 2 + parenConsumed(regstr[2..$], numopenparens);
    else
        return 1 + parenConsumed(regstr[1..$], numopenparens);
}

// the naked term, with no quantifier. Either an atom, or a subsequence.
int atomConsumed(char [] regstr)
{
    int atomConsumedc;
    //pragma(msg,"atom consumed " ~ regstr);
    if (regstr.length > 2 && regstr[0..3] == "(?:") atomConsumedc = parenConsumed(regstr);
    else if (regstr.length > 2 && regstr[0..3] == "(?>") atomConsumedc = parenConsumed(regstr);
    else if (regstr.length > 2 && regstr[0..3] == "(?=") atomConsumedc = parenConsumed(regstr);
    else if (regstr.length > 2 && regstr[0..3] == "(?!") atomConsumedc = parenConsumed(regstr);
    else if (regstr.length > 2 && regstr[0..3] == "(?<") atomConsumedc = parenConsumed(regstr);
    else if (regstr.length > 2 && regstr[0..3] == "(?'") atomConsumedc = parenConsumed(regstr);
    else if (regstr[0] == '(') atomConsumedc = parenConsumed(regstr);
    else if (regstr[0] == '[') atomConsumedc = 1 + unescapedFindFirst(regstr, ']');
    else if (regstr[0] == ')') {assert(0, "\nError: ) encountered without an opening ( in regular expression" ~ regstr);}
    else if (regstr[0] == ']') {assert(0, "\nError: ] encountered without an opening [ in regular expression" ~ regstr);}
    else if (regstr[0] == '\\') { // escape character
        if (regstr.length > 1) {
            if (regstr[1] == 'k')
            {
                if (regstr.length > 2)
                {
                    if (regstr[2] == '<')
                    {
                        uint namesize = groupnameConsumed(regstr[3..$]);
                        if (regstr.length == 3 + namesize || regstr[3 + namesize] != '>')
                        {
                            assert(0, "\nClosing > missing in regular expression" ~ regstr);
                        }
                        return atomConsumedc = 3 + namesize + 1;
                    }
                    if (regstr[2] == '\'')
                    {
                        uint namesize = groupnameConsumed(regstr[3..$]);
                        // the name of \k'name' must be closed by ', not by >
                        if (regstr.length == 3 + namesize || regstr[3 + namesize] != '\'')
                        {
                            assert(0, "\nClosing ' missing in regular expression" ~ regstr);
                        }
                        return atomConsumedc = 3 + namesize + 1;
                    }
                    assert(0, "\nError: \\k must be followed by 'name' or <name> in regular expression" ~ regstr);
                }
                else
                    assert(0, "\nError: \\k must be followed by 'name' or <name> in regular expression" ~ regstr);
            }
            else
                atomConsumedc = 2;
        } else {
            assert(0, "\nError: Regexp must not end with \\ in regular expression" ~ regstr);
            // writefln("Error: Regexp must not end with \\ ");
            // assert(0);
        }
    } else if (regstr[0] == '@') { // NONSTANDARD: referenced parameter
        atomConsumedc = 2;
    } else atomConsumedc = 1; // match single char
    return atomConsumedc;
}

// Returns the length of the group name at the start of regstr.
// A name starts with an ASCII letter and continues with letters or digits.
int groupnameConsumed(char [] regstr)
{
    int pp = 0;
    if (regstr.length > 0 && (
        regstr[pp] >= 'a' && regstr[pp] <= 'z' || regstr[pp] >= 'A' && regstr[pp] <= 'Z'))
    {
        // stop at the end of regstr so that a name running to the end of the
        // pattern is reported by the callers as a missing closing delimiter
        while (regstr.length > pp && (regstr[pp] >= 'a' && regstr[pp] <= 'z'
            || regstr[pp] >= 'A' && regstr[pp] <= 'Z'
            || regstr[pp] >= '0' && regstr[pp] <= '9'
            ))
        {
            pp++;

        }
        return pp;
    }
    else
        assert(0, "\nError: Name of the group is missing after (?< in regular expression" ~ regstr);
    return 0;
}

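// Illustrative sketch, not part of the original module: the number of pattern
// characters that atomConsumed and groupnameConsumed take for a few kinds of
// atom. Assumes D1 string literals (char[]); "\\" is a single backslash.
unittest
{
    assert(groupnameConsumed("day>x") == 3);   // name stops at the closing >
    assert(atomConsumed("ab") == 1);           // a single literal character
    assert(atomConsumed("\\dx") == 2);         // escape sequence \d
    assert(atomConsumed("@1x") == 2);          // non-standard parameter reference
    assert(atomConsumed("[a-z]b") == 5);       // whole character class
    assert(atomConsumed("(a(b))c") == 6);      // nested group up to its matching )
    assert(atomConsumed("\\k<day>x") == 7);    // named backreference \k<day>
}
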
int atomCharacterConsumed(char [] regstr, bool isx, out int whitespaceno)
{
    int atomConsumedc;
    // if (options["x"])
    // {
    whitespaceno = 0;

    if (regstr[0] == ' ' || regstr[0] == '\t' || regstr[0] == '\n')
    {
        whitespaceno = 1;
    }
    if (isx && regstr[0] == '#')
        return 0;
    // }
    //pragma(msg,"atom consumed " ~ regstr);
    if (regstr[0] == '\\') { // escape character
        if (regstr.length > 1) {
            if (!((regstr[1] >= '0' && regstr[1] <= '9') || regstr[1] == 's'
                || regstr[1] == 'd' || regstr[1] == 'w'
                || regstr[1] == 'S' || regstr[1] == 'k'
                || regstr[1] == 'D' || regstr[1] == 'W'
                || regstr[1] == 'A' || regstr[1] == 'z'))
                atomConsumedc = 2;
            else
                atomConsumedc = 0;
        } else {
            assert(0, "\nError: Regexp must not end with \\");
            // writefln("Error: Regexp must not end with \\ ");
            // assert(0);
        }
    } else if (regstr[0] == '@' || regstr[0] == '$' || regstr[0] == '^'
        || regstr[0] == '.' || regstr[0] == '[' || regstr[0] == '(') { // NONSTANDARD: referenced parameter
        atomConsumedc = 0;
    } else
        atomConsumedc = 1; // match single char
    return atomConsumedc;
}

// parses a term from the front of regstr (which must not be empty).
// consisting of an atom, optionally followed by a quantifier.
int termConsumed(char [] regstr)
{
    /* int atomC = atomConsumed(regstr);
    int quantifierC= quantifierConsumed(regstr[atomC..$]);
    int quantifiergreedinessC=quantifiergreedinessConsumed(regstr[atomC + quantifierC ..$]);
    int termConsumed = atomC + quantifierC+quantifiergreedinessC;
    return termConsumed;*/
    uint ac = atomConsumed(regstr);
    return ac +
        quantifierConsumed(regstr[ac..$]) +
        quantifiergreedinessConsumed(regstr[ac + quantifierConsumed(regstr[ac..$]) ..$]) +
        quantifierpossessivenessConsumed(regstr[ac + quantifierConsumed(regstr[ac..$]) ..$]);
}

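// Illustrative sketch, not part of the original module: termConsumed adds up
// the atom, its quantifier and an optional lazy (?) or possessive (+) suffix.
// Assumes D1 string literals (char[]).
unittest
{
    assert(termConsumed("a{2,5}?b") == 7);  // 'a' + "{2,5}" + lazy '?'
    assert(termConsumed("(?:ab)+c") == 7);  // "(?:ab)" + '+'
    assert(termConsumed("a*+b") == 3);      // 'a' + '*' + possessive '+'
    assert(termConsumed("[a-z]b") == 5);    // character class, no quantifier
}
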
//parses a character sequence without quantifiers
int characterSequenceConsumed(char [] regstr, bool [char[]] options, out int realchars)
{
    realchars = 0;
    if (regstr.length == 0)
    {
        return 0;
    }
    int whitespaceno;
    int atomC = atomCharacterConsumed(regstr, options["x"], whitespaceno);
    if (atomC > 0)
    {
        int quantifierC = quantifierConsumed(regstr[atomC..$]);
        if (quantifierC == 0)
        {
            if (options["x"] && whitespaceno > 0)
            {
            }
            else
            {
                realchars = 1;
            }
            int crealchars;
            atomC += characterSequenceConsumed(regstr[atomC..$], options, crealchars);
            realchars += crealchars;
        }
        else
        {
            atomC = 0;
        }
    }
    return atomC;
}

| 649 |
//---------------------------------------------------------------------
// Part II: mixins which generate the final code
//---------------------------------------------------------------------
// Unlike most regexp engines, which turn the pattern string into a table-based state machine,
// this one generates a binary tree of nested functions. Each node in the tree corresponds to
// a D template, and is generated as a mixin.

// At compile time, each CTFE function is passed a substring of the regexp string.
// It generates a member function bool aname(), which updates a pointer p,
// and returns true if a match was found.

// Each CTFE function has access to the following values:
// At compile time:
// fullpattern -- the complete, unparsed regular expression string
// At run time:
// searchstr (read only) -- the string being searched
// p -- the first character in searchstr which is not yet matched.
// param[0..8] -- the quasi-static parameter strings @1..@9 to match.

// Additional variables or constants can be added as desired.

// Most of the complexity in the regexp engine comes from the optional quantifiers.
// In general, they can only determine how far to match by testing whether the entire remainder
// of the pattern can be matched.
//
// Each CTFE function also receives a function 'next'. This has a member bool fn() which
// returns true if the remainder of the regexp match is successful.
// All regexps must ensure that next is called.

// Note that unless p is reset to 0, it will automatically behave as a global search,
// continuing from the last place it left off.
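
// Illustration only (a hand-written sketch, not output of this module): matchers in the
// style described above for the pattern "ab". The names matchA and matchB are hypothetical;
// searchstr, p and next_alwaystrue follow the conventions of the comments above.
unittest
{
    auto searchstr = "xab";
    int p = 1; // start matching at the 'a'
    bool next_alwaystrue() { return true; }
    // Matches 'b', then hands over to the rest of the pattern (here: nothing).
    bool matchB()
    {
        if (p + 1 > searchstr.length || searchstr[p] != 'b') return false;
        p += 1;
        return next_alwaystrue();
    }
    // Matches 'a', then calls its continuation matchB.
    bool matchA()
    {
        if (p + 1 > searchstr.length || searchstr[p] != 'a') return false;
        p += 1;
        return matchB();
    }
    assert(matchA());
    assert(p == 3); // p now points past the matched text
}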
|---|
| 680 |
|
|---|
| 681 |
|
|---|
| 682 |
// Returns the index of the '/' that separates the regex from its options,
// or pattern.length if there are no options. Only the last '/' is examined;
// if it is escaped with a backslash, it is treated as part of the regex.
int findOptions(char [] pattern)
{
    for (int i = 0;i<pattern.length;i++)
    {
        int pos = pattern.length - 1 - i;
        if (pattern[pos] == '/')
        {
            // Only look behind when there is a preceding character.
            if (pos>0 && pattern[pos - 1] == '\\')
            {
                return pattern.length;
            }
            return pos;
        }
    }
    return pattern.length;
}
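
// Usage sketch (not part of the original source): what findOptions returns for the
// pattern formats listed at the top of this file, after the leading '/' has been removed.
unittest
{
    assert(findOptions("abc/i") == 3); // options start after index 3
    assert(findOptions(".+") == 2);    // no '/', so the result is the length
    assert(findOptions(".+/") == 2);   // trailing '/', empty option list
    assert(findOptions("a\\/b") == 4); // the last '/' is escaped: no options
}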
|---|
| 697 |
char[] removeGroupCode(char[] code) |
|---|
| 698 |
{ |
|---|
| 699 |
char[] code2; |
|---|
| 700 |
for (int p = 0;p<code.length - 10;p++) |
|---|
| 701 |
{ |
|---|
| 702 |
if (code[p..p + 9] == "//Regexp:") |
|---|
| 703 |
{ |
|---|
| 704 |
while (code[p] != 10 && code[p] != 13) |
|---|
| 705 |
{ |
|---|
| 706 |
code2~=code[p]; |
|---|
| 707 |
p++; |
|---|
| 708 |
} |
|---|
| 709 |
} |
|---|
| 710 |
if (code[p..p + 10] == "/*gstart*/") |
|---|
| 711 |
{ |
|---|
| 712 |
p += 10; |
|---|
| 713 |
while (code[p..p + 8] != "/*gend*/") |
|---|
| 714 |
{ |
|---|
| 715 |
p++; |
|---|
| 716 |
} |
|---|
| 717 |
p += 7; |
|---|
| 718 |
} |
|---|
| 719 |
else |
|---|
| 720 |
code2~=code[p]; |
|---|
| 721 |
} |
|---|
| 722 |
code2~=code[code.length - 10..code.length]; |
|---|
| 723 |
return code2; |
|---|
| 724 |
} |
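
// Usage sketch (not part of the original source): removeGroupCode drops everything between
// the /*gstart*/ and /*gend*/ markers, markers included. The input below is a made-up
// string, padded so that it is longer than the 10-character tail the function copies verbatim.
unittest
{
    assert(removeGroupCode("a;/*gstart*/g;/*gend*/b;0123456789") == "a;b;0123456789");
}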
|---|
| 725 |
|
|---|
| 726 |
char[] parseRegexp(char [] pattern, bool getcode = false) |
|---|
| 727 |
{ |
|---|
| 728 |
char[] endSequence = alwaysTrue() ; |
|---|
| 729 |
int groupno = 0; |
|---|
| 730 |
char[] pattern2; |
|---|
| 731 |
if (pattern.length>0 && pattern[0] == '/' ) |
|---|
| 732 |
{ |
|---|
| 733 |
pattern2 = pattern[1..$]; |
|---|
| 734 |
} |
|---|
| 735 |
else |
|---|
| 736 |
{ |
|---|
| 737 |
pattern2 = pattern; |
|---|
| 738 |
} |
|---|
| 739 |
int opt = findOptions(pattern2); |
|---|
| 740 |
bool [char[]] options = ["i":false, "x":false, "s":false, "m":false]; |
|---|
| 741 |
// options["i"]=false; |
|---|
| 742 |
|
|---|
| 743 |
if (opt != pattern2.length) |
|---|
| 744 |
{ |
|---|
| 745 |
foreach(c;pattern2[opt + 1..$]) |
|---|
| 746 |
{ |
|---|
| 747 |
options[""~c] = true; |
|---|
| 748 |
} |
|---|
| 749 |
//opt--; |
|---|
| 750 |
// assert(0,pattern~tostring(opt)); |
|---|
| 751 |
} |
|---|
| 752 |
int globalfuncno; |
|---|
| 753 |
char[][] groupnames; |
|---|
| 754 |
char[] groupdcl; |
|---|
| 755 |
char[] code = "//Regexp:"~toLiteralString(pattern2)~ "\n"~ |
|---|
| 756 |
endSequence~regSequence("engine", groupno, pattern2[0..opt], |
|---|
| 757 |
options, "next_alwaystrue", groupnames, groupdcl, globalfuncno); |
|---|
| 758 |
char[] decl; |
|---|
| 759 |
/*for (int i = 1;i <= groupno;i++) |
|---|
| 760 |
{ |
|---|
| 761 |
decl~="int bracketend"~tostring(i)~"=-1;\n"; |
|---|
| 762 |
}*/ |
|---|
| 763 |
if (groupno >= 1) |
|---|
| 764 |
decl~="int bracketend["~tostring(groupno + 1)~"]=-1;\n"; |
|---|
| 765 |
|
|---|
| 766 |
|
|---|
| 767 |
// if (groupnames.length>0) |
|---|
| 768 |
{ |
|---|
| 769 |
decl~="struct groupnamerec {\n"~groupdcl; |
|---|
| 770 |
/* foreach (gname; groupnames.keys) |
|---|
| 771 |
{ |
|---|
| 772 |
|
|---|
| 773 |
}*/ |
|---|
| 774 |
/*foreach (gname,gno; groupnames) |
|---|
| 775 |
{ |
|---|
| 776 |
decl~="uint "~gname~"="~tostring(gno)~";\n"; |
|---|
| 777 |
}*/ |
|---|
| 778 |
decl~="}\ngroupnamerec groupname;\n"; |
|---|
| 779 |
} |
|---|
| 780 |
if (getcode && groupdcl.length>5) |
|---|
| 781 |
for (int i = 0;i<groupdcl.length - 5;i++) |
|---|
| 782 |
{ |
|---|
| 783 |
if (groupdcl[i..i + 5] =="uint ") |
|---|
| 784 |
{ |
|---|
| 785 |
|
|---|
| 786 |
for (int j = i + 5;j<groupdcl.length;j++) |
|---|
| 787 |
{ |
|---|
| 788 |
if (groupdcl[j] == '=') |
|---|
| 789 |
{ |
|---|
| 790 |
decl~="char[] get"~ |
|---|
| 791 |
groupdcl[i + 5..j]~"(){ return _(groupname."~groupdcl[i + 5..j]~");}\n"; |
|---|
| 792 |
|
|---|
| 793 |
} |
|---|
| 794 |
} |
|---|
| 795 |
} |
|---|
| 796 |
} |
|---|
| 797 |
//assert(0,"at the end , the groupno is "~tostring(groupno)); |
|---|
| 798 |
if (groupno == 0) //remove group related code |
|---|
| 799 |
{ |
|---|
| 800 |
code = removeGroupCode(code); |
|---|
| 801 |
} |
|---|
| 802 |
return decl~code; |
|---|
| 803 |
} |
|---|
| 804 |
|
|---|
| 805 |
char[] alwaysTrue() // used to mark the end of a sequence |
|---|
| 806 |
{ |
|---|
| 807 |
return "bool next_alwaystrue () { return true; }\n"; |
|---|
| 808 |
} |
|---|
| 809 |
private struct retSequence |
|---|
| 810 |
{ |
|---|
| 811 |
char[] code; |
|---|
| 812 |
int groupno; |
|---|
| 813 |
} |
|---|
| 814 |
|
|---|
| 815 |
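// Placeholder for a first-character optimisation. It always reports that no first
// character could be determined, so regSequence never takes its switch-based path.
bool getFirstChar(char [] regstr,char c)
{
    return false;
}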
|---|
| 819 |
|
|---|
| 820 |
|
|---|
| 821 |
// regstr is a sequence of productions, possibly containing a union |
|---|
| 822 |
char[] regSequence(char fnname[], ref int groupno, char [] regstr, |
|---|
| 823 |
bool[char[]] options, char[] next, |
|---|
| 824 |
ref char[][] groupnames, ref char[] groupdcl, ref int globalfuncno) |
|---|
| 825 |
{ |
|---|
| 826 |
char [] code = ""; |
|---|
| 827 |
int fu = findUnion(regstr, options["x"]); |
|---|
| 828 |
//Stdout("regSequence:"~regstr).newline; |
|---|
| 829 |
if (fu == - 1) { |
|---|
| 830 |
// No unions to worry about |
|---|
| 831 |
|
|---|
| 832 |
code = regNoUnions(fnname, groupno, regstr, options, next, groupnames, groupdcl, globalfuncno); |
|---|
| 833 |
} else { |
|---|
| 834 |
int[][] cases; |
|---|
| 835 |
bool[] validcase; |
|---|
| 836 |
for (int j=0;j<256;j++) |
|---|
| 837 |
{ |
|---|
| 838 |
validcase~=false; |
|---|
| 839 |
cases~=[]; |
|---|
| 840 |
} |
|---|
| 841 |
int[] altgroupno; |
|---|
| 842 |
int altno = 1; |
|---|
| 843 |
int tofu; |
|---|
| 844 |
tofu = - 1; |
|---|
| 845 |
int newfu = fu; |
|---|
| 846 |
char[] controllercode; |
|---|
| 847 |
int nocases = 0; |
|---|
| 848 |
altgroupno~=0;//dummy |
|---|
| 849 |
controllercode~= " |
|---|
| 850 |
int oldp = p;\n"; |
|---|
| 851 |
code ~=" |
|---|
| 852 |
bool "~ fnname~ "() { //regSequence options |
|---|
| 853 |
|
|---|
| 854 |
"; |
|---|
| 855 |
do |
|---|
| 856 |
{ |
|---|
| 857 |
fu = tofu + 1; |
|---|
| 858 |
tofu = newfu+tofu + 1; |
|---|
| 859 |
char c; |
|---|
| 860 |
bool firstchar = false; |
|---|
| 861 |
nocases++; |
|---|
| 862 |
altgroupno~=groupno; |
|---|
| 863 |
// Stdout.format("{} {}\n",fu,tofu); |
|---|
| 864 |
if (getFirstChar(regstr[fu..tofu], c)) |
|---|
| 865 |
{ |
|---|
| 866 |
cases[c]~=altno; |
|---|
| 867 |
validcase[c] = true; |
|---|
| 868 |
} |
|---|
| 869 |
else |
|---|
| 870 |
{ // wrap it up |
|---|
| 871 |
if (nocases<4) // 2+ 1 |
|---|
| 872 |
{ |
|---|
| 873 |
for (int j = nocases - 1;j >= 1 ;j--) |
|---|
| 874 |
{ |
|---|
| 875 |
controllercode~= " |
|---|
| 876 |
if (option"~tostring(altno - j)~"()) return true; |
|---|
| 877 |
p = oldp;/*gstart*/ |
|---|
| 878 |
group.length="~tostring(altgroupno[altno - j] + 1)~";/*gend*/ |
|---|
| 879 |
"; |
|---|
| 880 |
} |
|---|
| 881 |
} |
|---|
| 882 |
else // more than 2 consecutive cases go into a switch
{
controllercode~="
if (searchstr.length>p)
switch(searchstr[p]) {\n";
for (int cc = 0;cc<256 ;cc++)
{
if (validcase[cc])
{
// Emit the character code as the case label.
controllercode~="case "~tostring(cc)~":";
for (int j = 0;j<cases[cc].length;j++)
{
controllercode~= "
if (option"~tostring(cases[cc][j])~"()) return true;
p = oldp;/*gstart*/
group.length="~tostring(altgroupno[cases[cc][j]] + 1)~";/*gend*/
";
}
controllercode~="break;\n";
validcase[cc]=false;
cases[cc]=[];
}
}
// Close the generated switch.
controllercode~="default: break;\n}\n";
}
|---|
| 909 |
nocases=0; |
|---|
| 910 |
controllercode~= " |
|---|
| 911 |
if (option"~tostring(altno)~"()) return true; |
|---|
| 912 |
p = oldp;/*gstart*/ |
|---|
| 913 |
group.length="~tostring(altgroupno[altno] + 1)~";/*gend*/ |
|---|
| 914 |
"; |
|---|
| 915 |
|
|---|
| 916 |
} |
|---|
| 917 |
|
|---|
| 918 |
|
|---|
| 919 |
code ~=regSequence("option"~tostring(altno), groupno, regstr[fu..tofu], |
|---|
| 920 |
options, next, groupnames, groupdcl, globalfuncno); |
|---|
| 921 |
|
|---|
| 922 |
altno++; |
|---|
| 923 |
|
|---|
| 924 |
newfu = findUnion(regstr[tofu + 1..$], options["x"]); |
|---|
| 925 |
} while (newfu != - 1); |
|---|
| 926 |
|
|---|
| 927 |
code~= |
|---|
| 928 |
regSequence("option"~tostring(altno), groupno, regstr[tofu + 1..$], options, next, groupnames, groupdcl, globalfuncno) ; |
|---|
| 929 |
controllercode~=" |
|---|
| 930 |
if (option"~tostring(altno)~"()) return true; |
|---|
| 931 |
p = oldp;/*gstart*/ |
|---|
| 932 |
group.length="~tostring(groupno + 1)~";/*gend*/ |
|---|
| 933 |
// writefln(\"regSequence\",~"~toLiteralString(regstr)~"); |
|---|
| 934 |
return false; |
|---|
| 935 |
}"; |
|---|
| 936 |
code~=controllercode; |
|---|
| 937 |
} |
|---|
| 938 |
return code; |
|---|
| 939 |
} |
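
// Illustration only (hand-written, not generated by regSequence): for a union such as
// "dog|cat" the controller tries each alternative in turn and restores p between
// attempts. The matcher bodies are simplified assumptions; group handling is omitted.
unittest
{
    auto searchstr = "cat";
    int p = 0;
    bool next_alwaystrue() { return true; }
    bool option1()
    {
        if (p + 3 > searchstr.length || searchstr[p..p + 3] != "dog") return false;
        p += 3;
        return next_alwaystrue();
    }
    bool option2()
    {
        if (p + 3 > searchstr.length || searchstr[p..p + 3] != "cat") return false;
        p += 3;
        return next_alwaystrue();
    }
    bool engine()
    {
        int oldp = p;
        if (option1()) return true;
        p = oldp; // undo whatever the failed alternative consumed
        if (option2()) return true;
        p = oldp;
        return false;
    }
    assert(engine());
    assert(p == 3);
}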
|---|
| 940 |
|
|---|
| 941 |
int countGroups(char [] regstr) |
|---|
| 942 |
{ |
|---|
| 943 |
if (regstr.length == 0) |
|---|
| 944 |
{ |
|---|
| 945 |
return 0; |
|---|
| 946 |
} |
|---|
| 947 |
else if (regstr.length >2 && regstr[0..3] == "(?:") |
|---|
| 948 |
{ |
|---|
| 949 |
return 0; |
|---|
| 950 |
} |
|---|
| 951 |
else if (regstr.length >2 && regstr[0..3] == "(?>") |
|---|
| 952 |
{ |
|---|
| 953 |
return 0; |
|---|
| 954 |
} |
|---|
| 955 |
else if (regstr.length >2 && regstr[0..3] == "(?=") |
|---|
| 956 |
{ |
|---|
| 957 |
return 0; |
|---|
| 958 |
} |
|---|
| 959 |
else if (regstr.length >2 && regstr[0..3] == "(?!") |
|---|
| 960 |
{ |
|---|
| 961 |
return 0; |
|---|
| 962 |
} |
|---|
| 963 |
else if (regstr[0] == '(') |
|---|
| 964 |
{ |
|---|
| 965 |
return 1; |
|---|
| 966 |
} |
|---|
| 967 |
else |
|---|
| 968 |
{ |
|---|
| 969 |
return 0; |
|---|
| 970 |
} |
|---|
| 971 |
} |
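
// Usage sketch (not part of the original source): countGroups only inspects the start
// of the string and reports whether it opens a capturing group.
unittest
{
    assert(countGroups("(a)") == 1);   // plain capturing group
    assert(countGroups("(?:a)") == 0); // non-capturing group
    assert(countGroups("(?=a)") == 0); // lookahead
    assert(countGroups("a") == 0);
    assert(countGroups("") == 0);
}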
|---|
| 972 |
|
|---|
| 973 |
int findLastLetter(char[] fnname) |
|---|
| 974 |
{ |
|---|
| 975 |
int i = 0; |
|---|
| 976 |
foreach (c;fnname) |
|---|
| 977 |
{ |
|---|
| 978 |
if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))) |
|---|
| 979 |
{ |
|---|
| 980 |
return i; |
|---|
| 981 |
} |
|---|
| 982 |
i++; |
|---|
| 983 |
} |
|---|
| 984 |
return i; |
|---|
| 985 |
} |
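
// Usage sketch (not part of the original source): findLastLetter returns the length of
// the leading run of ASCII letters, i.e. the index of the first non-letter character.
unittest
{
    assert(findLastLetter("option12") == 6);
    assert(findLastLetter("engine") == 6);
    assert(findLastLetter("") == 0);
}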
|---|
| 986 |
|
|---|
| 987 |
// regstr is a sequence of terms, all of which must be matched |
|---|
| 988 |
// Does not contain any unions |
|---|
| 989 |
char[] regNoUnions(char[] fnname, ref int groupno, char [] regstr, |
|---|
| 990 |
bool [char []] options, char[] next, |
|---|
| 991 |
ref char[][] groupnames, ref char[] groupdcl, ref int globalfuncno) |
|---|
| 992 |
{ |
|---|
| 993 |
char[] code; |
|---|
| 994 |
int skipcs; |
|---|
| 995 |
int skip; |
|---|
| 996 |
// assert(0,"no way "~regstr); |
|---|
| 997 |
// writefln("regNoUnions "~regstr); |
|---|
| 998 |
//if (regstr.length>0) |
|---|
| 999 |
// { |
|---|
| 1000 |
if (regstr == "") |
|---|
| 1001 |
{ |
|---|
| 1002 |
return "bool "~fnname~"(){ return "~next~"();}"; |
|---|
| 1003 |
} |
|---|
| 1004 |
code = |
|---|
| 1005 |
"//regNoUnions "~toLiteralString(regstr[0..termConsumed(regstr)]) ~"\n"; |
|---|
| 1006 |
// } |
|---|
| 1007 |
if (options["x"]) |
|---|
| 1008 |
{ |
|---|
| 1009 |
int i = 0; |
|---|
| 1010 |
while (i<regstr.length && (regstr[i] ==' ' || regstr[i] == '\t' || regstr[i] == '\n')) |
|---|
| 1011 |
{ |
|---|
| 1012 |
i += 1; |
|---|
| 1013 |
} |
|---|
| 1014 |
if (i<regstr.length && regstr[i] == '#') |
|---|
| 1015 |
{ |
|---|
| 1016 |
int soc = sizeOfComment(regstr[i..$]); |
|---|
| 1017 |
if (soc>0) |
|---|
| 1018 |
i += 1 + soc; |
|---|
| 1019 |
} |
|---|
| 1020 |
if (i>0) |
|---|
| 1021 |
{ |
|---|
| 1022 |
if (i<regstr.length) |
|---|
| 1023 |
{ |
|---|
| 1024 |
return regNoUnions(fnname, groupno, regstr[i..$], options, next, groupnames, groupdcl, globalfuncno); |
|---|
| 1025 |
} |
|---|
| 1026 |
else |
|---|
| 1027 |
{ |
|---|
| 1028 |
return "bool "~fnname~"(){ return "~next~"();}"; |
|---|
| 1029 |
} |
|---|
| 1030 |
} |
|---|
| 1031 |
} |
|---|
| 1032 |
//assert((regstr.length ==0),"regstr.length cannot be zero " ~ regstr ); |
|---|
| 1033 |
if (regstr.length == termConsumed(regstr)) { |
|---|
| 1034 |
// there's only a single item (possibly including a quantifier) |
|---|
| 1035 |
// pragma(msg, "\nhere at the moment3"); |
|---|
| 1036 |
code~= regTerm(fnname, groupno, regstr, options, next, groupnames, groupdcl, globalfuncno); |
|---|
| 1037 |
// pragma(msg, "\nhere at the moment4"); |
|---|
| 1038 |
} else { |
|---|
| 1039 |
int realskipcs; |
|---|
| 1040 |
skipcs = characterSequenceConsumed(regstr, options, realskipcs); |
|---|
| 1041 |
if (regstr.length == skipcs) |
|---|
| 1042 |
{ |
|---|
| 1043 |
code~= regCharacterSequence(fnname, groupno, regstr, options, next); |
|---|
| 1044 |
} |
|---|
| 1045 |
else |
|---|
| 1046 |
{ |
|---|
| 1047 |
// writefln(skipcs); |
|---|
| 1048 |
//int skip; |
|---|
| 1049 |
char[] second; |
|---|
| 1050 |
if (realskipcs>1) |
|---|
| 1051 |
{ |
|---|
| 1052 |
skip = skipcs; |
|---|
| 1053 |
|
|---|
| 1054 |
} |
|---|
| 1055 |
else |
|---|
| 1056 |
{ |
|---|
| 1057 |
skip = termConsumed(regstr); |
|---|
| 1058 |
} |
|---|
| 1059 |
// int g=groupno+countGroups(regstr); |
|---|
| 1060 |
int g = groupno; |
|---|
| 1061 |
char[] newnextname = "next"~tostring(globalfuncno); |
|---|
| 1062 |
globalfuncno++; |
|---|
| 1063 |
if (realskipcs>1) |
|---|
| 1064 |
{ |
|---|
| 1065 |
// assert(0,"hi"~tostring(skip)~regstr); |
|---|
| 1066 |
second = regSequence(newnextname, groupno, regstr[skip..$], |
|---|
| 1067 |
options, next, groupnames, groupdcl, globalfuncno); |
|---|
| 1068 |
code~= "bool "~fnname~"() { |
|---|
| 1069 |
|
|---|
| 1070 |
"~second~regCharacterSequence("first", g, regstr[0..skip], options, newnextname)~" // regTerm |
|---|
| 1071 |
return first(); |
|---|
| 1072 |
} |
|---|
| 1073 |
"; |
|---|
| 1074 |
} |
|---|
| 1075 |
else |
|---|
| 1076 |
{ |
|---|
| 1077 |
int oldgroupno = groupno; |
|---|
| 1078 |
int glfno = 0; |
|---|
| 1079 |
char[][] groupnamesdummy; |
|---|
| 1080 |
char[] groupdcldummy; |
|---|
| 1081 |
//regTerm("", groupno, regstr[0..skip], options, "", groupnames,groupdcl,glfno); //just to get groupno |
|---|
| 1082 |
char[] first = regTerm("first", groupno, regstr[0..skip], options, newnextname, groupnames, groupdcl, globalfuncno); |
|---|
| 1083 |
second = regSequence(newnextname, groupno, regstr[skip..$], |
|---|
| 1084 |
options, next, groupnames, groupdcl, globalfuncno); |
|---|
| 1085 |
code~=" |
|---|
| 1086 |
bool "~fnname~"() { |
|---|
| 1087 |
"~second~first~" // regTerm |
|---|
| 1088 |
return first(); |
|---|
| 1089 |
} |
|---|
| 1090 |
"; |
|---|
| 1091 |
} |
|---|
| 1092 |
} |
|---|
| 1093 |
} |
|---|
| 1094 |
return code; |
|---|
| 1095 |
} |
|---|
| 1096 |
// Generates the function that records in bracketend where group groupno ends,
// then calls next.
char[] regStop(char[] fnname, int groupno, char[] next)
{
    return "
    bool "~fnname~"() { //regStop
    bracketend["~tostring(groupno)~"]=p;
    return "~next~"();
    }";
}
|---|
| 1104 |
|
|---|
| 1105 |
char[] regCharacterSequence(char[] fnname, int groupno, char [] regstr, bool [char []] options, char[] next) |
|---|
| 1106 |
{ |
|---|
| 1107 |
int i = 0; |
|---|
| 1108 |
int matchsize = 0; |
|---|
| 1109 |
char [] match = ""; |
|---|
| 1110 |
char[] code; |
|---|
| 1111 |
while (i<regstr.length) |
|---|
| 1112 |
{ |
|---|
| 1113 |
if (i + 1<regstr.length && regstr[i] == '\\') {
|---|
| 1114 |
match~=toLiteralString(regstr[i + 1]); |
|---|
| 1115 |
i += 2; |
|---|
| 1116 |
matchsize++; |
|---|
| 1117 |
} else { |
|---|
| 1118 |
if (options["x"] && (regstr[i] ==' ' || regstr[i] == '\t' || regstr[i] == '\n')) |
|---|
| 1119 |
{ |
|---|
| 1120 |
} |
|---|
| 1121 |
else |
|---|
| 1122 |
{ |
|---|
| 1123 |
match~=toLiteralString(regstr[i]); |
|---|
| 1124 |
matchsize++; |
|---|
| 1125 |
} |
|---|
| 1126 |
i++; |
|---|
| 1127 |
// match single character |
|---|
| 1128 |
} |
|---|
| 1129 |
|
|---|
| 1130 |
} |
|---|
| 1131 |
if (!options["i"]) |
|---|
| 1132 |
{ |
|---|
| 1133 |
// writefln("match "~match); |
|---|
| 1134 |
code ="bool "~fnname~"() { |
|---|
| 1135 |
if (p+"~tostring(matchsize)~">searchstr.length || searchstr[p..p+"~tostring(matchsize)~"]!=\""~match~"\") return false; |
|---|
| 1136 |
p+="~tostring(matchsize)~"; |
|---|
| 1137 |
return "~next~"(); |
|---|
| 1138 |
} |
|---|
| 1139 |
"; |
|---|
| 1140 |
} |
|---|
| 1141 |
else |
|---|
| 1142 |
{ |
|---|
| 1143 |
code ="bool "~fnname~"() { |
|---|
| 1144 |
if (p+"~tostring(matchsize)~">searchstr.length || icmp(searchstr[p..p+"~tostring(matchsize)~"],\""~match~"\")!=0) return false; |
|---|
| 1145 |
p+="~tostring(matchsize)~"; |
|---|
| 1146 |
return "~next~"(); |
|---|
| 1147 |
} |
|---|
| 1148 |
"; |
|---|
| 1149 |
} |
|---|
| 1150 |
return code; |
|---|
| 1151 |
} |
|---|
| 1152 |
|
|---|
| 1153 |
|
|---|
| 1154 |
// the term without a quantifier. Here we deal with embedded subsequences. |
|---|
| 1155 |
char[] regSingleTerm(char[] fnname, ref int groupno, char [] regstr, bool [char []] options, |
|---|
| 1156 |
char[] next, ref char[][] groupnames, ref char[] groupdcl, int globalfuncno) |
|---|
| 1157 |
{ |
|---|
| 1158 |
char[] code; |
|---|
| 1159 |
|
|---|
| 1160 |
if (regstr.length>2 && regstr[0..3] == "(?:") { |
|---|
| 1161 |
// A sequence always calls next. |
|---|
| 1162 |
code = regSequence(fnname, groupno, regstr[3..$ - 1], options, next, groupnames, groupdcl, globalfuncno); |
|---|
| 1163 |
} |
|---|
| 1164 |
else if (regstr.length>2 && regstr[0..3] == "(?>") { |
|---|
| 1165 |
// Atomic group: the body is matched on its own, then next is called explicitly.
|---|
| 1166 |
code = regSequence("independent", groupno, regstr[3..$ - 1], |
|---|
| 1167 |
options, "next_alwaystrue", |
|---|
| 1168 |
groupnames, groupdcl, globalfuncno); |
|---|
| 1169 |
code ="bool "~fnname~"() {\n"~code ; |
|---|
| 1170 |
code~="return independent() && "~next~"();}\n"; |
|---|
| 1171 |
} |
|---|
| 1172 |
else if (regstr.length>2 && regstr[0..3] == "(?=") { |
|---|
| 1173 |
// Positive lookahead: the body must match, but p is restored before next is called.
|---|
| 1174 |
code = regSequence("lookahead", groupno, regstr[3..$ - 1], options, |
|---|
| 1175 |
"next_alwaystrue", groupnames, groupdcl, globalfuncno); |
|---|
| 1176 |
code ="bool "~fnname~"() {\n"~code ; |
|---|
| 1177 |
code~=" int oldp=p; |
|---|
| 1178 |
if (!lookahead()) return false; |
|---|
| 1179 |
p=oldp; |
|---|
| 1180 |
return "~next~"(); |
|---|
| 1181 |
}\n"; |
|---|
| 1182 |
} |
|---|
| 1183 |
else if (regstr.length>2 && regstr[0..3] == "(?!") { |
|---|
| 1184 |
// Negative lookahead: the body must not match; p is restored before next is called.
|---|
| 1185 |
code = regSequence("negativelookahead", groupno, regstr[3..$ - 1], options, |
|---|
| 1186 |
"next_alwaystrue", groupnames, groupdcl, globalfuncno); |
|---|
| 1187 |
code ="bool "~fnname~"() {\n"~code ; |
|---|
| 1188 |
code~=" int oldp=p; |
|---|
| 1189 |
if (negativelookahead()) return false; |
|---|
| 1190 |
p=oldp; |
|---|
| 1191 |
return "~next~"(); |
|---|
| 1192 |
}\n"; |
|---|
| 1193 |
} |
|---|
| 1194 |
else if (regstr[0] == '(' || regstr.length>2 && (regstr[0..3] == "(?<" || regstr[0..3] == "(?'")) { |
|---|
| 1195 |
// A sequence always calls next. |
|---|
| 1196 |
groupno++; |
|---|
| 1197 |
char[] stop = regStop("stop"~tostring(groupno), groupno, next); |
|---|
| 1198 |
char[] bracketvar = "bracketend["~tostring(groupno)~"]"; |
|---|
| 1199 |
int cgroupno = groupno; |
|---|
| 1200 |
uint seqstart = 1; |
|---|
| 1201 |
char [] setgroupname; |
|---|
| 1202 |
char[] savegroupnamevalue; |
|---|
| 1203 |
char[] restoregroupnamevalue; |
|---|
| 1204 |
if (regstr.length>2 && (regstr[0..3] == "(?<" || regstr[0..3] == "(?'")) |
|---|
| 1205 |
{ |
|---|
| 1206 |
char[] name = regstr[3..3 + groupnameConsumed(regstr[3..$])]; |
|---|
| 1207 |
//groupnames[name]=groupno; does not work as ctfe |
|---|
| 1208 |
int groupexists; |
|---|
| 1209 |
groupexists = getGroupno(groupnames, name); |
|---|
| 1210 |
|
|---|
| 1211 |
/*if (groupdcl.length>5) |
|---|
| 1212 |
for (int i=0;i<groupdcl.length-5;i++) |
|---|
| 1213 |
{ |
|---|
| 1214 |
if (groupdcl[i..i+5]=="uint ") |
|---|
| 1215 |
{ |
|---|
| 1216 |
|
|---|
| 1217 |
for (int j=i+5;j<groupdcl.length;j++) |
|---|
| 1218 |
{ |
|---|
| 1219 |
if (groupdcl[j]=='=') |
|---|
| 1220 |
{ |
|---|
| 1221 |
if (groupdcl[i+5..j]==name) |
|---|
| 1222 |
{ |
|---|
| 1223 |
groupexists=true; |
|---|
| 1224 |
break; |
|---|
| 1225 |
} |
|---|
| 1226 |
} |
|---|
| 1227 |
} |
|---|
| 1228 |
} |
|---|
| 1229 |
}*/ |
|---|
| 1230 |
seqstart = 3 + groupnameConsumed(regstr[3..$]) + 1; |
|---|
| 1231 |
setgroupname = "groupname."~name~"="~tostring(groupno)~";\n"; |
|---|
| 1232 |
savegroupnamevalue ="int tempgroupname=groupname."~name~";\n"; |
|---|
| 1233 |
restoregroupnamevalue = "groupname."~name~"=tempgroupname;\n"; |
|---|
| 1234 |
if (groupexists == - 1) |
|---|
| 1235 |
{groupdcl~="uint "~name~"="~tostring(groupno)~";\n"; |
|---|
| 1236 |
//if (groupnames.length<=groupno) |
|---|
| 1237 |
// groupnames.length= groupno; |
|---|
| 1238 |
//groupnames[groupno]=name; |
|---|
| 1239 |
groupnames~=name~"="~tostring(groupno); |
|---|
| 1240 |
} |
|---|
| 1241 |
} |
|---|
| 1242 |
|
|---|
| 1243 |
|
|---|
| 1244 |
code =" //single term |
|---|
| 1245 |
bool "~fnname~"() { // ( |
|---|
| 1246 |
"~stop~" |
|---|
| 1247 |
//int "~bracketvar~"; |
|---|
| 1248 |
"~regSequence("a", groupno, regstr[seqstart..$ - 1], options, "stop"~tostring(groupno), groupnames, groupdcl, globalfuncno)~ " |
|---|
| 1249 |
int oldp=p; |
|---|
| 1250 |
if (group.length<="~tostring(cgroupno)~") |
|---|
| 1251 |
{ |
|---|
| 1252 |
group.length="~tostring(cgroupno + 1)~"; |
|---|
| 1253 |
} |
|---|
| 1254 |
group["~tostring(cgroupno)~"]=grouprec(oldp,-1); |
|---|
| 1255 |
"~savegroupnamevalue~" |
|---|
| 1256 |
"~setgroupname~" |
|---|
| 1257 |
bool r=a(); |
|---|
| 1258 |
if (r) |
|---|
| 1259 |
{ |
|---|
| 1260 |
group["~tostring(cgroupno)~"]=grouprec(oldp,"~bracketvar~"); |
|---|
| 1261 |
// writefln(\"grouprec1 \",oldp,\" \",p); |
|---|
| 1262 |
} |
|---|
| 1263 |
else |
|---|
| 1264 |
{ |
|---|
| 1265 |
"~restoregroupnamevalue~" |
|---|
| 1266 |
group.length="~tostring(cgroupno)~"; |
|---|
| 1267 |
} |
|---|
| 1268 |
return r; |
|---|
| 1269 |
} |
|---|
| 1270 |
"; |
|---|
| 1271 |
} else { |
|---|
| 1272 |
// A simple atom doesn't call next, so we need to do it here. |
|---|
| 1273 |
code ="bool "~fnname~"() { |
|---|
| 1274 |
"~regAtom(groupno, regstr, groupnames, options) ~ " |
|---|
| 1275 |
return fn() && "~next~"(); |
|---|
| 1276 |
} |
|---|
| 1277 |
"; |
|---|
| 1278 |
} |
|---|
| 1279 |
return code; |
|---|
| 1280 |
} |
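
// Illustration only (hand-written, not generated by regSingleTerm): the shape of the
// code emitted for a positive lookahead such as (?=b). The lookahead body is run
// against next_alwaystrue, and p is restored so that the lookahead consumes nothing.
// The name term is hypothetical.
unittest
{
    auto searchstr = "ab";
    int p = 0;
    bool next_alwaystrue() { return true; }
    bool lookahead()
    {
        if (p + 1 > searchstr.length || searchstr[p] != 'b') return false;
        p += 1;
        return next_alwaystrue();
    }
    bool term()
    {
        int oldp = p;
        if (!lookahead()) return false;
        p = oldp; // a lookahead consumes nothing
        return next_alwaystrue();
    }
    p = 1;
    assert(term());
    assert(p == 1); // still in front of the 'b'
    p = 0;
    assert(!term()); // 'a' is not 'b'
}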
|---|
| 1281 |
|
|---|
| 1282 |
// Evaluate one term (without quantifier). |
|---|
| 1283 |
// This helper class has two purposes: |
|---|
| 1284 |
// (1) to restore the 'p' pointer when we return. |
|---|
| 1285 |
// (2) ensure that at least one character was consumed |
|---|
| 1286 |
char[] regSequenceDontUpdateP(char[] fnname, int groupno, char [] regstr, bool [char [] ] options, ref int globalfuncno) |
|---|
| 1287 |
{ |
|---|
| 1288 |
//int globalfuncno=0; |
|---|
| 1289 |
char[][] groupnames; //dummy |
|---|
| 1290 |
char[] groupdcl; |
|---|
| 1291 |
return "bool "~fnname~"() { //regSequenceDontUpdateP |
|---|
| 1292 |
" ~regSequence("x", groupno, regstr, options, "next_alwaystrue", |
|---|
| 1293 |
groupnames, groupdcl, globalfuncno)~" |
|---|
| 1294 |
// It's only a successful match if _something_ was consumed |
|---|
| 1295 |
if (p==theinitialp) return false; |
|---|
| 1296 |
int oldp = p; |
|---|
| 1297 |
if (!x()) return false; |
|---|
| 1298 |
p = oldp; |
|---|
| 1299 |
return true; |
|---|
| 1300 |
} |
|---|
| 1301 |
"; |
|---|
| 1302 |
} |
|---|
| 1303 |
|
|---|
| 1304 |
// Calls the naked term twice, but only updates 'p' after the first one. |
|---|
| 1305 |
// Evaluate the term, knowing that what comes after will be the same as this. |
|---|
| 1306 |
char[] regTermTwice(char[] fnname, int groupno, char [] regstr, bool [char[]] options, int t, ref int globalfuncno) |
|---|
| 1307 |
{ |
|---|
| 1308 |
char[] code; |
|---|
| 1309 |
char[] groupdcl; |
|---|
| 1310 |
char[][] groupnames; //dummy |
|---|
| 1311 |
if (regstr.length>2 && (regstr[0..3] == "(?:" || regstr[0..3] == "(?>" |
|---|
| 1312 |
|| regstr[0..3] == "(?=" || regstr[0..3] == "(?!")) { |
|---|
| 1313 |
|
|---|
| 1314 |
char [] suddendeath = regSequenceDontUpdateP("suddendeath", groupno, regstr[3..t - 1], |
|---|
| 1315 |
options, globalfuncno); //groupno may be incorrect but does not matter |
|---|
| 1316 |
|
|---|
| 1317 |
code =" |
|---|
| 1318 |
bool "~fnname~"() |
|---|
| 1319 |
{ |
|---|
| 1320 |
// While evaluating this first sequence, if this is a sequence |
|---|
| 1321 |
// that potentially has zero length (ie, everything is a *, ? or {m,} term), |
|---|
| 1322 |
// each term should attempt to consume at least one character if possible. |
|---|
| 1323 |
int theinitialp = p; |
|---|
| 1324 |
" ~suddendeath ~regSequence("a", groupno, regstr[3..t - 1], options, "suddendeath", |
|---|
| 1325 |
groupnames, groupdcl, globalfuncno)~ |
|---|
| 1326 |
"return a(); |
|---|
| 1327 |
} |
|---|
| 1328 |
"; |
|---|
| 1329 |
} |
|---|
| 1330 |
else if (regstr[0] == '(') { |
|---|
| 1331 |
char []suddendeath = regSequenceDontUpdateP("suddendeath", groupno + 1, regstr[1..t - 1], options, globalfuncno); |
|---|
| 1332 |
int g = groupno + 1; |
|---|
| 1333 |
code =" |
|---|
| 1334 |
bool "~fnname~"() |
|---|
| 1335 |
{ |
|---|
| 1336 |
// While evaluating this first sequence, if this is a sequence |
|---|
| 1337 |
// that potentially has zero length (ie, everything is a *, ? or {m,} term), |
|---|
| 1338 |
// each term should attempt to consume at least one character if possible. |
|---|
| 1339 |
int theinitialp = p; |
|---|
| 1340 |
"~suddendeath~ regSequence("a", g, regstr[1..t - 1], options, "suddendeath", |
|---|
| 1341 |
groupnames, groupdcl, globalfuncno)~ " |
|---|
| 1342 |
int oldp=p; |
|---|
| 1343 |
bool r=a(); /*gstart*/ |
|---|
| 1344 |
if (group.length<="~tostring(groupno + 1)~") |
|---|
| 1345 |
{ |
|---|
| 1346 |
group.length="~tostring(groupno + 2)~"; |
|---|
| 1347 |
} |
|---|
| 1348 |
group["~tostring(groupno + 1)~"]=grouprec(oldp,-1); |
|---|
| 1349 |
if (r) |
|---|
| 1350 |
{ |
|---|
| 1351 |
// writefln(\"grouprec \",oldp,\" \",p); |
|---|
| 1352 |
} |
|---|
| 1353 |
else |
|---|
| 1354 |
{ |
|---|
| 1355 |
group.length="~tostring(groupno + 1)~"; |
|---|
| 1356 |
} /*gend*/ |
|---|
| 1357 |
return r; |
|---|
| 1358 |
//return a.fn(); |
|---|
| 1359 |
} |
|---|
| 1360 |
"; |
|---|
| 1361 |
} else { |
|---|
| 1362 |
code =" |
|---|
| 1363 |
bool "~fnname~"() { |
|---|
| 1364 |
// It's easy with atoms, because we know they always eat something. |
|---|
| 1365 |
// BUG: Maybe this will fail when null @n strings are passed in? |
|---|
| 1366 |
"~regAtom(groupno, regstr, groupnames, options)~" |
|---|
| 1367 |
return fn(); |
|---|
| 1368 |
} |
|---|
| 1369 |
"; |
|---|
| 1370 |
} |
|---|
| 1371 |
return code; |
|---|
| 1372 |
} |
|---|
| 1373 |
|
|---|
| 1374 |
|
|---|
| 1375 |
// the atom, optionally followed by a quantifier. |
|---|
| 1376 |
// Here we deal with all kinds of repetition,
|---|
| 1377 |
// but we make different optimisations depending on the allowable repeats. |
|---|
| 1378 |
char[] regTerm(char[] fnname, ref int groupno, char [] regstr, bool [char []] options, |
|---|
| 1379 |
char[] next, ref char[][] groupnames, ref char[] groupdcl, ref int globalfuncno) |
|---|
| 1380 |
{ |
|---|
| 1381 |
char [] code; |
|---|
| 1382 |
if (atomConsumed(regstr) == regstr.length) { |
|---|
| 1383 |
|
|---|
| 1384 |
// there is no quantifier, just use the naked term |
|---|
| 1385 |
code = regSingleTerm(fnname, groupno, regstr, options, next, groupnames, groupdcl, globalfuncno); |
|---|
| 1386 |
} else { |
|---|
| 1387 |
int t = atomConsumed(regstr); |
|---|
| 1388 |
uint qc = quantifierConsumed(regstr[t..$]); |
|---|
| 1389 |
uint repmin = quantifierMin(regstr[t..$]); |
|---|
| 1390 |
uint repmax = quantifierMax(regstr[t..$]); |
|---|
| 1391 |
uint greedy = quantifierGreediness(regstr[t + qc..$]); |
|---|
| 1392 |
uint possess = quantifierPossessiveness(regstr[t + qc..$]); |
|---|
| 1393 |
uint cg = countGroups(regstr); |
|---|
| 1394 |
|
|---|
| 1395 |
code =" |
|---|
| 1396 |
bool "~fnname~"(){ |
|---|
| 1397 |
|
|---|
| 1398 |
|
|---|
| 1399 |
// HORRENDOUSLY inefficient! In some cases, we generate the quantified term THREE TIMES! |
|---|
| 1400 |
// The first one contains the rest of the search tree. |
|---|
| 1401 |
// This is used when we think we can do (atom).(next) for an early exit |
|---|
| 1402 |
"~regTerm("atomAndNext", groupno, regstr[0..t], options, next, groupnames, groupdcl, globalfuncno) ~ " |
|---|
| 1403 |
|
|---|
| 1404 |
// debug writefln(fullpattern, \" Quantifier \",regstr , \" starting at \", searchstr[p..$]); |
|---|
| 1405 |
"; |
|---|
| 1406 |
if (possess) |
|---|
| 1407 |
{ |
|---|
| 1408 |
code~=regTerm("atom", groupno, regstr[0..t], options, "next_alwaystrue", groupnames, groupdcl, globalfuncno); |
|---|
| 1409 |
} |
|---|
| 1410 |
if (repmin == 0 && repmax == 1) { |
|---|
| 1411 |
code~=" |
|---|
| 1412 |
// \"?\", or \"{0,1}\". Worth optimising separately
|---|
| 1413 |
int oldp = p; |
|---|
| 1414 |
"; |
|---|
| 1415 |
if (possess) |
|---|
| 1416 |
{ |
|---|
| 1417 |
code~=" // possessive
|---|
| 1418 |
if (atom()) |
|---|
| 1419 |
{ |
|---|
| 1420 |
if (!"~next~"()) |
|---|
| 1421 |
return false; |
|---|
| 1422 |
return true; |
|---|
| 1423 |
} |
|---|
| 1424 |
p = oldp; |
|---|
| 1425 |
if ("~next~"()) |
|---|
| 1426 |
{ |
|---|
| 1427 |
return true; |
|---|
| 1428 |
} |
|---|
| 1429 |
p = oldp; |
|---|
| 1430 |
return false; |
|---|
| 1431 |
}"; |
|---|
| 1432 |
} |
|---|
| 1433 |
else |
|---|
| 1434 |
if (!greedy) |
|---|
| 1435 |
{ |
|---|
| 1436 |
if (next == "next_alwaystrue") |
|---|
| 1437 |
{ |
|---|
| 1438 |
code~="return true;}"; |
|---|
| 1439 |
} |
|---|
| 1440 |
else |
|---|
| 1441 |
code~="if ("~next~"()) { return true; } |
|---|
| 1442 |
p = oldp; |
|---|
| 1443 |
return atomAndNext(); |
|---|
| 1444 |
}"; |
|---|
| 1445 |
} |
|---|
| 1446 |
else //greedy |
|---|
| 1447 |
{ |
|---|
| 1448 |
if (next == "next_alwaystrue") |
|---|
| 1449 |
{ |
|---|
| 1450 |
code~="// pragma(msg, \"?greedy\"); |
|---|
| 1451 |
if (atomAndNext()) |
|---|
| 1452 |
{ |
|---|
| 1453 |
return true; |
|---|
| 1454 |
} |
|---|
| 1455 |
p = oldp; |
|---|
| 1456 |
return true; |
|---|
| 1457 |
}"; |
|---|
| 1458 |
} |
|---|
| 1459 |
else |
|---|
| 1460 |
code~=" // pragma(msg, \"?greedy\"); |
|---|
| 1461 |
if (atomAndNext()) |
|---|
| 1462 |
{ |
|---|
| 1463 |
return true; |
|---|
| 1464 |
} |
|---|
| 1465 |
p = oldp; |
|---|
| 1466 |
if ("~next~"()) |
|---|
| 1467 |
{ |
|---|
| 1468 |
return true; |
|---|
| 1469 |
} |
|---|
| 1470 |
p = oldp; |
|---|
| 1471 |
return false; |
|---|
| 1472 |
} |
|---|
| 1473 |
"; |
|---|
| 1474 |
} |
|---|
| 1475 |
} else { |
|---|
| 1476 |
code~=" |
|---|
| 1477 |
// Here's where we generate the redundant term. |
|---|
| 1478 |
// If we can't do (atom).(next), we must be able to do |
|---|
| 1479 |
// (atom).(atom) to stay in the game. |
|---|
| 1480 |
"~regTermTwice("atomonly", groupno, regstr[0..t], options, t, globalfuncno) ; |
|---|
| 1481 |
if (repmin == 0 && repmax == uint.max) { |
|---|
| 1482 |
// optimise for "*", "{0,}" |
|---|
| 1483 |
if (cg>0) |
|---|
| 1484 |
{ |
|---|
| 1485 |
assert(0,"\nError: use of * or {0,} on a term containing a capturing group ("~regstr[0..t]~") is not supported in regular expression "~regstr); |
|---|
| 1486 |
} |
|---|
| 1487 |
if (possess) // * or {0,} |
|---|
| 1488 |
{ |
|---|
| 1489 |
code~=" // possessive |
|---|
| 1490 |
int oldp; |
|---|
| 1491 |
int veryoldp=p; |
|---|
| 1492 |
int newp=-1; |
|---|
| 1493 |
// p=oldp; |
|---|
| 1494 |
do { |
|---|
| 1495 |
// Can we do (atom)? |
|---|
| 1496 |
oldp = p; |
|---|
| 1497 |
if (atom()) |
|---|
| 1498 |
{ newp=p; |
|---|
| 1499 |
} |
|---|
| 1500 |
} while (p != oldp); |
|---|
| 1501 |
if (newp!=-1) |
|---|
| 1502 |
{ |
|---|
| 1503 |
p=newp; |
|---|
| 1504 |
|
|---|
| 1505 |
return "~next~"(); |
|---|
| 1506 |
} //nothing is matched |
|---|
| 1507 |
p = veryoldp; |
|---|
| 1508 |
if ("~next~"()) return true; // success but we want longer ones |
|---|
| 1509 |
return false; |
|---|
| 1510 |
}"; |
|---|
| 1511 |
} |
|---|
| 1512 |
else if (!greedy) |
|---|
| 1513 |
{ |
|---|
| 1514 |
|
|---|
| 1515 |
if (next == "next_alwaystrue") |
|---|
| 1516 |
code~=" // optimise for non-greedy\"*\", \"{0,}\" |
|---|
| 1517 |
return true; // We can finish right now. |
|---|
| 1518 |
} |
|---|
| 1519 |
"; |
|---|
| 1520 |
else |
|---|
| 1521 |
code~=" // optimise for non-greedy\"*\", \"{0,}\" |
|---|
| 1522 |
int oldp=p; |
|---|
| 1523 |
if ("~next~"()) return true; // We can finish right now. |
|---|
| 1524 |
p=oldp; |
|---|
| 1525 |
do { |
|---|
| 1526 |
// Can we do (atom).(next) ? |
|---|
| 1527 |
oldp = p; |
|---|
| 1528 |
if (atomAndNext()) { return true; } |
|---|
| 1529 |
p = oldp; |
|---|
| 1530 |
// We need to do (atom).(atom) to have any chance of continuing. |
|---|
| 1531 |
// also, it must have consumed at least one character, or there is no hope. |
|---|
| 1532 |
} while (atomonly() && p != oldp); |
|---|
| 1533 |
return false; |
|---|
| 1534 |
} |
|---|
| 1535 |
"; |
|---|
| 1536 |
} |
|---|
| 1537 |
else // optimise for greedy "*", "{0,}" |
|---|
| 1538 |
{ |
|---|
| 1539 |
//atom cannot contain capturing groups |
|---|
| 1540 |
|
|---|
| 1541 |
if (next == "next_alwaystrue") |
|---|
| 1542 |
code~=" // greedy |
|---|
| 1543 |
int oldp; |
|---|
| 1544 |
int veryoldp=p; |
|---|
| 1545 |
int newp=-1; |
|---|
| 1546 |
// p=oldp; |
|---|
| 1547 |
do { |
|---|
| 1548 |
// Can we do (atom).(next) ? |
|---|
| 1549 |
oldp = p; |
|---|
| 1550 |
if (atomAndNext()) |
|---|
| 1551 |
{ newp=p; |
|---|
| 1552 |
} |
|---|
| 1553 |
p = oldp; |
|---|
| 1554 |
// We need to do (atom).(atom) to have any chance of continuing. |
|---|
| 1555 |
// also, it must have consumed at least one character, or there is no hope. |
|---|
| 1556 |
} while (atomonly() && p != oldp); |
|---|
| 1557 |
if (newp!=-1) |
|---|
| 1558 |
{ |
|---|
| 1559 |
p=newp; |
|---|
| 1560 |
|
|---|
| 1561 |
return true; |
|---|
| 1562 |
} |
|---|
| 1563 |
p = veryoldp; |
|---|
| 1564 |
return true; // success but we want longer ones |
|---|
| 1565 |
}"; |
|---|
| 1566 |
else |
|---|
| 1567 |
code~=" // greedy |
|---|
| 1568 |
int oldp; |
|---|
| 1569 |
int veryoldp=p; /*gstart*/ |
|---|
| 1570 |
grouprec [] savegroups; /*gend*/ |
|---|
| 1571 |
int newp=-1; |
|---|
| 1572 |
// p=oldp; |
|---|
| 1573 |
do { |
|---|
| 1574 |
// Can we do (atom).(next) ? |
|---|
| 1575 |
oldp = p; |
|---|
| 1576 |
if (atomAndNext()) |
|---|
| 1577 |
{ newp=p; /*gstart*/ |
|---|
| 1578 |
int sl=group.length-("~tostring(groupno + 1)~"); |
|---|
| 1579 |
if (sl>0) |
|---|
| 1580 |
{ |
|---|
| 1581 |
savegroups.length=sl; |
|---|
| 1582 |
savegroups[0..$]=group["~tostring(groupno + 1)~"..$]; |
|---|
| 1583 |
} |
|---|
| 1584 |
group.length="~tostring(groupno + 1)~"; /*gend*/ |
|---|
| 1585 |
} |
|---|
| 1586 |
p = oldp; |
|---|
| 1587 |
// We need to do (atom).(atom) to have any chance of continuing. |
|---|
| 1588 |
// also, it must have consumed at least one character, or there is no hope. |
|---|
| 1589 |
} while (atomonly() && p != oldp); |
|---|
| 1590 |
if (newp!=-1) |
|---|
| 1591 |
{ |
|---|
| 1592 |
p=newp; /*gstart*/ |
|---|
| 1593 |
group.length="~tostring(groupno + 1)~"+savegroups.length; |
|---|
| 1594 |
if (savegroups.length>0) |
|---|
| 1595 |
{ |
|---|
| 1596 |
group["~tostring(groupno + 1)~"..$]=savegroups[0..$]; |
|---|
| 1597 |
} /*gend*/ |
|---|
| 1598 |
|
|---|
| 1599 |
return true; |
|---|
| 1600 |
} |
|---|
| 1601 |
p = veryoldp; /*gstart*/ |
|---|
| 1602 |
group.length="~tostring(groupno + 1)~"; /*gend*/ |
|---|
| 1603 |
if ("~next~"()) return true; // success but we want longer ones |
|---|
| 1604 |
return false; |
|---|
| 1605 |
}"; |
|---|
| 1606 |
} |
|---|
| 1607 |
} else { // "+", or "{m,n}" |
|---|
| 1608 |
if (cg>0) |
|---|
| 1609 |
{ |
|---|
| 1610 |
assert(0,"\nError: use of + or {m,n} on a term containing a capturing group ("~regstr[0..t]~") is not supported in regular expression "~regstr); |
|---|
| 1611 |
} |
|---|
| 1612 |
if (possess) |
|---|
| 1613 |
{ |
|---|
| 1614 |
// possessive start |
|---|
| 1615 |
code~="//possessive\"+\", or \"{m,n}\" |
|---|
| 1616 |
int numreps=0; // how many repeats have we found? |
|---|
| 1617 |
int oldp; |
|---|
| 1618 |
int newp=-1; \n"; |
|---|
| 1619 |
if (repmin == 0) |
|---|
| 1620 |
{ |
|---|
| 1621 |
code~=" newp=p; |
|---|
| 1622 |
"; |
|---|
| 1623 |
} |
|---|
| 1624 |
code~=" |
|---|
| 1625 |
do { |
|---|
| 1626 |
oldp = p; |
|---|
| 1627 |
numreps++; |
|---|
| 1628 |
if (atom() && numreps>="~tostring(repmin)~") { |
|---|
| 1629 |
newp=p; |
|---|
| 1630 |
} |
|---|
| 1631 |
|
|---|
| 1632 |
"; |
|---|
| 1633 |
if (repmax<uint.max) { |
|---|
| 1634 |
code~=" // bounded \"{m,n}\": stop once the maximum repeat count is reached |
|---|
| 1635 |
if (numreps == "~tostring(repmax)~") break; |
|---|
| 1636 |
"; |
|---|
| 1637 |
} |
|---|
| 1638 |
code~=" |
|---|
| 1639 |
} while (p!=oldp); |
|---|
| 1640 |
if (newp!=-1) |
|---|
| 1641 |
{ |
|---|
| 1642 |
p=newp; |
|---|
| 1643 |
return "~next~"(); |
|---|
| 1644 |
} |
|---|
| 1645 |
return false; |
|---|
| 1646 |
} |
|---|
| 1647 |
"; |
|---|
| 1648 |
|
|---|
| 1649 |
// possessive end |
|---|
| 1650 |
} |
|---|
| 1651 |
else |
|---|
| 1652 |
if (!greedy) |
|---|
| 1653 |
{ |
|---|
| 1654 |
code~="// non-greedy\"+\", or \"{m,n}\" |
|---|
| 1655 |
int numreps=0; // how many repeats have we found? |
|---|
| 1656 |
int oldp;\n"; |
|---|
| 1657 |
if (repmin == 0) |
|---|
| 1658 |
{ |
|---|
| 1659 |
if (next == "next_alwaystrue") |
|---|
| 1660 |
code~=" return true; |
|---|
| 1661 |
"; |
|---|
| 1662 |
else |
|---|
| 1663 |
code~=" |
|---|
| 1664 |
oldp=p; |
|---|
| 1665 |
if ("~next~"()) return true; |
|---|
| 1666 |
p = oldp; |
|---|
| 1667 |
"; |
|---|
| 1668 |
} |
|---|
| 1669 |
code~=" |
|---|
| 1670 |
do { |
|---|
| 1671 |
oldp = p; |
|---|
| 1672 |
numreps++; |
|---|
| 1673 |
if (numreps>="~tostring(repmin)~" && atomAndNext()) return true; |
|---|
| 1674 |
p = oldp; |
|---|
| 1675 |
"; |
|---|
| 1676 |
if (repmax<uint.max) { |
|---|
| 1677 |
code~=" // bounded \"{m,n}\": give up once the maximum repeat count is reached |
|---|
| 1678 |
if (numreps == "~tostring(repmax)~") return false; |
|---|
| 1679 |
"; |
|---|
| 1680 |
} |
|---|
| 1681 |
code~=" |
|---|
| 1682 |
} while (atomonly() && p!=oldp); |
|---|
| 1683 |
return false; |
|---|
| 1684 |
} |
|---|
| 1685 |
"; |
|---|
| 1686 |
} |
|---|
| 1687 |
else // greedy |
|---|
| 1688 |
{ |
|---|
| 1689 |
code~="//greedy\"+\", or \"{m,n}\" |
|---|
| 1690 |
int numreps=0; // how many repeats have we found? |
|---|
| 1691 |
int oldp; |
|---|
| 1692 |
int newp=-1; /*gstart*/ |
|---|
| 1693 |
grouprec [] savegroups;/*gend*/\n"; |
|---|
| 1694 |
if (repmin == 0) |
|---|
| 1695 |
{ |
|---|
| 1696 |
code~=" |
|---|
| 1697 |
oldp=p; |
|---|
| 1698 |
if ("~next~"()) |
|---|
| 1699 |
{ |
|---|
| 1700 |
newp=p; /*gstart*/ |
|---|
| 1701 |
int sl=group.length-"~tostring(groupno + 1)~"; |
|---|
| 1702 |
if (sl>0) |
|---|
| 1703 |
{ |
|---|
| 1704 |
savegroups.length=sl; |
|---|
| 1705 |
savegroups[0..$]=group["~tostring(groupno + 1)~"..$]; |
|---|
| 1706 |
} |
|---|
| 1707 |
group.length="~tostring(groupno + 1)~"; /*gend*/ |
|---|
| 1708 |
} |
|---|
| 1709 |
p = oldp; |
|---|
| 1710 |
"; |
|---|
| 1711 |
} |
|---|
| 1712 |
code~=" |
|---|
| 1713 |
do { |
|---|
| 1714 |
oldp = p; |
|---|
| 1715 |
numreps++; |
|---|
| 1716 |
if (numreps>="~tostring(repmin)~" && atomAndNext()) { |
|---|
| 1717 |
newp=p; |
|---|
| 1718 |
/*gstart*/ |
|---|
| 1719 |
int sl=group.length-("~tostring(groupno + 1)~"); |
|---|
| 1720 |
if (sl>0) |
|---|
| 1721 |
{ |
|---|
| 1722 |
savegroups.length=sl; |
|---|
| 1723 |
savegroups[0..$]=group["~tostring(groupno + 1)~"..$]; |
|---|
| 1724 |
} |
|---|
| 1725 |
group.length="~tostring(groupno + 1)~"; /*gend*/ |
|---|
| 1726 |
} |
|---|
| 1727 |
p = oldp; |
|---|
| 1728 |
"; |
|---|
| 1729 |
if (repmax<uint.max) { |
|---|
| 1730 |
code~=" // bounded \"{m,n}\": stop once the maximum repeat count is reached |
|---|
| 1731 |
if (numreps == "~tostring(repmax)~") break; |
|---|
| 1732 |
"; |
|---|
| 1733 |
} |
|---|
| 1734 |
code~=" |
|---|
| 1735 |
} while (atomonly() && p!=oldp); |
|---|
| 1736 |
if (newp!=-1) |
|---|
| 1737 |
{ |
|---|
| 1738 |
p=newp; /*gstart*/ |
|---|
| 1739 |
group.length="~tostring(groupno + 1)~"+savegroups.length; |
|---|
| 1740 |
if (savegroups.length>0) |
|---|
| 1741 |
{ |
|---|
| 1742 |
group["~tostring(groupno + 1)~"..$]=savegroups[0..$]; |
|---|
| 1743 |
} /*gend*/ |
|---|
| 1744 |
return true; |
|---|
| 1745 |
} |
|---|
| 1746 |
return false; |
|---|
| 1747 |
} |
|---|
| 1748 |
"; |
|---|
| 1749 |
} |
|---|
| 1750 |
} |
|---|
| 1751 |
} |
|---|
| 1752 |
|
|---|
| 1753 |
} |
|---|
| 1754 |
return code; |
|---|
| 1755 |
} |
|---|
| 1756 |
char[] slashimp(char[] abbr, bool[char[]] options) |
|---|
| 1757 |
{ |
|---|
| 1758 |
return " |
|---|
| 1759 |
bool fn() { // character class |
|---|
| 1760 |
if (p<searchstr.length && ("~charMatches(abbr, "searchstr[p]", options)~")) |
|---|
| 1761 |
{ |
|---|
| 1762 |
p++; |
|---|
| 1763 |
return true; |
|---|
| 1764 |
} |
|---|
| 1765 |
return false; |
|---|
| 1766 |
} |
|---|
| 1767 |
"; |
|---|
| 1768 |
} |
|---|
| 1769 |
char [] Backreference(int maxgroupno, int groupno, char[] groupname) |
|---|
| 1770 |
{ |
|---|
| 1771 |
|
|---|
| 1772 |
if (groupno>maxgroupno) |
|---|
| 1773 |
{ |
|---|
| 1774 |
assert(0,"\nmax group number at this point:"~ |
|---|
| 1775 |
tostring(maxgroupno) ~ " bad \\"~tostring(groupno)~ |
|---|
| 1776 |
":group "~tostring(groupno) |
|---|
| 1777 |
~" cannot be referenced as it is not available at this point"); |
|---|
| 1778 |
} |
|---|
| 1779 |
// assert(0,"bad \\"~regstr[1]~ ":group "~regstr[1]~" cannot be referenced as it is not captured"); |
|---|
| 1780 |
char[] groupcode; |
|---|
| 1781 |
char[] groupid; |
|---|
| 1782 |
if (groupname != "") |
|---|
| 1783 |
{ |
|---|
| 1784 |
groupcode = "groupname."~groupname; |
|---|
| 1785 |
groupid = groupname; |
|---|
| 1786 |
} |
|---|
| 1787 |
else |
|---|
| 1788 |
{ |
|---|
| 1789 |
groupcode = tostring(groupno); |
|---|
| 1790 |
groupid = groupcode; |
|---|
| 1791 |
} |
|---|
| 1792 |
return " bool fn() { |
|---|
| 1793 |
int gs=group["~ groupcode~"].start; |
|---|
| 1794 |
int ge=bracketend["~ groupcode~"]; |
|---|
| 1795 |
int gsize=ge-gs; |
|---|
| 1796 |
// writefln(`backref group `,gs,\" \",ge,`searchstr `,\" \",p,\" \",p+gsize,\""~ groupcode~"\","~ groupcode~"); |
|---|
| 1797 |
//writefln(`backref group `,searchstr[gs..ge],`searchstr `,searchstr[p..p+gsize]); |
|---|
| 1798 |
if (gsize<0) |
|---|
| 1799 |
{ |
|---|
| 1800 |
throw new Exception(\"bad \\\\"~groupid~ ":group "~groupid~" cannot be referenced as it is not captured\"); |
|---|
| 1801 |
// assert(0,\"bad \\\\"~tostring(groupno)~ ":group "~tostring(groupno)~" cannot be referenced as it is not captured\"); |
|---|
| 1802 |
} |
|---|
| 1803 |
if (p+gsize<=searchstr.length && searchstr[gs..ge]==searchstr[p..p+gsize]) |
|---|
| 1804 |
{ |
|---|
| 1805 |
p+=gsize; |
|---|
| 1806 |
return true; |
|---|
| 1807 |
} |
|---|
| 1808 |
return false; |
|---|
| 1809 |
} |
|---|
| 1810 |
"; |
|---|
| 1811 |
|
|---|
| 1812 |
} |
|---|
| 1813 |
|
|---|
| 1814 |
int getGroupno(ref char[][] groupnames, ref char[] nametofind) |
|---|
| 1815 |
{ |
|---|
| 1816 |
int f = - 1; |
|---|
| 1817 |
|
|---|
| 1818 |
for (int i = 0;i<groupnames.length;i++) |
|---|
| 1819 |
{ |
|---|
| 1820 |
char[] findthis = nametofind; |
|---|
| 1821 |
findthis~="="; |
|---|
| 1822 |
if (groupnames[i].length>nametofind.length |
|---|
| 1823 |
&& |
|---|
| 1824 |
groupnames[i][0..nametofind.length + 1] == findthis) |
|---|
| 1825 |
{ |
|---|
| 1826 |
f = atoui(groupnames[i][nametofind.length + 1..$]); |
|---|
| 1827 |
break; |
|---|
| 1828 |
} |
|---|
| 1829 |
} |
|---|
| 1830 |
return f; |
|---|
| 1831 |
|
|---|
| 1832 |
} |
|---|
| 1833 |
// generate a parser for an atom |
|---|
| 1834 |
// IN: regstr is a valid atom, without a repeat |
|---|
| 1835 |
// OUT: if atom is matched, return true, and update p. |
|---|
| 1836 |
// if atom is not matched, return false, and leave p unchanged. |
|---|
| 1837 |
char[] regAtom(int groupno, char [] regstr, char[][] groupnames, bool[char[]] options) |
|---|
| 1838 |
{ |
|---|
| 1839 |
if (regstr[0] == '[') { |
|---|
| 1840 |
if (regstr[1] == '^') |
|---|
| 1841 |
{ |
|---|
| 1842 |
return " |
|---|
| 1843 |
bool fn() { // inverse character class |
|---|
| 1844 |
if (p<searchstr.length && (!"~charMatches(regstr[2..$ - 1], "searchstr[p]", options)~")) |
|---|
| 1845 |
{ |
|---|
| 1846 |
p++; |
|---|
| 1847 |
return true; |
|---|
| 1848 |
} |
|---|
| 1849 |
return false; |
|---|
| 1850 |
} "; |
|---|
| 1851 |
} else { |
|---|
| 1852 |
return " |
|---|
| 1853 |
bool fn() { // character class |
|---|
| 1854 |
if (p<searchstr.length && ("~charMatches(regstr[1..$ - 1], "searchstr[p]", options)~")) |
|---|
| 1855 |
{ |
|---|
| 1856 |
p++; |
|---|
| 1857 |
return true; |
|---|
| 1858 |
} |
|---|
| 1859 |
return false; |
|---|
| 1860 |
} |
|---|
| 1861 |
"; |
|---|
| 1862 |
} |
|---|
| 1863 |
} else if (regstr[0] == '.') { // match any |
|---|
| 1864 |
if (options["s"]) |
|---|
| 1865 |
{ |
|---|
| 1866 |
return " |
|---|
| 1867 |
bool fn() { //. |
|---|
| 1868 |
if (p==searchstr.length) return false; |
|---|
| 1869 |
p++; |
|---|
| 1870 |
return true; |
|---|
| 1871 |
} |
|---|
| 1872 |
"; |
|---|
| 1873 |
} |
|---|
| 1874 |
else |
|---|
| 1875 |
{ |
|---|
| 1876 |
return " |
|---|
| 1877 |
bool fn() { //. |
|---|
| 1878 |
if (p==searchstr.length || searchstr[p]=='\\n' ) return false; |
|---|
| 1879 |
p++; |
|---|
| 1880 |
return true; |
|---|
| 1881 |
} |
|---|
| 1882 |
"; |
|---|
| 1883 |
|
|---|
| 1884 |
} |
|---|
| 1885 |
} else if (regstr.length>1 && regstr[0..2] == "\\w") { // match a word letter |
|---|
| 1886 |
bool [char[]] toptions = ["i":false]; |
|---|
| 1887 |
return slashimp("\\w", toptions); |
|---|
| 1888 |
} |
|---|
| 1889 |
else if (regstr.length>1 && regstr[0..2] == "\\s") // match whitespace |
|---|
| 1890 |
return slashimp("\\s", options); |
|---|
| 1891 |
else if (regstr.length>1 && regstr[0..2] == "\\d") // match a digit |
|---|
| 1892 |
return slashimp("\\d", options); |
|---|
| 1893 |
else if (regstr.length>1 && regstr[0..2] == "\\D") // match a non-digit |
|---|
| 1894 |
return slashimp("\\D", options); |
|---|
| 1895 |
else if (regstr.length>1 && regstr[0..2] == "\\W") // match a non-word character |
|---|
| 1896 |
return slashimp("\\W", options); |
|---|
| 1897 |
else if (regstr.length>1 && regstr[0..2] == "\\S") // match non-whitespace |
|---|
| 1898 |
return slashimp("\\S", options); |
|---|
| 1899 |
else if (regstr.length>1 && regstr[0] == '\\' && (regstr[1] == 'k' )) |
|---|
| 1900 |
{ |
|---|
| 1901 |
if (regstr.length>2 && (regstr[2] == '<' || regstr[2] == '\'')) |
|---|
| 1902 |
{ |
|---|
| 1903 |
uint namesize = groupnameConsumed(regstr[3..$]); |
|---|
| 1904 |
char[] name = regstr[3..3 + namesize]; |
|---|
| 1905 |
int f = getGroupno(groupnames, name); |
|---|
| 1906 |
if (f == - 1) |
|---|
| 1907 |
assert(0,"\nError:bad group name "~regstr[3..3 + namesize]~ ":it does not exist"); |
|---|
| 1908 |
return Backreference(groupno, f, name); |
|---|
| 1909 |
} |
|---|
| 1910 |
assert(0, "\nError:internal"); |
|---|
| 1911 |
|
|---|
| 1912 |
} |
|---|
| 1913 |
else if (regstr.length>1 && regstr[0] == '\\' && (regstr[1] >= '1' && regstr[1] <= '9' )) { |
|---|
| 1914 |
return Backreference(groupno, atoui(""~regstr[1]), ""); |
|---|
| 1915 |
} |
|---|
| 1916 |
else if (regstr[0] == '@') { // NONSTANDARD: referenced parameter |
|---|
| 1917 |
return regParameter(atoui(regstr[1..$]) - 1, options); |
|---|
| 1918 |
} else if (regstr[0] == '^') { // start of line |
|---|
| 1919 |
if (options["m"]) |
|---|
| 1920 |
{ |
|---|
| 1921 |
return " |
|---|
| 1922 |
bool fn() { |
|---|
| 1923 |
return (p==0 || searchstr[p-1]=='\\n'); |
|---|
| 1924 |
} |
|---|
| 1925 |
"; |
|---|
| 1926 |
} |
|---|
| 1927 |
else |
|---|
| 1928 |
{ |
|---|
| 1929 |
return " |
|---|
| 1930 |
bool fn() { |
|---|
| 1931 |
return (p==0); |
|---|
| 1932 |
} |
|---|
| 1933 |
"; |
|---|
| 1934 |
} |
|---|
| 1935 |
} else if (regstr[0] == '$') { // end of line |
|---|
| 1936 |
if (options["m"]) |
|---|
| 1937 |
{ |
|---|
| 1938 |
return " |
|---|
| 1939 |
bool fn() { |
|---|
| 1940 |
return (p==searchstr.length || searchstr[p]=='\\n'); |
|---|
| 1941 |
} |
|---|
| 1942 |
"; |
|---|
| 1943 |
} |
|---|
| 1944 |
else |
|---|
| 1945 |
{ |
|---|
| 1946 |
return " |
|---|
| 1947 |
bool fn() { |
|---|
| 1948 |
return (p==searchstr.length || (p==searchstr.length-1 && searchstr[p]=='\\n')); |
|---|
| 1949 |
} |
|---|
| 1950 |
"; |
|---|
| 1951 |
} |
|---|
| 1952 |
} else if (regstr.length>1 && regstr[0] == '\\' && regstr[1] == 'A') { // start of string |
|---|
| 1953 |
return " |
|---|
| 1954 |
bool fn() { |
|---|
| 1955 |
return (p==0); |
|---|
| 1956 |
} |
|---|
| 1957 |
"; |
|---|
| 1958 |
} else if (regstr.length>1 && regstr[0] == '\\' && regstr[1] == 'z') { // end of string |
|---|
| 1959 |
return " |
|---|
| 1960 |
bool fn() { |
|---|
| 1961 |
return (p==searchstr.length); |
|---|
| 1962 |
} |
|---|
| 1963 |
"; |
|---|
| 1964 |
} else if (regstr.length>1 && regstr[0] == '\\') { // escaped char |
|---|
| 1965 |
char[] regstr1 = toLiteralChar(regstr[1]); |
|---|
| 1966 |
if (!options["i"]) |
|---|
| 1967 |
{ |
|---|
| 1968 |
return " |
|---|
| 1969 |
bool fn() { |
|---|
| 1970 |
if (p==searchstr.length || searchstr[p]!="~regstr1~") return false; |
|---|
| 1971 |
p++; |
|---|
| 1972 |
return true; |
|---|
| 1973 |
} |
|---|
| 1974 |
"; |
|---|
| 1975 |
} |
|---|
| 1976 |
else |
|---|
| 1977 |
{ |
|---|
| 1978 |
return " |
|---|
| 1979 |
bool fn() { |
|---|
| 1980 |
if (p==searchstr.length || icmp([searchstr[p]],["~regstr1~"])!=0) return false; |
|---|
| 1981 |
p++; |
|---|
| 1982 |
return true; |
|---|
| 1983 |
} |
|---|
| 1984 |
"; |
|---|
| 1985 |
|
|---|
| 1986 |
|
|---|
| 1987 |
} |
|---|
| 1988 |
} else { |
|---|
| 1989 |
// match single character |
|---|
| 1990 |
char[] regstr0 = toLiteralChar(regstr[0]); |
|---|
| 1991 |
if (!options["i"]) |
|---|
| 1992 |
{ |
|---|
| 1993 |
return " |
|---|
| 1994 |
bool fn() { |
|---|
| 1995 |
if (p==searchstr.length || searchstr[p]!="~regstr0~") return false; |
|---|
| 1996 |
p++; |
|---|
| 1997 |
return true; |
|---|
| 1998 |
} |
|---|
| 1999 |
"; |
|---|
| 2000 |
} |
|---|
| 2001 |
else |
|---|
| 2002 |
{ |
|---|
| 2003 |
return " |
|---|
| 2004 |
bool fn() { |
|---|
| 2005 |
if (p==searchstr.length || icmp([searchstr[p]],["~regstr0~"])!=0) return false; |
|---|
| 2006 |
p++; |
|---|
| 2007 |
return true; |
|---|
| 2008 |
} |
|---|
| 2009 |
"; |
|---|
| 2010 |
} |
|---|
| 2011 |
} |
|---|
| 2012 |
} |
|---|
| 2013 |
|
|---|
| 2014 |
// match a variable string, which will be passed as a parameter. |
|---|
| 2015 |
char[] regParameter(int parmnum, bool [char[]]options) |
|---|
| 2016 |
{ |
|---|
| 2017 |
if (!options["i"]) |
|---|
| 2018 |
{ |
|---|
| 2019 |
return " |
|---|
| 2020 |
bool fn() { |
|---|
| 2021 |
if (p + param["~tostring(parmnum)~"].length > searchstr.length) return false; |
|---|
| 2022 |
if (searchstr[p..p+param["~tostring(parmnum)~"].length] != param["~tostring(parmnum)~"]) return false; |
|---|
| 2023 |
p+=param["~tostring(parmnum)~"].length; |
|---|
| 2024 |
return true; |
|---|
| 2025 |
} |
|---|
| 2026 |
"; |
|---|
| 2027 |
} |
|---|
| 2028 |
else |
|---|
| 2029 |
{ |
|---|
| 2030 |
return " |
|---|
| 2031 |
bool fn() { |
|---|
| 2032 |
if (p + param["~tostring(parmnum)~"].length > searchstr.length) return false; |
|---|
| 2033 |
if (icmp(searchstr[p..p+param["~tostring(parmnum)~"].length],param["~tostring(parmnum)~"])!=0) return false; |
|---|
| 2034 |
p+=param["~tostring(parmnum)~"].length; |
|---|
| 2035 |
return true; |
|---|
| 2036 |
} |
|---|
| 2037 |
"; |
|---|
| 2038 |
|
|---|
| 2039 |
} |
|---|
| 2040 |
} |
|---|
| 2041 |
|
|---|
| 2042 |
//"a-zA-Z0-9_" |
|---|
| 2043 |
|
|---|
| 2044 |
char[] toLiteralString(char[] s) |
|---|
| 2045 |
{ |
|---|
| 2046 |
char[] sout; |
|---|
| 2047 |
foreach(c;s) |
|---|
| 2048 |
{ |
|---|
| 2049 |
sout~=toLiteralString(c); |
|---|
| 2050 |
} |
|---|
| 2051 |
return sout; |
|---|
| 2052 |
} |
|---|
| 2053 |
|
|---|
| 2054 |
char[] toLiteralString(char c) |
|---|
| 2055 |
{ |
|---|
| 2056 |
if (c == '\'') |
|---|
| 2057 |
{ |
|---|
| 2058 |
return "\\\'"; |
|---|
| 2059 |
} |
|---|
| 2060 |
else if (c == '\"') |
|---|
| 2061 |
{ |
|---|
| 2062 |
return "\\\""; |
|---|
| 2063 |
} |
|---|
| 2064 |
else if (c == '\\') |
|---|
| 2065 |
{ |
|---|
| 2066 |
return "\\\\"; |
|---|
| 2067 |
} |
|---|
| 2068 |
else if (c == '\n') |
|---|
| 2069 |
{ |
|---|
| 2070 |
return "\\n"; |
|---|
| 2071 |
} |
|---|
| 2072 |
else if (c == '\r') |
|---|
| 2073 |
{ |
|---|
| 2074 |
return "\\r"; |
|---|
| 2075 |
} |
|---|
| 2076 |
else if (c == '\t') |
|---|
| 2077 |
{ |
|---|
| 2078 |
return "\\t"; |
|---|
| 2079 |
} |
|---|
| 2080 |
else |
|---|
| 2081 |
{ |
|---|
| 2082 |
return ""~c; |
|---|
| 2083 |
} |
|---|
| 2084 |
} |
|---|
| 2085 |
|
|---|
| 2086 |
|
|---|
| 2087 |
char[] toLiteralChar(char c) |
|---|
| 2088 |
{ |
|---|
| 2089 |
return "\'"~toLiteralString(c)~"\'"; |
|---|
| 2090 |
|
|---|
| 2091 |
} |
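// A minimal sketch, not part of the original source: this unittest illustrates
// how the literal helpers above escape characters before they are spliced into
// generated code. It assumes the D1 dialect used by this file, where string
// literals are mutable char[].
unittest
{
    // An ordinary character is simply wrapped in single quotes.
    assert(toLiteralChar('a') == "'a'");
    // A newline becomes the two-character escape sequence \n inside the quotes.
    assert(toLiteralChar('\n') == "'\\n'");
    // A double quote inside a string is prefixed with a backslash.
    assert(toLiteralString("a\"b") == "a\\\"b");
}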
|---|
| 2092 |
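// Intersect the closed ranges [ps1,pe1] and [ps2,pe2]; requires ps1<=pe1 and ps2<=pe2. Returns false if they do not overlap, otherwise stores the overlap in is1..ie1. |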
|---|
| 2093 |
bool intersectPeriods(int ps1, int pe1, int ps2, int pe2, out int is1, out int ie1) |
|---|
| 2094 |
{ |
|---|
| 2095 |
if (pe1<ps2) |
|---|
| 2096 |
{ |
|---|
| 2097 |
return false; |
|---|
| 2098 |
} |
|---|
| 2099 |
if (pe2<ps1) |
|---|
| 2100 |
{ |
|---|
| 2101 |
return false; |
|---|
| 2102 |
} |
|---|
| 2103 |
if (pe1 <= pe2) |
|---|
| 2104 |
{ |
|---|
| 2105 |
if (ps1 <= ps2) |
|---|
| 2106 |
{ |
|---|
| 2107 |
is1 = ps2; |
|---|
| 2108 |
ie1 = pe1; |
|---|
| 2109 |
return true; |
|---|
| 2110 |
} |
|---|
| 2111 |
else |
|---|
| 2112 |
{ |
|---|
| 2113 |
is1 = ps1; |
|---|
| 2114 |
ie1 = pe1; |
|---|
| 2115 |
return true; |
|---|
| 2116 |
} |
|---|
| 2117 |
} |
|---|
| 2118 |
else |
|---|
| 2119 |
{ |
|---|
| 2120 |
if (ps1 <= ps2) |
|---|
| 2121 |
{ |
|---|
| 2122 |
is1 = ps2; |
|---|
| 2123 |
ie1 = pe2; |
|---|
| 2124 |
return true; |
|---|
| 2125 |
} |
|---|
| 2126 |
else |
|---|
| 2127 |
{ |
|---|
| 2128 |
is1 = ps1; |
|---|
| 2129 |
ie1 = pe2; |
|---|
| 2130 |
return true; |
|---|
| 2131 |
} |
|---|
| 2132 |
|
|---|
| 2133 |
} |
|---|
| 2134 |
} |
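// A minimal sketch, not part of the original source: this unittest shows the
// three outcomes of intersectPeriods (partial overlap, containment, no overlap).
// The variable names s and e are illustrative only.
unittest
{
    int s, e;
    // 'a'..'z' and 'x'..'}' overlap in 'x'..'z'.
    assert(intersectPeriods('a', 'z', 'x', '}', s, e));
    assert(s == 'x' && e == 'z');
    // 'c'..'f' lies entirely inside 'a'..'z'.
    assert(intersectPeriods('a', 'z', 'c', 'f', s, e));
    assert(s == 'c' && e == 'f');
    // Disjoint ranges do not intersect.
    assert(!intersectPeriods('a', 'c', 'x', 'z', s, e));
}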
|---|
| 2135 |
|
|---|
| 2136 |
|
|---|
| 2137 |
// Generate a D boolean expression that is true if the character expression ch is matched by the character class regstr. |
|---|
| 2138 |
char[] charMatches(char [] regstr, char []ch, bool [char[]] options) |
|---|
| 2139 |
{ |
|---|
| 2140 |
char[] code; |
|---|
| 2141 |
if (regstr.length == 0) return "false"; |
|---|
| 2142 |
else if (regstr.length >= 3 && regstr[1] == '-' && regstr[0] != '\\' ) { |
|---|
| 2143 |
if (regstr[0]>regstr[2]) |
|---|
| 2144 |
{ |
|---|
| 2145 |
assert(0, "Error:>"~regstr[0..3]~"< start of a character range is greater than its end"); |
|---|
| 2146 |
} |
|---|
| 2147 |
if (!options["i"]) |
|---|
| 2148 |
{ |
|---|
| 2149 |
char[] regstr0 = toLiteralChar(regstr[0]); |
|---|
| 2150 |
char[] regstr2 = toLiteralChar(regstr[2]); |
|---|
| 2151 |
return "("~ch~">="~regstr0~" && "~ch~"<="~regstr2~") || "~charMatches(regstr[3..$], ch, options); |
|---|
| 2152 |
} |
|---|
| 2153 |
else |
|---|
| 2154 |
{ |
|---|
| 2155 |
int is1, ie1; |
|---|
| 2156 |
char is2, ie2; |
|---|
| 2157 |
bool i1, i2; |
|---|
| 2158 |
char[] code1, code2; |
|---|
| 2159 |
i1 = intersectPeriods('a', 'z', regstr[0], regstr[2], is1, ie1); |
|---|
| 2160 |
if (i1) |
|---|
| 2161 |
{ |
|---|
| 2162 |
int isi, iei; |
|---|
| 2163 |
char uis, uie; |
|---|
| 2164 |
uis = toupper([is1])[0] ; uie = toupper([ie1])[0] ; |
|---|
| 2165 |
code1 = "("~ch~">="~toLiteralChar(uis)~" && "~ch~"<="~toLiteralChar(uie)~") || "; |
|---|
| 2166 |
if (intersectPeriods(uis, uie, regstr[0], regstr[2], isi, iei)) |
|---|
| 2167 |
{ |
|---|
| 2168 |
if (isi == uis && iei == uie) //UPPER-UPPER is in regstr[0]-regstr[2] |
|---|
| 2169 |
{ |
|---|
| 2170 |
code1 = ""; |
|---|
| 2171 |
} |
|---|
| 2172 |
} |
|---|
| 2173 |
} |
|---|
| 2174 |
i1 = intersectPeriods('A', 'Z', regstr[0], regstr[2], is1, ie1); |
|---|
| 2175 |
if (i1) |
|---|
| 2176 |
{ |
|---|
| 2177 |
int isi, iei; |
|---|
| 2178 |
char lis, lie; |
|---|
| 2179 |
lis = tolower([is1])[0] ; lie = tolower([ie1])[0] ; |
|---|
| 2180 |
code2 = "("~ch~">="~toLiteralChar(lis)~" && "~ch~"<="~toLiteralChar(lie)~") || "; |
|---|
| 2181 |
if (intersectPeriods(lis, lie, regstr[0], regstr[2], isi, iei)) |
|---|
| 2182 |
{ |
|---|
| 2183 |
if (isi == lis && iei == lie) //lower-lower is in regstr[0]-regstr[2] |
|---|
| 2184 |
{ |
|---|
| 2185 |
code2 = ""; |
|---|
| 2186 |
} |
|---|
| 2187 |
} |
|---|
| 2188 |
} |
|---|
| 2189 |
char[] regstr0 = toLiteralChar(regstr[0]); |
|---|
| 2190 |
char[] regstr2 = toLiteralChar(regstr[2]); |
|---|
| 2191 |
return code1~ code2~ "("~ch~">="~regstr0~" && "~ch~"<="~regstr2~") || "~charMatches(regstr[3..$], ch, options); |
|---|
| 2192 |
} |
|---|
| 2193 |
} |
|---|
| 2194 |
else if (regstr.length >= 2 && regstr[0..2] == "\\w") { |
|---|
| 2195 |
return charMatches("a-zA-Z0-9_", ch, options) ~ " || "~charMatches(regstr[2..$], ch, options); |
|---|
| 2196 |
} |
|---|
| 2197 |
else if (regstr.length >= 2 && regstr[0..2] == "\\d") { |
|---|
| 2198 |
return charMatches("0-9", ch, options)~" || "~charMatches(regstr[2..$], ch, options); |
|---|
| 2199 |
} |
|---|
| 2200 |
else if (regstr.length >= 2 && regstr[0..2] == "\\s") { |
|---|
| 2201 |
return charMatches(" \t\n\r", ch, options)~" || "~charMatches(regstr[2..$], ch, options); |
|---|
| 2202 |
} |
|---|
| 2203 |
else if (regstr.length >= 2 && regstr[0..2] == "\\W") { |
|---|
| 2204 |
return "!("~charMatches("a-zA-Z0-9_", ch, options) ~ ") || "~charMatches(regstr[2..$], ch, options); |
|---|
| 2205 |
} |
|---|
| 2206 |
else if (regstr.length >= 2 && regstr[0..2] == "\\D") { |
|---|
| 2207 |
return "!("~charMatches("0-9", ch, options)~") || "~charMatches(regstr[2..$], ch, options); |
|---|
| 2208 |
} |
|---|
| 2209 |
else if (regstr.length >= 2 && regstr[0..2] == "\\S") { |
|---|
| 2210 |
return "!("~charMatches(" \t\n\r", ch, options)~") || "~charMatches(regstr[2..$], ch, options); |
|---|
| 2211 |
} |
|---|
| 2212 |
else if (regstr[0] == '\\') |
|---|
| 2213 |
{ |
|---|
| 2214 |
if (regstr.length == 1) |
|---|
| 2215 |
{ |
|---|
| 2216 |
assert(0,"\nError:a character is missing after \\"); |
|---|
| 2217 |
} |
|---|
| 2218 |
char[] regstr0 = toLiteralChar(regstr[1]); |
|---|
| 2219 |
return "("~ch~"=="~regstr0~") || "~charMatches(regstr[2..$], ch, options); |
|---|
| 2220 |
} |
|---|
| 2221 |
else |
|---|
| 2222 |
{ |
|---|
| 2223 |
char[] regstr0 = toLiteralChar(regstr[0]); |
|---|
| 2224 |
return "("~ch~"=="~regstr0~") || "~charMatches(regstr[1..$], ch, options);} |
|---|
| 2225 |
} |
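// A minimal sketch, not part of the original source: charMatches returns the
// text of a boolean expression rather than a bool. The unittest below shows the
// expression generated for the class body "a-c_" with case sensitivity on.
// The names opts and "ch" are illustrative; callers in this file pass
// "searchstr[p]" as the character expression.
unittest
{
    bool[char[]] opts = ["i":false];
    // The range a-c becomes a pair of comparisons, the literal '_' an equality
    // test, and the recursion ends with the constant false.
    assert(charMatches("a-c_", "ch", opts)
        == "(ch>='a' && ch<='c') || (ch=='_') || false");
}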
|---|
| 2226 |
|
|---|
| 2227 |
|
|---|
| 2228 |
//--------------------------------------------------------------------- |
|---|
| 2229 |
// Part III: the public interface of the regexp engine |
|---|
| 2230 |
//--------------------------------------------------------------------- |
|---|
| 2231 |
|
|---|
| 2232 |
// Does the pattern match at the beginning of searchstr? |
|---|
| 2233 |
template test(char [] fullpattern) |
|---|
| 2234 |
{ |
|---|
| 2235 |
|
|---|
| 2236 |
bool test(char [] searchstr, char [][] param...) { |
|---|
| 2237 |
int p = 0; // start at the beginning of the string |
|---|
| 2238 |
grouprec [] group; |
|---|
| 2239 |
mixin (parseRegexp(fullpattern)); |
|---|
| 2240 |
return engine(); |
|---|
| 2241 |
} |
|---|
| 2242 |
} |
|---|
| 2243 |
|
|---|
| 2244 |
class screg(char [] fullpattern) |
|---|
| 2245 |
{ |
|---|
| 2246 |
private: |
|---|
| 2247 |
int p; // next index to test |
|---|
| 2248 |
//int groupno=0; |
|---|
| 2249 |
int x; |
|---|
| 2250 |
|
|---|
| 2251 |
char[] searchstr; |
|---|
| 2252 |
char[][9] param; |
|---|
| 2253 |
public: |
|---|
| 2254 |
grouprec [] group; |
|---|
| 2255 |
this() |
|---|
| 2256 |
{ |
|---|
| 2257 |
p = 0; |
|---|
| 2258 |
} |
|---|
| 2259 |
this(int startp, char[][]parameters...) |
|---|
| 2260 |
{ |
|---|
| 2261 |
p = startp; |
|---|
| 2262 |
int i; |
|---|
| 2263 |
foreach(par;parameters) |
|---|
| 2264 |
param[i++] = par; |
|---|
| 2265 |
} |
|---|
| 2266 |
mixin (parseRegexp(fullpattern, true)); |
|---|
| 2267 |
alias match matches; |
|---|
| 2268 |
bool match(char[] searchstrin) |
|---|
| 2269 |
{ |
|---|
| 2270 |
group.length = 0; |
|---|
| 2271 |
searchstr = searchstrin; |
|---|
| 2272 |
//pragma(msg,"screg.match:"~fullpattern); |
|---|
| 2273 |
for (int x = p; x<searchstr.length;++x) { |
|---|
| 2274 |
p = x; |
|---|
| 2275 |
if (engine()) { |
|---|
| 2276 |
if (group.length == 0) |
|---|
| 2277 |
{ |
|---|
| 2278 |
group.length = 1; |
|---|
| 2279 |
} |
|---|
| 2280 |
group[0] = grouprec(x, p); |
|---|
| 2281 |
return true; |
|---|
| 2282 |
} |
|---|
| 2283 |
} |
|---|
| 2284 |
return false; |
|---|
| 2285 |
} |
|---|
| 2286 |
bool gmatch(char[] searchstrin) |
|---|
| 2287 |
{ |
|---|
| 2288 |
group.length = 0; |
|---|
| 2289 |
searchstr = searchstrin; x = p; // remember where this anchored match starts |
|---|
| 2290 |
if (engine()) { |
|---|
| 2291 |
if (group.length == 0) |
|---|
| 2292 |
{ |
|---|
| 2293 |
group.length = 1; |
|---|
| 2294 |
} |
|---|
| 2295 |
group[0] = grouprec(x, p); |
|---|
| 2296 |
return true; |
|---|
| 2297 |
} |
|---|
| 2298 |
return false; |
|---|
| 2299 |
} |
|---|
| 2300 |
char[] _(int groupno) |
|---|
| 2301 |
{ |
|---|
| 2302 |
return .group(searchstr, group, groupno); |
|---|
| 2303 |
} |
|---|
| 2304 |
|
|---|
| 2305 |
char[] opIndex(int groupno) |
|---|
| 2306 |
{ |
|---|
| 2307 |
return .group(searchstr, group, groupno); |
|---|
| 2308 |
} |
|---|
| 2309 |
bool exists(int groupno) |
|---|
| 2310 |
{ |
|---|
| 2311 |
if (groupno<0) |
|---|
| 2312 |
return false; |
|---|
| 2313 |
// the group exists only if its index lies within the captured groups |
|---|
| 2314 |
return groupno < group.length; |
|---|
| 2315 |
} |
|---|
| 2316 |
alias ismatched defined; |
|---|
| 2317 |
bool ismatched(int groupno) |
|---|
| 2318 |
{ |
|---|
| 2319 |
if (groupno >= group.length) |
|---|
| 2320 |
return false; |
|---|
| 2321 |
return (group[groupno].end != - 1); |
|---|
| 2322 |
} |
|---|
| 2323 |
int pos() |
|---|
| 2324 |
{ |
|---|
| 2325 |
return p; |
|---|
| 2326 |
} |
|---|
| 2327 |
void pos(int pin) |
|---|
| 2328 |
{ |
|---|
| 2329 |
p = pin; |
|---|
| 2330 |
} |
|---|
| 2331 |
void restart() |
|---|
| 2332 |
{ |
|---|
| 2333 |
p = 0; |
|---|
| 2334 |
} |
|---|
| 2335 |
} |
|---|
| 2336 |
|
|---|
| 2337 |
/// Return first substring which matches the pattern. |
|---|
| 2338 |
/// Note that some patterns will return an empty string as a valid result. |
|---|
| 2339 |
//template search |
|---|
| 2340 |
//{ |
|---|
| 2341 |
char [] search(char [] fullpattern)(char [] searchstr, char [][] param...) { |
|---|
| 2342 |
int p; // next index to test |
|---|
| 2343 |
//int groupno=0; |
|---|
| 2344 |
grouprec [] group; |
|---|
| 2345 |
//pragma(msg,parseRegexp(fullpattern)); |
|---|
| 2346 |
mixin (parseRegexp(fullpattern)); |
|---|
| 2347 |
for (int x = 0; x<searchstr.length;++x) { |
|---|
| 2348 |
p = x; |
|---|
| 2349 |
if (engine()) return searchstr[x..p]; |
|---|
| 2350 |
} |
|---|
| 2351 |
return null; // no match |
|---|
| 2352 |
} |
|---|
| 2353 |
//} |
|---|
| 2354 |
|
|---|
| 2355 |
//simple version, escapes not supported |
|---|
| 2356 |
char [] parsetr(char[] fullpattern) |
|---|
| 2357 |
{ |
|---|
| 2358 |
if (fullpattern.length<3 || fullpattern[0..1] != "/") |
|---|
| 2359 |
assert(0,"tr has syntax error"); |
|---|
| 2360 |
int i; |
|---|
| 2361 |
char[] code ="char c2;\nswitch(c){"; |
|---|
| 2362 |
int s1, e1, s2, e2; |
|---|
| 2363 |
s1 = 1; |
|---|
| 2364 |
for (i = s1;i<fullpattern.length;i++) |
|---|
| 2365 |
{ |
|---|
| 2366 |
if (fullpattern[i] == '/') |
|---|
| 2367 |
{ |
|---|
| 2368 |
e1 = i; |
|---|
| 2369 |
break; |
|---|
| 2370 |
} |
|---|
| 2371 |
} |
|---|
| 2372 |
if (e1 == 0) |
|---|
| 2373 |
assert(0,"tr has syntax error,/ missing"); |
|---|
| 2374 |
s2 = e1 + 1; |
|---|
| 2375 |
|
|---|
| 2376 |
for (i = s2;i<fullpattern.length;i++) |
|---|
| 2377 |
{ |
|---|
| 2378 |
if (fullpattern[i] == '/') |
|---|
| 2379 |
{ |
|---|
| 2380 |
e2 = i; |
|---|
| 2381 |
break; |
|---|
| 2382 |
} |
|---|
| 2383 |
} |
|---|
| 2384 |
|
|---|
| 2385 |
if (e2 == 0) |
|---|
| 2386 |
assert(0,"tr has syntax error,last / missing"); |
|---|
| 2387 |
if (e2 - s2 != e1 - s1) |
|---|
| 2388 |
assert(0,"input character set is not the same as output character set"); |
|---|
| 2389 |
for (i = s1;i<e1;i++) |
|---|
| 2390 |
{ |
|---|
| 2391 |
char[] c = fullpattern[i..i + 1]; |
|---|
| 2392 |
if (c == "'") |
|---|
| 2393 |
c = "\\'"; |
|---|
| 2394 |
char[] c2 = fullpattern[s2..s2 + 1]; |
|---|
| 2395 |
s2++; |
|---|
| 2396 |
if (c2 == "'") |
|---|
| 2397 |
c2 = "\\'"; |
|---|
| 2398 |
code~="case '"~c~"': c2='"~c2~"';break;\n"; |
|---|
| 2399 |
} |
|---|
| 2400 |
|
|---|
| 2401 |
code~="default: c2=c;}"; |
|---|
| 2402 |
|
|---|
| 2403 |
char[]code2 = "char[] convert(){ |
|---|
| 2404 |
char[] outstr; |
|---|
| 2405 |
foreach(c;convertable) |
|---|
| 2406 |
{ |
|---|
| 2407 |
"~code~" |
|---|
| 2408 |
outstr~=c2; |
|---|
| 2409 |
} |
|---|
| 2410 |
return outstr;} |
|---|
| 2411 |
"; |
|---|
| 2412 |
//assert(0,code2); |
|---|
| 2413 |
return code2; |
|---|
| 2414 |
} |
|---|
| 2415 |
|
|---|
| 2416 |
char [] parselistofwords(char[] list) |
|---|
| 2417 |
{ |
|---|
| 2418 |
|
|---|
| 2419 |
int i; |
|---|
| 2420 |
char[] code = "switch(str){"; |
|---|
| 2421 |
int s1, e1, s2, e2; |
|---|
| 2422 |
s1 = 1; |
|---|
| 2423 |
char[][] strs; |
|---|
| 2424 |
while (i<list.length) |
|---|
| 2425 |
{ |
|---|
| 2426 |
int j = i; |
|---|
| 2427 |
while (list[j] >= '0' && list[j] <= '9' || |
|---|
| 2428 |
list[j] >= 'a' && list[j] <= 'z' || |
|---|
| 2429 |
list[j] >= 'A' && list[j] <= 'Z' || list[j] == '_' ) |
|---|
| 2430 |
{ |
|---|
| 2431 |
j++; |
|---|
| 2432 |
if (j >= list.length) |
|---|
| 2433 |
break; |
|---|
| 2434 |
} |
|---|
| 2435 |
if (i != j) |
|---|
| 2436 |
{ |
|---|
| 2437 |
//str[list[i..j]]=true; |
|---|
| 2438 |
code~="case \""~list[i..j]~"\": found=true;break;\n"; |
|---|
| 2439 |
i = j; |
|---|
| 2440 |
} |
|---|
| 2441 |
else |
|---|
| 2442 |
i++; |
|---|
| 2443 |
} |
|---|
| 2444 |
|
|---|
| 2445 |
code~="default: found=false;}"; |
|---|
| 2446 |
|
|---|
| 2447 |
char[]code2 = "bool ismatched(){ |
|---|
| 2448 |
bool found; |
|---|
| 2449 |
"~code~" |
|---|
| 2450 |
return found;} |
|---|
| 2451 |
"; |
|---|
| 2452 |
//assert(0,code2); |
|---|
| 2453 |
return code2; |
|---|
| 2454 |
} |
|---|
| 2455 |
|
|---|
| 2456 |
bool matchlist(char [] list)(char [] str) { |
|---|
| 2457 |
// int something; |
|---|
| 2458 |
//pragma(msg,parseRegexp(fullpattern)); |
|---|
| 2459 |
mixin (parselistofwords(list)); |
|---|
| 2460 |
//char[] o;//=convert_(); |
|---|
| 2461 |
return ismatched; |
|---|
| 2462 |
} |
|---|

// usage: tr!("/a/b/")(str) replaces every 'a' in str with 'b'
char [] tr(char [] fullpattern)(char [] convertable) {
    // int something;
    //pragma(msg,parseRegexp(fullpattern));
    mixin (parsetr(fullpattern));
    //char[] o;//=convert_();
    return convert; // result of the generated convert() function
}

| 2480 |
//template searchgroups(char [] fullpattern) |
|---|
| 2481 |
//{ |
|---|
| 2482 |
grouprec [] indexgroups(char [] fullpattern)(char [] searchstr, char [][] param...) { |
|---|
| 2483 |
int p; // next index to test |
|---|
| 2484 |
//int groupno=0; |
|---|
| 2485 |
grouprec [] group; |
|---|
| 2486 |
// pragma(msg,"here 3"); |
|---|
| 2487 |
mixin (parseRegexp(fullpattern));// engine; |
|---|
| 2488 |
|
|---|
| 2489 |
for (int x = 0; x<searchstr.length;++x) { |
|---|
| 2490 |
p = x; |
|---|
| 2491 |
if (engine()) { |
|---|
| 2492 |
if (group.length == 0) |
|---|
| 2493 |
{ |
|---|
| 2494 |
group.length = 1; |
|---|
| 2495 |
} |
|---|
| 2496 |
group[0] = grouprec(x, p); |
|---|
| 2497 |
return group; |
|---|
| 2498 |
} |
|---|
| 2499 |
} |
|---|
| 2500 |
return null; // no match |
|---|
| 2501 |
} |
|---|
| 2502 |
//} |
|---|
grouprec [][] indexgroupsall(char [] fullpattern)(char [] searchstr, int startindex = 0, char [][] param = []) {
    int p; // next index to test
    grouprec [] group;
    mixin (parseRegexp(fullpattern));
    grouprec [][] solutions;
    int soli = 0;
    for (int x = startindex; x<searchstr.length;++x) {
        p = x;
        if (engine())
        {
            if (group.length == 0)
            {
                group.length = 1;
            }
            group[0] = grouprec(x, p);
            solutions.length = soli + 1;
            solutions[soli] = group.dup; // copy, since group is reused by later matches
            soli++;
            // resume after this match; an empty match (p == x) must still
            // advance, otherwise the loop would retry the same index forever
            if (p > x)
                x = p - 1;
        }
    }
    return solutions; // return all
}
| 2526 |
|
|---|
| 2527 |
char [] group(char[] str, grouprec [] g, int groupno) |
|---|
| 2528 |
{ |
|---|
| 2529 |
if (g[groupno].end == - 1) |
|---|
| 2530 |
{ |
|---|
| 2531 |
return []; |
|---|
| 2532 |
} |
|---|
| 2533 |
return str[g[groupno].start..g[groupno].end]; |
|---|
| 2534 |
} |
|---|
| 2535 |
|
|---|
| 2536 |
|
|---|
| 2537 |
|
|---|
| 2538 |
template index(char [] fullpattern) |
|---|
| 2539 |
{ |
|---|
| 2540 |
int index(char [] searchstr, int startindex, char [][] param...) { |
|---|
| 2541 |
int p; // next index to test |
|---|
| 2542 |
grouprec [] group; |
|---|
| 2543 |
int rp; |
|---|
| 2544 |
mixin (parseRegexp(fullpattern)); |
|---|
| 2545 |
for (int x = startindex; x<searchstr.length;++x) { |
|---|
| 2546 |
p = x; |
|---|
| 2547 |
rp = 0; |
|---|
| 2548 |
if (engine()) return x; |
|---|
| 2549 |
} |
|---|
| 2550 |
return - 1; // no match |
|---|
| 2551 |
} |
|---|
| 2552 |
} |
|---|
| 2553 |
|
|---|
| 2554 |
struct indexrec |
|---|
| 2555 |
{ |
|---|
| 2556 |
int start; |
|---|
| 2557 |
int end; //as for array slices, last char + 1 |
|---|
| 2558 |
} |
|---|
| 2559 |
|
|---|
| 2560 |
struct grouprec |
|---|
| 2561 |
{ |
|---|
| 2562 |
int start; |
|---|
| 2563 |
int end = - 1; //as for array slices, last char + 1 |
|---|
| 2564 |
char[] toString() |
|---|
| 2565 |
{ |
|---|
| 2566 |
version(Tango) |
|---|
| 2567 |
return Stdout.layout.convert("[{},{}]", start, end); |
|---|
| 2568 |
else |
|---|
| 2569 |
return format("[%d,%d]", start, end); |
|---|
| 2570 |
} |
|---|
| 2571 |
} |
|---|
| 2572 |
|
|---|
| 2573 |
template index2(char [] fullpattern) |
|---|
| 2574 |
{ |
|---|
| 2575 |
indexrec index2(char [] searchstr, int startindex, char [][] param...) { |
|---|
| 2576 |
int p; // next index to test |
|---|
| 2577 |
int rp; |
|---|
| 2578 |
grouprec [] group; |
|---|
| 2579 |
mixin (parseRegexp(fullpattern)); |
|---|
| 2580 |
for (int x = startindex; x<searchstr.length;++x) { |
|---|
| 2581 |
p = x; |
|---|
| 2582 |
rp = 0; |
|---|
| 2583 |
if (engine()) return indexrec(x, p); |
|---|
| 2584 |
} |
|---|
| 2585 |
return indexrec( - 1, - 1); // no match |
|---|
| 2586 |
} |
|---|
| 2587 |
} |
|---|
| 2588 |
|
|---|
template indexall(char [] fullpattern)
{
    indexrec[] indexall(char [] searchstr, int startindex = 0, char [][] param = []) {
        int p; // next index to test
        grouprec [] group;
        mixin (parseRegexp(fullpattern));
        indexrec[] solutions;
        int soli = 0;
        for (int x = startindex; x<searchstr.length;++x) {
            p = x;
            if (engine())
            {
                solutions.length = soli + 1;
                solutions[soli] = indexrec(x, p);
                soli++;
                // resume after this match; an empty match (p == x) must still
                // advance, otherwise the loop would retry the same index forever
                if (p > x)
                    x = p - 1;
            }
        }
        return solutions; // return all
    }
}
| 2610 |
|
|---|
| 2611 |
template indexalloverlapping(char [] fullpattern) |
|---|
| 2612 |
{ |
|---|
| 2613 |
indexrec[] indexalloverlapping(char [] searchstr, int startindex = 0, char [][] param = []) { |
|---|
| 2614 |
int p; // next index to test |
|---|
| 2615 |
grouprec [] group; |
|---|
| 2616 |
mixin (parseRegexp(fullpattern)); |
|---|
| 2617 |
indexrec[] solutions; |
|---|
| 2618 |
int soli = 0; |
|---|
| 2619 |
for (int x = startindex; x<searchstr.length;++x) { |
|---|
| 2620 |
p = x; |
|---|
| 2621 |
if (engine()) |
|---|
| 2622 |
{ |
|---|
| 2623 |
solutions.length = soli + 1; |
|---|
| 2624 |
solutions[soli] = indexrec(x, p); |
|---|
| 2625 |
soli++; |
|---|
| 2626 |
} |
|---|
| 2627 |
} |
|---|
| 2628 |
return solutions; // return all |
|---|
| 2629 |
} |
|---|
| 2630 |
} |
|---|
| 2631 |
|
|---|
| 2632 |
|
|---|
| 2633 |
|
|---|
| 2634 |
|
|---|
template searchall(char [] fullpattern)
{
    char[][] searchall(char [] searchstr, char [][] param...) {
        int p; // next index to test
        grouprec [] group;
        // pragma(msg, "here"~parseRegexp(fullpattern));
        mixin (parseRegexp(fullpattern));
        char[][] solutions;
        int soli = 0;
        for (int x = 0; x<searchstr.length;++x) {
            p = x;
            //writefln("starting ",x);
            if (engine())
            {
                solutions.length = soli + 1;
                solutions[soli] = searchstr[x..p];
                //writefln(">>", searchstr[x..p], "<<");
                soli++;
                // resume after this match; an empty match (p == x) must still
                // advance, otherwise the loop would retry the same index forever
                if (p > x)
                    x = p - 1;
            }
        }
        return solutions; // return them
    }
}
| 2659 |
|
|---|
| 2660 |
template searchalloverlapping(char [] fullpattern) |
|---|
| 2661 |
{ |
|---|
| 2662 |
char[][] searchalloverlapping(char [] searchstr, char [][] param...) { |
|---|
| 2663 |
int p; // next index to test |
|---|
| 2664 |
grouprec [] group; |
|---|
| 2665 |
// pragma(msg, "here"~parseRegexp(fullpattern)); |
|---|
| 2666 |
mixin (parseRegexp(fullpattern)); |
|---|
| 2667 |
char[][] solutions; |
|---|
| 2668 |
int soli = 0; |
|---|
| 2669 |
for (int x = 0; x<searchstr.length;++x) { |
|---|
| 2670 |
p = x; |
|---|
| 2671 |
//writefln("starting ",x); |
|---|
| 2672 |
if (engine()) |
|---|
| 2673 |
{ |
|---|
| 2674 |
solutions.length = soli + 1; |
|---|
| 2675 |
solutions[soli] = searchstr[x..p]; |
|---|
| 2676 |
//writefln(">>", searchstr[x..p], "<<"); |
|---|
| 2677 |
soli++; |
|---|
| 2678 |
} |
|---|
| 2679 |
} |
|---|
| 2680 |
return solutions; // return them |
|---|
| 2681 |
} |
|---|
| 2682 |
} |
|---|
| 2683 |
version(none) |
|---|
| 2684 |
//version(shortscregexpfuncnames) |
|---|
| 2685 |
{ |
|---|
| 2686 |
alias test t; |
|---|
| 2687 |
alias index i; |
|---|
| 2688 |
alias index2 i2; |
|---|
| 2689 |
alias search s; |
|---|
| 2690 |
alias searchall sa; |
|---|
| 2691 |
alias indexall ia; |
|---|
| 2692 |
alias indexgroups ig; |
|---|
| 2693 |
alias indexgroupsall iga; |
|---|
| 2694 |
} |
|---|
| 2695 |
|
|---|
| 2696 |
|
|---|
| 2697 |
|
|---|
| 2698 |
//--------------------------------------------------------------------- |
|---|
| 2699 |
// EXAMPLE |
|---|
| 2700 |
//--------------------------------------------------------------------- |
|---|
| 2701 |
version(Phobos) |
|---|
| 2702 |
import std.stdio; |
|---|
| 2703 |
version(test) |
|---|
| 2704 |
{ |
|---|
| 2705 |
void main() |
|---|
| 2706 |
{ |
|---|
| 2707 |
version(Phobos) |
|---|
| 2708 |
writefln("BEGINNING UNIT TESTS\n"); |
|---|
| 2709 |
else |
|---|
| 2710 |
Stdout("BEGINNING UNIT TESTS\n").newline; |
|---|
| 2711 |
assert(search!("ab")("aaab") == "ab"); |
|---|
| 2712 |
|
|---|
| 2713 |
|
|---|
| 2714 |
assert(search!("a*b")("aaab") == "aaab"); |
|---|
| 2715 |
assert(search!("a*(b)")("aaab") == "aaab"); |
|---|
| 2716 |
|
|---|
| 2717 |
assert(search!("((a*b))")("aaab") == "aaab"); |
|---|
| 2718 |
assert(search!("(a*)b")("aaab") == "aaab"); |
|---|
| 2719 |
|
|---|
| 2720 |
assert(search!("(?:b*a*)*b")("aaab") == "aaab"); |
|---|
| 2721 |
|
|---|
| 2722 |
assert(search!("b+cd")("acdbbcabbcdaaab") == "bbcd"); |
|---|
| 2723 |
|
|---|
| 2724 |
assert(search!("b?cd")("abcacbacdb") == "cd"); |
|---|
| 2725 |
|
|---|
| 2726 |
assert(search!("(ab)?abc")("aababcab") == "ababc"); |
|---|
| 2727 |
assert(search!("(?:ab)*abc")("aababcab") == "ababc"); |
|---|
| 2728 |
assert(search!("((?:a)*|xyz)b")("aaab") == "aaab"); |
|---|
| 2729 |
|
|---|
| 2730 |
assert(search!("(?:ab)*(abb)")("bababb") == "ababb"); |
|---|
| 2731 |
assert(search!("e?(?:ab)*b+?")("eaaababbbbaac") == "ababb"); |
|---|
| 2732 |
assert(search!("(?:ab*)*c")("bbbababbaaabaaaabbbbc") == "ababbaaabaaaabbbbc"); |
|---|
| 2733 |
char [] quasistatic = "m"; |
|---|
| 2734 |
assert(search!("(@1.*?@1)")("they said D can't do metaprogramming?", quasistatic) == "metaprogram"); |
|---|
| 2735 |
assert(search!("[h-za]*g")("metaprogramming") == "taprog"); |
|---|
| 2736 |
assert(search!("(?:a*)*b")("cacaaab") == "aaab"); |
|---|
| 2737 |
assert(search!("(?:a*b*)*c")("dababdaabababbaaabbbcab") == "aabababbaaabbbc"); |
|---|
| 2738 |
assert(search!("(?:(?:a*b*)|da)*b")("fasdaaab") == "daaab"); |
|---|
| 2739 |
assert(search!("aaab??")("aaabbb") == "aaa"); |
|---|
| 2740 |
assert(search!("aaab?")("aaabbb") == "aaab"); |
|---|
| 2741 |
//assert(search!("aaa?")("aa") == "aa"); |
|---|
| 2742 |
|
|---|
| 2743 |
char [] qq; |
|---|
| 2744 |
version(Phobos) |
|---|
| 2745 |
writefln("========="); |
|---|
| 2746 |
version(Tango) |
|---|
| 2747 |
Stdout("=========").newline; |
|---|
| 2748 |
qq = search!("(?:(?:a*b*)|da)*b")("fasdaaab"); |
|---|
| 2749 |
version(Phobos) writefln("Result: ----", qq, "---"); |
|---|
| 2750 |
version(Tango) Stdout("Result: ----", qq, "---").newline; |
|---|
| 2751 |
version(Phobos) writefln("END OF UNIT TESTS\n"); |
|---|
| 2752 |
version(Tango) Stdout("END OF UNIT TESTS\n").newline; |
|---|
| 2753 |
version(Phobos) writefln("All tests are passed if you set -debug"); |
|---|
| 2754 |
version(Tango) Stdout("All tests are passed if you set -debug").newline; |
|---|
| 2755 |
|
|---|
| 2756 |
} |
|---|
| 2757 |
} |
|---|
| 2758 |
//------------------------------------------------------------------------------- |
|---|
| 2759 |
/+ |
|---|
| 2760 |
|
|---|
| 2761 |
// NOT CURRENTLY USED |
|---|
| 2762 |
|
|---|
| 2763 |
// Finds the number of instances of 'ch' in str which aren't preceded by a backslash |
|---|
| 2764 |
// ch must not be a backslash. |
|---|
| 2765 |
template unescapedCount(char [] str, char ch) |
|---|
| 2766 |
{ |
|---|
| 2767 |
static if (str.length==0) const int unescapedCount = 0; |
|---|
| 2768 |
else static if (str[0]=='\\' && str.length>1) const int unescapedCount = unescapedCount!(str[2..$], ch); |
|---|
| 2769 |
else static if (str[0]==ch) const int unescapedCount = 1 + unescapedCount!(str[1..$], ch); |
|---|
| 2770 |
else const int unescapedCount = unescapedCount!(str[1..$], ch); |
|---|
| 2771 |
} |
|---|
| 2772 |
|
|---|
| 2773 |
+/ |
|---|
| 2774 |
|
|---|
| 2775 |
|
|---|
| 2776 |
|
|---|
// Prints every group of the first match of reg in input.
void searchgroupstest(char[]reg)(char[]input)
{
    version(Phobos)
        writefln("reg:%s", reg," input:%s", input);
    version(Tango)
        Stdout.format("reg:{} input:{}", reg, input).newline;
    int groupno = 0;
    foreach(e;indexgroups!(reg)(input))
    {
        if (e.start <= e.end)
        {
            version(Phobos)
                writefln("group %d:>>%s<<", groupno, input[e.start..e.end]);
            version(Tango)
                Stdout.format("group {}:>>{}<<", groupno, input[e.start..e.end]).newline;
        }
        else
        {
            // unmatched group: print its raw [start,end] record
            version(Phobos)
                writefln("group %d:", groupno, "[", e.start, ",", e.end, "]");
            version(Tango)
                Stdout.format("group {}:[{},{}]", groupno, e.start, e.end).newline;
        }
        groupno++;
    }
    version(Phobos) writefln("-------------------");
    version(Tango) Stdout("-------------------").newline;
}
| 2805 |
|
|---|
| 2806 |
void searchgroupsalltest(char[]reg)(char[]input) |
|---|
| 2807 |
{ |
|---|
| 2808 |
int sno = 0; |
|---|
| 2809 |
|
|---|
| 2810 |
version(Phobos) |
|---|
| 2811 |
writefln("reg:%s", reg," input:%s", input); |
|---|
| 2812 |
version(Tango) |
|---|
| 2813 |
Stdout.format("reg:{} input:{}", reg, input).newline; |
|---|
| 2814 |
foreach(s;indexgroupsall!(reg)(input)) |
|---|
| 2815 |
{ |
|---|
| 2816 |
version(Phobos) writefln("solution ", sno++); |
|---|
| 2817 |
version(Tango) Stdout.format("solution {}", sno++).newline; |
|---|
| 2818 |
foreach(e;s) |
|---|
| 2819 |
{ |
|---|
| 2820 |
if (e.start <= e.end) |
|---|
| 2821 |
{ |
|---|
| 2822 |
version(Phobos) writefln("group >>%s<<", input[e.start..e.end]); |
|---|
| 2823 |
version(Tango)Stdout.format("group >>{}<<", input[e.start..e.end]).newline; |
|---|
| 2824 |
} |
|---|
| 2825 |
else |
|---|
| 2826 |
{ |
|---|
| 2827 |
version(Phobos) writefln("group %s", e.start," ", e.end); |
|---|
| 2828 |
version(Tango)Stdout.format("group {} {}", e.start, e.end).newline; |
|---|
| 2829 |
} |
|---|
| 2830 |
} |
|---|
| 2831 |
} |
|---|
| 2832 |
version(Phobos) writefln("-------------------"); |
|---|
| 2833 |
version(Tango)Stdout("-------------------").newline; |
|---|
| 2834 |
} |
|---|
| 2835 |
void searchgroupstest2(char[]reg)(char[]input, char[][] target) |
|---|
| 2836 |
{ |
|---|
| 2837 |
version(Phobos) writefln("reg:%s", reg," input:%s", input); |
|---|
| 2838 |
version(Tango)Stdout.format("reg:{} input:{}", reg, input).newline; |
|---|
| 2839 |
pragma(msg, "searchgroupstest2:"~reg); |
|---|
| 2840 |
int i = 0; |
|---|
| 2841 |
int oi = 0; |
|---|
| 2842 |
int failed = 0; |
|---|
| 2843 |
foreach(e;indexgroups!(reg)(input)) { |
|---|
| 2844 |
bool compare = true; |
|---|
| 2845 |
if (e.start <= e.end) |
|---|
| 2846 |
{ |
|---|
| 2847 |
version(Phobos) writef("group", i," >>%s<<", input[e.start..e.end]); |
|---|
| 2848 |
version(Tango)Stdout.format ("group {} >>{}<<", i, input[e.start..e.end]); |
|---|
| 2849 |
} |
|---|
| 2850 |
else |
|---|
| 2851 |
{ |
|---|
| 2852 |
version(Phobos) writef("group", i," ","not matched [", e.start," ", e.end, "]"); |
|---|
| 2853 |
version(Tango)Stdout.format ("group {} not matched [{} {}]", i, e.start, e.end); |
|---|
| 2854 |
} |
|---|
| 2855 |
if (i >= target.length) |
|---|
| 2856 |
{ |
|---|
| 2857 |
//writefln("\nnumber of groups failed"); |
|---|
| 2858 |
failed++; |
|---|
| 2859 |
//writefln("\n"); |
|---|
| 2860 |
compare = false; |
|---|
| 2861 |
} |
|---|
| 2862 |
// writefln(">range ",e.start," ",e.end); |
|---|
| 2863 |
if (e.start>e.end ) |
|---|
| 2864 |
{ |
|---|
| 2865 |
// writefln(">range ",e.start," ",e.end); |
|---|
| 2866 |
if (target.length>i && target[i] is null) |
|---|
| 2867 |
{ |
|---|
| 2868 |
version(Phobos) writefln(" passed"); |
|---|
| 2869 |
version(Tango)Stdout (" passed").newline; |
|---|
| 2870 |
} |
|---|
| 2871 |
else |
|---|
| 2872 |
{ |
|---|
| 2873 |
version(Phobos) writefln(" failed"); |
|---|
| 2874 |
version(Tango)Stdout (" failed").newline; |
|---|
| 2875 |
failed++; |
|---|
| 2876 |
} |
|---|
| 2877 |
} |
|---|
| 2878 |
else if (compare && input[e.start..e.end] == target[i]) |
|---|
| 2879 |
{ |
|---|
| 2880 |
version(Phobos) writefln(" passed"); |
|---|
| 2881 |
version(Tango)Stdout (" passed").newline; |
|---|
| 2882 |
} |
|---|
| 2883 |
else |
|---|
| 2884 |
{ |
|---|
| 2885 |
if (compare) |
|---|
| 2886 |
{ |
|---|
| 2887 |
version(Phobos) writefln(" failed expected %s", target[i]); |
|---|
| 2888 |
version(Tango)Stdout.format (" failed expected {}", target[i]).newline; |
|---|
| 2889 |
} |
|---|
| 2890 |
failed++; |
|---|
| 2891 |
} |
|---|
| 2892 |
oi++; |
|---|
| 2893 |
i++; |
|---|
| 2894 |
} |
|---|
| 2895 |
if (oi != target.length) |
|---|
| 2896 |
{ |
|---|
| 2897 |
version(Phobos) writefln("number of groups failed"); |
|---|
| 2898 |
version(Tango)Stdout ("number of groups failed").newline; |
|---|
| 2899 |
failed++; |
|---|
| 2900 |
} |
|---|
| 2901 |
else |
|---|
| 2902 |
{ |
|---|
| 2903 |
version(Phobos) writefln("number of groups is passed"); |
|---|
| 2904 |
version(Tango)Stdout ("number of groups is passed").newline; |
|---|
| 2905 |
} |
|---|
| 2906 |
version(Phobos) writefln("-------------------"); |
|---|
| 2907 |
version(Tango)Stdout ("-------------------").newline; |
|---|
| 2908 |
if (failed>0) |
|---|
| 2909 |
{ |
|---|
| 2910 |
version(Phobos) writefln("test failed"); |
|---|
| 2911 |
version(Tango)Stdout ("test failed").newline; |
|---|
| 2912 |
assert(0); |
|---|
| 2913 |
} |
|---|
| 2914 |
else |
|---|
| 2915 |
{ |
|---|
| 2916 |
version(Phobos) writefln("test ok"); |
|---|
| 2917 |
version(Tango)Stdout ("test ok").newline; |
|---|
| 2918 |
} |
|---|
| 2919 |
} |
|---|
| 2920 |
|
|---|
| 2921 |
void printcode(char[] reg) |
|---|
| 2922 |
{ |
|---|
| 2923 |
version(Phobos) writefln("%s", parseRegexp(reg)); |
|---|
| 2924 |
version(Tango)Stdout.format ("{}", parseRegexp(reg)).newline; |
|---|
| 2925 |
} |
|---|
| 2926 |
|
|---|
| 2927 |
void printclasscode(char[] reg) |
|---|
| 2928 |
{ |
|---|
| 2929 |
version(Phobos) writefln("%s", getclasscode(reg)); |
|---|
| 2930 |
version(Tango)Stdout.format ("{}", getclasscode(reg)).newline; |
|---|
| 2931 |
} |
|---|
| 2932 |
|
|---|
| 2933 |
char[] getclasscode(char[] reg) |
|---|
| 2934 |
{ |
|---|
| 2935 |
char [] code; |
|---|
| 2936 |
code=" |
|---|
| 2937 |
version(Tango) |
|---|
| 2938 |
{} |
|---|
| 2939 |
else |
|---|
| 2940 |
{ version = Phobos;} |
|---|
| 2941 |
version(Phobos) |
|---|
| 2942 |
import std.string; |
|---|
| 2943 |
version(Tango) |
|---|
| 2944 |
{ |
|---|
| 2945 |
import tango.text.Ascii; |
|---|
| 2946 |
alias toUpper toupper; |
|---|
| 2947 |
alias toLower tolower; |
|---|
| 2948 |
alias icompare icmp; |
|---|
| 2949 |
import tango.io.Stdout; |
|---|
| 2950 |
} |
|---|
| 2951 |
struct grouprec |
|---|
| 2952 |
{ |
|---|
| 2953 |
int start; |
|---|
| 2954 |
int end = - 1; //as for array slices, last char + 1 |
|---|
| 2955 |
char[] toString() |
|---|
| 2956 |
{ |
|---|
| 2957 |
version(Tango) |
|---|
| 2958 |
return Stdout.layout.convert(\"[{},{}]\", start, end); |
|---|
| 2959 |
else |
|---|
| 2960 |
return format(\"[%d,%d]\", start, end); |
|---|
| 2961 |
} |
|---|
| 2962 |
} |
|---|
| 2963 |
char [] group(char[] str, grouprec [] g, int groupno) |
|---|
| 2964 |
{ |
|---|
| 2965 |
if (g[groupno].end == - 1) |
|---|
| 2966 |
{ |
|---|
| 2967 |
return []; |
|---|
| 2968 |
} |
|---|
| 2969 |
return str[g[groupno].start..g[groupno].end]; |
|---|
| 2970 |
} |
|---|
| 2971 |
|
|---|
| 2972 |
class screg |
|---|
| 2973 |
{ |
|---|
| 2974 |
private: |
|---|
| 2975 |
int p; // next index to test |
|---|
| 2976 |
//int groupno=0; |
|---|
| 2977 |
int x; |
|---|
| 2978 |
|
|---|
| 2979 |
char[] searchstr; |
|---|
| 2980 |
char[][9] param; |
|---|
| 2981 |
public: |
|---|
| 2982 |
grouprec [] group; |
|---|
| 2983 |
this() |
|---|
| 2984 |
{ |
|---|
| 2985 |
p = 0; |
|---|
| 2986 |
} |
|---|
| 2987 |
this(int startp, char[][]parameters...) |
|---|
| 2988 |
{ |
|---|
| 2989 |
p = startp; |
|---|
| 2990 |
int i; |
|---|
| 2991 |
foreach(par;parameters) |
|---|
| 2992 |
param[i++] = par; |
|---|
| 2993 |
}"~ |
|---|
| 2994 |
parseRegexp(reg, true)~ |
|---|
| 2995 |
" |
|---|
| 2996 |
alias match matches; |
|---|
| 2997 |
bool match(char[] searchstrin) |
|---|
| 2998 |
{ |
|---|
| 2999 |
group.length = 0; |
|---|
| 3000 |
searchstr = searchstrin; |
|---|
| 3001 |
//pragma(msg,\"screg.match:\"~fullpattern); |
|---|
| 3002 |
for (int x = p; x<searchstr.length;++x) { |
|---|
| 3003 |
p = x; |
|---|
| 3004 |
if (engine()) { |
|---|
| 3005 |
if (group.length == 0) |
|---|
| 3006 |
{ |
|---|
| 3007 |
group.length = 1; |
|---|
| 3008 |
} |
|---|
| 3009 |
group[0] = grouprec(x, p); |
|---|
| 3010 |
return true; |
|---|
| 3011 |
} |
|---|
| 3012 |
} |
|---|
| 3013 |
return false; |
|---|
| 3014 |
} |
|---|
    // Unlike match(), tries the pattern only at the current position p,
    // without scanning forward through searchstr.
    bool gmatch(char[] searchstrin)
    {
        group.length = 0;
        searchstr = searchstrin;
        x = p; // remember where this attempt starts; group 0 begins here
        if (engine()) {
            if (group.length == 0)
            {
                group.length = 1;
            }
            group[0] = grouprec(x, p);
            return true;
        }
        return false;
    }
| 3029 |
char[] _(int groupno) |
|---|
| 3030 |
{ |
|---|
| 3031 |
return .group(searchstr, group, groupno); |
|---|
| 3032 |
} |
|---|
| 3033 |
|
|---|
| 3034 |
char[] opIndex(int groupno) |
|---|
| 3035 |
{ |
|---|
| 3036 |
return .group(searchstr, group, groupno); |
|---|
| 3037 |
} |
|---|
    // True if groupno is a valid index into the recorded groups.
    bool exists(int groupno)
    {
        if (groupno < 0 || groupno >= group.length)
            return false;
        return true;
    }
| 3045 |
alias ismatched defined; |
|---|
| 3046 |
bool ismatched(int groupno) |
|---|
| 3047 |
{ |
|---|
| 3048 |
if (groupno >= group.length) |
|---|
| 3049 |
return false; |
|---|
| 3050 |
return (group[groupno].end != - 1); |
|---|
| 3051 |
} |
|---|
| 3052 |
int pos() |
|---|
| 3053 |
{ |
|---|
| 3054 |
return p; |
|---|
| 3055 |
} |
|---|
| 3056 |
void pos(int pin) |
|---|
| 3057 |
{ |
|---|
| 3058 |
p = pin; |
|---|
| 3059 |
} |
|---|
| 3060 |
void restart() |
|---|
| 3061 |
{ |
|---|
| 3062 |
p = 0; |
|---|
| 3063 |
} |
|---|
| 3064 |
} |
|---|
| 3065 |
"; |
|---|
| 3066 |
return code; |
|---|
| 3067 |
} |
|---|
| 3068 |
|
|---|
| 3069 |
//------------- |
|---|
| 3070 |
// unit tests |
|---|
| 3071 |
//------------- |
|---|
| 3072 |
version (testmeta) { |
|---|
| 3073 |
static assert(quantifierConsumed("{456}345") == 5); |
|---|
| 3074 |
static assert(parenConsumed("(45(6)4)5") == 8); |
|---|
| 3075 |
static assert(parenConsumed(`(45\(6)45`) == 7); |
|---|
| 3076 |
} |
|---|
| 3077 |
|
|---|
/*
Copyright (c) 2006 Walter Bright
(basic framework, regular expression engine, basic documentation)
Copyright (c) 2007-2009 Marton Papp
(added /w,/s,/d,extended character classes, added groups and backreferences,
options (msxi), extended documentation, non-greedy constructs,
converted testing functions into Tango
)
Copyright (c) 2008 (yidabu g m a i l at com) All rights reserved
*modified by yidabu to make it work with Tango
( D Programming Language China : http://www.d-programming-language-china.org/ )
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. The name of the author may not be used to endorse or promote products
derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/