Ddoc

$(D_S Regular Expressions,

$(P Regular expressions are a powerful tool for
pattern matching on strings of text. They
are built into the core of languages like Perl,
Ruby, and JavaScript. Perl and Ruby are particularly
renowned for adroitly handling regular expressions.
So why aren't they part of the D core language?
Read on and see how they're done in D compared with Ruby.
)

$(P This article explains how to use regular expressions
in D. It doesn't explain regular expressions themselves;
after all, people have written entire books on that topic.
D's specific implementation of regular expressions
is entirely contained in the Phobos library module
$(LINK2 phobos/std_regexp.html, std.regexp).
For a more advanced treatment of using regular expressions
in conjunction with template metaprogramming, see
$(LINK2 templates-revisited.html, Templates Revisited).
)

$(P In Ruby a regular expression can be created
as a special literal:
)

$(RUBY
r = /pattern/
s = /p[1-5]\s*/
)

$(P D doesn't have special literals for them, but they can
be created from strings:)

---
r = RegExp("pattern");
s = RegExp(r"p[1-5]\s*");
---

$(P If the $(I pattern) contains backslash characters \,
use wysiwyg string literals, which are marked with the $(SINGLEQUOTE r) prefix
on the string. $(I r) and $(I s) are of type $(B RegExp), but
we can use type inference to declare and assign them automatically:
)

---
auto r = RegExp("pattern");
auto s = RegExp(r"p[1-5]\s*");
---
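$(P For comparison, Ruby can also build a regular expression from an ordinary
string with $(TT Regexp.new), and the same backslash concern shows up there.
The following is a minimal sketch rather than part of the original comparison;
it assumes that single-quoted Ruby strings are a fair stand-in for D's wysiwyg
literals, since both leave the backslash untouched.)

```ruby
# Literal form, as in the Ruby example above.
literal = /p[1-5]\s*/

# Single-quoted string: the backslash reaches the regex engine unchanged.
from_single = Regexp.new('p[1-5]\s*')

# Double-quoted string: the backslash has to be doubled.
from_double = Regexp.new("p[1-5]\\s*")

# All three carry the same pattern text.
puts literal.source
puts from_single.source
puts from_double.source
```

$(P Each line printed is the same pattern text, which suggests the choice
between the forms is about readability of the source, not about the resulting
expression.)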

$(P To check for a match of a string $(I s) with a regular expression
in Ruby, use the =~ operator, which returns the index of the
first match:)

$(RUBY
s = "abcabcabab"
s =~ /b/   # match, returns 1
s =~ /f/   # no match, returns nil
)

$(P In D this looks like:
)

---
auto s = "abcabcabab";
std.regexp.find(s, "b");    /* match, returns 1 */
std.regexp.find(s, "f");    /* no match, returns -1 */
---

$(P Note the parallel with std.string.find, which searches for
substring matches rather than regular expression matches.)
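$(P Ruby has a similar pairing of substring and pattern search. As a small
sketch, outside the original comparison: $(TT String#index) accepts either a
plain string or a regular expression, and a missing match is reported as
$(TT nil) where the D functions return -1. The variable names and the
conversion to -1 are illustrative only.)

```ruby
s = "abcabcabab"

# String#index accepts either a plain substring or a regular expression.
substring_pos = s.index("b")    # substring search
regex_pos     = s.index(/b/)    # regular expression search
operator_pos  = (s =~ /b/)      # same position via the =~ operator

# Ruby signals "no match" with nil; map it to -1 to mimic the D convention.
missing = s.index(/f/)
d_style = missing.nil? ? -1 : missing

puts substring_pos, regex_pos, operator_pos, d_style
```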

$(P The Ruby =~ operator sets some implicitly defined variables
based on the result:)

$(RUBY
s = "abcdef"
if s =~ /c/
    "#{$`}[#{$&}]#{$'}"   # generates string ab[c]def
end
)

$(P In D, the function std.regexp.search() returns a RegExp object
describing the match, which provides the same information explicitly:
)

---
auto m = std.regexp.search("abcdef", "c");
if (m)
    writefln("%s[%s]%s", m.pre, m.match(0), m.post);
---

$(P Or even more concisely as:
)

---
if (auto m = std.regexp.search("abcdef", "c"))
    writefln("%s[%s]%s", m.pre, m.match(0), m.post); // writes ab[c]def
---
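$(P Ruby is not limited to the implicit variables either. As a rough sketch
for comparison, $(TT String#match) returns a $(TT MatchData) object that can be
used much like the D match object above. The helper name $(TT bracket_match) is
hypothetical and exists only for this illustration.)

```ruby
# Hypothetical helper: wrap the first match of pattern in square brackets.
# Returns nil when the pattern does not match.
def bracket_match(text, pattern)
  m = text.match(pattern)   # MatchData, or nil on no match
  return nil unless m
  "#{m.pre_match}[#{m[0]}]#{m.post_match}"
end

puts bracket_match("abcdef", /c/)   # prints ab[c]def
```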

<h2>Search and Replace</h2>

$(P Search and replace gets more interesting. To replace
occurrences of "a" with "ZZ" in Ruby, first only the first
occurrence and then all of them:
)

$(RUBY
s = "Strap a rocket engine on a chicken."
s.sub(/a/, "ZZ")    # result: StrZZp a rocket engine on a chicken.
s.gsub(/a/, "ZZ")   # result: StrZZp ZZ rocket engine on ZZ chicken.
)

$(P In D:)

---
auto s = "Strap a rocket engine on a chicken.";
sub(s, "a", "ZZ");        // result: StrZZp a rocket engine on a chicken.
sub(s, "a", "ZZ", "g");   // result: StrZZp ZZ rocket engine on ZZ chicken.
---

$(P The replacement string can reference the matches using
the $&, $$, $', $`, $0 .. $99 notation:)

---
sub(s, "[ar]", "[$&]", "g"); // result: St[r][a]p [a] [r]ocket engine on [a] chicken.
---
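$(P Ruby's replacement strings support comparable references, written with
backslashes. The sketch below is an illustration added for comparison: it uses
$(TT \0) for the whole match and $(TT \1), $(TT \2) for capture groups, inside
single-quoted strings so the backslashes are passed through unchanged.)

```ruby
s = "Strap a rocket engine on a chicken."

# \0 stands for the entire match, like $& in the D replacement string.
bracketed = s.gsub(/[ar]/, '[\0]')
puts bracketed   # prints St[r][a]p [a] [r]ocket engine on [a] chicken.

# \1 and \2 refer to the first and second capture groups.
swapped = s.sub(/(\w+) (\w+)/, '\2 \1')
puts swapped     # prints a Strap rocket engine on a chicken.
```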

$(P Or the replacement string can be provided by a delegate:)

---
sub(s, "[ar]",
    (RegExp m) { return toupper(m.match(0)); },
    "g");    // result: StRAp A Rocket engine on A chicken.
---

($(TT toupper()) comes from $(LINK2 phobos/std_string.html, std.string).)
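$(P The closest Ruby counterpart to the delegate form is passing a block to
$(TT gsub). This is a minimal sketch added for comparison; the block receives
each matched substring and its return value becomes the replacement.)

```ruby
s = "Strap a rocket engine on a chicken."

# The block is called once per match; its result replaces that match.
shouted = s.gsub(/[ar]/) { |match| match.upcase }

puts shouted   # prints StRAp A Rocket engine on A chicken.
```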

<h2>Looping</h2>

$(P It's possible to iterate over all matches within
a string:)

---
import std.stdio;
import std.regexp;

void main()
{
    foreach(m; RegExp("ab").search("abcabcabab"))
    {
        writefln("%s[%s]%s", m.pre, m.match(0), m.post);
    }
}
// Prints:
// [ab]cabcabab
// abc[ab]cabab
// abcabc[ab]ab
// abcabcab[ab]
---
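$(P For comparison, a rough Ruby sketch of the same loop uses $(TT String#scan)
with a block. Inside the block, $(TT Regexp.last_match) gives the
$(TT MatchData) for the current match, which supplies the text before and after
it. The $(TT lines) array is an illustrative name, used here so the results can
be inspected as well as printed.)

```ruby
# Collect one formatted line per match of /ab/ in the subject string.
lines = []
"abcabcabab".scan(/ab/) do
  m = Regexp.last_match   # MatchData for the current match
  lines << "#{m.pre_match}[#{m[0]}]#{m.post_match}"
end

puts lines
# Prints:
# [ab]cabcabab
# abc[ab]cabab
# abcabc[ab]ab
# abcabcab[ab]
```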

<h2>Conclusion</h2>

$(P D's regular expression handling is as powerful as Ruby's, but
its syntax isn't as concise. D leaves out two Ruby features:)

$(UL

$(LI Regular expression literal syntax. Adding it would
make it impossible to perform lexical analysis without also
doing syntactic or semantic analysis.)

$(LI Implicit naming of match variables. This causes problems
with name collisions, and just doesn't
fit with the rest of the way D works.)

)

$(P But it is just as powerful.
)
)
Macros:
TITLE=Regular Expressions
WIKI=RegularExpression
RUBY=$(CCODE $0)