The Developer's Library for D

RegexIterator inclusive?

Moderators: kris

Posted: 07/08/07 15:21:40

Hi, I tried using RegexIterator but it didn't behave as I expected*. It's quite possible I did something wrong or misunderstood its usage (I'm new here), but RegexIterator seems to return whatever doesn't match instead of returning the matches:

auto buf = new Buffer (256);
auto foo = "to write some D in tango";
buf.append(foo);

foreach(match; new RegexIterator(buf, "some"))
    Stdout(match).newline;

output: "to write D in tango"

I also found that replacing "some" with "tango" causes an assertion failure in tango.io.Buffer at line 842 (the failing code is "assert (position_ <= limit_);").

* From the doc: Inclusive tokens are just the opposite: they look for patterns in the text that should be part of the token itself - everything else is considered foreign. Currently the only inclusive token type is exposed by RegexToken.


Posted: 07/08/07 20:28:30


Thanks for bringing this up, Lutger.

The assert() has now been resolved. All of the stream iterators should actually be of the 'exclusive' variety (so the doc is wrong), where tokens consist of the text between located patterns:

auto buf = new Buffer ("to write some D in tango");

foreach(match; new RegexIterator(buf, "some"))
    Stdout(match).newline;

... should emit "to write " and " D in tango", just as though one were splitting the text on \n or any other delimiter. The converse operation (locating a pattern within text) is probably better handled by using tango.text.Regex directly rather than streaming the content, since a pattern returned from a stream carries no context about the text that preceded it.
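To make the exclusive/inclusive distinction concrete, here is a small sketch. It uses D2 Phobos `std.regex` (`split`, `matchAll`, and the `pre`/`hit` properties of a match) as a stand-in, not Tango's RegexIterator or tango.text.Regex, whose exact signatures aren't shown in this thread. The helper names `exclusiveTokens`, `locateMatches` and `Located` are hypothetical. Splitting yields the text between matches, while locating yields each match together with its offset, so the preceding text stays available as context.

```d
// Sketch using D2 Phobos std.regex as a stand-in; this is NOT Tango's API
// (RegexIterator / tango.text.Regex are not used here).
import std.regex : matchAll, regex, split;
import std.stdio : writeln;

// "Exclusive" tokens: the text between occurrences of the pattern,
// which is what the stream iterators are described as emitting.
string[] exclusiveTokens(string text, string pattern)
{
    return split(text, regex(pattern));
}

// Hypothetical record for one located match.
struct Located
{
    size_t offset; // index of the match within the original text
    string hit;    // the matched text itself
}

// The converse: locate each match of the pattern, keeping its position
// so the surrounding text remains available as context.
Located[] locateMatches(string text, string pattern)
{
    Located[] found;
    foreach (m; matchAll(text, regex(pattern)))
    {
        // m.pre is the slice of input before the match, so its length
        // is the match's starting offset.
        found ~= Located(m.pre.length, m.hit);
    }
    return found;
}

void main()
{
    string text = "to write some D in tango";
    writeln(exclusiveTokens(text, "some"));
    foreach (loc; locateMatches(text, "tango"))
        writeln(loc.offset, ": ", loc.hit,
                " (preceded by \"", text[0 .. loc.offset], "\")");
}
```

Because the whole string is in memory, each match can be sliced against the original text. A stream iterator has already discarded that prior text by the time it hands back a token.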

If streaming is not a requirement (e.g. you already have all the text in an array), then the tango.text.Util functions, along with direct use of tango.text.Regex, may be more appropriate. Streaming is slower, but it is useful where overlapped processing is involved; for general purposes, it is usually simpler and faster to operate directly on a pre-loaded array of content.
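For the non-streaming case, a rough sketch of working directly on a pre-loaded array looks like the following. It uses D2 Phobos (`std.string.indexOf`, `std.algorithm.iteration.splitter`) as an assumed stand-in for the tango.text.Util functions, whose exact names and signatures aren't given here. The helpers `findLiteral` and `splitLiteral` are hypothetical. For a literal delimiter, no regex engine or stream is involved at all.

```d
// Sketch with D2 Phobos standing in for tango.text.Util (not Tango's API):
// when the whole text is already in memory, plain substring functions suffice.
import std.algorithm.iteration : splitter;
import std.array : array;
import std.stdio : writeln;
import std.string : indexOf;

// Index of the first occurrence of `needle`, or -1 if it does not occur.
ptrdiff_t findLiteral(string text, string needle)
{
    return text.indexOf(needle);
}

// Tokens between occurrences of a literal delimiter. splitter is lazy and
// yields slices of `text`; .array only collects those slices into an array.
string[] splitLiteral(string text, string delimiter)
{
    return text.splitter(delimiter).array;
}

void main()
{
    string text = "to write some D in tango";
    writeln(findLiteral(text, "some"));
    writeln(splitLiteral(text, "some"));
}
```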

Hope this helps, and thanks for reporting the bug.

Posted: 07/08/07 22:03:11

Yes, that's very helpful; tango.text.Regex will do fine. By the way, the snippet does output the string as two tokens, not one; that was my mistake.

As for the docs, the error is here: http://www.dsource.org/projects/tango/docs/current/tango.text.stream.StreamIterator.html

Posted: 07/10/07 00:31:30

Lutger wrote:

Ah, thanks!