
Non-greedy Regex Bug?


Posted: 04/19/08 11:05:06

I'm using Tango 0.99.5 and have found some unexpected behaviour with the following code.

import tango.io.Stdout;
import tango.text.Regex;

char[] str="Hello World src=\"www.helloworld.com\" mmmm src=\"www.worldhello.html\"
something href=\"http://192.168.1.1\"
parabara papa paraba, parabara papa rapa, parabara papa paraba paraba paraba paraba parabarabara paparapa
href=\"http://regex.testing/code/\" finish";

int i;
void main(){
    Stdout.formatln("{}\n\n",str);

    // Pattern 1: lazy .*? with no capture group around it
    i=1;
    Stdout.formatln("Non-grouped\n-----------");
    foreach(m;Regex("(href|src)=\".*?\"").search(str))
        Stdout.formatln("{}:  {}",i++,m[0]);

    // Pattern 2: the same lazy .*?, now inside capture group 2
    i=1;
    Stdout.formatln("\nGrouped\n-------");
    foreach(m;Regex("(href|src)=\"(.*?)\"").search(str))
        Stdout.formatln("{}:  {}",i++,m[0]);

    // Pattern 3: .* inside a group, with ? applied to the group
    i=1;
    Stdout.formatln("\nGrouped again\n-------------");
    foreach(m;Regex("(href|src)=\"(.*)?\"").search(str))
        Stdout.formatln("{}:  {}",i++,m[0]);
}

The output I get is

Hello World src="www.helloworld.com" mmmm src="www.worldhello.html"
something href="http://192.168.1.1"
parabara papa paraba, parabara papa rapa, parabara papa paraba paraba paraba paraba parabarabara paparapa
href="http://regex.testing/code/" finish


Non-grouped
-----------
1:  src="www.helloworld.com"
2:  src="www.worldhello.html"
3:  href="http://192.168.1.1"
4:  href="http://regex.testing/code/"

Grouped
-------
1:  src="www.helloworld.com" mmmm src="www.worldhello.html"
something href="http://192.168.1.1"
parabara papa paraba, parabara papa rapa, parabara papa paraba paraba paraba paraba parabarabara paparapa
href="http://regex.testing/code/"

Grouped again
-------------
1:  src="www.helloworld.com" mmmm src="www.worldhello.html"
something href="http://192.168.1.1"
parabara papa paraba, parabara papa rapa, parabara papa paraba paraba paraba paraba parabarabara paparapa
href="http://regex.testing/code/"

What I was expecting was three identical results like the first. However, as soon as the pattern uses a capture group (parentheses), the non-greedy modifier seems to be ignored. Am I right in thinking this is a bug, or am I making a mistake?
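For comparison, here is a sketch of what a conforming engine should return for the same three patterns. It uses Phobos's `std.regex` rather than Tango, so it says nothing about Tango's internals. The multi-line matches in the output above suggest Tango's `.` also matches newlines, so the sketch assumes that and passes the `s` flag to get the same behaviour. A lazy `.*?` should stop at the first closing quote whether or not it sits inside a capture group, so patterns 1 and 2 should each give the same four matches. Pattern 3 is different: in `(.*)?` the `?` makes the group optional rather than lazy, and the `.*` inside it is greedy, so a single match running to the last quote in the string is the expected result for that one. The helper name `allMatches` is only for illustration.

```d
import std.regex;
import std.stdio;

// Shortened copy of the test string from the post (the long "parabara" line is trimmed).
immutable string sample =
    "Hello World src=\"www.helloworld.com\" mmmm src=\"www.worldhello.html\"\n"
    ~ "something href=\"http://192.168.1.1\"\n"
    ~ "parabara papa paraba, parabara papa rapa\n"
    ~ "href=\"http://regex.testing/code/\" finish";

// Hypothetical helper: collect the whole-match text (capture 0) of every match.
// The "s" flag lets '.' match '\n', assumed here to mirror the Tango output above.
string[] allMatches(string pattern, string input)
{
    string[] result;
    foreach (c; matchAll(input, regex(pattern, "s")))
        result ~= c[0];
    return result;
}

void main()
{
    // The three patterns from the post: ungrouped lazy, grouped lazy,
    // and greedy .* inside an optional group.
    string[] patterns = [
        `(href|src)=".*?"`,
        `(href|src)="(.*?)"`,
        `(href|src)="(.*)?"`
    ];
    foreach (p; patterns)
    {
        writeln(p);
        foreach (i, m; allMatches(p, sample))
            writefln("%s:  %s", i + 1, m);
        writeln();
    }
}
```

With a correct engine, the first two patterns print the same four lines, and only the third prints one long match.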


Posted: 04/19/08 11:31:37

Could you try the Regex version from the latest SVN and see if it has the same behaviour?

Posted: 04/19/08 12:46:03

Tried it and got exactly the same result.

Posted: 04/19/08 13:59:48

You're right that it's a bug: ticket #1061.
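Until the ticket is resolved, one engine-independent way to avoid lazy quantifiers altogether is to exclude the delimiter from the repeated class: `"[^"]*"` cannot run past the first closing quote, so greediness no longer matters. The sketch below uses Phobos's `std.regex` so that it runs as written; it has not been checked against Tango 0.99.5, so whether Tango handles this pattern correctly is an assumption. The function name `attributeValues` is hypothetical.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: return capture group 2 (the quoted value) of each
// href/src attribute. [^"]* stops at the first quote, so no lazy *? is needed.
string[] attributeValues(string input)
{
    string[] values;
    foreach (c; matchAll(input, regex(`(href|src)="([^"]*)"`)))
        values ~= c[2];
    return values;
}

void main()
{
    // Shortened copy of the test string from the original post.
    string str = "Hello World src=\"www.helloworld.com\" mmmm src=\"www.worldhello.html\"\n"
        ~ "something href=\"http://192.168.1.1\"\n"
        ~ "href=\"http://regex.testing/code/\" finish";
    foreach (i, v; attributeValues(str))
        writefln("%s:  %s", i + 1, v);
}
```

The same negated-class pattern should also work as a drop-in replacement in the original `Regex(...)` calls, provided Tango's engine treats `[^"]` the usual way.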