Download Reference Manual
The Developer's Library for D
About Wiki Forums Source Search Contact

text.xml.PullParser.d:next() - Segmentation fault

Moderators: kris

Posted: 03/28/09 11:17:56

Hi there,

i have an application using xml quite often, so i made some tests for the xml parser. It seems that sometimes (really just sometimes; for ~0.01% of the packets) the parser crashes seriously with a Segmentation fault. Debugging this issue points out the PullParser.next-method. It seems, that we could in some cases get an "auto p = text.point;" which is behind our xml stream. That means i'm missing a condition "p >= text.end" before we do anything with p. In my dirty solution I just added a Whitespace at the end of my XML-Packets. This works . Could somebody analyse the next method? Could we fix it?

Program terminated with signal 11, Segmentation fault.
#0  0x08081a79 in _D5tango4text3xml10PullParser18__T10PullParserTaZ10PullParser4nextMFZE5tango4text3xml10PullParser12XmlTokenType (
    this=@0xb3a14300) at ../dev_env/tango/gdc/include/tango/text/xml/PullParser.d:91
91                      if (*p <= 32)
(gdb)
Current language:  auto; currently minimal
(gdb) bt
#0  0x08081a79 in _D5tango4text3xml10PullParser18__T10PullParserTaZ10PullParser4nextMFZE5tango4text3xml10PullParser12XmlTokenType (
    this=@0xb3a14300) at ../dev_env/tango/gdc/include/tango/text/xml/PullParser.d:91
#1  0x0808120d in _D5tango4text3xml9SaxParser16__T9SaxParserTaZ9SaxParser7doParseMFZv (this=@0xb439f340)
    at ../dev_env/tango/gdc/include/tango/text/xml/SaxParser.d:1303
#2  0x080810b9 in _D5tango4text3xml9SaxParser16__T9SaxParserTaZ9SaxParser5parseMFZv (this=@0xb439f340)
    at ../dev_env/tango/gdc/include/tango/text/xml/SaxParser.d:1227
#3  0x080ccd9d in _D6common9xmlparser3src20GenericElementParser20GenericElementParser5parseMFAaZC6common9xmlparser3src14GenericElement14GenericElement (this=@0xb7a27e30, packet=
      {length = 256, ptr = 0xb78daf00 "<PACKET><PACKET version=\"1.0\" send...})
    at ../common/xmlparser/src/GenericElementParser.d:86

best regards

Stefan Rohe

Author Message

Posted: 03/28/09 11:22:36

Could you create a bug report? If your input is valid, there should definately not be a crash. Please provide us with an example of how you use the parser.

Posted: 03/28/09 13:07:43

i would like to give u just a Mock of the Error, but when i just execute this unittest, everything works well. When i execute in combination with other unittests it fails. But the other unittests are not dirty. Setup and teardown methods are implemented in a clean manner. I'm using still an old tango_0.99.6, but i patched PullParser? and SaxParser? to Revision 4065, to get rid of the Unexpected EOF Exception.

My testcase is. Parse a packet. The same packet. 1000 times. My packet is valid xml. Also nothing special. It could be parsed several hundred times, but then it crashes. Before i let the parser parse it i everytime make a defensive dup-copy of the packet, that i'm sure the parser gets a fresh packet each time.

This problem also just now occurs. Nothing changed on the XMLParser just implemented other stuff, but our executable seemed to reach the critical size. Now the buffer of the XML Parser is at a bad location. *p+1 will let the PullParser? access a forbidden memory area. Compiling just the Unittest for XML-Parser works well, then the Stack looks different and everything works well.

BTW: Is this enough to create a ticket?

best regards

Stefan Rohe

Posted: 03/28/09 13:30:26

Even revision 4065 is a bit old, do you think you could update to 0.99.8 (or even latest trunk)?

Posted: 03/28/09 14:07:40

the next() method in 4065 and in trunk of PullParser? haven't changed. If you look at the next method, wouldn't you think that if shouldn't be look like the following?

  auto p = text.point;
  if (p >= text.end) return endOfInput;   // <-- this line was added
  if (*p <= 32)
  {

ok, this costs speed. Have to execute a few opcodes more, everytime calling next, but it makes it more safe in usage. we shouldn't forget, that the next method is public, so everybody everytime could call it. We do not use pure arrays within the PullParser?, so the compiler does not do the Range check for us. The next step would be to check how p could jump out of the box.

best regards

Stefan Rohe

Posted: 03/28/09 17:08:24 -- Modified: 03/28/09 17:09:32 by
kris

The xml package was explicitly constructed to operate on complete xml documents, rather than document fragments. That includes the pull parser also. Incremental parsing of packets is certainly useful, but is not fully supported at this time. This means the minimum overhead for using the parser is actually the size of the document itself (memory-mapped files are an excellent choice to represent huge xml documents).

Part of the reasoning behind this relates to how text references are constructed. For example: the xml package will never duplicate any content (names, cdata, values, etc). Instead it simply slices portions of the input, making it considerably more efficient. We take advantage of that knowledge where it seems reasonable to do so, though this appears to have led to the situation you're experiencing.

Will see what can be done to resolve the conflict. In the meantime, a workaround would be to obtain the whole document before initiating parsing. Sorry for the inconvenience.

Posted: 03/28/09 17:37:01

i obtained the whole document before. The document is a char[] of around 1000 Bytes. Quite small. I just dup it, to be sure that I do not modify the document somewhere else. If everything is well (no segmentation faults or exceptions anymore) I'll remove the dup again to improve speed and reduce memory allocation.

> We take advantage of that knowledge where it seems reasonable to do so, though this appears to have led to the situation you're experiencing.

but, but, i dup everything before and do not make any modification of the parsed result. (it's just a unittest!). i call something like this:

for(int i=0; i<1000; i++)
{ 
  // this throws sometimes segmentation faults
  if (parse(char[] document.dup) != expectedResult)
  {
    cryout;
  } 
}

I patched my parsing routines like the following.

for(int i=0; i<1000; i++)
{ 
  // this until now never run out of the range and so never until now threw a segmentation fault
  if (parse(char[] (document.dup~" ") != expectedResult)
  {
    cryout;
  } 
}

> Will see what can be done to resolve the conflict.

the patch one post before doesn't look applicable?

Stefan Rohe

Posted: 03/28/09 18:23:34

oh, I had the impression you were streaming the document instead. Can you attach the failing document to a ticket, please?

Posted: 03/28/09 19:02:46

I created a ticket and assigned it to you. Feel free to ask if you think the decription there isn't sufficient. You probably will not be able to reproduce this error, because also here it occurs very seldom. I cannot isolate the error into a few d instructions, because then the stack looks different from my stack and everything will be fine.
the Ticket:
http://www.dsource.org/projects/tango/ticket/1560

best regards

Stefan Rohe

Posted: 03/28/09 22:04:55

Thanks Stefan. Hopefully it is now resolved

Posted: 03/29/09 17:53:35

Now I get object.Exception: Access Violation with code that used to work with tango0.99.8.

I reopened the ticket.

Posted: 03/29/09 20:11:31 -- Modified: 03/29/09 20:11:51 by
kris

Fixed. Thanks for the test-case, HeiHon

Posted: 03/30/09 11:24:10

Thanks for the quick fix :-)

The current tango-r4472 wouldn't "build-tango.bat" on my box. See #1567 with possible fix.