Ticket #10 (closed defect: fixed)

Opened 3 years ago

Last modified 3 years ago

Performance issue in CoreParser.attributeNormalize?

Reported by: Numpsy
Assigned to: Michael Rynn
Priority: major
Milestone:
Component: component1
Version:
Keywords:
Cc:

Description

Hi,

I've been investigating the performance of the DocumentBuilder when loading a large (18 MB) file into a DOM, and I think there is an issue with CoreParser.attributeNormalize.

In my test, that function gets called 350,000 times, and each call creates a new ArrayBuffer instance (and allocates a chunk of memory).

Should it perhaps reuse a buffer, as some of the other similar functions do?

Change History

02/15/12 18:07:17 changed by Michael Rynn

  • owner changed from somebody to Michael Rynn.
  • status changed from new to assigned.

The problem is that it is potentially recursive with entityData, at least in XmlDtdParser. I had looked at it previously and made a mental note to try to handle the easy case first, without allocating an extra buffer or a new context stack.
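A minimal sketch of that idea, written in Python rather than D only to show the control flow: a fast path returns the attribute value untouched when it contains no references and no tab, CR or LF characters, and a slow path expands references into one reusable scratch buffer, recursing into entity replacement text by appending to the same buffer instead of allocating a new one per level. The class AttributeNormalizer, its methods and the entity table are hypothetical, not the d2-xml API. The rules follow CDATA attribute-value normalization from XML 1.0 section 3.3.3, assuming line ends were already normalized. In Python the reused list only avoids creating a new list object; the allocation savings discussed in this ticket concern the D buffers.

```python
from typing import Dict, List, Set

# Characters that CDATA attribute-value normalization maps to a single
# space (XML 1.0, section 3.3.3). Assumes line ends were already normalized.
WS_TO_SPACE = "\t\n\r"

# Predefined entities are appended directly instead of being re-parsed.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}


class AttributeNormalizer:
    """Hypothetical helper for illustration; not the d2-xml API."""

    def __init__(self, entities: Dict[str, str]) -> None:
        # Internal general entities: name -> replacement text.
        self.entities = dict(entities)
        # Scratch buffer shared by every slow-path call.
        self._scratch: List[str] = []

    def normalize(self, value: str) -> str:
        # Fast path: nothing to expand or replace, so return the input
        # object itself without building anything new.
        if "&" not in value and not any(c in value for c in WS_TO_SPACE):
            return value
        # Slow path: refill the shared scratch buffer (not re-entrant).
        self._scratch.clear()
        self._expand(value, self._scratch, set())
        return "".join(self._scratch)

    def _expand(self, text: str, out: List[str], active: Set[str]) -> None:
        # Appends the normalized form of `text` to `out`. Entity replacement
        # text is expanded by recursion into the same `out` list.
        # Well-formedness checks (e.g. '<' in replacement text) are omitted.
        i = 0
        while i < len(text):
            c = text[i]
            if c == "&":
                end = text.index(";", i)  # ValueError if unterminated
                name = text[i + 1:end]
                if name.startswith("#x"):
                    out.append(chr(int(name[2:], 16)))  # hex char reference
                elif name.startswith("#"):
                    out.append(chr(int(name[1:])))  # decimal char reference
                elif name in PREDEFINED:
                    out.append(PREDEFINED[name])
                else:
                    if name in active:
                        raise ValueError("recursive entity: " + name)
                    active.add(name)
                    # KeyError if the entity was never declared.
                    self._expand(self.entities[name], out, active)
                    active.remove(name)
                i = end + 1
            elif c in WS_TO_SPACE:
                out.append(" ")
                i += 1
            else:
                out.append(c)
                i += 1
```

For example, with an entity sp declared as "a\tb", normalize("[&sp;]") gives "[a b]", while normalize("plain value") returns the very string object it was given. Because normalize() clears the shared list, it must not be called again from inside an expansion; nested entities go through _expand() and append to the same list, which is one way to keep the recursive case from needing its own buffer.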

02/15/12 18:09:21 changed by Michael Rynn

  • status changed from assigned to new.

02/16/12 08:38:14 changed by Michael Rynn

  • status changed from new to closed.
  • resolution set to fixed.

The latest release (release 55) is available from https://launchpad.net/d2-xml.

Thank you for that very helpful report.

I have optimized attributeNormalize in both parsers. It seems to have improved performance by a few percent. Reusing the same buffers seems a good idea; it may also improve CPU cache performance. This was already partially done in the sliceparse module, but not in the xmlparse module.

I don't know what the effect on performance will be for an 18 MB file. It would be useful to have a URL to a file like that for testing. When I made a change to recursive parsing in std.xml, I noted bad performance results comparing the strict use of the original read slice (string*) against a local stack alias object.

02/16/12 11:31:27 changed by Numpsy

FYI, I posted a few observations about performance at http://forum.dlang.org/thread/jgc0im$1ehh$1@digitalmars.com

The file I'm testing with is part of a performance test from a work project, so I can't post it, but I can look into generating something similar (it's actually a fairly straightforward list of items, which happens to contain several hundred thousand nodes).