
Using XmlPullParser with large files.


Posted: 04/02/08 20:53:35

Hi,

How can I use the pull parser without loading the whole file into memory? A small example would be fine.

A post on the D newsgroup suggested the following for the DOM parser:

auto fc = new FileConduit(args[1]);
auto buf = new MappedBuffer(fc);
auto doc = new Document!(char);
doc.parse(buf.getContent());

This is the error I get:

test.d(27): class tango.io.MappedBuffer.MappedBuffer member getContent is not accessible

Zz


Posted: 04/04/08 03:39:23

Try using buf.slice instead?

Posted: 01/23/09 17:44:51

Hi,

buf.slice does not work this way. Here is an example of how I tried to use slice with the SAX parser. It produces an EOF while reading the XML content (of course it does!).

scope sh = new SaxHandler!(char);
scope sp = new SaxParser!(char);

sp.setSaxHandler(sh);

auto fc = new FileConduit("articles.xml");
auto buf = new MappedBuffer(fc);

char[] content;

do {
    content = cast(char[]) buf.slice(40);
    sp.setContent(content); // This will initialize parser.
    sp.parse(); // Segfault? no ty.
} while (content);

I don't know if there is any way to work on larger XML files that do not fit into the buffer. The DOM, SAX, and pull parsers each expect to work on a whole XML document at once and can't read it in chunks, as slice does.

Any ideas on how to solve this issue?

There is a parse(InputStream input) function in the SAX parser, but evidently nothing is done with the InputStream. Could it be that the implementation is incomplete at this point?

Posted: 01/23/09 18:23:54

Sorry, I misunderstood the first time around.

The XML package in Tango is built explicitly to operate on an array of content rather than a stream. The upshot is that the entire document must be addressable at once. One effective way to do this (for huge documents) is to use memory-mapped files, which Tango currently supports through tango.io.device.FileMap.
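
Since a small example was asked for, here is a minimal sketch of that approach. It is untested and written from memory of the Tango API of that period, so treat the details as assumptions: that tango.io.device.FileMap exposes a MappedFile class whose map() returns the mapped region as ubyte[], that File.ReadExisting is the read-only open style, and that PullParser!(char) accepts a char[] and returns XmlTokenType.Done from next() at end of input. Please check these against the documentation for your Tango version.

```d
import tango.io.Stdout;
import tango.io.device.File;      // File.ReadExisting (assumed read-only open style)
import tango.io.device.FileMap;   // MappedFile (assumed class name)
import tango.text.xml.PullParser; // PullParser, XmlTokenType

void main (char[][] args)
{
    if (args.length < 2)
    {
        Stdout.formatln ("usage: {} file.xml", args[0]);
        return;
    }

    // Map the file read-only. The OS pages content in on demand, so the
    // whole document is addressable as one array without being read up front.
    auto file = new MappedFile (args[1], File.ReadExisting);
    scope (exit) file.close;

    // Assumption: map() returns the mapped region as ubyte[].
    auto content = cast(char[]) file.map;

    // The parser works directly on the mapped array.
    auto parser = new PullParser!(char) (content);

    // Walk the token stream and print the name of each element encountered.
    while (parser.next != XmlTokenType.Done)
    {
        if (parser.type == XmlTokenType.StartElement)
            Stdout.formatln ("element: {}", parser.localName);
    }
}
```

Note that the mapping still has to fit into the process address space, so on a 32-bit system this will not help with files of several gigabytes.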