XML Package Design
This page will contain an evolving proposal for a coherent XML design within Tango.
At the highest level, there are two ways to process XML: as a stream, or as a complete document/file. Each is useful in different situations, and their performance characteristics depend on the task at hand, so Tango should provide an interface for both.
- Interfaces should be easy to use
- Interfaces should leverage the power of D
- Interfaces should follow the same patterns as elsewhere in Tango
- Interfaces should accommodate usage with XPath.
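The two processing styles can be contrasted using Java's standard library, whose StAX (`javax.xml.stream`) and DOM (`javax.xml.parsers`) APIs cover the stream and whole-document cases respectively. This is only an illustration of the distinction, not a proposed Tango interface. `StreamVsDocument` and its methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example: count elements with a given name in two ways.
class StreamVsDocument {

    // Streaming (StAX): only the current parse event is held at any time.
    static int countStreaming(String xml, String name) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && r.getLocalName().equals(name)) {
                    count++;
                }
            }
        } finally {
            r.close();
        }
        return count;
    }

    // Whole document (DOM): the full tree is built first, then queried.
    static int countInDocument(String xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(name).getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<list><item/><item>x</item><other/><item/></list>";
        System.out.println(countStreaming(xml, "item"));  // 3
        System.out.println(countInDocument(xml, "item")); // 3
    }
}
```

The streaming version never holds more than the current event. The DOM version materialises the whole tree before answering any question. Which approach is faster depends on document size and on how much of the document is actually needed.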
Historically, DOM has been used for the complete-document case and SAX for the streaming case. Both are cumbersome to implement and to use. Moreover, both can be implemented on top of a lower-level framework, so we should focus on getting that groundwork right first. Note also that DOM appeared before XML was fully standardized, so it is not necessarily the API one would design after some experience with XML.
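As a rough illustration of this layering, a SAX-style push interface can be built on a pull parser in a few lines. The sketch below uses Java's StAX as the lower layer. `PushHandler`, `RecordingHandler`, and `PullToPush` are hypothetical names, and the handler is deliberately much smaller than SAX's real `ContentHandler`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Push-style delivery built on top of a pull parser: the loop pulls events
// from StAX and pushes each one into the handler.
class PullToPush {
    static void run(String xml, PushHandler handler) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE); // merge adjacent text
        XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    handler.startElement(r.getLocalName());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    handler.endElement(r.getLocalName());
                } else if (event == XMLStreamConstants.CHARACTERS && !r.isWhiteSpace()) {
                    handler.text(r.getText());
                }
                // Other events (comments, PIs, ...) are ignored in this sketch.
            }
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws Exception {
        RecordingHandler h = new RecordingHandler();
        run("<a><b>hi</b></a>", h);
        System.out.println(h.events); // [start a, start b, text hi, end b, end a]
    }
}

// Hypothetical SAX-like callback interface (not SAX's real ContentHandler).
interface PushHandler {
    void startElement(String name);
    void endElement(String name);
    void text(String text);
}

// Example handler that records the events it receives.
class RecordingHandler implements PushHandler {
    final List<String> events = new ArrayList<>();
    public void startElement(String name) { events.add("start " + name); }
    public void endElement(String name) { events.add("end " + name); }
    public void text(String text) { events.add("text " + text); }
}
```

A tree builder could be written as another `PushHandler`. This is the sense in which both DOM-like and SAX-like APIs can sit on one common lower layer.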
VTD-XML is a package that uses binary indexing to parse XML and give access to its content. The exact method appears to be patented, but the general idea of indexing should not be patentable. This package targets full-document processing.
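To make the indexing idea concrete, here is a heavily simplified, hypothetical sketch in Java. It is not VTD-XML's actual record format or algorithm, and `TagIndex` is an invented name. The scanner records only where each start-tag name lives in the original text, packing offset and length into one `long`, and extracts a string only on demand.

```java
import java.util.Arrays;

// Hypothetical, simplified "non-extractive" index: one long per start tag,
// (offset << 32) | length, pointing back into the original text.
// Handles plain elements only: no comments, CDATA, or '>' inside attribute values.
class TagIndex {
    private final String text;
    private final long[] records;
    private final int size;

    TagIndex(String text) {
        this.text = text;
        long[] buf = new long[16];
        int n = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != '<') continue;
            int start = i + 1;
            if (start >= text.length()) break;
            char c = text.charAt(start);
            if (c == '/' || c == '?' || c == '!') continue; // not a start tag
            int end = start;
            while (end < text.length()) {
                char d = text.charAt(end);
                if (Character.isWhitespace(d) || d == '>' || d == '/') break;
                end++;
            }
            if (n == buf.length) buf = Arrays.copyOf(buf, n * 2);
            buf[n++] = ((long) start << 32) | (end - start);
        }
        this.records = buf;
        this.size = n;
    }

    int size() { return size; }

    // Materialises a tag name only when asked, by slicing the original text.
    String name(int i) {
        long r = records[i];
        int offset = (int) (r >>> 32);
        int length = (int) r; // low 32 bits
        return text.substring(offset, offset + length);
    }

    public static void main(String[] args) {
        TagIndex idx = new TagIndex("<a x='1'><b/><c>t</c></a>");
        for (int i = 0; i < idx.size(); i++) {
            System.out.println(idx.name(i)); // prints a, b, c on separate lines
        }
    }
}
```

The scan stores one `long` per start tag instead of creating an object per element. This is the property that index-based parsers rely on for speed and memory savings.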
StAX is a pull-parsing API/specification, aimed especially at streaming.
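With StAX's cursor API, the application (not the parser) decides when to advance. The sketch below assumes a small, fixed document shape; `StaxPull` and the element names `books`/`book`/`title` are invented for illustration. It uses the standard `XMLStreamReader` methods `nextTag()`, `getAttributeValue()` and `getElementText()`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxPull {
    // Expects <books><book id=".."><title>..</title></book>...</books> and
    // returns one "id=title" string per book. The caller drives the cursor.
    static List<String> titles(String xml) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        try {
            r.nextTag(); // advance to <books>
            // nextTag() skips whitespace and stops at the next start or end tag.
            while (r.nextTag() == XMLStreamConstants.START_ELEMENT) { // <book>; </books> ends the loop
                String id = r.getAttributeValue(null, "id");
                r.nextTag();                       // advance to <title>
                String title = r.getElementText(); // read text; cursor now on </title>
                r.nextTag();                       // advance to </book>
                out.add(id + "=" + title);
            }
        } finally {
            r.close();
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books>\n  <book id='1'><title>A</title></book>\n"
                + "  <book id='2'><title>B</title></book>\n</books>";
        System.out.println(titles(xml)); // [1=A, 2=B]
    }
}
```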
VTD-XML claims large speed gains over Xerces (SAX?), but I have not seen benchmarks against the other libraries.
A nifty in-language grammar can be found here; perhaps we could pull off something similar?
A blog entry by someone attempting something that may be close to what we want to do can be found here.
An iterator pattern should clearly be used; with XML, this means iterating over one or more of tokens, elements, tags, and so on. Furthermore, the user must be able to extract information at the current cursor position, move to other positions (if the user knows enough about the document), and edit the document (which apparently is not possible with the API on xmlpull.org).
An API suggestion will follow below.
Comments can be made here or in the relevant forum thread. The design can be changed as suggestions come in (first stage) and are eventually agreed on (second stage).
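The three cursor requirements (extract at the current position, move, edit) can be sketched over an in-memory document with the DOM Level 2 Traversal `TreeWalker`, which the JDK's built-in DOM implementation supports. This is an illustration in Java, not a proposed Tango API. `CursorEdit`, the `item` element and the `price` attribute are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

class CursorEdit {
    // Moves a cursor over every element, reads data at the current position,
    // and edits the document in place (here: increments each "price" attribute).
    static Document bumpPrices(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Assumes the DOM implementation supports the optional Traversal module
        // (the JDK's built-in implementation does).
        TreeWalker cursor = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
        for (Element e = (Element) cursor.getCurrentNode(); e != null;
                e = (Element) cursor.nextNode()) {
            if (e.hasAttribute("price")) {                            // extract at the cursor
                int price = Integer.parseInt(e.getAttribute("price"));
                e.setAttribute("price", Integer.toString(price + 1)); // edit at the cursor
            }
        }
        return doc;
    }

    // Reads back the "price" attributes of all <item> elements in document order.
    static List<String> prices(Document doc) {
        List<String> out = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            out.add(((Element) items.item(i)).getAttribute("price"));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Document doc = bumpPrices("<shop><item price='10'/><box><item price='5'/></box></shop>");
        System.out.println(prices(doc)); // [11, 6]
    }
}
```

Besides `nextNode()`, a `TreeWalker` can move with `parentNode()`, `firstChild()`, `nextSibling()` and related methods, or be repositioned with `setCurrentNode()`. These cover the "move to other positions" requirement.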
Delta between Tango's Document and W3C DOM APIs
This table maps common API calls in Tango's Document to their W3C DOM and Java DOM equivalents.
||How do I...||Tango||W3C DOM||Java DOM||
||parse an XML document?||auto doc = new Document!(char); doc.parse (content);||N/A (differs between languages)||DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(content);||
||start an XPath-style query?||doc.query||document.createExpression()||XPathFactory.newInstance().newXPath();||
||create a new document?||auto doc = new Document!(char);||document.getImplementation().createDocument()||DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();||
||add an XML prolog to a new document?||doc.header||document.appendChild(document.createProcessingInstruction("target", "instruction"));||same as DOM, but usually done at serialization time||
||add a new element to the document?||doc.element("foo");||elem = doc.createElement("foo"); doc.appendChild(elem);||elem = doc.createElement("foo"); doc.appendChild(elem);||
||add an attribute?||elem.attribute(prefix, localName, value);||elem.setAttributeNS(uri, name, value);||elem.setAttributeNS(uri, name, value);||
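For comparison, the Java DOM column can be collected into one runnable sketch that uses only the JDK's `javax.xml` APIs. The table hides one detail: `DocumentBuilder.parse(String)` treats its argument as a URI, not as XML text, so in-memory content is wrapped in an `InputSource` here. The class name `JavaDomColumn`, the namespace URI `http://example.com/ns`, and the `ex` prefix are placeholders chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class JavaDomColumn {
    static DocumentBuilder builder() throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // parse with namespace support
        return f.newDocumentBuilder();
    }

    // Row "parse an XML document?": parse(String) would treat the string as a URI.
    static Document parse(String content) throws Exception {
        return builder().parse(new InputSource(new StringReader(content)));
    }

    // Row "start an XPath-style query?": evaluate an expression to a node set.
    static int countMatches(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return hits.getLength();
    }

    // Rows "create a new document?", "add an XML prolog?" (as a generic PI),
    // "add a new element?" and "add an attribute?".
    static Document build() throws Exception {
        Document doc = builder().newDocument();
        doc.appendChild(doc.createProcessingInstruction("target", "instruction"));
        Element elem = doc.createElement("foo");
        doc.appendChild(elem);
        elem.setAttributeNS("http://example.com/ns", "ex:bar", "baz"); // placeholder namespace
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document parsed = parse("<root><foo/><foo/><bar/></root>");
        System.out.println(countMatches(parsed, "//foo")); // 2
        Element root = build().getDocumentElement();
        System.out.println(root.getAttributeNS("http://example.com/ns", "bar")); // baz
    }
}
```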