tango.text.xml.Document

License:

Version:

Initial release: February 2008

Authors:

Aaron, Kris

class Document(T) : package PullParser!(T)

Implements a DOM atop the XML parser, supporting document parsing, tree traversal and ad-hoc tree manipulation.

The DOM API is non-conformant, yet simple and functional in style - locate a tree node of interest and operate upon or around it. In all cases you will need a document instance to begin, whereupon it may be populated either by parsing an existing document or via API manipulation.

This particular DOM employs a simple free-list to allocate each of the tree nodes, making it quite efficient at parsing XML documents. The tradeoff with such a scheme is that copying nodes from one document to another requires a little more care than otherwise. We felt this was a reasonable tradeoff, given the throughput gains vs the relative infrequency of grafting operations. For grafting within or across documents, please use the move() and copy() methods.
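As a rough illustration of grafting, the sketch below copies a subtree from one document into another with copy(), then relocates a node within the target with move(). It relies only on calls documented on this page (element, elements, copy, move); the element names are invented for the example, and the imports assume a D1-era Tango installation.

```d
import tango.io.Stdout;
import tango.text.xml.Document;
import tango.text.xml.DocPrinter;

void main ()
{
        // source document: <library><book>Dune</book></library>
        auto source = new Document!(char);
        source.tree.element (null, "library")
                   .element (null, "book", "Dune");

        // target document: <archive><shelf/><box/></archive>
        auto target  = new Document!(char);
        auto archive = target.tree.element (null, "archive");
        auto shelf   = archive.element (null, "shelf");
        auto box     = archive.element (null, "box");

        // copy() duplicates the subtree, so the source keeps its own nodes
        shelf.copy (source.elements);

        // move() relocates an existing subtree under a new parent
        shelf.move (box);

        auto print = new DocPrinter!(char);
        Stdout (print(target)).newline;
        Stdout (print(source)).newline;
}
```

Attaching a node that belongs to another document by any other route is best avoided, since each document owns the storage for its own nodes.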

Another simplification is related to entity transcoding. This is not performed internally, and becomes the responsibility of the client. That is, the client should perform appropriate entity transcoding as necessary. Paying the (high) transcoding cost for all documents doesn't seem appropriate.
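To make the client-side responsibility concrete, here is a minimal sketch that decodes the five predefined XML entities in an attribute value after parsing. The decodeEntities function is a hypothetical helper written for this example, not part of the Document API; numeric character references and custom entities are deliberately left untouched.

```d
import tango.text.xml.Document;

// Hypothetical helper (not part of Tango's Document API): decode the five
// predefined XML entities, leaving anything unrecognized as it was.
char[] decodeEntities (char[] src)
{
        char[] dst;
        for (size_t i = 0; i < src.length; )
        {
                if (src[i] == '&')
                {
                        // locate the terminating ';'
                        size_t j = i + 1;
                        while (j < src.length && src[j] != ';')
                               ++j;

                        if (j < src.length)
                        {
                                char[] name = src[i+1 .. j];
                                char   c = 0;
                                if (name == "amp")       c = '&';
                                else if (name == "lt")   c = '<';
                                else if (name == "gt")   c = '>';
                                else if (name == "quot") c = '"';
                                else if (name == "apos") c = '\'';

                                if (c)
                                {
                                        dst ~= c;
                                        i = j + 1;
                                        continue;
                                }
                        }
                }
                dst ~= src[i++];
        }
        return dst;
}

void main ()
{
        auto doc = new Document!(char);
        doc.parse ("<note text=\"fish &amp; chips\"/>");

        // the parser hands back the raw text, so decode on demand
        auto attr = doc.elements.attributes.name (null, "text");
        attr.value (decodeEntities (attr.value()));
}
```

Decoding only the values an application actually reads keeps the cost proportional to use, which is the tradeoff the paragraph above describes.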

Parse example

auto doc = new Document!(char);
doc.parse (content);

auto print = new DocPrinter!(char);
Stdout(print(doc)).newline;

API example

auto doc = new Document!(char);

// attach an xml header
doc.header;

// attach an element with some attributes, plus 
// a child element with an attached data value
doc.tree.element   (null, "element")
        .attribute (null, "attrib1", "value")
        .attribute (null, "attrib2")
        .element   (null, "child", "value");

auto print = new DocPrinter!(char);
Stdout(print(doc)).newline;

Note that the document tree() includes all nodes in the tree, not just elements. Use doc.elements to address the topmost element instead. For example, to add an interior sibling to the prior illustration:

doc.elements.element (null, "sibling");

Printing the name of the topmost (root) element:

Stdout.formatln ("first element is '{}'", doc.elements.name);

XPath examples:

auto doc = new Document!(char);

// attach an element with some attributes, plus 
// a child element with an attached data value
doc.tree.element   (null, "element")
        .attribute (null, "attrib1", "value")
        .attribute (null, "attrib2")
        .element   (null, "child", "value");

// select named-elements
auto set = doc.query["element"]["child"];

// select all attributes named "attrib1"
set = doc.query.descendant.attribute("attrib1");

// select elements with one parent and a matching text value
set = doc.query[].filter((doc.Node n) {return n.children.hasValue("value");});

Note that path queries are transient - they do not retain content across multiple queries. That is, the lifetime of a query result is limited unless you explicitly copy it. For example, this will fail:

auto elements = doc.query["element"];
auto children = elements["child"];

The above will lose elements because the associated document reuses node space for subsequent queries. In order to retain results, do this:

auto elements = doc.query["element"].dup;
auto children = elements["child"];

The above .dup is generally very small (a set of pointers only). On the other hand, recursive queries are fully supported:

set = doc.query[].filter((doc.Node n) {return n.query[].count > 1;});

Typical usage tends to follow this pattern, where each query result is processed before another is initiated:

foreach (node; doc.query.child("element"))
        {
        // do something with each node
        }

Note that the parser is templated for char, wchar or dchar.

this(uint nodes = 1000)

Construct a DOM instance. The optional parameter indicates the initial number of nodes assigned to the freelist.

XmlPath!(T).NodeSet query() [final]

Return an xpath handle to query this document. This starts at the document root.

See also Node.query

Node tree() [final]

Return the root document node, from which all other nodes are descended.

Returns null where there are no nodes in the document.

Node elements() [final]

Return the topmost element node, which is generally the root of the element tree.

Returns null where there are no top-level element nodes.

Document reset() [final]

Reset the freelist. Subsequent allocation of document nodes will overwrite prior instances.

Document header(T[] encoding = null) [final]

Prepend an XML header to the document tree.

void parse(T[] xml) [final]

Parse the given xml content, which will reuse any existing nodes within this document. The resultant tree is retrieved via the document 'tree' attribute.
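Because parse() reuses existing nodes, one Document instance can serve a whole series of inputs. The sketch below shows that pattern; the input strings and the initial freelist size of 64 are arbitrary choices for the example.

```d
import tango.io.Stdout;
import tango.text.xml.Document;

void main ()
{
        // a single instance, with a modest initial freelist of 64 nodes
        auto doc = new Document!(char)(64);

        char[][] inputs;
        inputs ~= "<order><item/><item/></order>";
        inputs ~= "<invoice/>";

        foreach (content; inputs)
        {
                // each parse reuses the nodes handed out by the previous one,
                // so finish with one tree before parsing the next
                doc.parse (content);
                Stdout.formatln ("root element is '{}'", doc.elements.name());
        }
}
```

Any Node references held from an earlier parse should be treated as stale once the next parse begins.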

Node allocate() [private, final]

Allocate a node from the freelist.

void newlist() [private, final]

Allocate a new block of nodes for the freelist.

struct Visitor [private]

foreach support for visiting and selecting nodes. A struct is a low-overhead mechanism for capturing context relating to an opApply, and we use it here to sweep nodes when testing for various relationships.

See Node.attributes and Node.children

bool exist(): Is there anything to visit here?
        Time complexity: O(1)
int opApply(int delegate(ref Node) dg): Traverse sibling nodes
Node name(T[] prefix, T[] local, bool delegate(Node) dg = null): Locate a node with a matching name and/or prefix, and which passes an optional filter. Each of the arguments will be ignored where they are null.
        Time complexity: O(n)
bool hasName(T[] prefix, T[] local): Scan nodes for a matching name and/or prefix. Each of the arguments will be ignored where they are null.
        Time complexity: O(n)
Node value(T[] match): Sweep nodes looking for a match, and return either a node or null. See value(x,y,z) or name(x,y,z) for additional filtering.
        Time complexity: O(n)
bool hasValue(T[] match): Sweep the nodes looking for a value match. Returns true if found. See value(x,y,z) or name(x,y,z) for additional filtering.
        Time complexity: O(n)
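A short sketch of the Visitor in use, reached through Node.attributes and Node.children. The sample XML and its attribute names are made up for the example; only members listed above are called.

```d
import tango.io.Stdout;
import tango.text.xml.Document;

void main ()
{
        auto doc = new Document!(char);
        doc.parse ("<item id=\"7\" kind=\"book\"><title>Dune</title></item>");
        auto item = doc.elements;

        // exist() is a cheap test for whether there is anything to visit
        if (item.attributes.exist())
            foreach (attr; item.attributes)
                     Stdout.formatln ("{} = {}", attr.name(), attr.value());

        // name() returns the first match or null; a null prefix is ignored
        auto id = item.attributes.name (null, "id");
        if (id)
            Stdout.formatln ("id is {}", id.value());

        // hasName() and hasValue() only report whether a match exists
        if (item.attributes.hasName (null, "kind") && item.attributes.hasValue ("book"))
            Stdout ("this item is a book").newline;

        if (! item.children.hasName (null, "author"))
            Stdout ("no author element present").newline;
}
```

Each of these lookups is a linear sweep, so for repeated access to the same attribute it may be worth holding on to the Node returned by name().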

struct NodeImpl [private]

The node implementation

void* user [public]: open for usage
Document document(): Return the hosting document
XmlNodeType type(): Return the node type-id
Node parent(): Return the parent, which may be null
Node child(): Return the first child, which may be null
Node childTail() [deprecated]: Return the last child, which may be null
        Deprecated: exposes too much implementation detail. Please file a ticket if you really need this functionality
Node prev(): Return the prior sibling, which may be null
Node next(): Return the next sibling, which may be null
T[] prefix(): Return the namespace prefix of this node (may be null)
Node prefix(T[] replace): Set the namespace prefix of this node (may be null)
T[] name(): Return the vanilla node name (sans prefix)
Node name(T[] replace): Set the vanilla node name (sans prefix)
T[] value(): Return the data content, which may be null
void value(T[] val): Set the raw data content, which may be null
T[] toString(T[] output = null): Return the full node name, which is a combination of the prefix and local names. Nodes without a prefix will return the local name only
uint position(): Return the index of this node, or how many prior siblings it has.
        Time complexity: O(n)
Node detach(): Detach this node from its parent and siblings
XmlPath!(T).NodeSet query() [final]: Return an xpath handle to query this node
        See also Document.query
Visitor children(): Return a foreach iterator for node children
Visitor attributes(): Return a foreach iterator for node attributes
bool hasAttributes() [deprecated]: Returns whether there are attributes present or not
        Deprecated: use node.attributes.exist instead
bool hasChildren() [deprecated]: Returns whether there are children present or not
        Deprecated: use node.child or node.children.exist instead
Node copy(Node tree): Duplicate the given sub-tree into place as a child of this node. Returns a reference to the subtree
Node move(Node tree): Relocate the given sub-tree into place as a child of this node. Returns a reference to the subtree
Node element(T[] prefix, T[] local, T[] value = null): Appends a new (child) Element and returns a reference to it.
Node attribute(T[] prefix, T[] local, T[] value = null): Attaches an Attribute and returns this, the host
Node data(T[] data): Attaches a Data node and returns this, the host
Node cdata(T[] cdata): Attaches a CData node and returns this, the host
Node comment(T[] comment): Attaches a Comment node and returns this, the host
Node doctype(T[] doctype): Attaches a Doctype node and returns this, the host
Node pi(T[] pi): Attaches a PI node and returns this, the host
Node element_(T[] prefix, T[] local, T[] value = null) [private]: Attaches a child Element, and returns a reference to the child
Node attribute_(T[] prefix, T[] local, T[] value = null) [private]: Attaches an Attribute, and returns the host
Node data_(T[] data) [private]: Attaches a Data node, and returns the host
Node cdata_(T[] cdata) [private]: Attaches a CData node, and returns the host
Node comment_(T[] comment) [private]: Attaches a Comment node, and returns the host
Node pi_(T[] pi, T[] patch) [private]: Attaches a PI node, and returns the host
Node doctype_(T[] doctype) [private]: Attaches a Doctype node, and returns the host
void attrib(Node node) [private]: Append an attribute to this node. The given attribute cannot have an existing parent.
void append(Node node) [private]: Append a node to this one. The given node cannot have an existing parent.
void prepend(Node node) [private]: Prepend a node to this one. The given node cannot have an existing parent.
Node set(T[] prefix, T[] local) [private]: Configure node values
Node create(XmlNodeType type, T[] value) [private]: Creates and returns a child Element node
Node remove() [private]: Detach this node from its parent and siblings
Node patch(T[] text) [private]: Patch the serialization text, causing DocPrinter to ignore the subtree of this node, and instead emit the provided text as raw XML output.
        Warning: this function does *not* copy the provided text, and may be removed from future revisions
Node mutate() [private]: Purge serialization cache for this node and its ancestors
Node dup() [private]: Duplicate a single node
Node clone() [private]: Duplicate a subtree
void migrate(Document host) [private]: Reset the document host for this subtree
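The navigation members can be combined as in the following sketch, which walks a set of siblings through child() and next() and then detaches one of them. The element names are placeholders; only public members from the list above are used.

```d
import tango.io.Stdout;
import tango.text.xml.Document;

void main ()
{
        auto doc  = new Document!(char);
        auto root = doc.tree.element (null, "root");
        root.element (null, "a");
        root.element (null, "b");
        root.element (null, "c");

        // walk the children via the child/next links
        for (auto n = root.child(); n; n = n.next())
             Stdout.formatln ("{}: {}", n.position(), n.name());

        // detach the middle child; the remaining siblings stay linked
        auto b = root.child().next();
        b.detach();

        for (auto n = root.child(); n; n = n.next())
             Stdout.formatln ("{}: {}", n.position(), n.name());
}
```

Since position() counts prior siblings on every call, caching the index is preferable when iterating over long sibling lists.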

class XmlPath(T)

XPath support

Provides support for common XPath axis and filtering functions, via a native-D interface instead of typical interpreted notation.

The general idea here is to generate a NodeSet consisting of those tree-nodes which satisfy a filtering function. The direction, or axis, of tree traversal is governed by one of several predefined operations. All methods facilitate call-chaining, where each step returns a new NodeSet instance to be operated upon.

The nodes themselves are collected in a freelist, avoiding heap-activity and making good use of D array-slicing facilities.

XPath examples

auto doc = new Document!(char);

// attach an element with some attributes, plus 
// a child element with an attached data value
doc.tree.element   (null, "element")
        .attribute (null, "attrib1", "value")
        .attribute (null, "attrib2")
        .element   (null, "child", "value");

// select named-elements
auto set = doc.query["element"]["child"];

// select all attributes named "attrib1"
set = doc.query.descendant.attribute("attrib1");

// select elements with one parent and a matching text value
set = doc.query[].filter((doc.Node n) {return n.children.hasValue("value");});

Note that path queries are transient - they do not retain content across multiple queries. That is, the lifetime of a query result is limited unless you explicitly copy it. For example, this will fail to operate as one might expect:

auto elements = doc.query["element"];
auto children = elements["child"];

The above will lose elements, because the associated document reuses node space for subsequent queries. In order to retain results, do this:

auto elements = doc.query["element"].dup;
auto children = elements["child"];

The above .dup is generally very small (a set of pointers only). On the other hand, recursive queries are fully supported:

set = doc.query[].filter((doc.Node n) {return n.query[].count > 1;});

Typical usage tends to exhibit the following pattern, where each query result is processed before another is initiated:

foreach (node; doc.query.child("element"))
        {
        // do something with each node
        }

Supported axes include:

.child                  immediate children
.parent                 immediate parent 
.next                   following siblings
.prev                   prior siblings
.ancestor               all parents
.descendant             all descendants
.data                   text children
.cdata                  cdata children
.attribute              attribute children

Each of the above accepts an optional string, which is used in an axis-specific way to filter nodes. For instance, .child("food") will select only the child elements named "food". These variants are shortcuts to using a filter to post-process a result. Each of the above also has a variant which accepts a delegate instead.

In general, you traverse an axis and operate upon the results. The operation applied may be another axis traversal, or a filtering step. All steps can be, and generally should be, chained together. Filters are implemented via a delegate mechanism:

.filter (bool delegate(Node))

Where the delegate returns true if the node passes the filter. An example might be selecting all nodes with a specific attribute:

auto set = doc.query.descendant.filter (
            (doc.Node n){return n.attributes.hasName (null, "test");}
           );

Obviously this is not as clean and tidy as true XPath notation, but that can be wrapped atop this API instead. The benefit here is one of raw throughput - important for some applications.

Note that every operation returns a discrete result. Methods first() and last() also return a set of one or zero elements. Some language-specific extensions are provided too:

.child() can be substituted with [] notation instead

[] notation can be used to index a specific element, like .nth()

the .nodes attribute exposes an underlying Node[], which may be sliced or traversed in the usual D manner

Other (query result) utility methods include

.dup
.first
.last
.opIndex
.nth
.count
.opApply
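The utility methods listed above might be exercised as in this sketch. The sample document is invented for the example, and the result is duplicated first so that it survives the follow-up queries.

```d
import tango.io.Stdout;
import tango.text.xml.Document;

void main ()
{
        auto doc = new Document!(char);
        doc.parse ("<list><item n=\"1\"/><item n=\"2\"/><item n=\"3\"/></list>");

        // .dup keeps the result alive across the queries below
        auto items = doc.query["list"]["item"].dup();
        Stdout.formatln ("{} items", items.count());

        // first, last and nth each yield a set of zero or one nodes
        foreach (node; items.first())
                 Stdout.formatln ("first: n={}", node.attributes.name(null, "n").value());

        foreach (node; items.last())
                 Stdout.formatln ("last: n={}", node.attributes.name(null, "n").value());

        foreach (node; items.nth(1))
                 Stdout.formatln ("nth(1): n={}", node.attributes.name(null, "n").value());

        // .nodes is a plain Node[], so ordinary slicing applies
        foreach (node; items.nodes[1 .. $])
                 Stdout.formatln ("tail: n={}", node.attributes.name(null, "n").value());
}
```

Each derived set is consumed immediately in its own foreach, in keeping with the advice to process one query result before starting another.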

XmlPath itself needs to be a class in order to avoid forward-ref issues.

alias Document!(T) Doc [public]

the typed document

alias Doc.Node Node [public]

generic document node

NodeSet start(Node root) [final]

Prime a query

Returns a NodeSet containing just the given node, which can then be used to cascade results into subsequent NodeSet instances.

struct NodeSet

This is the meat of XPath support. All of the NodeSet operators exist here, in order to enable call-chaining.

Note that some of the axes do double-duty as filters also. This is just a convenience factor, and doesn't change the underlying mechanisms.

Node[] nodes [public]: array of selected nodes
NodeSet dup(): Return a duplicate NodeSet
uint count(): Return the number of selected nodes in the set
NodeSet first(): Return a set containing just the first node of the current set
NodeSet last(): Return a set containing just the last node of the current set
NodeSet opIndex(uint i): Return a set containing just the nth node of the current set
NodeSet nth(uint index): Return a set containing just the nth node of the current set
NodeSet opSlice(): Return a set containing all child elements of the nodes within this set
NodeSet opIndex(T[] name): Return a set containing all child elements of the nodes within this set, which match the given name
NodeSet parent(T[] name = null): Return a set containing all parent elements of the nodes within this set, which match the optional name
NodeSet data(T[] value = null): Return a set containing all data nodes of the nodes within this set, which match the optional value
NodeSet cdata(T[] value = null): Return a set containing all cdata nodes of the nodes within this set, which match the optional value
NodeSet attribute(T[] name = null): Return a set containing all attributes of the nodes within this set, which match the optional name
NodeSet descendant(T[] name = null): Return a set containing all descendant elements of the nodes within this set, which match the optional name
NodeSet child(T[] name = null): Return a set containing all child elements of the nodes within this set, which match the optional name
NodeSet ancestor(T[] name = null): Return a set containing all ancestor elements of the nodes within this set, which match the optional name
NodeSet prev(T[] name = null): Return a set containing all prior sibling elements of the nodes within this set, which match the optional name
NodeSet next(T[] name = null): Return a set containing all subsequent sibling elements of the nodes within this set, which match the optional name
NodeSet filter(bool delegate(Node) filter): Return a set containing all nodes within this set which pass the filtering test
NodeSet child(bool delegate(Node) filter, XmlNodeType type = XmlNodeType.Element): Return a set containing all child nodes of the nodes within this set which pass the filtering test
NodeSet attribute(bool delegate(Node) filter): Return a set containing all attribute nodes of the nodes within this set which pass the given filtering test
NodeSet descendant(bool delegate(Node) filter, XmlNodeType type = XmlNodeType.Element): Return a set containing all descendant nodes of the nodes within this set, which pass the given filtering test
NodeSet parent(bool delegate(Node) filter): Return a set containing all parent nodes of the nodes within this set which pass the given filtering test
NodeSet ancestor(bool delegate(Node) filter): Return a set containing all ancestor nodes of the nodes within this set, which pass the given filtering test
NodeSet next(bool delegate(Node) filter, XmlNodeType type = XmlNodeType.Element): Return a set containing all following siblings of the ones within this set, which pass the given filtering test
NodeSet prev(bool delegate(Node) filter, XmlNodeType type = XmlNodeType.Element): Return a set containing all prior sibling nodes of the ones within this set, which pass the given filtering test
int opApply(int delegate(ref Node) dg): Traverse the nodes of this set
bool always(Node node) [private]: Common predicate
NodeSet assign(uint mark) [private]: Assign a slice of the freelist to this NodeSet
void test(bool delegate(Node) filter, Node node) [private]: Execute a filter on the given node. We have to deal with potential query recursion, so the current state is saved beforehand and restored afterwards
bool has(Node p) [private]: We typically need to filter ancestors in order to avoid duplicates, so this is used for those purposes

uint mark() [private]

Return the current freelist index

uint push() [private]

Recurse and save the current state

void pop(uint prior) [private]

Restore prior state

Node[] slice(uint mark) [private]

Return a slice of the freelist

uint allocate(Node node) [private]

Allocate an entry in the freelist, expanding as necessary
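The private helpers above (mark, slice, allocate) suggest why query results are short-lived: a result is a slice over shared storage that later queries may overwrite. The sketch below is a simplified stand-in written for illustration, not the library's implementation. The Pool type, its reset method and the use of ints in place of nodes are all assumptions of the example.

```d
// Hypothetical stand-in for a freelist of query results; ints take the
// place of nodes. This is not the code used inside XmlPath.
struct Pool
{
        int[] store;
        uint  index;

        // current freelist position
        uint mark () { return index; }

        // append an entry, growing the backing array when it is full
        void allocate (int value)
        {
                if (index >= store.length)
                    store.length = store.length * 2 + 16;
                store[index++] = value;
        }

        // everything allocated since the given position, viewed in place
        int[] slice (uint from) { return store[from .. index]; }

        // rewind, so that later allocations overwrite earlier ones
        void reset () { index = 0; }
}

void main ()
{
        Pool pool;

        auto m = pool.mark();
        pool.allocate (1);
        pool.allocate (2);

        auto first = pool.slice (m);    // a view onto the shared storage
        auto kept  = first.dup;         // a private copy of two entries

        pool.reset();
        pool.allocate (9);              // lands in the slot behind first[0]

        assert (first[0] == 9);         // the view has been overwritten
        assert (kept[0]  == 1);         // the copy is unaffected
}
```

This mirrors the advice to call .dup on any NodeSet that must outlive the next query: the copy is only as large as the set of pointers it holds.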

interface IXmlPrinter(T)

Specification for an XML serializer

alias Document!(T) Doc [public]: the typed document
alias Doc.Node Node [public]: generic document node
alias print opCall [public]: alias for print method
T[] print(Doc doc): Generate a text representation of the document tree
void print(Node root, void delegate(T[][]...) emit): Generate a representation of the given node-subtree
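Both print overloads might be used as in the sketch below. It assumes that DocPrinter, the serializer used in the Document examples, implements this interface for the same character type; the emit function is a name chosen for the example.

```d
import tango.io.Stdout;
import tango.text.xml.Document;
import tango.text.xml.DocPrinter;

void main ()
{
        auto doc = new Document!(char);
        doc.header();
        doc.tree.element (null, "greeting", "hello");

        // assumed to satisfy IXmlPrinter!(char)
        auto print = new DocPrinter!(char);

        // whole document as text; print(doc) is the opCall shorthand
        char[] text = print.print (doc);
        Stdout (text).newline;

        // a single subtree, handed piecewise to a sink of our choosing
        void emit (char[][] parts...)
        {
                foreach (part; parts)
                         Stdout (part);
        }
        print.print (doc.elements, &emit);
        Stdout.newline;
}
```

The delegate form avoids building one large string, which may matter when a subtree is written straight to a file or socket.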