Lutger
Joined: 25 May 2006 Posts: 91
|
Posted: Sun Oct 08, 2006 11:59 am Post subject: xml.sax status? |
|
|
Hi, I'm sorry if I have overlooked the answer somewhere, but what is the status of mango.xml.sax? It's not in the release download, but so far it seems to work fine. Just wondering if it's okay to use from svn or if there are known issues / missing stuff. I'd like to use it instead of tinyxml (it's easier and faster).
btw, from what I've used so far, mango is so much easier than I thought. It's working great for me, thank you. |
teqdruid
Joined: 11 May 2004 Posts: 390 Location: UMD
|
Posted: Mon Oct 09, 2006 9:06 am Post subject: Re: xml.sax status? |
|
|
Lutger wrote: | Hi, I'm sorry if I have overlooked the answer somewhere, but what is the status of mango.xml.sax? It's not in the release download, but so far it seems to work fine. Just wondering if it's okay to use from svn or if there are known issues / missing stuff. I'd like to use it instead of tinyxml (it's easier and faster).
btw, from what I've used so far, mango is so much easier than I thought. It's working great for me, thank you. |
It's probably ready for use. It's not in the release version for a few reasons:
-It relies on mango.containers, which isn't quite ready for release
-There hasn't been a release since I got it working OK
-I haven't tested it quite as much as I'd like to (although I've done a fair amount)
-It still needs more optimization
-The parser is a bit rough around the edges yet
-It hasn't been benchmarked
I'd appreciate any and all feedback on the SAX interface and the parser. If you find any bugs, please file them in the Mango project trac. I'd also like to know how you find it in terms of speed. I designed it to have very low heap usage, so it should run pretty quickly (compile with -inline -O), but I haven't yet figured out how to avoid the vtable lookups it's doing; eliminating those should increase the speed a lot.
Glad to see someone's interested.
~John |
Lutger
Joined: 25 May 2006 Posts: 91
|
Posted: Mon Oct 09, 2006 11:26 am Post subject: |
|
|
Awesome. I'm glad you've made this thing. I only hacked up some loading code to parse xml files, but I'll give feedback when I have some.
In terms of performance, I have not done any really valid tests, but found that my hacked-up code is at least 7 to 8 times as fast as the tinyxml code I had. I suspect the gap will be larger for bigger files, where tinyxml really stalls. More importantly, it took me only an hour or so to understand the API and write significantly cleaner code than the tinyxml version, which took me 2 hours or so, and I was already familiar with tinyxml. I like this sax thing, fast and simple.
When converting mango's String to char[], following this note in mango.text.string affected performance by about 10%:
Quote: | Convert to the AbstractString types. The optional argument
dst will be resized as required to house the conversion.
To minimize heap allocation, use the following pattern:
String string;
wchar[] buffer;
wchar[] result = string.toUtf16 (buffer);
if (result.length > buffer.length)
buffer = result; |
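A self-contained sketch of the buffer-reuse pattern from the quoted note, written in present-day D rather than against Mango. `toUtf16Into` is a hypothetical stand-in, not Mango's `String.toUtf16`; it is only assumed to behave the way the note describes (write into the supplied buffer, grow it when the result does not fit).

```d
import std.stdio : writeln;
import std.utf : encode;

// Hypothetical stand-in (NOT Mango's API): converts UTF-8 to UTF-16,
// writing into dst and growing it only when the result does not fit.
wchar[] toUtf16Into(const(char)[] src, wchar[] dst)
{
    size_t n = 0;
    foreach (dchar c; src)               // foreach decodes UTF-8 code points
    {
        wchar[2] tmp;
        immutable len = encode(tmp, c);  // 1 or 2 UTF-16 code units
        if (n + len > dst.length)
            dst.length = (n + len) * 2;  // grow; this may reallocate
        dst[n .. n + len] = tmp[0 .. len];
        n += len;
    }
    return dst[0 .. n];
}

void main()
{
    wchar[] buffer;                      // reused across conversions
    foreach (s; ["id", "name", "description"])
    {
        wchar[] result = toUtf16Into(s, buffer);
        if (result.length > buffer.length)
            buffer = result;             // keep the larger array for next time
        writeln(result);
    }
}
```

The conversion only touches the heap when a string is longer than anything seen so far; shorter strings are written into the array that is already there.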
teqdruid
Joined: 11 May 2004 Posts: 390 Location: UMD
|
Posted: Mon Oct 09, 2006 11:52 am Post subject: |
|
|
Lutger wrote: | In terms of performance, I have not done any really valid tests, but found that my hacked-up code is at least 7 to 8 times as fast as the tinyxml code I had. I suspect the gap will be larger for bigger files, where tinyxml really stalls. More importantly, it took me only an hour or so to understand the API and write significantly cleaner code than the tinyxml version, which took me 2 hours or so, and I was already familiar with tinyxml. I like this sax thing, fast and simple. |
I'm not familiar with TinyXML; is it DOM style? It looks like C++; were you writing in C++, or using some D bindings?
Quote: | When converting mango's String to char[], following this note in mango.text.string affected performance by about 10%:
Quote: | Convert to the AbstractString types. The optional argument
dst will be resized as required to house the conversion.
To minimize heap allocation, use the following pattern:
String string;
wchar[] buffer;
wchar[] result = string.toUtf16 (buffer);
if (result.length > buffer.length)
buffer = result; |
|
You mean this helped speed up the code that gets the data out of the String class? Are you doing a UTF conversion? You shouldn't have to do any UTF conversion yourself; you can use the SAX template directly if you want to use anything other than char. If you're just copying the string, you might also look at the copy method.
In order to reduce heap allocations, the teqXML parser uses one buffer and moves the data around in that buffer. As such, when strings are delivered to the client, the memory references are only good during that function call, after which the memory might get shifted (the parser owns the memory). I had also considered never moving memory and allocating more when more space was needed (abandoning references no longer in use for the GC to handle); with this technique I could give ownership of the strings to the client. I decided that it would be better to minimize heap allocations and make the client code do any necessary heap allocation; I think it is more flexible this way. After using it, do you agree with this decision?
~John |
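To make the lifetime rule concrete, here is a minimal sketch in present-day D. It does not use Mango or the real SAX interface: `TinyParser`, `parse`, and `onText` are hypothetical names, and the only thing assumed from the description is that the parser reuses one buffer and hands the callback a slice into it.

```d
import std.stdio : writeln;

// Hypothetical stand-in for a parser that reuses one internal buffer.
// It is NOT the teqXML/Mango API; it only mimics the ownership rule.
struct TinyParser
{
    private char[] buffer;

    // Calls onText once per chunk. The slice handed to the callback points
    // into `buffer`, which is overwritten when the next chunk is processed.
    void parse(string[] chunks,
               scope void delegate(const(char)[] text) onText)
    {
        foreach (chunk; chunks)
        {
            if (buffer.length < chunk.length)
                buffer.length = chunk.length;
            buffer[0 .. chunk.length] = chunk[];
            onText(buffer[0 .. chunk.length]);
        }
    }
}

void main()
{
    TinyParser parser;
    const(char)[][] borrowed;   // references into the parser's buffer
    string[] owned;             // private copies made by the client

    parser.parse(["alpha", "bravo"], (const(char)[] text) {
        borrowed ~= text;       // unsafe to keep: the memory is reused
        owned ~= text.idup;     // safe: the client pays for one copy
    });

    writeln(borrowed);          // entries alias the same reused buffer
    writeln(owned);             // entries keep the text seen in each call
}
```

Client code that needs a string beyond the callback copies it (`.dup` or `.idup`); code that only inspects the text during the call pays for no allocation at all, which is the flexibility argued for above.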
Lutger
Joined: 25 May 2006 Posts: 91
|
Posted: Mon Oct 09, 2006 5:32 pm Post subject: |
|
|
teqdruid wrote: |
I'm not familiar with TinyXML; is it DOM style? It looks like C++; were you writing in C++, or using some D bindings? |
It's DOM style, a port from C++, hosted under TinyXPath here at dsource.
Quote: | Quote: | When converting mango's String to char[], following this note in mango.text.string affected performance by about 10%:
<snip> |
You mean this helped speed up the code that gets the data out of the String class? Are you doing a UTF conversion? You shouldn't have to do any UTF conversion yourself; you can use the SAX template directly if you want to use anything other than char. If you're just copying the string, you might also look at the copy method. |
Hmm yes, I missed that one; a copy is what I need. It doesn't have a slice, does it?
Quote: | In order to reduce heap allocations, the teqXML parser uses one buffer and moves the data around in that buffer. As such, when strings are delivered to the client, the memory references are only good during that function call, after which the memory might get shifted (the parser owns the memory). I had also considered never moving memory and allocating more when more space was needed (abandoning references no longer in use for the GC to handle); with this technique I could give ownership of the strings to the client. I decided that it would be better to minimize heap allocations and make the client code do any necessary heap allocation; I think it is more flexible this way. After using it, do you agree with this decision?
~John |
I agree. I had one initial bug due to mistakenly relying on ownership, but quickly discovered the error. As long as it is documented, this is the right way imo. Just because there is no const and we have garbage collection doesn't mean D libraries should prevent users from shooting themselves in the foot at all costs. |