manni
Joined: 16 Jan 2006 Posts: 25
Posted: Wed Mar 01, 2006 7:03 am Post subject: Error: ArrayBoundsError teqXML(880)
Hello,
I am trying to parse this small XML file:
Code: |
<?xml version="1.0" encoding="iso-8859-1" ?>
<PSI>
  <FORMAT>
    <SI_BEL>
      <SI>
        <VER_ZUS>
          <GES ABSCHNITT="5">Gesamt</GES>
          <GES ABSCHNITT="4">GESAMT(ohne MwSt.)</GES>
          <GES ABSCHNITT="3">Verbindungen (ohne MwSt.)</GES>
          <GRP ID="225">
            <SPA ID="1" TYP="DATUM">Datum</SPA>
            <SPA ID="2" TYP="UHRZEIT">Uhrzeit</SPA>
            <SPA ID="6" TYP="ANZAHL">Anzahl</SPA>
            <SPA ID="7" TYP="BETRAG">Betrag</SPA>
            <SPA ID="15" TYP="URSPRUNG">Ursprung</SPA>
          </GRP>
          <GRP ID="330">
            <SPA ID="1" TYP="DATUM">Datum</SPA>
            <SPA ID="2" TYP="UHRZEIT">Uhrzeit</SPA>
            <SPA ID="15" TYP="URSPRUNG">Ursprung</SPA>
            <SPA ID="6" TYP="DAUER">Dauer</SPA>
            <SPA ID="7" TYP="BETRAG">Betrag</SPA>
            <SPA ID="16" TYP="DATENVOL">Datenvolumen</SPA>
          </GRP>
          <GRP ID="223">
            <SPA ID="16" TYP="DATENVOL">Datenvolumen</SPA>
            <SPA ID="7" TYP="BETRAG">Betrag</SPA>
          </GRP>
        </VER_ZUS>
      </SI>
    </SI_BEL>
  </FORMAT>
</PSI>
|
with this program:
Code: |
module mango.test.sax;

private import mango.xml.sax.DefaultSAXHandler,
               mango.xml.sax.model.ISAXParser,
               mango.xml.sax.model.ISAXHandler,
               mango.xml.sax.parser.teqXML;
private import mango.io.Stdout,
               mango.io.FileConduit,
               mango.io.Buffer;
private import mango.text.model.UniString,
               mango.text.String;
private alias StringT!(char) Utf8String;
private import mango.convert.Type;

void main()
{
    readerTest1();
}

/**
  Just outputs the data to the console.
*/
private class MyOutputHandler: DefaultSAXHandler!(char) {
    private int tabs = 0;

    this() {
    }
}

void readerTest1() {
    ISAXReader!() reader = new TeqXMLReader!()(512);
    //FileConduit file = new FileConduit("SR0050531145120A.xml", FileStyle.ReadExisting);
    FileConduit file = new FileConduit("short.xml", FileStyle.ReadExisting);
    MyOutputHandler handler = new MyOutputHandler();
    reader.parse(file, handler);
}
|
I get this error message:
Error: ArrayBoundsError teqXML(880)
Does anyone have an idea what is going wrong?
My system is Debian Linux (testing).
manni
brad Site Admin
Joined: 22 Feb 2004 Posts: 490 Location: Atlanta, GA USA
manni
Joined: 16 Jan 2006 Posts: 25
Posted: Thu Mar 02, 2006 3:21 am
Hello,
I compiled the program with:
build -O -release -cleanup sax1.d
and now it runs fine.
My 600 MB XML file is parsed in 1 minute.
Nice, nice.
The next step is to build a CSV file from the XML file.
manni
teqdruid
Joined: 11 May 2004 Posts: 390 Location: UMD
Posted: Thu Mar 02, 2006 11:17 am
Sorry, I didn't see this until now. I'm glad you got it working. There are still a few bugs I'm trying to iron out, but it's nearing completion.
I haven't run any time trials yet. Is the performance pretty good? I guess 10 MB/second (600 MB in about a minute) sounds OK. Do you happen to know how other parsers stack up?
BTW, there will be further performance enhancements in the future; I just haven't gotten to all of them yet.
~John Demme
manni
Joined: 16 Jan 2006 Posts: 25
Posted: Fri Mar 03, 2006 1:43 am
Hello,
I tested it with Perl:
use XML::Parser::PerlSAX;
real 0m30.035s
user 0m24.175s
sys 0m1.869s
In D with new TeqXMLReader!()(512):
real 0m46.713s
user 0m42.195s
sys 0m1.829s
In D with new TeqXMLReader!()(1024):
real 0m48.861s
user 0m43.040s
sys 0m1.603s
In D with new TeqXMLReader!()(2048):
real 0m47.874s
user 0m43.224s
sys 0m1.390s
manni
teqdruid
Joined: 11 May 2004 Posts: 390 Location: UMD
Posted: Fri Mar 03, 2006 2:32 pm
That's not quite the speed I was hoping for... Actually, it's performing better on my system. Try compiling with the -release and -inline flags. I was getting similar results until I used them, but with them the parser seems to be much, much faster. That's actually not too surprising considering that the parser makes a lot of calls to small methods, so it would benefit a lot from inlining. There's also a rather large amount of debug code in there, such as array bounds checking and asserts, which is removed with the -release option.
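For readers unfamiliar with what the bounds check does, here is a minimal sketch. It assumes a current D compiler and runtime, where the error type is core.exception.RangeError (the 2006 runtime reported it as ArrayBoundsError, as in the first post). readPastEnd is a hypothetical helper for illustration and is not part of teqXML.

```d
import core.exception : RangeError;
import std.stdio : writeln;

// Hypothetical helper: indexes one element past the end of the slice.
int readPastEnd(int[] data)
{
    return data[data.length]; // out-of-bounds index
}

void main()
{
    int[] data = [1, 2, 3];
    try
    {
        writeln(readPastEnd(data));
    }
    catch (RangeError e)
    {
        // Reached only when bounds checks are compiled in.
        writeln("bounds check fired: ", e.msg);
    }
}
```

In a default build the catch branch should run. With -release the check is typically compiled out of ordinary (non-@safe) code and the read becomes undefined behavior, so a clean run of a -release build does not by itself show that an out-of-bounds access has gone away.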
Could you also email me (me@teqdruid.com) the Perl code that you're using to test? Also, where does the large XML file come from? I wrote a quick app to generate a large XML file, but that file isn't exactly representative of typical XML files.
Thanks,
John
manni
Joined: 16 Jan 2006 Posts: 25
Posted: Tue Mar 07, 2006 1:05 am
The Perl program:
Code: |
#!/usr/bin/env perl
use XML::Parser::PerlSAX;

my $file = 'bigfile.xml';
# The handler defines no callbacks, so the run measures parsing only.
my $handler = CamelHandler->new();
my $parser = XML::Parser::PerlSAX->new(Handler => $handler);
my $text;
$parser->parse(Source => { SystemId => $file });

package CamelHandler;
use strict;

# Minimal constructor: bless an empty hash into the handler class.
sub new {
    my $type = shift;
    return bless {}, $type;
}
|
I think the Perl module XML::Parser::PerlSAX is backed by C code: it uses Expat, which may be the reason why Perl is so fast.
The file is from a telephone company. I believe they wrote the XML file straight from the database.
Manfred
teqdruid
Joined: 11 May 2004 Posts: 390 Location: UMD
Posted: Wed Mar 15, 2006 9:28 am
With your perl code, I have the following results:
Quote: |
teqdruid@teqdruid ~/workspace/mango/mango/test $ time ./perlXmlRead.pl
real 0m17.514s
user 0m13.749s
sys 0m0.780s
teqdruid@teqdruid ~/workspace/mango/mango/test $ time ./timedXmlRead big.xml
Total time: 17897
real 0m18.355s
user 0m15.409s
sys 0m1.272s
|
The "Total time:" line is the time in milliseconds that my test program measures itself; this way I'm not counting the time the program takes to start up or shut down. Similar timing code in the Perl app would give the best comparison.
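In-process measurement of this kind can be sketched as follows. This is not the actual timedXmlRead source; it assumes a current D compiler with Phobos' std.datetime.stopwatch, and doWork is a hypothetical stand-in for the parse call.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

// Hypothetical stand-in for the real work (e.g. parsing a file).
ulong doWork()
{
    ulong sum = 0;
    foreach (i; 0 .. 10_000_000)
        sum += i;
    return sum;
}

void main()
{
    // Start timing only once the program is already running.
    auto sw = StopWatch(AutoStart.yes);
    auto result = doWork();
    sw.stop();

    writeln("Result: ", result);
    // Elapsed wall-clock time for the measured section, in milliseconds.
    writeln("Total time: ", sw.peek.total!"msecs");
}
```

Only the span between starting and stopping the stopwatch is reported, so process startup and teardown are excluded, unlike the figures printed by time.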
So teqXML is really close. What's interesting is that its sys time is so much larger (1.272s versus 0.780s for Perl). I wonder if this is just a matter of tuning Mango's IO stuff? I don't know anything about it, however. Or is this a measure of memory moves? Is the time the parser spends in malloc() or memmove() code counted here? If so, then I guess I should try to cut down on memory operations. I'll just have to throw the profiler at it soon.
~John