Designing the Benchmark

Generating and Parsing XML Documents for Benchmark Purposes

The SAXDOMIX framework is packaged together with the benchmark's code (see the com.devsphere.xml.benchmark package). The benchmark parses an XML table with the SAX API, with the DOM API, and with a mixture of the two. In a real-world application, any of these methods might be used to parse such a document, depending on the task. Because we only want to measure parsing, no processing is done: the information obtained from the XML table is simply ignored.

The MainBase class provides most of the functionality of the XML parsing benchmark. It is extended by Main1 and Main2, which add the code specific to SAX 1.0 with DOM Level 1 and to SAX 2.0 with DOM Level 2, respectively.

The benchmark measures the time and memory consumed during parsing. It then invokes Java's garbage collector (with System.gc()) and calculates the size of the used memory once more.
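The measurement described above can be sketched roughly as follows. This is not the code from MainBase: the class name ResourceProbe and its methods are hypothetical, and used memory is approximated as total heap minus free heap, which is only an assumption about how the benchmark computes it.

```java
import java.util.ArrayList;
import java.util.List;

public class ResourceProbe {
    /** Approximate heap usage in bytes: total heap minus free heap. */
    public static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Runs the task and returns three values:
     * [0] elapsed time in milliseconds,
     * [1] change in used memory right after the task,
     * [2] change in used memory after a garbage-collection request.
     */
    public static long[] measure(Runnable task) {
        System.gc();                        // try to start from a clean heap
        long memBefore = usedMemory();
        long start = System.nanoTime();
        task.run();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long memAfterRun = usedMemory();
        System.gc();                        // only a request; the JVM may ignore it
        long memAfterGc = usedMemory();
        return new long[] { elapsedMs, memAfterRun - memBefore, memAfterGc - memBefore };
    }

    public static void main(String[] args) {
        // Stand-in workload: allocate about 1 MB and keep it reachable.
        List<byte[]> retained = new ArrayList<>();
        long[] r = measure(() -> {
            for (int i = 0; i < 100; i++) {
                retained.add(new byte[10_000]);
            }
        });
        System.out.println("time (ms):            " + r[0]);
        System.out.println("memory delta:         " + r[1]);
        System.out.println("memory delta post-gc: " + r[2]);
    }
}
```

The second memory figure is the more meaningful one for comparing parsers, since it excludes temporary objects that were garbage by the time parsing ended. Both figures are approximate and can even be negative on a busy JVM.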


Generating the XML Documents

An XML table contains a specified number of records like this:

<?xml version='1.0' encoding='US-ASCII'?>
<database>
...
    <person id='012345'>
        <name>Name012345</name>
        <email>[email protected]</email>
        <phone>12345 012345</phone>
        <address city='City012345' state='45' zip='012345' country='345'>
            <line1>L i n e 1 012345 012345</line1>
            <line2>L i n e 2 012345 012345</line2>
        </address>
    </person>
...
</database>

For validation, the following internal DTD is included.

<!DOCTYPE database [
<!ELEMENT database (person*)>
<!ELEMENT person (name, email, phone, address)>
<!ATTLIST person id CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT address (line1, line2)>
<!ATTLIST address city CDATA #REQUIRED>
<!ATTLIST address state CDATA #REQUIRED>
<!ATTLIST address zip CDATA #REQUIRED>
<!ATTLIST address country CDATA #REQUIRED>
<!ELEMENT line1 (#PCDATA)>
<!ELEMENT line2 (#PCDATA)>
]>
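As a rough illustration of how such an internal DTD is used, the sketch below validates a one-record table with a JAXP SAX parser. The class name ValidatingParse, the sample values, and the placeholder email address are assumptions made for this example and are not taken from the benchmark's code.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ValidatingParse {
    /** The internal DTD subset shown above. */
    public static final String DTD =
          "<!DOCTYPE database [\n"
        + "<!ELEMENT database (person*)>\n"
        + "<!ELEMENT person (name, email, phone, address)>\n"
        + "<!ATTLIST person id CDATA #REQUIRED>\n"
        + "<!ELEMENT name (#PCDATA)>\n"
        + "<!ELEMENT email (#PCDATA)>\n"
        + "<!ELEMENT phone (#PCDATA)>\n"
        + "<!ELEMENT address (line1, line2)>\n"
        + "<!ATTLIST address city CDATA #REQUIRED>\n"
        + "<!ATTLIST address state CDATA #REQUIRED>\n"
        + "<!ATTLIST address zip CDATA #REQUIRED>\n"
        + "<!ATTLIST address country CDATA #REQUIRED>\n"
        + "<!ELEMENT line1 (#PCDATA)>\n"
        + "<!ELEMENT line2 (#PCDATA)>\n"
        + "]>\n";

    /** Returns true if the document is well-formed and valid against its DTD. */
    public static boolean isValid(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);              // turn on DTD validation
        SAXParser parser = factory.newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                // DefaultHandler ignores validation errors unless error() is overridden.
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String address = "<address city='City000001' state='01' zip='000001' country='001'>"
            + "<line1>Line 1</line1><line2>Line 2</line2></address>";
        // "user@example.com" is a placeholder value, not the benchmark's data.
        String valid = DTD + "<database><person id='000001'><name>Name000001</name>"
            + "<email>user@example.com</email><phone>00001 000001</phone>"
            + address + "</person></database>";
        // The phone element is missing here, so the content model is violated.
        String invalid = DTD + "<database><person id='000001'><name>Name000001</name>"
            + "<email>user@example.com</email>" + address + "</person></database>";
        System.out.println(isValid(valid));
        System.out.println(isValid(invalid));
    }
}
```

Overriding error() matters here: with the inherited no-op implementation, a validating parser would report the violation and the handler would silently discard it.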

When namespace support is enabled, the records look like this:

<?xml version='1.0' encoding='US-ASCII'?>
<benchmark:database xmlns:benchmark='http://devsphere.com/xml/benchmark'>
...
    <benchmark:person id='012345'>
        <benchmark:name>Name012345</benchmark:name>
        <benchmark:email>[email protected]</benchmark:email>
        <benchmark:phone>12345 012345</benchmark:phone>
        <benchmark:address city='City012345' state='45' zip='012345' country='345'>
            <benchmark:line1>L i n e 1 012345 012345</benchmark:line1>
            <benchmark:line2>L i n e 2 012345 012345</benchmark:line2>
        </benchmark:address>
    </benchmark:person>
...
</benchmark:database>
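To show what namespace support changes on the parsing side, here is a minimal sketch that counts elements by namespace URI and local name with a namespace-aware SAX 2.0 parser. The class NamespaceParse and its count method are hypothetical helpers written for this example; only the namespace URI comes from the listing above.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceParse {
    public static final String NS = "http://devsphere.com/xml/benchmark";

    /** Counts the elements that have the given namespace URI and local name. */
    public static int count(String xml, String uri, String localName) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);   // off by default in JAXP
        int[] counter = { 0 };
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String u, String l, String qName, Attributes atts) {
                    // Match on URI and local name; the prefix itself is irrelevant.
                    if (uri.equals(u) && localName.equals(l)) {
                        counter[0]++;
                    }
                }
            });
        return counter[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<benchmark:database xmlns:benchmark='" + NS + "'>"
            + "<benchmark:person id='000001'><benchmark:name>Name000001</benchmark:name>"
            + "</benchmark:person>"
            + "<benchmark:person id='000002'><benchmark:name>Name000002</benchmark:name>"
            + "</benchmark:person>"
            + "</benchmark:database>";
        System.out.println(count(xml, NS, "person"));   // two person records
    }
}
```

Because matching is done on the URI and the local name, the same code would count the records even if the document bound the namespace to a different prefix.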

Parsing the XML Documents

An XML table can be parsed in the main thread or in a specified number of concurrent threads. In either case, the file is read once beforehand so that the operating system can cache its contents in memory. The parser then reads the file through the Java I/O API as well, but because the contents are already cached, the hard disk isn't accessed during parsing.
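The pre-read and the concurrent parsing could look roughly like the sketch below. The class WarmAndParse, its method names, and the 64 KB buffer size are assumptions for illustration, not the benchmark's actual code. Each thread creates its own parser, since JAXP parser instances are not meant to be shared between threads.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class WarmAndParse {
    /** Reads the file once and discards the bytes so the OS can cache it. */
    public static long preRead(Path file) throws IOException {
        long total = 0;
        byte[] buffer = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;   // number of bytes read
    }

    /** Parses the file with a fresh SAX parser and a handler that does nothing. */
    public static void parseOnce(Path file) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            factory.newSAXParser().parse(in, new DefaultHandler());
        }
    }

    /** Parses the file in the given number of threads; returns how many succeeded. */
    public static int parseConcurrently(Path file, int threads) throws InterruptedException {
        AtomicInteger succeeded = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    parseOnce(file);
                    succeeded.incrementAndGet();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();   // wait for every parse to finish
        }
        return succeeded.get();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: java WarmAndParse <file.xml> <threads>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println("pre-read bytes: " + preRead(file));
        System.out.println("successful parses: "
            + parseConcurrently(file, Integer.parseInt(args[1])));
    }
}
```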

The events generated during SAX parsing are passed to a default handler, which does nothing. A DOM tree that contains the whole table isn't processed either, but it is added to a vector so that it does not become eligible for garbage collection. In a real-world application, the SAX events are processed on the fly, while the DOM tree is used only after parsing has finished.
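A minimal sketch of these two cases, using the standard JAXP entry points, might look like this. The class name IgnoreResults is hypothetical, and an ArrayList stands in for the vector mentioned above; any strongly referenced collection has the same effect on garbage collection.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class IgnoreResults {
    /** Strong references that keep every parsed DOM tree reachable. */
    public static final List<Document> retained = new ArrayList<>();

    /** SAX: all events go to a DefaultHandler, whose callbacks do nothing. */
    public static void parseSax(String xml) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
    }

    /** DOM: the tree is not inspected, only stored so that it cannot be collected. */
    public static Document parseDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        retained.add(doc);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<database><person id='000001'><name>Name000001</name></person></database>";
        parseSax(xml);
        parseDom(xml);
        System.out.println("retained DOM trees: " + retained.size());
    }
}
```

Keeping the tree reachable means that the memory figure taken after the garbage-collection request still includes the full DOM, which is what a real application holding the document would pay.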

When the SAX and DOM methods are mixed, half of the table's records are obtained as DOM sub-trees and the other half remain SAX events. No reference to any SAX event or DOM sub-tree is kept in memory, so these objects can be garbage-collected. A real-world application would process them on the fly while the XML document is being parsed.
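The idea of mixing the two methods can be illustrated with plain SAX and DOM calls. The sketch below does not use the SAXDOMIX API, whose classes are not shown in this article. Instead, a hypothetical handler named MixedParse builds a DOM sub-tree for every other person record and lets the remaining records pass by as ordinary SAX events. The alternating choice of records is an assumption made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MixedParse extends DefaultHandler {
    private final Document nodeFactory;  // used only to create nodes
    private Node current;                // non-null while a DOM sub-tree is being built
    private int seen;                    // person records encountered so far
    public int domRecords;               // records delivered as DOM sub-trees
    public int saxRecords;               // records left as plain SAX events

    public MixedParse() throws Exception {
        nodeFactory = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (current == null) {
            if (!"person".equals(qName)) {
                return;                  // outside a DOM record: ignore the event
            }
            if (seen++ % 2 != 0) {
                saxRecords++;            // every second record stays SAX-only
                return;
            }
        }
        Element element = nodeFactory.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            element.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        if (current != null) {
            current.appendChild(element);
        }
        current = element;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.appendChild(nodeFactory.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return;
        }
        Node parent = current.getParentNode();
        if (parent == null) {            // the person element itself has just closed
            domRecords++;
            handleSubTree((Element) current);
        }
        current = parent;                // becomes null again after a complete record
    }

    /** Hook for processing a finished record; empty here, so nothing is retained. */
    protected void handleSubTree(Element person) {
    }

    /** Parses the table and returns { DOM record count, SAX record count }. */
    public static int[] split(String xml) throws Exception {
        MixedParse handler = new MixedParse();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return new int[] { handler.domRecords, handler.saxRecords };
    }
}
```

Since the handler drops its reference to each sub-tree as soon as the record ends, memory use stays close to that of pure SAX parsing, regardless of the table's size.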


Copyright © 2000-2020 Devsphere
