The Linux Foundation

 
LibXml2

libXML2

Description

libxml2 is the XML C Parser and toolkit for GNOME, usable outside of the GNOME platform.

Refer to LSB-Futures Tracker for more information.

List of apps and libraries using this library

Evolution, Kino, Dia, GStreamer, Gnumeric, Inkscape, Anjuta, KOffice, etc. libxml2 is also a requirement for PHP5 and is included in other operating systems such as Solaris and Mac OS X.

API documentation

Reference manual documentation auto-generated from code is available at Reference Manual for libxml2.

Related test suites

Refer to the LSB-Futures Tracker.

Library analysis data

Most commonly used interfaces

About 256 of the more than 1200 interfaces were found to be commonly used in the 90+ popular desktop OSS applications we have analyzed. The interfaces are not listed here.

NOTE: Some commonly used interfaces are deprecated symbols.

Interfaces/Module excluded from LSB

libxml2 has 47 different sub-modules. Most of the modules are used by applications, except a few which are either deprecated or internal. The following modules should not be included in the LSB specification:

  • DOCBparser - deprecated
  • SAX - deprecated
  • schemasInternals - not stable enough to be included
  • XLink - unfinished and not stable
  • xmlunicode - low-level; should not be used externally but interface is public
  • chvalid - low-level; should not be used externally but interface is public
  • nanohttp - low-level; should not be used externally but interface is public
  • nanoftp - low-level; should not be used externally but interface is public

The following functions should not be included:

Module | Interfaces | Reason
HTMLparser | htmlNodeStatus | experimental
catalog | xmlCatalogGetSystem, xmlCatalogGetPublic | deprecated
entities | xmlCleanupPredefinedEntities, xmlCreateEntitiesTable, xmlInitializePredefinedEntities, xmlEncodeEntities | deprecated
parser | xmlGetFeature, xmlGetFeaturesList, xmlSetFeature | deprecated
parserInternals | xmlCheckLanguageID, xmlDecodeEntities, xmlHandleEntity, xmlNamespaceParseNCName, xmlNamespaceParseNSDef, xmlNamespaceParseQName, xmlParseNamespace, xmlParseQuotedString, xmlParserHandleReference, xmlScanName | deprecated
parserInternals | __xmlErrEncoding, xmlErrMemory | internal
tree | xmlDOMWrapRemoveNode, xmlDOMWrapReconcileNamespaces, xmlDOMWrapAdoptNode | experimental
tree | xmlNewGlobalNs | deprecated
valid | xmlCopyElementContent, xmlFreeElementContent, xmlNewElementContent, xmlSprintfElementContent | deprecated
xmlerror | __xmlRaiseError, __xmlSimpleError | internal
xmlschemas | xmlSchemaValidError | deprecated
xmlschemastypes | xmlSchemaGetPredefinedType, xmlSchemaValidatePredefinedType, xmlSchemaValidateFacet, xmlSchemaValidateFacetWhtsp, xmlSchemaNewFacet, xmlSchemaCheckFacet, xmlSchemaFreeFacet, xmlSchemaGetBuiltInListSimpleTypeItemType, xmlSchemaValidateListSimpleTypeFacet, xmlSchemaIsBuiltInTypeFacet, xmlSchemaWhiteSpaceReplace, xmlSchemaGetFacetValueAsULong, xmlSchemaValidateLengthFacet, xmlSchemaValidateLengthFacetWhtsp, xmlSchemaValPredefTypeNodeNoNorm, xmlSchemaGetCanonValueWhtsp, xmlSchemaValueAppend, xmlSchemaValueGetNext, xmlSchemaValueGetAsString, xmlSchemaValueGetAsBoolean, xmlSchemaNewStringValue, xmlSchemaNewNOTATIONValue, xmlSchemaNewQNameValue, xmlSchemaCompareValuesWhtsp, xmlSchemaCopyValue | internal or deprecated per Kasimier Buchcik


Test Suite Analysis

Introduction

libxml2 has extensive test coverage divided into three categories and maintained upstream.

  • testapi: exercises the library public entry points
  • runtest: runs basic internal regression tests
  • runsuite: runs external regression tests (schema tests)

For LSB, the internal regression test (runtest) has the most value because it tests internal behavior by exercising the public interface exported by the library. The tests parse an input file and compare the generated output with a stored result.
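
As a rough illustration of this parse-and-compare approach, the following Python sketch parses an input document, serializes it again, and diffs the result against a stored expected output. It uses Python's standard xml.dom.minidom parser as a stand-in for libxml2, and the function name regression_diff is hypothetical; it is not how runtest itself is implemented.

```python
import difflib
from xml.dom import minidom


def regression_diff(input_xml, expected_output):
    """Parse input_xml, serialize it, and diff it against the stored result.

    Returns the unified diff as a list of lines; an empty list means the
    generated output matches the stored output exactly.
    """
    # Stand-in parser: minidom instead of libxml2 (assumption for this sketch).
    actual_output = minidom.parseString(input_xml).documentElement.toxml()
    return list(
        difflib.unified_diff(
            expected_output.splitlines(),
            actual_output.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )
```

An empty list means the test passes. Any change in serialization, even one that only affects whitespace, shows up as a difference, which is the kind of failure analyzed in the rest of this section.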

The parser implements various W3C specifications, and its behavior is subject to change.

The following subsections analyze the internal tests from various libxml2 versions run against different libxml2 versions.

Execute 2.6.20 test against 2.6.22 libxml2

Internal regression test cases

The 2.6.20 test suite run against the current version 2.6.22 of the library generates 56 errors (including 5 Schema test case failures) out of 2557 total. The reason for each failure is categorized below.

CDATA ERRORS

The following test failures are due to a CDATA handling change when saving a script or style element from an XHTML document. Data is only declared to be CDATA if '&' or '<' are found in the content of the script or style element.

## XML regression tests
: File ./test/xhtml1 generated an error
## XML regression tests on memory
: Result for ./test/xhtml1 failed
: File ./test/xhtml1 generated an error
## XML entity subst regression tests
: File ./test/xhtml1 generated an error

The following code is an example of the difference in output between 2.6.20 and 2.6.22.

<   <script type="text/javascript">
<   ... unescaped script content ...
<   </script>
---
>   <script type="text/javascript"><![CDATA[
>   ... unescaped script < content ...
>   ]]></script>
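
The rule described above (wrap script or style content in CDATA only when it contains '&' or '<') can be sketched as follows. This is an illustrative Python sketch, not libxml2's serializer; the function name wrap_script_content is hypothetical, and content that itself contains "]]>" is not handled.

```python
def wrap_script_content(content):
    """Wrap script/style content in a CDATA section only when needed.

    The content is declared CDATA only if it contains '&' or '<';
    otherwise it is returned unchanged.
    """
    if "&" in content or "<" in content:
        return "<![CDATA[" + content + "]]>"
    return content
```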

The following is a CDATA error from XML push regression tests.

## XML push regression tests
: Result for ./test/xhtml1 failed
: File ./test/xhtml1 generated an error

SAX parser

The following test cases are for the SAX parser. Versions before 2.6.22 did not generate a callback for an undeclared entity (XML_WAR_UNDECLARED_ENTITY) but 2.6.22 generates callbacks for undeclared entities.

## SAX1 callbacks regression tests
: Got a difference for ./test/ent2
: File ./test/ent2 generated an error
: Got a difference for ./test/ent7
: File ./test/ent7 generated an error
: Got a difference for ./test/xml2
: File ./test/xml2 generated an error

The following SAX2 case also uses startElementNS rather than startElement.

## SAX2 callbacks regression tests
: Got a difference for ./test/ent2
: File ./test/ent2 generated an error

The remaining two SAX2 cases only have the additional callback inserted (similar to the SAX1 cases).

: Got a difference for ./test/ent7
: File ./test/ent7 generated an error
: Got a difference for ./test/xml2
: File ./test/xml2 generated an error

An example of the added callback follows:

: SAX.error: Entity 'title' not defined
+SAX.reference(title)
: SAX.characters(
: This text is about XML, the, 31)
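
One way to isolate this kind of change is to compare the old and new callback traces and list only the lines that were inserted. The sketch below does this with Python's difflib; the function name added_callbacks is hypothetical, and the traces are abbreviated from the example above.

```python
import difflib


def added_callbacks(old_trace, new_trace):
    """Return the lines of new_trace that are pure insertions.

    Lines that were changed or deleted are not reported; only callbacks
    that were added between otherwise matching lines.
    """
    matcher = difflib.SequenceMatcher(a=old_trace, b=new_trace, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            added.extend(new_trace[j1:j2])
    return added


# Abbreviated traces based on the example above.
old_trace = ["SAX.error: Entity 'title' not defined", "SAX.characters("]
new_trace = [
    "SAX.error: Entity 'title' not defined",
    "SAX.reference(title)",
    "SAX.characters(",
]
print(added_callbacks(old_trace, new_trace))  # ['SAX.reference(title)']
```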

Minor whitespace formatting errors

The following test cases failed due to minor changes in whitespace formatting of parsed output. These cases will be transparent to the application.

## HTML regression tests
: Result for ./test/HTML/Down.html failed
: File ./test/HTML/Down.html generated an error
: Result for ./test/HTML/attrents.html failed
: File ./test/HTML/attrents.html generated an error
: Result for ./test/HTML/cf_128.html failed
: File ./test/HTML/cf_128.html generated an error
: Result for ./test/HTML/doc2.htm failed
: File ./test/HTML/doc2.htm generated an error

The following test case is a bug fix, adding closing and opening div tags.

: Result for ./test/HTML/doc3.htm failed
: File ./test/HTML/doc3.htm generated an error

The following are more whitespace changes:

: Result for ./test/HTML/fp40.htm failed
: File ./test/HTML/fp40.htm generated an error
: Result for ./test/HTML/liclose.html failed
: File ./test/HTML/liclose.html generated an error
: Result for ./test/HTML/pre.html failed
: Result for ./test/HTML/python.html failed
: File ./test/HTML/python.html generated an error
: Result for ./test/HTML/test2.html failed
: File ./test/HTML/test2.html generated an error
: Result for ./test/HTML/test3.html failed
: File ./test/HTML/test3.html generated an error
: Result for ./test/HTML/wired.html failed
: File ./test/HTML/wired.html generated an error

## Push HTML regression tests
: Result for ./test/HTML/Down.html failed
: File ./test/HTML/Down.html generated an error
: Result for ./test/HTML/attrents.html failed
: File ./test/HTML/attrents.html generated an error
: Result for ./test/HTML/cf_128.html failed
: File ./test/HTML/cf_128.html generated an error
: Result for ./test/HTML/doc2.htm failed
: File ./test/HTML/doc2.htm generated an error

The following test case is a bug fix, adding closing and opening div tags.

: Result for ./test/HTML/doc3.htm failed
: File ./test/HTML/doc3.htm generated an error

The following are more whitespace changes:

: Result for ./test/HTML/fp40.htm failed
: File ./test/HTML/fp40.htm generated an error
: Result for ./test/HTML/liclose.html failed
: File ./test/HTML/liclose.html generated an error
: Result for ./test/HTML/pre.html failed
: File ./test/HTML/pre.html generated an error
: Result for ./test/HTML/python.html failed
: File ./test/HTML/python.html generated an error
: Result for ./test/HTML/test2.html failed
: File ./test/HTML/test2.html generated an error
: Result for ./test/HTML/test3.html failed
: File ./test/HTML/test3.html generated an error
: Result for ./test/HTML/wired.html failed
: File ./test/HTML/wired.html generated an error

An example of the whitespace formatting difference between 2.6.20 and 2.6.22 follows.

diff result/HTML/cf_128.html ../libxml2-2.6.22/result/HTML/cf_128.html
4c4,6
< <body><table border="4"><tr>
---
> <body>
>
> <table border="4"><tr>
12c14,15
<    </tr></table></body>
---
>    </tr></table>
> </body>

diff result/HTML/Down.html ../libxml2-2.6.22/result/HTML/Down.html
6d5
< <p>
9d7
< </p>
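
To check whether a difference such as the cf_128.html one is whitespace-only, the two outputs can be compared after normalizing whitespace. The following is a minimal Python sketch with hypothetical helper names. It ignores the fact that whitespace is significant inside elements such as pre, so it is only a rough triage aid.

```python
import re


def normalize_markup_whitespace(markup):
    """Drop whitespace between tags and collapse remaining runs to one space."""
    without_gaps = re.sub(r">\s+<", "><", markup.strip())
    return re.sub(r"\s+", " ", without_gaps)


def equal_ignoring_whitespace(old_output, new_output):
    """True if both outputs are equal once whitespace is normalized."""
    return normalize_markup_whitespace(old_output) == normalize_markup_whitespace(
        new_output
    )
```

Under this check the cf_128.html difference above counts as whitespace-only, while the Down.html difference, which drops a p element, does not.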

HTML SAX errors

The following tests no longer generate the deprecated SAX.ignorableWhitespace() callback and now generate SAX.characters() instead.

## HTML SAX regression tests
: Got a difference for ./test/HTML/Down.html
: File ./test/HTML/Down.html generated an error
: Got a difference for ./test/HTML/attrents.html
: File ./test/HTML/attrents.html generated an error
: Got a difference for ./test/HTML/cf_128.html
: File ./test/HTML/cf_128.html generated an error
: Got a difference for ./test/HTML/doc2.htm
: File ./test/HTML/doc2.htm generated an error
: Got a difference for ./test/HTML/doc3.htm
: File ./test/HTML/doc3.htm generated an error
: Got a difference for ./test/HTML/fp40.htm
: File ./test/HTML/fp40.htm generated an error
: Got a difference for ./test/HTML/liclose.html
: File ./test/HTML/liclose.html generated an error
: Got a difference for ./test/HTML/pre.html
: File ./test/HTML/pre.html generated an error
: Got a difference for ./test/HTML/python.html
: File ./test/HTML/python.html generated an error
: Got a difference for ./test/HTML/reg1.html
: File ./test/HTML/reg1.html generated an error
: Got a difference for ./test/HTML/reg2.html
: File ./test/HTML/reg2.html generated an error
: Got a difference for ./test/HTML/reg3.html
: File ./test/HTML/reg3.html generated an error
: Got a difference for ./test/HTML/reg4.html
: File ./test/HTML/reg4.html generated an error
: Got a difference for ./test/HTML/script.html
: File ./test/HTML/script.html generated an error
: Got a difference for ./test/HTML/test2.html
: File ./test/HTML/test2.html generated an error
: Got a difference for ./test/HTML/test3.html
: File ./test/HTML/test3.html generated an error
: Got a difference for ./test/HTML/wired.html
: File ./test/HTML/wired.html generated an error

Examples of the SAX differences follow:

diff ../libxml2-2.6.20/result/HTML/script.html.sax ../libxml2-2.6.22/result/HTML/script.html.sax
14c14
< SAX.ignorableWhitespace(
---
> SAX.characters(
20c20
< SAX.ignorableWhitespace(
---
> SAX.characters(
24c24
< SAX.ignorableWhitespace(
---
> SAX.characters(

diff result/HTML/Down.html.sax ../libxml2-2.6.22/result/HTML/Down.html.sax
19c19
< SAX.ignorableWhitespace(
---
> SAX.characters(
24d23
< SAX.startElement(p)
27d25
< SAX.endElement(p)
31c29
< SAX.ignorableWhitespace(
---
> SAX.characters(
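
An application that handles both callbacks the same way should be largely insensitive to this change. The sketch below illustrates the idea using Python's standard xml.sax interface, whose ContentHandler has characters and ignorableWhitespace methods comparable to the SAX callbacks of the same names. The class and function names are hypothetical, and this is not libxml2 code.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collects text, treating ignorable whitespace like character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)

    def ignorableWhitespace(self, whitespace):
        # Delegate so that it does not matter which callback the parser uses.
        self.characters(whitespace)


def collect_text(xml_bytes):
    """Parse xml_bytes and return all text content as a single string."""
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return "".join(handler.chunks)
```

With this design the collected text is the same whether whitespace arrives through ignorableWhitespace or through characters.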

Schema regression tests

These test cases exercise the schema validation part of libxml2: each one validates an instance document against a given schema definition. The schema regression test cases listed below failed (though they are not counted in the reporting) because of minor changes in the formatting of the error output.

## Schemas regression tests
: Error for ./test/schemas/any3_0.xml on ./test/schemas/any3_0.xsd failed
: Error for ./test/schemas/bug303566_1.xml on ./test/schemas/bug303566_1.xsd failed
: Error for ./test/schemas/changelog093_0.xml on ./test/schemas/changelog093_1.xsd failed
: Result for ./test/schemas/derivation-ok-extension_0.xml on ./test/schemas/derivation-ok-extension_0.xsd failed
: Error for ./test/schemas/derivation-ok-extension_0.xml on ./test/schemas/derivation-ok-extension_0.xsd failed

A few of the differences in error output formatting are listed below:

diff result/schemas/changelog093_1_0.err ../libxml2-2.6.22/result/schemas/changelog093_1_0.err
1c1
< ./test/schemas/changelog093_0.xml:7: element description: Schemas validity error : Element '{http://www.blackperl.com/XML/ChangeLog}description': Duplicate key-sequence ['PL'].
---
> ./test/schemas/changelog093_0.xml:7: element description: Schemas validity error : Element '{http://www.blackperl.com/XML/ChangeLog}description': Duplicate key-sequence ['PL'] in unique identity-constraint '{http://www.blackperl.com/XML/ChangeLog}changelogDescriptionLangConstraint'.

diff result/schemas/derivation-ok-extension_0_0.err ../libxml2-2.6.22/result/schemas/derivation-ok-extension_0_0.err
1c1
< ./test/schemas/derivation-ok-extension_0.xsd:10: element attribute: Schemas parser error : local complex type, attribute decl. 'barA_1': Duplicate attribute use specified.
---
> ./test/schemas/derivation-ok-extension_0.xsd:10: element attribute: Schemas parser warning : Element '{http://www.w3.org/2001/XMLSchema}attribute': Attribute use prohibitions are pointless when extending a type.

API test cases

The API test cases from 2.6.20 were run against 2.6.21 and 2.6.22 and passed without any failures, which suggests that the API definition did not change across these minor releases.

Execute 2.6.17 test against 2.6.22 libxml2

Internal regression test

The test suite shipped with 2.6.17 was run against the current version of the library, and around 110 test cases failed. All the failures described above also occur here, for the same reasons. Schema regression test failures increase from 5 to 41 (most recent work has been on Schemas, and more changes are anticipated). In addition, the HTML Push SAX regression tests, which are not included in 2.6.20, account for around 17 failures.

There are also 3 new failures in "Validity checking regression". These are again caused by a formatting change in the error output: duplicate error strings were removed, as the diffs below show.

diff ../libxml2-2.6.17/result/VC/OneID ../libxml2-2.6.22/result/VC/OneID
4d3
< ./test/VC/OneID:0: validity error : Element doc has too many ID attributes defined : id

diff ../libxml2-2.6.17/result/VC/OneID2 ../libxml2-2.6.22/result/VC/OneID2
4d3
< validity error : Element doc has too many ID attributes defined : id

diff ../libxml2-2.6.17/result/VC/OneID3 ../libxml2-2.6.22/result/VC/OneID3
4d3
< ./test/VC/OneID3:0: validity error : Element doc has too many ID attributes defined : val

A couple more samples of schema regression failures follow; all have the same cause as described above.

diff result/schemas/ns0_0_2.err ../libxml2-2.6.22/result/schemas/ns0_0_2.err
1c1
< ./test/schemas/ns0_2.xml:1: element foo: Schemas validity error : Element 'foo': No matching global declaration available.
---
> ./test/schemas/ns0_2.xml:1: element foo: Schemas validity error : Element 'foo': No matching global declaration available for the validation root.

diff result/schemas/src-element2-1_0_0.err ../libxml2-2.6.22/result/schemas/src-element2-1_0_0.err
1c1
< ./test/schemas/src-element2-1_0.xsd:12: element element: Schemas parser error : Element ref. 'foo:bar': The attributes 'ref' and 'name' are mutually exclusive.
---
> ./test/schemas/src-element2-1_0.xsd:12: element element: Schemas parser error : Element '{http://www.w3.org/2001/XMLSchema}element': The attributes 'ref' and 'name' are mutually exclusive.

Additional test analysis results

The attached spreadsheet (XML2_error_matrix) shows the results of running tests from earlier versions of the library against later ones. This is in addition to the results discussed here.

Test case development

The provided libxml2 test cases have been modified as follows; an email thread on the libxml mailing list describes some of these changes:

  • the HTML test cases have been removed because many of the changes made are due to ignorable whitespace changes in the HTML output
  • the Schema test has been modified to check only whether or not errors occur. In the unmodified test case, the error messages are captured and 'diff'-ed against stored error messages.

The modified tests completed successfully with older versions of the saved result files (both 2.6.16 and 2.6.20) run against the 2.6.22 library.
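
The modified Schema check can be pictured as comparing only whether an error was reported, not the message text. The following Python sketch assumes that error lines can be recognized by the substring " error : " seen in the excerpts above. The helper names are hypothetical, and the actual modified test may use a different criterion.

```python
def reports_error(err_output):
    """True if any line of the captured output is an error (warnings ignored)."""
    return any(" error : " in line for line in err_output.splitlines())


def same_outcome(stored_err, actual_err):
    """Compare only the presence of errors, ignoring the message wording."""
    return reports_error(stored_err) == reports_error(actual_err)
```

With this criterion, a reworded message such as the ns0_0_2.err example still counts as the same outcome, whereas a diagnostic that changes from an error to a warning does not.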

Open questions

  1. It is clear from the analysis that the internal regression test cases have good value for LSB runtime testing, but at the same time the deviations described above cannot be avoided. In addition, applications do not use the library in exactly the way the internal regression test cases exercise it, and most of the time applications do not depend on the exact parsed output either. So the question is: should we include these test cases as part of the LSB runtime tests and, if so, how will we deal with similar failures in the future?
  2. Will the changes in SAX callbacks found above affect application behavior or require changes in application code?

Options

A few options regarding inclusion of the internal regression test cases:

  1. Include the internal regression tests and grant waivers for these types of failures, or update the tests periodically.
  2. Don't include the internal regression tests; limit testing to the API test cases to make sure they are consistent.
  3. Reduce the number of test input files for the internal regression tests. This reduces the magnitude of failures while keeping the benefit of these test cases.
  4. Any other option?
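
Option 1 could be implemented with a waiver list that is applied to the raw failures before a run is judged. The sketch below is a hypothetical Python illustration; the (suite, test file) key format and the sample entries are assumptions, not an existing LSB mechanism.

```python
def unwaived_failures(failures, waivers):
    """Return the failures not covered by a waiver, preserving their order."""
    waived = set(waivers)
    return [failure for failure in failures if failure not in waived]


# Hypothetical sample data: each entry is (suite name, test input file).
failures = [
    ("HTML regression tests", "./test/HTML/cf_128.html"),
    ("Schemas regression tests", "./test/schemas/any3_0.xml"),
]
waivers = [("HTML regression tests", "./test/HTML/cf_128.html")]
remaining = unwaived_failures(failures, waivers)
```

A run would then pass when the remaining list is empty, and the waiver list would need to be reviewed whenever the library version changes.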

Other information

[1] John Boyer at freedesktop.org proposes inclusion of libxml2 instead of libexpat as libxml2 is much more comprehensive and libxslt and libgdome2 are built on top of it. It is actively supported as well.

