3. Comparing Pieces of XML

3.1. The Difference Engine

At the center of XMLUnit's support for comparisons is the DifferenceEngine class. In practice you rarely deal with it directly but rather use it via instances of the Diff or DetailedDiff classes (see Section 3.5, “Diff and DetailedDiff”).

The DifferenceEngine walks two trees of DOM Nodes, the control and the test tree, and compares the nodes. Whenever it detects a difference, it sends a message to a configured DifferenceListener (see Section 3.3, “DifferenceListener”) and asks a ComparisonController (see Section 3.2, “ComparisonController”) whether the current comparison should be halted.

In some cases the order of elements in two pieces of XML may not be significant. If this is true, the DifferenceEngine needs help to determine which Elements to compare. This is the job of an ElementQualifier (see Section 3.4, “ElementQualifier”).

The types of differences DifferenceEngine can detect are enumerated in the DifferenceConstants interface and represented by instances of the Difference class.

A Difference can be recoverable; recoverable Differences make the Diff class consider two pieces of XML similar while non-recoverable Differences render the two pieces different.

The types of Differences that are currently detected are listed in Table 1, “Document level Differences detected by DifferenceEngine”, to Table 4, “Other Differences detected by DifferenceEngine” (the first two columns refer to the DifferenceConstants interface).

Table 1. Document level Differences detected by DifferenceEngine

ID | Constant | recoverable | Description
HAS_DOCTYPE_DECLARATION_ID | HAS_DOCTYPE_DECLARATION | true | One piece of XML has a DOCTYPE declaration while the other one has not.
DOCTYPE_NAME_ID | DOCTYPE_NAME | false | Both pieces of XML contain a DOCTYPE declaration but the declarations specify different names for the root element.
DOCTYPE_PUBLIC_ID_ID | DOCTYPE_PUBLIC_ID | false | Both pieces of XML contain a DOCTYPE declaration but the declarations specify different PUBLIC identifiers.
DOCTYPE_SYSTEM_ID_ID | DOCTYPE_SYSTEM_ID | true | Both pieces of XML contain a DOCTYPE declaration but the declarations specify different SYSTEM identifiers.
NODE_TYPE_ID | NODE_TYPE | false | The test piece of XML contains a different type of node than was expected. This type of difference will also occur if either the root control or test Node is null while the other is not.
NAMESPACE_PREFIX_ID | NAMESPACE_PREFIX | true | Two nodes use different prefixes for the same XML Namespace URI in the two pieces of XML.
NAMESPACE_URI_ID | NAMESPACE_URI | false | Two nodes in the two pieces of XML share the same local name but use different XML Namespace URIs.
SCHEMA_LOCATION_ID | SCHEMA_LOCATION | true | Two nodes have different values for the schemaLocation attribute of the XMLSchema-Instance namespace. The attribute could be present on only one of the two nodes.
NO_NAMESPACE_SCHEMA_LOCATION_ID | NO_NAMESPACE_SCHEMA_LOCATION | true | Two nodes have different values for the noNamespaceSchemaLocation attribute of the XMLSchema-Instance namespace. The attribute could be present on only one of the two nodes.

Table 2. Element level Differences detected by DifferenceEngine

ID | Constant | recoverable | Description
ELEMENT_TAG_NAME_ID | ELEMENT_TAG_NAME | false | The two pieces of XML contain elements with different tag names.
ELEMENT_NUM_ATTRIBUTES_ID | ELEMENT_NUM_ATTRIBUTES | false | The two pieces of XML contain a common element, but the number of attributes on the element is different.
HAS_CHILD_NODES_ID | HAS_CHILD_NODES | false | An element in one piece of XML has child nodes while the corresponding one in the other has not.
CHILD_NODELIST_LENGTH_ID | CHILD_NODELIST_LENGTH | false | Two elements in the two pieces of XML differ by their number of child nodes.
CHILD_NODELIST_SEQUENCE_ID | CHILD_NODELIST_SEQUENCE | true | Two elements in the two pieces of XML contain the same child nodes but in a different order.
CHILD_NODE_NOT_FOUND_ID | CHILD_NODE_NOT_FOUND | false | A child node in one piece of XML couldn't be matched against any other node of the other piece.
ATTR_SEQUENCE_ID | ATTR_SEQUENCE | true | The attributes on an element appear in different order[a] in the two pieces of XML.

[a] Note that the order of attributes is not significant in XML; different parsers may return attributes in a different order even when parsing the same XML document. There is an option to turn this check off (see Section 3.8, “Configuration Options”), but it is on by default for backwards compatibility reasons.


Table 3. Attribute level Differences detected by DifferenceEngine

ID | Constant | recoverable | Description
ATTR_VALUE_EXPLICITLY_SPECIFIED_ID | ATTR_VALUE_EXPLICITLY_SPECIFIED | true | An attribute that has a default value according to the content model of the element in question has been specified explicitly in one piece of XML but not in the other.[a]
ATTR_NAME_NOT_FOUND_ID | ATTR_NAME_NOT_FOUND | false | One piece of XML contains an attribute on an element that is missing in the other.
ATTR_VALUE_ID | ATTR_VALUE | false | The value of an element's attribute is different in the two pieces of XML.

[a] In order for this difference to be detected the parser must have been in validating mode when the piece of XML was parsed and the DTD or XML Schema must have been available.


Table 4. Other Differences detected by DifferenceEngine

ID | Constant | recoverable | Description
COMMENT_VALUE_ID | COMMENT_VALUE | false | The content of two comments is different in the two pieces of XML.
PROCESSING_INSTRUCTION_TARGET_ID | PROCESSING_INSTRUCTION_TARGET | false | The target of two processing instructions is different in the two pieces of XML.
PROCESSING_INSTRUCTION_DATA_ID | PROCESSING_INSTRUCTION_DATA | false | The data of two processing instructions is different in the two pieces of XML.
CDATA_VALUE_ID | CDATA_VALUE | false | The content of two CDATA sections is different in the two pieces of XML.
TEXT_VALUE_ID | TEXT_VALUE | false | The value of two texts is different in the two pieces of XML.

Note that some of the differences listed may be ignored by the DifferenceEngine if certain configuration options have been specified. See Section 3.8, “Configuration Options” for details.

DifferenceEngine reports the differences it finds as instances of the Difference class. In addition to the type of difference, this class also holds information on the nodes that have been found to be different. The nodes are described by NodeDetail instances that encapsulate the DOM Node instance as well as the XPath expression that locates the Node inside the given piece of XML. NodeDetail also contains a "value" that provides more information on the actual values that have been found to be different; the concrete interpretation depends on the type of difference, as can be seen in Table 5, “Contents of NodeDetail.getValue() for Differences”.

Table 5. Contents of NodeDetail.getValue() for Differences

Difference.getId() | NodeDetail.getValue()
HAS_DOCTYPE_DECLARATION_ID | "not null" if the document has a DOCTYPE declaration, "null" otherwise.
DOCTYPE_NAME_ID | The name of the root element.
DOCTYPE_PUBLIC_ID_ID | The PUBLIC identifier.
DOCTYPE_SYSTEM_ID_ID | The SYSTEM identifier.
NODE_TYPE_ID | If one node was absent: "not null" if the node exists, "null" otherwise. If the node types differ the value will be a string-ified version of org.w3c.dom.Node.getNodeType().
NAMESPACE_PREFIX_ID | The Namespace prefix.
NAMESPACE_URI_ID | The Namespace URI.
SCHEMA_LOCATION_ID | The attribute's value or "[attribute absent]" if it has not been specified.
NO_NAMESPACE_SCHEMA_LOCATION_ID | The attribute's value or "[attribute absent]" if it has not been specified.
ELEMENT_TAG_NAME_ID | The tag name with any Namespace information stripped.
ELEMENT_NUM_ATTRIBUTES_ID | The number of attributes present turned into a String.
HAS_CHILD_NODES_ID | "true" if the element has child nodes, "false" otherwise.
CHILD_NODELIST_LENGTH_ID | The number of child nodes present turned into a String.
CHILD_NODELIST_SEQUENCE_ID | The sequence number of this child node turned into a String.
CHILD_NODE_NOT_FOUND_ID | The name of the unmatched node or "null". If the node is an element inside an XML namespace the name will be Java5-QName-like {NS-URI}LOCAL-NAME - in all other cases it is the node's local name.
ATTR_SEQUENCE_ID | The attribute's name.
ATTR_VALUE_EXPLICITLY_SPECIFIED_ID | "true" if the attribute has been specified, "false" otherwise.
ATTR_NAME_NOT_FOUND_ID | The attribute's name or "null". If the attribute belongs to an XML namespace the name will be Java5-QName-like {NS-URI}LOCAL-NAME - in all other cases it is the attribute's local name.
ATTR_VALUE_ID | The attribute's value.
COMMENT_VALUE_ID | The actual comment.
PROCESSING_INSTRUCTION_TARGET_ID | The processing instruction's target.
PROCESSING_INSTRUCTION_DATA_ID | The processing instruction's data.
CDATA_VALUE_ID | The content of the CDATA section.
TEXT_VALUE_ID | The actual text.
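
The "Java5-QName-like" notation used for CHILD_NODE_NOT_FOUND_ID and ATTR_NAME_NOT_FOUND_ID can be illustrated with the JDK's own javax.xml.namespace.QName, whose toString method produces the same {NS-URI}LOCAL-NAME shape. This is only a sketch of the format; the class name QNameStyleNames and the namespace URI are made up for illustration, and nothing here claims that XMLUnit builds its values this way.

```java
import javax.xml.namespace.QName;

public class QNameStyleNames {
    /**
     * Formats a name in the {NS-URI}LOCAL-NAME style, or as the plain
     * local name when there is no namespace (null is treated as "none").
     */
    public static String describe(String namespaceUri, String localName) {
        return new QName(namespaceUri, localName).toString();
    }

    public static void main(String[] args) {
        // Element or attribute inside a (hypothetical) namespace
        System.out.println(describe("urn:example:books", "title")); // {urn:example:books}title
        // No namespace: only the local name remains
        System.out.println(describe(null, "title"));                // title
    }
}
```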

As said in the first paragraph, you won't deal with DifferenceEngine directly in most cases. In cases where Diff or DetailedDiff don't provide what you need, you'd create an instance of DifferenceEngine, passing a ComparisonController to the constructor, and invoke compare with the DOM trees to compare as well as a DifferenceListener and an ElementQualifier. The listener will be called on any differences while the compare method is executing.

Example 16. Using DifferenceEngine Directly

class MyDifferenceListener implements DifferenceListener {
    private boolean calledFlag = false;
    public boolean called() { return calledFlag; }

    public int differenceFound(Difference difference) {
        calledFlag = true;
        return RETURN_ACCEPT_DIFFERENCE;
    }

    public void skippedComparison(Node control, Node test) {
    }
}

DifferenceEngine engine = new DifferenceEngine(myComparisonController);
MyDifferenceListener listener = new MyDifferenceListener();
engine.compare(controlNode, testNode, listener,
               myElementQualifier);
System.err.println("There have been "
                   + (listener.called() ? "" : "no ")
                   + "differences.");

3.2. ComparisonController

The ComparisonController's job is to decide whether a comparison should be halted after a difference has been found. Its interface is:

    /**
     * Determine whether a Difference that the listener has been notified of
     *  should halt further XML comparison. Default behaviour for a Diff
     *  instance is to halt if the Difference is not recoverable.
     * @see Difference#isRecoverable
     * @param afterDifference the last Difference passed to <code>differenceFound</code>
     * @return true to halt further comparison, false otherwise
     */
    boolean haltComparison(Difference afterDifference);

Whenever a difference has been detected by the DifferenceEngine the haltComparison method will be called immediately after the DifferenceListener has been informed of the difference. This is true no matter what type of Difference has been found or which value the DifferenceListener has returned.

The only implementations of ComparisonController that ship with XMLUnit are Diff and DetailedDiff; see Section 3.5, “Diff and DetailedDiff”, for details about them.

A ComparisonController that halted the comparison on any non-recoverable difference could be implemented as:

Example 17. A Simple ComparisonController

public class HaltOnNonRecoverable implements ComparisonController {
    public boolean haltComparison(Difference afterDifference) {
        return !afterDifference.isRecoverable();
    }
}

3.3. DifferenceListener

DifferenceListener contains two callback methods that are invoked by the DifferenceEngine when differences are detected:

    /**
     * Receive notification that 2 nodes are different.
     * @param difference a Difference instance as defined in {@link
     * DifferenceConstants DifferenceConstants} describing the cause
     * of the difference and containing the detail of the nodes that
     * differ
     * @return int one of the RETURN_... constants describing how this
     * difference was interpreted
     */
    int differenceFound(Difference difference);

    /**
     * Receive notification that a comparison between 2 nodes has been skipped
     *  because the node types are not comparable by the DifferenceEngine
     * @param control the control node being compared
     * @param test the test node being compared
     * @see DifferenceEngine
     */
    void skippedComparison(Node control, Node test);

differenceFound is invoked by DifferenceEngine as soon as a difference has been detected. The return value of that method is completely ignored by DifferenceEngine; it becomes important when used together with Diff, though (see Section 3.5, “Diff and DetailedDiff”). The return value should be one of the four constants defined in the DifferenceListener interface:

    /** 
     * Standard return value for the <code>differenceFound</code> method.
     * Indicates that the <code>Difference</code> is interpreted as defined 
     * in {@link DifferenceConstants DifferenceConstants}.
     */
    int RETURN_ACCEPT_DIFFERENCE = 0;
    /**
     * Override return value for the <code>differenceFound</code> method.
     * Indicates that the nodes identified as being different should be
     * interpreted as being identical.
     */
    int RETURN_IGNORE_DIFFERENCE_NODES_IDENTICAL = 1;
    /**
     * Override return value for the <code>differenceFound</code> method.
     * Indicates that the nodes identified as being different should be
     * interpreted as being similar.
     */
    int RETURN_IGNORE_DIFFERENCE_NODES_SIMILAR = 2;
    /** 
     * Override return value for the <code>differenceFound</code> method.
     * Indicates that the nodes identified as being similar should be 
     * interpreted as being different.
     */
    int RETURN_UPGRADE_DIFFERENCE_NODES_DIFFERENT = 3;

The skippedComparison method is invoked if the DifferenceEngine encounters two Nodes it cannot compare. Before invoking skippedComparison DifferenceEngine will have invoked differenceFound with a Difference of type NODE_TYPE.

A custom DifferenceListener that ignored any DOCTYPE related differences could be written as:

Example 18. A DifferenceListener that Ignores DOCTYPE Differences

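import java.util.Arrays;

import org.custommonkey.xmlunit.Difference;
import org.custommonkey.xmlunit.DifferenceConstants;
import org.custommonkey.xmlunit.DifferenceListener;
import org.w3c.dom.Node;

public class IgnoreDoctype implements DifferenceListener {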
    private static final int[] IGNORE = new int[] {
        DifferenceConstants.HAS_DOCTYPE_DECLARATION_ID,
        DifferenceConstants.DOCTYPE_NAME_ID,
        DifferenceConstants.DOCTYPE_PUBLIC_ID_ID,
        DifferenceConstants.DOCTYPE_SYSTEM_ID_ID
    };

    static {
        Arrays.sort(IGNORE);
    }

    public int differenceFound(Difference difference) {
        return Arrays.binarySearch(IGNORE, difference.getId()) >= 0
            ? RETURN_IGNORE_DIFFERENCE_NODES_IDENTICAL
            : RETURN_ACCEPT_DIFFERENCE;
    }
    
    public void skippedComparison(Node control, Node test) {
    }
}

Apart from Diff and DetailedDiff, XMLUnit ships with one additional implementation of DifferenceListener.

3.3.1. IgnoreTextAndAttributeValuesDifferenceListener

IgnoreTextAndAttributeValuesDifferenceListener doesn't do anything in skippedComparison. It "downgrades" Differences of type ATTR_VALUE, ATTR_VALUE_EXPLICITLY_SPECIFIED and TEXT_VALUE to recoverable differences.

This means if instances of IgnoreTextAndAttributeValuesDifferenceListener are used together with Diff then two pieces of XML will be considered similar if they have the same basic structure. They are not considered identical, though.

Note that the list of ignored differences doesn't cover all textual differences. You should configure XMLUnit to ignore comments and whitespace and to consider CDATA sections and text nodes to be the same (see Section 3.8, “Configuration Options”) in order to cover COMMENT_VALUE and CDATA_VALUE as well.

3.4. ElementQualifier

When DifferenceEngine encounters a list of DOM Elements as children of another Element it will ask the configured ElementQualifier which Element of the control piece of XML should be compared to which of the test piece. Its contract is:

    /**
     * Determine whether two elements are comparable
     * @param control an Element from the control XML NodeList
     * @param test an Element from the test XML NodeList
     * @return true if the elements are comparable, false otherwise
     */
    boolean qualifyForComparison(Element control, Element test); 

For any given Element in the control piece of XML DifferenceEngine will cycle through the corresponding list of Elements in the test piece of XML until qualifyForComparison has returned true or the test document is exhausted.

When using DifferenceEngine or Diff it is completely legal to set the ElementQualifier to null. In this case any kind of Node is compared to the test Node that appears at the same position in the sequence.

Example 19. Example Nodes for ElementQualifier (the comments are not part of the example)

<!-- control piece of XML -->
<parent>
  <child1/>                        <!-- control node 1 -->
  <child2/>                        <!-- control node 2 -->
  <child2 foo="bar">xyzzy</child2> <!-- control node 3 -->
  <child2 foo="baz"/>              <!-- control node 4 -->
</parent>

<!-- test piece of XML -->
<parent>
  <child2 foo="baz"/>              <!-- test node 1 -->
  <child1/>                        <!-- test node 2 -->
  <child2>xyzzy</child2>           <!-- test node 3 -->
  <child2 foo="bar"/>              <!-- test node 4 -->
</parent>

Taking Example 19, “Example Nodes for ElementQualifier (the comments are not part of the example)”, without any ElementQualifier, DifferenceEngine will compare control node n to test node n for n between 1 and 4. In many cases this is exactly what is desired, but sometimes <a><b/><c/></a> should be similar to <a><c/><b/></a> because the order of elements doesn't matter - this is when you'd use a different ElementQualifier. XMLUnit ships with several implementations.

3.4.1. ElementNameQualifier

Only Elements with the same name - and Namespace URI if present - qualify.

In Example 19, “Example Nodes for ElementQualifier (the comments are not part of the example)” this means control node 1 will be compared to test node 2. Then control node 2 will be compared to test node 3 because DifferenceEngine will start to search for the matching test Element at the second test node, the same sequence number the control node is at. Control node 3 is compared to test node 3 as well and control node 4 to test node 4.

3.4.2. ElementNameAndAttributeQualifier

Only Elements with the same name - and Namespace URI if present - as well as the same values for all attributes given in ElementNameAndAttributeQualifier's constructor qualify.

Let's say "foo" has been passed to ElementNameAndAttributeQualifier's constructor when looking at Example 19, “Example Nodes for ElementQualifier (the comments are not part of the example)”. This again means control node 1 will be compared to test node 2 since they do have the same name and no value at all for attribute "foo". Then control node 2 will be compared to test node 3 - again, no value for "foo". Control node 3 is compared to test node 4 as they have the same value "bar". Finally control node 4 is compared to test node 1; here DifferenceEngine searches from the beginning of the test node list after test node 4 didn't match.

There are three constructors in ElementNameAndAttributeQualifier. The no-arg constructor creates an instance that compares all attributes while the others will compare a single attribute or a given subset of all attributes.

3.4.3. ElementNameAndTextQualifier

Only Elements with the same name - and Namespace URI if present - as well as the same text content nested into them qualify.

In Example 19, “Example Nodes for ElementQualifier (the comments are not part of the example)” this means control node 1 will be compared to test node 2 since they both don't have any nested text at all. Then control node 2 will be compared to test node 4. Control node 3 is compared to test node 3 since they have the same nested text and control node 4 to test node 4.
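
The pairings described for the three qualifiers above can be reproduced with a small model of the search order the text describes: for each control element, start at the test element in the same position, scan to the end of the list, then wrap around to the beginning. The sketch below uses only the JDK's DOM parser; QualifierSearchSketch and the three BiPredicate stand-ins are hypothetical names for illustration, not XMLUnit classes, and the model deliberately ignores everything else the real DifferenceEngine does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class QualifierSearchSketch {
    // The two pieces of XML from Example 19, without the comments
    public static final String CONTROL =
        "<parent><child1/><child2/><child2 foo=\"bar\">xyzzy</child2><child2 foo=\"baz\"/></parent>";
    public static final String TEST =
        "<parent><child2 foo=\"baz\"/><child1/><child2>xyzzy</child2><child2 foo=\"bar\"/></parent>";

    // Hypothetical stand-ins for the three qualifiers discussed above
    public static final BiPredicate<Element, Element> BY_NAME =
        (c, t) -> c.getTagName().equals(t.getTagName());
    public static final BiPredicate<Element, Element> BY_NAME_AND_FOO =
        BY_NAME.and((c, t) -> c.getAttribute("foo").equals(t.getAttribute("foo")));
    public static final BiPredicate<Element, Element> BY_NAME_AND_TEXT =
        BY_NAME.and((c, t) -> directText(c).equals(directText(t)));

    /** Parses the XML and returns the element children of its root. */
    public static List<Element> children(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            List<Element> result = new ArrayList<>();
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element) {
                    result.add((Element) n);
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Concatenates the text nodes directly below the element, trimmed. */
    static String directText(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    /** Returns, per control element, the 1-based test position it is paired with (0 = none). */
    public static int[] match(String controlXml, String testXml,
                              BiPredicate<Element, Element> qualifier) {
        List<Element> control = children(controlXml);
        List<Element> test = children(testXml);
        int[] matched = new int[control.size()];
        for (int i = 0; i < control.size(); i++) {
            int start = Math.min(i, test.size() - 1);
            // scan from the same position to the end, then wrap around
            for (int step = 0; step < test.size(); step++) {
                int j = (start + step) % test.size();
                if (qualifier.test(control.get(i), test.get(j))) {
                    matched[i] = j + 1;
                    break;
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(match(CONTROL, TEST, BY_NAME)));          // [2, 3, 3, 4]
        System.out.println(Arrays.toString(match(CONTROL, TEST, BY_NAME_AND_FOO)));  // [2, 3, 4, 1]
        System.out.println(Arrays.toString(match(CONTROL, TEST, BY_NAME_AND_TEXT))); // [2, 4, 3, 4]
    }
}
```

Note that this model, like the walkthroughs above, happily pairs the same test node with more than one control node.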

3.4.4. org.custommonkey.xmlunit.examples.RecursiveElementNameAndTextQualifier

All ElementQualifiers seen so far only looked at the Elements themselves and not at the structure nested into them at a deeper level. A frequent user question has been which ElementQualifier should be used if the pieces of XML in Example 20, “Example for RecursiveElementNameAndTextQualifier (the comments are not part of the example)” should be considered similar.

Example 20. Example for RecursiveElementNameAndTextQualifier (the comments are not part of the example)

<!-- control -->
<table>
  <tr>            <!-- control row 1 -->
    <td>foo</td>
  </tr>
  <tr>            <!-- control row 2 -->
    <td>bar</td>
  </tr>
</table>

<!-- test -->
<table>
  <tr>            <!-- test row 1 -->
    <td>bar</td>
  </tr>
  <tr>            <!-- test row 2 -->
    <td>foo</td>
  </tr>
</table>

At first glance ElementNameAndTextQualifier should work, but it doesn't. When DifferenceEngine processes the children of table it compares control row 1 to test row 1, since both tr elements have the same name and both have no textual content at all.

What is needed in this case is an ElementQualifier that looks at the element's name, as well as the name of the first child element and the text nested into that first child element. This is what RecursiveElementNameAndTextQualifier does.

RecursiveElementNameAndTextQualifier ignores whitespace between the elements leading up to the nested text.
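
The idea of qualifying on the element name, the name of the first child element, and the text nested into it can be sketched with plain JDK DOM code. This is a simplified model written for this guide's Example 20, assuming that descending through first child elements and trimming the innermost text is enough; RecursiveNameAndTextSketch is a hypothetical class and not the implementation that ships in org.custommonkey.xmlunit.examples.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RecursiveNameAndTextSketch {
    // The tables from Example 20, including whitespace between <tr> and <td>
    public static final String CONTROL =
        "<table>\n  <tr>\n    <td>foo</td>\n  </tr>\n  <tr>\n    <td>bar</td>\n  </tr>\n</table>";
    public static final String TEST =
        "<table>\n  <tr>\n    <td>bar</td>\n  </tr>\n  <tr>\n    <td>foo</td>\n  </tr>\n</table>";

    /** Parses the XML and returns the element children (the rows) of its root. */
    public static List<Element> rows(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            List<Element> result = new ArrayList<>();
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element) {
                    result.add((Element) n);
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static Element firstChildElement(Element e) {
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Same name, then recurse into the first child elements; compare trimmed text at the bottom. */
    public static boolean qualifies(Element control, Element test) {
        if (!control.getTagName().equals(test.getTagName())) {
            return false;
        }
        Element c = firstChildElement(control);
        Element t = firstChildElement(test);
        if (c == null && t == null) {
            return control.getTextContent().trim().equals(test.getTextContent().trim());
        }
        if (c == null || t == null) {
            return false;
        }
        // whitespace-only text next to the child elements is never looked at
        return qualifies(c, t);
    }

    public static void main(String[] args) {
        List<Element> control = rows(CONTROL);
        List<Element> test = rows(TEST);
        System.out.println(qualifies(control.get(0), test.get(0))); // false: foo vs. bar
        System.out.println(qualifies(control.get(0), test.get(1))); // true: foo vs. foo
    }
}
```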

3.4.5. org.custommonkey.xmlunit.examples.MultiLevelElementNameAndTextQualifier

MultiLevelElementNameAndTextQualifier is in a way the predecessor of RecursiveElementNameAndTextQualifier (see Section 3.4.4, “org.custommonkey.xmlunit.examples.RecursiveElementNameAndTextQualifier”). It also matches element names and those of nested child elements until it finds matches, but unlike RecursiveElementNameAndTextQualifier, you must tell MultiLevelElementNameAndTextQualifier at which nesting level it should expect the nested text.

MultiLevelElementNameAndTextQualifier's constructor expects a single argument which is the nesting level of the expected text. If you use an argument of 1, MultiLevelElementNameAndTextQualifier is identical to ElementNameAndTextQualifier. In Example 20, “Example for RecursiveElementNameAndTextQualifier (the comments are not part of the example)” a value of 2 would be needed.

By default MultiLevelElementNameAndTextQualifier will not ignore whitespace between the elements leading up to the nested text. If your piece of XML contains this sort of whitespace (like Example 20, “Example for RecursiveElementNameAndTextQualifier (the comments are not part of the example)” which contains a newline and several space characters between <tr> and <td>) you can either instruct XMLUnit to ignore whitespace completely (see Section 3.8.1, “Whitespace Handling”) or use the two-arg constructor of MultiLevelElementNameAndTextQualifier introduced with XMLUnit 1.2 and set the ignoreEmptyTexts argument to true.

In general RecursiveElementNameAndTextQualifier requires less knowledge upfront and its whitespace-handling is more intuitive.

3.5. Diff and DetailedDiff

Diff and DetailedDiff provide simplified access to DifferenceEngine by implementing the ComparisonController and DifferenceListener interfaces themselves. They cover the two most common use cases for comparing two pieces of XML: checking whether the pieces are different (this is what Diff does) and finding all differences between them (this is what DetailedDiff does).

DetailedDiff is a subclass of Diff and can only be constructed by creating a Diff instance first.

The major difference between them is their implementation of the ComparisonController interface: DetailedDiff will never stop the comparison since it wants to collect all differences. Diff in turn will halt the comparison as soon as the first Difference is found that is not recoverable. In addition DetailedDiff collects all Differences in a list and provides access to it.
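
The effect of the two ComparisonController strategies can be modeled without XMLUnit at all. In the sketch below a list of already detected differences is replayed: each one is reported first, and only then is the controller asked whether to halt. FoundDifference, walk and sample are hypothetical stand-ins for illustration, and the Predicate plays the role of haltComparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class HaltingSketch {
    /** Hypothetical stand-in for a Difference: a name plus the recoverable flag. */
    public static final class FoundDifference {
        public final String name;
        public final boolean recoverable;

        public FoundDifference(String name, boolean recoverable) {
            this.name = name;
            this.recoverable = recoverable;
        }
    }

    /** Reports differences in order; after each report the controller may halt the walk. */
    public static List<String> walk(List<FoundDifference> found,
                                    Predicate<FoundDifference> haltComparison) {
        List<String> reported = new ArrayList<>();
        for (FoundDifference d : found) {
            reported.add(d.name);          // the listener is informed first
            if (haltComparison.test(d)) {  // then the controller is asked
                break;
            }
        }
        return reported;
    }

    /** Made-up sequence of differences used by main. */
    public static List<FoundDifference> sample() {
        return Arrays.asList(
            new FoundDifference("CHILD_NODELIST_SEQUENCE", true),
            new FoundDifference("ATTR_VALUE", false),
            new FoundDifference("TEXT_VALUE", false));
    }

    public static void main(String[] args) {
        // Diff-like: halt at the first non-recoverable difference
        System.out.println(walk(sample(), d -> !d.recoverable));
        // prints [CHILD_NODELIST_SEQUENCE, ATTR_VALUE]

        // DetailedDiff-like: never halt, so every difference is collected
        System.out.println(walk(sample(), d -> false));
        // prints [CHILD_NODELIST_SEQUENCE, ATTR_VALUE, TEXT_VALUE]
    }
}
```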

By default Diff will consider two pieces of XML as identical if no differences have been found at all, similar if all differences that have been found have been recoverable (see Table 1, “Document level Differences detected by DifferenceEngine”, to Table 4, “Other Differences detected by DifferenceEngine”) and different as soon as any non-recoverable difference has been found.

It is possible to specify a DifferenceListener to Diff using the overrideDifferenceListener method. In this case each Difference will be evaluated by the passed in DifferenceListener. By returning RETURN_IGNORE_DIFFERENCE_NODES_IDENTICAL the custom listener can make Diff ignore the difference completely. Likewise any Difference for which the custom listener returns RETURN_IGNORE_DIFFERENCE_NODES_SIMILAR will be treated as if the Difference was recoverable.
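
How the listener's return value feeds into the identical/similar/different verdict can be sketched as a small decision function. The types below (Verdict, Outcome, Reported) are hypothetical stand-ins that mirror the four RETURN_... constants of Section 3.3; the logic follows the description in this section and is an assumption about the overall effect, not a copy of Diff's code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class OutcomeSketch {
    /** Stand-ins for the four RETURN_... constants. */
    public enum Verdict { ACCEPT, IGNORE_IDENTICAL, IGNORE_SIMILAR, UPGRADE_DIFFERENT }

    public enum Outcome { IDENTICAL, SIMILAR, DIFFERENT }

    /** A reported difference: its built-in recoverable flag plus the listener's verdict. */
    public static final class Reported {
        public final boolean recoverable;
        public final Verdict verdict;

        public Reported(boolean recoverable, Verdict verdict) {
            this.recoverable = recoverable;
            this.verdict = verdict;
        }
    }

    public static Outcome classify(List<Reported> reported) {
        Outcome outcome = Outcome.IDENTICAL;
        for (Reported r : reported) {
            boolean recoverable;
            switch (r.verdict) {
                case IGNORE_IDENTICAL:
                    continue;                    // ignored completely
                case IGNORE_SIMILAR:
                    recoverable = true;          // treated as recoverable
                    break;
                case UPGRADE_DIFFERENT:
                    recoverable = false;         // treated as non-recoverable
                    break;
                default:
                    recoverable = r.recoverable; // keep the built-in classification
            }
            if (!recoverable) {
                return Outcome.DIFFERENT;
            }
            outcome = Outcome.SIMILAR;
        }
        return outcome;
    }

    public static void main(String[] args) {
        System.out.println(classify(Collections.<Reported>emptyList()));                            // IDENTICAL
        System.out.println(classify(Arrays.asList(new Reported(true, Verdict.ACCEPT))));            // SIMILAR
        System.out.println(classify(Arrays.asList(new Reported(false, Verdict.ACCEPT))));           // DIFFERENT
        System.out.println(classify(Arrays.asList(new Reported(false, Verdict.IGNORE_SIMILAR))));   // SIMILAR
        System.out.println(classify(Arrays.asList(new Reported(false, Verdict.IGNORE_IDENTICAL)))); // IDENTICAL
    }
}
```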

There are several overloads of the Diff constructor that allow you to specify your piece of XML in many ways. There are overloads that accept additional DifferenceEngine and ElementQualifier arguments. Passing in a DifferenceEngine of your own is the only way to use a ComparisonController other than Diff.

Note that Diff and DetailedDiff use ElementNameQualifier as their default ElementQualifier. This is different from DifferenceEngine which defaults to no ElementQualifier at all.

To use a custom ElementQualifier you can also use the overrideElementQualifier method. Use this with an argument of null to unset the default ElementQualifier as well.

To compare two pieces of XML you'd create a Diff instance from those two pieces and invoke identical to check that there have been no differences at all, or similar to check that all differences, if any, have been recoverable. If the pieces are identical they are also similar. Likewise, if they are not similar they can't be identical either.

Example 21. Comparing Two Pieces of XML Using Diff

Diff d = new Diff("<a><b/><c/></a>", "<a><c/><b/></a>");
assertFalse(d.identical()); // CHILD_NODELIST_SEQUENCE Difference
assertTrue(d.similar());

The result of the comparison is cached in Diff; repeated invocations of identical or similar will not reevaluate the pieces of XML.

Note: calling toString on an instance of Diff or DetailedDiff will perform the comparison and cache its result immediately. If you change the DifferenceListener or ElementQualifier after calling toString it won't have any effect.

DetailedDiff provides only a single constructor that expects a Diff as argument. Don't use DetailedDiff if all you need to know is whether two pieces of XML are identical/similar - use Diff directly since its short-cut ComparisonController implementation will save time in this case.

Example 22. Finding All Differences Using DetailedDiff

Diff d = new Diff("<a><b/><c/></a>", "<a><c/><b/></a>");
DetailedDiff dd = new DetailedDiff(d);
dd.overrideElementQualifier(null);
assertFalse(dd.similar());
List l = dd.getAllDifferences();
assertEquals(2, l.size()); // expected <b/> but was <c/> and vice versa

3.6. MatchTracker

Sometimes you might be interested in any sort of comparison result and want to get notified of successful matches as well. Maybe you want to provide feedback on the amount of differences and similarities between two documents, for example.

The interface MatchTracker can be implemented to get notified on each and every successful match. Note that there may be a lot more comparisons going on than you might expect, so your callback may be invoked very often.

Example 23. The MatchTracker interface

package org.custommonkey.xmlunit;

/**
 * Listener for callbacks from a {@link DifferenceEngine#compare
 * DifferenceEngine comparison} that is notified on each and every
 * comparison that resulted in a match.
 */
public interface MatchTracker {
    /**
     * Receive notification that 2 nodes match.
     * @param difference a Difference instance as defined in {@link
     * DifferenceConstants DifferenceConstants} describing the test
     * that matched and containing the detail of the nodes that have
     * been compared
     */
    void matchFound(Difference difference);
}

Despite its name the Difference instance passed into the matchFound method really describes a match and not a difference. You can expect that the getValue method on both the control and the test NodeDetail will be equal.

DifferenceEngine provides a constructor overload that allows you to pass in a MatchTracker instance and also provides a setMatchTracker method. Diff and DetailedDiff provide overrideMatchTracker methods that fill the same purpose.

Note that your MatchTracker won't receive any callbacks once the configured ComparisonController has decided that DifferenceEngine should halt the comparison.

3.7. JUnit 3.x Convenience Methods

XMLAssert and XMLTestCase contain quite a few overloads of methods for comparing two pieces of XML.

The method names use the word Equal to mean the same as similar in the Diff class (or throughout this guide). So assertXMLEqual will assert that only recoverable differences have been encountered, whereas assertXMLNotEqual asserts that some differences have been non-recoverable. assertXMLIdentical asserts that there haven't been any differences at all while assertXMLNotIdentical asserts that there have been differences (recoverable or not).

Most of the overloads of assertXMLEqual just provide different means to specify the pieces of XML as Strings, InputSources, Readers[7] or Documents. For each method there is a version that takes an additional err argument which is used to create the message if the assertion fails.

If you don't need any control over the ElementQualifier or DifferenceListener used by Diff these methods will save some boilerplate code. If CONTROL and TEST are pieces of XML represented as one of the supported inputs then

Diff d = new Diff(CONTROL, TEST);
assertTrue("expected pieces to be similar, " + d.toString(),
           d.similar());

and

assertXMLEqual("expected pieces to be similar", CONTROL, TEST);

are equivalent.

If you need more control over the Diff instance there is a version of assertXMLEqual (and assertXMLIdentical) that accepts a Diff instance as its argument as well as a boolean indicating whether you expect the Diff to be similar (identical) or not.

XMLTestCase contains a couple of compareXML methods that really are only shortcuts to Diff's constructors.

There is no way to use DifferenceEngine or DetailedDiff directly via the convenience methods.

3.8. Configuration Options

Unless you are using the Document or DOMSource overloads when specifying your pieces of XML, XMLUnit will use the configured XML parsers (see Section 2.4.1, “JAXP”) and EntityResolvers (see Section 2.4.2, “EntityResolver”). There are configuration options to use different settings for the control and test pieces of XML.

In addition some of the other configuration settings may lead to XMLUnit using the configured XSLT transformer (see Section 2.4.1, “JAXP”) under the covers.

3.8.1. Whitespace Handling

Two different configuration options affect how XMLUnit treats whitespace in comparisons:

  • Element Content Whitespace (see Section 2.4.3, “Element Content Whitespace”)

    If XMLUnit has been configured to ignore element content whitespace it will trim any text nodes found by the parser. This means that there won't appear to be any textual content in element <foo> for the following example. If you don't set XMLUnit.setIgnoreWhitespace to true, there will be textual content consisting of a newline character.

    <foo>
    </foo>
    

    At the same time the following two <foo> elements will be considered identical if the option has been enabled, though.

    <foo>bar</foo>
    <foo> bar </foo>
    

    When this option is set to true, Diff will use the XSLT transformer under the covers.

  • "Normalizing" Whitespace

    If you set XMLUnit.setNormalizeWhitespace to true then XMLUnit will replace any kind of whitespace found in character content with a SPACE character and collapse consecutive whitespace characters to a single SPACE. It will also trim the resulting character content on both ends.

    The following two <foo> elements will be considered identical if the option has been set:

    <foo>bar baz</foo>
    <foo> bar
                baz</foo>
    

    Note that this is not related to "normalizing" the document as a whole (see Section 3.8.2, “"Normalizing" Documents”).
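
The difference between the two whitespace options can be modeled on plain strings. The sketch below assumes that "ignoring element content whitespace" amounts to trimming each text and that "normalizing" additionally collapses runs of space, tab, carriage return and line feed into one SPACE; WhitespaceSketch and its method names are hypothetical and not part of XMLUnit.

```java
public class WhitespaceSketch {
    /** Models the trimming described for ignoring element content whitespace. */
    public static String trimmed(String text) {
        return text.trim();
    }

    /** Models whitespace normalization: collapse whitespace runs to one SPACE, then trim. */
    public static String normalized(String text) {
        return text.replaceAll("[ \\t\\r\\n]+", " ").trim();
    }

    public static void main(String[] args) {
        // <foo>\n</foo> has no visible text once trimmed
        System.out.println(trimmed("\n").isEmpty());                          // true
        // <foo>bar</foo> vs. <foo> bar </foo>
        System.out.println(trimmed(" bar ").equals("bar"));                   // true
        // Trimming alone does not touch whitespace inside the text ...
        System.out.println(trimmed(" bar\n        baz").equals("bar baz"));    // false
        // ... but normalizing does
        System.out.println(normalized(" bar\n        baz").equals("bar baz")); // true
    }
}
```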

3.8.2. "Normalizing" Documents

"Normalize" in this context corresponds to the normalize method in DOM's Document class. It is the process of merging adjacent Text nodes and is not related to "normalizing whitespace" as described in the previous section.

Usually you don't need to care about this option since the XML parser is required to normalize the Document when creating it. The only reason you may want to change the option via XMLUnit.setNormalize is that your Document instances have not been created by an XML parser but rather been put together in memory using the DOM API directly.
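
The situation in which normalization matters can be reproduced with the JDK's DOM API alone: a document assembled in memory may hold two adjacent Text nodes where a parser would have produced one. The following sketch (NormalizeSketch is a made-up class name) shows the child count before and after calling normalize; it does not involve XMLUnit.setNormalize itself.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NormalizeSketch {
    /** Builds <foo>bar baz</foo> by hand, using two separate Text nodes. */
    public static Element buildByHand() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element foo = doc.createElement("foo");
            doc.appendChild(foo);
            foo.appendChild(doc.createTextNode("bar"));
            foo.appendChild(doc.createTextNode(" baz"));
            return foo;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element foo = buildByHand();
        System.out.println(foo.getChildNodes().getLength());  // 2 adjacent Text nodes

        foo.getOwnerDocument().normalize();                   // merges adjacent Text nodes
        System.out.println(foo.getChildNodes().getLength());  // 1
        System.out.println(foo.getFirstChild().getNodeValue()); // bar baz
    }
}
```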

3.8.3. Ignoring Comments

Using XMLUnit.setIgnoreComments you can make XMLUnit's difference engine ignore comments completely.

When this option is set to true, Diff will use the XSLT transformer under the covers.

3.8.4. Treating CDATA Sections and Text Nodes Alike

It is not always necessary to know whether a text has been put into a CDATA section or not. Using XMLUnit.setIgnoreDiffBetweenTextAndCDATA you can make XMLUnit consider the following two pieces of XML identical:

<foo>&lt;bar&gt;</foo>
<foo><![CDATA[<bar>]]></foo>
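
At the DOM level the two pieces above differ only in the type of the node that carries the text. The sketch below uses the JDK parser to show this, and also shows that a coalescing parser (DocumentBuilderFactory.setCoalescing) reports the CDATA section as ordinary text. It illustrates the distinction the option glosses over; whether XMLUnit implements the option this way is not claimed here, and CdataVersusTextSketch is a hypothetical class name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CdataVersusTextSketch {
    public static final String AS_TEXT = "<foo>&lt;bar&gt;</foo>";
    public static final String AS_CDATA = "<foo><![CDATA[<bar>]]></foo>";

    /** Parses the XML and returns the first child node of the root element. */
    public static Node firstChild(String xml, boolean coalescing) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setCoalescing(coalescing); // true: CDATA sections become Text nodes
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getFirstChild();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node text = firstChild(AS_TEXT, false);
        Node cdata = firstChild(AS_CDATA, false);

        // Different node types ...
        System.out.println(text.getNodeType() == Node.TEXT_NODE);           // true
        System.out.println(cdata.getNodeType() == Node.CDATA_SECTION_NODE); // true
        // ... but the same character content
        System.out.println(text.getParentNode().getTextContent()
            .equals(cdata.getParentNode().getTextContent()));               // true

        // A coalescing parser hides the distinction
        System.out.println(firstChild(AS_CDATA, true).getNodeType() == Node.TEXT_NODE); // true
    }
}
```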

3.8.5. Entity Reference Expansion

Normally the XML parser will expand character references to their Unicode equivalents but for more complex entity definitions the parser may expand them or not. Using XMLUnit.setExpandEntityReferences you can control the parser's setting.

3.8.6. Comparison of Unmatched Elements

When XMLUnit cannot match a control Element to a test Element (the configured ElementQualifier - see Section 3.4, “ElementQualifier” - doesn't return true for any of the test Elements) it will try to compare it against the first unmatched test Element (if there is one). Starting with XMLUnit 1.3 one can use XMLUnit.setCompareUnmatched to disable this behavior and generate CHILD_NODE_NOT_FOUND differences instead.

If the control document is

<root>
  <a/>
</root>

and the test document is

<root>
  <b/>
</root>

the default setting will create a single ELEMENT_TAG_NAME Difference ("expected a but found b"). Setting XMLUnit.setCompareUnmatched to false will create two Differences of type CHILD_NODE_NOT_FOUND (one for "a" and one for "b") instead.



[7] See Section 2.5, “Providing Input to XMLUnit” for some advice on choosing your input format.