p:validate-with-dtd (3.1) 

Validates a document using a DTD.

Summary

<p:declare-step type="p:validate-with-dtd">
  <input port="source" primary="true" content-types="xml html" sequence="false"/>
  <output port="result" primary="true" content-types="xml html" sequence="false"/>
  <output port="report" primary="false" content-types="xml json" sequence="true"/>
  <option name="assert-valid" as="xs:boolean" required="false" select="true()"/>
  <option name="report-format" as="xs:string" required="false" select="'xvrl'"/>
  <option name="serialization" as="map(xs:QName,item()*)?" required="false" select="()"/>
</p:declare-step>

The p:validate-with-dtd step validates the document appearing on the source port using DTD (Document Type Definition) validation. The result port emits a copy of the source document, possibly augmented.

Ports:

  • source (input, primary, content types: xml html, not a sequence)
    The document to validate.

  • result (output, primary, content types: xml html, not a sequence)
    A copy of the document that appeared on the source port. If validation was successful, the output may have been augmented by the DTD (for example, default attributes may have been added).

  • report (output, not primary, content types: xml json, sequence)
    A report that describes the validation results, for both valid and invalid source documents. The format of this report is determined by the report-format option.

    When the assert-valid option is true and the document is invalid, nothing appears on this port because error XC0210 is raised.

Options:

  • assert-valid (type: xs:boolean, optional, default: true)
    Determines what happens if the document is invalid:

      • If true, error XC0210 is raised.

      • If false, the step always succeeds. The validity of the document must be determined by inspecting the document that appears on the report port.

  • report-format (type: xs:string, optional, default: xvrl)
    The format of the document on the report port. The value xvrl (the default) always works: the report will be in XVRL (Extensible Validation Report Language).

    Whether any other formats are supported is implementation-defined and therefore depends on the XProc processor used.

  • serialization (type: map(xs:QName,item()*)?, optional, default: ())
    This option can supply a map with serialization properties for serializing the document on the source port before it is re-parsed for validation (see the description for an explanation).

    If the source document has a serialization document-property, the two sets of serialization properties are merged; properties in the document-property take precedence.
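A sketch of the merge rule in action: the DTD link is attached to the document itself as a serialization document-property, while the serialization option supplies a conflicting doctype-system value. Because the document-property takes precedence, example.dtd should be used and fallback.dtd (a hypothetical file name) ignored. We assume p:set-properties is used to attach the property. The nested map is written with explicit QName keys because we are not certain the automatic string-to-QName key conversion for map(xs:QName,item()*) options also applies to maps nested inside an option value.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result" pipe="report@validate"/>

  <!-- Attach the DTD link to the document as a serialization document-property -->
  <p:set-properties
    properties="map{'serialization' : map{QName('', 'doctype-system') : 'example.dtd'}}"/>

  <!-- For doctype-system, the document-property value (example.dtd) takes
       precedence over the option value (hypothetical fallback.dtd) -->
  <p:validate-with-dtd assert-valid="false"
    serialization="map{'doctype-system' : 'fallback.dtd'}" name="validate"/>

</p:declare-step>
```

In practice, the option is then useful as a default for documents that arrive without their own serialization document-property.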

Description

The p:validate-with-dtd step validates the document appearing on the source port using DTD (Document Type Definition) validation. This works a little differently from the other validation steps: the document is first serialized (as if written to disk) and then re-parsed using a validating XML parser. The DTD (or a link to it) must be supplied by the source document itself or by the serialization process.

The serialization properties (whether provided by the serialization document-property or by the serialization option) must include at least a doctype-system property. Without a system identifier, the document cannot be successfully parsed by a validating parser.
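A relative system identifier such as example.dtd has to be resolved against some base URI when the serialized document is re-parsed, and which base URI is used may depend on the processor. A minimal sketch that avoids the question, assuming the DTD is stored next to the pipeline document: the XPath functions resolve-uri() and static-base-uri() turn the name into an absolute URI. A p:with-option element is used so the value is unambiguously an XPath expression.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:validate-with-dtd>
    <!-- static-base-uri() is the base URI of the pipeline document, so this
         assumes example.dtd is stored alongside the pipeline -->
    <p:with-option name="serialization"
      select="map{'doctype-system' : resolve-uri('example.dtd', static-base-uri())}"/>
  </p:validate-with-dtd>

</p:declare-step>
```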

Examples

Basic usage (valid source document)

Assume we have an input document, called input-valid.xml, that looks like this:

<address>
  <first>Douglas</first>
  <last>Adams</last>
  <phone>42</phone>
</address>

We want to validate this document using the following DTD, called example.dtd:

<!ELEMENT address (first, last, phone)>
<!ATTLIST address type CDATA #IMPLIED>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT phone (#PCDATA)>

To perform this validation using the p:validate-with-dtd step, we need to link the DTD to the document using the doctype-system serialization property. The pipeline's result port is connected to the step's report port, so the result document shown is the validation report.

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result" pipe="report@validate"/>

  <p:validate-with-dtd serialization="map{'doctype-system' : 'example.dtd'}" name="validate"/>

</p:declare-step>

Result document:

<report xmlns="http://www.xproc.org/ns/xvrl">
   <metadata>
      <timestamp>2025-02-06T10:45:02.29+01:00</timestamp>
      <document href="file:/…/…/input-valid.xml"/>
      <validator name="org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser"/>
   </metadata>
</report>

Basic usage (invalid source document)

Using the same DTD as in Basic usage (valid source document), we’re now going to validate an invalid document (called input-invalid.xml). Since we want to have a look at what comes out of the report port, we have to set the assert-valid option to false.

<address>
  <FIRST>Douglas</FIRST>
  <last>Adams</last>
  <phone>42</phone>
</address>

Performing this validation using the p:validate-with-dtd step returns the following on the report port:

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result" pipe="report@validate"/>

  <p:validate-with-dtd assert-valid="false" serialization="map{'doctype-system' : 'example.dtd'}" name="validate"/>

</p:declare-step>

Result document:

<report xmlns="http://www.xproc.org/ns/xvrl">
   <metadata>
      <timestamp>2025-02-06T10:45:02.62+01:00</timestamp>
      <document href="file:/…/…/input-invalid.xml"/>
      <validator name="org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser"/>
   </metadata>
   <detection severity="fatal-error">
      <message>SAXParseException: Element type "FIRST" must be declared.</message>
   </detection>
</report>
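With assert-valid set to false, the pipeline itself has to decide what to do with an invalid document. A sketch of one approach: a p:choose that tests the XVRL report and passes on either the report (invalid) or the validated document (valid). It assumes the report port produces exactly one XVRL document, and that detections with severity error or fatal-error mark a validation failure (warnings are ignored).

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xvrl="http://www.xproc.org/ns/xvrl" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Never raise XC0210; the report tells us whether the document is valid -->
  <p:validate-with-dtd assert-valid="false"
    serialization="map{'doctype-system' : 'example.dtd'}" name="validate"/>

  <p:choose>
    <!-- Evaluate the test expression against the validation report -->
    <p:with-input pipe="report@validate"/>
    <p:when test="exists(/xvrl:report/xvrl:detection[@severity = ('error', 'fatal-error')])">
      <!-- Invalid: pass the report on -->
      <p:identity>
        <p:with-input pipe="report@validate"/>
      </p:identity>
    </p:when>
    <p:otherwise>
      <!-- Valid: pass the (possibly augmented) document on -->
      <p:identity>
        <p:with-input pipe="result@validate"/>
      </p:identity>
    </p:otherwise>
  </p:choose>

</p:declare-step>
```

For input-invalid.xml, this pipeline should return a report like the one shown above; for input-valid.xml, it should return the document itself.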

Additional details

  • If validation fails (and assert-valid is false), all document-properties on the source port are preserved on the result port. If validation succeeds, only the base-uri and serialization document-properties are preserved, and the content-type document-property will be application/xml.

  • The document appearing on the report port only has a content-type property. It has no other document-properties (not even base-uri).
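Because the report has no base-uri, anything location-related has to be derived from elsewhere. A sketch that stores the report next to the source document by taking the base URI from the result port, where it is preserved whether validation succeeds or fails. The .report.xml suffix is a hypothetical naming choice, and the sketch assumes the source document has a base URI and that exactly one report is produced.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <!-- Return the (possibly augmented) document, not the report -->
  <p:output port="result" pipe="result@validate"/>

  <p:validate-with-dtd assert-valid="false"
    serialization="map{'doctype-system' : 'example.dtd'}" name="validate"/>

  <p:store>
    <p:with-input pipe="report@validate"/>
    <!-- The report has no base-uri, so the file name is built from the
         base URI of the validated document instead -->
    <p:with-option name="href" select="base-uri(/) || '.report.xml'"
      pipe="result@validate"/>
  </p:store>

</p:declare-step>
```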

Errors raised

Error code

Description

XC0210

It is a dynamic error if the assert-valid option on <p:validate-with-dtd> is true and the input document is not valid.
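When assert-valid keeps its default value true, error XC0210 can be caught like any other dynamic error. A sketch using p:try and p:catch; the replacement document <invalid-document/> is a hypothetical placeholder, and the err prefix is bound to the XProc error namespace.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:err="http://www.w3.org/ns/xproc-error" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:try>
    <!-- assert-valid defaults to true, so an invalid document raises err:XC0210 -->
    <p:validate-with-dtd serialization="map{'doctype-system' : 'example.dtd'}"/>
    <!-- Only this error code is caught; any other error still propagates -->
    <p:catch code="err:XC0210">
      <p:identity>
        <p:with-input>
          <p:inline><invalid-document/></p:inline>
        </p:with-input>
      </p:identity>
    </p:catch>
  </p:try>

</p:declare-step>
```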

Reference information

This description of the p:validate-with-dtd step is for XProc version: 3.1. This is a non-required step (an XProc 3.1 processor does not have to support this).

The formal specification for the p:validate-with-dtd step can be found here.

The p:validate-with-dtd step is part of categories: