p:validate-with-xml-schema (3.0) 

Validates a document using XML Schema.

Summary

<p:declare-step type="p:validate-with-xml-schema">
  <input port="source" primary="true" content-types="xml html" sequence="false"/>
  <output port="result" primary="true" content-types="xml html" sequence="false"/>
  <input port="schema" primary="false" content-types="xml" sequence="true"/>
  <output port="report" primary="false" content-types="xml json" sequence="true"/>
  <option name="assert-valid" as="xs:boolean" required="false" select="true()"/>
  <option name="mode" as="item()*" required="false" select="'strict'" values="('strict','lax')"/>
  <option name="parameters" as="map(xs:QName, item()*)?" required="false" select="()"/>
  <option name="report-format" as="xs:string" required="false" select="'xvrl'"/>
  <option name="try-namespaces" as="xs:boolean" required="false" select="false()"/>
  <option name="use-location-hints" as="xs:boolean" required="false" select="false()"/>
  <option name="version" as="xs:string?" required="false" select="()"/>
</p:declare-step>

The p:validate-with-xml-schema step validates the document appearing on the source port using XML Schema validation. The most common way to provide a schema is through its schema port. The result port emits a copy of the source document with default attributes/elements filled in and (optional) PSVI annotations.

Ports:

source (Type: input; Primary?: true; Content types: xml html; Seq?: false)

The document to validate.

result (Type: output; Primary?: true; Content types: xml html; Seq?: false)

The document that appeared on the source port with the following alterations (see also the XML Schema recommendation):

  • If the XProc processor supports PSVI (Post-Schema-Validation-Infoset) annotations:

    • The document is valid: the source document with PSVI annotations and any defaulting of attributes and elements filled in.

    • The document is invalid and the assert-valid option is false: the source document, possibly with some PSVI annotations (at least for the sub-trees that are valid).

  • If PSVI annotations are not supported by the XProc processor used:

    • The document is valid: the source document with any defaulting of attributes and elements filled in.

    • The document is invalid and the assert-valid option is false: the source document, unchanged.

When the assert-valid option is true and the document is invalid, nothing will appear on this port because error XC0156 is raised.

schema (Type: input; Primary?: false; Content types: xml; Seq?: true)

Schema(s) to validate against. Providing a schema (or more than one) on this port is the most common way of supplying schemas to the step. There are other ways to provide schemas; see Locating schemas for more information.

report (Type: output; Primary?: false; Content types: xml json; Seq?: true)

A report that describes the validation results, both for valid and invalid source documents. The format for this report is determined by the report-format option.

When the assert-valid option is true and the document is invalid, nothing will appear on this port because error XC0156 is raised.

Options:

assert-valid (Type: xs:boolean; Req?: false; Default: true)

Determines what happens if the document is invalid:

  • If true, error XC0156 is raised.

  • If false, the step always succeeds. The validity of the document must be determined by inspecting the document that appears on the report port.

mode (Type: item()*; Req?: false; Default: strict)

This option controls how the schema validation starts:

  • Setting this to strict means that the document element must be declared and schema-valid, otherwise it will be treated as invalid.

  • Setting this to lax means that the absence of a declaration for the document element does not itself count as an unsuccessful outcome of validation. See Validating in lax mode for an example.

parameters (Type: map(xs:QName, item()*)?; Req?: false; Default: ())

Parameters controlling the validation. See Validation parameters for more information.

report-format (Type: xs:string; Req?: false; Default: xvrl)

The format for the document on the report port. The value xvrl (default) will always work: the report will be in XVRL (Extensible Validation Report Language).

Whether any other formats are supported is implementation-defined and therefore dependent on the XProc processor used.

try-namespaces (Type: xs:boolean; Req?: false; Default: false)

Whether to try to dereference any namespace URIs in the source document for locating schemas. See Locating schemas for more information.

use-location-hints (Type: xs:boolean; Req?: false; Default: false)

Determines what to do with schema location hints in the source document. See Locating schemas for more information.

version (Type: xs:string?; Req?: false; Default: ())

If this option is set, the specified version of XML Schema must be used for validation. Likely values are 1.0 or 1.1. Which XML Schema versions are supported is implementation-defined and therefore dependent on the XProc processor used. In all likelihood, version 1.0 will always be supported.

If this option is not set, the XML Schema version used is implementation-defined and therefore dependent on the XProc processor used. For instance, it might simply be 1.0, or the XProc processor might inspect the schema itself to determine the version.
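As a minimal sketch of pinning the schema version, the pipeline below requests XML Schema 1.1 and falls back to 1.0 when the processor raises XC0011 (schema version not available). The file name example.xsd and the step name main are illustrative assumptions, and whether 1.1 is available at all depends on the XProc processor used.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:err="http://www.w3.org/ns/xproc-error"
                name="main" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:try>
    <!-- First attempt: explicitly request XML Schema 1.1 -->
    <p:validate-with-xml-schema version="1.1">
      <p:with-input port="schema" href="example.xsd"/>
    </p:validate-with-xml-schema>
    <!-- XC0011: the requested schema version is not available -->
    <p:catch code="err:XC0011">
      <!-- Inside p:catch the source must be connected explicitly -->
      <p:validate-with-xml-schema version="1.0">
        <p:with-input port="source" pipe="source@main"/>
        <p:with-input port="schema" href="example.xsd"/>
      </p:validate-with-xml-schema>
    </p:catch>
  </p:try>

</p:declare-step>
```

A schema that relies on 1.1-only features will probably fail to compile under the 1.0 fallback, so this pattern mainly suits schemas that are valid in both versions.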

Description

The p:validate-with-xml-schema step validates the document appearing on the source port against one or more W3C XML Schemas.

The schema(s) used for validation can be provided in several ways. The most common way is to provide them on the schema port. Another way is to use schema references in the source document. If you want the p:validate-with-xml-schema step to do this, you must set the use-location-hints option to true. For more information about providing schemas, see the Locating schemas section below.

The outcome of the step, which appears on the result port, is a copy of the source document with a few alterations. If the document is valid, all default attributes and elements will be filled in. If the processor supports PSVI annotations (as described in the XML Schema recommendation), these will be present too. For details, see the description of the result port.

Locating schemas

One or more schemas can be provided on the schema port. But it is also possible that the document on the source port contains schema references of its own, for instance an xsi:schemaLocation attribute. So which schema(s) should the step use for validation? The rules are as follows:

  • If documents are provided on the schema port, these will be used. For most use-cases, this is the preferred way of providing the schema(s).

  • If there are no schemas supplied on the schema port:

    • If the use-location-hints option is true, the XProc processor will look for schema references in the source document. Which location hints it recognizes is implementation-defined and therefore dependent on the XProc processor used. However, the xsi:noNamespaceSchemaLocation and xsi:schemaLocation attributes will most probably work (the xsi namespace prefix here is bound to the http://www.w3.org/2001/XMLSchema-instance namespace). See Using location hints for an example.

      If the use-location-hints option is false (default), schema references in the source document are ignored.

    • If the try-namespaces option is true, the XProc processor will try to retrieve the schema for a namespace using the namespace URI. So if we have a document in the http://www.something.org/ns/documents namespace, the XProc processor will perform an HTTP GET request on this URI. If this returns a valid XML schema, the show is on. Some implementations might also be able to handle RDDL documents that refer to schemas.

      If the try-namespaces option is false (default) no attempt like this will be made.
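The rules above can be sketched as a pipeline that supplies no schema documents and leaves schema discovery to the processor. This is only an illustration: it switches on both options at once, and which mechanism actually finds a schema, and in what order they are tried, is an assumption that depends on the XProc processor used.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <!-- No schema documents supplied: the processor has to locate them itself -->
  <p:validate-with-xml-schema use-location-hints="true" try-namespaces="true">
    <p:with-input port="schema">
      <p:empty/>
    </p:with-input>
  </p:validate-with-xml-schema>

</p:declare-step>
```

Because dereferencing namespace URIs involves network requests, supplying the schema on the schema port is usually the more predictable choice.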

Validation parameters

The p:validate-with-xml-schema step has a parameters option of datatype map(xs:QName, item()*)?. This (optional) map passes additional parameters for the validation process to the step:

  • The parameters in this map, their values and semantics are implementation-defined and therefore dependent on the XProc processor used.

  • A special entry with key c:compile (the c namespace prefix is bound to the standard XProc namespace http://www.w3.org/ns/xproc-step) is reserved for parameters for the schema compilation (if applicable). The value of this key must be a map itself.

  • If the report-format option is set to xvrl (default): Any entries with keys in the xvrl namespace (http://www.xproc.org/ns/xvrl) are passed as parameters to the process that generates the XVRL report appearing on the report port. All standard XVRL generation parameters are supported.
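To make the shape of this map concrete, here is a hedged sketch that passes a parameters map with a nested c:compile entry. Only the c:compile key comes from the description above. The keys in the http://example.org/ns/hypothetical namespace and their values are invented placeholders, and a real processor may ignore or reject them. Whether the keys inside the c:compile map must be QNames is also an assumption.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:validate-with-xml-schema>
    <p:with-input port="schema" href="example.xsd"/>
    <!-- All keys except c:compile are hypothetical placeholders -->
    <p:with-option name="parameters" select="map{
        QName('http://www.w3.org/ns/xproc-step', 'c:compile'): map{
          QName('http://example.org/ns/hypothetical', 'ex:compile-setting'): true()
        },
        QName('http://example.org/ns/hypothetical', 'ex:validation-setting'): 'some-value'
      }"/>
  </p:validate-with-xml-schema>

</p:declare-step>
```

Consult the documentation of the XProc processor used for the parameter names it actually recognizes.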

Examples

Basic usage (valid source document)

We’re going to use a schema that validates simple XML documents consisting of a <things> root element with zero or more <thing> children. The root element has an optional attribute called status with default value normal.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="things">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" minOccurs="0" name="thing" type="xs:string"/>
      </xs:sequence>
      <xs:attribute default="normal" name="status" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

Let’s use this schema to validate a valid document (called input-valid.xml) and see what comes out of the result port:

<things>
   <thing>A thing...</thing>
   <thing>Another thing...</thing>
</things>

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:validate-with-xml-schema>
    <p:with-input port="schema" href="example.xsd"/>
  </p:validate-with-xml-schema>

</p:declare-step>

Result document:

<things status="normal">
   <thing>A thing...</thing>
   <thing>Another thing...</thing>
</things>

Notice that the optional status attribute, which is absent from the source, has been added to the <things> root element with the default value normal defined in the schema. The same happens for any attribute or element for which the schema defines a default value that the source does not supply.

Now let’s have a look at the XVRL report appearing on the report port (for the same, valid, source document):

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result" pipe="report@validate"/>

  <p:validate-with-xml-schema name="validate">
    <p:with-input port="schema" href="example.xsd"/>
  </p:validate-with-xml-schema>

</p:declare-step>

Result document:

<report xmlns="http://www.xproc.org/ns/xvrl">
   <metadata>
      <timestamp>2025-02-06T10:45:10.8+01:00</timestamp>
      <document href="file:/…/…/input-valid.xml"/>
      <schema href="file:/…/…/example.xsd"
              schematypens="http://www.w3.org/2001/XMLSchema"/>
      <validator name="org.apache.xerces.jaxp.validation.XMLSchemaFactory"/>
   </metadata>
   <digest/>
</report>

The exact format of the report might differ across implementations. Please experiment before using it.
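If both the validated document and the report are needed, a pipeline can expose them on separate output ports. The following is a minimal sketch. The output port name report and the choice of assert-valid="false" are illustrative decisions, not requirements.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <!-- Primary output: the (possibly augmented) source document -->
  <p:output port="result" primary="true"/>
  <!-- Secondary output: whatever the step emits on its report port -->
  <p:output port="report" pipe="report@validate" sequence="true"/>

  <p:validate-with-xml-schema name="validate" assert-valid="false">
    <p:with-input port="schema" href="example.xsd"/>
  </p:validate-with-xml-schema>

</p:declare-step>
```

With assert-valid left at its default of true, an invalid document would raise XC0156 and neither port would produce output.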

Basic usage (invalid source document)

We’re going to use the same schema as in Basic usage (valid source document), but now provide an invalid source document (called input-invalid.xml):

<things>
   <thing>A thing...</thing>
   <thing-error>Another thing...</thing-error>
</things>

The pipeline outputs the resulting XVRL report. Please notice that we need to set the assert-valid option to false. If we had left it at its default value true, error XC0156 would have been raised.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result" pipe="report@validate"/>

  <p:validate-with-xml-schema assert-valid="false" name="validate">
    <p:with-input port="schema" href="example.xsd"/>
  </p:validate-with-xml-schema>

</p:declare-step>

Result document:

<report xmlns="http://www.xproc.org/ns/xvrl">
   <metadata>
      <timestamp>2025-02-06T10:45:10.08+01:00</timestamp>
      <document href="file:/…/…/input-invalid.xml"/>
      <schema href="file:/…/…/example.xsd"
              schematypens="http://www.w3.org/2001/XMLSchema"/>
      <validator name="org.apache.xerces.jaxp.validation.XMLSchemaFactory"/>
   </metadata>
   <detection severity="error">
      <location line="3" xpath="/Q{}things[1]/Q{}thing-error[1]"/>
      <message>cvc-complex-type.2.4.a: Invalid content was found starting with element 'thing-error'. One of '{thing}' is expected.</message>
   </detection>
   <digest/>
</report>

Again, the exact format of the report might differ across implementations. Please experiment before using it.
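Since a step with assert-valid set to false always succeeds, a pipeline has to inspect the report to learn the outcome. The sketch below branches on the presence of error-level detections. It assumes that exactly one XVRL report is produced and that invalidity shows up as a detection element with severity="error", as in the report above. Other processors may report differently. The <outcome> element is an invented placeholder.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xvrl="http://www.xproc.org/ns/xvrl"
                version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:validate-with-xml-schema assert-valid="false" name="validate">
    <p:with-input port="schema" href="example.xsd"/>
  </p:validate-with-xml-schema>

  <!-- Evaluate the branch conditions against the report, not the result -->
  <p:choose>
    <p:with-input pipe="report@validate"/>
    <p:when test="exists(/xvrl:report/xvrl:detection[@severity = 'error'])">
      <p:identity>
        <p:with-input>
          <p:inline><outcome valid="false"/></p:inline>
        </p:with-input>
      </p:identity>
    </p:when>
    <p:otherwise>
      <p:identity>
        <p:with-input>
          <p:inline><outcome valid="true"/></p:inline>
        </p:with-input>
      </p:identity>
    </p:otherwise>
  </p:choose>

</p:declare-step>
```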

 

Another way of handling validation errors is to have p:validate-with-xml-schema raise its error XC0156 and catch this in a <p:try>/<p:catch> construction. The following pipeline shows the <c:errors> document that is available inside the <p:catch>:

<p:declare-step xmlns:err="http://www.w3.org/ns/xproc-error" xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:try>
    <p:validate-with-xml-schema>
      <p:with-input port="schema" href="example.xsd"/>
    </p:validate-with-xml-schema>
    <p:catch code="err:XC0156">
      <p:identity/>
    </p:catch>
  </p:try>
  
</p:declare-step>

Result document:

<c:errors xmlns:c="http://www.w3.org/ns/xproc-step">
   <c:error xmlns:err="http://www.w3.org/ns/xproc-error"
            code="err:XC0156"
            name="!1.1.1.1"
            type="p:validate-with-xml-schema"
            href="file:/…/…/…"
            line="7"
            column="33">
      <report xmlns="http://www.xproc.org/ns/xvrl">
         <metadata>
            <timestamp>2025-02-06T10:45:10.39+01:00</timestamp>
            <document href="file:/…/…/input-invalid.xml"/>
            <schema href="file:/…/…/example.xsd"
                    schematypens="http://www.w3.org/2001/XMLSchema"/>
            <validator name="org.apache.xerces.jaxp.validation.XMLSchemaFactory"/>
         </metadata>
         <detection severity="error">
            <location line="3" xpath="/Q{}things[1]/Q{}thing-error[1]"/>
            <message>cvc-complex-type.2.4.a: Invalid content was found starting with element 'thing-error'. One of '{thing}' is expected.</message>
         </detection>
         <digest/>
      </report>
   </c:error>
</c:errors>

The exact contents of the <c:errors> element might differ across implementations. Please experiment before using it.
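When only the validation report is of interest, the <p:catch> can select it from the error document. This sketch assumes that the processor embeds the XVRL report inside <c:error>, as in the output above. If it does not, the selection yields nothing, which is why the output port is declared as a sequence. The catch name invalid is an arbitrary choice.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:err="http://www.w3.org/ns/xproc-error"
                xmlns:xvrl="http://www.xproc.org/ns/xvrl"
                version="3.0">

  <p:input port="source"/>
  <p:output port="result" sequence="true"/>

  <p:try>
    <p:validate-with-xml-schema>
      <p:with-input port="schema" href="example.xsd"/>
    </p:validate-with-xml-schema>
    <p:catch name="invalid" code="err:XC0156">
      <!-- Keep only the embedded XVRL report(s), if the processor provides any -->
      <p:identity>
        <p:with-input pipe="error@invalid" select="//xvrl:report"/>
      </p:identity>
    </p:catch>
  </p:try>

</p:declare-step>
```

For a valid source document the pipeline returns the validated document instead, so downstream steps need to handle both kinds of output.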

Using location hints

Sometimes you have source documents that already contain schema references, for instance:

<things xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="example.xsd">
   <thing>A thing...</thing>
   <thing>Another thing...</thing>
</things>

If we want the p:validate-with-xml-schema step to use this reference, we have to set the use-location-hints option to true. We don’t need to validate against any other schemas, so we set the schema port to empty.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:validate-with-xml-schema use-location-hints="true">
    <p:with-input port="schema">
      <p:empty/>
    </p:with-input>
  </p:validate-with-xml-schema>

</p:declare-step>

Result document:

<things xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="example.xsd"
        status="normal">
   <thing>A thing...</thing>
   <thing>Another thing...</thing>
</things>

Validating in lax mode

Usually you want a document to completely validate against a schema. However, there are use-cases where the documents to validate are wrapped inside some root element. This happens, for instance, when in XProc you have a sequence of documents and use p:wrap-sequence to wrap these results into a single XML document. The p:validate-with-xml-schema step allows you to disregard the root element and validate its child elements only by setting the mode option to lax.

Source document:

<weird-root-element>
   <things>
      <thing>A thing...</thing>
      <thing>Another thing...</thing>
   </things>
   <things/>
</weird-root-element>

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:validate-with-xml-schema mode="lax">
    <p:with-input port="schema" href="example.xsd"/>
  </p:validate-with-xml-schema>

</p:declare-step>

Result document:

<weird-root-element>
   <things status="normal">
      <thing>A thing...</thing>
      <thing>Another thing...</thing>
   </things>
   <things status="normal"/>
</weird-root-element>
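The p:wrap-sequence scenario mentioned above could look like the following sketch. It assumes that the pipeline receives a sequence of <things> documents on its source port. The wrapper name weird-root-element is simply reused from the example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source" sequence="true"/>
  <p:output port="result"/>

  <!-- Combine all incoming documents under a single, undeclared root element -->
  <p:wrap-sequence wrapper="weird-root-element"/>

  <!-- Lax mode: the undeclared wrapper does not in itself make the document invalid -->
  <p:validate-with-xml-schema mode="lax">
    <p:with-input port="schema" href="example.xsd"/>
  </p:validate-with-xml-schema>

</p:declare-step>
```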

Additional details

  • p:validate-with-xml-schema preserves all document-properties of the document appearing on its source port for the document on its result port.

  • The document appearing on the report port only has a content-type property. It has no other document-properties (also no base-uri).

  • A schema can contain <xs:include> or <xs:import> elements. It is implementation-defined, and therefore dependent on the XProc processor used, if the documents supplied on the schema port are considered when resolving these elements.

Errors raised

XC0011: It is a dynamic error if the specified schema version is not available.

XC0055: It is a dynamic error if the implementation does not support the specified mode.

XC0117: It is a dynamic error if a report-format option was specified that the processor does not support.

XC0152: It is a dynamic error if the document supplied on the schema port is not a valid XML schema document.

XC0156: It is a dynamic error if the assert-valid option on <p:validate-with-xml-schema> is true and the input document is not valid.

Reference information

This description of the p:validate-with-xml-schema step is for XProc version: 3.0. This is a non-required step (an XProc 3.0 processor does not have to support this).

The formal specification for the p:validate-with-xml-schema step can be found here.

The p:validate-with-xml-schema step is part of categories:

The p:validate-with-xml-schema step is also present in version: 3.1.