p:load (3.0)

Loads a document.

Summary
Description
- Determining the content-type
Examples
- Basic usage
Additional details
Errors raised
Reference information

Summary

<p:declare-step type="p:load">
  <p:output port="result" primary="true" content-types="any" sequence="false"/>
  <p:option name="href" as="xs:anyURI" required="true"/>
  <p:option name="content-type" as="xs:string?" required="false" select="()"/>
  <p:option name="document-properties" as="map(xs:QName, item()*)?" required="false" select="()"/>
  <p:option name="parameters" as="map(xs:QName, item()*)?" required="false" select="()"/>
</p:declare-step>

The p:load step loads the document identified by a URI and returns it on its result port.

Ports:

Port	Type	Primary?	Content types	Seq?	Description
`result`	`output`	`true`	`any`	`false`	The loaded document.

Options:

Name	Type	Req?	Default	Description
`href`	`xs:anyURI`	`true`		The URI of the document to load. In most cases, `p:load` is used to load a file from disk. An absolute URI for this uses the `file:` scheme, usually written with three slashes. For instance, on Windows: `file:///C:/some/path/document.xml` (although Windows uses backslashes (`\`) to separate path components, slashes (`/`) work fine and are more universal). A single slash after `file:` also works: `file:/C:/some/path/document.xml`. If the value is relative, it is resolved against the base URI of the element on which this option is specified. In most cases this is the static base URI of your pipeline (the location of the XProc source that contains the `p:load` step).
`content-type`	`xs:string?`	`false`	`()`	The content-type of the document to load, for instance `text/plain` or `application/json`. The document is interpreted according to this. If this option is not present, the content-type is determined as described in Determining the content-type.
`document-properties`	`map(xs:QName, item()*)?`	`false`	`()`	Any document-properties for the loaded document.
`parameters`	`map(xs:QName, item()*)?`	`false`	`()`	Parameters controlling the loading of the document. Some keys and values are determined by the type of document loaded (see below). Any additional parameters are implementation-defined and therefore dependent on the XProc processor used.

Description

The p:load step is one of the few steps that has no source port. It loads a document from disk, the web or elsewhere, and returns this document on its result port. XProc must know what kind of document it is loading. The mechanism for this is described in Determining the content-type. It is also possible to set document-properties.

What exactly happens depends on the loaded document’s content-type:

- For an XML document-type, the document is loaded and interpreted (de-serialized) as XML.
  There is one pre-defined parameter for the parameters option: dtd-validate (xs:boolean). If true, DTD validation must be performed when parsing the document.
- Text document-types are loaded “as-is”.
- For a JSON document-type, the document is loaded and interpreted (de-serialized) as JSON.
  The parameters option recognizes the parsing options as defined for the XPath parse-json() function (the $options argument).
- For an HTML document-type, the document is loaded and parsed into well-formed XML, even though HTML documents do not have to be well-formed. How this is done exactly is implementation-defined and therefore dependent on the XProc processor used.
- For any other document-type, the document is loaded as a binary document.
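
As a minimal sketch of how the parameters option ties in with the content-type, the pipeline below loads a JSON document and passes one of the parse-json() options. The file name data.json is hypothetical. The string key liberal is assumed to be converted by the processor to the QName key that the option's map type asks for, and whether deviations from the JSON grammar are actually accepted remains up to the processor.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <!-- data.json is a hypothetical file stored next to the pipeline. -->
  <p:load href="data.json" content-type="application/json">
    <!-- Handed to the JSON parser, like the $options argument of parse-json(). -->
    <p:with-option name="parameters" select="map{'liberal': true()}"/>
  </p:load>

</p:declare-step>
```

For an XML document, the same construction with map{'dtd-validate': true()} would request DTD validation.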

There are several ways to load a document into an XProc pipeline. For instance, you could use the href attribute of <p:with-input>, or use its <p:document> child element. The <p:document> element is even defined as having the same functionality as p:load.

Why then p:load? It is probably a left-over from the XProc 1.0 days, when using a p:load was the only way to load a document dynamically, for instance when you had computed its filename. Since XProc 3.0, attribute value templates (AVTs) solve this: <p:with-input href="{$filename}"/>.

The main reason for using p:load today probably comes from software engineering: it makes explicit in your code what you’re doing. A p:load step stands out more than a nested <p:document>. Whether this is reason enough is up to you.
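
To make the comparison concrete, the sketch below loads the same document twice: once with an explicit p:load and once through a nested <p:document>. The option name filename, its default value extra.xml and the step names are assumptions made for this illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema" version="3.0">

  <p:output port="result" sequence="true"/>

  <!-- Hypothetical option holding the (possibly computed) filename. -->
  <p:option name="filename" as="xs:string" select="'extra.xml'"/>

  <!-- Variant 1: an explicit step, with the filename filled in by an AVT. -->
  <p:load name="explicit-load" href="{$filename}"/>

  <!-- Variant 2: the same document, loaded through a nested p:document. -->
  <p:identity name="nested-load">
    <p:with-input>
      <p:document href="{$filename}"/>
    </p:with-input>
  </p:identity>

  <!-- Return both results, so they can be compared. -->
  <p:identity>
    <p:with-input pipe="result@explicit-load result@nested-load"/>
  </p:identity>

</p:declare-step>
```

Both variants should put the same document on the result port. The difference is only in how visible the loading is in the pipeline source.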

Determining the content-type

When a document is loaded, p:load must know its content-type. This is determined as follows:

- When a content-type option is specified, this is used.
- If a protocol is used that specifies/returns a content-type, this is used. This is for instance the case when loading documents over HTTP(S).
- If no explicit type information was found, determining the content-type is implementation-defined and therefore dependent on the XProc processor used.
  When loading a document from disk (using the file:// protocol), in most cases, the XProc processor determines the content-type based on the filename extension. So a .xml file will become XML, .txt text, etc. Which extensions are mapped to which content-type is, again, implementation-defined. However, you can be reasonably sure the most common extensions are interpreted correctly.
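
When the extension says little about the actual contents, an explicit content-type option takes the guessing out of this process. A minimal sketch, assuming a hypothetical plain-text file called notes.dat stored next to the pipeline:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <!-- The .dat extension is not a common one, so state the content-type explicitly. -->
  <!-- Without it, the processor decides for itself how to treat the file. -->
  <p:load href="notes.dat" content-type="text/plain"/>

</p:declare-step>
```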

Examples

Basic usage

Assume there is an XML document (in the same location as the pipeline) called extra.xml with the following contents:

<extras>
  <extra>This is nice!</extra>
</extras>

The simplest pipeline that uses p:load to load this document is:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <p:load href="extra.xml"/>

</p:declare-step>

Result document:

<extras>
   <extra>This is nice!</extra>
</extras>

Additional details

With regard to the document-properties of the loaded document:
- The content-type document-property is the content-type of the loaded document. See also Determining the content-type.
- The base-uri document-property is, in most cases, the URI the document is loaded from, as indicated by the href option.
  However, the document-properties option might also contain a base-uri entry. If so, the value in the document-properties option is used.
A content-type can be specified both using the content-type option and as an entry in the document-properties option map. If both are present, they must be the same. If not, error XD0062 is raised.
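
A minimal sketch of overriding the base-uri through the document-properties option. The URI http://example.org/docs/extra.xml is made up for this illustration, and the string key base-uri is assumed to be accepted for the QName key that the option's map type asks for:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <p:load href="extra.xml">
    <!-- The loaded document reports this base-uri instead of the location of extra.xml. -->
    <p:with-option name="document-properties"
                   select="map{'base-uri': 'http://example.org/docs/extra.xml'}"/>
  </p:load>

</p:declare-step>
```

Adding a content-type entry to this map is only safe when it matches the content-type option, if that option is also set.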

Errors raised

Error code	Description
`XD0011`	It is a dynamic error if the resource referenced by the `href` option does not exist, cannot be accessed or is not a file.
`XD0023`	It is a dynamic error if a DTD validation is performed and either the document is not valid or no DTD is found.
`XD0043`	It is a dynamic error if the `dtd-validate` parameter is `true` and the processor does not support DTD validation.
`XD0049`	It is a dynamic error if the text value is not a well-formed XML document.
`XD0057`	It is a dynamic error if the text document does not conform to the JSON grammar, unless the parameter liberal is true and the processor chooses to accept the deviation.
`XD0058`	It is a dynamic error if the parameter duplicates is reject and the text document contains a JSON object with duplicate keys.
`XD0059`	It is a dynamic error if the parameter map contains an entry whose key is defined in the specification of `fn:parse-json` and whose value is not valid for that key, or if it contains an entry with the key fallback when the parameter `escape` with `true()` is also present.
`XD0060`	It is a dynamic error if the text document cannot be converted into the XPath data model.
`XD0062`	It is a dynamic error if the `@content-type` is specified and the document-properties has a “`content-type`” that is not the same.
`XD0064`	It is a dynamic error if the base URI is not both absolute and valid according to RFC 3986.
`XD0078`	It is a dynamic error if the loaded document cannot be represented as an HTML document in the XPath data model.
`XD0079`	It is a dynamic error if a supplied content-type is not a valid media type of the form “`type/subtype+ext`” or “`type/subtype`”.
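
Because these are dynamic errors, a pipeline can catch them with p:try and p:catch. The following sketch returns a placeholder document when the file cannot be loaded. The file name optional.xml and the <missing/> element are hypothetical. The err prefix is bound to the XProc error namespace.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:err="http://www.w3.org/ns/xproc-error" version="3.0">

  <p:output port="result"/>

  <p:try>
    <!-- optional.xml is a hypothetical file that may or may not exist. -->
    <p:load href="optional.xml"/>

    <!-- Only handle the "resource does not exist or cannot be accessed" error. -->
    <p:catch code="err:XD0011">
      <p:identity>
        <p:with-input>
          <missing/>
        </p:with-input>
      </p:identity>
    </p:catch>
  </p:try>

</p:declare-step>
```

Any other error raised by p:load, for instance XD0049 for a file that is not well-formed, is not caught by this p:catch and still stops the pipeline.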

Reference information

This description of the p:load step is for XProc version: 3.0. This is a required step (an XProc 3.0 processor must support this).

The formal specification for the p:load step can be found here.

The p:load step is part of categories:

The p:load step is also present in version: 3.1.