Loads a document.
<p:declare-step type="p:load">
  <p:output port="result" primary="true" content-types="any" sequence="false"/>
  <p:option name="href" as="xs:anyURI" required="true"/>
  <p:option name="content-type" as="xs:string?" required="false" select="()"/>
  <p:option name="document-properties" as="map(xs:QName, item()*)?" required="false" select="()"/>
  <p:option name="parameters" as="map(xs:QName,item()*)?" required="false" select="()"/>
</p:declare-step>
The p:load step loads a document indicated by a URI and returns it on its result port.
Ports:
Port | Type | Primary? | Content types | Seq? | Description |
---|---|---|---|---|---|
result | output | true | any | false | The loaded document. |
Options:
The p:load step is one of the few steps without a source port. It loads a document from disk, the web or elsewhere, and returns it on its result port. XProc must know what kind of document it is loading; the mechanism for this is described in Determining the content-type. It is also possible to set document-properties.
What exactly happens depends on the loaded document’s content-type:
For an XML document-type, the document is loaded and interpreted (de-serialized) as XML. There is one pre-defined parameter for the parameters option: dtd-validate (xs:boolean). If true, DTD validation must be performed when parsing the document.
Text document-types are loaded “as-is”.
For a JSON document-type, the document is loaded and interpreted (de-serialized) as JSON. The parameters option recognizes the parsing options defined for the $options argument of the XPath parse-json() function.
For an HTML document-type, the document is loaded and parsed into well-formed XML, even though HTML documents do not have to be well-formed. Exactly how this is done is implementation-defined and therefore depends on the XProc processor used.
For any other document-type, the document is loaded as a binary document.
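The use of the parameters option for XML and JSON documents can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive pipeline: the file names book.xml and settings.json, the step names and the extra json output port are made up for the example. The duplicates key is one of the options defined for parse-json().

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <!-- Two output ports, each explicitly connected to one of the p:load steps -->
  <p:output port="result" primary="true">
    <p:pipe step="load-xml" port="result"/>
  </p:output>
  <p:output port="json" primary="false">
    <p:pipe step="load-json" port="result"/>
  </p:output>

  <!-- Hypothetical XML file with a DOCTYPE declaration: request DTD validation -->
  <p:load name="load-xml" href="book.xml" content-type="application/xml">
    <p:with-option name="parameters" select="map{'dtd-validate': true()}"/>
  </p:load>

  <!-- Hypothetical JSON file: pass a parse-json() option through parameters -->
  <p:load name="load-json" href="settings.json" content-type="application/json">
    <p:with-option name="parameters" select="map{'duplicates': 'use-first'}"/>
  </p:load>

</p:declare-step>
```

The content-type is set explicitly on both steps so the sketch does not rely on the processor's extension mapping.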
There are many ways to load a document into an XProc pipeline. For instance, you could use the href attribute of <p:with-input>, or use its <p:document> child element. The <p:document> element is even defined as having the same functionality as p:load, so the two behave identically.
Why then p:load? Its main raison d’être is probably as a left-over from the XProc 1.0 days. In XProc 1.0, p:load was the only way to load a document dynamically, for instance when you had computed its filename. In recent versions, attribute value templates (AVTs) make this no longer a problem: <p:with-input href="{$filename}"/>.
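As a rough sketch of such a dynamic load, the pipeline below reads the document through a <p:document> child of <p:with-input>, with the equivalent p:load shown in a comment. The filename option and its default value are assumptions made for the example only.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="3.0">
  <p:output port="result"/>

  <!-- Hypothetical option holding the (possibly computed) filename -->
  <p:option name="filename" as="xs:string" select="'extra.xml'"/>

  <!-- The href attribute is an attribute value template (AVT) -->
  <p:identity>
    <p:with-input>
      <p:document href="{$filename}"/>
    </p:with-input>
  </p:identity>

  <!-- Equivalent, using an explicit step instead of p:identity:
       <p:load href="{$filename}"/>
  -->
</p:declare-step>
```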
The main reason for still using p:load probably comes from software engineering: it makes what you’re doing very explicit in your code. An explicit p:load stands out more than a nested <p:document>. Whether this is reason enough is up to you.
When a document is loaded, p:load must know its content-type. This is determined as follows:
If a content-type option is specified, its value is used.
Otherwise, if the protocol used specifies or returns a content-type, that one is used. This is, for instance, the case when loading documents over HTTP(S).
Otherwise, when no explicit type information is found, determining the content-type is implementation-defined and therefore depends on the XProc processor used.
When loading a document from disk (using the file:// protocol), the XProc processor in most cases determines the content-type from the filename extension: a .xml file becomes XML, a .txt file text, and so on. Which extensions map to which content-type is, again, implementation-defined. However, you can be reasonably sure that the most common extensions are interpreted correctly.
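When the extension gives the processor nothing to go on, the content-type option removes the guesswork. A minimal sketch, assuming a text file with the made-up name notes.dat next to the pipeline:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>

  <!-- notes.dat is a hypothetical file whose extension says little about its type.
       An explicit content-type option is used instead of any detection
       by the processor. -->
  <p:load href="notes.dat" content-type="text/plain"/>
</p:declare-step>
```

Without the option, how this file is treated depends on the processor's extension mapping.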
Assume there is an XML document (in the same location as the pipeline) called extra.xml with the following contents:
<extras>
  <extra>This is nice!</extra>
</extras>
The simplest pipeline that uses p:load to load this document is:
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>
  <p:load href="extra.xml"/>
</p:declare-step>
Result document:
<extras>
  <extra>This is nice!</extra>
</extras>
With regard to the document-properties of the loaded document:
The content-type document-property is the content-type of the loaded document. See also Determining the content-type.
The base-uri document-property is, in most cases, the URI the document is loaded from, as indicated by the href option. However, the document-properties option might also contain a base-uri entry. If so, the value in the document-properties option is used.
A content-type can be specified using the content-type option and as an entry in the document-properties option map. If both are present, they must be the same. If not, error XD0062 is raised.
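Setting document-properties while loading might look like the sketch below. The base-uri value uses example.org as a placeholder, and the content-type entry is repeated in the map only to illustrate that it has to match the content-type option.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>

  <p:load href="extra.xml" content-type="application/xml">
    <!-- base-uri: replaces the URI that would otherwise be derived from href
         (placeholder value, must be an absolute URI).
         content-type: same value as the content-type option above;
         a different value would raise error XD0062. -->
    <p:with-option name="document-properties"
                   select="map{
                     'base-uri': 'https://example.org/docs/extra.xml',
                     'content-type': 'application/xml'
                   }"/>
  </p:load>
</p:declare-step>
```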
Error code | Description |
---|---|
| | It is a dynamic error if the resource referenced by the … |
| | It is a dynamic error if a DTD validation is performed and either the document is not valid or no DTD is found. |
| | It is a dynamic error if the … |
| | It is a dynamic error if the text value is not a well-formed XML document. |
| | It is a dynamic error if the text document does not conform to the JSON grammar, unless the parameter liberal is true and the processor chooses to accept the deviation. |
| | It is a dynamic error if the parameter duplicates is reject and the text document contains a JSON object with duplicate keys. |
| | It is a dynamic error if the parameter map contains an entry whose key is defined in the specification of … |
| | It is a dynamic error if the text document cannot be converted into the XPath data model. |
| | It is a dynamic error if the … |
| | It is a dynamic error if the base URI is not both absolute and valid according to RFC 3986. |
| | It is a dynamic error if the loaded document cannot be represented as an HTML document in the XPath data model. |
| | It is a dynamic error if a supplied content-type is not a valid media type of the form … |
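Dynamic errors like these can be handled with p:try and p:catch. The following is a sketch, not a recommended pattern: the catch-all p:catch (no specific error code) and the load-failed placeholder element are choices made for this example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>

  <p:try>
    <!-- Any dynamic error raised by this p:load is handled by p:catch -->
    <p:load href="extra.xml"/>
    <p:catch>
      <!-- Catch-all branch: return a placeholder document instead of failing.
           The load-failed element is hypothetical. -->
      <p:identity>
        <p:with-input>
          <p:inline><load-failed href="extra.xml"/></p:inline>
        </p:with-input>
      </p:identity>
    </p:catch>
  </p:try>
</p:declare-step>
```

Both branches end in a step with a primary result port, so the pipeline produces either the loaded document or the placeholder.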
This description of the p:load step is for XProc version 3.1. This is a required step (an XProc 3.1 processor must support it).
The formal specification for the p:load step can be found here.
The p:load step is also present in version 3.0.