p:archive (3.1) 
Perform operations on archive files.
<p:declare-step type="p:archive">
  <input port="source" primary="true" content-types="any" sequence="true"/>
  <output port="result" primary="true" content-types="any" sequence="false"/>
  <input port="archive" primary="false" content-types="any" sequence="true">
    <p:empty/>
  </input>
  <input port="manifest" primary="false" content-types="xml" sequence="true">
    <p:empty/>
  </input>
  <output port="report" primary="false" content-types="application/xml" sequence="false"/>
  <option name="format" as="xs:QName" required="false" select="'zip'"/>
  <option name="parameters" as="map(xs:QName, item()*)?" required="false" select="()"/>
  <option name="relative-to" as="xs:anyURI?" required="false" select="()"/>
</p:declare-step>
The p:archive step can perform several different operations on archive files (for instance ZIP files). The most common one will likely be
creating an archive, but it can also provide services like update, freshen or even merge. The resulting archive appears on its result
port.
Ports:
| Port | Type | Primary? | Content types | Seq? | Description |
|---|---|---|---|---|---|
| source | input | true | any | true | The documents to add to the archive. |
| result | output | true | any | false | The resulting archive. |
| archive | input | false | any | true | Optional archives for operations like update, freshen or merge. |
| manifest | input | false | xml | true | An optional manifest document that tells the step how to construct the archive. If no manifest document is provided on this port, a default manifest is constructed automatically. See The XML archive manifest document format for details. |
| report | output | false | application/xml | false | A report about the archiving operation. This will be the same as the manifest, optionally amended with additional attributes and/or elements. |
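Options:
| Name | Type | Req? | Default | Description |
|---|---|---|---|---|
| format | xs:QName | false | 'zip' | The format of the archive. ZIP is the only format a processor is required to handle. |
| parameters | map(xs:QName, item()*)? | false | () | Parameters for the archive operation. Which parameters are available depends on the archive format. |
| relative-to | xs:anyURI? | false | () | Controls the names (paths) in the archive of documents on the source port that are not mentioned in the manifest. |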
The p:archive step is the Swiss army knife for handling archives. Its most common use is creating archives, but it can also be used for
operations like update, freshen or even merge.
To make all this possible, the operation of p:archive is unfortunately quite complicated. The details are below; here’s a
summary:
What exactly ends up in the resulting archive is controlled by a manifest document (see The XML archive manifest document format). In such a manifest you specify the URI of each document to add and the path of this document in the archive.
A manifest of an existing archive, sometimes useful as a starting point, can be produced using the p:archive-manifest
step.
Besides the documents in the manifest you can also specify documents to add by providing these on the step’s source
port. Any document appearing on this port that is not already mentioned in the manifest is automatically added to the manifest. The path of
such a document in the resulting archive can be controlled using the relative-to option.
When adding documents to the archive, p:archive compares the base URIs in the manifest with those of the documents appearing on the
source port (the value of the base-uri document-property). If these match, the document on the
source port is added. If not, the URI in the manifest is used to load a document (usually from disk).
Archives come in many formats. The only format the p:archive step is required to handle is ZIP. However, depending on the XProc processor used,
other formats may also be processed.
An archive manifest is an XML document that specifies the files to process when constructing the archive. It is also used as the result format of
the p:archive-manifest step.
Its root element is <c:archive> (the c prefix here is bound to the http://www.w3.org/ns/xproc-step
namespace).
A <c:entry> element describes a single entry (a file) in the archive:
<c:entry (any other attribute)
href = xs:anyURI
name = xs:string
comment? = xs:string
content-type? = xs:string
level? = xs:string
method? = xs:string >
(any child element)*
</c:entry>
| Attribute | # | Type | Description |
|---|---|---|---|
| href | 1 | xs:anyURI | The URI of the entry. This plays an important role in determining which files are added to the archive and how; see below. A relative value is made absolute against the base URI of the manifest itself. |
| name | 1 | xs:string | The name of the entry. This is the path of the file within the archive. Usually this is a relative path. However, depending on how archives are constructed, an absolute path (a path starting with a /) … |
| comment | ? | xs:string | An optional comment associated with the entry. |
| content-type | ? | xs:string | The content-type (MIME type) of the entry. The … |
| level | ? | xs:string | The compression level of the entry. There are no defined values; all values are XProc processor dependent. |
| method | ? | xs:string | The compression method of the entry. There is only one defined value: … |
p:archive algorithm
The p:archive step follows a rather complicated algorithm with two phases:
1 - Construct a complete manifest
First, the manifest (the document, if any, appearing on the manifest port) is checked and completed if necessary:
If no document appears on the manifest port, an empty manifest is created.
The base URIs of the documents appearing on the source port are compared against the list of base URIs in the manifest
(the c:entry/@href values, made absolute). For each document on the source port that is not in the
manifest, an entry (<c:entry> element) is created:
The c:entry/@href attribute becomes the base URI of the document.
The c:entry/@name attribute (the path/name of the entry in the archive) is computed using the value of the
relative-to option:
If the base URI of the document starts with the value of the relative-to option, the
c:entry/@name attribute value becomes the substring after this.
If the base URI of the document does not start with the value of the relative-to option, the
c:entry/@name attribute value becomes the path of this base URI (without a leading /).
For instance, assume the relative-to option is set to file:///some/path/. A document with base URI
file:///some/path/etc/x.txt gets a c:entry/@name attribute value etc/x.txt. A document with
base URI file:///someother/path/y.txt gets a c:entry/@name attribute value
someother/path/y.txt.
The result of all this is a manifest that has entries (<c:entry> elements) for all documents appearing on the
source port. It can also have entries for documents that are not on the source port: these are entries that
were present in the initial manifest without a matching document on the source port.
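The first phase can be sketched in Python as follows. This is a simplified illustration, not how any particular XProc processor implements it: a manifest is modelled as a list of (href, name) pairs, the source port as a list of base URIs, and the helper names `entry_name` and `complete_manifest` are hypothetical. Two further assumptions: when relative-to is absent the name is taken from the path of the base URI, and new entries are appended after the existing ones (which matches the entry order in the examples below).

```python
from typing import List, Optional, Tuple
from urllib.parse import urljoin, urlparse

Entry = Tuple[str, str]  # (absolute href, entry name): a simplified <c:entry>


def entry_name(base_uri: str, relative_to: Optional[str]) -> str:
    """Compute c:entry/@name for a source document (hypothetical helper)."""
    if relative_to and base_uri.startswith(relative_to):
        # The base URI starts with relative-to: keep the substring after it.
        return base_uri[len(relative_to):]
    # Otherwise: the path of the base URI, without the leading "/".
    return urlparse(base_uri).path.lstrip("/")


def complete_manifest(manifest: List[Entry], manifest_base: str,
                      source_base_uris: List[str],
                      relative_to: Optional[str] = None) -> List[Entry]:
    """Phase 1: return a manifest with an entry for every source document."""
    # Relative hrefs are made absolute against the manifest's own base URI.
    entries = [(urljoin(manifest_base, href), name) for href, name in manifest]
    known = {href for href, _ in entries}
    for uri in source_base_uris:
        if uri not in known:
            # Source document not mentioned in the manifest: add an entry for it.
            entries.append((uri, entry_name(uri, relative_to)))
            known.add(uri)
    return entries


print(entry_name("file:///some/path/etc/x.txt", "file:///some/path/"))   # etc/x.txt
print(entry_name("file:///someother/path/y.txt", "file:///some/path/"))  # someother/path/y.txt
```

The two calls at the end reproduce the relative-to example given above.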
2 - Process the manifest
The now completed manifest is processed. For every entry (<c:entry> element):
If the value of the c:entry/@href attribute matches the base URI of one of the documents appearing on the
source port, this document is added to the archive.
When appropriate (for instance for XML documents), the value of its (optional) serialization document-property is
used when serializing it (converting it to text).
For other entries, the value of the c:entry/@href attribute is used to load the file (for instance from disk if it starts
with file:/) and add it to the archive.
These documents are used “as is”: no parsing/serialization takes place.
In both cases, the value of the c:entry/@name attribute becomes the name/path of the entry in the archive. The values of the
other attributes of the <c:entry> element might also get used, but this is dependent on the XProc processor used and/or the
archive’s format.
The p:archive step is supposed to retain the order of the <c:entry> elements. This is, for instance, important when constructing an
e-book in EPUB format: this has a non-compressed entry that must be first in the archive.
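Phase 2 and the ordering requirement can be sketched with Python's standard zipfile module. Again this is only an illustration under assumptions: entries are (href, name, method) tuples in manifest order, documents from the source port are given as already-serialized bytes keyed by base URI, `load` is a hypothetical callback standing in for reading a resource from disk, and treating a method value of "stored" as "no compression" is an assumption, since how c:entry/@method is interpreted is processor dependent. In EPUB, the uncompressed entry that must come first is named mimetype.

```python
import io
import zipfile
from typing import Callable, Dict, List, Optional, Tuple

# (absolute href, entry name, method attribute or None), in manifest order
Entry = Tuple[str, str, Optional[str]]


def build_zip(entries: List[Entry], source_docs: Dict[str, bytes],
              load: Callable[[str], bytes]) -> bytes:
    """Phase 2: write every manifest entry to a new ZIP archive, in order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for href, name, method in entries:
            # Prefer the document on the source port; otherwise load the href "as is".
            data = source_docs[href] if href in source_docs else load(href)
            # Assumption: "stored" means no compression, anything else is deflated.
            compression = zipfile.ZIP_STORED if method == "stored" else zipfile.ZIP_DEFLATED
            zf.writestr(name, data, compress_type=compression)
    return buf.getvalue()


# EPUB-like usage: the uncompressed "mimetype" entry must stay first.
archive = build_zip(
    [("file:///book/mimetype", "mimetype", "stored"),
     ("file:///book/OEBPS/ch1.xhtml", "OEBPS/ch1.xhtml", None)],
    {"file:///book/mimetype": b"application/epub+zip"},
    load=lambda href: b"<html/>",  # hypothetical stand-in for reading from disk
)
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    print([info.filename for info in zf.infolist()])  # ['mimetype', 'OEBPS/ch1.xhtml']
```

Because the entries are written in a single pass over the manifest, the archive order is exactly the manifest order.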
When the value of the format option is absent or zip, the following applies:
The values of the c:entry/@name attributes in the manifest must be relative paths (without a leading
/).
The archive port accepts zero or one ZIP archive. If this port is empty, an empty ZIP archive is used as its default
value.
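The two ZIP-specific rules above could be checked before any entries are written. This is a hypothetical Python sketch: the function names and the use of ValueError are illustrative only, since a real processor reports XProc dynamic errors instead.

```python
import io
import zipfile
from typing import List


def empty_zip() -> bytes:
    """Bytes of a ZIP archive without any entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w"):
        pass  # closing writes only the end-of-central-directory record
    return buf.getvalue()


def check_zip_inputs(entry_names: List[str], archives: List[bytes]) -> bytes:
    """Apply the ZIP rules; return the archive to start from."""
    for name in entry_names:
        if name.startswith("/"):
            raise ValueError(f"ZIP entry names must be relative paths: {name!r}")
    if len(archives) > 1:
        raise ValueError("the archive port accepts zero or one ZIP archive")
    # An empty archive port defaults to an empty ZIP archive.
    return archives[0] if archives else empty_zip()
```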
The parameters option is a map that associates parameters (the keys in the map) with values. For ZIP archives, the
following parameters can be used:
| Parameter | Description |
|---|---|
| command | Specifies the operation to perform. Its default value is … |
| … | For entries that have no … |
| … | For entries that have no … |
The command parameter can have one of the following values:
Command | Description |
|---|---|
| The archive appearing on the
Please note that when there is no document on the |
| This behaves like the |
| This behaves like the |
| For the |
In most cases, the p:archive step will be used to create an archive. If you have no special requirements this is easy: simply supply
the documents for the archive on the step’s source port. The only thing you need to take into account is the name/path of
the entries in the archive; for this, the relative-to option is important.
Pipeline document:
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" sequence="true">
    <p:document href="in1.xml"/>
    <p:document href="test/in2.xml"/>
  </p:input>
  <p:output port="result"/>
  <p:variable name="relative-to" select="resolve-uri('.', static-base-uri())"/>
  <p:archive relative-to="{$relative-to}"/>
  <p:store href="tmp/result.zip"/>
  <p:archive-manifest relative-to="{$relative-to}"/>
</p:declare-step>
Result document:
<c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
  <c:entry name="in1.xml"
           content-type="application/xml"
           href="file:/…/…/in1.xml"
           method="deflated"
           size="91"
           compressed-size="80"
           time="2025-09-03T14:55:52+02:00"/>
  <c:entry name="test/in2.xml"
           content-type="application/xml"
           href="file:/…/…/test/in2.xml"
           method="deflated"
           size="98"
           compressed-size="84"
           time="2025-09-03T14:55:52+02:00"/>
</c:archive>
The pipeline’s input consists of two documents, in1.xml and test/in2.xml. Note that (because the
p:document/@href attributes have relative values) the paths to these documents are relative to the location of the pipeline
itself.
When we construct an archive, we usually don’t want the full on-disk paths of the files to end up in the archive. Here we use their
paths relative to the pipeline instead. To achieve this we need the path (directory) where the pipeline is stored. This is computed
with the expression resolve-uri('.', static-base-uri()) and stored in the relative-to variable.
We then create the archive using p:archive. The two input documents appear on its source port. We do not provide a manifest
on the manifest port, so one is constructed automatically.
The names of the entries in the resulting archive get constructed by “subtracting” the value of the
relative-to option from the base URIs of the source documents. The results will be their relative names against the
pipeline’s location.
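The effect of resolve-uri('.', static-base-uri()) and of the “subtracting” can be mimicked with Python's urllib.parse.urljoin, which resolves relative references in a similar way. The pipeline location below is hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical location of the pipeline document (its static base URI).
pipeline_uri = "file:///home/user/project/pipeline.xpl"

# Like resolve-uri('.', static-base-uri()): the directory containing the pipeline.
relative_to = urljoin(pipeline_uri, ".")

# The relative p:document/@href values are resolved against the pipeline too.
base_uris = [urljoin(pipeline_uri, href) for href in ("in1.xml", "test/in2.xml")]

# "Subtracting" relative-to from each base URI gives the entry names.
names = [uri[len(relative_to):] for uri in base_uris if uri.startswith(relative_to)]

print(relative_to)  # file:///home/user/project/
print(names)        # ['in1.xml', 'test/in2.xml']
```

A plain string-prefix test like this only works when both URIs are spelled the same way (for instance file:/ versus file:///), which is why computing relative-to from the same base as the documents is convenient.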
We store the resulting zip and, just to show you what’s inside, ask for an archive manifest using the p:archive-manifest step.
The p:archive step also has a report port that outputs the manifest of the resulting archive. So, building
on the Basic usage example, we could also have shown what’s inside the created archive like this:
Pipeline document:
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" sequence="true">
    <p:document href="in1.xml"/>
    <p:document href="test/in2.xml"/>
  </p:input>
  <p:output port="result" pipe="report@create-archive"/>
  <p:variable name="relative-to" select="resolve-uri('.', static-base-uri())"/>
  <p:archive relative-to="{$relative-to}" name="create-archive"/>
  <p:store href="tmp/result.zip"/>
</p:declare-step>
Result document:
<c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
  <c:entry href="file:/…/…/in1.xml" name="in1.xml"/>
  <c:entry href="file:/…/…/test/in2.xml" name="test/in2.xml"/>
</c:archive>
Note that this manifest contains less information than what p:archive-manifest produces. What exactly appears here is
implementation-defined and therefore dependent on the XProc processor used.
This example creates a manifest that references an additional file for the archive. Using the c:entry/@name attribute, we give this file
a different name in the archive than its source has. When the manifest is processed, p:archive notices that test/in2.xml
is not on its source port and therefore loads it from disk.
Pipeline document:
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0" name="example">
  <p:input port="source" href="in1.xml"/>
  <p:output port="result"/>
  <p:identity name="manifest">
    <p:with-input>
      <c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
        <c:entry name="test/extra.xml" href="test/in2.xml"/>
      </c:archive>
    </p:with-input>
  </p:identity>
  <p:variable name="relative-to" select="resolve-uri('.', static-base-uri())"/>
  <p:archive relative-to="{$relative-to}">
    <p:with-input pipe="source@example"/>
    <p:with-input port="manifest" pipe="result@manifest"/>
  </p:archive>
  <p:store href="tmp/result.zip"/>
  <p:archive-manifest relative-to="{$relative-to}"/>
</p:declare-step>
Result document:
<c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
  <c:entry name="test/extra.xml"
           content-type="application/xml"
           href="file:/…/…/test/extra.xml"
           method="deflated"
           size="62"
           compressed-size="49"
           time="2025-02-05T13:05:59+01:00"/>
  <c:entry name="in1.xml"
           content-type="application/xml"
           href="file:/…/…/in1.xml"
           method="deflated"
           size="91"
           compressed-size="80"
           time="2025-09-03T14:55:52+02:00"/>
</c:archive>
The only document-property of the document appearing on the result port is content-type (its value
depending on the archive’s format). Note that it has no base-uri document-property, and that no document-properties of the
documents on the source or archive ports survive.
Documents appearing on the source port must have a base-uri document-property. All these
base-uri document-properties must have a unique value.
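Uniqueness matters because phase 2 of the algorithm matches c:entry/@href values against these base URIs; with duplicates the match would be ambiguous. A minimal Python sketch of the two constraints, in which the function name and the use of ValueError are hypothetical (a processor raises an XProc dynamic error instead):

```python
from collections import Counter
from typing import List, Optional


def check_source_base_uris(base_uris: List[Optional[str]]) -> None:
    """Check the base-uri document-properties of the documents on the source port.

    None stands for a document without a base-uri document-property.
    """
    if any(uri is None for uri in base_uris):
        raise ValueError("every document on the source port needs a base-uri")
    duplicates = sorted(uri for uri, count in Counter(base_uris).items() if count > 1)
    if duplicates:
        raise ValueError(f"base URIs must be unique, duplicated: {duplicates}")
```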
A relative value for the relative-to option is resolved against the base URI of the pipeline element it is
specified on. In most cases this is the URI of the pipeline document.
The only format this step is required to handle is ZIP. The ZIP format definition can be found here.
Error code | Description |
|---|---|
It is a dynamic error if the map | |
It is a dynamic error if the number of documents on the | |
It is a dynamic error if the format of the archive does not match the format as specified in the | |
It is a dynamic error if two or more documents appear on the | |
It is a dynamic error if the format of the archive does not match the specified format, cannot be understood, determined and/or processed. | |
It is a dynamic error if the document on port | |
It is a dynamic error if more than one document appears on the port | |
It is a dynamic error if an archive manifest is invalid according to the specification. | |
It is a dynamic error if the resource referenced by the | |
It is a dynamic error if the base URI is not both absolute and valid according to RFC 3986. |
This description of the p:archive step is for XProc version: 3.1. This is a required step (an XProc 3.1 processor must support this).
The formal specification for the p:archive step can be found here.
The p:archive step is part of categories:
The p:archive step is also present in version:
3.0.