p:archive (3.0)

Perform operations on archive files.

Summary
Description
Examples
Additional details
Errors raised
Reference information

Summary

<p:declare-step type="p:archive">
  <input port="source" primary="true" content-types="any" sequence="true"/>
  <output port="result" primary="true" content-types="any" sequence="false"/>
  <input port="archive" primary="false" content-types="any" sequence="true">
    <p:empty/>
  </input>
  <input port="manifest" primary="false" content-types="xml" sequence="true">
    <p:empty/>
  </input>
  <output port="report" primary="false" content-types="application/xml" sequence="false"/>
  <option name="format" as="xs:QName" required="false" select="'zip'"/>
  <option name="parameters" as="map(xs:QName, item()*)?" required="false" select="()"/>
  <option name="relative-to" as="xs:anyURI?" required="false" select="()"/>
</p:declare-step>

The p:archive step can perform several different operations on archive files (for instance ZIP files). The most common is creating an archive, but it can also provide services like update, freshen or even merge. The resulting archive appears on its result port.

Ports:

Port	Type	Primary?	Content types	Seq?	Description
`source`	`input`	`true`	`any`	`true`	The `source` port is used to provide the documents to be archived. How and which of these documents are processed is governed by the document(s) appearing on the other input ports and the combination of options and parameters. See below for details.
`result`	`output`	`true`	`any`	`false`	The resulting archive.
`archive`	`input`	`false`	`any`	`true`	Optional archives for operations like update, freshen or merge.
`manifest`	`input`	`false`	`xml`	`true`	An optional manifest document that tells the step how to construct the archive. If no manifest document is provided on this port, a default manifest is constructed automatically. See The XML archive manifest document format for details.
`report`	`output`	`false`	`application/xml`	`false`	A report about the archiving operation. This will be the same as the manifest, optionally amended with additional attributes and/or elements.

Options:

Name	Type	Req?	Default	Description
`format`	`xs:QName`	`false`	`zip`	The format of the archive. If its value is `zip` (the default), the `p:archive` step produces a ZIP archive and expects any archive on the `archive` port to be a ZIP archive too. Whether other archive formats are supported, and which names (values for this option) identify them, is implementation-defined and therefore depends on the XProc processor used.
`parameters`	`map(xs:QName, item()*)?`	`false`	`()`	Parameters controlling the archiving. Several parameters are defined for processing ZIP archives (see Handling of ZIP archives). A specific XProc processor might define its own.
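`relative-to`	`xs:anyURI?`	`false`	`()`	This option is used to compute the entry names (`c:entry/@name`) of documents on the `source` port that have no entry in the manifest. When no manifest is provided on the `manifest` port, this applies to all documents on the `source` port. See The `p:archive` algorithm for details.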

Description

The p:archive step is the Swiss army knife for handling archives. Its most common use is creating archives, but it could also be used for operations like update, freshen or even merge.

To make all this possible, the operation of p:archive is unfortunately quite complicated. The details are below; here’s a summary:

What exactly ends up in the resulting archive is controlled by a manifest document (see The XML archive manifest document format). In such a manifest you specify the URI of each document to add and its path in the archive.
A manifest of an existing archive, sometimes useful as a starting point, can be produced using the p:archive-manifest step.
Besides the documents in the manifest, you can also specify documents to add by providing them on the step’s source port. Any document appearing on this port that is not already mentioned in the manifest is automatically added to the manifest. The path of such a document in the resulting archive can be controlled using the relative-to option.
When adding documents to the archive, p:archive compares the URIs in the manifest (the c:entry/@href values) with the base URIs of the documents appearing on the source port (the value of their base-uri document-property). If they match, the document on the source port is added. If not, the URI in the manifest is used to load a document (usually from disk).

Archives come in many formats. The only format the p:archive step is required to handle is ZIP. However, depending on the XProc processor used, other formats may also be processed.

The XML archive manifest document format

An archive manifest is an XML document that specifies the files to process when constructing an archive. It is also the result format of the p:archive-manifest step.

Its root element is <c:archive> (the c prefix here is bound to the http://www.w3.org/ns/xproc-step namespace):

<c:archive>
  ( <c:entry> |
    (any other element)
  )*
</c:archive>

Child element	#	Description
`c:entry`	*	An entry (a file) in the archive.

A <c:entry> element describes a single entry (a file) in the archive:

<c:entry (any other attribute)
         href = xs:anyURI
         name = xs:string
         comment? = xs:string
         content-type? = xs:string
         level? = xs:string
         method? = xs:string >
  (any child element)*
</c:entry>

Attribute	#	Type	Description
`href`	1	`xs:anyURI`	The URI of the entry. This plays an important role in determining which and how files are added to the archive, see below. A relative value is made absolute against the base URI of the manifest itself.
`name`	1	`xs:string`	The name of the entry. This is the path of the file within the archive. Usually this is a relative path. However, depending on how archives are constructed, an absolute path (a path starting with a `/`) is possible. Archives constructed by XProc steps always produce relative paths (no leading `/`).
`comment`	?	`xs:string`	An optional comment associated with the entry.
`content-type`	?	`xs:string`	The content-type (MIME type) of the entry. The `p:archive` step ignores it, but the `p:archive-manifest` step always adds it.
`level`	?	`xs:string`	The compression level of the entry. No values are defined; all values are XProc processor dependent.
`method`	?	`xs:string`	The compression method of the entry. There is only one defined value: `none`, meaning, of course, no compression. Any other values are XProc processor dependent.
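To make the `href` rule concrete, here is a minimal Python sketch that reads a manifest and turns each `<c:entry>` into an absolute href, a name and any remaining attributes. It is an illustration only and not how any XProc processor works internally. The function name `read_manifest` and the manifest base URI used in the example are assumptions: in a real pipeline the base URI is that of the manifest document itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

C_NS = "http://www.w3.org/ns/xproc-step"


def read_manifest(xml_text: str, manifest_base_uri: str) -> list:
    """Return (absolute href, name, other attributes) for each c:entry, in document order."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{C_NS}}}archive":
        raise ValueError("the root element must be c:archive")
    entries = []
    # Only direct c:entry children count; any other elements are ignored.
    for entry in root.findall(f"{{{C_NS}}}entry"):
        href, name = entry.get("href"), entry.get("name")
        if href is None or name is None:
            raise ValueError("c:entry needs both href and name")
        extras = {k: v for k, v in entry.attrib.items() if k not in ("href", "name")}
        # A relative href is made absolute against the manifest's base URI.
        entries.append((urljoin(manifest_base_uri, href), name, extras))
    return entries


manifest = (
    '<c:archive xmlns:c="http://www.w3.org/ns/xproc-step">'
    '<c:entry name="test/extra.xml" href="test/in2.xml" method="none"/>'
    '</c:archive>'
)
print(read_manifest(manifest, "file:///home/me/project/manifest.xml"))
# [('file:///home/me/project/test/in2.xml', 'test/extra.xml', {'method': 'none'})]
```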

The `p:archive` algorithm

The p:archive step follows a rather complicated algorithm with two phases:

1 - Construct a complete manifest

First, the manifest (the document, if any, appearing on the manifest port) is checked and completed if necessary:

If no document appears on the manifest port, an empty manifest is created.
The base URIs of the documents appearing on the source port are compared against the list of URIs in the manifest (the c:entry/@href values, made absolute). For every document on the source port that is not in the manifest, an entry (<c:entry> element) is created:
- The c:entry/@href attribute becomes the base URI of the document.
- The c:entry/@name (which is the path/name of the entry in the archive) is computed against the value of the relative-to option:
  - If the base URI of the document starts with the value of the relative-to option, the c:entry/@name attribute value becomes the substring after this.
  - If the base URI of the document does not start with the value of the relative-to option, the c:entry/@name attribute value becomes the path of this base URI (without a leading /).
  For instance, assume the relative-to option is set to file:///some/path/. A document with base URI file:///some/path/etc/x.txt gets a c:entry/@name attribute value etc/x.txt. A document with base URI file:///someother/path/y.txt gets a c:entry/@name attribute value someother/path/y.txt.
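The name computation can be sketched in a few lines of Python. This is an illustration of the rule as stated here, applied as a literal string prefix test, and not a processor's actual code. The function name `entry_name` is hypothetical. Treating an absent relative-to option as "no prefix" is also an assumption.

```python
from typing import Optional
from urllib.parse import urlsplit


def entry_name(base_uri: str, relative_to: Optional[str]) -> str:
    """Derive c:entry/@name for a source document that has no manifest entry."""
    if relative_to and base_uri.startswith(relative_to):
        # Plain string "subtraction": keep whatever follows the relative-to prefix.
        return base_uri[len(relative_to):]
    # Otherwise use the path of the base URI, without a leading "/".
    return urlsplit(base_uri).path.lstrip("/")


print(entry_name("file:///some/path/etc/x.txt", "file:///some/path/"))   # etc/x.txt
print(entry_name("file:///someother/path/y.txt", "file:///some/path/"))  # someother/path/y.txt
```

Applied literally, the trailing slash of relative-to matters. With `file:///some/path` (no trailing slash), the first example would yield `/etc/x.txt`, which is not a valid name for a ZIP entry (see Handling of ZIP archives).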

The result is a manifest with entries (<c:entry> elements) for all documents appearing on the source port. It can also have entries for documents that are not on the source port, because such entries were present in the initial manifest and no matching document was found on the source port.
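Phase 1 can be summarized with the following Python sketch. Manifest entries are modeled as `(absolute href, name)` pairs, and source documents as a list of base URIs. All function names are hypothetical.

Two points are assumptions. First, new entries are appended after the existing ones; this matches the order shown in the Using a manifest example. Second, the XC0084 check is placed here for convenience.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlsplit


def default_name(base_uri: str, relative_to: Optional[str]) -> str:
    # The relative-to rule described above.
    if relative_to and base_uri.startswith(relative_to):
        return base_uri[len(relative_to):]
    return urlsplit(base_uri).path.lstrip("/")


def complete_manifest(entries: List[Tuple[str, str]],
                      source_base_uris: List[Optional[str]],
                      relative_to: Optional[str] = None) -> List[Tuple[str, str]]:
    """Add an entry for every source document that the manifest does not mention."""
    if None in source_base_uris or len(set(source_base_uris)) != len(source_base_uris):
        raise ValueError("XC0084: missing or duplicate base URI on the source port")
    completed = list(entries)  # no manifest on the port: start from an empty list
    known = {href for href, _ in entries}
    for uri in source_base_uris:
        if uri not in known:
            # Assumption: new entries go after the entries already in the manifest.
            completed.append((uri, default_name(uri, relative_to)))
    return completed


print(complete_manifest([("file:///p/test/in2.xml", "test/extra.xml")],
                        ["file:///p/in1.xml"], "file:///p/"))
```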

2 - Process the manifest

The now completed manifest is processed. For every entry (<c:entry> element):

If the value of the c:entry/@href attribute matches the base URI of one of the documents appearing on the source port, this document is added to the archive.
When appropriate (for instance for XML documents), the value of its (optional) serialization document-property is used to serialize it (convert it to text).
For other entries, the value of the c:entry/@href attribute is used to load the file (for instance from disk if it starts with file:/) and add it to the archive.
These documents are used “as is”: no parsing/serialization takes place.

In both cases, the value of the c:entry/@name attribute becomes the name/path of the entry in the archive. The values of the other attributes of the <c:entry> element might also get used, but this is dependent on the XProc processor used and/or the archive’s format.
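Phase 2 can be approximated with Python's standard `zipfile` module. This is a sketch, not a real processor's implementation. The source documents are assumed to be already serialized to bytes, and `load` is a hypothetical callback standing in for "load the file from its URI as is". Every entry is written deflated here; the per-entry attributes are ignored.

```python
import io
import zipfile
from typing import Callable, Dict, List, Tuple


def build_zip(manifest: List[Tuple[str, str]],
              source_docs: Dict[str, bytes],
              load: Callable[[str], bytes]) -> bytes:
    """Write one ZIP entry per manifest entry, keeping the manifest order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for href, name in manifest:
            if href in source_docs:
                data = source_docs[href]  # document from the source port
            else:
                data = load(href)         # loaded as is: no parsing/serialization
            zf.writestr(name, data)       # c:entry/@name becomes the path in the archive
    return buf.getvalue()


manifest = [("file:///p/test/in2.xml", "test/extra.xml"), ("file:///p/in1.xml", "in1.xml")]
zip_bytes = build_zip(manifest, {"file:///p/in1.xml": b"<doc/>"}, lambda href: b"<from-disk/>")
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    print(zf.namelist())  # ['test/extra.xml', 'in1.xml']
```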

The p:archive step is supposed to retain the order of the <c:entry> elements. This is, for instance, important when constructing an e-book in EPUB format: this has a non-compressed entry that must be first in the archive.
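The following Python `zipfile` sketch shows why entry order and per-entry compression matter for EPUB. The `mimetype` entry is written first and stored uncompressed, and the remaining entries are deflated. In a p:archive manifest this would presumably be a first `<c:entry>` with `method="none"`. The container content below is a placeholder, so the result is not a valid e-book.

```python
import io
import zipfile


def epub_skeleton(container_xml: bytes) -> bytes:
    """Write 'mimetype' first and uncompressed, then the other entries deflated."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", b"application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", container_xml, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


data = epub_skeleton(b"<container/>")  # placeholder content
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    first = zf.infolist()[0]
    print(first.filename, first.compress_type == zipfile.ZIP_STORED)  # mimetype True
```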

Handling of ZIP archives

When the value of the format option is absent or zip, the following applies:

The values of the c:entry/@name attributes in the manifest must be relative paths (without a leading /).
The archive port accepts zero or one ZIP archive. If this port is empty, an empty ZIP archive is used as its default value.

The parameters option is a map that associates parameters (the keys in the map) with values. For ZIP archives, the following parameters can be used:

Parameter	Description
`command`	Specifies the operation to perform. Its default value is `update`. See below for a description of the commands.
`level`	For entries that have no `c:entry/@level` attribute, this is the default compression level for entries added or updated in the archive. For ZIP archives, its possible values are: `smallest`, `fastest`, `default`, `huffman`, `none`.
`method`	For entries that have no `c:entry/@method` attribute, this is the default compression method for entries added or updated in the archive. For ZIP archives, its possible values are: `deflated`, `none`.
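How these parameters combine with the per-entry attributes can be sketched in Python. The `c:entry` attribute wins; otherwise the parameter value serves as the default.

This is an assumption-laden illustration. The real `parameters` map is keyed by `xs:QName`, whereas plain strings are used here. The function name `zip_settings` is hypothetical. `None` stands for "nothing specified, the processor chooses".

```python
from typing import Dict, Optional

ZIP_COMMANDS = {"update", "create", "freshen", "delete"}
ZIP_LEVELS = {"smallest", "fastest", "default", "huffman", "none"}
ZIP_METHODS = {"deflated", "none"}


def zip_settings(entry_attrs: Dict[str, str],
                 parameters: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Resolve command/level/method for one entry: c:entry attributes win over parameters."""
    command = parameters.get("command", "update")  # documented default
    level = parameters.get("level")
    method = parameters.get("method")
    if command not in ZIP_COMMANDS:
        raise ValueError(f"unknown command: {command}")
    if level is not None and level not in ZIP_LEVELS:
        raise ValueError(f"invalid level: {level}")
    if method is not None and method not in ZIP_METHODS:
        raise ValueError(f"invalid method: {method}")
    return {
        "command": command,
        "level": entry_attrs.get("level", level),
        "method": entry_attrs.get("method", method),
    }


print(zip_settings({"method": "none"}, {"level": "fastest", "method": "deflated"}))
# {'command': 'update', 'level': 'fastest', 'method': 'none'}
```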

The command parameter can have one of the following values:

Command	Description
`update` (default)	The archive appearing on the `archive` port is updated: an entry in this ZIP archive that corresponds to a `c:entry/@name` attribute in the manifest gets updated as specified in the `<c:entry>` element. For other entries in the ZIP archive, their name/path is first made absolute using the base URI of the archive. If a file exists with that URI and is newer than the entry in the ZIP archive, the entry is updated. For all `<c:entry>` elements in the manifest that have no corresponding entry in the ZIP archive, the document gets added. Please note that when there is no document on the `archive` port, `p:archive` always creates a new, fresh archive.
`create`	This behaves like the `update` command except that timestamps are ignored and updates (if any) always take place.
`freshen`	This behaves like the `update` command except that no new files will be added.
`delete`	For the `delete` command, a ZIP archive must be present on the `archive` port. It removes all entries in the ZIP archive that have a corresponding `c:entry/@name` attribute in the manifest. All other manifest entries are ignored.
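The following Python sketch is one reading of the command table above; it is not the specification's algorithm. For each command it classifies the entries as updated, kept, added or deleted.

Timestamps are reduced to a hypothetical `disk` dictionary that maps an archive entry name to `"newer"` or `"older"`. A missing key means no file exists at the resolved URI. Resolving names against the archive's base URI is left out, and the function name is hypothetical.

```python
from typing import Dict, List


def plan_zip_command(command: str, archive_entries: List[str],
                     manifest_names: List[str], disk: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify entries for one ZIP command (update, create, freshen or delete)."""
    in_manifest = set(manifest_names)
    plan: Dict[str, List[str]] = {"update": [], "keep": [], "add": [], "delete": []}
    if command == "delete":
        for name in archive_entries:
            plan["delete" if name in in_manifest else "keep"].append(name)
        return plan  # all other manifest entries are ignored
    if command not in ("update", "create", "freshen"):
        raise ValueError(f"unknown command: {command}")
    for name in archive_entries:
        if name in in_manifest:
            plan["update"].append(name)  # updated as specified by its c:entry
        elif name in disk and (command == "create" or disk[name] == "newer"):
            plan["update"].append(name)  # "create" ignores timestamps
        else:
            plan["keep"].append(name)
    if command != "freshen":  # freshen never adds new entries
        existing = set(archive_entries)
        plan["add"] = [name for name in manifest_names if name not in existing]
    return plan


print(plan_zip_command("update", ["a.xml", "b.xml"], ["a.xml", "c.xml"], {"b.xml": "newer"}))
# {'update': ['a.xml', 'b.xml'], 'keep': [], 'add': ['c.xml'], 'delete': []}
```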

Examples

Basic usage

In most cases, the p:archive step will be used to create an archive. If you have no special requirements, this is easy: simply supply the documents for the archive on the step’s source port. The only thing you need to take into account is the name/path of the entries in the archive, and for this the relative-to option is important.

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source" sequence="true">
    <p:document href="in1.xml"/>
    <p:document href="test/in2.xml"/>
  </p:input>
  <p:output port="result"/>

  <p:variable name="relative-to" select="resolve-uri('.', static-base-uri())"/>

  <p:archive relative-to="{$relative-to}"/>

  <p:store href="tmp/result.zip"/>
  <p:archive-manifest relative-to="{$relative-to}"/>

</p:declare-step>

Result document:

<c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
   <c:entry name="in1.xml"
            content-type="application/xml"
            href="file:/…/…/in1.xml"
            method="deflated"
            size="91"
            compressed-size="80"
            time="2025-07-09T09:52:16+02:00"/>
   <c:entry name="test/in2.xml"
            content-type="application/xml"
            href="file:/…/…/test/in2.xml"
            method="deflated"
            size="98"
            compressed-size="84"
            time="2025-07-09T09:52:16+02:00"/>
</c:archive>

The pipeline’s input consists of two documents, in1.xml and test/in2.xml. Because the p:document/@href attributes have relative values, these paths are resolved against the location of the pipeline itself.
When constructing an archive, we usually don’t want the full on-disk paths of the files in the archive. In this case we choose to use their paths relative to the pipeline. To achieve this we need the path (directory) where the pipeline is stored. This is computed with the expression resolve-uri('.', static-base-uri()) and stored in the relative-to variable.
We then create the archive using p:archive. The two input documents appear on its source port. We do not provide a manifest on the manifest port, so one will get constructed automatically.
The names of the entries in the resulting archive get constructed by “subtracting” the value of the relative-to option from the base URIs of the source documents. The results will be their relative names against the pipeline’s location.
We store the resulting zip and, just to show you what’s inside, ask for an archive manifest using the p:archive-manifest step.
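The URI arithmetic in this example can be reproduced in Python. `urllib.parse.urljoin` resolves `.` against a file URI much like `resolve-uri('.', static-base-uri())` does, yielding the directory with a trailing slash. The pipeline location and the function name below are hypothetical, chosen for illustration only.

```python
from typing import List, Tuple
from urllib.parse import urljoin, urlsplit


def entry_names_for_pipeline(pipeline_uri: str, doc_hrefs: List[str]) -> Tuple[str, List[str]]:
    # Roughly what resolve-uri('.', static-base-uri()) computes: the pipeline's directory.
    relative_to = urljoin(pipeline_uri, ".")
    names = []
    for href in doc_hrefs:
        # Relative p:document/@href values resolve against the pipeline's location.
        base_uri = urljoin(pipeline_uri, href)
        if base_uri.startswith(relative_to):
            names.append(base_uri[len(relative_to):])  # "subtract" relative-to
        else:
            names.append(urlsplit(base_uri).path.lstrip("/"))
    return relative_to, names


print(entry_names_for_pipeline("file:///home/me/project/pipeline.xpl", ["in1.xml", "test/in2.xml"]))
# ('file:///home/me/project/', ['in1.xml', 'test/in2.xml'])
```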

Using the report port

The p:archive step also has a report port that outputs the manifest of the resulting archive. So, building on the Basic usage example, we could also have shown what’s inside the created archive like this:

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source" sequence="true">
    <p:document href="in1.xml"/>
    <p:document href="test/in2.xml"/>
  </p:input>
  <p:output port="result" pipe="report@create-archive"/>

  <p:variable name="relative-to" select="resolve-uri('.', static-base-uri())"/>

  <p:archive relative-to="{$relative-to}" name="create-archive"/>

  <p:store href="tmp/result.zip"/>

</p:declare-step>

Result document:

<c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
   <c:entry href="file:/…/…/in1.xml" name="in1.xml"/>
   <c:entry href="file:/…/…/test/in2.xml" name="test/in2.xml"/>
</c:archive>

Note that this manifest contains less information than what p:archive-manifest produces. What exactly the report contains is implementation-defined and therefore dependent on the XProc processor used.

Using a manifest

This example creates a manifest that references an additional file for the archive. Note that in the archive we give it a different name than its source, using the c:entry/@name attribute. When the manifest is processed, p:archive notices that test/in2.xml is not on its source port and therefore loads it from disk.

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0" name="example">

  <p:input port="source" href="in1.xml"/>
  <p:output port="result"/>

  <p:identity name="manifest">
    <p:with-input>
      <c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
        <c:entry name="test/extra.xml" href="test/in2.xml"/>
      </c:archive>
    </p:with-input>
  </p:identity>

  <p:variable name="relative-to" select="resolve-uri('.', static-base-uri())"/>
  <p:archive relative-to="{$relative-to}">
    <p:with-input pipe="source@example"/>
    <p:with-input port="manifest" pipe="result@manifest"/>
  </p:archive>

  <p:store href="tmp/result.zip"/>
  <p:archive-manifest relative-to="{$relative-to}"/>

</p:declare-step>

Result document:

<c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
   <c:entry name="test/extra.xml"
            content-type="application/xml"
            href="file:/…/…/test/extra.xml"
            method="deflated"
            size="62"
            compressed-size="49"
            time="2025-02-05T13:05:59+01:00"/>
   <c:entry name="in1.xml"
            content-type="application/xml"
            href="file:/…/…/in1.xml"
            method="deflated"
            size="91"
            compressed-size="80"
            time="2025-07-09T09:52:16+02:00"/>
</c:archive>

Additional details

The only document-property for the document appearing on the result port is content-type (its value depending on the archive’s format). Note that it has no base-uri document-property, and no document-properties from the documents on the source or archive ports survive.
Documents appearing on the source port must have a base-uri document-property, and all these base-uri document-properties must have unique values.
A relative value for the relative-to option is resolved against the base URI of the element in the pipeline it is specified on. In most cases this will be the path of the pipeline document.
The only format this step is required to handle is ZIP. The ZIP format is defined in the PKWARE .ZIP File Format Specification (APPNOTE.TXT).

Errors raised

Error code	Description
`XC0079`	It is a dynamic error if the map `parameters` contains an entry whose key is defined by the implementation and whose value is not valid for that key.
`XC0080`	It is a dynamic error if the number of documents on the `archive` port does not match the expected number of archive input documents for the given `format` and `command`.
`XC0081`	It is a dynamic error if the format of the archive does not match the format as specified in the `format` option.
`XC0084`	It is a dynamic error if two or more documents appear on the `p:archive` step's `source` port that have the same base URI or if any document that appears on the `source` port has no base URI.
`XC0085`	It is a dynamic error if the format of the archive does not match the specified format, cannot be understood, determined and/or processed.
`XC0100`	It is a dynamic error if the document on port `manifest` does not conform to the given schema.
`XC0112`	It is a dynamic error if more than one document appears on the port `manifest`.
`XC0118`	It is a dynamic error if an archive manifest is invalid according to the specification.
`XD0011`	It is a dynamic error if the resource referenced by the `href` option does not exist, cannot be accessed or is not a file.
`XD0064`	It is a dynamic error if the base URI is not both absolute and valid according to RFC 3986.

Reference information

This description of the p:archive step is for XProc version: 3.0. This is a required step (an XProc 3.0 processor must support this).

The formal specification for the p:archive step can be found in the XProc 3.0 Standard Step Library.

The p:archive step is part of categories:

The p:archive step is also present in version: 3.1.
