Perform operations on archive files.
<p:declare-step type="p:archive">
  <p:input port="source" primary="true" content-types="any" sequence="true"/>
  <p:output port="result" primary="true" content-types="any" sequence="false"/>
  <p:input port="archive" primary="false" content-types="any" sequence="true">
    <p:empty/>
  </p:input>
  <p:input port="manifest" primary="false" content-types="xml" sequence="true">
    <p:empty/>
  </p:input>
  <p:output port="report" primary="false" content-types="application/xml" sequence="false"/>
  <p:option name="format" as="xs:QName" required="false" select="'zip'"/>
  <p:option name="parameters" as="map(xs:QName, item()*)?" required="false" select="()"/>
  <p:option name="relative-to" as="xs:anyURI?" required="false" select="()"/>
</p:declare-step>
The p:archive step can perform several different operations on archive files (for instance ZIP files). The most common one will likely be creating an archive, but the step can also provide services like update, freshen or even merge. The resulting archive appears on its result port.
Ports:
Port | Type | Primary? | Content types | Seq? | Description |
---|---|---|---|---|---|
source | input | true | any | true | The … |
result | output | true | any | false | The resulting archive. |
archive | input | false | any | true | Optional archives for operations like update, freshen or merge. |
manifest | input | false | xml | true | An optional manifest document that tells the step how to construct the archive. If no manifest document is provided on this port, a default manifest is constructed automatically. See The XML archive manifest document format for details. |
report | output | false | application/xml | false | A report about the archiving operation. This will be the same as the manifest, optionally amended with additional attributes and/or elements. |
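Options:
Name | Type | Req? | Default | Description |
---|---|---|---|---|
format | xs:QName | false | zip | The format of the archive. |
parameters | map(xs:QName, item()*)? | false | () | A map that associates parameters (the keys in the map) with values. |
relative-to | xs:anyURI? | false | () | Controls the path in the archive of documents that appear on the source port. |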
The p:archive step is the Swiss army knife for handling archives. Its most common use is creating archives, but it can also be used for operations like update, freshen or even merge.
To make all this possible, the operation of p:archive is unfortunately quite complicated. The details are below; here is a summary:
What exactly ends up in the resulting archive is controlled by a manifest document (see The XML archive manifest document format). In such a manifest you specify the URI of each document to add and the path of this document in the archive.
A manifest of an existing archive, sometimes useful as a starting point, can be produced using the p:archive-manifest step.
Besides the documents in the manifest, you can also specify documents to add by providing them on the step’s source port. Any document appearing on this port that is not already mentioned in the manifest is automatically added to the manifest. The path of such a document in the resulting archive can be controlled using the relative-to option.
When adding documents to the archive, p:archive compares the URIs in the manifest with the base URIs of the documents appearing on the source port (the value of the base-uri document-property). If these match, the document on the source port is added. If not, the URI in the manifest is used to load a document (usually from disk).
Archives come in many formats. The only format the p:archive step is required to handle is ZIP. However, depending on the XProc processor used, other formats may also be processed.
An archive manifest is an XML document that specifies the files to process when constructing the archive. It is also used as the result format of the p:archive-manifest step.
Its root element is <c:archive> (the c prefix here is bound to the http://www.w3.org/ns/xproc-step namespace).
A <c:entry> element describes a single entry (a file) in the archive:
<c:entry
  name = xs:string
  href = xs:anyURI
  content-type? = xs:string
  comment? = xs:string
  method? = xs:string
  level? = xs:string
  (any other attribute) >
  (any child element)*
</c:entry>
Attribute | # | Type | Description |
---|---|---|---|
name | 1 | xs:string | The name of the entry. This is the path of the file within the archive. Usually this is a relative path. However, depending on how archives are constructed, an absolute path (a path starting with a … |
href | 1 | xs:anyURI | The URI of the entry. This plays an important role in determining which files are added to the archive and how, see below. A relative value is made absolute against the base URI of the manifest itself. |
content-type | ? | xs:string | The content-type (MIME type) of the entry. The … |
comment | ? | xs:string | An optional comment associated with the entry. |
method | ? | xs:string | The compression method of the entry. There is only one defined value: … |
level | ? | xs:string | The compression level of the entry. There are no defined values; all values are XProc processor dependent. |
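To make the element description above concrete, here is a minimal sketch of a manifest with two entries. The file names, paths and the comment text are hypothetical; only the element and attribute names come from the description above.

```xml
<c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
  <!-- Relative href: made absolute against the base URI of the manifest itself -->
  <c:entry name="docs/readme.xml"
           href="readme.xml"
           content-type="application/xml"
           comment="Project overview"/>
  <!-- Absolute href: used as is; name is the path inside the archive -->
  <c:entry name="data/input.xml"
           href="file:///some/path/input.xml"/>
</c:archive>
```

Only name and href are required. Whether the optional attributes have any effect depends on the XProc processor and the archive format.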
p:archive algorithm
The p:archive step follows a rather complicated algorithm. It has two phases:
1 - Construct a complete manifest
First, the manifest (the document, if any, appearing on the manifest port) is checked and completed if necessary:
If no document appears on the manifest port, an empty manifest is created.
The base URIs of the documents appearing on the source port are compared against the list of URIs in the manifest (the c:entry/@href values, made absolute). For every document on the source port that is not in the manifest, an entry (<c:entry> element) is created:
The c:entry/@href attribute becomes the base URI of the document.
The c:entry/@name attribute (which is the path/name of the entry in the archive) is computed against the value of the relative-to option:
If the base URI of the document starts with the value of the relative-to option, the c:entry/@name attribute value becomes the substring after this.
If the base URI of the document does not start with the value of the relative-to option, the c:entry/@name attribute value becomes the path of this base URI (without a leading /).
For instance, assume the relative-to option is set to file:///some/path/. A document with base URI file:///some/path/etc/x.txt gets a c:entry/@name attribute value etc/x.txt. A document with base URI file:///someother/path/y.txt gets a c:entry/@name attribute value someother/path/y.txt.
The result of all this is a manifest that has entries (<c:entry> elements) for all documents appearing on the source port. It can also have entries for documents that are not on the source port. This happens when an entry was present in the initial manifest and no matching document on the source port was found for it.
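As an illustration of phase 1, the completed manifest for the two documents in the relative-to example above would look roughly like the sketch below, assuming no manifest was supplied and these are the only documents on the source port. A processor may add further attributes.

```xml
<c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
  <!-- Base URI starts with file:///some/path/ so the name is the remainder -->
  <c:entry name="etc/x.txt"
           href="file:///some/path/etc/x.txt"/>
  <!-- Base URI does not start with relative-to: name is its path without the leading / -->
  <c:entry name="someother/path/y.txt"
           href="file:///someother/path/y.txt"/>
</c:archive>
```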
2 - Process the manifest
The now completed manifest is processed. For every entry (<c:entry> element):
If the value of the c:entry/@href attribute matches the base URI of one of the documents appearing on the source port, this document is added to the archive. When appropriate (for instance for XML documents), the value of its (optional) serialization document-property is used for serializing it (converting it to text).
For other entries, the value of the c:entry/@href attribute is used to load the file (for instance from disk if it starts with file:/) and add it to the archive. These documents are used “as is”: no parsing/serialization takes place.
In both cases, the value of the c:entry/@name attribute becomes the name/path of the entry in the archive. The values of the other attributes of the <c:entry> element might also get used, but this depends on the XProc processor used and/or the archive’s format.
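Because documents taken from the source port are serialized using their serialization document-property, one way to influence how they end up in the archive is to set that property first. The following is a minimal sketch, assuming the standard p:set-properties step is used for this and that the processor honors the indent serialization parameter. The file names are the hypothetical ones used elsewhere on this page.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source">
    <p:document href="in1.xml"/>
  </p:input>
  <p:output port="result"/>

  <p:variable name="relative-to" select="resolve-uri('.', static-base-uri())"/>

  <!-- Assumption: request indented output when p:archive serializes this document -->
  <p:set-properties merge="true">
    <p:with-option name="properties"
                   select="map{'serialization': map{'indent': true()}}"/>
  </p:set-properties>

  <p:archive relative-to="{$relative-to}"/>
  <p:store href="tmp/result.zip"/>
  <p:archive-manifest relative-to="{$relative-to}"/>
</p:declare-step>
```

This only affects documents that arrive on the source port. Entries that p:archive loads itself through c:entry/@href are copied "as is".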
The p:archive step is supposed to retain the order of the <c:entry> elements. This is, for instance, important when constructing an e-book in EPUB format, which has a non-compressed entry that must be first in the archive.
When the value of the format option is absent or zip, the following applies:
The values of the c:entry/@name attributes in the manifest must be relative paths (without a leading /).
The archive port accepts zero or one ZIP archive. If this port is empty, an empty ZIP archive is used as its default value.
The parameters option is a map that associates parameters (the keys in the map) with values. For ZIP archives, the following parameters can be used:
Parameter | Description |
---|---|
command | Specifies the operation to perform. Its default value is … |
… | For entries that have no … |
… | For entries that have no … |
The command parameter can have one of the following values:
Command | Description |
---|---|
… | The archive appearing on the … Please note that when there is no document on the … |
… | This behaves like the … |
… | This behaves like the … |
… | For the … |
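A sketch of how the command parameter might be passed, together with an existing archive on the archive port. The command value update is an assumption based on the operations named in the introduction, and existing.zip and tmp/updated.zip are hypothetical file names. Check your processor's documentation for the values it actually supports.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" sequence="true">
    <p:document href="in1.xml"/>
  </p:input>
  <p:output port="result"/>

  <p:variable name="relative-to" select="resolve-uri('.', static-base-uri())"/>

  <p:archive relative-to="{$relative-to}">
    <!-- The existing ZIP archive to operate on (hypothetical file name) -->
    <p:with-input port="archive" href="existing.zip"/>
    <!-- The parameters map; 'update' is an assumed command value -->
    <p:with-option name="parameters" select="map{'command': 'update'}"/>
  </p:archive>

  <p:store href="tmp/updated.zip"/>
  <p:archive-manifest relative-to="{$relative-to}"/>
</p:declare-step>
```

The parameters option is set with p:with-option here so that the map is unambiguously evaluated as an XPath expression.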
In most cases, the p:archive step will probably be used to create an archive. If you have no special requirements this is easy: simply supply the documents for the archive on the step’s source port. The only thing you need to take into account is the name/path of the entries in the archive, for which the relative-to option is important.
Pipeline document:
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" sequence="true">
    <p:document href="in1.xml"/>
    <p:document href="test/in2.xml"/>
  </p:input>
  <p:output port="result"/>
  <p:variable name="relative-to" select="resolve-uri('.', static-base-uri())"/>
  <p:archive relative-to="{$relative-to}"/>
  <p:store href="tmp/result.zip"/>
  <p:archive-manifest relative-to="{$relative-to}"/>
</p:declare-step>
Result document:
<c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
  <c:entry name="in1.xml" content-type="application/xml" href="file:/…/…/in1.xml" method="deflated" size="92" compressed-size="81" time="2024-12-02T13:40:20+01:00"/>
  <c:entry name="test/in2.xml" content-type="application/xml" href="file:/…/…/test/in2.xml" method="deflated" size="99" compressed-size="85" time="2024-12-02T13:40:20+01:00"/>
</c:archive>
The pipeline’s input consists of two documents, in1.xml and test/in2.xml. Note that (because the p:document/@href attributes have relative values) the paths to these documents are relative to the location of the pipeline itself.
When we construct an archive, we usually don’t want the full path of the files on disk to end up in the archive as well. In this case we choose to use their paths relative to the pipeline. To achieve this we need the path (directory) where the pipeline is stored. This is computed with the expression resolve-uri('.', static-base-uri()) and stored in the relative-to variable.
We then create the archive using p:archive. The two input documents appear on its source port. We do not provide a manifest on the manifest port, so one is constructed automatically.
The names of the entries in the resulting archive are constructed by “subtracting” the value of the relative-to option from the base URIs of the source documents. The results are their names relative to the pipeline’s location.
We store the resulting ZIP file and, just to show you what’s inside, ask for an archive manifest using the p:archive-manifest step.
The p:archive step also has a report port that outputs the manifest of the resulting archive. So, building on the Basic usage example, we could also have shown what’s inside the created archive like this:
Pipeline document:
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" sequence="true">
    <p:document href="in1.xml"/>
    <p:document href="test/in2.xml"/>
  </p:input>
  <p:output port="result" pipe="report@create-archive"/>
  <p:variable name="relative-to" select="resolve-uri('.', static-base-uri())"/>
  <p:archive relative-to="{$relative-to}" name="create-archive"/>
  <p:store href="tmp/result.zip"/>
</p:declare-step>
Result document:
<c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
  <c:entry href="file:/…/…/in1.xml" name="in1.xml"/>
  <c:entry href="file:/…/…/test/in2.xml" name="test/in2.xml"/>
</c:archive>
Note that this manifest contains less information than the one p:archive-manifest produces. What exactly happens here is implementation-defined and therefore dependent on the XProc processor used.
This example creates a manifest that references an additional file for the archive. Note that, using the c:entry/@name attribute, we give it a different name in the archive than its source has. When the manifest is processed, p:archive notices that test/in2.xml is not on its source port and therefore loads it from disk.
Pipeline document:
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0" name="example">
  <p:input port="source" href="in1.xml"/>
  <p:output port="result"/>
  <p:identity name="manifest">
    <p:with-input>
      <c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
        <c:entry name="test/extra.xml" href="test/in2.xml"/>
      </c:archive>
    </p:with-input>
  </p:identity>
  <p:variable name="relative-to" select="resolve-uri('.', static-base-uri())"/>
  <p:archive relative-to="{$relative-to}">
    <p:with-input pipe="source@example"/>
    <p:with-input port="manifest" pipe="result@manifest"/>
  </p:archive>
  <p:store href="tmp/result.zip"/>
  <p:archive-manifest relative-to="{$relative-to}"/>
</p:declare-step>
Result document:
<c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
  <c:entry name="test/extra.xml" content-type="application/xml" href="file:/…/…/test/extra.xml" method="deflated" size="60" compressed-size="47" time="2024-09-03T10:36:32+02:00"/>
  <c:entry name="in1.xml" content-type="application/xml" href="file:/…/…/in1.xml" method="deflated" size="92" compressed-size="81" time="2024-12-02T13:40:20+01:00"/>
</c:archive>
The only document-property for the document appearing on the result port is content-type (its value depending on the archive’s format). Note that it has no base-uri document-property and that no document-properties from the documents on the source or archive port survive.
Documents appearing on the source port must have a base-uri document-property. Each of these base-uri document-properties must have a unique value.
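Documents generated inside a pipeline do not always have a suitable base URI of their own. A minimal sketch of one way to give such a document a distinct base-uri before archiving follows, assuming p:set-properties is used to set it. The URI file:///generated/doc1.xml, the <generated> element and the output file name are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>

  <!-- A document created in the pipeline itself (hypothetical content) -->
  <p:identity>
    <p:with-input>
      <generated>some content</generated>
    </p:with-input>
  </p:identity>

  <!-- Assumption: assign an absolute base URI that no other source document uses -->
  <p:set-properties merge="true">
    <p:with-option name="properties"
                   select="map{'base-uri': 'file:///generated/doc1.xml'}"/>
  </p:set-properties>

  <!-- With this relative-to value the entry name becomes doc1.xml -->
  <p:archive relative-to="file:///generated/"/>
  <p:store href="tmp/generated.zip"/>
</p:declare-step>
```

When several generated documents go into one archive, each needs its own base URI, otherwise the uniqueness requirement above is violated.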
A relative value for the relative-to option is resolved against the base URI of the element in the pipeline on which it is specified. In most cases this will be the path of the pipeline document.
The only format this step is required to handle is ZIP. The ZIP format definition can be found here.
Error code | Description |
---|---|
… | It is a dynamic error if the map … |
… | It is a dynamic error if the number of documents on the … |
… | It is a dynamic error if the format of the archive does not match the format as specified in the … |
… | It is a dynamic error if two or more documents appear on the … |
… | It is a dynamic error if the format of the archive does not match the specified format, cannot be understood, determined and/or processed. |
… | It is a dynamic error if the document on port … |
… | It is a dynamic error if more than one document appears on the port … |
… | It is a dynamic error if an archive manifest is invalid according to the specification. |
… | It is a dynamic error if the resource referenced by the … |
… | It is a dynamic error if the base URI is not both absolute and valid according to RFC 3986. |
This description of the p:archive step is for XProc version: 3.1. This is a required step (an XProc 3.1 processor must support this).
The formal specification for the p:archive step can be found here.
The p:archive step is also present in version: 3.0.