
p:archive-manifest (3.1)

Create an XML manifest document describing the contents of an archive file.

Summary

<p:declare-step type="p:archive-manifest">
  <input port="source" primary="true" content-types="any" sequence="false"/>
  <output port="result" primary="true" content-types="application/xml" sequence="false"/>
  <option name="format" as="xs:QName?" required="false" select="()"/>
  <option name="override-content-types" as="array(array(xs:string))?" required="false" select="()"/>
  <option name="parameters" as="map(xs:QName, item()*)?" required="false" select="()"/>
  <option name="relative-to" as="xs:anyURI?" required="false" select="()"/>
</p:declare-step>

The p:archive-manifest step creates an XML manifest document describing the contents of the archive file appearing on its source port (for instance a ZIP file).

Ports:

  • source (input; primary: true; content types: any; sequence: false): The archive file to create the manifest for.

  • result (output; primary: true; content types: application/xml; sequence: false): The created XML manifest document. See the p:archive step for a description of its format.

Options:

format (type: xs:QName?, required: false, default: ())

The format of the archive file on the source port:

  • If its value is zip, the p:archive-manifest step expects a ZIP archive on the source port.

  • If it is absent or the empty sequence, the p:archive-manifest step tries to guess the archive file format. The only format that this step is required to recognize and handle is ZIP.

  • Whether any other archive formats can be handled, and what their names (values for this option) are, depends on the XProc processor used.

override-content-types (type: array(array(xs:string))?, required: false, default: ())

Use this to override the content-type determination of the files in the archive (see Overriding content-types).

parameters (type: map(xs:QName, item()*)?, required: false, default: ())

Parameters used to control the XML manifest document generation. The XProc specification does not define any parameters for this option. A specific XProc processor might define its own.

relative-to (type: xs:anyURI?, required: false, default: ())

This option sets or overrides the base URI of the archive. If you don’t specify it, the base URI of the document appearing on the source port is used. This option is rarely needed, but you might need it when:

  • The archive on the source port has no base-uri document-property. This would raise error XC0120.

  • You use this manifest as the basis for creating a new archive with p:archive. The base URI plays an important role here, and setting it to a specific value is sometimes useful.
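As a minimal sketch of the format option, the pipeline below names the archive format explicitly instead of letting the processor guess it. It assumes the document on the source port is a ZIP archive; if it is not, the step should raise error XC0085.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Name the format explicitly: the value zip is converted to the QName zip.
       A source that is not a (readable) ZIP archive raises XC0085. -->
  <p:archive-manifest format="zip"/>

</p:declare-step>
```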

Description

The p:archive-manifest step takes an archive file (for instance a ZIP file) on its source port and returns on its result port an XML document describing the contents of the archive: the archive manifest. The archive manifest format is described in the p:archive step.

Archive manifests can be used in several ways. Some examples:

  • To inspect which files are present in an archive, for instance to check whether what you’ve got is complete.

  • As an input manifest for p:archive. This step takes, on its manifest port, a manifest like the one produced by p:archive-manifest and uses this to create a new archive or update an existing one. You could for instance first get a manifest using p:archive-manifest, change it to reflect the changes you need and then feed it to p:archive to produce a new archive.
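For the first use, a minimal sketch: the pipeline below passes the manifest through only if the archive contains an entry named reference.xml, and otherwise stops with an error. The required entry name, the my namespace URI and the my:missing-entry error code are all hypothetical; adapt them to your own completeness rules.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:my="http://example.org/ns/my" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:archive-manifest/>

  <!-- Inspect the manifest for a required entry (hypothetical name reference.xml) -->
  <p:choose>
    <p:when test="exists(/c:archive/c:entry[@name eq 'reference.xml'])">
      <!-- Entry present: pass the manifest on unchanged -->
      <p:identity/>
    </p:when>
    <p:otherwise>
      <!-- Entry missing: stop the pipeline with a hypothetical custom error code -->
      <p:error code="my:missing-entry">
        <p:with-input port="source">
          <p:inline><message>reference.xml is missing from the archive</message></p:inline>
        </p:with-input>
      </p:error>
    </p:otherwise>
  </p:choose>

</p:declare-step>
```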

Archives come in many formats. The only format the p:archive-manifest step is required to handle is ZIP. However, depending on the XProc processor used, other formats may also be processed.

Overriding content-types

One of the things the p:archive-manifest step does is determine the content-type (MIME type) of each archive entry. This is usually done based on the filename extension. The result is recorded in the c:entry/@content-type attribute of the manifest.

Sometimes it is useful to override this mechanism and assign specific content-types to some of the entries. For instance, the files Microsoft Office produces (.docx, .xlsx, etc.) are archives with a lot of XML documents inside. Some of these documents have the extension .rels and would therefore not be recognized as XML documents. The override-content-types option makes it possible to adjust this behavior.

The value of the override-content-types option must be an array of arrays. The inner arrays must have exactly two members:

  • The first member must be an XPath regular expression.

  • The second member must be a valid MIME content-type.

An archive entry’s content-type is then determined as follows:

  • The inner arrays of the override-content-types option value are processed in order of appearance (so order is significant).

  • The XPath regular expression (in the first member of the inner array) is matched against the full path of an entry in the archive (as in matches($path-in-archive, $regular-expression)).

  • If a match is found, the content-type (the second member of the inner array) is used as the entry’s content-type, and the remaining inner arrays are not checked.

  • If none of the inner arrays match, the normal mechanism for determining the content-type is used.

For example, setting the override-content-types option to [ ['\.rels$', 'application/xml'], ['^special/', 'application/octet-stream'] ] means that all files ending with .rels get the content-type application/xml, and all files in the archive’s special directory (including its sub-directories) get the content-type application/octet-stream. See also the Overriding content types example.
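Because the inner arrays are tried in order of appearance, a more specific pattern has to come before a more general one. A minimal sketch, reusing the special/ directory from the example above as a hypothetical layout: XML files in special/ keep application/xml, while every other file in that directory becomes application/octet-stream. Swapping the last two inner arrays would turn the XML files into application/octet-stream as well.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Order is significant: the specific special/*.xml rule precedes the
       catch-all special/ rule. The backslashes escape the dots, which would
       otherwise match any character. -->
  <p:archive-manifest>
    <p:with-option name="override-content-types"
                   select="[ ['\.rels$', 'application/xml'],
                             ['^special/.*\.xml$', 'application/xml'],
                             ['^special/', 'application/octet-stream'] ]"/>
  </p:archive-manifest>

</p:declare-step>
```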

Examples

Basic usage

Assume we have a simple ZIP archive with two entries:

  • An XML file called reference.xml in the root.

  • An image in an images/ sub-directory called logo.png.

The following pipeline creates an archive manifest for this ZIP file:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:archive-manifest/>

</p:declare-step>

Resulting archive manifest:

<c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
   <c:entry name="images/logo.png"
            content-type="image/png"
            href="file:/…/…/test.zip/images/logo.png"
            method="deflated"
            size="86656"
            compressed-size="85694"
            time="2024-07-04T11:12:22.4+02:00"/>
   <c:entry name="reference.xml"
            content-type="application/xml"
            href="file:/…/…/test.zip/reference.xml"
            method="deflated"
            size="78"
            compressed-size="77"
            time="2024-07-09T19:58:50.75+02:00"/>
</c:archive>

As you can see, the XProc processor I used to run this example (MorganaXProc-III) adds a few extra attributes to the <c:entry> elements: size, compressed-size and time.

Also note the contents of the c:entry/@href attributes: they are a combination of the full path/filename of the archive and the path of the entry within the archive (as in the c:entry/@name attribute). The c:entry/@href attribute plays an important role when creating archives using p:archive.
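To illustrate how a manifest can feed p:archive, here is a sketch that removes all entries under images/ from an archive: it creates the manifest, deletes the matching c:entry elements with p:delete, and passes the trimmed manifest to p:archive together with the original archive. It assumes, as the @href values above suggest, that p:archive resolves hrefs pointing into an archive supplied on its archive port; check the p:archive step's description before relying on this. The step names remove-images and trimmed-manifest are arbitrary.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                name="remove-images" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <!-- 1. Describe the original archive -->
  <p:archive-manifest/>

  <!-- 2. Drop every entry in the images/ directory from the manifest -->
  <p:delete match="c:entry[starts-with(@name, 'images/')]" name="trimmed-manifest"/>

  <!-- 3. Build a new archive from the trimmed manifest; the remaining
          entries are taken from the original archive -->
  <p:archive>
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
    <p:with-input port="manifest" pipe="result@trimmed-manifest"/>
    <p:with-input port="archive" pipe="source@remove-images"/>
  </p:archive>

</p:declare-step>
```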

Overriding content types

This example uses the same ZIP archive as in Basic usage. The following pipeline explicitly sets the content type for .png files to application/octet-stream:

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:archive-manifest>
    <p:with-option name="override-content-types" select="[ ['\.png$', 'application/octet-stream'] ]"/>
  </p:archive-manifest>

</p:declare-step>

Resulting archive manifest:

<c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
   <c:entry name="images/logo.png"
            content-type="application/octet-stream"
            href="file:/…/…/test.zip/images/logo.png"
            method="deflated"
            size="86656"
            compressed-size="85694"
            time="2024-07-04T11:12:22.4+02:00"/>
   <c:entry name="reference.xml"
            content-type="application/xml"
            href="file:/…/…/test.zip/reference.xml"
            method="deflated"
            size="78"
            compressed-size="77"
            time="2024-07-09T19:58:50.75+02:00"/>
</c:archive>

Using the relative-to option

This example uses the same ZIP archive as in Basic usage. It sets the relative-to option to file:///test/. This is reflected in the c:entry/@href attributes:

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:archive-manifest relative-to="file:///test/"/>

</p:declare-step>

Resulting archive manifest:

<c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
   <c:entry name="images/logo.png"
            content-type="image/png"
            href="file:///test/images/logo.png"
            method="deflated"
            size="86656"
            compressed-size="85694"
            time="2024-07-04T11:12:22.4+02:00"/>
   <c:entry name="reference.xml"
            content-type="application/xml"
            href="file:///test/reference.xml"
            method="deflated"
            size="78"
            compressed-size="77"
            time="2024-07-09T19:58:50.75+02:00"/>
</c:archive>

Additional details

  • The only document-property of the document appearing on the result port is content-type, with value application/xml. Note that it has no base-uri document-property, and that none of the document-properties of the document on the source port survive.

  • A relative value for the relative-to option is resolved against the base URI of the element in the pipeline it is specified on. In most cases this is the URI of the pipeline document.

  • The only format this step is required to handle is ZIP. The ZIP format definition can be found here.
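A minimal sketch of a relative relative-to value, assuming the pipeline is stored at the hypothetical location file:///home/user/pipelines/manifest.xpl. The value extracted/ then resolves to file:///home/user/pipelines/extracted/, so, following the pattern of the relative-to example, the entry images/logo.png would get the href file:///home/user/pipelines/extracted/images/logo.png.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <!-- A relative value is resolved against the base URI of this element,
       normally the URI of the pipeline document itself -->
  <p:archive-manifest relative-to="extracted/"/>

</p:declare-step>
```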

Errors raised

  • XC0079: It is a dynamic error if the parameters map contains an entry whose key is defined by the implementation and whose value is not valid for that key.

  • XC0085: It is a dynamic error if the format of the archive does not match the specified format, or if it cannot be understood, determined and/or processed.

  • XC0120: It is a dynamic error if the relative-to option is not present and the document on the source port does not have a base URI.

  • XC0146: It is a dynamic error if the specified value for the override-content-types option is not an array of arrays, where the inner arrays have exactly two members of type xs:string.

  • XC0147: It is a dynamic error if a regular expression specified in the override-content-types option is not a valid XPath regular expression.

  • XD0064: It is a dynamic error if the base URI is not both absolute and valid according to RFC 3986.

  • XD0079: It is a dynamic error if a supplied content-type is not a valid media type of the form “type/subtype+ext” or “type/subtype”.
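Since XC0085 is raised for input that cannot be processed as an archive, a pipeline can catch it with p:try. A minimal sketch, assuming that an empty <c:archive/> manifest is an acceptable fallback for your use case:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:err="http://www.w3.org/ns/xproc-error" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:try>
    <p:archive-manifest format="zip"/>
    <!-- Only catch the "archive cannot be processed" error -->
    <p:catch code="err:XC0085">
      <!-- Fall back to a manifest without any entries -->
      <p:identity>
        <p:with-input port="source">
          <p:inline><c:archive/></p:inline>
        </p:with-input>
      </p:identity>
    </p:catch>
  </p:try>

</p:declare-step>
```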

Reference information

This description of the p:archive-manifest step is for XProc version: 3.1. This is a required step (an XProc 3.1 processor must support this).

The formal specification for the p:archive-manifest step can be found here.

The p:archive-manifest step is part of categories:

The p:archive-manifest step is also present in version: 3.0.