p:directory-list (3.1) 

List the contents of a directory.

Summary

<p:declare-step type="p:directory-list">
  <output port="result" primary="true" content-types="application/xml" sequence="false"/>
  <option name="path" as="xs:anyURI" required="true"/>
  <option name="detailed" as="xs:boolean" required="false" select="false()"/>
  <option name="exclude-filter" as="xs:string*" required="false" select="()"/>
  <option name="include-filter" as="xs:string*" required="false" select="()"/>
  <option name="max-depth" as="xs:string?" required="false" select="'1'"/>
  <option name="override-content-types" as="array(array(xs:string))?" required="false" select="()"/>
</p:declare-step>

The p:directory-list step produces an XML document that contains an overview of the contents of a specified directory.

Ports:

| Port | Type | Primary? | Content types | Seq? | Description |
| --- | --- | --- | --- | --- | --- |
| result | output | true | application/xml | false | The resulting XML document that describes the contents of the directory. See The result document. |

Options:

| Name | Type | Req? | Default | Description |
| --- | --- | --- | --- | --- |
| path | xs:anyURI | true | | The path of the directory whose contents are described. |
| detailed | xs:boolean | false | false | Whether detailed information about the directory and its contents is returned. See The result document. |
| exclude-filter | xs:string* | false | () | A sequence of XPath regular expressions that specify which directories/files are excluded. See Including and excluding files and directories. See also the Including and excluding files example. |
| include-filter | xs:string* | false | () | A sequence of XPath regular expressions that specify which directories/files are included. See Including and excluding files and directories. See also the Including and excluding files example. |
| max-depth | xs:string? | false | 1 | How deep (how many levels of subdirectories) the directory is described. Its value must be a string that can be cast to a non-negative integer, or the word unbounded. A value of 0 means that only information about the given directory is returned. A value of 1 (default) returns information about the direct contents of the given directory. A numerical value greater than 1 returns information up to that level of subdirectories. The value unbounded returns information about all subdirectories. See also the Changing the depth of the directory listing example. |
| override-content-types | array(array(xs:string))? | false | () | Use this to override the content-type determination of files. Determining the content-type of a file happens when you ask for detailed information (the detailed option is set to true). This works just like the mechanism for the override-content-types option of p:archive-manifest, except that the regular expression matching is done against the paths as used for the matching of the include-filter and exclude-filter options. For more information see Including and excluding files and directories. |
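
As an illustration of the override-content-types option, the following sketch asks for detailed information and reports every file whose relative path ends with .tmp as text/plain. It assumes the array-of-pairs format used by p:archive-manifest (a regular expression followed by a content type). The data directory and the .tmp pattern are assumptions made for this example only.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <!-- Content types are only determined when detailed is true -->
  <p:directory-list path="data" detailed="true" max-depth="unbounded">
    <!-- Each inner array holds: regular expression, content type -->
    <p:with-option name="override-content-types"
                   select="[ ['\.tmp$', 'text/plain'] ]"/>
  </p:directory-list>

</p:declare-step>
```

Because the value is an array, it has to be passed with a <p:with-option> child element rather than as an attribute.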

Description

The p:directory-list step provides you with an overview of the contents of a directory, similar to a Windows dir or a Unix/Linux/macOS ls command. This often comes in handy, for instance when you need to perform some operation on all files in a directory (or a directory tree). The Handling all files in a directory example shows how to do this.

The p:directory-list step takes a directory path as its main input in the path option. The result port emits a document (see The result document) that describes this directory by listing its contents (files and subdirectories). What happens exactly depends on the settings of the other options. The step has no input port(s).

The directory to describe, as specified in the path option, must exist. Otherwise, error XC0017 is raised.

Including and excluding files and directories

The include-filter and exclude-filter options determine which files and directories are included in or excluded from the result. Each option is a sequence of (zero or more) XPath regular expression strings.

  • If the include-filter is not specified (or is the empty sequence), all files/directories are included.

    Otherwise, every regular expression string in the option value is matched against the relative file/directory paths (relative to the path that was given in the path option). A match means the file/directory is included.

  • If the exclude-filter is not specified (or is the empty sequence), no files/directories are excluded.

    Otherwise, every regular expression string in the option value is matched against the relative file/directory paths (relative to the path that was given in the path option). A match means the file/directory is excluded.

  • A file/directory is part of the result if it is included and not excluded.

Matching the regular expressions behaves like applying the XPath matches() function (like in matches($relative-path, $regular-expression)).
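
To make the interplay of the two options concrete, here is a minimal sketch that restricts the listing to one subtree and drops temporary files from it. The directory name data, the subdirectory sub1, and the .tmp extension are assumptions for illustration. Since matching follows matches(), a pattern matches anywhere in the relative path unless it is anchored with ^ or $.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <!-- Included: relative paths that start with sub1/ -->
  <!-- Excluded: relative paths that end with .tmp -->
  <p:directory-list path="data" max-depth="unbounded"
                    include-filter="^sub1/"
                    exclude-filter="\.tmp$"/>

</p:declare-step>
```

Under these assumptions a file with the relative path sub1/notes.txt would be part of the result, sub1/notes.tmp would not (it is excluded), and a top-level notes.txt would not either (it is not included).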

The result document

The root element of the resulting XML document is <c:directory> (the c prefix here is bound to the http://www.w3.org/ns/xproc-step namespace):

<c:directory name = xs:string
             xml:base = xs:anyURI
             hidden? = xs:boolean
             last-modified? = xs:dateTime
             readable? = xs:boolean
             size? = xs:integer
             writable? = xs:boolean >
  ( <c:file> |
    <c:directory> |
    <c:other>  )*
</c:directory>

 

| Attribute | # | Type | Description |
| --- | --- | --- | --- |
| name | 1 | xs:string | The name of the directory (without a path in front). |
| xml:base | 1 | xs:anyURI | The URI of the directory, always ending with a slash. For the root <c:directory> element this is the absolute path of the directory described. For any nested <c:directory> element this is the name of the directory. |
| hidden | ? | xs:boolean | Whether this directory is hidden for the current user. See below. |
| last-modified | ? | xs:dateTime | The date and time this directory was last modified. See below. |
| readable | ? | xs:boolean | Whether this directory is readable for the current user. See below. |
| size | ? | xs:integer | The size of the directory entry (in bytes). See below. |
| writable | ? | xs:boolean | Whether this directory is writable for the current user. See below. |

 

| Child element | # | Description |
| --- | --- | --- |
| c:file | * | A file in the given directory. |
| c:directory | * | A subdirectory in the given directory. |
| c:other | * | Anything in the given directory that is “special”. What is considered special is implementation defined and therefore depends on the XProc processor used. |

Every file in a directory is described using a <c:file> element:

<c:file name = xs:string
        xml:base = xs:anyURI
        content-type? = xs:string
        hidden? = xs:boolean
        last-modified? = xs:dateTime
        readable? = xs:boolean
        size? = xs:integer
        writable? = xs:boolean />

 

| Attribute | # | Type | Description |
| --- | --- | --- | --- |
| name | 1 | xs:string | The name of the file (without a path in front). |
| xml:base | 1 | xs:anyURI | The name of the file (identical to the name attribute). |
| content-type | ? | xs:string | The content-type (MIME type) of this file. If this cannot be determined, its value is application/octet-stream. See below. |
| hidden | ? | xs:boolean | Whether this file is hidden for the current user. See below. |
| last-modified | ? | xs:dateTime | The date and time this file was last modified. See below. |
| readable | ? | xs:boolean | Whether this file is readable for the current user. See below. |
| size | ? | xs:integer | The size of the file entry (in bytes). See below. |
| writable | ? | xs:boolean | Whether this file is writable for the current user. See below. |

Anything else in a directory is described using the <c:other> element. This looks just like the <c:file> element, but without a content-type attribute.

 

About the optional attributes on the result elements:

  • If the detailed option is false (default), only the name and xml:base attributes are present.

  • If the detailed option is true, the other, optional attributes are present as well.

What the values of the various attributes actually mean is implementation defined and therefore depends on the XProc processor used. For most attributes there will be no surprises, but what, for instance, is the size of a directory? It may take some experiments to get things right.
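
One way to use the optional attributes is to post-process the detailed listing. The sketch below removes all empty files from the result with p:delete. It assumes a directory called data next to the pipeline, and it assumes that the processor reports empty files with size="0", which, as noted above, is implementation defined.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step" version="3.0">

  <p:output port="result"/>

  <p:directory-list path="data" detailed="true" max-depth="unbounded"/>

  <!-- Drop every c:file element whose reported size is zero -->
  <p:delete match="c:file[@size = '0']"/>

</p:declare-step>
```

The c prefix must be declared on the pipeline because it is used in the match pattern.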

Examples

Basic usage

Assume we have a disk layout that looks like this:

-- data -- + -- x1.txt
           |
           + -- x1.xml
           |
           + -- sub1/ -- + -- sub1-x1.xml
                         |
                         + -- sub2/ -- + -- sub2.tmp
                                       |
                                       + -- sub2-x1.txt

For the examples to come we assume this data directory is in the same location as our pipeline. Simply asking for the directory listing, using the default values for the options of p:directory-list, is as follows:

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <p:directory-list path="data"/>

</p:declare-step>

Result document:

<c:directory xmlns:c="http://www.w3.org/ns/xproc-step"
             xml:base="file:/…/…/data/"
             name="data">
   <c:directory xml:base="sub1/" name="sub1"/>
   <c:file xml:base="x1.txt" name="x1.txt"/>
   <c:file xml:base="x1.xml" name="x1.xml"/>
</c:directory>

When we ask for details, the following happens:

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <p:directory-list path="data" detailed="true"/>

</p:declare-step>

Result document:

<c:directory xmlns:c="http://www.w3.org/ns/xproc-step"
             xmlns:mox="http://www.xml-project.com/morganaxproc"
             xml:base="file:/…/…/data/"
             name="data"
             readable="true"
             writable="true"
             mox:executable="true"
             hidden="false"
             last-modified="2024-12-31T14:05:13.97Z"
             size="0">
   <c:directory xml:base="sub1/"
                name="sub1"
                readable="true"
                writable="true"
                mox:executable="true"
                hidden="false"
                last-modified="2024-12-27T11:30:00.95Z"
                size="0"/>
   <c:file xml:base="x1.txt"
           name="x1.txt"
           content-type="text/plain"
           readable="true"
           writable="true"
           mox:executable="true"
           hidden="false"
           last-modified="2024-12-27T11:30:00.96Z"
           size="0"/>
   <c:file xml:base="x1.xml"
           name="x1.xml"
           content-type="application/xml"
           readable="true"
           writable="true"
           mox:executable="true"
           hidden="false"
           last-modified="2024-12-31T14:05:13.97Z"
           size="83"/>
</c:directory>

Changing the depth of the directory listing

The following examples work on the same directory structure as described in Basic usage. Asking for a directory description with the max-depth option set to 0 just gives us the main directory itself:

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <p:directory-list path="data" max-depth="0"/>

</p:declare-step>

Result document:

<c:directory xmlns:c="http://www.w3.org/ns/xproc-step"
             xml:base="file:/…/…/data/"
             name="data"/>

And getting the full directory structure is as follows:

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <p:directory-list path="data" max-depth="unbounded"/>

</p:declare-step>

Result document:

<c:directory xmlns:c="http://www.w3.org/ns/xproc-step"
             xml:base="file:/…/…/data/"
             name="data">
   <c:directory xml:base="sub1/" name="sub1">
      <c:file xml:base="sub1-x1.xml" name="sub1-x1.xml"/>
      <c:directory xml:base="sub2/" name="sub2">
         <c:file xml:base="sub2-x1.txt" name="sub2-x1.txt"/>
         <c:file xml:base="sub2.tmp" name="sub2.tmp"/>
      </c:directory>
   </c:directory>
   <c:file xml:base="x1.txt" name="x1.txt"/>
   <c:file xml:base="x1.xml" name="x1.xml"/>
</c:directory>
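
The examples above cover the values 0 and unbounded. As a sketch of the remaining case, a numerical value greater than 1, the following pipeline asks for two levels on the same directory structure:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <!-- Level 1: the contents of data. Level 2: the contents of sub1. -->
  <p:directory-list path="data" max-depth="2"/>

</p:declare-step>
```

Based on the description of the max-depth option, one would expect the contents of sub1 to be listed, while sub2 appears as an empty <c:directory> element because its contents are at the third level.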

Including and excluding files

The following examples work on the same directory structure as described in Basic usage. Assume we only need the text files in the directory tree: all files ending with .txt. A regular expression that matches this is \.txt$, so we have to pass this as the value of the include-filter option:

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <p:directory-list path="data" include-filter="\.txt$" max-depth="unbounded"/>

</p:declare-step>

Result document:

<c:directory xmlns:c="http://www.w3.org/ns/xproc-step"
             xml:base="file:/…/…/data/"
             name="data">
   <c:directory xml:base="sub1/" name="sub1">
      <c:directory xml:base="sub2/" name="sub2">
         <c:file xml:base="sub2-x1.txt" name="sub2-x1.txt"/>
      </c:directory>
   </c:directory>
   <c:file xml:base="x1.txt" name="x1.txt"/>
</c:directory>

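Now assume that entries whose relative path starts with an x are not interesting. We can exclude these by passing the regular expression ^x as the value of the exclude-filter option. Because the expression is matched against the path relative to the data directory, this removes x1.txt but keeps sub1/sub2/sub2-x1.txt: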

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <p:directory-list path="data" include-filter="\.txt$" exclude-filter="^x" max-depth="unbounded"/>

</p:declare-step>

Result document:

<c:directory xmlns:c="http://www.w3.org/ns/xproc-step"
             xml:base="file:/…/…/data/"
             name="data">
   <c:directory xml:base="sub1/" name="sub1">
      <c:directory xml:base="sub2/" name="sub2">
         <c:file xml:base="sub2-x1.txt" name="sub2-x1.txt"/>
      </c:directory>
   </c:directory>
</c:directory>

Finally, assume we need both the XML and the text files in the directory tree, but not anything else. For this we could do two things:

  • Create a regular expression that incorporates both, and pass it as an include-filter attribute on the <p:directory-list> element, just like we did in the examples above: <p:directory-list path="data" include-filter="\.(xml|txt)$" max-depth="unbounded"/>

  • Or we could pass a regular expression for each file type. If we do it this way we can no longer pass the include-filter option as an attribute. We have to use a <p:with-option> child element:

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <p:directory-list path="data" max-depth="unbounded">
    <p:with-option name="include-filter" select="('\.xml$', '\.txt$')"/>
  </p:directory-list>

</p:declare-step>

Result document:

<c:directory xmlns:c="http://www.w3.org/ns/xproc-step"
             xml:base="file:/…/…/data/"
             name="data">
   <c:directory xml:base="sub1/" name="sub1">
      <c:file xml:base="sub1-x1.xml" name="sub1-x1.xml"/>
      <c:directory xml:base="sub2/" name="sub2">
         <c:file xml:base="sub2-x1.txt" name="sub2-x1.txt"/>
      </c:directory>
   </c:directory>
   <c:file xml:base="x1.txt" name="x1.txt"/>
   <c:file xml:base="x1.xml" name="x1.xml"/>
</c:directory>

Handling all files in a directory

Again, the following examples work on the same directory structure as described in Basic usage. Assume we need to do something with all XML documents in the data directory. Using p:directory-list we can easily get the names of these files. However, to process them we need to load them, and for that it's handy to have their full absolute URIs. The p:make-absolute-uris step can provide these by changing the name attributes into full URIs:

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <p:directory-list path="data" include-filter="\.xml$" max-depth="unbounded"/>
  <p:make-absolute-uris match="@name"/>

</p:declare-step>

Result document:

<c:directory xmlns:c="http://www.w3.org/ns/xproc-step"
             xml:base="file:/…/…/data/"
             name="file:/…/…/data/data">
   <c:directory xml:base="sub1/" name="file:/…/…/data/sub1/sub1">
      <c:file xml:base="sub1-x1.xml" name="file:/…/…/data/sub1/sub1-x1.xml"/>
   </c:directory>
   <c:file xml:base="x1.xml" name="file:/…/…/data/x1.xml"/>
</c:directory>

We can now use this result to process all the XML documents. The following pipeline simply loads them (using p:load), and wraps all contents (using p:wrap-sequence) in an <all-xml-documents> element:

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" version="3.0">

  <p:output port="result"/>

  <p:directory-list path="data" include-filter="\.xml$" max-depth="unbounded"/>
  <p:make-absolute-uris match="@name"/>

  <p:for-each>
    <p:with-input select="//c:file"/>
    <p:load href="{/*/@name}"/>
  </p:for-each>
  <p:wrap-sequence wrapper="all-xml-documents"/>

</p:declare-step>

Result document:

<all-xml-documents>
   <data>This is document sub1/sub1-x1.xml</data>
   <data>This is document data/x1.xml</data>
</all-xml-documents>

Additional details

  • On the result document, only the base-uri document property is set. Its value is the absolute URI of the directory described.

  • A relative value for the path option is resolved against the base URI of the element on which this option is specified. In most cases this will be the static base URI of your pipeline (the path where the XProc source containing the p:directory-list is stored).

  • If some entry (file or directory) is included in the result, all directories leading up to this entry are always included, even if they're excluded because of the include-filter and exclude-filter option settings. This ensures that the hierarchy of the result always matches the hierarchy of the filesystem.

  • Working on “normal” files and/or directories (on disk, URI scheme file://) is always supported. Whether any other types are supported is implementation-defined, and therefore depends on the XProc processor used. For such other types, the interpretation of what counts as a “directory” or a “file” may also vary.

  • An XProc processor may add additional, implementation-defined, attributes to the various result elements as described in The result document. These attributes will always be in some, XProc processor dependent, namespace.

Errors raised

| Error code | Description |
| --- | --- |
| XC0012 | It is a dynamic error if the contents of the directory path are not available to the step due to access restrictions in the environment in which the pipeline is run. |
| XC0017 | It is a dynamic error if the absolute path does not identify a directory. |
| XC0090 | It is a dynamic error if an implementation does not support directory listing for a specified scheme. |
| XC0147 | It is a dynamic error if the specified value is not a valid XPath regular expression. |
| XD0064 | It is a dynamic error if the base URI is not both absolute and valid according to RFC 3986. |
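
If a missing directory should not stop the pipeline, the error can be caught with p:try and p:catch. The following is a minimal sketch, not a recommended pattern. The directory name no-such-directory and the <directory-missing/> placeholder element are hypothetical, and the err prefix is bound to the standard XProc error namespace.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:err="http://www.w3.org/ns/xproc-error" version="3.0">

  <p:output port="result"/>

  <p:try>
    <p:directory-list path="no-such-directory"/>
    <!-- Handles only XC0017; any other error still propagates -->
    <p:catch code="err:XC0017">
      <p:identity>
        <p:with-input>
          <directory-missing/>
        </p:with-input>
      </p:identity>
    </p:catch>
  </p:try>

</p:declare-step>
```

Catching a single error code keeps problems such as access restrictions (XC0012) visible instead of silently replacing them with the placeholder.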

Reference information

This description of the p:directory-list step is for XProc version: 3.1. This is a non-required step (an XProc 3.1 processor does not have to support this).

The formal specification for the p:directory-list step can be found here.

The p:directory-list step is part of categories:

The p:directory-list step is also present in version: 3.0.