p:text-sort (3.0) 

Sorts lines in a text document.

Summary

<p:declare-step type="p:text-sort">
  <p:input port="source" primary="true" content-types="text" sequence="false"/>
  <p:output port="result" primary="true" content-types="text" sequence="false"/>
  <p:option name="case-order" as="xs:string?" required="false" select="()" values="('upper-first', 'lower-first')"/>
  <p:option name="collation" as="xs:string" required="false" select="'https://www.w3.org/2005/xpath-functions/collation/codepoint'"/>
  <p:option name="lang" as="xs:language?" required="false" select="()"/>
  <p:option name="order" as="xs:string" required="false" select="'ascending'" values="('ascending', 'descending')"/>
  <p:option name="sort-key" as="xs:string" required="false" select="'.'"/>
  <p:option name="stable" as="xs:boolean" required="false" select="true()"/>
</p:declare-step>

The p:text-sort step sorts lines in the text document appearing on its source port.

Ports:

| Port | Type | Primary? | Content types | Seq? | Description |
| --- | --- | --- | --- | --- | --- |
| source | input | true | text | false | The text document whose lines are to be sorted. |
| result | output | true | text | false | The resulting document. |

Options:

| Name | Type | Req? | Default | Description |
| --- | --- | --- | --- | --- |
| case-order | xs:string? | false | () | Defines whether upper-case characters are considered to come before or after lower-case characters. The value must be upper-first or lower-first. If not provided, its value is language-dependent. This option is only used if no value is available for the collation option. |
| collation | xs:string | false | https://www.w3.org/2005/xpath-functions/collation/codepoint | The collation to use for sorting. The only collation that is always supported is the Unicode codepoint collation, which is also the default value for this option. Whether any other collations are supported is implementation-defined and therefore depends on the XProc processor used. |
| lang | xs:language? | false | () | The language to sort the lines for. This influences, for instance, the order of accented characters. Its default value is implementation-defined and therefore depends on the XProc processor used. The value must be a valid language code according to RFC 4646 (tags for identifying languages), for instance en-us or nl-nl. This option is only used if no value is available for the collation option. |
| order | xs:string | false | ascending | The sort order, either ascending (default) or descending. |
| sort-key | xs:string (XPath expression) | false | . | An XPath expression that produces the string to sort the lines on. It is evaluated for each line, with the line to sort (as a string) as the context item (accessible with the dot operator .). During this evaluation, the position() and last() functions return the position of the line in the document and the total number of lines. See Reversing the line order for an example of using these functions. |
| stable | xs:boolean | false | true | Tells the sorting algorithm what to do with lines that have the same sort key. If its value is true (default), these lines are retained in their original order. If false, there is no need to do this and the algorithm may change their order (though it will not necessarily do so). |
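
To illustrate how the sort-key and stable options interact, the following sketch sorts on the first character of each line only. The file name tasks.txt and its contents are made up for this illustration and are not part of the original examples. Because stable keeps its default value of true, lines that share a first character should stay in their original relative order.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <!-- tasks.txt is a hypothetical input file containing four lines:
         beta 2
         alpha 1
         beta 1
         alpha 2 -->
  <p:input port="source" href="tasks.txt"/>
  <p:output port="result"/>

  <!-- Sort on the first character only, so the two "alpha" lines tie,
       as do the two "beta" lines. stable="true" is the default and is
       spelled out here only for clarity. -->
  <p:text-sort sort-key="substring(., 1, 1)" stable="true"/>

</p:declare-step>
```

Under these assumptions the expected order is alpha 1, alpha 2, beta 2, beta 1. With stable set to false, a processor would be free to swap the lines within each group.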

Description

The p:text-sort step takes the text document appearing on its source port and splits it into lines. These lines are then sorted according to the values of the step's options. The sort process is the same as the one described for the XSLT xsl:sort instruction. The result appears on the result port.
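
Since the sort process follows xsl:sort, it can help to picture the step as a small XSLT stylesheet. The sketch below is only an analogy, not how any XProc processor is known to implement the step. It assumes an XSLT 3.0 processor and a file lines.txt next to the stylesheet. The sort-key option corresponds roughly to the select attribute of xsl:sort, and the order, case-order, lang, collation and stable options to the attributes of the same name.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">

  <xsl:output method="text"/>

  <xsl:template name="xsl:initial-template">
    <!-- Read the file as a sequence of lines; lines.txt is resolved
         relative to the location of this stylesheet -->
    <xsl:for-each select="unparsed-text-lines('lines.txt')">
      <!-- Counterpart of sort-key="." order="ascending" stable="true".
           No collation is given here, so the XSLT processor chooses one,
           whereas p:text-sort defaults to the codepoint collation. -->
      <xsl:sort select="." order="ascending" stable="yes"/>
      <!-- Emit the line followed by a line-feed -->
      <xsl:value-of select="."/>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```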

Examples

Basic usage

Assume we have a text document called lines.txt with the following contents, and we want to sort its lines using p:text-sort:

XProc steps rock!
An important addition to our XML processing toolkit.
What a joy!

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source" href="lines.txt"/>
  <p:output port="result"/>

  <p:text-sort/>

</p:declare-step>

Result document:

An important addition to our XML processing toolkit.
What a joy!
XProc steps rock!
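
As a small variation on the pipeline above, setting the order option to descending should reverse the sort direction. This sketch reuses the same lines.txt input and leaves all other options at their defaults.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source" href="lines.txt"/>
  <p:output port="result"/>

  <!-- Same sort key (the whole line), but highest value first -->
  <p:text-sort order="descending"/>

</p:declare-step>
```

With the default codepoint collation, the expected result is:

XProc steps rock!
What a joy!
An important addition to our XML processing toolkit.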

Reversing the line order

This example is not very useful in itself, but it shows the use of the position() and last() functions in the sort-key option. We set this option to last() - position(), which has the effect of reversing the line order.

Source document (lines-2.txt):

line 1
line 2
line 3

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source" href="lines-2.txt"/>
  <p:output port="result"/>

  <p:text-sort sort-key="last() - position()"/>

</p:declare-step>

Result document:

line 3
line 2
line 1
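
The sort-key option is not limited to position() and last(): any XPath expression that yields a single value per line can be used. The sketch below assumes a hypothetical file fruits.txt containing the lines cherry, Banana and apple, in that order, one per line. Under the default codepoint collation, upper-case letters sort before lower-case ones, so a plain p:text-sort would put Banana first. Sorting on lower-case(.) is one way to get a case-insensitive order without relying on processor-specific collations.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <!-- fruits.txt is a hypothetical input file containing three lines:
         cherry
         Banana
         apple -->
  <p:input port="source" href="fruits.txt"/>
  <p:output port="result"/>

  <!-- The sort key is the lower-cased line; the lines themselves
       are written to the result unchanged -->
  <p:text-sort sort-key="lower-case(.)"/>

</p:declare-step>
```

Under these assumptions the expected order is apple, Banana, cherry.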

Additional details

  • p:text-sort preserves all document-properties of the document appearing on its source port.

  • What exactly constitutes a line-end is defined in the XML specification.

  • All lines returned by p:text-sort are terminated with a line-end character (line-feed, &#xA;).

Errors raised

| Error code | Description |
| --- | --- |
| XC0098 | It is a dynamic error if a dynamic XPath error occurs while applying sort-key to a line. |
| XC0099 | It is a dynamic error if applying sort-key to a given line results in a sequence with more than one item. |
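
The following sketch shows one way XC0099 could be provoked and handled. It reuses lines-2.txt from the examples above. The step name main and the fallback of passing the input through unsorted are choices made for this illustration only. The sort key tokenize(., ' ') returns two items for a line such as "line 1", which should trigger the error.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:err="http://www.w3.org/ns/xproc-error"
                name="main" version="3.0">

  <p:input port="source" href="lines-2.txt"/>
  <p:output port="result"/>

  <p:try>
    <!-- tokenize() returns two items for a line such as "line 1",
         so the sort key is a sequence with more than one item -->
    <p:text-sort sort-key="tokenize(., ' ')"/>

    <p:catch code="err:XC0099">
      <!-- Illustrative fallback: pass the original document through unsorted -->
      <p:identity>
        <p:with-input pipe="source@main"/>
      </p:identity>
    </p:catch>
  </p:try>

</p:declare-step>
```

Errors other than XC0099 are not caught by this p:catch and would still stop the pipeline.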

Reference information

This description of the p:text-sort step is for XProc version: 3.0. This is a required step (an XProc 3.0 processor must support this).

The formal specification for the p:text-sort step can be found here.

The p:text-sort step is part of categories:

The p:text-sort step is also present in version: 3.1.