p:cast-content-type (3.0) 

Changes the media type of a document.

Summary

<p:declare-step type="p:cast-content-type">
  <input port="source" primary="true" content-types="any" sequence="false"/>
  <output port="result" primary="true" content-types="any" sequence="false"/>
  <option name="content-type" as="xs:string" required="true"/>
  <option name="parameters" as="map(xs:QName,item()*)?" required="false" select="()"/>
</p:declare-step>

The p:cast-content-type step takes the document appearing on its source port and changes its media type according to the value of the content-type option, transforming the document if necessary.

Ports:

| Port | Type | Primary? | Content types | Seq? | Description |
| --- | --- | --- | --- | --- | --- |
| source | input | true | any | false | The document to change the media type of. |
| result | output | true | any | false | The resulting document. |

Options:

| Name | Type | Req? | Default | Description |
| --- | --- | --- | --- | --- |
| content-type | xs:string | true | | The media type of the resulting document. This must be a valid media type (either type/subtype or type/subtype+ext). If not, error XD0079 is raised. |
| parameters | map(xs:QName,item()*)? | false | () | Parameters controlling the casting/transformation of the document. Keys, values and their meaning are dependent on the XProc processor used. |

Description

A document flowing through an XProc pipeline has a media type, which tells the XProc processor what kind of document it is dealing with. The media type of a document is recorded in its content-type document-property. Example values are text/xml for XML documents, application/json for JSON documents, etc. For more information about media types see for example Wikipedia.

The p:cast-content-type step has a required content-type option and tries to cast (change) the media type of the document appearing on its source port according to the value of this option. Sometimes this is a (very) simple operation: for instance, changing one XML media type to another just changes the value of the content-type document-property. However, you can also request more complex changes, like converting an XML document into JSON or vice versa.
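As a minimal sketch of the simple case, the pipeline below casts an XML document to another XML media type and then reports the value recorded in the content-type document-property. It assumes the source document has an XML media type such as text/xml. The message attribute and the p:document-property() function are standard XProc 3.0 features, but where (and whether) the message is displayed depends on the processor.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <!-- XML to XML: only the content-type document-property changes -->
  <p:cast-content-type content-type="application/xml"/>

  <!-- Report the media type now recorded on the document.
       The message is an attribute value template; its context item
       is the document produced by the previous step. -->
  <p:identity message="Media type: {p:document-property(., 'content-type')}"/>

</p:declare-step>
```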

Of course, not every media type can be cast into every other media type. The following sections describe what you can (and cannot) do. If you request an impossible cast, error XC0071 is raised.
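If a pipeline should survive an impossible cast, the error can be caught. The following is a sketch only: it assumes the cast to JSON fails with XC0071 for the document on the source port (whether it does depends on the document and the processor), and the <cast-failed/> fallback element is made up for this illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:err="http://www.w3.org/ns/xproc-error"
                version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:try>
    <!-- Attempt the cast; this may raise err:XC0071 -->
    <p:cast-content-type content-type="application/json"/>

    <!-- Only catch the "cannot perform the requested cast" error -->
    <p:catch code="err:XC0071">
      <p:identity>
        <p:with-input>
          <!-- Hypothetical fallback document -->
          <cast-failed/>
        </p:with-input>
      </p:identity>
    </p:catch>
  </p:try>

</p:declare-step>
```

Any other error still terminates the pipeline, because the p:catch is limited to a single error code.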

A brief explanation of media types and how XProc treats them can be found in the XProc media type usage section below.

Converting XML documents

When the input document is an XML document (has an XML media type), the following casts are supported:

  • Casting to another XML media type simply changes the content-type document-property.

  • Casting to an HTML media type changes the content-type document-property and removes any serialization document-property.

  • Casting to a JSON media type converts the XML into JSON:

    • The XPath and XQuery Functions and Operators 3.1 standard defines an XML format for the representation of JSON data. The XPath function xml-to-json() converts this format into a JSON conformant string (and for further processing, parse-json() turns this string into a map/array).

      If an input document of p:cast-content-type is conformant to this XML format for the representation of JSON data, it’s converted into its JSON equivalent (like calling parse-json(xml-to-json())). See Converting the XML representation of JSON for an example.

    • If the input document has a <c:param-set> root element and <c:param name="…" value="…"/> child elements (the c prefix here is bound to the http://www.w3.org/ns/xproc-step namespace), it will turn this into a JSON map with the values of the name attributes as keys. See the Converting param-sets example.

      Param-sets are an XProc 1.0 construct, used for passing parameters (there were no maps in those days). Unless you’re converting XProc 1.0 steps into 3.x, it’s unlikely you’ll need this feature.

    • In all other cases it’s up to the XProc processor what happens. It could turn your XML into some kind of JSON, but it could just as well raise an error.

    A serialization document-property is removed when converting to JSON.

  • Casting to a text media type converts the XML into text. The incoming XML comes out as text, as a string, complete with tags, attributes, etc.

    The result of this conversion is the same as calling the XPath serialize($doc, $param) function, where $doc is the document to convert and $param is its serialization document-property. See the Converting XML to text example.

    A serialization document-property is removed.

  • Casting to any other media type where the input document is a <c:data> document (see c:data documents) results in a document with the specified media type and a representation that is the content of the <c:data> element after decoding it. The value of the c:data/@content-type attribute and the value of the content-type option of p:cast-content-type must be the same!

    A serialization document-property is removed.

  • Casting to any other media type where the input is not a valid <c:data> document is implementation-defined and therefore dependent on the XProc processor used.
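The <c:data> case can be sketched as follows. The base64 string is the encoding of the text "Hi there!"; the media type application/octet-stream is an arbitrary choice for this illustration. Note that the content-type attribute on <c:data> and the content-type option carry the same value, as required.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="3.0">

  <p:output port="result"/>

  <!-- Decode the base64 content of c:data into a binary document -->
  <p:cast-content-type content-type="application/octet-stream">
    <p:with-input>
      <c:data content-type="application/octet-stream"
              encoding="base64">SGkgdGhlcmUh</c:data>
    </p:with-input>
  </p:cast-content-type>

</p:declare-step>
```

How the resulting binary document is written out when it leaves the pipeline is up to the processor.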

Converting HTML documents

When the input document is an HTML document (has an HTML media type), the following casts are supported:

  • Casting to another HTML media type simply changes the content-type document-property.

  • Casting to an XML media type changes the content-type document-property and removes a serialization document-property.

  • Casting to a JSON media type is implementation-defined and therefore dependent on the XProc processor used.

  • Casting to a text media type works the same as casting an XML media type to text. See casting XML to text above.

  • Casting to any other media type is implementation-defined and therefore dependent on the XProc processor used.

Converting JSON documents

When the input document is a JSON document (has a JSON media type), the following casts are supported:

  • Casting to another JSON media type simply changes the content-type document-property.

  • Casting to an HTML media type is implementation-defined and therefore dependent on the XProc processor used.

  • Casting to an XML media type converts the JSON into XML according to the rules specified in the XPath XML format for the representation of JSON data. See the Converting JSON into XML example.

    A serialization document-property is removed.

  • Casting to a text media type converts the JSON into text. The incoming JSON (which in XProc consists of maps/arrays) comes out as text, as a string.

    The result of this conversion is the same as calling the XPath serialize($doc, $param) function, where $doc is the document to convert and $param is its serialization document-property.

    A serialization document-property is removed.

  • Casting to any other media type is implementation-defined and therefore dependent on the XProc processor used.
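A minimal sketch of the JSON to text cast, using a small inline JSON document made up for this illustration. The expand-text="false" attribute is needed because the curly braces of the JSON would otherwise be interpreted as text value templates. The exact text that comes out (whitespace, key order) depends on the serialization settings and the processor, so no result is shown.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <!-- The JSON map is serialized into a string with media type text/plain -->
  <p:cast-content-type content-type="text/plain">
    <p:with-input>
      <p:inline content-type="application/json"
                expand-text="false">{"to": "London", "distance": 322}</p:inline>
    </p:with-input>
  </p:cast-content-type>

</p:declare-step>
```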

Converting text documents

When the input document is a text document (has a text media type), the following casts are supported:

  • Casting to another text media type simply changes the content-type document-property.

  • Casting to an XML media type parses the text value of the document by calling the XPath parse-xml() function. This assumes of course that the text is a well-formed XML document. If not, error XD0049 is raised.

  • Casting to an HTML media type parses the document into an HTML document. How this is done is implementation-defined and therefore dependent on the XProc processor used. If unsuccessful, error XD0060 is raised.

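  • Casting to a JSON media type parses the document by calling the XPath parse-json($doc, $param) function, where $doc is the text document to convert and $param is the value of the parameters option. If the text is not valid JSON, or the parameters are invalid, one of the errors XD0057, XD0058 or XD0059 is raised.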

    A serialization document-property is removed.

  • Casting to any other media type is implementation-defined and therefore dependent on the XProc processor used.
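The text to XML cast can be sketched like this. The inline document is plain text that happens to contain well-formed XML markup (escaped here, so it really is text inside the pipeline); the <greeting> element is made up for this illustration. After the cast, the result port should carry an XML document with a <greeting> root element.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <!-- The text is parsed as XML; malformed text would raise XD0049 -->
  <p:cast-content-type content-type="application/xml">
    <p:with-input>
      <p:inline content-type="text/plain">&lt;greeting&gt;Hi there!&lt;/greeting&gt;</p:inline>
    </p:with-input>
  </p:cast-content-type>

</p:declare-step>
```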

Converting other media types

When the input document has any other media type (meaning XProc treats it as a binary document), the following casts are supported:

  • Casting from an unrecognized media type to an XML media type produces a <c:data> document (see c:data documents). The value of the c:data/@content-type attribute is the document’s original content type. The content of the <c:data> element is the base64 encoded representation of the document. See the Converting a binary media type into XML example.

    A serialization document-property is removed.

  • Casting from an unrecognized media type to an HTML, JSON, text or other unrecognized media type is implementation-defined and therefore dependent on the XProc processor used.

<c:data> documents

The p:cast-content-type step uses <c:data> documents to convert XML from and into binary media types (the c prefix here is bound to the http://www.w3.org/ns/xproc-step namespace):

<c:data content-type = xs:string
        charset? = xs:string
        encoding? = xs:string />

 

| Attribute | # | Type | Description |
| --- | --- | --- | --- |
| content-type | 1 | xs:string | The MIME type of the content. |
| charset | ? | xs:string | The character set of the content, for instance UTF-8 or ASCII. For an explanation of character encodings see Wikipedia. |
| encoding | ? | xs:string | The encoding of the content. The most used encoding is base64 (see Wikipedia). |

XProc media type usage

A document media type (in XProc passed around in the content-type document-property) tells XProc (and your code if it needs to know this) what kind of document we’re dealing with: the document type. XProc recognizes and handles five document types: XML, HTML, JSON, text and binary.

The relation between document type and media type is as follows:

| Document type | Media types | Examples |
| --- | --- | --- |
| XML | */xml and */*+xml, except application/xhtml+xml | text/xml, application/xml, image/svg+xml |
| HTML | text/html, application/xhtml+xml | text/html, application/xhtml+xml |
| JSON | application/json | application/json |
| Text | text/* (not matching one of the XML or HTML media types) | text/plain, text/csv |
| Binary | Anything else | image/jpeg, application/octet-stream, application/zip |
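The document types above can also be used to restrict what a port accepts. The sketch below assumes the XProc 3.0 shortcut json for the content-types attribute (alongside xml, html, text and any), so that only documents classified as JSON are allowed on the source port before they are cast to XML.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <!-- Accept only documents that XProc classifies as JSON -->
  <p:input port="source" content-types="json"/>
  <p:output port="result"/>

  <!-- JSON to XML: produces the XPath XML representation of JSON -->
  <p:cast-content-type content-type="application/xml"/>

</p:declare-step>
```

A document with any other media type arriving on the source port causes a dynamic error before the cast is attempted.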

Examples

Converting the XML representation of JSON

If an input document of p:cast-content-type is conformant to the XPath XML format for the representation of JSON data and the content-type option is a JSON media type, p:cast-content-type converts this into its JSON equivalent.

The following source document is a shortened version of the example in the XPath standard:

<map xmlns="http://www.w3.org/2005/xpath-functions">
   <string key="desc">Distances </string>
   <boolean key="uptodate">true</boolean>
   <null key="author"/>
   <map key="cities">
      <array key="Brussels">
         <map>
            <string key="to">London</string>
            <number key="distance">322</number>
         </map>
      </array>
   </map>
</map>

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:cast-content-type content-type="application/json"/>

</p:declare-step>

The resulting JSON map:

{"desc":"Distances ","uptodate":true,"author":null,"cities":{"Brussels":[{"to":"London","distance":322}]}}

Converting param-sets

Param-sets are constructs used in the XProc 1.0 days for passing sets of parameters, for instance to XSLT stylesheets. The current version uses maps for this. To help with migrating, p:cast-content-type can convert a param-set into a map. In XProc, a map is JSON data, so the content-type option must be a JSON media type.

The source param-set document:

<c:param-set xmlns:c="http://www.w3.org/ns/xproc-step">
   <c:param name="param1" value="y"/>
   <c:param name="param2" value="1234"/>
</c:param-set>

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:cast-content-type content-type="application/json"/>

</p:declare-step>

The resulting JSON map:

{"param1":"y","param2":"1234"}

JSON maps are passed around as XPath maps, so it’s easy to store such a map in a variable and use it later. Just add the following variable declaration directly after the p:cast-content-type invocation:

<p:variable name="param-set-map" as="map(*)" select="."/>

Unless you’re converting XProc 1.0 code into a newer version, it’s unlikely you’ll need this param-set conversion feature.

Converting XML to text

Let’s convert this simple XML document into text:

<input-document timestamp="2024-08-23T09:12:45">
   <text color="red">Hi there!</text>
</input-document>

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:cast-content-type content-type="text/plain"/>

</p:declare-step>

The resulting text (it looks like it is another XML document, but it is just text):

<?xml version="1.0" encoding="UTF-8"?>
<input-document timestamp="2024-08-23T09:12:45">
      <text color="red">Hi there!</text>
    </input-document>

Now assume we need this text representation without the XML header (the <?xml … ?> part at the top). The p:cast-content-type step uses the document serialization document-property to guide the conversions. This document-property is a map containing the required serialization properties. For this example: map{'omit-xml-declaration': true()}.

Document-properties can be specified using the p:set-properties step. The value of the properties option of p:set-properties is itself a map, with the document-property names as keys. Therefore, its value becomes a map within a map: map{'serialization': map{'omit-xml-declaration': true()}}.

The following code (using the same input document as above) does the trick:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:set-properties properties="map{'serialization': map{'omit-xml-declaration': true()}}"/>

  <p:cast-content-type content-type="text/plain"/>

</p:declare-step>

Result document:

<input-document timestamp="2024-08-23T09:12:45">
      <text color="red">Hi there!</text>
    </input-document>

Converting JSON into XML

Converting JSON into XML means p:cast-content-type produces XML according to the XPath XML format for the representation of JSON data specification. Here we do the inverse of what is done in the Converting the XML representation of JSON example.

Source document:

{"desc":"Distances","uptodate":true,"author":null,"cities":{"Brussels":[{"to":"London","distance":322}]}}

Pipeline document:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:cast-content-type content-type="text/xml"/>

</p:declare-step>

Result document:

<map xmlns="http://www.w3.org/2005/xpath-functions">
   <string key="desc">Distances</string>
   <boolean key="uptodate">true</boolean>
   <null key="author"/>
   <map key="cities">
      <array key="Brussels">
         <map>
            <string key="to">London</string>
            <number key="distance">322</number>
         </map>
      </array>
   </map>
</map>

Converting a binary media type into XML

This example transforms a piece of text that has been given the (bogus) media type of x/x into XML. Because XProc does not recognize this media type, it treats the document as binary. The result of the p:cast-content-type step is the document’s base64 encoded contents, wrapped in a <c:data> element.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source">
    <p:inline content-type="x/x">Hi there!</p:inline>
  </p:input>
  <p:output port="result"/>

  <p:cast-content-type content-type="text/xml"/>

</p:declare-step>

Result document:

<c:data xmlns:c="http://www.w3.org/ns/xproc-step"
        content-type="x/x"
        encoding="base64">SGkgdGhlcmUh</c:data>

Additional details

  • If the value of the content-type option and the media type of a document are the same, the document will appear unchanged on the result port.

  • p:cast-content-type preserves all document-properties of the document(s) appearing on its source port.

    Exceptions are the content-type document-property which is updated accordingly and the serialization document-property which is sometimes removed.

Errors raised

| Error code | Description |
| --- | --- |
| XC0071 | It is a dynamic error if the <p:cast-content-type> step cannot perform the requested cast. |
| XC0072 | It is a dynamic error if the content of the <c:data> element is not a valid base64 string. |
| XC0073 | It is a dynamic error if the <c:data> element does not have a @content-type attribute. |
| XC0074 | It is a dynamic error if the content-type is supplied and is not the same as the @content-type specified on the <c:data> element. |
| XC0079 | It is a dynamic error if the map parameters contains an entry whose key is defined by the implementation and whose value is not valid for that key. |
| XD0049 | It is a dynamic error if the text value is not a well-formed XML document. |
| XD0057 | It is a dynamic error if the text document does not conform to the JSON grammar, unless the parameter liberal is true and the processor chooses to accept the deviation. |
| XD0058 | It is a dynamic error if the parameter duplicates is reject and the text document contains a JSON object with duplicate keys. |
| XD0059 | It is a dynamic error if the parameter map contains an entry whose key is defined in the specification of fn:parse-json and whose value is not valid for that key, or if it contains an entry with the key fallback when the parameter escape with true() is also present. |
| XD0060 | It is a dynamic error if the text document cannot be converted into the XPath data model. |
| XD0079 | It is a dynamic error if a supplied content-type is not a valid media type of the form “type/subtype+ext” or “type/subtype”. |

Reference information

This description of the p:cast-content-type step is for XProc version: 3.0. This is a required step (an XProc 3.0 processor must support this).

The formal specification for the p:cast-content-type step can be found here.

The p:cast-content-type step is part of categories:

The p:cast-content-type step is also present in version: 3.1.