p:http-request (3.1) 

Interact using HTTP (or related protocols).

Summary

<p:declare-step type="p:http-request">
  <input port="source" primary="true" content-types="any" sequence="true"/>
  <output port="result" primary="true" content-types="any" sequence="true"/>
  <output port="report" primary="false" content-types="application/json" sequence="true"/>
  <option name="href" as="xs:anyURI" required="true"/>
  <option name="assert" as="xs:string" required="false" select="'.?status-code lt 400'"/>
  <option name="auth" as="map(xs:string, item()+)?" required="false" select="()"/>
  <option name="headers" as="map(xs:string, xs:string)?" required="false" select="()"/>
  <option name="method" as="xs:string?" required="false" select="'GET'"/>
  <option name="parameters" as="map(xs:QName, item()*)?" required="false" select="()"/>
  <option name="serialization" as="map(xs:QName,item()*)?" required="false" select="()"/>
</p:declare-step>

The p:http-request step allows pipelines to interact with resources (for instance websites) over HTTP or related protocols.

Ports:

Port

Type

Primary?

Content types

Seq?

Description

source

input

true

any

true

Document(s) used in constructing the request body.

By default, source documents are only used for HTTP methods that require a body (for instance POST). If the HTTP method does not specify a body (for instance GET), any documents appearing on the source port are ignored. You can control this behaviour with the send-body-anyway parameter (see Parameters).

result

output

true

any

true

The request result document(s). See The response result and report.

report

output

false

application/json

true

A map containing information about the response. See The response result and report.

Options:

Name

Type

Req?

Default

Description

href

xs:anyURI

true

 

The URI to use for the request.

assert

xs:string

false

.?status-code lt 400

Any request can fail, but what exactly counts as failure depends on the expectations of the receiver. This option takes an XPath expression that can inspect the request results. If the result of this expression (evaluated after a response is received) is false, dynamic error XC0126 is raised. See Asserting the request status.

auth

map(xs:string, item()+)?

false

()

Information for authenticating the request (in other words: for “logging in”). See Request authentication.

headers

map(xs:string, xs:string)?

false

()

A map containing the request headers. Each map key is used as a header name and the value associated is used as the header value.

There are some special rules regarding the request headers, see Specifying request headers.

Request headers can influence the construction of the request. See Usage of request headers.

method

xs:string?

false

GET

The HTTP request method to use for the request. Its value is converted to upper-case.

All implementations must support the HTTP methods GET (default), POST, PUT, DELETE, and HEAD. Whether any other methods are supported is implementation-defined and therefore dependent on the XProc processor used.

parameters

map(xs:QName, item()*)?

false

()

A map with parameters for fine-tuning the construction of the request and/or the handling of the server response. See Parameters.

serialization

map(xs:QName,item()*)?

false

()

Before the document(s) on the source port are used, they are first serialized (as if written to disk). This option can supply a map with serialization properties, controlling this serialization.

If the source document has a serialization document-property, the two sets of serialization properties are merged (properties in the document-property have precedence).
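A minimal sketch of the serialization option. It uses the Beeceptor echo server from the examples further on, purely as a harmless target. The body document is sent with a POST request and serialized with indentation and an XML declaration. The serialization keys are written as plain strings here. The sketch assumes, as is usual for XProc options of type map(xs:QName, item()*), that these strings are converted to QNames.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result" sequence="true"/>

  <!-- The body document is serialized with indentation and an XML declaration
       before it is sent; the echo server only serves as a harmless target -->
  <p:http-request href="https://echo.free.beeceptor.com" method="POST"
                  serialization="map{'indent': true(), 'omit-xml-declaration': false()}">
    <p:with-input port="source">
      <p:inline><greeting><text>Hello</text></greeting></p:inline>
    </p:with-input>
  </p:http-request>

</p:declare-step>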

Description

The p:http-request step allows you to send requests to a server and receive its response. You could use this, for instance, to access REST or other services on the web. Another use case is to have your pipeline play “web browser”: get the contents of a web page and interpret it, or fill in a form on some page in the background. Although the step is generic, it will probably be used mostly (exclusively?) with the HTTP(S) protocol, so we’ll concentrate on that.

The HTTP(S) protocol is rather complex and, to support it, p:http-request is also complex. As a result, the description of p:http-request is long and may appear intimidating to users who are not familiar with the finer details of the HTTP(S) protocol. Luckily, simple interactions, like just requesting a single web page, are easy (see Basic usage). Let’s take it step by step (if there are parts you don't understand, chances are you don't need them). The HTTP(S) protocol itself is not explained here. If you need more information about it, Wikipedia is a good place to start.

  1. The p:http-request step first constructs a request:

    • An HTTP(S) request is always to some URI (like https://xprocref.org/). You must specify this URI using the mandatory href option.

    • An HTTP(S) request has a method. Usual values are GET, POST, PUT, DELETE, and HEAD. You can specify this using the method option. Its default value is GET.

    • An HTTP(S) request has request headers: name/value pairs that contain additional information for the server. The main source for specifying request headers is the headers option. Some special handling applies, see Specifying request headers.

    • Once the request headers are known, some of this information is used by p:http-request for additional purposes, like determining the transfer encoding. See Usage of request headers for more information.

    • Any documents that must accompany the request can be supplied on the source port.

    • Some interactions require authorization (“logging in”). This is usually specified with the auth option. See Request authentication for more information.

    • It is possible to construct multipart requests: requests where multiple documents are sent at once. See Multipart requests for more information.

    • Further fine-tuning of the request is done using parameters specified in the parameters option. See Parameters for more information.

  2. The request is sent to the server and p:http-request waits for a response. How long the step waits before giving up can be specified using the timeout parameter in the parameters option. See Parameters for more information.

  3. A response is received and interpreted:

    • Whether a response is considered successful can be specified using the assert option. If not, error XC0126 is raised. See Asserting the request status for more information.

    • Further fine-tuning of the interpretation of the response is done using parameters specified in the parameters option. See Parameters for more information.

    • Any documents contained in the response appear on the result port.

    • Additional information about the response (its response headers, status code, etc.) appears on the report port, as a map. See The response result and report for more information.
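To see how these parts fit together, here is a minimal sketch of a request that uses several of them at once: an explicit method, an extra request header and a body document on the source port. It uses the Beeceptor echo server from the examples below as the target. The order document and the Accept header value are made up for illustration.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result" sequence="true"/>

  <!-- Request: PUT method, one extra request header, and a body document -->
  <p:http-request href="https://echo.free.beeceptor.com" method="PUT"
                  headers="map{'accept': 'application/json'}">
    <p:with-input port="source">
      <!-- Hypothetical body document; PUT specifies a body, so it is sent -->
      <p:inline><order><item>XProc book</item></order></p:inline>
    </p:with-input>
  </p:http-request>

</p:declare-step>

The server response appears on the result port. For this echo server, that is a JSON description of the request. The report port is not used here.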

Specifying request headers

An HTTP request has request headers: name/value pairs containing additional information for the server. See for instance Wikipedia for an overview.

The main source for p:http-request for constructing HTTP request headers is its headers option (see Adding a request header for an example).

If a single document appears on the source port and we’re not constructing a multipart message, special rules apply:

  • If the (single) document appearing on the source port:

    • is an XML, HTML or text document,

    • and has a serialization document-property,

    • and this serialization document-property has an entry called encoding,

    a charset parameter with this encoding as its value is appended to the content-type header of the HTTP request (for more information about this charset parameter, see for instance here).

  • Any document-properties of the (single) document appearing on the source port that are in the http://www.w3.org/ns/xproc-http namespace will be added as request headers, using their local name (their name without namespace) as the request header name.

    For a request header specified both in a document-property and in the map provided to the headers option, the one in the headers option takes precedence. This comparison is case-insensitive.
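A minimal sketch of the document-property route, assuming p:set-properties is used to attach the property. The property name x-trace-id and its value are made up for this example. The property is in the http://www.w3.org/ns/xproc-http namespace and the body is a single, non-multipart document. It should therefore end up as an x-trace-id request header, which the echo server then reports back.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result" sequence="true"/>

  <!-- Attach a document-property in the xproc-http namespace to the body document.
       The name x-trace-id and the value abc123 are hypothetical. -->
  <p:set-properties properties="map{QName('http://www.w3.org/ns/xproc-http', 'x-trace-id'): 'abc123'}">
    <p:with-input>
      <p:inline><greeting>Hello</greeting></p:inline>
    </p:with-input>
  </p:set-properties>

  <!-- The single document from p:set-properties arrives on the source port;
       its xproc-http property is turned into a request header -->
  <p:http-request href="https://echo.free.beeceptor.com" method="POST"/>

</p:declare-step>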

When constructing a multipart message, not only the request itself but also its separate documents can have request headers. For more information on how this can be specified see Multipart requests.

Usage of request headers

Once the request headers are constructed (see Specifying request headers), some of these are used for additional purposes:

  • If the value of the content-type request header starts with multipart/, a multipart request is constructed (regardless of the number of documents appearing on the source port). See Multipart requests.

  • For non-multipart messages, it is possible to override the media type (content-type document-property) of the body document. If a single document appears on the source port, we’re not constructing a multipart message, and a content-type request header is specified, the value of this request header overrides the value of the content-type document-property.

  • If a transfer-encoding request header is present, the request is sent using that particular encoding (for more information about transfer encodings see here). Examples of values are chunked, compress or gzip.

  • How authorization (“logging in”) is done is specified using the authorization request header. However, the p:http-request step also has an auth option for specifying this (see Request authentication). If this option is specified, any authorization request header present is ignored. Instead, the value of the authorization request header is determined exclusively by the value of the auth option.
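A minimal sketch of overriding the media type of the body document, again using the echo server as the target. The inline XML document would normally be sent with its own content-type document-property (application/xml). Here the content-type request header replaces that with text/xml, an arbitrary choice made for this example.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result" sequence="true"/>

  <!-- Single body document, no multipart: the content-type request header
       overrides the content-type document-property of the document -->
  <p:http-request href="https://echo.free.beeceptor.com" method="POST"
                  headers="map{'content-type': 'text/xml'}">
    <p:with-input port="source">
      <p:inline><greeting>Hello</greeting></p:inline>
    </p:with-input>
  </p:http-request>

</p:declare-step>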

Request authentication

Information about the authorization of a request (“logging in”) is sent to the server with the authorization request header. Experienced users could construct this request header themselves and add it to the step’s headers option (or use a document-property). However, in most cases, it is easier to pass the authorization information using the auth option and have p:http-request construct the authorization request header for you. If you use the auth option, any authorization request header passed in some other way is ignored.

The auth option contains the credentials (username, password, etc.) of the client and specifies what authentication method is used. It must be a map with string (xs:string) type keys. The following standard keys are defined:

Key

Value data type

Description

username

xs:string

The username for the request.

password

xs:string

The password associated with the username.

auth-method

xs:string

Specifies the authentication method to use. Standard values are Basic or Digest (see here for further information). Whether other authorization methods are supported and how to specify these is implementation-defined and therefore dependent on the XProc processor used.

send-authorization

xs:boolean

This controls the “authorization challenge”:

  • If this key is absent or its value is not true, a first request is sent without authorization information. If the server subsequently requests it, the request is resent with authorization information.

  • If this key’s value is true, the first request immediately contains the authorization information.

If an authorization fails, the request is not retried.

Any other key/value pairs for the auth option map are implementation-defined and therefore dependent on the XProc processor used.
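A minimal sketch of Basic authentication through the auth option. The URI and the username are hypothetical. The password is passed in as a pipeline option instead of being written into the pipeline source. Because send-authorization is true, the credentials are sent with the first request instead of after a challenge from the server.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:xs="http://www.w3.org/2001/XMLSchema" version="3.0">

  <p:output port="result" sequence="true"/>

  <!-- Supply the password when running the pipeline, not in its source -->
  <p:option name="password" as="xs:string" required="true"/>

  <!-- Hypothetical protected resource and username -->
  <p:http-request href="https://example.com/protected/report.xml"
                  auth="map{'username': 'alice', 'password': $password, 'auth-method': 'Basic', 'send-authorization': true()}">
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
  </p:http-request>

</p:declare-step>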

Parameters

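The parameters option provides information for fine-tuning the construction of the request and/or the handling of the response. It must be a map with QName (xs:QName) type keys (as in the option’s declared type). Keys written as plain strings, like 'timeout', are converted to QNames. The following standard keys are defined: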

Key

Value data type

Description

override-content-type

xs:string

The XProc processor must know the data type of the body of a server response in order to interpret it. Normally this is determined by looking at the content-type response header. If this is, for instance, set to application/xml, the response body is interpreted as an XML document. Of course this must succeed; if not, error XC0030 is raised.

If you specify an override-content-type parameter, its value is used instead of that in the content-type response header.

The information about the content-type response header that appears on the report port (see The response result and report) is not changed and still reflects the actual value received from the server.

http-version

xs:string

Specifies the HTTP version to use for the request. Its default value is implementation-defined and therefore depends on the XProc processor used. Most probably it will be 1.1.

accept-multipart

xs:boolean

If this parameter is present and has the value false, any multipart response will result in raising error XC0125. You can use this to prevent unexpected multipart responses from wreaking havoc in your pipeline.

override-content-encoding

xs:string

The XProc processor must know how the body of a server response is encoded (for instance: compressed with gzip). Normally this is determined by looking at the content-encoding response header. If you specify an override-content-encoding parameter, its value is used instead of that in the content-encoding response header.

The information about the content-encoding response header that appears on the report port (see The response result and report) is not changed and still reflects the actual value received from the server.

permit-expired-ssl-certificate

xs:boolean

If this parameter is present and has the value true, p:http-request does not reject a response where the server provides an expired SSL certificate.

permit-untrusted-ssl-certificate

xs:boolean

If this parameter is present and has the value true, p:http-request does not reject a response where the server provides an SSL certificate which is not trusted, for example, because the certificate authority (CA) is unknown.

follow-redirect

xs:integer

Sometimes a server responds with a redirect, meaning something like “please repeat the request to this different URI”. The follow-redirect parameter tells the XProc processor what to do when a redirect is received:

  • If its value is 0, redirects are not followed.

  • If its value is -1, redirects are followed indefinitely.

  • If its value is positive, at most this number of subsequent redirects are followed.

The default behaviour, when the follow-redirect parameter is not present, is implementation-defined and therefore dependent on the XProc processor used.

timeout

xs:integer

Specifies the number of seconds to wait for a response. If no response is received after approximately this number of seconds, the request is terminated and HTTP status 408 is assumed.

fail-on-timeout

xs:boolean

Sometimes a request results in a timeout. This can either happen by receiving a response with HTTP status 408 or because the number of seconds specified in the timeout parameter is exceeded. If a fail-on-timeout parameter is present and has the value true, this will result in raising error XC0078.

This might be confusing, because XProc also has a [p:]timeout attribute, usable on all steps, that tells the XProc processor how long a step invocation is allowed to take (also specified in seconds). What happens depends on whichever comes first:

  • If the number of seconds specified in the [p:]timeout attribute is exceeded, error XD0053 is raised (a generic timeout error).

    Be careful when you want to use this: whether a processor supports timeouts using the [p:]timeout attribute, and if it does, how precisely and precisely how the execution time of a step is measured, is implementation-defined and therefore dependent on the XProc processor used.

  • If fail-on-timeout is true and a timeout happens, error XC0078 is raised.

status-only

xs:boolean

If this parameter is present and its value is true, it indicates that the pipeline author is interested in the response code only. The result port will not emit anything, and the map on the report port will contain an empty map as the value of its headers entry.

suppress-cookies

xs:boolean

If this parameter is present and its value is true, no cookies are sent with the request.

send-body-anyway

xs:boolean

By default, whether a body is sent with the request depends on the HTTP method used (the value of the method option). For instance, the GET method does not specify a body. When the GET method is used, by default any document(s) on the source port are ignored.

When the send-body-anyway parameter is present and its value is true, a request body will always be constructed, even if the HTTP method used does not specify this.

Any other key/value pairs for the parameters option map are implementation-defined and therefore dependent on the XProc processor used.
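A minimal sketch that combines a few of the standard parameters. The request waits at most about 10 seconds and follows at most 5 redirects. It raises XC0078 on a timeout and refuses multipart responses with XC0125. The values 10 and 5 are arbitrary, and the keys are written as strings on the assumption, stated above, that they are converted to QNames.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result" sequence="true"/>

  <!-- timeout: seconds to wait; follow-redirect: maximum number of redirects;
       fail-on-timeout: raise XC0078; accept-multipart false: raise XC0125 -->
  <p:http-request href="https://xprocref.org"
                  parameters="map{'timeout': 10, 'follow-redirect': 5, 'fail-on-timeout': true(), 'accept-multipart': false()}">
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
  </p:http-request>

</p:declare-step>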

The response result and report

When an answer is received from the server, document(s) in the response body will appear as document(s) on the result port. Each document will be parsed according to its content-type. You can override this behaviour using the override-content-type parameter (see Parameters).

In case of a multipart response, each part will become a separate document appearing on the result port. Any response headers associated with a specific part are added to the document-properties of the resulting document.
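A minimal sketch of walking through the documents of a (possibly multipart) response. It assumes a hypothetical URI that answers with several parts. Each part arrives as a separate document on the result port, so a p:for-each can record the content-type document-property of every part. The sketch reads that property with the p:document-property() function and assumes its key can be given as a string.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <!-- Hypothetical URI returning a multipart response -->
  <p:http-request href="https://example.com/multipart-endpoint">
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
  </p:http-request>

  <!-- One iteration per document (part) on the result port -->
  <p:for-each>
    <p:identity>
      <p:with-input>
        <part content-type="{p:document-property(., 'content-type')}"/>
      </p:with-input>
    </p:identity>
  </p:for-each>

  <!-- Combine the per-part elements into one well-formed document -->
  <p:wrap-sequence wrapper="response-parts"/>

</p:declare-step>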

The report port always returns a map with the following keys/entries:

Key

Value data type

Description

status-code

xs:integer

The HTTP status code for the request, for instance 200 (success) or 404 (failure).

base-uri

xs:anyURI

The URI of the request.

In case of HTTP redirection, this value may be different from the original request URI.

headers

map(xs:string, xs:string)

The HTTP headers returned for the request. Header names are in lower-case. The map may be empty.
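A minimal sketch of reading the report map instead of the response body. The map from the report port is stored in a variable, and some of its entries are copied into attributes. Per the table above, header names are lower-case, so the sketch looks up content-type. If the server did not send that header, the attribute is simply empty.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <p:http-request href="https://xprocref.org" name="request">
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
  </p:http-request>

  <!-- The map from the report port, not the web page on the result port -->
  <p:variable name="report" as="map(*)" select="." pipe="report@request"/>

  <!-- Copy some entries of the report map into attributes -->
  <p:identity>
    <p:with-input>
      <response status-code="{$report?status-code}" base-uri="{$report?base-uri}"
                content-type="{$report?headers?content-type}"/>
    </p:with-input>
  </p:identity>

</p:declare-step>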

Asserting the request status

Any request can fail, but what exactly counts as failure depends on the expectations of the receiver. The assert option of p:http-request takes an XPath expression that inspects the request results:

  • It must contain a valid (boolean) XPath expression.

  • This expression will be executed when a response is received.

  • The context item when executing the expression is the map that also appears on the report port (see The response result and report).

  • If the expression evaluates to false, the request is considered failed and error XC0126 is raised.

  • If the expression evaluates to true, the request is considered successful. No error is raised.

The default value for the assert option is .?status-code lt 400. Since the context item is the map on the report port, the dot operator . here refers to this map. The .?status-code part is one of the ways to access a map entry (another way to write this is .('status-code')). This map entry contains the received (integer) HTTP status code. With this default expression, a response with a status code below 400 is considered a success. If the status code is 400 or greater, the request is considered failed and error XC0126 is raised.
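A minimal sketch of a custom assertion, assuming we want to probe a page that may not exist (the URI is hypothetical). The expression accepts both 200 and 404, so a missing page does not raise XC0126, while any other status code still does. The status code from the report port then tells the rest of the pipeline whether the page was found.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <!-- Hypothetical page that may not exist: accept 200 and 404,
       any other status code raises XC0126 -->
  <p:http-request href="https://xprocref.org/no-such-page.html" name="request"
                  assert=".?status-code = (200, 404)">
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
  </p:http-request>

  <p:variable name="status" select=".?status-code" pipe="report@request"/>

  <!-- Report whether the page was found -->
  <p:identity>
    <p:with-input>
      <page-found>{$status eq 200}</page-found>
    </p:with-input>
  </p:identity>

</p:declare-step>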

Multipart requests

Multipart requests combine one or more sets of data into a single HTTP request. You use this for file uploads and/or transferring data of several types in one go. For instance, a web page that allows you to upload several images could use a single multipart request to send all these images to the server. For more information, see for instance Wikipedia or the W3C multipart protocol description.

The p:http-request step constructs a multipart request if one or both of the following conditions is met:

  • Multiple documents appear on the source port.

  • The content-type request header (see Specifying request headers) starts with multipart/.

If no specific content-type request header is specified and the source receives multiple documents, the content type is set to multipart/mixed.

A multipart request must have a boundary marker: a string of characters that is inserted between the message parts. The choice of this marker is critical, because it must not appear anywhere in the data itself. If it does, the request is malformed. The boundary marker as used by p:http-request is constructed as follows:

  • If the content-type request header contains a boundary parameter, this is used.

    For instance, by setting the content-type to multipart/mixed; boundary=gc0p4Jq0M2Yt08jU534c0p, the boundary marker becomes --gc0p4Jq0M2Yt08jU534c0p (the two hyphens in front are prescribed by the protocol).

  • If this is not the case, the boundary marker is implementation-defined and therefore dependent on the XProc processor used. Unfortunately, there is no guarantee this boundary marker does not appear in the data itself (which would make the request malformed).

When constructing the multipart message, each document on the source port is serialized (as if written to disk). If a document has a serialization document-property, this is used to determine the serialization format.

The separate documents in a multipart message can have request headers of their own. Examples of often-used headers are Content-ID, Content-Description and Content-Disposition. Document-properties of documents on the source port that are in the http://www.w3.org/ns/xproc-http namespace will be used to construct request headers for that particular document, using their local name (their name without namespace) as the request header name.
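A minimal sketch of a multipart request, again aimed at the echo server. Two documents appear on the source port, so a multipart request is constructed. The content-type request header sets an explicit boundary, reusing the example value from the boundary description above. The two part documents are made up for illustration.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result" sequence="true"/>

  <!-- Two documents on the source port, so a multipart request is built.
       The boundary parameter in the content-type header sets the boundary marker. -->
  <p:http-request href="https://echo.free.beeceptor.com" method="POST"
                  headers="map{'content-type': 'multipart/mixed; boundary=gc0p4Jq0M2Yt08jU534c0p'}">
    <p:with-input port="source">
      <p:inline><first-part>XML part</first-part></p:inline>
      <p:inline content-type="text/plain">Plain text part</p:inline>
    </p:with-input>
  </p:http-request>

</p:declare-step>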

Examples

Basic usage

The following example:

  • Uses p:http-request to ask for the home page of the https://xprocref.org website, just like a web browser. This fires an HTTPS GET request and waits for the answer.

    Notice that we have to supply a value for the source port, even if we don’t need it. In this case we simply set it to <p:empty>.

  • We strip the resulting HTML page to just its <head> (otherwise the result would be too big to display).

Pipeline document:

<p:declare-step xmlns:h="http://www.w3.org/1999/xhtml" xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result" sequence="true"/>

  <p:http-request href="https://xprocref.org">
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
  </p:http-request>

  <p:delete match="/h:html/h:body"/>

</p:declare-step>

Resulting HTML fragment:

<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
      <meta name="viewport" content="width=device-width, initial-scale=1"/>
      <link href="css/bootstrap.min.css" rel="stylesheet"/>
      <link href="css/xprocref.css" rel="stylesheet"/>
      <script defer="" src="js/bootstrap.bundle.min.js"/>
      <title>XProc steps (3.0)</title>
      <link rel="shortcut icon" href="images/favicon.ico"/>
   </head>
</html>

Viewing the request headers

To see what p:http-request sends as request headers, we need a server that reports back the request headers it received. Beeceptor provides such a server: send an HTTP request to https://echo.free.beeceptor.com and you receive a JSON message containing information about your request.

Sending a simple GET request to this URI returns (results vary depending on your IP, operating system, browser, etc.):

{
  "method": "GET",
  "protocol": "https",
  "host": "echo.free.beeceptor.com",
  "path": "/",
  "ip": "84.29.5.211:52418",
  "headers": {
    "Host": "echo.free.beeceptor.com",
    "User-Agent": "Apache-HttpClient/4.5.10 (Java/17.0.12)",
    "Accept": "*/*",
    "Accept-Encoding": "gzip,deflate"
  },
  "parsedQueryParams": {}
}

The following example sends a simple HTTPS request to this server and uses the resulting JSON to construct an XML document showing the HTTP request headers sent:

  • The invocation of p:http-request just sends a GET request to https://echo.free.beeceptor.com.

  • The result is a JSON message that the XProc processor turns into a map. This is now our context item, accessible with the dot operator .

  • A sub-map in this map called headers contains the header information we’re interested in. We extract this part into a variable $headers.

  • The <p:for-each> loops over all keys in the $headers sub-map.

  • A p:identity step is used to construct a (single element) XML document, containing the request header name and value: <request-header name="…" value="…"/>

  • The <p:for-each> loop now emits a sequence of documents, one for each request header. A p:wrap-sequence step wraps this into an <http-request-headers> root element to produce a well-formed XML document.

<p:declare-step xmlns:map="http://www.w3.org/2005/xpath-functions/map" xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result" sequence="true"/>

  <p:http-request href="https://echo.free.beeceptor.com">
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
  </p:http-request>

  <p:variable name="headers" as="map(*)" select=".?headers"/>
  <p:for-each>
    <p:with-input select="map:keys($headers)"/>
    <p:identity>
      <p:with-input>
        <request-header name="{.}" value="{$headers(.)}"/>
      </p:with-input>
    </p:identity>
  </p:for-each>
  <p:wrap-sequence wrapper="http-request-headers"/>

</p:declare-step>

Result document:

<http-request-headers>
   <request-header name="Host" value="echo.free.beeceptor.com"/>
   <request-header name="User-Agent" value="Apache-HttpClient/4.5.10 (Java/17.0.13)"/>
   <request-header name="Accept" value="*/*"/>
   <request-header name="Accept-Encoding" value="gzip,deflate"/>
</http-request-headers>

Adding a request header

The headers option can be used for additional request headers. In this example we add the bogus request header called xyz and set it to the value 123. The code to view the request headers is identical to that of Viewing the request headers:

<p:declare-step xmlns:map="http://www.w3.org/2005/xpath-functions/map" xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result" sequence="true"/>

  <p:http-request href="https://echo.free.beeceptor.com" headers="map{'xyz': '123' }">
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
  </p:http-request>

  <p:variable name="headers" as="map(*)" select=".?headers"/>
  <p:for-each>
    <p:with-input select="map:keys($headers)"/>
    <p:identity>
      <p:with-input>
        <request-header name="{.}" value="{$headers(.)}"/>
      </p:with-input>
    </p:identity>
  </p:for-each>
  <p:wrap-sequence wrapper="http-request-headers"/>

</p:declare-step>

Result document:

<http-request-headers>
   <request-header name="Host" value="echo.free.beeceptor.com"/>
   <request-header name="User-Agent" value="Apache-HttpClient/4.5.10 (Java/17.0.13)"/>
   <request-header name="Accept" value="*/*"/>
   <request-header name="Accept-Encoding" value="gzip,deflate"/>
   <request-header name="Xyz" value="123"/>
</http-request-headers>

Notice that the name of the request header appears capitalized as Xyz. Request header names are case-insensitive, but it is customary to capitalize them (start them with an upper-case character).

Viewing the response headers

Inspecting the response headers can be done by using the map returned on the report port (see The response result and report). The code to view the response headers is almost identical to that of Viewing the request headers:

<p:declare-step xmlns:map="http://www.w3.org/2005/xpath-functions/map" xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result" sequence="true"/>

  <p:http-request href="https://echo.free.beeceptor.com" name="request">
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
  </p:http-request>

  <p:variable name="response-headers" as="map(*)" select=".?headers" pipe="report@request"/>
  <p:for-each>
    <p:with-input select="map:keys($response-headers)"/>
    <p:identity>
      <p:with-input>
        <response-header name="{.}" value="{$response-headers(.)}"/>
      </p:with-input>
    </p:identity>
  </p:for-each>
  <p:wrap-sequence wrapper="http-response-headers"/> 

</p:declare-step>

Result document:

<http-response-headers>
   <response-header name="access-control-allow-origin" value="*"/>
   <response-header name="alt-svc" value="h3=&#34;:443&#34;; ma=2592000"/>
   <response-header name="content-type" value="application/json"/>
   <response-header name="date" value="Mon, 02 Dec 2024 12:40:24 GMT"/>
   <response-header name="vary" value="Accept-Encoding"/>
   <response-header name="transfer-encoding" value="chunked"/>
</http-response-headers>

Additional details

  • A relative value for the href option is resolved against the base URI of the element on which this option is specified. In most cases this will be the static base URI of your pipeline (the path where the XProc source containing the p:http-request is stored). This is very probably not what you want.

  • HTTP request header names are case-insensitive, but keys in maps are not. This means that you could specify the same request header multiple times in the headers option, for instance as both Content-Type and content-type. If that happens, error XC0127 is raised.

  • When constructing multipart requests (see Multipart requests): multiple documents on the source port combined with a content-type header that does not start with multipart/ raises error XC0133.
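A minimal sketch of avoiding the relative href pitfall described above. Instead of relying on the pipeline’s own base URI, the URI is resolved explicitly against a base address using the standard XPath resolve-uri() function in a value template. The option name site and the page name index.html are assumptions made for this example.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:xs="http://www.w3.org/2001/XMLSchema" version="3.0">

  <p:output port="result" sequence="true"/>

  <!-- Hypothetical base address of the remote site; can be overridden when running -->
  <p:option name="site" as="xs:string" select="'https://xprocref.org/'"/>

  <!-- Resolve explicitly against $site instead of the pipeline's base URI;
       the page name index.html is hypothetical -->
  <p:http-request href="{resolve-uri('index.html', $site)}">
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
  </p:http-request>

</p:declare-step>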

Errors raised

Error code

Description

XC0003

It is a dynamic error if a “username” or a “password” key is present without specifying a value for the “auth-method” key, if the requested auth-method isn't supported, or the authentication challenge contains an authentication method that isn't supported.

XC0030

It is a dynamic error if the response body cannot be interpreted as requested (e.g. application/json to override application/xml content).

XC0078

It is a dynamic error if the value associated with the “fail-on-timeout” key is true() and an HTTP status code 408 is encountered.

XC0122

It is a dynamic error if the given method is not supported.

XC0123

It is a dynamic error if any key in the “auth” map is associated with a value that is not an instance of the required type.

XC0124

It is a dynamic error if any key in the “parameters” map is associated with a value that is not an instance of the required type.

XC0125

It is a dynamic error if the key “accept-multipart” has the value false() and a multipart response is detected.

XC0126

It is a dynamic error if the XPath expression in assert evaluates to false.

XC0127

It is a dynamic error if the headers map contains two keys that are the same when compared in a case-insensitive manner.

XC0128

It is a dynamic error if the URI’s scheme is unknown or not supported.

XC0129

It is a dynamic error if the requested HTTP version is not supported.

XC0131

It is a dynamic error if the processor cannot support the requested encoding.

XC0132

It is a dynamic error if the override content encoding cannot be supported.

XC0133

It is a dynamic error if more than one document appears on the source port and a content-type header is present and the content type specified is not a multipart content type.

XC0203

It is a dynamic error if the specified boundary is not valid (for example, if it begins with two hyphens “--”).

XD0079

It is a dynamic error if a supplied content-type is not a valid media type of the form “type/subtype+ext” or “type/subtype”.

Reference information

This description of the p:http-request step is for XProc version: 3.1. This is a required step (an XProc 3.1 processor must support this).

The formal specification for the p:http-request step can be found here.

The p:http-request step is part of categories:

The p:http-request step is also present in version: 3.0.