Interact using HTTP (or related protocols).
<p:declare-step type="p:http-request">
  <input port="source" primary="true" content-types="any" sequence="true"/>
  <output port="result" primary="true" content-types="any" sequence="true"/>
  <output port="report" primary="false" content-types="application/json" sequence="true"/>
  <option name="href" as="xs:anyURI" required="true"/>
  <option name="assert" as="xs:string" required="false" select="'.?status-code lt 400'"/>
  <option name="auth" as="map(xs:string, item()+)?" required="false" select="()"/>
  <option name="headers" as="map(xs:string, xs:string)?" required="false" select="()"/>
  <option name="method" as="xs:string?" required="false" select="'GET'"/>
  <option name="parameters" as="map(xs:QName, item()*)?" required="false" select="()"/>
  <option name="serialization" as="map(xs:QName,item()*)?" required="false" select="()"/>
</p:declare-step>
The p:http-request step allows pipelines to interact with resources (for instance websites) over HTTP or related protocols.
Ports:
Port | Type | Primary? | Content types | Seq? | Description |
---|---|---|---|---|---|
source | input | true | any | true | Document(s) used in constructing the request body. By default, source documents are used for HTTP methods that require a body. |
result | output | true | any | true | The request result document(s). See The response result and report. |
report | output | false | application/json | true | A map containing information about the response. See The response result and report. |
Options:
The p:http-request step allows you to send requests to a server and receive its responses. You could use this, for instance, to access REST or other services on the web. Another use case is to have your pipeline play “web browser”: get the contents of a web page and interpret it, or fill in a form on a page in the background. Although the step is generic, it will probably be used most (exclusively?) with the HTTP(S) protocol, so we’ll concentrate on this.
The HTTP(S) protocol is rather complex and, to support this, p:http-request is also complex. As a result, the description of p:http-request is long and may appear intimidating to users who are not familiar with the finer details of the HTTP(S) protocol. Luckily, simple interactions, like just requesting a single web page, are easy (see Basic usage). Let’s take it step by step (if there are parts you don't understand, chances are you don't need them). The HTTP(S) protocol itself is not explained here; if you need more information about it, a good place to start would be Wikipedia.
The p:http-request step first constructs a request:
An HTTP(S) request is always to some URI (like https://xprocref.org/). You must specify this URI using the mandatory href option.
An HTTP(S) request has a method. Usual values are GET, POST, PUT, DELETE, and HEAD. You can specify this using the method option. Its default value is GET.
An HTTP(S) request has request headers: name/value pairs that contain additional information for the server. The main source for specifying request headers is the headers option. Some special handling applies, see Specifying request headers.
Once the request headers are known, some of this information is used by p:http-request for additional purposes, like determining the transfer encoding. See Usage of request headers for more information.
Any documents that must accompany the request can be supplied on the source port.
Some interactions require authorization (“logging in”). This is usually specified with the auth option. See Request authentication for more information.
It is possible to construct a multipart request: a request in which multiple documents are sent at once. See Multipart requests for more information.
Further fine-tuning of the request is done using parameters specified in the parameters option. See Parameters for more information.
The request is sent to the server and p:http-request waits for a response. How long the step waits before giving up can be specified in the parameters option. See Parameters for more information.
A response is received and interpreted:
Whether a response is considered successful is determined by the assert option. If the response is not considered successful, error XC0126 is raised. See Asserting the request status for more information.
Further fine-tuning of the interpretation of the response is done using parameters specified in the parameters option. See Parameters for more information.
Any documents contained in the response appear on the result port.
Additional information about the response (its response headers, status code, etc.) appears on the report port, as a map. See The response result and report for more information.
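As a rough illustration of the request-building steps listed above, the following sketch sets href and method explicitly and supplies a body document on the source port. The echo server address is the one used elsewhere on this page; the <order> document is made up for the example, and whether the server accepts an XML body is an assumption.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result" sequence="true"/>

  <!-- href is mandatory; method defaults to GET, so POST must be given explicitly -->
  <p:http-request href="https://echo.free.beeceptor.com" method="POST">
    <!-- The document on the source port becomes the request body.
         The <order> element is a made-up example payload. -->
    <p:with-input port="source">
      <p:inline>
        <order id="42"/>
      </p:inline>
    </p:with-input>
  </p:http-request>

</p:declare-step>
```

Whatever the server answers appears on the result port. Because the default assert expression is used, a status code of 400 or higher would raise error XC0126.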
An HTTP request has request headers: name/value pairs containing additional information for the server. See for instance Wikipedia for an overview.
The main source for p:http-request for constructing HTTP request headers is its headers option (see Viewing the request headers).
If a single document appears on the source port and we’re not constructing a multipart message, special rules apply:
If the (single) document appearing on the source port is an XML, HTML or text document, has a serialization document-property, and this serialization document-property has an entry called encoding, then a charset parameter is appended to the created content-type header of the HTTP request (for more information about this charset parameter, see for instance here).
Any document-properties of the (single) document appearing on the source port that are in the http://www.w3.org/ns/xproc-http namespace will be added as request headers, using their local name (their name without namespace) as the request header name.
For a request header specified both in a document-property and in the map provided to the headers option, the one in the headers option takes precedence. This comparison is case-insensitive.
When constructing a multipart message, not only the request itself but also its separate documents can have request headers. For more information on how this can be specified see Multipart requests.
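The rules for a single source document can be tried out with a sketch like the one below. It attaches two document-properties in the http://www.w3.org/ns/xproc-http namespace using p:set-properties and also sets one header through the headers option. The header names x-demo and x-extra and the <greeting> document are invented for the illustration, and the echo server is assumed to accept a POST with an XML body.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result" sequence="true"/>

  <!-- Namespace whose document-properties are turned into request headers -->
  <p:variable name="http-ns" select="'http://www.w3.org/ns/xproc-http'"/>

  <!-- Attach two made-up properties (x-demo, x-extra) to a small body document -->
  <p:set-properties>
    <p:with-input>
      <p:inline>
        <greeting>hello</greeting>
      </p:inline>
    </p:with-input>
    <p:with-option name="properties"
                   select="map{ QName($http-ns, 'x-demo'): 'from-property',
                                QName($http-ns, 'x-extra'): 'only-in-property' }"/>
  </p:set-properties>

  <!-- X-Demo is also given in the headers option, differing only in case -->
  <p:http-request href="https://echo.free.beeceptor.com" method="POST"
                  headers="map{'X-Demo': 'from-option'}"/>

</p:declare-step>
```

Following the precedence rule described above, one would expect the server to see from-option for x-demo (the headers option wins, compared case-insensitively) and only-in-property for x-extra.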
Once the request headers are constructed (see Specifying request headers), some of these are used for additional purposes:
If the value of the content-type request header starts with multipart/, a multipart request is constructed (regardless of the number of documents appearing on the source port). See Multipart requests.
For non-multipart messages, it is possible to override the media type (content-type document-property) of the body document. If a single document appears on the source port, we’re not constructing a multipart message, and a content-type request header is specified, the value of the content-type request header overrides the value of the content-type document-property.
If a transfer-encoding request header is present, the request is sent using that particular encoding (for more information about transfer encodings see here). Examples of values are chunked, compress or gzip.
How authorization (“logging in”) is done is specified using the authorization request header. However, the p:http-request step also has an auth option for specifying this (see Request authentication). If this option is specified, the authorization request header, if present, is ignored. Instead, the value of the authorization request header is determined exclusively by the value of the auth option.
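A small sketch of the media type override described above: a single XML document is sent, and a content-type request header replaces the media type the document would otherwise carry. The media type application/vnd.example+xml and the <ping> document are placeholders chosen for this illustration only.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result" sequence="true"/>

  <!-- Single source document, no multipart: the content-type request header
       overrides the content-type document-property of the body document.
       The media type used here is a made-up example value. -->
  <p:http-request href="https://echo.free.beeceptor.com" method="POST"
                  headers="map{'content-type': 'application/vnd.example+xml'}">
    <p:with-input port="source">
      <p:inline>
        <ping/>
      </p:inline>
    </p:with-input>
  </p:http-request>

</p:declare-step>
```

Since the echo server reports the headers it received, its answer should show which content-type was actually sent.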
Information about the authorization of a request (“logging in”) is sent to the server with the authorization request header. Experienced users could construct this request header themselves and add it to the step's headers option (or use a document-property). However, in most cases, it is easier to pass the authorization information using the auth option and have p:http-request construct the authorization request header for you. If you use the auth option, any authorization request header passed in some other way is ignored.
The auth option contains the credentials (username, password, etc.) of the client and specifies what authentication method is used. It must be a map with string (xs:string) type keys. The following standard keys are defined:
Key | Value data type | Description |
---|---|---|
|
| The username for the request. |
|
| The password associated with the username. |
|
| Specifies the authentication method to use. Standard values are |
|
| This controls the “authorization challenge”:
|
If an authorization fails, the request is not retried.
Any other key/value pairs for the auth option map are implementation-defined and therefore dependent on the XProc processor used.
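The sketch below shows what passing credentials through the auth option might look like. The key names username, password and auth-method and the value Basic are taken from the XProc step specification rather than from the table above, so treat them as assumptions to verify. The URI and the credentials are placeholders.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result" sequence="true"/>

  <!-- Placeholder URI and credentials; the map keys are assumed from the
       XProc specification. Any authorization header supplied in another way
       would be ignored because the auth option is present. -->
  <p:http-request href="https://example.org/protected"
                  auth="map{'username': 'alice',
                            'password': 'not-a-real-password',
                            'auth-method': 'Basic'}">
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
  </p:http-request>

</p:declare-step>
```

In a real pipeline the credentials would more likely come from step options than from literals in the source.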
The parameters option provides information for fine-tuning the construction of the request and/or the handling of the response. It must be a map with string (xs:string) type keys. The following standard keys are defined:
Key | Value data type | Description |
---|---|---|
|
| The XProc processor must know how to interpret the body of a server response, its data type. Normally this is done by looking at
the If you specify an The information about the |
|
| Specifies the HTTP version to use for the request. Its default value is implementation-defined and therefore depends on the
XProc processor used. Most probably it will be |
|
| If this parameter is present and has the value |
|
| The XProc processor must know how the encoding of a server response (for instance: The information about the |
|
| If this parameter is present and has the value |
|
| If this parameter is present and has the value |
|
| Sometimes a server responds with a redirect, meaning something like “please repeat the request to
this different URI”. The
The default behaviour, when the |
|
| Specifies the number of seconds to wait for a response. If no response is received after approximately this number of seconds,
the request is terminated and HTTP status |
|
| Sometimes a request results in a timeout. This can either happen by receiving a response with HTTP status This might be confusing, because XProc also has a
|
|
| If this parameter is present and its value is |
|
| If this parameter is present and its value is |
|
| By default, whether a body is sent with the request depends on the HTTP method used (the value of the When the |
Any other key/value pairs for the parameters option map are implementation-defined and therefore dependent on the XProc processor used.
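As a hedged illustration of the parameters option, the following sketch asks the processor to treat the response body as plain text and limits the waiting time. The key override-content-type is mentioned elsewhere on this page; the key name timeout (in seconds) is assumed from the XProc step specification. The sketch also assumes the processor accepts string keys here, as it does for other maps keyed by QName.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result" sequence="true"/>

  <!-- Treat the (JSON) answer of the echo server as text, and give up after
       roughly 10 seconds. The timeout key name is an assumption. -->
  <p:http-request href="https://echo.free.beeceptor.com"
                  parameters="map{'override-content-type': 'text/plain',
                                  'timeout': 10}">
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
  </p:http-request>

</p:declare-step>
```

With the override in place, the document on the result port should be a text document rather than a parsed JSON map.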
When an answer is received from the server, document(s) in the response body will appear as document(s) on the result port. Each document will be parsed according to its content-type. You can override this behaviour using the override-content-type parameter (see Parameters).
In case of a multipart response, each part will become a separate document appearing on the result port. Any response headers associated with a specific part are added to the document-properties of the resulting document.
The report port always returns a map with the following keys/entries:
Key | Value data type | Description |
---|---|---|
|
| The HTTP status code for the request, for instance |
|
| The URI of the request. In case of HTTP redirection, this value may be different from the original request URI. |
|
| The HTTP headers returned for the request. Header names are in lower-case. The map may be empty. |
Any request can fail, but what counts as failure depends on the expectations of the receiver. The assert option of p:http-request takes an XPath expression that inspects the request results:
It must contain a valid (boolean) XPath expression.
This expression is evaluated when a response is received.
The context item when evaluating the expression is the map that also appears on the report port (see The response result and report).
If the expression evaluates to false, the request is considered failed and error XC0126 is raised.
If the expression evaluates to true, the request is considered successful. No error is raised.
The default value for the assert option is .?status-code lt 400. Since the context item is the map on the report port, the dot operator . here refers to this map. The .?status-code part is one of the ways to access a map entry (another way to write this is .('status-code')). The referred map entry contains the received (integer) HTTP status code. Under this default, a status code less than 400 is considered a success. If it’s greater than or equal to 400, it is considered a failure and error XC0126 is raised.
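To make the assert mechanism more concrete, here is a sketch that replaces the default expression with one that also accepts a 404 status and then returns the report map instead of the response body. The path no-such-page is hypothetical and is only meant to provoke a “not found” answer; how a particular server responds is an assumption.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result" sequence="true"/>

  <!-- Accept only status 200 or 404; any other status raises XC0126.
       The context item of the expression is the report map. -->
  <p:http-request name="request"
                  href="https://xprocref.org/no-such-page"
                  assert=".?status-code = (200, 404)">
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
  </p:http-request>

  <!-- Emit the report map (status code, headers, etc.) as the pipeline result -->
  <p:identity>
    <p:with-input pipe="report@request"/>
  </p:identity>

</p:declare-step>
```

An expression such as true() would accept every response, leaving it to the rest of the pipeline to inspect the status code on the report port.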
Multipart requests combine one or more sets of data into a single HTTP request. You use this for file uploads and/or transferring data of several types in one go. For instance, a web page that allows you to upload several images could use a single multipart request to send all these images to the server. For more information, see for instance Wikipedia or the W3C multipart protocol description.
The p:http-request step constructs a multipart request if one or both of the following conditions are met:
Multiple documents appear on the source port.
The content-type request header (see Specifying request headers) starts with multipart/.
If no specific content-type request header is specified and the source port receives multiple documents, the content type is set to multipart/mixed.
A multipart request must have a boundary marker: a string of characters that is inserted between the message parts. The choice of marker is critical, because it must not appear anywhere in the data itself. If it does, the request is considered malformed. The boundary marker used by p:http-request is constructed as follows:
If the content-type request header contains a boundary parameter, this is used. For instance, by setting the content-type to multipart/mixed; boundary=gc0p4Jq0M2Yt08jU534c0p, the boundary marker becomes --gc0p4Jq0M2Yt08jU534c0p (the two hyphens in front are prescribed by the protocol).
If this is not the case, the boundary marker is implementation-defined and therefore dependent on the XProc processor used. Unfortunately, there is no guarantee this boundary marker does not appear in the data itself (which would make the request malformed).
When constructing the multipart message, each document on the source port is serialized (as if written to disk). If a document has a serialization document-property, this is used to determine the serialization format.
The separate documents in a multipart message can have request headers of their own. Examples of often used headers are id, description and disposition. Document-properties of documents on the source port that are in the http://www.w3.org/ns/xproc-http namespace will be used to construct request headers for that particular document, using their local name (their name without namespace) as the request header name.
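A minimal multipart sketch, assuming the echo server accepts such a request: two documents are supplied on the source port and the content-type request header carries an explicit boundary parameter (the same example value as used above). The two part documents are made up for the illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result" sequence="true"/>

  <!-- The content-type header starts with multipart/ and names the boundary,
       so the processor does not have to invent one. -->
  <p:http-request href="https://echo.free.beeceptor.com" method="POST"
                  headers="map{'content-type': 'multipart/mixed; boundary=gc0p4Jq0M2Yt08jU534c0p'}">
    <p:with-input port="source">
      <!-- First part: a small XML document -->
      <p:inline>
        <part-one/>
      </p:inline>
      <!-- Second part: a plain text document -->
      <p:inline content-type="text/plain">Second part, plain text.</p:inline>
    </p:with-input>
  </p:http-request>

</p:declare-step>
```

Leaving out the headers option would still produce a multipart request here, since more than one document arrives on the source port. The content type would then be multipart/mixed with an implementation-defined boundary.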
The following example:
Uses p:http-request to ask for the home page of the https://xprocref.org website, just like a web browser. This fires an HTTPS GET request and waits for the answer.
Notice that we have to supply a value for the source port, even if we don’t need it. In this case we simply set it to <p:empty>.
We strip the resulting HTML page to just its <head> (otherwise the result would be too big to display).
Pipeline document:
<p:declare-step xmlns:h="http://www.w3.org/1999/xhtml" xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result" sequence="true"/>
  <p:http-request href="https://xprocref.org">
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
  </p:http-request>
  <!-- The h prefix must be bound to the XHTML namespace for this match pattern to work -->
  <p:delete match="/h:html/h:body"/>
</p:declare-step>
Resulting HTML fragment:
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1"/>
    <link href="css/bootstrap.min.css" rel="stylesheet"/>
    <link href="css/xprocref.css" rel="stylesheet"/>
    <script defer="" src="js/bootstrap.bundle.min.js"/>
    <title>XProc steps (3.0)</title>
    <link rel="shortcut icon" href="images/favicon.ico"/>
  </head>
</html>
When we want to see what p:http-request sends as request headers, we need a server that echoes the request headers back to us. Beeceptor provides such a server: send an HTTP request to https://echo.free.beeceptor.com and what you receive is a JSON message containing information about your request.
Sending a simple GET request to this URI returns (results vary depending on your IP, operating system, browser, etc.):
{
  "method": "GET",
  "protocol": "https",
  "host": "echo.free.beeceptor.com",
  "path": "/",
  "ip": "84.29.5.211:52418",
  "headers": {
    "Host": "echo.free.beeceptor.com",
    "User-Agent": "Apache-HttpClient/4.5.10 (Java/17.0.12)",
    "Accept": "*/*",
    "Accept-Encoding": "gzip,deflate"
  },
  "parsedQueryParams": {}
}
The following example sends a simple HTTPS request to this server and uses the resulting JSON to construct an XML document showing the HTTP request headers sent:
The invocation of p:http-request just sends a GET request to https://echo.free.beeceptor.com.
The result is a JSON message that the XProc processor turns into a map. This is now our context item, accessible with the dot operator (.).
A sub-map in this map called headers contains the header information we’re interested in. We extract this part into a variable $headers.
The <p:for-each> loops over all keys in the $headers sub-map.
A p:identity step is used to construct a (single element) XML document, containing the request header name and value:
<request-header name="…" value="…"/>
The <p:for-each> loop now emits a sequence of documents, one for each request header. A p:wrap-sequence step wraps this into an <http-request-headers> root element to produce a well-formed XML document.
<p:declare-step xmlns:map="http://www.w3.org/2005/xpath-functions/map" xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result" sequence="true"/>
  <p:http-request href="https://echo.free.beeceptor.com">
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
  </p:http-request>
  <p:variable name="headers" as="map(*)" select=".?headers"/>
  <p:for-each>
    <p:with-input select="map:keys($headers)"/>
    <p:identity>
      <p:with-input>
        <request-header name="{.}" value="{$headers(.)}"/>
      </p:with-input>
    </p:identity>
  </p:for-each>
  <p:wrap-sequence wrapper="http-request-headers"/>
</p:declare-step>
Result document:
<http-request-headers>
  <request-header name="Host" value="echo.free.beeceptor.com"/>
  <request-header name="User-Agent" value="Apache-HttpClient/4.5.10 (Java/17.0.13)"/>
  <request-header name="Accept" value="*/*"/>
  <request-header name="Accept-Encoding" value="gzip,deflate"/>
</http-request-headers>
The headers option can be used for additional request headers. In this example we add a bogus request header called xyz and set it to the value 123. The code to view the request headers is identical to that of Viewing the request headers:
<p:declare-step xmlns:map="http://www.w3.org/2005/xpath-functions/map" xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result" sequence="true"/>
  <p:http-request href="https://echo.free.beeceptor.com" headers="map{'xyz': '123' }">
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
  </p:http-request>
  <p:variable name="headers" as="map(*)" select=".?headers"/>
  <p:for-each>
    <p:with-input select="map:keys($headers)"/>
    <p:identity>
      <p:with-input>
        <request-header name="{.}" value="{$headers(.)}"/>
      </p:with-input>
    </p:identity>
  </p:for-each>
  <p:wrap-sequence wrapper="http-request-headers"/>
</p:declare-step>
Result document:
<http-request-headers>
  <request-header name="Host" value="echo.free.beeceptor.com"/>
  <request-header name="User-Agent" value="Apache-HttpClient/4.5.10 (Java/17.0.13)"/>
  <request-header name="Accept" value="*/*"/>
  <request-header name="Accept-Encoding" value="gzip,deflate"/>
  <request-header name="Xyz" value="123"/>
</http-request-headers>
Notice that the name of the request header is capitalized into Xyz. Request header names are case-insensitive, but it is customary to capitalize them (start with an upper-case character).
Inspecting the response headers can be done by using the map returned on the report
port (see The response result and report). The code to view the response headers is almost identical to that of Viewing the request headers:
<p:declare-step xmlns:map="http://www.w3.org/2005/xpath-functions/map" xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result" sequence="true"/>
  <p:http-request href="https://echo.free.beeceptor.com" name="request">
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
  </p:http-request>
  <p:variable name="response-headers" as="map(*)" select=".?headers" pipe="report@request"/>
  <p:for-each>
    <p:with-input select="map:keys($response-headers)"/>
    <p:identity>
      <p:with-input>
        <response-header name="{.}" value="{$response-headers(.)}"/>
      </p:with-input>
    </p:identity>
  </p:for-each>
  <p:wrap-sequence wrapper="http-response-headers"/>
</p:declare-step>
Result document:
<http-response-headers>
  <response-header name="access-control-allow-origin" value="*"/>
  <response-header name="alt-svc" value="h3=&quot;:443&quot;; ma=2592000"/>
  <response-header name="content-type" value="application/json"/>
  <response-header name="date" value="Mon, 02 Dec 2024 12:40:24 GMT"/>
  <response-header name="vary" value="Accept-Encoding"/>
  <response-header name="transfer-encoding" value="chunked"/>
</http-response-headers>
A relative value for the href option is resolved against the base URI of the element on which this option is specified. In most cases this will be the static base URI of your pipeline (the location where the XProc source containing the p:http-request is stored). This is very probably not what you want.
HTTP request header names are case-insensitive, but keys in maps are not. This means that you could specify the same request header multiple times in the headers option, for instance as both Content-Type and content-type. If that happens, error XC0127 is raised.
When constructing multipart requests (see Multipart requests): multiple documents on the source port combined with a content-type header that does not start with multipart/ raises error XC0133.
Error code | Description |
---|---|
It is a dynamic error if a “ | |
It is a dynamic error if the response body cannot be interpreted as requested (e.g. | |
It is a dynamic error if the value associated with the “ | |
It is a dynamic error if the given method is not supported. | |
It is a dynamic error if any key in the “ | |
It is a dynamic error if any key in the “parameters” map is associated with a value that is not an instance of the required type. | |
It is a dynamic error if the key “ | |
It is a dynamic error if the XPath expression in | |
It is a dynamic error if the | |
It is a dynamic error if the URI’s scheme is unknown or not supported. | |
It is a dynamic error if the requested HTTP version is not supported. | |
It is a dynamic error if the processor cannot support the requested encoding. | |
It is a dynamic error if the override content encoding cannot be supported. | |
It is a dynamic error if more than one document appears on the | |
It is a dynamic error if the specified boundary is not valid (for example, if it begins with two hyphens “--”). | |
It is a dynamic error if a supplied content-type is not a valid media type of the form “ |
This description of the p:http-request step is for XProc version: 3.1. This is a required step (an XProc 3.1 processor must support this).
The formal specification for the p:http-request step can be found here.
The p:http-request step is also present in version: 3.0.