Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
This library can be used to create applications that need to parse HTTP streams.
Warning
|
This is not an official Boost C++ library. It wasn’t reviewed and can’t be downloaded from www.boost.org. This library will be reviewed eventually. |
Boost.Http is a library that provides an incremental HTTP parser and a set of mini-parsers that can be used independently or combined [1]. A future version will also provide a message generator.
The highlights are:
- Support for modern HTTP features.
- Simple.
- Portable (C++03 and very few other dependencies).
- Just like Ryan Dahl’s HTTP parser, this parser does not make any syscalls or allocations. It also does not buffer data.
- You can mutate the passed buffer [2].
- It doesn’t steal control flow from your application [3]. Great for HTTP pipelining.
- Matching and decoding tokens as separate steps [4].
1. Using
1.1. Requirements
- CMake 3.1.0 or newer [5]. You can skip this requirement if you don’t intend to run the tests.
- Boost 1.57 or more recent.
  - boost::asio::const_buffer.
  - boost::string_ref.
- asciidoctor [6]. You’ll also need pandoc if you want to generate the ePUB output.
2. Tutorial
2.1. Parsing (beginner)
In this tutorial, you’ll learn how to use this library to parse HTTP streams easily.
Note
|
We assume the reader has a basic understanding of C++ and Boost.Asio. |
We start with code that resembles the structure of the program you’re about to write:
#include <boost/http/reader/request.hpp>
#include <string>
#include <map>
namespace http = boost::http;
namespace asio = boost::asio;
struct my_socket_consumer
{
private:
http::reader::request request_reader;
std::string buffer;
std::string last_header;
public:
std::string method;
std::string request_target;
int version;
std::multimap<std::string, std::string> headers;
void on_socket_callback(asio::const_buffer data)
{
using namespace http::token;
using token::code;
buffer.append(asio::buffer_cast<const char*>(data), asio::buffer_size(data));
request_reader.set_buffer(asio::buffer(buffer));
while (request_reader.code() != code::end_of_message) {
switch (request_reader.code()) {
case code::skip:
// do nothing
break;
case code::method:
method = request_reader.value<token::method>();
break;
case code::request_target:
request_target = request_reader.value<token::request_target>();
break;
case code::version:
version = request_reader.value<token::version>();
break;
case code::field_name:
case code::trailer_name:
last_header = request_reader.value<token::field_name>();
}
request_reader.next();
}
request_reader.next();
ready();
}
protected:
virtual void ready() = 0;
};
You’re building a piece of code that consumes HTTP from somewhere — the in — and spits it out in the form of C++ structured data to somewhere else — the out.
The in of your program is connected to the above piece of code through the on_socket_callback member function. The out of your program is connected to the previous piece of code through the ready overridable member function.
The connection point between the network I/O and the in of the program should be obvious by now, so I won’t dwell on it. However, I’ll briefly explain the out connection point and then we can proceed to delve into the part in between.
Once the ready member function is called, the data for your request will be available in method, request_target and the other public variables. From now on, I’ll focus solely on the implementation of my_socket_consumer::on_socket_callback.
void my_socket_consumer::on_socket_callback(asio::const_buffer data)
{
//http::reader::request request_reader;
//std::string buffer;
//std::string last_header;
using namespace http::token;
using token::code;
buffer.append(asio::buffer_cast<const char*>(data), asio::buffer_size(data));
request_reader.set_buffer(asio::buffer(buffer));
while (request_reader.code() != code::end_of_message) {
switch (request_reader.code()) {
case code::skip:
// do nothing
break;
case code::method:
method = request_reader.value<token::method>();
break;
case code::request_target:
request_target = request_reader.value<token::request_target>();
break;
case code::version:
version = request_reader.value<token::version>();
break;
case code::field_name:
case code::trailer_name:
last_header = request_reader.value<token::field_name>();
}
request_reader.next();
}
request_reader.next();
ready();
}
Try to keep in mind the three variables that really orchestrate the flow: request_reader, buffer and last_header.
The whole work is about managing the buffer and managing the tokens.
Token access is very easy. As the parser is incremental, there is only one token at a time. I don’t need to explain Boost.Http control flow because the control flow will be coded by you (it’s a library, not a framework). You only have to use code() to check the current token and value<T>() to extract its value. Use next() to advance to the next token.
Warning
|
There is only one caveat. The parser doesn’t buffer data and decodes a token into a value straight from the current buffer. This means you cannot extract the current value once you drop the current buffer data. As a nice side effect, you spare CPU time for the tokens you do not need to decode (matching and decoding as separate steps). |
The parser doesn’t buffer data, which means that when we use the set_buffer member function, request_reader only maintains a view into the passed buffer, which we’ll refer to as the virtual buffer from now on.
The virtual buffer has a head/current part and a remaining/tail part. request_reader doesn’t store a pointer/address/index into the real buffer. Once a token is consumed, its bytes (the head) are discarded from the virtual buffer. When you mutate the real buffer, the virtual buffer is invalidated and you must inform the parser using set_buffer. However, the bytes discarded from the virtual buffer must not appear again. You must keep track of the number of discarded bytes to prepare the buffer for the next call to set_buffer. The previous code doesn’t handle that.
The new tool you should be presented with now is token_size(). token_size() returns the size in bytes of the current token (the head of the virtual buffer).
Warning
|
There is no guarantee that token_size() returns the same size as returned by string_length(request_reader.value<T>()). You need to use token_size() to compute the number of discarded bytes.
|
void my_socket_consumer::on_socket_callback(asio::const_buffer data)
{
using namespace http::token;
using token::code;
buffer.append(asio::buffer_cast<const char*>(data), asio::buffer_size(data));
request_reader.set_buffer(asio::buffer(buffer));
std::size_t nparsed = 0; //< NEW
while (request_reader.code() != code::end_of_message) {
switch (request_reader.code()) {
case code::skip:
// do nothing
break;
case code::method:
method = request_reader.value<token::method>();
break;
case code::request_target:
request_target = request_reader.value<token::request_target>();
break;
case code::version:
version = request_reader.value<token::version>();
break;
case code::field_name:
case code::trailer_name:
last_header = request_reader.value<token::field_name>();
}
nparsed += request_reader.token_size(); //< NEW
request_reader.next();
}
nparsed += request_reader.token_size(); //< NEW
request_reader.next();
buffer.erase(0, nparsed); //< NEW
ready();
}
nparsed was easy. However, the while (request_reader.code() != code::end_of_message) loop doesn’t seem right. It’s very error-prone to assume the full HTTP message will be ready in a single call to on_socket_callback. Error handling must be introduced in the code.
void my_socket_consumer::on_socket_callback(asio::const_buffer data)
{
using namespace http::token;
using token::code;
buffer.append(asio::buffer_cast<const char*>(data), asio::buffer_size(data));
request_reader.set_buffer(asio::buffer(buffer));
std::size_t nparsed = 0;
while (request_reader.code() != code::error_insufficient_data //< NEW
&& request_reader.code() != code::end_of_message) { //< NEW
switch (request_reader.code()) {
case code::skip:
// do nothing
break;
case code::method:
method = request_reader.value<token::method>();
break;
case code::request_target:
request_target = request_reader.value<token::request_target>();
break;
case code::version:
version = request_reader.value<token::version>();
break;
case code::field_name:
case code::trailer_name:
last_header = request_reader.value<token::field_name>();
}
nparsed += request_reader.token_size();
request_reader.next();
}
nparsed += request_reader.token_size();
request_reader.next();
buffer.erase(0, nparsed);
if (request_reader.code() == code::error_insufficient_data) //< NEW
return; //< NEW
ready();
}
Note
|
Don’t worry about token_size(code::error_insufficient_data) being added to nparsed. This (error) "token" is defined to be 0-sized (it fits perfectly with the other rules).
|
Just because it’s easy and we’re already at it, let’s handle the other errors as well:
void my_socket_consumer::on_socket_callback(asio::const_buffer data)
{
using namespace http::token;
using token::code;
buffer.append(asio::buffer_cast<const char*>(data), asio::buffer_size(data));
request_reader.set_buffer(asio::buffer(buffer));
std::size_t nparsed = 0;
while (request_reader.code() != code::error_insufficient_data
&& request_reader.code() != code::end_of_message) {
switch (request_reader.code()) {
case code::error_set_method: //< NEW
case code::error_use_another_connection: //< NEW
// Can only happen in response parsing code.
assert(false); //< NEW
case code::error_invalid_data: //< NEW
case code::error_no_host: //< NEW
case code::error_invalid_content_length: //< NEW
case code::error_content_length_overflow: //< NEW
case code::error_invalid_transfer_encoding: //< NEW
case code::error_chunk_size_overflow: //< NEW
throw "invalid HTTP data"; //< NEW
case code::skip:
// do nothing
break;
case code::method:
method = request_reader.value<token::method>();
break;
case code::request_target:
request_target = request_reader.value<token::request_target>();
break;
case code::version:
version = request_reader.value<token::version>();
break;
case code::field_name:
case code::trailer_name:
last_header = request_reader.value<token::field_name>();
}
nparsed += request_reader.token_size();
request_reader.next();
}
nparsed += request_reader.token_size();
request_reader.next();
buffer.erase(0, nparsed);
if (request_reader.code() == code::error_insufficient_data)
return;
ready();
}
And buffer management is complete. However, the code only demonstrated how to extract simple tokens. Field name and field value are simple tokens, but they are usually tied together into a complex structure.
void my_socket_consumer::on_socket_callback(asio::const_buffer data)
{
using namespace http::token;
using token::code;
buffer.append(asio::buffer_cast<const char*>(data), asio::buffer_size(data));
request_reader.set_buffer(asio::buffer(buffer));
std::size_t nparsed = 0;
while (request_reader.code() != code::error_insufficient_data
&& request_reader.code() != code::end_of_message) {
switch (request_reader.code()) {
// ...
case code::skip:
break;
case code::method:
method = request_reader.value<token::method>();
break;
case code::request_target:
request_target = request_reader.value<token::request_target>();
break;
case code::version:
version = request_reader.value<token::version>();
break;
case code::field_name:
case code::trailer_name:
last_header = request_reader.value<token::field_name>();
break;
case code::field_value: //< NEW
case code::trailer_value: //< NEW
// NEW
headers.emplace(last_header,
request_reader.value<token::field_value>());
}
nparsed += request_reader.token_size();
request_reader.next();
}
nparsed += request_reader.token_size();
request_reader.next();
buffer.erase(0, nparsed);
if (request_reader.code() == code::error_insufficient_data)
return;
ready();
}
last_header did the trick. Easy, but maybe we want to separate headers from trailers (the HTTP headers that are sent after the message body). This task can be accomplished by the use of structural tokens.
void my_socket_consumer::on_socket_callback(asio::const_buffer data)
{
// NEW:
// We have to declare `bool my_socket_consumer::use_trailers = false` and
// `std::multimap<std::string, std::string> my_socket_consumer::trailers`.
using namespace http::token;
using token::code;
buffer.append(asio::buffer_cast<const char*>(data), asio::buffer_size(data));
request_reader.set_buffer(asio::buffer(buffer));
std::size_t nparsed = 0;
while (request_reader.code() != code::error_insufficient_data
&& request_reader.code() != code::end_of_message) {
switch (request_reader.code()) {
// ...
case code::skip:
break;
case code::method:
method = request_reader.value<token::method>();
break;
case code::request_target:
request_target = request_reader.value<token::request_target>();
break;
case code::version:
version = request_reader.value<token::version>();
break;
case code::field_name:
case code::trailer_name:
last_header = request_reader.value<token::field_name>();
break;
case code::field_value:
case code::trailer_value:
// NEW
(use_trailers ? trailers : headers)
.emplace(last_header,
request_reader.value<token::field_value>());
break;
case code::end_of_headers: //< NEW
use_trailers = true; //< NEW
}
nparsed += request_reader.token_size();
request_reader.next();
}
nparsed += request_reader.token_size();
request_reader.next();
buffer.erase(0, nparsed);
if (request_reader.code() == code::error_insufficient_data)
return;
ready();
}
Note
|
Maybe you had a gut feeling that the previous code was too strange. If so, you’re right: I unnecessarily complicated the code here to introduce you to the concept of structural tokens. They are very important and you’ll usually end up using them. Maybe this tutorial needs some revamping after the library has evolved a few times. Also notice that here you can use either token::field_name or token::trailer_name to extract the token value.
|
Some of the structural tokens' properties are:
- No value<T>() associated. value<T>() extraction is a property exclusive of the data tokens.
- It might be 0-sized.
- They are always emitted (e.g. code::end_of_body will be emitted before code::end_of_message even if no code::body_chunk is present).
We have been using the code::end_of_message structural token since the initial version of the code, so structural tokens aren’t completely alien. However, until now we have been ignoring one very important HTTP parsing feature. It’s the last missing bit before your understanding of this library is complete: our current code lacks the ability to handle HTTP pipelining.
HTTP pipelining is the feature that allows HTTP clients to send requests “in batch”. In other words, they may send several requests at once over the same connection before the server creates a response to them. If the previous code faces this situation, it’ll stop parsing at the first request and possibly wait forever until on_socket_callback is called again with more data (yes, networking code can be hard, with so many little details).
void my_socket_consumer::on_socket_callback(asio::const_buffer data)
{
using namespace http::token;
using token::code;
buffer.append(asio::buffer_cast<const char*>(data), asio::buffer_size(data));
request_reader.set_buffer(asio::buffer(buffer));
std::size_t nparsed = 0;
while (request_reader.code() != code::error_insufficient_data
&& request_reader.code() != code::end_of_message) {
switch (request_reader.code()) {
// ...
case code::skip:
break;
case code::method:
use_trailers = false; //< NEW
headers.clear(); //< NEW
trailers.clear(); //< NEW
method = request_reader.value<token::method>();
break;
case code::request_target:
request_target = request_reader.value<token::request_target>();
break;
case code::version:
version = request_reader.value<token::version>();
break;
case code::field_name:
case code::trailer_name:
last_header = request_reader.value<token::field_name>();
break;
case code::field_value:
case code::trailer_value:
(use_trailers ? trailers : headers)
.emplace(last_header,
request_reader.value<token::field_value>());
break;
case code::end_of_headers:
use_trailers = true;
}
nparsed += request_reader.token_size();
request_reader.next();
}
nparsed += request_reader.token_size();
request_reader.next();
buffer.erase(0, nparsed);
if (request_reader.code() == code::error_insufficient_data)
return;
ready();
if (buffer.size() > 0) //< NEW
on_socket_callback(asio::const_buffer()); //< NEW
}
There are HTTP libraries that could adopt a “synchronous” approach where the user must immediately produce an HTTP response once the ready() callback is called, so the parsing code can parse the whole buffer until the end and we could just put the ready() call into the code::end_of_message case.
There are HTTP libraries that follow the ASIO active style and expect the user to call something like async_read_request before the next request can be read. In this case, the solution for HTTP pipelining would be different.
There are libraries that don’t follow the ASIO style but don’t force the user to send HTTP responses immediately in the ready() callback either. In such cases, synchronization/coordination between the user’s response generation and the library’s parse resuming is necessary.
This point can vary widely between designs and the code in this tutorial only shows a rather quick’n’dirty solution. Any different solution that keeps the parsing train at full speed is left as an exercise to the reader.
The interesting point about the code here is clearing the state of the to-be-parsed message before each request-response pair. In the previous code, this was done by binding the “method token arrived” event (the first token in an HTTP request) to such state cleanup.
By now, you’re ready to use this library in your projects. You may want to check Boost.Http’s own usage of the parser, or the Tufão library, as real-world and complete examples of this parser.
2.2. Parsing (advanced)
In this tutorial, you’ll learn how to use this library to parse HTTP streams easily.
The architecture of the library is broken into two classes of parsers, the content parsers and the structural parsers.
The content parsers handle non-structural elements: terminal tokens, all easy to match and decode. They are stateless. They are mini-parsers for elements that are easy to parse and by themselves don’t add enough value to justify a library (don’t confuse low value with valueless). They live in the boost::http::syntax namespace. They are useful when you want to parse individual HTTP elements like the range header value. We won’t see them in this tutorial.
The structural parsers handle structured data formats (e.g. HTTP). To achieve the flexibility and performance requirements, they follow the incremental/pull parser model (a bit like the more traditional Iterator design pattern as described in the Gang of Four book, rather than C++ iterators). These parsers live in the boost::http::reader namespace. These are the parsers we will look into now.
In the future, we may add support for HTTP/2.0 stream format, but for now, we are left with two structural parsers:
- boost::http::reader::request for HTTP/1.0 and HTTP/1.1 request messages.
- boost::http::reader::response for HTTP/1.0 and HTTP/1.1 response messages.
Each structural parser is prepared to receive a continuous stream of messages (i.e. what NodeJS correctly refers to as keep-alive persistent streams). Because the structure of the messages is flexible enough to be non-representable in simple non-allocating C++ structures, we don’t decode the whole stream into a single parsing result, as this would force allocation. What we do instead is feed the user one token at a time, while internally keeping only the lightweight, non-growing state required to decode further tokens.
We use the same token definition for HTTP requests and HTTP responses. The tokens fall into the status (e.g. error or skip), structural (e.g. boost::http::token::code::end_of_headers) and data (e.g. boost::http::token::code::field_name) categories. Only tokens of the data category have an associated value.
Each token is associated with a slice (possibly 0-sized for an error token or a token from the structural category) of the byte stream. The process goes as follows:
- Set the buffer used by the reader.
- Consume tokens.
  - Check the code() return.
  - Possibly call value<T>() to extract the token value.
  - Call next().
- Remove parsed data from the buffer.
  - You’ll need to keep track of parsed vs unparsed data by calling token_size().
- If the address of the unparsed data changes, the reader is invalidated, so to speak. You can restore its valid state by setting the buffer to null or to the new address of the unparsed data.
Enough with abstract info. Take the following part of an HTTP stream:
GET / HTTP/1.1\r\n
Host: www.example.com\r\n
\r\n
This stream can be broken in the following series of tokens (order preserved):
- method.
- request_target.
- skip.
- version.
- field_name.
- field_value.
- skip.
- end_of_headers.
- end_of_body.
- end_of_message.
The parser is required to give you token_size() so you can remove parsed data from the stream. However, the parser is not required to give the same series of tokens for the same stream. The structural and data tokens will always be emitted the same way, but the parser may choose to merge some status token (e.g. skip) with a data token (e.g. request_target). Therefore, the following series of tokens would also be possible for the same example given previously:
- method.
- skip.
- request_target.
- version.
- field_name.
- skip.
- field_value.
- end_of_headers.
- end_of_body.
- end_of_message.
This (non-)guarantee exists to give freedom to vary the implementation. It’d be absurd to expect different implementations of this interface to generate the same result byte by byte. You may also expect different algorithms in future versions.
Another useful aspect of this non-guarantee is that it makes it possible to discard skip tokens from the buffer, yet merge them when the stream is received in the buffer all at once.
Just imagine documenting the guarantees of the token stream if we were to make it predictable. It’d be insane.
However, there is one guarantee that the reader object must provide: it must not discard bytes of data tokens while the token is incomplete. To illustrate this point, let’s turn to an example. Given that the current token is request_target, you have the following code.
assert(parser.code() == boost::http::token::code::request_target);
auto value = parser.value<boost::http::token::request_target>();
While we traverse the stream, the parser will only match tokens. We don’t expect the parser to also decode the tokens. The parser will only decode a token if that is necessary to match the following tokens. And even when the parser decodes them, the intermediary results may be discarded. In other words, match and decode are separate steps and you can spare CPU time when you don’t need to decode certain elements.
The point is that the token value must be extracted directly from the byte stream and the parser is not allowed to buffer data about the stream (or the decoded values, for that matter). The implication of this rule is a guarantee about the token order and its relationship to the byte stream.
You can imagine the stream as having two views. The tokens and the byte streams. The token view spans windows over the byte view.
tokens: | method | skip  | request_target | skip          | version
bytes:  | GET    | <SPC> | /              | <SPC> HTTP/1. | 1 <CRLF>
The slice of data associated with a data token can grow larger than the equivalent bytes:
tokens: | method    | request_target | skip  | version
bytes:  | GET <SPC> | /              | <SPC> | HTTP/1.1 <CRLF>
But it cannot shrink smaller than its associated bytes:
tokens: | method | skip    | request_target | skip           | version
bytes:  | GE     | T <SPC> | /              | <SPC> HTTP/1.1 | <CRLF>
So you have a token interface that is easy to inspect and a lot of freedom to manage the underlying buffer. Let’s see the boost::http::reader::request parser as used in Tufão:
void HttpServerRequest::onReadyRead()
{
if (priv->timeout)
priv->timer.start(priv->timeout);
priv->buffer += priv->socket.readAll();
priv->parser.set_buffer(asio::buffer(priv->buffer.data(),
priv->buffer.size()));
std::size_t nparsed = 0;
Priv::Signals whatEmit(0);
bool is_upgrade = false;
while(priv->parser.code() != http::token::code::error_insufficient_data) {
switch(priv->parser.code()) {
case http::token::code::error_set_method:
qFatal("unreachable");
break;
case http::token::code::error_use_another_connection:
qFatal("unreachable");
break;
case http::token::code::error_invalid_data:
case http::token::code::error_no_host:
case http::token::code::error_invalid_content_length:
case http::token::code::error_content_length_overflow:
case http::token::code::error_invalid_transfer_encoding:
case http::token::code::error_chunk_size_overflow:
priv->socket.close();
return;
case http::token::code::skip:
break;
case http::token::code::method:
{
clearRequest();
priv->responseOptions = 0;
auto value = priv->parser.value<http::token::method>();
QByteArray method(value.data(), value.size());
priv->method = std::move(method);
}
break;
case http::token::code::request_target:
{
auto value = priv->parser.value<http::token::request_target>();
QByteArray url(value.data(), value.size());
priv->url = std::move(url);
}
break;
case http::token::code::version:
{
auto value = priv->parser.value<http::token::version>();
if (value == 0) {
priv->httpVersion = HttpVersion::HTTP_1_0;
priv->responseOptions |= HttpServerResponse::HTTP_1_0;
} else {
priv->httpVersion = HttpVersion::HTTP_1_1;
priv->responseOptions |= HttpServerResponse::HTTP_1_1;
}
}
break;
case http::token::code::status_code:
qFatal("unreachable");
break;
case http::token::code::reason_phrase:
qFatal("unreachable");
break;
case http::token::code::field_name:
case http::token::code::trailer_name:
{
auto value = priv->parser.value<http::token::field_name>();
priv->lastHeader = QByteArray(value.data(), value.size());
}
break;
case http::token::code::field_value:
{
auto value = priv->parser.value<http::token::field_value>();
QByteArray header(value.data(), value.size());
priv->headers.insert(priv->lastHeader, std::move(header));
priv->lastHeader.clear();
}
break;
case http::token::code::trailer_value:
{
auto value = priv->parser.value<http::token::trailer_value>();
QByteArray header(value.data(), value.size());
priv->trailers.insert(priv->lastHeader, std::move(header));
priv->lastHeader.clear();
}
break;
case http::token::code::end_of_headers:
{
auto it = priv->headers.find("connection");
bool close_found = false;
bool keep_alive_found = false;
for (;it != priv->headers.end();++it) {
auto value = boost::string_ref(it->data(), it->size());
http::header_value_any_of(value, [&](boost::string_ref v) {
if (iequals(v, "close"))
close_found = true;
if (iequals(v, "keep-alive"))
keep_alive_found = true;
if (iequals(v, "upgrade"))
is_upgrade = true;
return false;
});
if (close_found)
break;
}
if (!close_found
&& (priv->httpVersion == HttpVersion::HTTP_1_1
|| keep_alive_found)) {
priv->responseOptions |= HttpServerResponse::KEEP_ALIVE;
}
whatEmit = Priv::READY;
}
break;
case http::token::code::body_chunk:
{
auto value = priv->parser.value<http::token::body_chunk>();
priv->body.append(asio::buffer_cast<const char*>(value),
asio::buffer_size(value));
whatEmit |= Priv::DATA;
}
break;
case http::token::code::end_of_body:
break;
case http::token::code::end_of_message:
priv->parser.set_buffer(asio::buffer(priv->buffer.data() + nparsed,
priv->parser.token_size()));
whatEmit |= Priv::END;
disconnect(&priv->socket, SIGNAL(readyRead()),
this, SLOT(onReadyRead()));
break;
}
nparsed += priv->parser.token_size();
priv->parser.next();
}
nparsed += priv->parser.token_size();
priv->parser.next();
priv->buffer.remove(0, nparsed);
if (is_upgrade) {
disconnect(&priv->socket, SIGNAL(readyRead()),
this, SLOT(onReadyRead()));
disconnect(&priv->socket, SIGNAL(disconnected()),
this, SIGNAL(close()));
disconnect(&priv->timer, SIGNAL(timeout()), this, SLOT(onTimeout()));
priv->body.swap(priv->buffer);
emit upgrade();
return;
}
if (whatEmit.testFlag(Priv::READY)) {
whatEmit &= ~Priv::Signals(Priv::READY);
this->disconnect(SIGNAL(data()));
this->disconnect(SIGNAL(end()));
emit ready();
}
if (whatEmit.testFlag(Priv::DATA)) {
whatEmit &= ~Priv::Signals(Priv::DATA);
emit data();
}
if (whatEmit.testFlag(Priv::END)) {
whatEmit &= ~Priv::Signals(Priv::END);
emit end();
return;
}
}
Boost.Http’s higher-level message framework has a socket with a fixed-size buffer and cannot afford the luxury of appending data every time. The two high-level projects have many fundamental differences.
Boost.Http | Tufão
---|---
Boost.Asio active style. | Qt event loop passive style.
Boost usage allowed. | It uses this header-only parser lib at Tufão build time and Tufão users will never need Boost again.
Message-based framework which allows different backends to be plugged later keeping the same handlers. | Tied to an HTTP/1.1 embedded server.
Callbacks and completion tokens. It may read more than asked for, but it’ll use | Combined with Qt network reactive programming style, it has a strange logic related to event signalling.
Proper HTTP upgrade semantics. | Strange HTTP upgrade semantics thanks to the immaturity of following NodeJS design decisions.
It normalizes all header keys to lower case. | Case-insensitive string classes for the C++ counterpart of the HTTP field names structure.
These are the main differences that I wanted to note. You can be sure this parser will fit your project and will be easy to use. And, more importantly, easy to use right. The NodeJS parser demands too much HTTP knowledge on the user’s behalf. And thanks to the NodeJS parser’s hard-to-use API, Tufão was only able to support proper HTTP pipelining once it migrated to the Boost.Http parser (although Boost.Http managed lots of ninja techniques to support it under the NodeJS parser).
To sum up the data received handler structure, you need:
- Get the buffer right with parser.set_buffer(buf).
- Loop to consume tokens (parser.next()) while the code is not http::token::code::error_insufficient_data.
  - Examine the token with parser.code().
  - Maybe handle errors.
  - Extract data with parser.value<T>() if it’s a data token.
- Remove parsed data.
There are a lot of different HTTP server/client models you can build on top of this framework and the notification style you use is entirely up to you. Most likely, you’ll want to hook some actions on the always-to-be-triggered delimiter tokens (e.g. boost::http::token::code::end_of_headers).
2.3. Parsing HTTP upgrade
Given you already know the basics, parsing HTTP upgrade is trivial, because the HTTP parser doesn’t take ownership of the buffer and you know precisely up to which point the stream was parsed as HTTP.
All you have to do is consume all the HTTP data (i.e. watch for code::end_of_message) and parse the rest of the buffer as the new protocol. Here is the Tufão code to upgrade an HTTP client to WebSocket:
inline bool WebSocketHttpClient::execute(QByteArray &chunk)
{
if (errored)
return false;
parser.set_buffer(asio::buffer(chunk.data(), chunk.size()));
std::size_t nparsed = 0;
while(parser.code() != http::token::code::error_insufficient_data) {
switch(parser.code()) {
case http::token::code::error_set_method:
qFatal("unreachable: we did call `set_method`");
break;
case http::token::code::error_use_another_connection:
errored = true;
return false;
case http::token::code::error_invalid_data:
errored = true;
return false;
case http::token::code::error_no_host:
qFatal("unreachable");
break;
case http::token::code::error_invalid_content_length:
errored = true;
return false;
case http::token::code::error_content_length_overflow:
errored = true;
return false;
case http::token::code::error_invalid_transfer_encoding:
errored = true;
return false;
case http::token::code::error_chunk_size_overflow:
errored = true;
return false;
case http::token::code::skip:
break;
case http::token::code::method:
qFatal("unreachable");
break;
case http::token::code::request_target:
qFatal("unreachable");
break;
case http::token::code::version:
if (parser.value<http::token::version>() == 0) {
errored = true;
return false;
}
break;
case http::token::code::status_code:
status_code = parser.value<http::token::status_code>();
if (status_code != 101) {
errored = true;
return false;
}
parser.set_method("GET");
break;
case http::token::code::reason_phrase:
break;
case http::token::code::field_name:
{
auto value = parser.value<http::token::field_name>();
lastHeader = QByteArray(value.data(), value.size());
}
break;
case http::token::code::field_value:
{
auto value = parser.value<http::token::field_value>();
QByteArray header(value.data(), value.size());
headers.insert(lastHeader, std::move(header));
lastHeader.clear();
}
break;
case http::token::code::end_of_headers:
break;
case http::token::code::body_chunk:
break;
case http::token::code::end_of_body:
break;
case http::token::code::trailer_name:
break;
case http::token::code::trailer_value:
break;
case http::token::code::end_of_message:
ready = true;
parser.set_buffer(asio::buffer(chunk.data() + nparsed,
parser.token_size()));
break;
}
nparsed += parser.token_size();
parser.next();
}
nparsed += parser.token_size();
parser.next();
chunk.remove(0, nparsed);
if (ready && headers.contains("Upgrade"))
return true;
return false;
}
2.4. Implementing Boost.Beast parser interface
In this tutorial, we’ll show you how to implement the Boost.Beast parser interface using the Boost.Http parser.
Note
|
No prior experience with Boost.Beast is required. |
The Boost.Beast parser borrows much of its design from Ryan Dahl’s HTTP parser — the NodeJS parser. This is a design I know well and have used more than once in different projects. However, it is not the only design I know of. I see much of this design as an evolution shaped by the limitations of the C language.
I’ve previously written a few complaints about Ryan Dahl’s HTTP parser. Boost.Beast evolves from this design and takes a few different decisions. We’ll see if, and which, limitations the Boost.Beast parser still carries thanks to this inheritance.
2.4.1. Learning how to use the parser
The parser is represented by a basic_parser<isRequest, Derived>
class. The
parser is callback-based, just like Ryan Dahl’s HTTP parser, and it uses CRTP to
avoid the overhead of virtual function calls.
Note
|
DESIGN IMPLICATIONS
And here we can notice the first difference between the Boost.Beast and Boost.Http parsers. If you design an algorithm to work on the Boost.Beast parser object, this algorithm must be a template (and it carries the same drawbacks as a header-only library).
But if you use the Boost.Http parser, this requirement vanishes: the algorithm can be an ordinary function taking a concrete type.
|
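To make the contrast concrete, here is a minimal sketch. All types below are hypothetical stand-ins, not the real Boost.Beast or Boost.Http classes: an algorithm written against a CRTP-based parser must itself be a template, while an algorithm written against a concrete pull-parser type can be an ordinary non-template function whose definition could live in a separately compiled file.

```cpp
#include <cstddef>

// Hypothetical stand-in for a CRTP-based push parser (illustration only).
template<class Derived>
struct push_parser
{
    void put(const char *data, std::size_t size)
    {
        // CRTP dispatch: no virtual call, but the concrete type leaks into
        // the signature of every algorithm that touches the parser.
        static_cast<Derived&>(*this).on_data(data, size);
    }
};

struct byte_counter : push_parser<byte_counter>
{
    byte_counter() : count(0) {}
    void on_data(const char *, std::size_t size) { count += size; }
    std::size_t count;
};

// An algorithm over the push parser must itself be a template...
template<class Derived>
void feed(push_parser<Derived> &p, const char *data, std::size_t size)
{
    p.put(data, size);
}

// Hypothetical stand-in for a concrete pull parser (illustration only).
struct pull_parser
{
    explicit pull_parser(const char *s) : cur(s) {}
    bool done() const { return *cur == '\0'; }
    void next() { ++cur; }
    const char *cur;
};

// ...while an algorithm over the concrete pull-parser type is an ordinary
// function: it needs no template machinery at all.
inline std::size_t count_remaining(pull_parser &p)
{
    std::size_t n = 0;
    for (; !p.done(); p.next())
        ++n;
    return n;
}
```

Nothing here is specific to HTTP; the point is only where the template requirement comes from.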
To feed the parser with data, you call basic_parser::put
:
template<
class ConstBufferSequence>
std::size_t
put(
ConstBufferSequence const& buffers,
error_code& ec);
The parser will match and decode the tokens in the stream.
Note
|
DESIGN IMPLICATIONS
Match and decoding are always applied together. What does this mean? Not much,
given decoding HTTP/1.1 tokens is cheap and most of the time it reduces to
returning a view into the buffer. However, the implications of the fundamental model chosen (pull or push) already give rise to larger divergences at this point. In the Boost.Beast parser, the token is passed to the registered callback. In the Boost.Http parser, the token is held by the parser object itself. It might not seem like much of a difference at first glance, but consider the problem of composability: suppose I want to write an algorithm that takes the parser object and skips a set of HTTP headers.
In the Boost.Http solution, such an algorithm is an ordinary function that pulls tokens until it is done and then returns control to its caller.
How would the solution look using the Boost.Beast parser? Any draft has to be wired through the registered callbacks, and it ends up fully flawed. I can’t emphasize enough that the problem is not about adding a “skip-this-set-of-HTTP-headers” function. The problem is about a fundamental building block which can solve more of the user’s needs. I could keep thinking about the different problems that could happen, but if you do not try to enter the general problem and insist on a myopic vision, you’ll never grasp my message (just as someone addicted to inductive reasoning will never understand someone using deductive reasoning). If all you have is a hammer, everything looks like a nail. We shall see more design implications later on as we continue this chapter. |
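As a sketch of the composability point (using a hypothetical miniature stand-in for the reader, not the real boost::http::reader::request interface): with a pull parser, a header-skipping helper is just an ordinary function that consumes tokens and returns, and the caller keeps pulling from the very same object.

```cpp
#include <cstddef>
#include <vector>

namespace mini {

// Miniature stand-in for the token codes (illustration only).
enum code
{
    field_name,
    field_value,
    end_of_headers,
    body_chunk,
    end_of_message
};

// Miniature pull parser: tokens are pre-computed here, while the real
// reader would match them incrementally out of a byte buffer.
struct reader
{
    explicit reader(const std::vector<code> &toks) : tokens(toks), idx(0) {}
    code current() const { return tokens[idx]; }
    void next() { ++idx; }
    std::vector<code> tokens;
    std::size_t idx;
};

// The composable building block: consume every token up to and including
// end_of_headers, then hand control straight back to the caller.
inline void skip_headers(reader &r)
{
    while (r.current() != end_of_headers)
        r.next();
    r.next(); // consume end_of_headers itself
}

// The caller resumes from the same object, e.g. counting body chunks.
inline std::size_t count_body_chunks(reader &r)
{
    std::size_t n = 0;
    for (; r.current() != end_of_message; r.next()) {
        if (r.current() == body_chunk)
            ++n;
    }
    return n;
}

} // namespace mini
```

No callbacks, no flags shared with a parser object: the helper composes with any other token-consuming code.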
As the tokens are found, the user callbacks are called. The function returns the number of parsed bytes.
Note
|
DESIGN IMPLICATIONS
And as each sentence goes on, it seems that I need to explain more design implications. What if you want to reject messages as soon as one specific token is found? The point here is to avoid the unnecessary computation of parsing elements of a message that would be rejected anyway. For the Boost.Http parser, the control flow is yours to take and…
Do what thou wilt shall be the whole of the Law.
— Aleister Crowley
A concrete example: once the offending token shows up, just stop calling next() and return; no further bytes are matched or decoded.
As for the Boost.Beast parser, there is an answer, but not with your current limited knowledge of the API. Let’s continue presenting the Boost.Beast API and come back to this “stop the world” problem later. |
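A sketch of the “stop the world” idea under the pull model (again with a hypothetical miniature reader, not the real API): the caller owns the loop, so rejecting a message early is just returning from it, and no later token is ever matched or decoded.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace halt {

enum code { method, request_target, version, end_of_message };

struct token
{
    code c;
    std::string value;
};

// Miniature pull parser that records how many tokens were ever decoded.
struct reader
{
    explicit reader(const std::vector<token> &t)
        : tokens(t), idx(0), decoded(0) {}
    code current() const { return tokens[idx].c; }
    const std::string &value() { ++decoded; return tokens[idx].value; }
    void next() { ++idx; }
    std::vector<token> tokens;
    std::size_t idx;
    std::size_t decoded;
};

// Reject anything but GET as soon as the method token shows up. Because
// the caller drives the loop, returning here means the remaining tokens
// are never even looked at.
inline bool accept_only_get(reader &r)
{
    while (r.current() != end_of_message) {
        if (r.current() == method && r.value() != "GET")
            return false; // stop the world: no further parsing happens
        r.next();
    }
    return true;
}

} // namespace halt
```

The `decoded` counter exists only to make the early stop observable in a test; the real reader has no such member.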
The behaviour usually found in push parsers is to parse the stream until the end
of the fed buffers and then return. This is the NodeJS parser’s approach, from
which Boost.Beast takes much inspiration. However, Boost.Beast takes a slightly
different approach to this problem so it’s possible to parse only one token at a
time. The Boost.Beast solution is the eager
function:
void
eager(
bool v);
Normally the parser returns after successfully parsing a structured element (header, chunk header, or chunk body) even if there are octets remaining in the input. This is necessary when attempting to parse the header first, or when the caller wants to inspect information which may be invalidated by subsequent parsing, such as a chunk extension. The eager option controls whether the parser keeps going after parsing structured element if there are octets remaining in the buffer and no error occurs. This option is automatically set or cleared during certain stream operations to improve performance with no change in functionality.
The default setting is
false
.
Boost.Beast documentation
Note
|
DESIGN IMPLICATIONS
And now, back to the “stop the world” problem… Simply put, the Boost.Beast solution is just a hackish way to implement a pull parser — the approach consciously chosen by the Boost.Http parser from the start. Alternatively, you can just leave the eager option unset so that put returns after each structured element. |
Continuing this inductive reasoning of “hey! a problem appeared, let’s write yet
another function, function_xyz
, to solve use case 777”, a number of other
functions are provided. One of them is header_limit
:
void
header_limit(
std::uint32_t v);
This function sets the maximum allowed size of the header including all field name, value, and delimiter characters and also including the CRLF sequences in the serialized input. If the end of the header is not found within the limit of the header size, the error
http::header_limit
is returned by http::basic_parser::put.
Setting the limit after any header octets have been parsed results in undefined behavior.
Boost.Beast documentation
Another function, body_limit
, is provided in the same spirit as
header_limit
. What if I have a use case to limit the request-target size? Then the
Boost.Beast author will add function_xyz2
for use case 778.
Note
|
DESIGN IMPLICATIONS
What is the Boost.Http solution to this problem 🤔? This breaks into two possible cases.
It’ll work for any token (i.e. you don’t need one extra function for each
possible token, which would just complicate the implementation and inflate the
object). |
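A sketch of the token-agnostic limit (hypothetical miniature types again; the real library exposes the same idea through code(), expected_token() and the caller-owned buffer): if the parser reports insufficient data while the buffer is already at its maximum size, then whatever token expected_token() names cannot possibly fit, so the caller can reject the message with a precise error for any token, with no per-token knob.

```cpp
#include <cstddef>

namespace limit {

enum code
{
    error_insufficient_data,
    request_target,
    field_name
};

// Miniature stand-in for a reader stalled on an exhausted buffer.
struct reader
{
    code token;    // result of code()
    code expected; // result of expected_token()
    code current() const { return token; }
    code expected_token() const { return expected; }
};

// One check covers every token: a full buffer plus "insufficient data"
// means the expected token cannot fit, whatever that token is. The
// caller can then map expected_token() to a precise error message
// ("URI too long", "header too large", ...).
inline bool exceeds_limit(const reader &r,
                          std::size_t buffer_size,
                          std::size_t max_buffer_size)
{
    return r.current() == error_insufficient_data
        && buffer_size == max_buffer_size;
}

} // namespace limit
```

Because the buffer belongs to the caller, the limit is whatever size the caller refuses to grow past; nothing has to be configured inside the parser object.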
With all this info, the Boost.Beast parser is mostly covered and we can delve into the implementation of such an interface.
Note
|
DESIGN IMPLICATIONS
Now… let’s look at something different. Suppose the following scenario: you have an embedded project and the headers must not be stored (as that would imply heap memory for complex data structures). You process options out of the headers with an in-situ algorithm, never storing them. This fits the Boost.Http parser naturally.
A Boost.Beast solution is not hard to imagine either.
So… what does each design imply? As the Boost.Beast parser always parses field name + field value together, if both fields sum to more than the buffer size, you’re out of luck: both tokens must fit in the buffer together. Just as an exercise, let’s pursue the inductive reasoning applied to this
problem. We could split Boost.Beast’s combined field callback into separate name and value callbacks.
But then we create another problem:
— Vinnie Falco
Boost.Beast documentation
If you don’t see a problem already, let me unveil it for you. Now, most of the
uses of the parser, which want to store the HTTP headers in some sort of associative container, would have to stitch the separately delivered pieces back together.
Under the push parser model, these two cases are irreconcilable. Boost.Beast opts to solve the most common problem, and this was a good design choice (let’s give credit where credit is due). However, the Boost.Http parser is a good choice in either of these two cases. It only feeds one token at a time. And as the Boost.Http message framework demonstrates, we can use the first bytes of the buffer to store the HTTP field name. And just to present a more readable alternative, you could play with copies of
the reader object made in the stack of the current function.
Remember… principles. I can attack other specific cases. As an exercise, try to find a few yourself. |
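To illustrate the in-situ scenario (hypothetical miniature types once more): because a pull parser emits field_name and field_value as separate tokens, each slice can be inspected directly in the buffer and discarded before the next one arrives, so a name and its value never need to fit in the buffer together and no header container is ever built.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

namespace insitu {

enum code { field_name, field_value, end_of_headers };

// A token as a slice of the receive buffer (illustration only; in the
// real parser the slice is invalidated once the token is consumed).
struct slice
{
    code c;
    const char *data;
    std::size_t size;
};

struct reader
{
    explicit reader(const std::vector<slice> &t) : toks(t), idx(0) {}
    code current() const { return toks[idx].c; }
    const char *data() const { return toks[idx].data; }
    std::size_t size() const { return toks[idx].size; }
    void next() { ++idx; }
    std::vector<slice> toks;
    std::size_t idx;
};

// Process headers in situ: remember only a flag across tokens, never a
// string. Name and value are consumed one at a time.
inline bool wants_websocket(reader &r)
{
    bool in_upgrade = false;
    for (; r.current() != end_of_headers; r.next()) {
        if (r.current() == field_name) {
            in_upgrade = (r.size() == 7
                          && std::memcmp(r.data(), "Upgrade", 7) == 0);
        } else if (r.current() == field_value && in_upgrade) {
            if (r.size() == 9 && std::memcmp(r.data(), "websocket", 9) == 0)
                return true;
        }
    }
    return false;
}

} // namespace insitu
```

The only state carried across tokens is one bool, which is why the buffer can stay as small as the largest single token.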
2.4.2. Implementing the Boost.Beast interface
Note
|
As we’ve previously seen, there are several functions in the Boost.Beast parser that
are just inherited boilerplate. We’ll skip some of this boilerplate as it is not of interest here. Our purpose with this tutorial is to show design implications derived from the choices of the fundamental models. |
template<bool isRequest, class Derived>
class basic_parser;
// Only the specialization for requests (isRequest = true) is shown, for
// simplification purposes.
template<class Derived>
class basic_parser<true, Derived>
{
public:
template<
class ConstBufferSequence>
std::size_t
put(
ConstBufferSequence const& buffers,
error_code& ec)
{
// WARNING: the real implementation will have more trouble because of
// the `ConstBufferSequence` concept, but for the reason of simplicity,
// we don't show the real code here.
reader.set_buffer(buffers);
while (reader.code() != code::error_insufficient_data) {
switch (reader.code()) {
case code::skip:
break;
case code::method:
method = reader.value<token::method>();
break;
case code::request_target:
target = reader.value<token::request_target>();
break;
case code::version:
static_cast<Derived&>(*this)
.on_request_impl(/*the enum code*/, method, target,
reader.value<token::version>(), ec);
if (ec) {
// TODO: extra code to enter in error state
return reader.parsed_count();
}
break;
// ...
case code::end_of_headers:
static_cast<Derived&>(*this).on_header_impl(ec);
if (ec) {
// TODO: extra code to enter in error state
return reader.parsed_count();
}
break;
// ...
}
reader.next();
}
return reader.parsed_count();
}
private:
boost::http::reader::request reader;
// It's possible and easy to create an implementation that doesn't allocate
// memory. Just keep a copy of `reader` within the `put` function body and
// you can go back. As `reader` is just an integer-based state machine with
// a few indexes, the copy is cheap. I'm sorry I don't have the time to code
// the demonstration right now.
std::string method;
std::string target;
};
A final note I want to add is that I plan more improvements to the parser. Just as the Boost.Beast parser is an evolution of the wrong model chosen for the problem, my parser still has room to evolve. But in my judgment, this parser is already better than the Boost.Beast parser can ever be (i.e. the problems I presented here are unfixable in the Boost.Beast design… not to mention that the Boost.Beast parser has almost double the number of member-functions to solve the same problem [8]).
3. Design choices
3.1. FAQ
-
How robust is this parser?
It doesn’t try to parse URLs at all. It’ll ensure that only valid characters (according to the HTTP request-target BNF rule) are present, but invalid sequences are accepted.
This parser is a little (but not too much) more liberal in what it accepts, and it’ll accept invalid sequences for rarely used elements that don’t impact upper layers of the application. The reason to accept such non-conformant sequences is a simpler algorithm that can be more performant (e.g. we only check for invalid chars, not invalid sequences). Therefore, it’s advised to reframe the message if you intend to forward it to some other participant. Not doing so might be a security issue if the participant you are forwarding your message to is known to misbehave when parsing invalid streams. There are several places within RFC7230 where similar decisions are suggested (e.g. if you receive a redundant Content-Length header field, you must merge it into one before forwarding, or reject the message as a whole).
-
Why not make a parser using Boost.Spirit?
Boost.Spirit needs backtracking to implement the OR operator. It can’t build a state machine which would allow you to continue parsing from the suspended point/byte. Because of these characteristics, it can’t be used to implement our HTTP parser. Also, we don’t see much benefit in pursuing this effort.
-
What is the recommended buffer size?
A buffer of size 7990 is recommended (the request line of 8000 octets suggested by section 3.1.1 of RFC7230, minus the spaces, minus the HTTP version information, minus the minimum size of the other token in the request line). However, the truly appropriate buffer size depends on how long the names you expect on your own servers are.
-
What are the differences between
reader::request
andreader::response
?-
response
has thevoid set_method(view_type method)
member-function.WarningThis member-function MUST be called for each HTTP response message being parsed in the stream. -
response
has thevoid puteof()
member-function. -
code()
member function return value has different guarantees in each class. -
template<class T> typename T::type value() const
member function accepts different input template arguments in each class.
-
3.2. Roadmap
-
Parsers combinators.
-
Incremental message generator.
-
Iterator adaptors.
4. Reference
All declarations from this library reside within the boost::http
namespace. For brevity, this prefix is not repeated in the documentation.
4.1. Summary
4.1.1. Classes
-
Tokens
-
Structural parsers
4.1.2. Class Templates
4.1.3. Free Functions
-
Header processing
4.1.4. Enumerations
4.1.5. Headers
4.2. Detailed
4.2.1. token::code::value
#include <boost/http/token.hpp>
namespace token {
struct code
{
enum value
{
error_insufficient_data,
error_set_method,
error_use_another_connection,
error_invalid_data,
error_no_host,
error_invalid_content_length,
error_content_length_overflow,
error_invalid_transfer_encoding,
error_chunk_size_overflow,
skip,
method,
request_target,
version,
status_code,
reason_phrase,
field_name,
field_value,
end_of_headers,
body_chunk,
end_of_body,
trailer_name,
trailer_value,
end_of_message
};
};
} // namespace token
error_insufficient_data
-
token_size()
of this token will always be zero.
4.2.2. token::symbol::value
#include <boost/http/token.hpp>
namespace token {
struct symbol
{
enum value
{
error,
skip,
method,
request_target,
version,
status_code,
reason_phrase,
field_name,
field_value,
end_of_headers,
body_chunk,
end_of_body,
trailer_name,
trailer_value,
end_of_message
};
static value convert(code::value);
};
} // namespace token
4.2.3. token::category::value
#include <boost/http/token.hpp>
namespace token {
struct category
{
enum value
{
status,
data,
structural
};
static value convert(code::value);
static value convert(symbol::value);
};
} // namespace token
4.2.4. token::skip
#include <boost/http/token.hpp>
namespace token {
struct skip
{
static const token::code::value code = token::code::skip;
};
} // namespace token
Used to skip unneeded bytes so the user can keep the buffer small when asking for more data.
4.2.5. token::field_name
#include <boost/http/token.hpp>
namespace token {
struct field_name
{
typedef boost::string_ref type;
static const token::code::value code = token::code::field_name;
};
} // namespace token
4.2.6. token::field_value
#include <boost/http/token.hpp>
namespace token {
struct field_value
{
typedef boost::string_ref type;
static const token::code::value code = token::code::field_value;
};
} // namespace token
4.2.7. token::body_chunk
#include <boost/http/token.hpp>
namespace token {
struct body_chunk
{
typedef asio::const_buffer type;
static const token::code::value code = token::code::body_chunk;
};
} // namespace token
4.2.8. token::end_of_headers
#include <boost/http/token.hpp>
namespace token {
struct end_of_headers
{
static const token::code::value code = token::code::end_of_headers;
};
} // namespace token
4.2.9. token::end_of_body
#include <boost/http/token.hpp>
namespace token {
struct end_of_body
{
static const token::code::value code = token::code::end_of_body;
};
} // namespace token
4.2.10. token::trailer_name
#include <boost/http/token.hpp>
namespace token {
struct trailer_name
{
typedef boost::string_ref type;
static const token::code::value code = token::code::trailer_name;
};
} // namespace token
Note
|
This token is “implicitly convertible” to token::field_name. |
4.2.11. token::trailer_value
#include <boost/http/token.hpp>
namespace token {
struct trailer_value
{
typedef boost::string_ref type;
static const token::code::value code = token::code::trailer_value;
};
} // namespace token
Note
|
This token is “implicitly convertible” to token::field_value. |
4.2.12. token::end_of_message
#include <boost/http/token.hpp>
namespace token {
struct end_of_message
{
static const token::code::value code = token::code::end_of_message;
};
} // namespace token
4.2.13. token::method
#include <boost/http/token.hpp>
namespace token {
struct method
{
typedef boost::string_ref type;
static const token::code::value code = token::code::method;
};
} // namespace token
4.2.14. token::request_target
#include <boost/http/token.hpp>
namespace token {
struct request_target
{
typedef boost::string_ref type;
static const token::code::value code = token::code::request_target;
};
} // namespace token
4.2.15. token::version
#include <boost/http/token.hpp>
namespace token {
struct version
{
typedef int type;
static const token::code::value code = token::code::version;
};
} // namespace token
4.2.16. token::status_code
#include <boost/http/token.hpp>
namespace token {
struct status_code
{
typedef uint_least16_t type;
static const token::code::value code = token::code::status_code;
};
} // namespace token
4.2.17. token::reason_phrase
#include <boost/http/token.hpp>
namespace token {
struct reason_phrase
{
typedef boost::string_ref type;
static const token::code::value code = token::code::reason_phrase;
};
} // namespace token
4.2.18. reader::request
#include <boost/http/reader/request.hpp>
This class represents an HTTP/1.1
(and HTTP/1.0
) incremental parser. It’ll
use the token definitions found in token::code::value
.
You may want to check the basic parsing tutorial to learn
the basics.
Important
|
Once the parser enters an error state (and the error is different from
token::code::error_insufficient_data
), it will not process any further data.
If you want to reuse the same reader object to parse another stream, just call
reset().
|
4.2.18.1. Member types
typedef std::size_t size_type
-
Type used to represent sizes.
typedef const char value_type
-
Type used to represent the value of a single element in the buffer.
typedef value_type *pointer
-
Pointer-to-value type.
typedef boost::string_ref view_type
-
Type used to refer to non-owning string slices.
4.2.18.2. Member functions
request()
-
Constructor.
void reset()
-
After a call to this function, the object has the same internal state as an object that was just constructed.
token::code::value code() const
-
Use it to inspect current token. Returns code.
NoteThe following values are never returned:
-
token::code::error_set_method
. -
token::code::error_use_another_connection
. -
token::code::status_code
. -
token::code::reason_phrase
.
-
token::symbol::value symbol() const
-
Use it to inspect current token. Returns symbol.
NoteThe following values are never returned:
-
token::symbol::status_code
. -
token::symbol::reason_phrase
.
-
token::category::value category() const
-
Use it to inspect current token. Returns category.
size_type token_size() const
-
Returns the size of current token.
NoteAfter you call
next()
, you’re free to remove, from the buffer, the number of bytes equal to the value returned here. If you do remove the parsed data from the buffer, the address of the data shouldn’t change (i.e. you must not invalidate the pointers/iterators to old unparsed data). If you do change the address of old unparsed data, call
set_buffer
before using this object again.Examplestd::size_t nparsed = reader.token_size(); reader.next(); buffer.erase(0, nparsed); reader.set_buffer(buffer);
WarningDo not use string_length(reader.value<T>())
to compute the token size.string_length(reader.value<T>())
andreader.token_size()
may differ. Check the advanced parsing tutorial for more details. template<class T> typename T::type value() const
-
Extracts the value of current token and returns it.
T
must be one of:-
token::method
. -
token::request_target
. -
token::version
. -
token::field_name
. -
token::field_value
. -
token::body_chunk
.WarningThe assert(code() == T::code)
precondition is assumed.NoteThis parser doesn’t buffer data. The value is extracted directly from buffer.
-
token::code::value expected_token() const
-
Returns the expected token code.
Useful when the buffer has been exhausted and
code() == token::code::error_insufficient_data
. Use it to respond with “URL/HTTP-header/… too long” or another error-handling strategy.WarningThe returned value is a heuristic, not a truth. If your buffer is too small, the buffer will be exhausted with too little info to know which element is expected for sure.
For instance,
expected_token()
might returntoken::code::field_name
, but when you have enough info in the buffer, the actual token happens to betoken::code::end_of_headers
. void next()
-
Consumes the current token and advances in the buffer.
NoteGiven the current token is complete (i.e. code() != token::code::error_insufficient_data
), a call to this function always consumes the current token. void set_buffer(asio::const_buffer inbuffer)
-
Sets buffer to inbuffer.
Noteinbuffer should hold the same unparsed data that remained in the previous buffer before this call.
Examplestd::size_t nparsed = reader.token_size(); // now unparsed data becomes ahead // of `buffer.begin()` reader.next(); reader.set_buffer(buffer + nparsed);
WarningThe reader object follows the HTTP stream orchestrated by the continuous flow of set_buffer()
andnext()
. You should treat this region as read-only. For instance, if I pass"header-a: something"
to the reader and then change the contents to"header-a: another thing"
, there are no guarantees about the reader object behaviour. You can safely change only the contents of the buffer region not yet exposed toreader
throughreader.set_buffer(some_buffer)
(i.e. the region outside ofsome_buffer
never seen byreader
).NoteYou’re free to pass larger buffers at will.
You’re also free to pass a buffer just as big as current token (i.e.
token_size()
). In other words, you’re free to shrink the buffer if the new buffer is at least as big as current token.TipIf you want to free the buffer while maintaining the reader object valid, just set the buffer to current token size, call
next()
and then set buffer to an empty buffer.Do notice that this will consume current token as well. And as values are decoded directly from the buffer, this strategy is the only choice.
Examplereader.set_buffer(boost::asio::buffer(buffer, reader.token_size())); reader.next(); reader.set_buffer(boost::asio::const_buffer()); buffer.clear();
size_type parsed_count() const
-
Returns the number of bytes parsed since
set_buffer
was last called.TipYou can use it to go away with the
nparsed
variable shown in the principles on parsing tutorial. I’m sorry about the “you must keep track of the number of discarded bytes” lie I told you before, but as one great explainer once told:As I look upon you… it occurs to me that you may not have the necessary level of maturity to handle the truth.
— Scott Meyers
C++ and Beyond 2012: Universal References in C++11That lie was useful to explain some core concepts behind this library.
4.2.18.3. See also
4.2.19. reader::response
#include <boost/http/reader/response.hpp>
This class represents an HTTP/1.1
(and HTTP/1.0
) incremental parser. It’ll
use the token definitions found in token::code::value
.
You may want to check the basic parsing tutorial to learn
the basics.
Important
|
Once the parser enters an error state (and the error is different from
token::code::error_insufficient_data
), it will not process any further data.
If you want to reuse the same reader object to parse another stream, just call
reset().
|
4.2.19.1. Member types
typedef std::size_t size_type
-
Type used to represent sizes.
typedef const char value_type
-
Type used to represent the value of a single element in the buffer.
typedef value_type *pointer
-
Pointer-to-value type.
typedef boost::string_ref view_type
-
Type used to refer to non-owning string slices.
4.2.19.2. Member functions
response()
-
Constructor.
void set_method(view_type method)
-
Use it to inform the request method of the request message associated with this response message. This is necessary internally to compute the body size. If you do not call this function when
code() == token::code::status_code
, thentoken::code::error_set_method
will be the next token.WarningThe assert(code() == token::code::status_code)
precondition is assumed. void reset()
-
After a call to this function, the object has the same internal state as an object that was just constructed.
void puteof()
-
If the connection is closed, call this function.
HTTP/1.0
used this event to signalizetoken::code::end_of_body
. token::code::value code() const
-
Use it to inspect current token. Returns code.
NoteThe following values are never returned:
-
token::code::error_no_host
. -
token::code::method
. -
token::code::request_target
.
-
token::symbol::value symbol() const
-
Use it to inspect current token. Returns symbol.
NoteThe following values are never returned:
-
token::symbol::method
. -
token::symbol::request_target
.
-
token::category::value category() const
-
Use it to inspect current token. Returns category.
size_type token_size() const
-
Returns the size of current token.
NoteAfter you call
next()
, you’re free to remove, from the buffer, the number of bytes equal to the value returned here. If you do remove the parsed data from the buffer, the address of the data shouldn’t change (i.e. you must not invalidate the pointers/iterators to old unparsed data). If you do change the address of old unparsed data, call
set_buffer
before using this object again.Examplestd::size_t nparsed = reader.token_size(); reader.next(); buffer.erase(0, nparsed); reader.set_buffer(buffer);
WarningDo not use string_length(reader.value<T>())
to compute the token size.string_length(reader.value<T>())
andreader.token_size()
may differ. Check the advanced parsing tutorial for more details. template<class T> typename T::type value() const
-
Extracts the value of current token and returns it.
T
must be one of:-
token::status_code
. -
token::version
. -
token::reason_phrase
. -
token::field_name
. -
token::field_value
. -
token::body_chunk
.WarningThe assert(code() == T::code)
precondition is assumed.NoteThis parser doesn’t buffer data. The value is extracted directly from buffer.
-
token::code::value expected_token() const
-
Returns the expected token code.
Useful when the buffer has been exhausted and
code() == token::code::error_insufficient_data
. Use it to log the error or apply some other error-handling strategy.WarningThe returned value is a heuristic, not a truth. If your buffer is too small, the buffer will be exhausted with too little info to know which element is expected for sure.
For instance,
expected_token()
might returntoken::code::field_name
, but when you have enough info in the buffer, the actual token happens to betoken::code::end_of_headers
. void next()
-
Consumes the current token and advances in the buffer.
NoteGiven the current token is complete (i.e. code() != token::code::error_insufficient_data
), a call to this function always consumes the current token. void set_buffer(asio::const_buffer inbuffer)
-
Sets buffer to inbuffer.
Noteinbuffer should hold the same unparsed data that remained in the previous buffer before this call.
Examplestd::size_t nparsed = reader.token_size(); // now unparsed data becomes ahead // of `buffer.begin()` reader.next(); reader.set_buffer(buffer + nparsed);
WarningThe reader object follows the HTTP stream orchestrated by the continuous flow of set_buffer()
andnext()
. You should treat this region as read-only. For instance, if I pass"header-a: something"
to the reader and then change the contents to"header-a: another thing"
, there are no guarantees about the reader object behaviour. You can safely change only the contents of the buffer region not yet exposed toreader
throughreader.set_buffer(some_buffer)
(i.e. the region outside ofsome_buffer
never seen byreader
).NoteYou’re free to pass larger buffers at will.
You’re also free to pass a buffer just as big as current token (i.e.
token_size()
). In other words, you’re free to shrink the buffer if the new buffer is at least as big as current token.TipIf you want to free the buffer while maintaining the reader object valid, just set the buffer to current token size, call
next()
and then set buffer to an empty buffer.Do notice that this will consume current token as well. And as values are decoded directly from the buffer, this strategy is the only choice.
Examplereader.set_buffer(boost::asio::buffer(buffer, reader.token_size())); reader.next(); reader.set_buffer(boost::asio::const_buffer()); buffer.clear();
size_type parsed_count() const
-
Returns the number of bytes parsed since
set_buffer
was last called.TipYou can use it to go away with the
nparsed
variable shown in the principles on parsing tutorial. I’m sorry about the “you must keep track of the number of discarded bytes” lie I told you before, but as one great explainer once told:As I look upon you… it occurs to me that you may not have the necessary level of maturity to handle the truth.
— Scott Meyers
C++ and Beyond 2012: Universal References in C++11That lie was useful to explain some core concepts behind this library.
4.2.19.3. See also
4.2.20. syntax::chunk_size
#include <boost/http/syntax/chunk_size.hpp>
namespace syntax {
template<class CharT>
struct chunk_size {
typedef basic_string_ref<CharT> view_type;
BOOST_SCOPED_ENUM_DECLARE_BEGIN(result)
{
invalid,
ok,
overflow
}
BOOST_SCOPED_ENUM_DECLARE_END(result)
static std::size_t match(view_type view);
template<class Target>
static result decode(view_type in, Target &out);
};
} // namespace syntax
4.2.21. syntax::content_length
#include <boost/http/syntax/content_length.hpp>
namespace syntax {
template<class CharT>
struct content_length {
typedef basic_string_ref<CharT> view_type;
BOOST_SCOPED_ENUM_DECLARE_BEGIN(result)
{
invalid,
ok,
overflow
}
BOOST_SCOPED_ENUM_DECLARE_END(result)
template<class Target>
static result decode(view_type in, Target &out);
};
} // namespace syntax
4.2.22. syntax::strict_crlf
#include <boost/http/syntax/crlf.hpp>
namespace syntax {
template<class CharT>
struct strict_crlf {
typedef basic_string_ref<CharT> view_type;
static std::size_t match(view_type view);
};
} // namespace syntax
4.2.23. syntax::liberal_crlf
#include <boost/http/syntax/crlf.hpp>
namespace syntax {
template<class CharT>
struct liberal_crlf {
typedef basic_string_ref<CharT> view_type;
BOOST_SCOPED_ENUM_DECLARE_BEGIN(result)
{
crlf,
lf,
insufficient_data,
invalid_data,
}
BOOST_SCOPED_ENUM_DECLARE_END(result)
static result match(view_type view);
};
} // namespace syntax
4.2.24. syntax::field_name
#include <boost/http/syntax/field_name.hpp>
namespace syntax {
template<class CharT>
struct field_name {
typedef basic_string_ref<CharT> view_type;
static std::size_t match(view_type view);
};
} // namespace syntax
4.2.25. syntax::left_trimmed_field_value
#include <boost/http/syntax/field_value.hpp>
namespace syntax {
template<class CharT>
struct left_trimmed_field_value {
typedef basic_string_ref<CharT> view_type;
static std::size_t match(view_type view);
};
} // namespace syntax
4.2.26. syntax::ows
#include <boost/http/syntax/ows.hpp>
namespace syntax {
template<class CharT>
struct ows {
typedef basic_string_ref<CharT> view_type;
static std::size_t match(view_type view);
};
} // namespace syntax
4.2.27. syntax::reason_phrase
#include <boost/http/syntax/reason_phrase.hpp>
namespace syntax {
template<class CharT>
struct reason_phrase {
typedef basic_string_ref<CharT> view_type;
static std::size_t match(view_type view);
};
} // namespace syntax
4.2.28. syntax::status_code
#include <boost/http/syntax/status_code.hpp>
namespace syntax {
template<class CharT>
struct status_code {
typedef basic_string_ref<CharT> view_type;
static std::size_t match(view_type view);
static uint_least16_t decode(view_type view);
};
} // namespace syntax
4.2.29. header_value_any_of
#include <boost/http/algorithm/header/header_value_any_of.hpp>
template<class StringRef, class Predicate>
bool header_value_any_of(const StringRef &header_value, const Predicate &p)
Checks if the unary predicate p returns true
for at least one element of the
comma-separated list defined by the header_value HTTP field value.
Note
|
This algorithm is liberal in what it accepts and skips invalid
elements. An invalid element is a (possibly empty) sequence containing no
characters other than optional white space (i.e. '\x20' or '\t' ).
|
4.2.29.1. Template parameters
StringRef
-
It MUST fulfill the requirements of the StringRef concept (i.e. boost::basic_string_ref).
Predicate
-
A type whose instances are callable and have the following signature: bool(StringRef)
4.2.29.2. Parameters
const StringRef &header_value
-
The HTTP field value.
const Predicate &p
-
The functor predicate that will be called for the elements found in the comma-separated list.
Optional white space (only at the beginning and at the end) is trimmed before the element is passed to p.
4.2.29.3. Return value
true if p returns true for at least one element of the list and
false otherwise. This also means that you’ll get the return value false for
empty lists.
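The semantics above can be sketched with a standalone function over `std::string` (the real algorithm works on a StringRef-like type; the name `any_of_comma_list` is hypothetical): split the field value on commas, trim optional white space from both ends of each element, skip the elements that end up empty, and apply the predicate to the rest.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Sketch of a comma-separated-list "any of" over an HTTP field value.
bool any_of_comma_list(const std::string &header_value,
                       const std::function<bool(const std::string &)> &p)
{
    std::size_t begin = 0;
    while (begin <= header_value.size()) {
        std::size_t comma = header_value.find(',', begin);
        std::size_t end =
            (comma == std::string::npos) ? header_value.size() : comma;
        // Trim optional white space (' ' and '\t') from both ends.
        std::size_t first = begin, last = end;
        while (first < last
               && (header_value[first] == ' ' || header_value[first] == '\t'))
            ++first;
        while (last > first
               && (header_value[last - 1] == ' ' || header_value[last - 1] == '\t'))
            --last;
        // Empty elements are invalid and skipped, per the note above.
        if (first < last && p(header_value.substr(first, last - first)))
            return true;
        if (comma == std::string::npos)
            break;
        begin = comma + 1;
    }
    return false;
}
```

For example, checking a Transfer-Encoding-style value such as `"gzip, chunked , "` for the element `"chunked"` succeeds, while an empty field value always yields false.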
4.2.30. <boost/http/token.hpp>
Import the following symbols:
4.2.31. <boost/http/algorithm/header/header_value_any_of.hpp>
Import the following symbols:
4.2.36. <boost/http/syntax/crlf.hpp>
Import the following symbols:
4.2.38. <boost/http/syntax/field_value.hpp>
Import the following symbols: