Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

This library can be used to create applications that need to parse HTTP streams.

Warning
This is not an official Boost C++ library. It hasn’t been reviewed and can’t be downloaded from www.boost.org. This library will be submitted for formal review eventually.

Boost.Http is a library that provides an incremental HTTP parser and a set of mini-parsers that can be used independently or combined [1]. A future version will also provide a message generator.

The highlights are:

  • Support for modern HTTP features.

  • Simple.

  • Portable (C++03 and very few other dependencies).

  • Just like Ryan Dahl’s HTTP parser, this parser does not make any syscalls or allocations. It also does not buffer data.

  • You can mutate the passed buffer [2].

  • It doesn’t steal control flow from your application [3]. Great for HTTP pipelining.

  • Matching and decoding tokens as separate steps [4].

1. Using

1.1. Requirements

  • CMake 3.1.0 or newer [5]. You can skip this requirement if you don’t intend to run the tests.

  • Boost 1.57 or more recent.

    • boost::asio::const_buffer.

    • boost::string_ref.

  • asciidoctor [6]. You’ll also need pandoc if you want to generate the ePUB output.

2. Tutorial

2.1. Parsing (beginner)

In this tutorial, you’ll learn how to use this library to parse HTTP streams easily.

Note
We assume the reader has basic understanding of C++ and Boost.Asio.

We start with a skeleton that resembles the structure of the program you’re about to write:

#include <boost/http/reader/request.hpp>
#include <boost/asio/buffer.hpp>
#include <string>
#include <map>

namespace http = boost::http;
namespace asio = boost::asio;

struct my_socket_consumer
{
private:
    http::reader::request request_reader;
    std::string buffer;

    std::string last_header;

public:
    std::string method;
    std::string request_target;
    int version;

    std::multimap<std::string, std::string> headers;

    void on_socket_callback(asio::const_buffer data)
    {
        using namespace http;
        using token::code;

        buffer.append(asio::buffer_cast<const char*>(data),
                      asio::buffer_size(data));
        request_reader.set_buffer(asio::buffer(buffer));

        while (request_reader.code() != code::end_of_message) {
            switch (request_reader.code()) {
            case code::skip:
                // do nothing
                break;
            case code::method:
                method = request_reader.value<token::method>();
                break;
            case code::request_target:
                request_target = request_reader.value<token::request_target>();
                break;
            case code::version:
                version = request_reader.value<token::version>();
                break;
            case code::field_name:
            case code::trailer_name:
                last_header = request_reader.value<token::field_name>();
            }
            request_reader.next();
        }
        request_reader.next();

        ready();
    }

protected:
    virtual void ready() = 0;
};

You’re building a piece of code that consumes HTTP from somewhere — the in — and spits it out in the form of C++ structured data to somewhere else — the out.

The in of your program is connected to the above piece of code through the on_socket_callback member function. The out of your program is connected to it through the ready overridable member function.

I won’t dwell on how you’ll connect the network I/O with the in of the program; the connection point should be obvious by now. However, I’ll briefly explain the out connection point, and then we can delve into the part between in and out.

Once the ready member function is called, the data for your request will be available in method, request_target and the other public variables. From now on, I’ll focus on the implementation of my_socket_consumer::on_socket_callback.

The awaited prize
void my_socket_consumer::on_socket_callback(asio::const_buffer data)
{
    //http::reader::request request_reader;
    //std::string buffer;
    //std::string last_header;

    using namespace http;
    using token::code;

    buffer.append(asio::buffer_cast<const char*>(data),
                  asio::buffer_size(data));
    request_reader.set_buffer(asio::buffer(buffer));

    while (request_reader.code() != code::end_of_message) {
        switch (request_reader.code()) {
        case code::skip:
            // do nothing
            break;
        case code::method:
            method = request_reader.value<token::method>();
            break;
        case code::request_target:
            request_target = request_reader.value<token::request_target>();
            break;
        case code::version:
            version = request_reader.value<token::version>();
            break;
        case code::field_name:
        case code::trailer_name:
            last_header = request_reader.value<token::field_name>();
        }
        request_reader.next();
    }
    request_reader.next();

    ready();
}

Try to keep in mind the three variables that will really orchestrate the flow: request_reader, buffer and last_header.

The whole work is about managing the buffer and managing the tokens.

Token access is very easy. As the parser is incremental, there is only one token at a time. I don’t need to explain Boost.Http control flow because the control flow will be coded by you (this is a library, not a framework). You only have to use code() to check the current token, value<T>() to extract its value, and next() to advance to the next token.

Warning

There is only one caveat. The parser doesn’t buffer data and will decode the token into a value (the value<T>() member function) directly from the buffer data.

This means you cannot extract the current value once you drop the current buffer data. As a nice side effect, you spare CPU time on the tokens you do not need to decode (matching and decoding as separate steps).

The parser doesn’t buffer data, which means when we use the set_buffer member function, request_reader only maintains a view to the passed buffer, which we’ll refer to as the virtual buffer from now on.

In the virtual buffer, there is head/current and remaining/tail. request_reader doesn’t store a pointer/address/index into the real buffer. Once a token is consumed, its bytes (the head) are discarded from the virtual buffer. When you mutate the real buffer, the virtual buffer is invalidated and you must inform the parser using set_buffer. However, the bytes already discarded from the virtual buffer must not appear again, so you have to keep track of the number of discarded bytes to prepare the buffer for the next call to set_buffer. The previous code doesn’t handle that.

The new tool to introduce now is token_size(), which returns the size in bytes of current/head.

Warning
There is no guarantee that token_size() returns the same size as the length of the value returned by request_reader.value<T>(). You must use token_size() to compute the number of discarded bytes.
void my_socket_consumer::on_socket_callback(asio::const_buffer data)
{
    using namespace http;
    using token::code;

    buffer.append(asio::buffer_cast<const char*>(data),
                  asio::buffer_size(data));
    request_reader.set_buffer(asio::buffer(buffer));

    std::size_t nparsed = 0; //< NEW

    while (request_reader.code() != code::end_of_message) {
        switch (request_reader.code()) {
        case code::skip:
            // do nothing
            break;
        case code::method:
            method = request_reader.value<token::method>();
            break;
        case code::request_target:
            request_target = request_reader.value<token::request_target>();
            break;
        case code::version:
            version = request_reader.value<token::version>();
            break;
        case code::field_name:
        case code::trailer_name:
            last_header = request_reader.value<token::field_name>();
        }

        nparsed += request_reader.token_size(); //< NEW
        request_reader.next();
    }
    nparsed += request_reader.token_size(); //< NEW
    request_reader.next();
    buffer.erase(0, nparsed); //< NEW

    ready();
}

nparsed was easy. However, the while (request_reader.code() != code::end_of_message) loop doesn’t look right: it’s very error-prone to assume the full HTTP message will be ready within a single call to on_socket_callback. Error handling must be introduced into the code.

void my_socket_consumer::on_socket_callback(asio::const_buffer data)
{
    using namespace http;
    using token::code;

    buffer.append(asio::buffer_cast<const char*>(data),
                  asio::buffer_size(data));
    request_reader.set_buffer(asio::buffer(buffer));

    std::size_t nparsed = 0;

    while (request_reader.code() != code::error_insufficient_data //< NEW
           && request_reader.code() != code::end_of_message) { //< NEW
        switch (request_reader.code()) {
        case code::skip:
            // do nothing
            break;
        case code::method:
            method = request_reader.value<token::method>();
            break;
        case code::request_target:
            request_target = request_reader.value<token::request_target>();
            break;
        case code::version:
            version = request_reader.value<token::version>();
            break;
        case code::field_name:
        case code::trailer_name:
            last_header = request_reader.value<token::field_name>();
        }

        nparsed += request_reader.token_size();
        request_reader.next();
    }
    nparsed += request_reader.token_size();
    request_reader.next();
    buffer.erase(0, nparsed);

    if (request_reader.code() == code::error_insufficient_data) //< NEW
        return; //< NEW

    ready();
}
Note
Don’t worry about token_size(code::error_insufficient_data) being added to nparsed. This (error) "token" is defined to be 0-size (it fits perfectly with the other rules).

Just because it’s easy and we’re already at it, let’s handle the other errors as well:

void my_socket_consumer::on_socket_callback(asio::const_buffer data)
{
    using namespace http;
    using token::code;

    buffer.append(asio::buffer_cast<const char*>(data),
                  asio::buffer_size(data));
    request_reader.set_buffer(asio::buffer(buffer));

    std::size_t nparsed = 0;

    while (request_reader.code() != code::error_insufficient_data
           && request_reader.code() != code::end_of_message) {
        switch (request_reader.code()) {
        case code::error_set_method: //< NEW
        case code::error_use_another_connection: //< NEW
            // Can only happen in response parsing code.
            assert(false); //< NEW
        case code::error_invalid_data: //< NEW
        case code::error_no_host: //< NEW
        case code::error_invalid_content_length: //< NEW
        case code::error_content_length_overflow: //< NEW
        case code::error_invalid_transfer_encoding: //< NEW
        case code::error_chunk_size_overflow: //< NEW
            throw "invalid HTTP data"; //< NEW
        case code::skip:
            // do nothing
            break;
        case code::method:
            method = request_reader.value<token::method>();
            break;
        case code::request_target:
            request_target = request_reader.value<token::request_target>();
            break;
        case code::version:
            version = request_reader.value<token::version>();
            break;
        case code::field_name:
        case code::trailer_name:
            last_header = request_reader.value<token::field_name>();
        }

        nparsed += request_reader.token_size();
        request_reader.next();
    }
    nparsed += request_reader.token_size();
    request_reader.next();
    buffer.erase(0, nparsed);

    if (request_reader.code() == code::error_insufficient_data)
        return;

    ready();
}

And buffer management is complete. However, the code only demonstrated how to extract simple tokens. Field name and field value are simple tokens, but they are usually tied together into a complex structure.

void my_socket_consumer::on_socket_callback(asio::const_buffer data)
{
    using namespace http;
    using token::code;

    buffer.append(asio::buffer_cast<const char*>(data),
                  asio::buffer_size(data));
    request_reader.set_buffer(asio::buffer(buffer));

    std::size_t nparsed = 0;

    while (request_reader.code() != code::error_insufficient_data
           && request_reader.code() != code::end_of_message) {
        switch (request_reader.code()) {
        // ...
        case code::skip:
            break;
        case code::method:
            method = request_reader.value<token::method>();
            break;
        case code::request_target:
            request_target = request_reader.value<token::request_target>();
            break;
        case code::version:
            version = request_reader.value<token::version>();
            break;
        case code::field_name:
        case code::trailer_name:
            last_header = request_reader.value<token::field_name>();
            break;
        case code::field_value: //< NEW
        case code::trailer_value: //< NEW
            // NEW
            headers.emplace(last_header,
                            request_reader.value<token::field_value>());
        }

        nparsed += request_reader.token_size();
        request_reader.next();
    }
    nparsed += request_reader.token_size();
    request_reader.next();
    buffer.erase(0, nparsed);

    if (request_reader.code() == code::error_insufficient_data)
        return;

    ready();
}

last_header did the trick. Easy, but maybe we want to separate headers and trailers (the HTTP headers that are sent after the message body). This task can be accomplished by the use of structural tokens.

void my_socket_consumer::on_socket_callback(asio::const_buffer data)
{
    // NEW:
    // We have to declare `bool my_socket_consumer::use_trailers = false` and
    // `std::multimap<std::string, std::string> my_socket_consumer::trailers`.

    using namespace http;
    using token::code;

    buffer.append(asio::buffer_cast<const char*>(data),
                  asio::buffer_size(data));
    request_reader.set_buffer(asio::buffer(buffer));

    std::size_t nparsed = 0;

    while (request_reader.code() != code::error_insufficient_data
           && request_reader.code() != code::end_of_message) {
        switch (request_reader.code()) {
        // ...
        case code::skip:
            break;
        case code::method:
            method = request_reader.value<token::method>();
            break;
        case code::request_target:
            request_target = request_reader.value<token::request_target>();
            break;
        case code::version:
            version = request_reader.value<token::version>();
            break;
        case code::field_name:
        case code::trailer_name:
            last_header = request_reader.value<token::field_name>();
            break;
        case code::field_value:
        case code::trailer_value:
            // NEW
            (use_trailers ? trailers : headers)
                .emplace(last_header,
                         request_reader.value<token::field_value>());
            break;
        case code::end_of_headers: //< NEW
            use_trailers = true; //< NEW
        }

        nparsed += request_reader.token_size();
        request_reader.next();
    }
    nparsed += request_reader.token_size();
    request_reader.next();
    buffer.erase(0, nparsed);

    if (request_reader.code() == code::error_insufficient_data)
        return;

    ready();
}
Note

Maybe you had a gut feeling and thought that the previous code was too strange. If trailer_name is a separate token, why don’t we use request_reader.value<token::trailer_name>() (likewise for trailer_value) and do away with structural tokens?

Yes, I unnecessarily complicated the code here to introduce you the concept of structural tokens. They are very important and usually you’ll end up using them. Maybe this tutorial needs some revamping after the library evolved a few times.

Also notice that here you can use either request_reader.value<token::field_name>() or request_reader.value<token::trailer_name>() to extract this token value. It is as if trailer_name is “implicitly convertible” to field_name, so to speak. This feature makes the life of users who don’t need to differentiate headers and trailers much easier (with no drawback to the users who do need to differentiate them).

Some of the structural tokens' properties are:

  • No value<T>() associated. value<T>() extraction is a property exclusive to data tokens.

  • They might be 0-sized.

  • They are always emitted (e.g. code::end_of_body will be emitted before code::end_of_message even if no code::body_chunk is present).

We have been using the code::end_of_message structural token since the initial version of the code, so structural tokens aren’t completely alien. However, we have been ignoring one very important HTTP parsing feature until now. It’s the last missing bit before your understanding of this library is complete: our current code lacks the ability to handle HTTP pipelining.

HTTP pipelining is the feature that allows HTTP clients to send HTTP requests “in batch”. In other words, they may send several requests at once over the same connection before the server responds to any of them. If the previous code faces this situation, it’ll stop parsing at the first request and possibly wait forever until on_socket_callback is called again with more data (yep, networking code can be hard with so many little details).

void my_socket_consumer::on_socket_callback(asio::const_buffer data)
{
    using namespace http;
    using token::code;

    buffer.append(asio::buffer_cast<const char*>(data),
                  asio::buffer_size(data));
    request_reader.set_buffer(asio::buffer(buffer));

    std::size_t nparsed = 0;

    while (request_reader.code() != code::error_insufficient_data
           && request_reader.code() != code::end_of_message) {
        switch (request_reader.code()) {
        // ...
        case code::skip:
            break;
        case code::method:
            use_trailers = false; //< NEW
            headers.clear(); //< NEW
            trailers.clear(); //< NEW

            method = request_reader.value<token::method>();
            break;
        case code::request_target:
            request_target = request_reader.value<token::request_target>();
            break;
        case code::version:
            version = request_reader.value<token::version>();
            break;
        case code::field_name:
        case code::trailer_name:
            last_header = request_reader.value<token::field_name>();
            break;
        case code::field_value:
        case code::trailer_value:
            (use_trailers ? trailers : headers)
                .emplace(last_header,
                         request_reader.value<token::field_value>());
            break;
        case code::end_of_headers:
            use_trailers = true;
        }

        nparsed += request_reader.token_size();
        request_reader.next();
    }
    nparsed += request_reader.token_size();
    request_reader.next();
    buffer.erase(0, nparsed);

    if (request_reader.code() == code::error_insufficient_data)
        return;

    ready();

    if (buffer.size() > 0) //< NEW
        on_socket_callback(asio::const_buffer()); //< NEW
}

Some HTTP libraries adopt a “synchronous” approach where the user must immediately provide an HTTP response once the ready() callback is called, so the parsing code can parse the whole buffer to the end; there, we could just put the ready() call into the code::end_of_message case.

Other HTTP libraries follow the ASIO active style and expect the user to call something like async_read_request before the next request can be read. In this case, the solution for HTTP pipelining would be different.

There are also libraries that don’t follow the ASIO style but don’t force the user to send HTTP responses immediately in the ready() callback. In such cases, the user’s response generation and the library’s parse resuming must be synchronized/coordinated.

This point can be rather diverse and the code for this tutorial only shows a rather quick’n’dirty solution. Any different solution to keep the parsing train at full-speed is left as an exercise to the reader.

The interesting point about the code here is clearing the state of the to-be-parsed message before each request-response pair. In the previous code, this was done by binding the “method token arrived” event (the first token in an HTTP request) to that state cleanup.

By now, you’re ready to use this library in your projects. You may want to check Boost.Http’s own usage of the parser, or the Tufão library, as real-world and complete examples of this parser.

2.2. Parsing (advanced)

In this tutorial, you’ll learn how to use this library to parse HTTP streams easily.

The architecture of the library is broken into two classes of parsers, the content parsers and the structural parsers.

The content parsers handle non-structural elements: terminal tokens, all easy to match and decode. They are stateless mini-parsers for elements that are easy to parse and that by themselves don’t add enough value to justify a library (don’t confuse low value with valueless). They live in the boost::http::syntax namespace and are useful when you want to parse individual HTTP elements like the range header value. We won’t see them in this tutorial.

The structural parsers handle structured data formats (e.g. HTTP). To achieve flexibility and performance requirements, they follow the incremental/pull parser model (a bit like the more traditional Iterator design pattern as described in the Gang-of-Four book, instead of C++ iterators). These parsers live in the boost::http::reader namespace. These are the parsers we will look into now.

In the future, we may add support for HTTP/2.0 stream format, but for now, we are left with two structural parsers:

  • boost::http::reader::request for HTTP/1.0 and HTTP/1.1 request messages.

  • boost::http::reader::response for HTTP/1.0 and HTTP/1.1 response messages.

Each structural parser is prepared to receive a continuous stream of messages (i.e. what NodeJS correctly refers to as keep-alive persistent streams). Because the structure of messages is flexible enough to be non-representable in simple non-allocating C++ structures, we don’t decode the whole stream into a single parsing result, as this would force allocations. What we do instead is feed the user one token at a time while internally keeping a lightweight, non-growing state required to decode further tokens.

We use the same token definition for HTTP requests and HTTP responses. The tokens can be either of status (e.g. error or skip), structural (e.g. boost::http::token::code::end_of_headers) or data (e.g. boost::http::token::code::field_name) categories. Only tokens of the data category have an associated value.

Each token is associated with a slice (possibly 0-sized, for error tokens or tokens from the structural category) of the byte stream. The process goes as follows:

  1. Set the buffer used by the reader.

  2. Consume tokens.

    1. Check code() return.

    2. Possibly call value<T>() to extract token value.

    3. Call next().

  3. Remove parsed data from the buffer.

    1. You’ll need to keep state of parsed vs unparsed data by calling token_size().

  4. If the address of the unparsed data changes, the reader is invalidated, so to speak. You can restore its valid state by setting the buffer to null or to the new address of the unparsed data.

Enough with abstract info. Take the following part of an HTTP stream:

GET / HTTP/1.1\r\n
Host: www.example.com\r\n
\r\n

This stream can be broken in the following series of tokens (order preserved):

  • method.

  • request_target.

  • skip.

  • version.

  • field_name.

  • field_value.

  • skip.

  • end_of_headers.

  • end_of_body.

  • end_of_message.

The parser is required to give you a token_size() so you can remove parsed data from the stream. However, the parser is not required to produce the same series of tokens for the same stream. The structural and data tokens will always be emitted the same way, but the parser may choose to merge some status token (e.g. skip) into a data token (e.g. request_target). Therefore, the following series of tokens would also be possible for the same example given previously:

  • method.

  • skip.

  • request_target.

  • version.

  • field_name.

  • skip.

  • field_value.

  • end_of_headers.

  • end_of_body.

  • end_of_message.

This (non-)guarantee exists to give freedom to vary the implementation. It’d be absurd to expect different implementations of this interface to generate the same result byte by byte. You may also expect different algorithms in future versions.

Another useful consequence of this non-guarantee is that it makes it possible to discard skip tokens from the buffer, yet merge them when the stream is received into the buffer all at once.

Just imagine documenting the guarantees of the token stream if we were to make it predictable. It’d be insane.

However, there is one guarantee that the reader object must provide. It must not discard bytes of data tokens while the token is incomplete. To illustrate this point, let’s go to an example. Given the current token is request_target, you have the following code.

assert(parser.code() == boost::http::token::code::request_target);
auto value = parser.value<boost::http::token::request_target>();

While we traverse the stream, the parser will only match tokens. We don’t expect the parser to also decode them. The parser will only decode tokens if that is necessary to match the following tokens, and even then the intermediary results may be discarded. In other words, match and decode are separate steps, and you can spare CPU time when you don’t need to decode certain elements.

The point is that the token value must be extracted directly from the byte stream, and the parser is not allowed to buffer data about the stream (or the decoded values, for that matter). The implication of this rule is a guarantee about the token order and its relationship to the byte stream.

You can imagine the stream as having two views: the token view and the byte view. The token view spans windows over the byte view.

tokens: | method | skip  | request_target | skip          | version
bytes:  |  GET   | <SPC> |     /          | <SPC> HTTP/1. | 1 <CRLF>

The slice of data associated with a data token can grow larger than the equivalent bytes:

tokens: |  method    | request_target | skip  | version
bytes:  |  GET <SPC> |     /          | <SPC> | HTTP/1.1 <CRLF>

But it cannot shrink smaller than its associated bytes:

tokens: | method | skip    | request_target | skip           | version
bytes:  |  GE    | T <SPC> |     /          | <SPC> HTTP/1.1 | <CRLF>

So you have a token interface easy to inspect and you have a lot of freedom to manage the underlying buffer. Let’s see the boost::http::reader::request parser as used in Tufão:

void HttpServerRequest::onReadyRead()
{
    if (priv->timeout)
        priv->timer.start(priv->timeout);

    priv->buffer += priv->socket.readAll();
    priv->parser.set_buffer(asio::buffer(priv->buffer.data(),
                                         priv->buffer.size()));

    std::size_t nparsed = 0;
    Priv::Signals whatEmit(0);
    bool is_upgrade = false;

    while(priv->parser.code() != http::token::code::error_insufficient_data) {
        switch(priv->parser.code()) {
        case http::token::code::error_set_method:
            qFatal("unreachable");
            break;
        case http::token::code::error_use_another_connection:
            qFatal("unreachable");
            break;
        case http::token::code::error_invalid_data:
        case http::token::code::error_no_host:
        case http::token::code::error_invalid_content_length:
        case http::token::code::error_content_length_overflow:
        case http::token::code::error_invalid_transfer_encoding:
        case http::token::code::error_chunk_size_overflow:
            priv->socket.close();
            return;
        case http::token::code::skip:
            break;
        case http::token::code::method:
            {
                clearRequest();
                priv->responseOptions = 0;
                auto value = priv->parser.value<http::token::method>();
                QByteArray method(value.data(), value.size());
                priv->method = std::move(method);
            }
            break;
        case http::token::code::request_target:
            {
                auto value = priv->parser.value<http::token::request_target>();
                QByteArray url(value.data(), value.size());
                priv->url = std::move(url);
            }
            break;
        case http::token::code::version:
            {
                auto value = priv->parser.value<http::token::version>();
                if (value == 0) {
                    priv->httpVersion = HttpVersion::HTTP_1_0;
                    priv->responseOptions |= HttpServerResponse::HTTP_1_0;
                } else {
                    priv->httpVersion = HttpVersion::HTTP_1_1;
                    priv->responseOptions |= HttpServerResponse::HTTP_1_1;
                }
            }
            break;
        case http::token::code::status_code:
            qFatal("unreachable");
            break;
        case http::token::code::reason_phrase:
            qFatal("unreachable");
            break;
        case http::token::code::field_name:
        case http::token::code::trailer_name:
            {
                auto value = priv->parser.value<http::token::field_name>();
                priv->lastHeader = QByteArray(value.data(), value.size());
            }
            break;
        case http::token::code::field_value:
            {
                auto value = priv->parser.value<http::token::field_value>();
                QByteArray header(value.data(), value.size());
                priv->headers.insert(priv->lastHeader, std::move(header));
                priv->lastHeader.clear();
            }
            break;
        case http::token::code::trailer_value:
            {
                auto value = priv->parser.value<http::token::trailer_value>();
                QByteArray header(value.data(), value.size());
                priv->trailers.insert(priv->lastHeader, std::move(header));
                priv->lastHeader.clear();
            }
            break;
        case http::token::code::end_of_headers:
            {
                auto it = priv->headers.find("connection");
                bool close_found = false;
                bool keep_alive_found = false;
                for (;it != priv->headers.end();++it) {
                    auto value = boost::string_ref(it->data(), it->size());
                    http::header_value_any_of(value, [&](boost::string_ref v) {
                        if (iequals(v, "close"))
                            close_found = true;

                        if (iequals(v, "keep-alive"))
                            keep_alive_found = true;

                        if (iequals(v, "upgrade"))
                            is_upgrade = true;

                        return false;
                    });
                    if (close_found)
                        break;
                }
                if (!close_found
                    && (priv->httpVersion == HttpVersion::HTTP_1_1
                        || keep_alive_found)) {
                    priv->responseOptions |= HttpServerResponse::KEEP_ALIVE;
                }
                whatEmit = Priv::READY;
            }
            break;
        case http::token::code::body_chunk:
            {
                auto value = priv->parser.value<http::token::body_chunk>();
                priv->body.append(asio::buffer_cast<const char*>(value),
                                  asio::buffer_size(value));
                whatEmit |= Priv::DATA;
            }
            break;
        case http::token::code::end_of_body:
            break;
        case http::token::code::end_of_message:
            priv->parser.set_buffer(asio::buffer(priv->buffer.data() + nparsed,
                                                 priv->parser.token_size()));
            whatEmit |= Priv::END;
            disconnect(&priv->socket, SIGNAL(readyRead()),
                       this, SLOT(onReadyRead()));
            break;
        }

        nparsed += priv->parser.token_size();
        priv->parser.next();
    }
    nparsed += priv->parser.token_size();
    priv->parser.next();
    priv->buffer.remove(0, nparsed);

    if (is_upgrade) {
        disconnect(&priv->socket, SIGNAL(readyRead()),
                   this, SLOT(onReadyRead()));
        disconnect(&priv->socket, SIGNAL(disconnected()),
                   this, SIGNAL(close()));
        disconnect(&priv->timer, SIGNAL(timeout()), this, SLOT(onTimeout()));

        priv->body.swap(priv->buffer);
        emit upgrade();
        return;
    }

    if (whatEmit.testFlag(Priv::READY)) {
        whatEmit &= ~Priv::Signals(Priv::READY);
        this->disconnect(SIGNAL(data()));
        this->disconnect(SIGNAL(end()));
        emit ready();
    }

    if (whatEmit.testFlag(Priv::DATA)) {
        whatEmit &= ~Priv::Signals(Priv::DATA);
        emit data();
    }

    if (whatEmit.testFlag(Priv::END)) {
        whatEmit &= ~Priv::Signals(Priv::END);
        emit end();
        return;
    }
}

The socket in Boost.Http’s higher-level message framework has a fixed-size buffer and cannot afford the luxury of appending data on every read. The two high-level projects have many fundamental differences.

Boost.Http vs Tufão:

  • Boost.Http: Boost.Asio active style.
    Tufão: Qt event loop passive style.

  • Boost.Http: Boost usage allowed.
    Tufão: uses this header-only parser lib at Tufão build time, so the Tufão user never needs Boost again.

  • Boost.Http: message-based framework which allows different backends to be plugged in later while keeping the same handlers.
    Tufão: tied to an HTTP/1.1 embedded server.

  • Boost.Http: callbacks and completion tokens. It may read more than asked for, but it’ll use read_state to keep the user informed.
    Tufão: combined with the Qt network reactive programming style, it has a strange logic related to event signalling.

  • Boost.Http: proper HTTP upgrade semantics.
    Tufão: strange HTTP upgrade semantics, thanks to the immaturity of following NodeJS design decisions.

  • Boost.Http: normalizes all header keys to lower case.
    Tufão: case-insensitive string classes for the C++ counterpart of the HTTP field names structure.

These are the main differences that I wanted to note. You can be sure this parser will fit your needs and will be easy to use. And, more importantly, easy to use right. The NodeJS parser demands too much HTTP knowledge on the user’s behalf. And thanks to the NodeJS parser’s hard-to-use API, Tufão was only able to support proper HTTP pipelining once it migrated to the Boost.Http parser (although Tufão managed lots of ninja techniques to support it under the NodeJS parser).

To sum up the data received handler structure, you need:

  1. Set the buffer with parser.set_buffer(buf).

  2. Loop consuming tokens with parser.next() while parser.code() != http::token::code::error_insufficient_data.

    1. Examine token with parser.code().

    2. Maybe handle error.

    3. Extract data with parser.value<T>() if it is a data token.

  3. Remove parsed data.

There are a lot of different HTTP server/client models you can build on top of this framework, and the notification style you use is entirely up to you. Most likely, you’ll want to hook some actions on the delimiter-category tokens (e.g. boost::http::token::code::end_of_headers), which are always triggered.

2.3. Parsing HTTP upgrade

Given you already know the basics, parsing HTTP upgrade is trivial: the HTTP parser doesn’t take ownership of the buffer, and you know exactly up to which point the stream was parsed as HTTP.

All you have to do is consume all the HTTP data (i.e. watch for code::end_of_message) and parse the rest of the buffer as the new protocol. Here is the Tufão code to upgrade an HTTP client to WebSocket:

inline bool WebSocketHttpClient::execute(QByteArray &chunk)
{
    if (errored)
        return false;

    parser.set_buffer(asio::buffer(chunk.data(), chunk.size()));

    std::size_t nparsed = 0;

    while(parser.code() != http::token::code::error_insufficient_data) {
        switch(parser.code()) {
        case http::token::code::error_set_method:
            qFatal("unreachable: we did call `set_method`");
            break;
        case http::token::code::error_use_another_connection:
            errored = true;
            return false;
        case http::token::code::error_invalid_data:
            errored = true;
            return false;
        case http::token::code::error_no_host:
            qFatal("unreachable");
            break;
        case http::token::code::error_invalid_content_length:
            errored = true;
            return false;
        case http::token::code::error_content_length_overflow:
            errored = true;
            return false;
        case http::token::code::error_invalid_transfer_encoding:
            errored = true;
            return false;
        case http::token::code::error_chunk_size_overflow:
            errored = true;
            return false;
        case http::token::code::skip:
            break;
        case http::token::code::method:
            qFatal("unreachable");
            break;
        case http::token::code::request_target:
            qFatal("unreachable");
            break;
        case http::token::code::version:
            if (parser.value<http::token::version>() == 0) {
                errored = true;
                return false;
            }

            break;
        case http::token::code::status_code:
            status_code = parser.value<http::token::status_code>();
            if (status_code != 101) {
                errored = true;
                return false;
            }

            parser.set_method("GET");
            break;
        case http::token::code::reason_phrase:
            break;
        case http::token::code::field_name:
            {
                auto value = parser.value<http::token::field_name>();
                lastHeader = QByteArray(value.data(), value.size());
            }
            break;
        case http::token::code::field_value:
            {
                auto value = parser.value<http::token::field_value>();
                QByteArray header(value.data(), value.size());
                headers.insert(lastHeader, std::move(header));
                lastHeader.clear();
            }
            break;
        case http::token::code::end_of_headers:
            break;
        case http::token::code::body_chunk:
            break;
        case http::token::code::end_of_body:
            break;
        case http::token::code::trailer_name:
            break;
        case http::token::code::trailer_value:
            break;
        case http::token::code::end_of_message:
            ready = true;
            parser.set_buffer(asio::buffer(chunk.data() + nparsed,
                                           parser.token_size()));
            break;
        }

        nparsed += parser.token_size();
        parser.next();
    }
    nparsed += parser.token_size();
    parser.next();
    chunk.remove(0, nparsed);

    if (ready && headers.contains("Upgrade"))
        return true;

    return false;
}

2.4. Implementing Boost.Beast parser interface

In this tutorial, we’ll show you how to implement Boost.Beast parser interface using Boost.Http parser.

Note
No prior experience with Boost.Beast is required.

The Boost.Beast parser borrows much of its design from Ryan Dahl’s HTTP parser, the NodeJS parser. This is a design that I know well and have used more than once in different projects. However, it is not the only design I know of. I see much of this design as an evolution of the limitations that you find in the C language.

I’ve previously written a few complaints about Ryan Dahl’s HTTP parser. Boost.Beast evolves from this design and makes a few different decisions. We’ll see if, and which, limitations the Boost.Beast parser still carries thanks to this inheritance.

2.4.1. Learning how to use the parser

The parser is represented by a basic_parser<isRequest, Derived> class. The parser is callback-based just like Ryan Dahl’s HTTP parser and it uses CRTP to avoid the overhead of virtual function calls.

Note
DESIGN IMPLICATIONS

And here we can notice the first difference between Boost.Beast and Boost.Http parsers.

If you design an algorithm to work on the parser object, this algorithm must be a template (and it carries the same drawbacks as a header-only library):

template<class Derived>
void do_stuff(boost::beast::http::basic_parser<true, Derived>& o);

But if you use Boost.Http parser, this requirement vanishes:

void do_stuff(boost::http::reader::request& o);

To feed the parser with data, you call basic_parser::put:

template<
    class ConstBufferSequence>
std::size_t
put(
    ConstBufferSequence const& buffers,
    error_code& ec);

The parser will match and decode the tokens in the stream.

Note
DESIGN IMPLICATIONS

Matching and decoding are always applied together. What does this mean? Not much, given that decoding HTTP/1.1 tokens is cheap and most of the time reduces to returning a string_view of the associated buffer region.

However, the implications of the fundamental model chosen (pull or push) give rise to larger divergences at this point already.

In the Boost.Beast parser, the token is passed to the registered function callback. In the Boost.Http parser, the token is held by the parser object itself.

It might not seem like much of a difference at first glance, but consider the problem of composability. If I want to write an algorithm that takes a boost::http::reader::request object, reads a field_name token and possibly skips it together with the associated field_value (maybe that field isn’t of your interest), I only need to wrap the parser object if I’m using the Boost.Http solution [7]. As for the Boost.Beast parser, you’re left with a long and hackish inheritance chain that only gets worse as you compose more algorithms.

Just as a concrete illustration of what I meant by the Boost.Http solution:

template<class T /* = http::reader::request */>
void socket_consumer<T>::parse(asio::buffer buf) { /* ... */ }

http::token::code::value my_parser_wrapper::code() const
{
    return code_;
}

void my_parser_wrapper::next()
{
    // Here I should use a case-insensitive comparison function.
    // For simplicity, this step is omitted.

    wrapped_parser.next();
    code_ = wrapped_parser.code();
    if (code_ == code::field_name
        && (wrapped_parser.value<token::field_name>()
            == "connection")) {
        code_ = code::skip;
        skip_next_value = true;
    } else if (code_ == code::field_value
               && skip_next_value) {
        code_ = code::skip;
        skip_next_value = false;
    }
}

What would the solution look like using the Boost.Beast parser? Let’s draft something:

template<class Derived>
class my_parser_wrapper
    : public basic_parser<true, my_parser_wrapper<Derived>>
{
public:
    void
    on_field_impl(field f, string_view name, string_view value, error_code& ec)
    {
        // Here I should use a case-insensitive comparison function.
        // For simplicity, this step is omitted.

        if (name == "connection") {
            return;
        }

        static_cast<Derived&>(*this).on_field_impl(f, name, value, ec);
    }

    void on_header_impl(error_code& ec)
    {
        static_cast<Derived&>(*this).on_header_impl(ec);
    }

    // ...
};

template<template <typename> class parser>
class custom_parser
    : public parser<custom_parser>
{
    // Here we have our original handler.
    // It'd be our `my_socket_consumer` from the Boost.Http example.
};

custom_parser<my_parser_wrapper> parser;

However, this solution is fundamentally flawed. As my_parser_wrapper inherits directly from basic_parser, we cannot inject some my_parser_wrapper2 which applies another transformation in the chain.

I can’t emphasize enough that the problem is not about adding a “skip-this-set-of-HTTP-headers” function. The problem is about a fundamental building block which can solve more of the user’s needs. I could keep thinking about the different problems that could happen, but if you don’t try to see the general problem and insist on a myopic vision, you’ll never grasp my message (just as someone addicted to inductive reasoning will never understand someone using deductive reasoning). If all you have is a hammer, everything looks like a nail. We shall see more design implications later on as we continue this chapter.

As the tokens are found, the user callbacks are called. The function returns the number of parsed bytes.

Note
DESIGN IMPLICATIONS

And as each sentence goes on, it seems that I need to explain more design implications.

What if you want to reject messages as soon as one specific token is found? The point here is about avoiding unnecessary computation of parsing elements of a message that would be rejected anyway.

For Boost.Http parser, the control flow is yours to take and…​

Do what thou wilt shall be the whole of the Law.

— Aleister Crowley

A concrete example if you may:

// ...

case code::field_value:
    if (last_header_was_x && is_good(reader.value<token::field_value>())) {
        // stop the world (e.g. `return` or `throw`)
    }

// ...

As for the Boost.Beast parser: there is an answer, but not with your current, limited knowledge of the API. Let’s continue presenting the Boost.Beast API and come back to this “stop the world” problem later.

The behaviour usually found in push parsers is to parse the stream until the end of the fed buffers and then return. This is the NodeJS parser’s approach, from which Boost.Beast takes much inspiration. However, Boost.Beast takes a slightly different approach to this problem, so it’s possible to parse only one token at a time. The Boost.Beast solution is the eager function:

void
eager(
    bool v);

Normally the parser returns after successfully parsing a structured element (header, chunk header, or chunk body) even if there are octets remaining in the input. This is necessary when attempting to parse the header first, or when the caller wants to inspect information which may be invalidated by subsequent parsing, such as a chunk extension. The eager option controls whether the parser keeps going after parsing structured element if there are octets remaining in the buffer and no error occurs. This option is automatically set or cleared during certain stream operations to improve performance with no change in functionality.

The default setting is false.

— Vinnie Falco
Boost.Beast documentation
Note
DESIGN IMPLICATIONS

And now, back at the “stop the world” problem…​

Simply put, the Boost.Beast solution is just a hackish way to implement a pull parser, the parser approach consciously chosen by the Boost.Http parser.

Alternatively, you can just set the error_code& ec on the callback implementation to stop parsing, but this wouldn’t solve all the use cases (the reason why eager is provided).

Continuing this inductive reasoning of “hey! a problem appeared, let’s write yet another function, function_xyz, to solve use case 777”, a number of other functions are provided. One of them is header_limit:

void
header_limit(
    std::uint32_t v);

This function sets the maximum allowed size of the header including all field name, value, and delimiter characters and also including the CRLF sequences in the serialized input. If the end of the header is not found within the limit of the header size, the error http::header_limit is returned by http::basic_parser::put.

Setting the limit after any header octets have been parsed results in undefined behavior.

— Vinnie Falco
Boost.Beast documentation

Another function, body_limit, is provided in the same spirit as header_limit. What if I have a use case that needs to limit the request-target size? Then the Boost.Beast author will add function_xyz2 for use case 778.

Note
DESIGN IMPLICATIONS

What is the Boost.Http solution to this problem 🤔? This is broken into two possible cases.

  1. The whole token is in the buffer: In such case you just need to check token_size.

  2. The buffer has been exhausted and no token is there: Here, just check expected_token.

It’ll work for any token (i.e. you don’t need one extra function for each possible token which would just complicate the implementation and inflate the object with a large Settings object of some sort).

With all this info, the Boost.Beast parser is mostly covered and we can delve into the implementation of such interface.

Note
DESIGN IMPLICATIONS

Now…​ let’s look at something different. Suppose the following scenario:

You have an embedded project where the headers must not be stored (as that would imply heap memory for complex data structures). You process options with an in situ algorithm out of the headers. With the Boost.Http parser, I’m imagining something along these lines:

enum last_header_code {
    OUT_OF_INTEREST_SET,
    FOO,
    BAR,
    FOOBAR
};

// ...

case code::field_name:
    {
        auto v = reader.value<token::field_name>();
        if (iequals(v, "foo")) {
            last_header = FOO;
        } else if (iequals(v, "bar")) {
            last_header = BAR;
        } else if (iequals(v, "foobar")) {
            last_header = FOOBAR;
        } else {
            last_header = OUT_OF_INTEREST_SET;
        }
        break;
    }
case code::field_value:
    if (last_header == FOO) {
        foo = process_foo(reader.value<token::field_value>());
    } else if (last_header == BAR) {
        bar = process_bar(reader.value<token::field_value>());
    } else if (last_header == FOOBAR) {
        foobar = process_foobar(reader.value<token::field_value>());
    }
    break;

// ...

Boost.Beast solution is not hard to imagine too:

// ...

void on_field_impl(field, string_view name, string_view value, error_code& ec)
{
    if (iequals(name, "foo")) {
        foo = process_foo(value);
    } else if (iequals(name, "bar")) {
        bar = process_bar(value);
    } else if (iequals(name, "foobar")) {
        foobar = process_foobar(value);
    }
}

// ...

So…​ what does each design imply? As the Boost.Beast parser always parses field name + field value together, if both fields sum up to more than the buffer size, you’re out of luck. Both tokens must fit in the buffer together.

Just as an exercise, let’s pursue the inductive reasoning applied to this problem. We could split the Boost.Beast’s on_field_impl callback into two:

void
on_field_name_impl(
    field f,
    string_view name,
    error_code& ec);

void
on_field_value_impl(
    string_view value,
    error_code& ec);

But then we create another problem:

[…​] it is the responsibility of the derived class to copy any information it needs before returning from the callback […​]

— Vinnie Falco
Boost.Beast documentation

If you don’t see a problem already, let me unveil it for you. Now, most uses of the parser which want to store the HTTP headers in some sort of std::multimap structure will have to perform one extra allocation:

void on_field_name_impl(field, string_view name, error_code& ec)
{
    last_header = to_string(name);
}

Under the push parser model, these two cases are irreconcilable. Boost.Beast opts to solve the most common problem and this was a good design choice (let’s give credit where credit is due).

However, the Boost.Http parser is a good choice in either of these two cases. It only feeds one token at a time. And as the Boost.Http message framework demonstrates, we can use the first bytes of the buffer to store the HTTP field name.

And just to present a more readable alternative, you could play with copies of the reader object made on the stack of the my_socket_consumer::on_socket_callback function. This way you have a point in time and you can make the parser “go back”. The copies are cheap because the reader object is just an integer-based state machine with a few indexes. The idea behind this solution is to mirror the current Boost.Beast behaviour: field name and field value are always kept together in the buffer.

Remember…​ principles. I can attack other specific cases. As an exercise, try to find a few yourself.

2.4.2. Implementing the Boost.Beast interface

Note

As we’ve previously seen, there are several functions in the Boost.Beast parser that are just inherited boilerplate (e.g. eager), thanks to the choice of the wrong fundamental model (i.e. pull vs push).

We’ll skip some of this boilerplate, as it is not of interest here. Our purpose with this tutorial is to show design implications derived from the choices of the fundamental models.

template<bool isRequest, class Derived>
class basic_parser;

// Partial specialization for the request case; the response case is omitted
// for simplification purposes.
template<class Derived>
class basic_parser<true, Derived>
{
public:
    template<
        class ConstBufferSequence>
    std::size_t
    put(
        ConstBufferSequence const& buffers,
        error_code& ec)
    {
        // WARNING: the real implementation will have more trouble because of
        // the `ConstBufferSequence` concept, but for the reason of simplicity,
        // we don't show the real code here.
        reader.set_buffer(buffers);

        while (reader.code() != code::error_insufficient_data) {
            switch (reader.code()) {
            case code::skip:
                break;
            case code::method:
                method = reader.value<token::method>();
                break;
            case code::request_target:
                target = reader.value<token::request_target>();
                break;
            case code::version:
                static_cast<Derived&>(*this)
                    .on_request_impl(/*the enum code*/, method, target,
                                     reader.value<token::version>(), ec);
                if (ec) {
                    // TODO: extra code to enter in error state
                    return reader.parsed_count();
                }
                break;

            // ...

            case code::end_of_headers:
                static_cast<Derived&>(*this).on_header_impl(ec);
                if (ec) {
                    // TODO: extra code to enter in error state
                    return reader.parsed_count();
                }
                break;

            // ...
            }
            reader.next();
        }

        return reader.parsed_count();
    }

private:
    boost::http::reader::request reader;

    // It's possible and easy to create an implementation that doesn't allocate
    // memory. Just keep a copy of `reader` within the `put` function body and
    // you can go back. As `reader` is just an integer-based state machine with
    // a few indexes, the copy is cheap. I'm sorry I don't have the time to code
    // the demonstration right now.
    std::string method;
    std::string target;
};

A final note I want to add is that I plan more improvements to the parser. Just as the Boost.Beast parser is an evolution of the wrong model chosen for the problem, my parser still has room to evolve. But from my judgment, this parser already is better than the Boost.Beast parser can ever be (i.e. the problems I presented here are unfixable in the Boost.Beast design…​ not to mention that the Boost.Beast parser has almost double the number of member-functions to solve the same problem [8]).

3. Design choices

3.1. FAQ

  1. How robust is this parser?

    It doesn’t try to parse URLs at all. It’ll ensure that only valid characters (according to the HTTP request-target BNF rule) are present, but invalid sequences of otherwise valid characters are still accepted.

    This parser is a little (but not too much) more liberal in what it accepts, and it’ll accept invalid sequences for rarely used elements that don’t impact upper layers of the application. The reason to accept such non-conformant sequences is a simpler algorithm that can be more performant (e.g. we only check for invalid chars, not invalid sequences). Therefore, it’s advised to reframe the message if you intend to forward it to some other participant. Not doing so might be a security issue if the participant you are forwarding your message to is known to show improper behaviour when parsing invalid streams. There are several places within RFC7230 where similar decisions are suggested (e.g. if you receive a redundant Content-Length header field, you must merge it into one before forwarding, or reject the message as a whole).

  2. Why not make a parser using Boost.Spirit?

    Boost.Spirit needs backtracking to implement the OR operator. Boost.Spirit can’t build a state machine which would allow you to continue parsing from the suspended point/byte. Thanks to these characteristics, it can’t be used in our HTTP parser. Also, we don’t see much benefit in pursuing this effort.

  3. What is the recommended buffer size?

    A buffer of size 7990 is recommended (the request line size of 8000 suggested by section 3.1.1 of RFC7230, minus the spaces, minus the HTTP version information, minus the minimum size of the other token in the request line). However, the real suggested buffer size should be based on how long the names you expect on your own servers are.

  4. What are the differences between reader::request and reader::response?

    • response has the void set_method(view_type method) member-function.

      Warning
      This member-function MUST be called for each HTTP response message being parsed in the stream.
    • response has the void puteof() member-function.

    • code() member function return value has different guarantees in each class.

    • template<class T> typename T::type value() const member function accepts different input template arguments in each class.

3.2. Roadmap

  • Parsers combinators.

  • Incremental message generator.

  • Iterator adaptors.

4. Reference

All declarations from this library reside within the boost::http namespace. For brevity, this prefix is not repeated in the documentation.

4.2. Detailed

4.2.1. token::code::value

#include <boost/http/token.hpp>
namespace token {

struct code
{
    enum value
    {
        error_insufficient_data,
        error_set_method,
        error_use_another_connection,
        error_invalid_data,
        error_no_host,
        error_invalid_content_length,
        error_content_length_overflow,
        error_invalid_transfer_encoding,
        error_chunk_size_overflow,
        skip,
        method,
        request_target,
        version,
        status_code,
        reason_phrase,
        field_name,
        field_value,
        end_of_headers,
        body_chunk,
        end_of_body,
        trailer_name,
        trailer_value,
        end_of_message
    };
};

} // namespace token
error_insufficient_data

token_size() of this token will always be zero.

4.2.2. token::symbol::value

#include <boost/http/token.hpp>
namespace token {

struct symbol
{
    enum value
    {
        error,

        skip,

        method,
        request_target,
        version,
        status_code,
        reason_phrase,
        field_name,
        field_value,

        end_of_headers,

        body_chunk,

        end_of_body,

        trailer_name,
        trailer_value,

        end_of_message
    };

    static value convert(code::value);
};

} // namespace token

4.2.3. token::category::value

#include <boost/http/token.hpp>
namespace token {

struct category
{
    enum value
    {
        status,
        data,
        structural
    };

    static value convert(code::value);
    static value convert(symbol::value);
};

} // namespace token

4.2.4. token::skip

#include <boost/http/token.hpp>
namespace token {

struct skip
{
    static const token::code::value code = token::code::skip;
};

} // namespace token

Used to skip unneeded bytes so user can keep buffer small when asking for more data.

4.2.5. token::field_name

#include <boost/http/token.hpp>
namespace token {

struct field_name
{
    typedef boost::string_ref type;
    static const token::code::value code = token::code::field_name;
};

} // namespace token

4.2.6. token::field_value

#include <boost/http/token.hpp>
namespace token {

struct field_value
{
    typedef boost::string_ref type;
    static const token::code::value code = token::code::field_value;
};

} // namespace token

4.2.7. token::body_chunk

#include <boost/http/token.hpp>
namespace token {

struct body_chunk
{
    typedef asio::const_buffer type;
    static const token::code::value code = token::code::body_chunk;
};

} // namespace token

4.2.8. token::end_of_headers

#include <boost/http/token.hpp>
namespace token {

struct end_of_headers
{
    static const token::code::value code = token::code::end_of_headers;
};

} // namespace token

4.2.9. token::end_of_body

#include <boost/http/token.hpp>
namespace token {

struct end_of_body
{
    static const token::code::value code = token::code::end_of_body;
};

} // namespace token

4.2.10. token::trailer_name

#include <boost/http/token.hpp>
namespace token {

struct trailer_name
{
    typedef boost::string_ref type;
    static const token::code::value code = token::code::trailer_name;
};

} // namespace token
Note

This token is “implicitly convertible” to field_name, so to speak. In other words, you can treat it as field_name at value extraction time (i.e. the reader::{request,response}::value<T>() function).

4.2.11. token::trailer_value

#include <boost/http/token.hpp>
namespace token {

struct trailer_value
{
    typedef boost::string_ref type;
    static const token::code::value code = token::code::trailer_value;
};

} // namespace token
Note

This token is “implicitly convertible” to field_value, so to speak. In other words, you can treat it as field_value at value extraction time (i.e. the reader::{request,response}::value<T>() function).

4.2.12. token::end_of_message

#include <boost/http/token.hpp>
namespace token {

struct end_of_message
{
    static const token::code::value code = token::code::end_of_message;
};

} // namespace token

4.2.13. token::method

#include <boost/http/token.hpp>
namespace token {

struct method
{
    typedef boost::string_ref type;
    static const token::code::value code = token::code::method;
};

} // namespace token

4.2.14. token::request_target

#include <boost/http/token.hpp>
namespace token {

struct request_target
{
    typedef boost::string_ref type;
    static const token::code::value code = token::code::request_target;
};

} // namespace token

4.2.15. token::version

#include <boost/http/token.hpp>
namespace token {

struct version
{
    typedef int type;
    static const token::code::value code = token::code::version;
};

} // namespace token

4.2.16. token::status_code

#include <boost/http/token.hpp>
namespace token {

struct status_code
{
    typedef uint_least16_t type;
    static const token::code::value code = token::code::status_code;
};

} // namespace token

4.2.17. token::reason_phrase

#include <boost/http/token.hpp>
namespace token {

struct reason_phrase
{
    typedef boost::string_ref type;
    static const token::code::value code = token::code::reason_phrase;
};

} // namespace token

4.2.18. reader::request

#include <boost/http/reader/request.hpp>

This class represents an HTTP/1.1 (and HTTP/1.0) incremental parser. It’ll use the token definitions found in token::code::value. You may want to check the basic parsing tutorial to learn the basics.

Important

Once the parser enters an error state (and the error is different from token::code::error_insufficient_data), the internal buffer is said to be in an invalidated state. Therefore, the parser won’t access the data anymore and the user is free to invalidate the data (e.g. resize/free it) without calling set_buffer() or reset() first.

If you want to reuse the same reader object to parse another stream, just call reset().

4.2.18.1. Member types
typedef std::size_t size_type

Type used to represent sizes.

typedef const char value_type

Type used to represent the value of a single element in the buffer.

typedef value_type *pointer

Pointer-to-value type.

typedef boost::string_ref view_type

Type used to refer to non-owning string slices.

4.2.18.2. Member functions
request()

Constructor.

void reset()

After a call to this function, the object has the same internal state as an object that was just constructed.

token::code::value code() const

Use it to inspect the current token. Returns its code.

Note

The following values are never returned:

  • token::code::error_set_method.

  • token::code::error_use_another_connection.

  • token::code::status_code.

  • token::code::reason_phrase.

token::symbol::value symbol() const

Use it to inspect the current token. Returns its symbol.

Note

The following values are never returned:

  • token::symbol::status_code.

  • token::symbol::reason_phrase.

token::category::value category() const

Use it to inspect the current token. Returns its category.

size_type token_size() const

Returns the size of the current token.

Note

After you call next(), you’re free to remove from the buffer a number of bytes equal to the value returned here.

If you do remove the parsed data from the buffer, the address of the remaining data shouldn’t change (i.e. you must not invalidate pointers/iterators to old unparsed data). If you do change the address of old unparsed data, call set_buffer() before using this object again.

Example
std::size_t nparsed = reader.token_size();
reader.next();
buffer.erase(0, nparsed);
reader.set_buffer(buffer);
Warning
Do not use string_length(reader.value<T>()) to compute the token size. string_length(reader.value<T>()) and reader.token_size() may differ. Check the advanced parsing tutorial for more details.
template<class T> typename T::type value() const

Extracts the value of the current token and returns it.

T must be one of:

  • token::method.

  • token::request_target.

  • token::version.

  • token::field_name.

  • token::field_value.

  • token::body_chunk.

    Warning
    The assert(code() == T::code) precondition is assumed.
    Note
    This parser doesn’t buffer data. The value is extracted directly from buffer.
token::code::value expected_token() const

Returns the expected token code.

Useful when the buffer has been exhausted and code() == token::code::error_insufficient_data. Use it to respond with a “URL/HTTP-header/…​ too long” error or to apply another error-handling strategy.

Warning

The returned value is a heuristic, not a guarantee. If your buffer is too small, it will be exhausted with too little info to know for sure which element comes next.

For instance, expected_token() might return token::code::field_name, but when you have enough info in the buffer, the actual token happens to be token::code::end_of_headers.

void next()

Consumes the current token and advances in the buffer.

Note
Given the current token is complete (i.e. code() != token::code::error_insufficient_data), a call to this function always consumes the current token.
void set_buffer(asio::const_buffer inbuffer)

Sets buffer to inbuffer.

Note

inbuffer should hold the same unparsed data that the internal buffer held before this call.

Example
std::size_t nparsed = reader.token_size();

// unparsed data now starts ahead
// of `buffer.begin()`
reader.next();

reader.set_buffer(buffer + nparsed);
Warning
The reader object follows the HTTP stream orchestrated by the continuous flow of set_buffer() and next(). You should treat this region as read-only. For instance, if you pass "header-a: something" to the reader and then change the contents to "header-a: another thing", there are no guarantees about the reader object’s behaviour. You can safely change only the contents of the buffer region not yet exposed to the reader through reader.set_buffer(some_buffer) (i.e. the region outside of some_buffer that the reader has never seen).
Note

You’re free to pass larger buffers at will.

You’re also free to pass a buffer just as big as current token (i.e. token_size()). In other words, you’re free to shrink the buffer if the new buffer is at least as big as current token.

Tip

If you want to free the buffer while keeping the reader object valid, shrink the buffer to the current token size, call next(), and then set the buffer to an empty buffer.

Note that this consumes the current token as well. Since values are decoded directly from the buffer, this strategy is the only choice.

Example
reader.set_buffer(boost::asio::buffer(buffer, reader.token_size()));
reader.next();
reader.set_buffer(boost::asio::const_buffer());
buffer.clear();
size_type parsed_count() const

Returns the number of bytes parsed since set_buffer was last called.

Tip

You can use it to do away with the nparsed variable shown in the principles on parsing tutorial. I’m sorry about the “you must keep track of the number of discarded bytes” lie I told you before, but as one great explainer once said:

As I look upon you…​ it occurs to me that you may not have the necessary level of maturity to handle the truth.

— Scott Meyers
C++ and Beyond 2012: Universal References in C++11

That lie was useful to explain some core concepts behind this library.

4.2.19. reader::response

#include <boost/http/reader/response.hpp>

This class represents an HTTP/1.1 (and HTTP/1.0) incremental parser. It’ll use the token definitions found in token::code::value. You may want to check the basic parsing tutorial to learn the basics.

Important

Once the parser enters an error state (and the error is different from token::code::error_insufficient_data), the internal buffer is said to be in an invalidated state. Therefore, the parser won’t access the data anymore and the user is free to invalidate the data (e.g. resize/free it) without calling set_buffer() or reset() first.

If you want to reuse the same reader object to parse another stream, just call reset().

4.2.19.1. Member types
typedef std::size_t size_type

Type used to represent sizes.

typedef const char value_type

Type used to represent the value of a single element in the buffer.

typedef value_type *pointer

Pointer-to-value type.

typedef boost::string_ref view_type

Type used to refer to non-owning string slices.

4.2.19.2. Member functions
response()

Constructor.

void set_method(view_type method)

Use it to inform the parser of the method of the request message associated with this response message. This is necessary internally to compute the body size. If you do not call this function while code() == token::code::status_code, token::code::error_set_method will be the next token.

Warning
The assert(code() == token::code::status_code) precondition is assumed.
void reset()

After a call to this function, the object has the same internal state as an object that was just constructed.

void puteof()

If the connection is closed, call this function. HTTP/1.0 uses this event to signal token::code::end_of_body.

token::code::value code() const

Use it to inspect the current token. Returns its code.

Note

The following values are never returned:

  • token::code::error_no_host.

  • token::code::method.

  • token::code::request_target.

token::symbol::value symbol() const

Use it to inspect the current token. Returns its symbol.

Note

The following values are never returned:

  • token::symbol::method.

  • token::symbol::request_target.

token::category::value category() const

Use it to inspect the current token. Returns its category.

size_type token_size() const

Returns the size of the current token.

Note

After you call next(), you’re free to remove from the buffer a number of bytes equal to the value returned here.

If you do remove the parsed data from the buffer, the address of the remaining data shouldn’t change (i.e. you must not invalidate pointers/iterators to old unparsed data). If you do change the address of old unparsed data, call set_buffer() before using this object again.

Example
std::size_t nparsed = reader.token_size();
reader.next();
buffer.erase(0, nparsed);
reader.set_buffer(buffer);
Warning
Do not use string_length(reader.value<T>()) to compute the token size. string_length(reader.value<T>()) and reader.token_size() may differ. Check the advanced parsing tutorial for more details.
template<class T> typename T::type value() const

Extracts the value of the current token and returns it.

T must be one of:

  • token::status_code.

  • token::version.

  • token::reason_phrase.

  • token::field_name.

  • token::field_value.

  • token::body_chunk.

    Warning
    The assert(code() == T::code) precondition is assumed.
    Note
    This parser doesn’t buffer data. The value is extracted directly from buffer.
token::code::value expected_token() const

Returns the expected token code.

Useful when the buffer has been exhausted and code() == token::code::error_insufficient_data. Use it to respond with a “reason-phrase/HTTP-header/…​ too long” error or to apply another error-handling strategy.

Warning

The returned value is a heuristic, not a guarantee. If your buffer is too small, it will be exhausted with too little info to know for sure which element comes next.

For instance, expected_token() might return token::code::field_name, but when you have enough info in the buffer, the actual token happens to be token::code::end_of_headers.

void next()

Consumes the current token and advances in the buffer.

Note
Given the current token is complete (i.e. code() != token::code::error_insufficient_data), a call to this function always consumes the current token.
void set_buffer(asio::const_buffer inbuffer)

Sets buffer to inbuffer.

Note

inbuffer should hold the same unparsed data that the internal buffer held before this call.

Example
std::size_t nparsed = reader.token_size();

// unparsed data now starts ahead
// of `buffer.begin()`
reader.next();

reader.set_buffer(buffer + nparsed);
Warning
The reader object follows the HTTP stream orchestrated by the continuous flow of set_buffer() and next(). You should treat this region as read-only. For instance, if you pass "header-a: something" to the reader and then change the contents to "header-a: another thing", there are no guarantees about the reader object’s behaviour. You can safely change only the contents of the buffer region not yet exposed to the reader through reader.set_buffer(some_buffer) (i.e. the region outside of some_buffer that the reader has never seen).
Note

You’re free to pass larger buffers at will.

You’re also free to pass a buffer just as big as current token (i.e. token_size()). In other words, you’re free to shrink the buffer if the new buffer is at least as big as current token.

Tip

If you want to free the buffer while keeping the reader object valid, shrink the buffer to the current token size, call next(), and then set the buffer to an empty buffer.

Note that this consumes the current token as well. Since values are decoded directly from the buffer, this strategy is the only choice.

Example
reader.set_buffer(boost::asio::buffer(buffer, reader.token_size()));
reader.next();
reader.set_buffer(boost::asio::const_buffer());
buffer.clear();
size_type parsed_count() const

Returns the number of bytes parsed since set_buffer was last called.

Tip

You can use it to do away with the nparsed variable shown in the principles on parsing tutorial. I’m sorry about the “you must keep track of the number of discarded bytes” lie I told you before, but as one great explainer once said:

As I look upon you…​ it occurs to me that you may not have the necessary level of maturity to handle the truth.

— Scott Meyers
C++ and Beyond 2012: Universal References in C++11

That lie was useful to explain some core concepts behind this library.

4.2.20. syntax::chunk_size

#include <boost/http/syntax/chunk_size.hpp>
namespace syntax {

template<class CharT>
struct chunk_size {
    typedef basic_string_ref<CharT> view_type;

    BOOST_SCOPED_ENUM_DECLARE_BEGIN(result)
    {
        invalid,
        ok,
        overflow
    }
    BOOST_SCOPED_ENUM_DECLARE_END(result)

    static std::size_t match(view_type view);

    template<class Target>
    static result decode(view_type in, Target &out);
};

} // namespace syntax

4.2.21. syntax::content_length

#include <boost/http/syntax/content_length.hpp>
namespace syntax {

template<class CharT>
struct content_length {
    typedef basic_string_ref<CharT> view_type;

    BOOST_SCOPED_ENUM_DECLARE_BEGIN(result)
    {
        invalid,
        ok,
        overflow
    }
    BOOST_SCOPED_ENUM_DECLARE_END(result)

    template<class Target>
    static result decode(view_type in, Target &out);
};

} // namespace syntax

4.2.22. syntax::strict_crlf

#include <boost/http/syntax/crlf.hpp>
namespace syntax {

template<class CharT>
struct strict_crlf {
    typedef basic_string_ref<CharT> view_type;

    static std::size_t match(view_type view);
};

} // namespace syntax

4.2.23. syntax::liberal_crlf

#include <boost/http/syntax/crlf.hpp>
namespace syntax {

template<class CharT>
struct liberal_crlf {
    typedef basic_string_ref<CharT> view_type;

    BOOST_SCOPED_ENUM_DECLARE_BEGIN(result)
    {
        crlf,
        lf,
        insufficient_data,
        invalid_data
    }
    BOOST_SCOPED_ENUM_DECLARE_END(result)

    static result match(view_type view);
};

} // namespace syntax

4.2.24. syntax::field_name

#include <boost/http/syntax/field_name.hpp>
namespace syntax {

template<class CharT>
struct field_name {
    typedef basic_string_ref<CharT> view_type;

    static std::size_t match(view_type view);
};

} // namespace syntax

4.2.25. syntax::left_trimmed_field_value

#include <boost/http/syntax/field_value.hpp>
namespace syntax {

template<class CharT>
struct left_trimmed_field_value {
    typedef basic_string_ref<CharT> view_type;

    static std::size_t match(view_type view);
};

} // namespace syntax

4.2.26. syntax::ows

#include <boost/http/syntax/ows.hpp>
namespace syntax {

template<class CharT>
struct ows {
    typedef basic_string_ref<CharT> view_type;

    static std::size_t match(view_type view);
};

} // namespace syntax

4.2.27. syntax::reason_phrase

#include <boost/http/syntax/reason_phrase.hpp>
namespace syntax {

template<class CharT>
struct reason_phrase {
    typedef basic_string_ref<CharT> view_type;

    static std::size_t match(view_type view);
};

} // namespace syntax

4.2.28. syntax::status_code

#include <boost/http/syntax/status_code.hpp>
namespace syntax {

template<class CharT>
struct status_code {
    typedef basic_string_ref<CharT> view_type;

    static std::size_t match(view_type view);

    static uint_least16_t decode(view_type view);
};

} // namespace syntax

4.2.29. header_value_any_of

#include <boost/http/algorithm/header/header_value_any_of.hpp>
template<class StringRef, class Predicate>
bool header_value_any_of(const StringRef &header_value, const Predicate &p)

Checks if unary predicate p returns true for at least one element from the comma-separated list defined by the header_value HTTP field value.

Note
This algorithm is liberal in what it accepts and will skip invalid elements. An invalid element is a sequence, possibly empty, containing no characters other than optional white space (i.e. '\x20' or '\t').
4.2.29.1. Template parameters
StringRef

It MUST fulfill the requirements of the StringRef concept (i.e. boost::basic_string_ref).

Predicate

A type whose instances are callable and have the following signature:

bool(StringRef)
4.2.29.2. Parameters
const StringRef &header_value

The HTTP field value.

const Predicate &p

The functor predicate that will be called for each element found in the comma-separated list.

Optional white space (at the beginning and at the end only) is trimmed before the element is passed to p.

4.2.29.3. Return value

true if p returns true for at least one element from the list and false otherwise. This also means that you’ll get false for empty lists.

4.2.31. <boost/http/algorithm/header/header_value_any_of.hpp>

Import the following symbols:

4.2.32. <boost/http/reader/request.hpp>

Import the following symbols:

4.2.33. <boost/http/reader/response.hpp>

Import the following symbols:

4.2.34. <boost/http/syntax/chunk_size.hpp>

Import the following symbols:

4.2.35. <boost/http/syntax/content_length.hpp>

Import the following symbols:

4.2.36. <boost/http/syntax/crlf.hpp>

Import the following symbols:

4.2.37. <boost/http/syntax/field_name.hpp>

Import the following symbols:

4.2.38. <boost/http/syntax/field_value.hpp>

Import the following symbols:

4.2.39. <boost/http/syntax/ows.hpp>

Import the following symbols:

4.2.40. <boost/http/syntax/reason_phrase.hpp>

Import the following symbols:

4.2.41. <boost/http/syntax/status_code.hpp>

Import the following symbols:


1. Parser combinator algorithms are on the way to ease this up even further. Check Scott Wlaschin’s talk for more.
2. The larger explanation being: incomplete tokens are kept in buffer, so you are not required to allocate them on a secondary buffer, but you’re still allowed to mutate/move/grow the buffer (as opposed to…​ say…​ some C++ iterators that would get invalidated once the container changes).
3. Parsing is done one token at a time, no matter how much data is buffered.
4. Proper extraction of header fields (some popular parsers, like NodeJS’s, force you to know protocol details and parser internals to manually remove leading and trailing whitespace from field values). This point is not only about header field extraction, but about a parser that truly abstracts and understands the protocol, and that is easy to use correctly by people who don’t know HTTP.
5. For tests.
6. For the documentation.
7. If I continue to develop Boost.Http’s message framework, that’s the solution that will be adopted for this particular problem.
8. Consult Occam’s razor.