TLS and ZeroMQ

· September 16, 2012

It’s pretty straightforward to use symmetric encryption over ZeroMQ - just a case of encrypting and decrypting at each end with some previously shared key. Asymmetric encryption is a bit more interesting, as it allows signing for message integrity and authenticity, as well as data hiding. There have been some good examples of crypto over Pub/Sub (notably Salt), but not a lot of examples of direct messaging.

The de-facto library for this sort of work is OpenSSL, but this has a couple of problems. The first is that OpenSSL usually manages the TCP connection itself, which could be an option for some ZeroMQ cases, but doesn’t fit if the user wants to use a different transport, or an unusual topology. The second is that TLS and SSL require a handshake at the start of the communication, which means we may have to send messages back and forth before there is any application data.

For the first part, OpenSSL includes support for usage as a filter thanks to its BIO I/O abstraction layer. Memory BIOs allow storing the data that would be written to or read from a network so that the sending and receiving can be handled elsewhere. Bert JW Regeer has previously blogged about using OpenSSL in an evented environment with this model, which I thought was a great base for use with ZeroMQ. Below, and in a Github repo, I’ve built an example of pushing encrypted messages between two applications using ZeroMQ and OpenSSL with memory BIOs.

As a quick note, for this example I generated a self-signed certificate to use for the communication:

openssl genrsa -des3 -out server.key 1024
openssl req -new -key server.key -out server.csr
cp server.key server.key.org
openssl rsa -in server.key.org -out server.key
openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt

The code consists of a client, a server, and a class that handles generic TLS over ZeroMQ. The client code runs in a loop as we will need to send and receive as part of the handshake process. We push application data to our TLSZMQ object, and check whether it needs to write data to the network - in our case as a ZeroMQ message - or whether there is application data to process. When we receive replies via ZeroMQ, we push those into the object. In this case we’re just sending a ‘hello world’ message and printing the result.

The server code is slightly more complicated, as we have to initialise with our certificate details, and we want to be able to support multiple clients. As we are using a ROUTER socket, we can take the identity out of the message parts before the delimiter, and use the furthest back as the connection identifier. This means we’re encrypting between client -> server, even if it’s client (ssl) -> hop -> hop -> server (ssl). That said, I suspect a large number of uses of this kind of encryption will actually be going over an inner hop, with the rest unencrypted on a private network, e.g. client -> hop (ssl) -> hop (ssl) -> server.

Each identity gets a new TLSZMQ object, which is stored in a std::map keyed against the identity. Each message that comes in we push to the appropriate TLSZMQ object (creating one if we have a new connection), then check whether we can recv application data or whether the object needs to write to the network, exactly as with the client.
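As a rough sketch of that lookup, assuming the ROUTER hands us the frames as [identity frames..., empty delimiter, payload] with the originating client's identity sitting immediately before the delimiter: the Session struct and the function names here are placeholders made up for illustration, not the names used in the real code, and Session stands in for the TLSZMQ object.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Placeholder for the per-connection TLS state (hypothetical; the real
// object would own the SSL handle and its memory BIOs).
struct Session {
    int messages_seen = 0;
};

using Frames = std::vector<std::string>;

// Index of the empty delimiter frame, or -1 if the message has none.
int find_delimiter(const Frames& frames) {
    for (std::size_t i = 0; i < frames.size(); ++i) {
        if (frames[i].empty()) return static_cast<int>(i);
    }
    return -1;
}

// The connection key is the identity frame just before the delimiter,
// i.e. the one added first, furthest back along the route.
std::string connection_id(const Frames& frames) {
    int d = find_delimiter(frames);
    assert(d > 0 && "expected at least one identity frame before the delimiter");
    return frames[static_cast<std::size_t>(d) - 1];
}

// Fetch the session for this message, creating it on first contact.
Session& session_for(std::map<std::string, Session>& sessions,
                     const Frames& frames) {
    Session& s = sessions[connection_id(frames)];  // default-constructs if new
    ++s.messages_seen;
    return s;
}
```

Keying on the frame nearest the delimiter is what makes the encryption end-to-end across intermediate hops: two clients arriving via the same hop still get separate sessions.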

Finally, the meat of the work is in the TLSZMQ class. This class is a bit longer, so it’s worth breaking it down a little. We start off with the constructors. We use two - one for clients, one for servers. The differences are which connection methods we use - SSLv3_client_method or SSLv3_server_method (we could also use TLSv1), and then importantly we set the state. SSL_set_connect_state tells the library to reach out to a server to establish a connection, while SSL_set_accept_state instructs it to expect an inbound connection. Of course, as we are using ZeroMQ we can connect or bind and start services in any order.

The constructor calls the init functions, which set up the OpenSSL library. It’s split into two parts as we need to attach the certificates to the context in the server version - note that we should be just creating a context once per program initialisation, but in this case I was a bit lazy! The first section just inits the general library and loads error strings, before creating a context with the passed in method. The second section creates the BIO I/O abstractions, using the mem BIO type that allows us to use it as a filter. We use the SSL_set_bio function to instruct the library to use them.

The main update loop is ticked at various points by the client and server code. This addresses the communication with the SSL functionality via the BIO. We have four variables we’re using to push data in and out - from the app to the library, and from the library to ZeroMQ. In the update loop we check for network data (i.e. data from the other side of the SSL connection) and BIO_write it, which pushes it into memory for use. If there is data from the application to be encrypted and transmitted we push it in with SSL_write. Then we call the netread and netwrite functions which handle the other parts.

Net_write_ and net_read_ work pretty much the same way - we use a buffer and read information from either the memory BIO (destined to be sent over ZeroMQ) or from the SSL (destined for the application). We loop over all the sections of the data, 1k at a time, and push it into a ZeroMQ message ready for sending.
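The 1k-at-a-time loop can be sketched on its own, without OpenSSL or ZeroMQ. This is only an illustration: the Reader type, drain and make_fake_source are names invented here, and the only assumption is that the reader follows the BIO_read / SSL_read convention of returning the number of bytes copied, or a value <= 0 when nothing is available. In the real class the reader would wrap one of those calls and the result would be copied into a ZeroMQ message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Reader convention borrowed from BIO_read/SSL_read: returns the number
// of bytes copied into buf, or <= 0 when there is nothing to read.
using Reader = std::function<int(char*, int)>;

// Pull everything currently available, 1k at a time, into one buffer.
std::string drain(const Reader& read_some) {
    std::string out;
    char buf[1024];
    for (;;) {
        int n = read_some(buf, static_cast<int>(sizeof(buf)));
        if (n <= 0) break;  // empty for now, or an error for the caller to check
        assert(n <= static_cast<int>(sizeof(buf)));
        out.append(buf, static_cast<std::size_t>(n));
    }
    return out;
}

// In-memory stand-in for a memory BIO, used only to exercise drain().
Reader make_fake_source(std::string data) {
    auto pending = std::make_shared<std::string>(std::move(data));
    return [pending](char* buf, int len) -> int {
        if (pending->empty()) return -1;  // mimic an empty memory BIO
        std::size_t n = std::min(pending->size(), static_cast<std::size_t>(len));
        std::memcpy(buf, pending->data(), n);
        pending->erase(0, n);
        return static_cast<int>(n);
    };
}
```

Accumulating into a single buffer before building the message keeps one TLS flight in one ZeroMQ frame, which is one possible design; sending each chunk as its own frame would also work if the receiver feeds every frame to the BIO.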

As part of that, we check any error messages. If we get a WANT_READ, or a NONE error we just continue. We’ll hit these, for example, when we first try and write application data when we haven’t completed the handshake.

Finally, we have a few functions that allow pushing data into the object and pulling it back out.

When we run these, there’s enough debug output to show the handshake. If we look at the client output, we can see the -1s from the application data failing to write, and the reads and writes from the BIO as the handshake messages go between client and server. The “12” written below is the application message, and the 90 is the encrypted “Got it!”.

DEBUG: -1 written to SSL
DEBUG: 95 read from BIO
DEBUG: 627 written to BIO
DEBUG: -1 written to SSL
DEBUG: 228 read from BIO
DEBUG: -1 written to SSL
DEBUG: 91 written to BIO
DEBUG: 12 written to SSL
DEBUG: 90 read from BIO
DEBUG: 90 written to BIO
Received: Got it!

If we run the server, we see the other side.

DEBUG: 95 written to BIO
DEBUG: 627 read from BIO
DEBUG: 228 written to BIO
DEBUG: 91 read from BIO
DEBUG: 90 written to BIO
Received: hello world!
DEBUG: 8 written to SSL
DEBUG: 90 read from BIO

The code is a bit of a quick fix, and it doesn’t handle multi-part messages particularly well. How that should work is likely to be an app-specific decision, but as a starting point just returning some sort of array of decoded parts would be a good start! Hopefully this will give anyone looking to implement something more robust a few pointers! The code is up on github.