From smart grids to personal monitoring, more and more devices are being built with network or WiFi connectivity and are expected to deliver data to more powerful machines for processing and analysis. One of the areas where messaging is growing in importance is the distribution of sensor data.
Mobile phones have long since overtaken traditional PCs and servers as the fastest-growing segment of networked devices. In many emerging markets, mobile internet access is the only form available to users, and everywhere there is pressure to communicate in a timely and effective way with devices that are constantly changing location, often over connections of limited quality and high latency.
“The Numbers Are Really Big · Insane, I mean. The billion-plus phones sold per year. The number of active subscriptions, which is greater than half of the human population. The number of new Android devices that check in with Google every day. The line-ups outside Apple stores for every new iOS device. The hundreds of thousands of apps. The ridiculous number of new ones that flow into Android Market every day. Everywhere I look, I see something astounding.” - Tim Bray1
Some broker systems have been specifically designed to address these types of devices, such as IBM's Worklight, which targets many different platforms and extends IBM's Message Broker product. For specific mobile operating systems, millions of GCM (Google Cloud Messaging) and APNS (Apple Push Notification Service) notifications are sent from app developers to users of Android and iOS devices every day, each constituting a simple message.
“Each device establishes an accredited and encrypted IP connection with the service and receives notifications over this persistent connection. If a notification for an application arrives when that application is not running, the device alerts the user that the application has data waiting for it.” - Apple Developer Documentation
This messaging, however, is still primarily human-targeted: in many cases each event is ultimately consumed by an individual person. For real volume, the interesting area to watch is pure machine-to-machine communication.
The Internet Of Things
Back in the late 90s, Andy Stanford-Clark of IBM and Arlen Nipper of Arcom developed MQTT, the MQ Telemetry Transport. The work was originally done to enable communication for industrial control systems in the oil and gas industry, where sensors were often located in challenging environments, and whatever ran on them needed to be resilient, lightweight and bandwidth-efficient. By using a pub/sub pattern with a broker, the team were able to build a very flexible system, yet one that could withstand the limitations of the environment and the constraints of the hardware at the data-producing endpoints.
The protocol uses TCP as its transport, but limits its own headers to just a few bytes in most cases, with minimal network chatter to preserve the resources of small devices. It also has an interesting feature called “Last Will and Testament” that allows a device to register a message with the broker which will be sent if the heartbeat from the device disappears.
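That compactness comes largely from the fixed header's variable-length "Remaining Length" field, which spends seven bits of each byte on the value and one bit on a continuation flag, so small payloads need only a single length byte. A minimal sketch of that encoding, written from the protocol's description rather than taken from any particular library:

```python
def encode_remaining_length(n: int) -> bytes:
    """Encode an MQTT 'Remaining Length' value (0..268,435,455).

    Each output byte carries 7 bits of the value; the high bit
    signals that another byte follows, so values under 128 fit
    in a single byte.
    """
    if not 0 <= n <= 268_435_455:
        raise ValueError("out of range for MQTT remaining length")
    out = bytearray()
    while True:
        n, digit = divmod(n, 128)
        out.append(digit | (0x80 if n else 0))
        if n == 0:
            return bytes(out)


def decode_remaining_length(data: bytes) -> int:
    """Inverse of the above: read length bytes until the
    continuation bit is clear."""
    value, multiplier = 0, 1
    for byte in data:
        value += (byte & 0x7F) * multiplier
        if not byte & 0x80:
            return value
        multiplier *= 128
    raise ValueError("malformed remaining length")
```

A message with a 321-byte remainder, for instance, needs only two length bytes (0xC1, 0x02) on the wire.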
These ideas fit nicely into IBM’s Smarter Planet initiative, on which Stanford-Clark was working. This tied into the growing need for services for intelligent power grids, medical devices such as pacemakers, systems that run on mobile phones, and a variety of other smaller gadgets, which needed appropriately sized solutions.
This type of usage points towards an idea referred to as the internet of things - where ambient devices provide new data based on things that happen, or are sensed, in the real world. Something like intelligent power monitoring, where devices measure the electricity usage at various points through a home and report to a central server, can be a huge win for the home owner, but only if it's relatively easy for them to review. This means the data needs to be collected - exactly the type of thing MQTT was designed for.
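Part of the flexibility of the pub/sub model comes from hierarchical topics with wildcards: a central server can subscribe to every power reading in the house without knowing each sensor in advance. A simplified sketch of MQTT-style topic matching, with illustrative topic names (it ignores details such as topics beginning with `$`):

```python
def topic_matches(filter_: str, topic: str) -> bool:
    """Match an MQTT-style topic filter against a topic name.

    '+' matches exactly one topic level; '#' (only valid as the
    final level) matches the rest of the topic, however deep.
    """
    flevels = filter_.split("/")
    tlevels = topic.split("/")
    for i, f in enumerate(flevels):
        if f == "#":
            return True
        if i >= len(tlevels):
            return False
        if f != "+" and f != tlevels[i]:
            return False
    return len(flevels) == len(tlevels)
```

With this, a subscription to `house/+/power` would receive `house/kitchen/power` and `house/garage/power` alike, while `house/#` would receive everything published under the home.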
Even where resources are somewhat less constrained, the approach can make sense. In 2011, Facebook released Facebook Messenger, a mobile app built around Facebook's chat functionality. They chose to use MQTT rather than a more straightforward HTTP poll.
“With just a few weeks until launch, we ended up building a new mechanism that maintains a persistent connection to our servers. To do this without killing battery life, we used a protocol called MQTT that we had experimented with in Beluga. MQTT is specifically designed for applications like sending telemetry data to and from space probes, so it is designed to use bandwidth and batteries sparingly. By maintaining an MQTT connection and routing messages through our chat pipeline, we were able to often achieve phone-to-phone delivery in the hundreds of milliseconds, rather than multiple seconds.”
- Lucy Zhang2
By designing a protocol specifically for this type of information, IBM and the MQTT contributors brought messaging into scope for many more devices, making it easier to design and develop these types of systems.
Other protocols have seen success in sensor data and machine-to-machine (M2M) communication as well, from XMPP to ZeroMQ and AMQP. Smith Electric3, a manufacturer of electric vehicles, used a cloud-based AMQP broker, StormMQ, to handle AMQP traffic from the vehicles it had sold all over the world, reporting tens of thousands of sensor readings a second from their onboard telemetry systems. By pushing this data as messages into a messaging system capable of routing and flexible delivery, Smith Electric gained the ability not only to consume the feed for their own analytics and dashboards, but also to expose it directly to customers who perform their own monitoring and analysis.
This kind of machine-to-machine communication could vastly outstrip the levels of data generated by social networks and financial systems. Sensors deployed to retrieve this type of information can generate an incredible volume of data, and processing it requires systems that can work with the data in motion, not just on batches of stored data.
“[A] Boeing jet generates 10 terabytes of information per engine every 30 minutes of flight, according to Stephen Brobst, the CTO of Teradata. So for a single six-hour, cross-country flight from New York to Los Angeles on a twin-engine Boeing 737 — the plane used by many carriers on this route — the total amount of data generated would be a massive 240 terabytes of data.”4
By using stream operators such as aggregates, filters, combinators and mappings over the messages, intermediate processors can begin to simplify their communication and allow developers to work in terms of events rather than streams. A sensor doesn't have to report changes continuously every second, but can trigger messages only when certain conditions are met. This type of functionality not only minimises power use - speaking to wireless networks can be very costly in terms of energy - but also allows more complex and intelligent applications to be built.
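One common shape for that kind of conditional reporting is a dead-band filter: a reading is only emitted when it has moved beyond some threshold from the last value reported, turning a continuous stream into sparse events. A minimal sketch, with illustrative names and values:

```python
class DeadbandFilter:
    """Emit a reading only when it differs from the last
    reported value by more than `delta`; everything else is
    suppressed, saving radio time and battery."""

    def __init__(self, delta: float):
        self.delta = delta
        self.last_reported = None

    def process(self, reading: float):
        """Return the reading if it should be reported, else None."""
        if (self.last_reported is None
                or abs(reading - self.last_reported) > self.delta):
            self.last_reported = reading
            return reading
        return None


# Feeding a noisy stream of power readings through the filter:
f = DeadbandFilter(delta=5.0)
readings = [100, 101, 103, 110, 111, 104, 104]
events = [r for r in readings if f.process(r) is not None]
```

Of the seven raw readings above, only three survive as events worth sending; the rest are jitter within the dead band.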
Tim Bray, Mobile Market Share - http://www.tbray.org/ongoing/When/201x/2010/07/30/Mobile-Market-Share ↩
Lucy Zhang, Building Facebook Messenger - https://www.facebook.com/notes/facebook-engineering/building-facebook-messenger/10150259350998920 ↩