Finding the perfect IoT database

By Ben Slater, Chief Product Officer at Instaclustr.

Saturday, 4th July 2020. Posted by Phil Alsop.

Depending on which forecasts you prefer, the number of IoT devices in use by 2025 will range anywhere from a modest 41.6bn to a whopping 75bn, a vast increase from the 14.2bn connected devices in use last year. Even at the lower end of such estimates, the IoT is expected to generate nearly 80 zettabytes of data per year. This tidal wave of information poses several data management challenges for firms looking to gain value from their IoT networks.

Given the specific nature of the data that IoT sensors generate, firms need databases that are tailored to match in terms of scalability, flexibility, and availability. Scalability, and cost-effective scalability in particular, has proved extremely tricky for many commercial database solutions. This, coupled with poor experiences driven by vendor lock-in, has resulted in an increasing number of businesses turning to open source instead. However, with so many options, it can be hard to know which is best. For IT managers, the first step is to critically reassess their database needs and identify how the architecture of an IoT database needs to diverge from that of databases serving more traditional business operations.

ACID test

IoT databases need to be architected differently because sensor data is very different from the human-generated data that businesses are more used to dealing with. While human-generated data will often rise and fall unexpectedly (for instance, social media usage has spiked hugely in the last month), IoT data typically arrives as a far steadier stream, which allows for easier capacity planning.

More importantly, a key advantage of IoT sensor data is that it doesn’t involve the same type of complex data transactions as traditional enterprise business applications. This means the transaction guarantees known as ACID (atomicity, consistency, isolation, and durability) do not need to be applied in the same way they would for, say, an online banking transaction.

For example, when transferring money online, a typical design pattern is for multiple tables to be updated as a single database transaction, which either succeeds or fails as a whole. In contrast, IoT data can typically be represented in a single table (device ID, metric name, timestamp and value), and rows only ever need to be inserted (and potentially expired) rather than updated or deleted. Databases designed around these relatively simple data structures and usage patterns can provide significantly greater scalability, reliability and cost-effectiveness than databases required to implement full ACID transactions and SQL querying.
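The single-table, insert-only pattern described above can be sketched in a few lines of Python. This is only an in-memory illustration, not a real database: the names Reading, AppendOnlyStore and ttl_seconds are hypothetical, and expiry is modelled as a simple age cutoff.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Reading:
    """One row of the single IoT table: device ID, metric name, timestamp, value."""
    device_id: str
    metric: str
    timestamp: float  # seconds since the epoch
    value: float


class AppendOnlyStore:
    """Hypothetical in-memory stand-in for an insert-only IoT table."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._rows: List[Reading] = []

    def insert(self, reading: Reading) -> None:
        # The only write path: rows are appended, never updated in place.
        self._rows.append(reading)

    def expire(self, now: float) -> int:
        # Drop rows older than the time-to-live and report how many were removed.
        cutoff = now - self.ttl_seconds
        before = len(self._rows)
        self._rows = [r for r in self._rows if r.timestamp >= cutoff]
        return before - len(self._rows)

    def query(self, device_id: str, metric: str,
              start: float, end: float) -> List[Reading]:
        # Typical read: one device, one metric, one time range.
        return [
            r for r in self._rows
            if r.device_id == device_id
            and r.metric == metric
            and start <= r.timestamp <= end
        ]
```

Because there is no update or delete path, nothing here needs a multi-table transaction; the only housekeeping is the periodic call to expire.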

Four principles

Thanks to the phenomenal rate at which the IoT is expanding, IoT databases need to be designed with four core criteria in mind:

· Flexible: Having a database that is as flexible as possible is essential. Companies will most likely find that they need a solution which can accommodate a range of different data types and structures without having fixed or predefined schemas in place, especially as the type of data being collected can shift significantly over time. That said, flexibility does not mean a single “jack of all trades” database for the entire enterprise; those days are behind us.

· Available: Given the flood of data needing to be ingested, it’s critical that databases avoid downtime wherever possible. This means designing a system without a single point of failure. For further protection, a distributed messaging system such as Apache Kafka can be placed in front of the database to hold incoming data until the database has processed the backlog or additional nodes have been added to the cluster.

· Scalable: This is probably the single most important consideration for any IoT database. Companies need to know that, as their network grows, they won’t suddenly find themselves with a database that cannot keep up, as this can seriously disrupt business operations.

· Cheap writes: Finally, databases need to offer cheap writes. It’s common for IoT data to be stored and only read once, if at all. Therefore, it’s important for write capacity to be as cost-effective as possible and for the database to be optimised for high write throughput.
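The buffering idea in the availability point above can be illustrated with a small Python sketch. It uses an in-memory deque as a stand-in for a durable message log such as a Kafka topic, so it shows the control flow only, not Kafka's API or durability. BufferedWriter and write_batch are hypothetical names, and write_batch is assumed to return False when the database cannot accept a batch.

```python
from collections import deque
from typing import Callable, Deque, List


class BufferedWriter:
    """Holds readings while the database is down and drains them in batches later."""

    def __init__(self, write_batch: Callable[[List[dict]], bool],
                 batch_size: int = 100) -> None:
        self._write_batch = write_batch  # assumed: returns False on failure
        self._batch_size = batch_size
        self._backlog: Deque[dict] = deque()

    def submit(self, reading: dict) -> None:
        # Ingestion never blocks on the database; it only appends to the backlog.
        self._backlog.append(reading)

    def pending(self) -> int:
        return len(self._backlog)

    def drain(self) -> int:
        # Write the backlog in arrival order; stop at the first failed batch so
        # that nothing is dropped while the database is unavailable.
        written = 0
        while self._backlog:
            size = min(self._batch_size, len(self._backlog))
            batch = [self._backlog[i] for i in range(size)]
            if not self._write_batch(batch):
                break
            for _ in range(size):
                self._backlog.popleft()
            written += size
        return written
```

Readings are removed from the backlog only after a batch has been accepted, so a failed write leaves the data in place for the next call to drain.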

Prioritising these four requirements will inevitably lead towards open source solutions which have been built with these specific qualities in mind. For example, the masterless architecture of Apache Cassandra has been purposely designed so that nodes can be added and swapped out seamlessly, allowing it to be scaled up to accommodate increased volume with zero downtime. Additionally, the distributed architecture of Cassandra means there is no single point of failure. With data replicated across racks and data centres, the database is designed to stay available even if a rack, or an entire data centre, fails, so it can continue to run smoothly.
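To make the masterless idea more concrete, here is a minimal Python sketch of a hash ring with replication. It is a simplified illustration of the general technique, not Cassandra's actual implementation: the Ring class, the SHA-256 based token function and the replication factor of three are all assumptions made for the example.

```python
import bisect
import hashlib
from typing import List, Tuple


def _token(key: str) -> int:
    # Map any string to a position on the ring (illustrative hash choice).
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Every node is a peer; any of them can work out where a key lives."""

    def __init__(self, nodes: List[str], replication_factor: int = 3) -> None:
        self.replication_factor = replication_factor
        self._ring: List[Tuple[int, str]] = sorted((_token(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        # Scaling out: the new node takes over one slice of the ring.
        bisect.insort(self._ring, (_token(node), node))

    def remove_node(self, node: str) -> None:
        # Node failure or replacement: its slice falls to the next nodes along.
        self._ring = [(t, n) for t, n in self._ring if n != node]

    def replicas(self, key: str) -> List[str]:
        # Walk clockwise from the key's token and take the next N distinct nodes.
        if not self._ring:
            return []
        tokens = [t for t, _ in self._ring]
        start = bisect.bisect_right(tokens, _token(key)) % len(self._ring)
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]
```

In this sketch, removing any one node leaves the other replicas of each key in place, which is why the loss of a single node does not make the data unreachable.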

A perfect partnership

The spectacular growth in IoT devices, and the flood of data they will generate, are set to render many of the proprietary database solutions obsolete. In the face of a tidal wave of information, it’s essential for companies to critically reassess their database selection criteria and pick a solution which is tailored to meet the unique challenges posed by the IoT. Adopting open source is the best way to ensure that firms have the requisite flexibility, availability, scalability, and fault tolerance to cope, while also avoiding the nasty vendor lock-in surprises that can accompany commercial options.