Out of the warehouse, past the lake and into the future: A modern approach to analytics

By Justin Borgman, CEO at Starburst Data.

Friday, 23rd July 2021

We are in the midst of a data revolution. Never have people spent so much time online, with some studies suggesting internet usage in the UK has doubled in the past year – with help from the pandemic, of course. The behaviours and habits we’ve adopted online during this time have not only sparked huge changes in how businesses operate, but have also had a huge impact on how businesses collect, store, and analyse their data.

Retailers, for example, may have previously relied on brick-and-mortar sales and footfall data to make decisions about when and where to launch new products. But since the pandemic closed the doors of many physical stores, those behavioural models, machine learning patterns, and predictive models have been rendered irrelevant. Instead, retailers have had to adapt. Some may have accumulated a whole new set of data as they adapted to the online world, while others may have moved their data around, requiring a refresh of analytical tools to keep on top of business demands.

It should go without saying that analytics is data’s killer app – but the journey to truly impactful insights is still a challenge for most organisations. In the past, organisations tried to move all of their data into one structured repository, analysing it in a neat, tidy container. In reality, it doesn’t quite work like this, and the pandemic has added extra hurdles that industries were simply not prepared for. For most organisations, some data is in the cloud, some on-prem, and much of it sits in different geographical locations. Our survey with Red Hat, of IT decision makers and data management professionals across the globe, confirms that data remains significantly distributed: half (49%) of enterprises store data in five or more data storage platforms, and 54% expect to have data on more than five platforms in the next year.

The pandemic has shifted how companies store, manage, and think about their data. Pre-pandemic data isn’t entirely irrelevant, but it is almost useless to analyse without the right access to new data, which must first be collected and stored appropriately before analytics can be applied. That process can take months, and with enterprises having significantly less time to gain insights before their data is outdated, every second between querying data and gaining insight counts.

The warehouse v lake conundrum

For a long time, it was believed that effective, fast analytics required having data stored in a single place – otherwise known as the ‘single source of truth’. A network of complex, and often brittle, pipelines would move data from source systems and create new copies of that data within this central, enterprise-wide data warehouse. That all sounded good on paper and, of course, suited companies like Oracle and Teradata, which naturally benefited from the lock-in associated with having everyone’s data captive in their proprietary systems. Over time, vendor lock-in leads to rising prices – a potentially expensive proposition for companies in an uncertain economy.

That model is not only extremely expensive, it’s also much slower than it looks. Time to insight is not a measure of query response time; it’s a measure of the entire end-to-end process. Traditional data warehousing approaches rely on cumbersome extract, transform, and load (ETL) processes that can take as long as 18 months to gather and format all of the data required before the first analytic queries can even be performed.
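To make that end-to-end cost concrete, the sketch below shows the shape of a single ETL step in Python: extract from a source system, transform into the warehouse schema, load a fresh copy. It is purely illustrative – the tables, columns, and SQLite stand-ins are hypothetical, and a real pipeline involves many such jobs plus scheduling and failure handling, which is where the months of lead time go.

```python
# Illustrative sketch of a traditional ETL step: every stage produces another copy
# of the data, and analysts cannot query anything until the final load completes.
# The table names and schemas here are hypothetical, not taken from the article.
import sqlite3

def extract(source):
    # Pull raw rows out of the operational source system.
    return source.execute("SELECT order_id, amount_pence, ts FROM orders").fetchall()

def transform(rows):
    # Reshape the rows into the warehouse schema (here: pence -> pounds).
    return [(order_id, pence / 100.0, ts) for order_id, pence, ts in rows]

def load(warehouse, rows):
    # Write yet another copy of the data into the central warehouse table.
    warehouse.executemany(
        "INSERT INTO fact_orders (order_id, amount_gbp, ts) VALUES (?, ?, ?)", rows
    )
    warehouse.commit()

# In-memory SQLite stand-ins for the real source system and warehouse.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, amount_pence INTEGER, ts TEXT)")
source.execute("INSERT INTO orders VALUES (1, 1999, '2021-07-23')")

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (order_id INTEGER, amount_gbp REAL, ts TEXT)")

load(warehouse, transform(extract(source)))
print(warehouse.execute("SELECT * FROM fact_orders").fetchall())
```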

We’ve now reached a point where accessing a single source of truth for data has become virtually impossible because of the sheer volume, variety, and velocity of data being generated. This poses a real problem for organisations that need fast data access now more than ever – indeed, a key finding from our survey was that 58% of respondents said data access has become more critical throughout the pandemic.

Enter X Analytics: a modern approach to data analytics

The next step in the evolution of data and analytics is what’s known as X Analytics. Coined by Gartner, X Analytics refers to the ability to run analytics on a wide range of structured and unstructured data, combining new data sources with existing or core data. In essence, it is a key that unlocks the doors to all your data, wherever it might be stored. This gives business intelligence analysts and data scientists a better overview of their data for analysis, allowing them to understand how behaviour has changed and what patterns have remained, meaning that faster decisions can be made to drive new business.

Rather than focusing on a single source of truth for data, X Analytics is about having a single point of access to all your data. Fresh data and fast insights – two lifelines for analysts and data scientists throughout the pandemic – are what X Analytics is all about. It replaces traditional, long-winded approaches to accessing data with a query layer that functions as a single entry point, meaning data can be queried quickly wherever it lives.
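What that single entry point looks like in practice depends on the engine, but a federated SQL query is one common form. The sketch below assumes a Trino-style query engine reachable through the Python trino client – an assumption on our part, as the article does not prescribe a specific tool – and all host, catalog, and table names are hypothetical.

```python
# Minimal sketch: one query that joins data living in two different systems,
# issued through a single entry point instead of copying it into a warehouse first.
# Assumes a Trino-style coordinator is available; every name below is hypothetical.
from trino.dbapi import connect

conn = connect(host="query.example.internal", port=8080, user="analyst")
cur = conn.cursor()

cur.execute("""
    SELECT w.product_id,
           SUM(w.quantity) AS online_units,
           SUM(s.quantity) AS in_store_units
    FROM datalake.web.orders    AS w   -- lives in a cloud object-store data lake
    JOIN warehouse.retail.sales AS s   -- lives in an on-prem warehouse
      ON w.product_id = s.product_id
    GROUP BY w.product_id
""")

for product_id, online_units, in_store_units in cur.fetchall():
    print(product_id, online_units, in_store_units)
```

The point of the sketch is that neither table has to be copied into the other system before the join: the query layer pushes work down to each source and combines the results, which is what makes fresh data and fast insights possible.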

How EMIS is leveraging X Analytics

Many organisations are already benefitting from X Analytics and this new way of accessing data. One organisation that knows all too well the challenges of the pandemic and the need to access critical data quickly is EMIS Health. As a technology provider for the NHS, responsible for providing patient information to local pharmacies, EMIS knew that it needed to adapt in order to meet demands throughout the pandemic.

With clinical data continuously streaming into a cloud data lake, EMIS needed to modernise its data architecture so that data could be accessed from anywhere through a single query point. Key to this was creating optionality for its data: separating compute from storage so that compute could be scaled up independently, and building a specialised, portable data consumption layer that could be used across EMIS data sources, on the assumption that data will keep moving around. This eventually evolved into the EMIS-X Analytics suite, allowing users to run BI analytics in near real time, without the data copying and analysis steps that could previously take several weeks.
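One way to read “separating compute and storage” and a “portable consumption layer” is that reporting code is written against logical names, while the physical location of the data – and the size of the compute cluster – can change underneath it. The sketch below reuses the hypothetical Trino-style setup from above; none of the names are taken from EMIS, and this is an interpretation of the architecture rather than its actual implementation.

```python
# Sketch of a portable consumption layer: the reporting query is written once
# against a logical name, and only the catalog mapping changes when data moves.
# The query engine (compute) can be resized without touching the files sitting
# in object storage. All hosts, catalogs, and table names are hypothetical.
from trino.dbapi import connect

# Logical dataset name -> physical catalog. Update this mapping when the data
# moves; the reporting function below stays the same.
CATALOGS = {
    "clinical_events": "datalake",      # today: cloud object storage
    # "clinical_events": "warehouse",   # tomorrow: a different backend
}

def daily_symptom_counts(host="query.example.internal"):
    catalog = CATALOGS["clinical_events"]
    conn = connect(host=host, port=8080, user="reporting")
    cur = conn.cursor()
    cur.execute(f"""
        SELECT symptom, COUNT(*) AS reports
        FROM {catalog}.clinical.self_reported_symptoms
        WHERE report_date = current_date
        GROUP BY symptom
        ORDER BY reports DESC
    """)
    return cur.fetchall()
```

Because the data stays where it is, a dashboard like the symptom visualisation described below can refresh as soon as new records land in the lake, rather than waiting on a copy job.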

During the pandemic, EMIS used its X Analytics platform to maintain an online visualisation of the most commonly reported symptoms of COVID-19, updated daily as new self-reported patient data streams into its cloud data lake.

Getting X-cited for the future

Whilst we’ve weathered the immediate threat of the pandemic, the economy remains unstable, and the next challenge for organisations will be to build back better. During this time, data will be integral, and now is the opportunity for organisations to capitalise on the insights they have accumulated throughout the pandemic. In fact, the businesses that efficiently mine all of their data for insights will be the ones that adapt and evolve best. With X Analytics, businesses will have the capabilities not only to make faster decisions for their business today, but also to survive and thrive in the future.