Observability is not one size fits all

Roles for data in your application development and infrastructure. By Iain Chidgey, Vice President EMEA, Sumo Logic

Friday, 15th October 2021

Observability has become more important for developers and for enterprise IT teams. More companies have adopted cloud infrastructure and microservices to build their applications, in order to deliver updates faster and be more flexible around digital services. However, these shifts in architecture and design change how teams keep track of what is going on inside those applications.

Observability takes the data outputs from an application, and then uses that data to understand how the application is performing. The standard definition of observability today is that developers combine their application logs, metrics and traces in order to get the full picture of application performance. However, it is not as simple as just having those sets of data coming in.
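As a minimal sketch of what combining the three signals looks like in practice, the snippet below emits a log line, a metric and a trace span for the same hypothetical request using the OpenTelemetry Python API (a framework that comes up again later in this piece); the service and metric names are illustrative.

```python
import logging

from opentelemetry import metrics, trace

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("checkout")

# Illustrative names; a real setup would also configure an SDK provider and
# an exporter per signal. Without one, the trace and metric calls are no-ops.
tracer = trace.get_tracer("checkout")
meter = metrics.get_meter("checkout")
orders = meter.create_counter("orders_processed")

def process_order(order_id: str) -> None:
    # One request produces all three signals, tied together by shared
    # attributes such as the order id.
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        log.info("processing order %s", order_id)
        orders.add(1, {"service": "checkout"})

process_order("ord-1234")
```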

Rather than simply gathering data over time and then using it when something goes wrong, it’s worth looking at how you can improve your use of observability data to achieve your goals. You can do this by setting more specific goals, based on the data you have coming in continuously from your applications and cloud infrastructure.

Getting your goals in place

The first area for observability is application development and reliability. Using data from your applications’ continuous integration and continuous deployment (CI/CD) pipelines, you should be able to see how quickly you are rolling out updates and any problems that come up over time. However, this is still a reactive process. Instead, how can you use that data to spot potential problems, or ways to improve, before an issue actually comes up?
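One way to make that shift, as a rough sketch: compute a failure rate over recent pipeline runs and flag a worsening trend before anything actually breaks. The record shape below is hypothetical; real fields depend on your CI/CD tool.

```python
from dataclasses import dataclass

@dataclass
class PipelineRun:
    succeeded: bool  # hypothetical record; real runs carry far more detail

def failure_rate(runs: list[PipelineRun]) -> float:
    return sum(not r.succeeded for r in runs) / len(runs)

def trend_warning(runs: list[PipelineRun], window: int = 20) -> bool:
    # Compare the latest window of runs against the window before it and
    # warn if failures are climbing, before anything is actually "down".
    recent, previous = runs[-window:], runs[-2 * window:-window]
    if len(previous) < window:
        return False
    return failure_rate(recent) > failure_rate(previous) * 1.5
```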

One of the big challenges for many application development teams and site reliability engineers is, paradoxically, not the moment when something goes down outright. After all, getting to the root cause there should be easier, because the broken component should be obvious - for example, it could be DNS, a faulty change that should be rolled back, a cloud service that is not available, or a network connectivity problem.

However, some of the hardest problems to deal with are not so black and white. Instead, services can degrade over time, still working but not at the level expected. Solving this takes more preparation, based on setting up the right metrics for each group of components. Once those metrics are in place, you can use your observability data to track performance over time and then be more proactive about fixing problems.
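As a sketch of what ‘the right metrics for each group of components’ can mean in practice: track a latency percentile per service against an agreed target, so a slow decline shows up even while everything is technically still up. The service names and budgets below are illustrative.

```python
import statistics

# Illustrative p95 latency budgets, in milliseconds, per service group.
TARGETS_MS = {"checkout": 300, "search": 500}

def p95(samples: list[float]) -> float:
    # statistics.quantiles with n=20 returns 19 cut points; index 18 is p95.
    return statistics.quantiles(samples, n=20)[18]

def degraded(service: str, latencies_ms: list[float]) -> bool:
    # "Working, but not at the level expected": up, yet over budget.
    return p95(latencies_ms) > TARGETS_MS[service]
```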

Another area where observability merits a more specialist approach is Kubernetes. More developers have turned to containers to host their microservices applications, and Kubernetes is the de facto standard for managing those containers. For observability, getting data out of your Kubernetes clusters and containers will help show how those applications are performing. This involves bringing together sources of data including Prometheus, Fluentd, Fluent Bit and Falco, or using a framework like OpenTelemetry.
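Bringing those sources together usually means shipping everything to a collector running inside the cluster. Here is a minimal sketch of pointing the OpenTelemetry Python SDK at an OTLP endpoint; the address assumes a collector Service reachable in the same Kubernetes namespace on the default OTLP/gRPC port.

```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# "otel-collector:4317" is an assumption: an in-cluster collector Service
# listening on the default OTLP/gRPC port.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="otel-collector:4317", insecure=True))
)
trace.set_tracer_provider(provider)
```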

With this data, you can connect any microservice performance issues and errors directly to user experience, and then make the right changes. To achieve this, it is essential to understand end-to-end user transactions, uncover latency issues and see which services are affected. Distributed transaction tracing provides the telemetry to connect the monitoring of key performance indicators to the real experience of your users.
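Following a transaction end to end depends on propagating trace context between services. A rough sketch using the OpenTelemetry propagation API; the downstream URL and the HTTP client are illustrative.

```python
from opentelemetry import trace
from opentelemetry.propagate import extract, inject

tracer = trace.get_tracer("frontend")

def call_payments(http_client) -> None:
    # Caller side: start a span and inject its context into the outgoing
    # HTTP headers (the W3C traceparent header) before calling downstream.
    with tracer.start_as_current_span("checkout"):
        headers: dict[str, str] = {}
        inject(headers)
        http_client.post("http://payments/charge", headers=headers)  # illustrative

def handle_charge(headers: dict) -> None:
    # Callee side: extract the upstream context so this span joins the
    # same end-to-end transaction as the caller's.
    with tracer.start_as_current_span("charge", context=extract(headers)):
        ...  # process the payment
```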

Getting a full picture

One area that is growing in importance for companies is how to track multi-cloud deployments. Multi-cloud can mean different things to different organisations, from departments choosing their own cloud providers to meet specific needs, through to companies planning their expansion across multiple markets with the most appropriate partner in each country or region. Whatever the model, getting data from each cloud provider is only part of the story. Alongside this raw data, it’s essential to work out how you normalise it and get it into one place for tracking.
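As a minimal sketch of that normalisation step: map each provider’s field names onto one common schema before anything is compared. The field names below are illustrative, not the providers’ actual log formats.

```python
# Illustrative field mappings; real provider log schemas differ and should
# be checked against each provider's documentation.
FIELD_MAP = {
    "aws":   {"ts": "eventTime", "status": "responseCode", "latency": "duration"},
    "azure": {"ts": "timeStamp", "status": "httpStatus",   "latency": "timeTaken"},
}

def normalise(provider: str, record: dict) -> dict:
    mapping = FIELD_MAP[provider]
    return {common: record.get(native) for common, native in mapping.items()}

# Records from different providers end up with the same keys,
# ready to be tracked in one place.
normalise("aws", {"eventTime": "2021-10-15T09:00:00Z", "responseCode": 200, "duration": 42})
```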

While there are many cloud services that are either compatible with each other or broadly similar in what they provide - for example, commodity services like block storage or compute - there will always be some differences in how the cloud providers operate, and each provider offers its own specific tools with no direct equivalent elsewhere. Comparing information across cloud providers and getting consistent insight into performance is therefore a task that observability data can support.

Similarly, software development teams can get more value out of their own observability data. Like the cobbler’s children in the old tale, software development teams are often better at providing data for others to use than at applying data to their own processes. Instead, data from software development tools like Jenkins, JIRA, GitHub, Bitbucket, Opsgenie, PagerDuty and others can be evaluated alongside cloud and application component data. Rather than looking at this data solely for alerting or performance, you can use the data from the CI/CD pipeline to look for opportunities to optimise the whole process. This is a developing area, as application and software teams create multiple pipelines to process their work through the organisation: for enterprises, having tens or even hundreds of different CI/CD pipelines in place is not uncommon. Consolidating data from those pipelines can show you how well you are currently performing, and benchmark data can be used to spot opportunities to improve at the team level.
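Consolidation at that scale can start simply, as sketched below: aggregate a measure such as lead time per team across every pipeline, then compare against a benchmark. The record shape and benchmark value are illustrative.

```python
import statistics
from collections import defaultdict

def lead_times_by_team(runs: list[dict]) -> dict[str, float]:
    # runs: hypothetical records like {"team": "payments", "lead_time_h": 6.5},
    # consolidated from many separate CI/CD pipelines.
    by_team: dict[str, list[float]] = defaultdict(list)
    for run in runs:
        by_team[run["team"]].append(run["lead_time_h"])
    return {team: statistics.median(times) for team, times in by_team.items()}

def above_benchmark(medians: dict[str, float], benchmark_h: float = 24.0) -> list[str]:
    # Teams whose median lead time exceeds the (illustrative) benchmark,
    # i.e. where there is an opportunity to improve.
    return [team for team, median in medians.items() if median > benchmark_h]
```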

The last area for improvement is around edge and web observability, where we can demonstrate all the hard work that has gone into delivering better services. As an example, when we order something from an online retailer, we don’t consider all the moving parts that go into our purchase - we simply look at whether the service works, and the retailer takes on all the responsibility. When something fails, customers don’t think about the third parties involved.

As we create our applications or services, how are they delivered to customers and how can we improve the delivery of that content? How can we ensure that all that good work internally then gets to the customer at speed and avoids potential performance problems? To get this insight, we can use observability data from the content delivery networks and cloud providers that support the delivery of those services. By seeing how services are experienced by customers, we can spot potential problems outside the application itself and take steps to improve.
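A small sketch of what that outside-in view can look like: work out a cache hit ratio and an edge latency percentile from CDN log records, to catch delivery problems the application itself never sees. The record fields are illustrative rather than any specific CDN’s log format.

```python
import statistics

def edge_health(records: list[dict]) -> dict:
    # records: hypothetical CDN log entries such as
    # {"cache": "HIT", "edge_latency_ms": 38.0}.
    hits = sum(r["cache"] == "HIT" for r in records)
    latencies = [r["edge_latency_ms"] for r in records]
    return {
        "cache_hit_ratio": hits / len(records),
        "p95_edge_latency_ms": statistics.quantiles(latencies, n=20)[18],
    }
```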

Observability data holds up a mirror to our systems, and helps us understand what is going on based on the outputs delivered. However, there are further opportunities to make more of the data coming through across our operations, from business and software development processes through to areas like security. By looking at the full picture and consolidating all our data together, observability can help us improve across the organisation as a whole.