Metrics that matter for DevOps success

By Jori Ramakers, Director of CX Strategy, Tricentis.

Posted Tuesday, 26th September 2023 by Phil Alsop

Google Cloud’s DevOps Research and Assessment (DORA) metrics have become the gold standard for companies striving for ever-higher performance across the continuous integration (CI) and continuous deployment (CD) pipeline. In under a decade, these metrics have revolutionised the way DevOps teams operate, helping them understand baseline performance and the biggest opportunities for improvement. The markers for DevOps success are therefore well recognised: organisations need to demonstrate rapid release cycles, high rates of success among software deployments, low defect percentages, and minimal downtime in the event of an outage.

However, many quality assurance (QA) leaders know where they want to get to, but not exactly how to take their organisation to the next level. DevOps is a team sport, and each functional area must examine and understand its role in improving performance against each broader metric if it wants to improve its outcomes. In this article, we’ll look at each of the DORA metrics and break down how QA leaders can support meeting each of these criteria.

#1: Deployment frequency

As you progress towards DevOps maturity, you are likely implementing CI/CD best practices, working with smaller changes, and reducing technical debt to increase the number of times your team deploys changes to production each day.

If you are able to test changes quickly, you offer your team valuable, actionable feedback that they can incorporate in near-real time, ultimately enabling the delivery of higher quality software, faster. The sooner your developers receive feedback about the quality of their code, the faster they can fix bugs and deploy.

QA teams should build towards continuous, automated testing at every stage of the CI/CD pipeline, working with developers to understand what is covered by earlier unit and integration testing and to establish which functional tests are needed. You should ensure that you have visibility into all testing and can track the overall test plan and results against requirements. Together, these steps will improve your team’s confidence in pushing smaller, more frequent changes into production.
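To make the metric concrete, here is a minimal sketch of how deployment frequency could be calculated from a list of production deployment timestamps. The function names and the sample timestamps are hypothetical and purely illustrative; in practice the data would come from whatever CI/CD tooling your team uses.

```python
from collections import Counter
from datetime import datetime


def deployments_per_day(deploy_times):
    """Count production deployments for each calendar day."""
    return Counter(t.date() for t in deploy_times)


def mean_daily_frequency(deploy_times):
    """Average deployments per day over the observed span (inclusive)."""
    if not deploy_times:
        return 0.0
    first = min(deploy_times).date()
    last = max(deploy_times).date()
    days = (last - first).days + 1
    return len(deploy_times) / days


# Hypothetical deployment timestamps, for illustration only.
deploys = [
    datetime(2023, 9, 25, 9, 30),
    datetime(2023, 9, 25, 15, 10),
    datetime(2023, 9, 26, 11, 0),
    datetime(2023, 9, 28, 16, 45),
]

print(mean_daily_frequency(deploys))  # 4 deployments over 4 days -> 1.0
```

Averaging over the whole span, including days with no deployments, is one possible design choice; it stops a single busy day from making the cadence look healthier than it is.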

#2: Lead time for changes

While deployment frequency measures the cadence of new code being deployed to production, lead time for changes measures how long a change takes to get from code commit to running in production. It is the best way to understand the velocity of software delivery, which is critical for high-change DevOps environments. Shorter lead times for changes ensure that businesses can make critical changes and deliver new features to their customers faster.

QA teams can support shorter lead times by centrally managing and orchestrating all testing activities. Bringing together developers’ tests with the functional and performance testing that occurs across the CI/CD pipeline can optimise testing efficiency, create faster feedback loops, and ultimately improve both release confidence and velocity. Teams can simplify this process by implementing test management that pulls in results across automation tools and integrates with requirements and issue tracking software.
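As a rough illustration, lead time for changes can be sketched as the elapsed time between a commit and its production deployment, summarised here with a median so that one slow change does not dominate. The function names and the sample data are assumptions made for this example, not part of any particular tool.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_times(changes):
    """Elapsed time from commit to production for each (committed, deployed) pair."""
    return [deployed - committed for committed, deployed in changes]


def median_lead_time(changes):
    """Median lead time; expects at least one change."""
    seconds = [lt.total_seconds() for lt in lead_times(changes)]
    return timedelta(seconds=median(seconds))


# Hypothetical (committed_at, deployed_at) pairs, for illustration only.
changes = [
    (datetime(2023, 9, 25, 9, 0), datetime(2023, 9, 25, 11, 0)),   # 2 hours
    (datetime(2023, 9, 25, 10, 0), datetime(2023, 9, 25, 16, 0)),  # 6 hours
    (datetime(2023, 9, 25, 14, 0), datetime(2023, 9, 26, 16, 0)),  # 26 hours
]

print(median_lead_time(changes))  # 6:00:00
```

Time spent waiting for test results sits inside each of these intervals, which is why faster feedback loops shorten the metric directly.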

#3: Change failure rate

Change failure rate is the percentage of deployments to production that result in a failure requiring remediation. There is a clear link between detecting potential failures before they reach production and an improvement against this metric. In other words, an ounce of prevention is worth a pound of cure. Detecting critical defects in your team’s code before they’re pushed to production minimises the need for hotfixes, rollbacks, fix forwards, and patches.
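A simple way to picture this metric is as the share of production deployments that later needed a hotfix, rollback, fix forward, or patch. The sketch below assumes each deployment record carries a hypothetical `needed_remediation` flag; the field name and the sample records are illustrative only.

```python
def change_failure_rate(deployments):
    """Fraction of deployments that required remediation after release."""
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d["needed_remediation"])
    return failed / len(deployments)


# Hypothetical deployment records, for illustration only.
deployments = [
    {"id": 1, "needed_remediation": False},
    {"id": 2, "needed_remediation": False},
    {"id": 3, "needed_remediation": True},   # e.g. rolled back
    {"id": 4, "needed_remediation": False},
    {"id": 5, "needed_remediation": False},
    {"id": 6, "needed_remediation": True},   # e.g. hotfixed
    {"id": 7, "needed_remediation": False},
    {"id": 8, "needed_remediation": False},
]

print(f"{change_failure_rate(deployments):.0%}")  # 25%
```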

By implementing test management, your QA teams can trace results of tests executed in the CI/CD pipeline back to requirements, ensuring sufficient test coverage and minimising the risk of failure. It’s nearly impossible to achieve 100% coverage, so focus your test plan on high-value, high-impact areas.

But why track all these “coverage” metrics? Suppose your team has achieved 100% automation, but with one click, the site goes down: you didn’t have 100% coverage, you just automated 100% of an insufficient test suite. To truly understand coverage, teams should collectively assess multiple coverage metrics.
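The gap between automation and coverage can be shown with a small sketch. It assumes a hypothetical data model in which each test records whether it is automated and which requirements it traces back to; the requirement IDs, test names, and field names are invented for illustration.

```python
def automation_rate(tests):
    """Fraction of tests that are automated; expects a non-empty list."""
    return sum(1 for t in tests if t["automated"]) / len(tests)


def requirement_coverage(requirements, tests):
    """Fraction of requirements traced to at least one test."""
    covered = {req for t in tests for req in t["requirements"]}
    return len(covered & set(requirements)) / len(requirements)


# Hypothetical requirements and tests, for illustration only.
requirements = ["REQ-1", "REQ-2", "REQ-3", "REQ-4"]
tests = [
    {"name": "login_test", "automated": True, "requirements": ["REQ-1"]},
    {"name": "search_test", "automated": True, "requirements": ["REQ-2"]},
]

print(automation_rate(tests))                      # 1.0
print(requirement_coverage(requirements, tests))   # 0.5
```

Here every test is automated, yet half of the requirements have no test at all, which is the situation described above: 100% automation of an insufficient test suite.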

#4: Mean time to restore service (MTTR)

Many companies struggle to define “outage” clearly, because partial outages may or may not be counted. Per Atlassian, MTTR should include both the repair time and any testing time recorded in restoring service. In other words, the clock doesn’t stop on this metric until the system is fully functional again. If you’re tracking the three DORA metrics above, you are already contributing to MTTR: you have minimised the chance of outage-causing defects slipping into production, improved testing velocity, and established fast feedback loops that will streamline any testing required as part of the repair process.
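A minimal sketch of the calculation follows, assuming each incident is recorded as a start time and the time at which service was verified as fully functional again, so that repair and testing time are both inside the interval. The function name and sample incidents are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents):
    """Mean of (restored - started) over all incidents, or None if there are none."""
    if not incidents:
        return None
    durations = [restored - started for started, restored in incidents]
    return sum(durations, timedelta()) / len(durations)


# Hypothetical (outage_started, service_verified_restored) pairs, for illustration only.
incidents = [
    (datetime(2023, 9, 25, 10, 0), datetime(2023, 9, 25, 10, 30)),  # 30 minutes
    (datetime(2023, 9, 27, 14, 0), datetime(2023, 9, 27, 15, 30)),  # 90 minutes
]

print(mean_time_to_restore(incidents))  # 1:00:00
```

Whether partial outages appear in the incident list is exactly the definitional question raised above, and the result is only comparable over time if that rule stays consistent.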

It may be difficult to track granular QA metrics that support MTTR. However, reporting and analytics are critical to driving high performance against this DORA metric. They help site reliability engineering (SRE) and ops teams quickly identify defects affecting production deployments, and they provide valuable input to QA teams as they continually optimise automated testing for faster deployments.

To maintain high visibility into your test results within CI pipelines, implement a test management tool that analyses quality and release readiness and shares the results across the CI/CD pipeline through robust reporting and integration capabilities. Through these integrations, your team can automatically share defects and code changes across QA and developer tools. Consistent and timely feedback can both reduce higher-impact defect leakage and minimise MTTR when defects do occur.

What sets QA teams at high-performing DevOps organisations apart?

The highest-performing DevOps organisations view quality as a shared responsibility. Their QA teams no longer work in their own silo, but set their goals in terms of larger DevOps success, and collaborate closely with development, compliance, performance, and site reliability engineering and operations teams.

These teams have implemented continuous testing practices, run tests and delivered feedback as early in the CI/CD process as possible, increased release confidence, and reduced production defects. Gaining data-driven insights helps them deliver a more effective test automation strategy; automating individual tests isn’t enough. Continuous testing for DevOps comes through careful planning, scaling, and orchestration.

Reaching the highest performance level isn’t just about improving your internal efficiencies, though; it’s about the knock-on effects, like giving stakeholders more confidence in your team’s workflows and in overall release readiness, ultimately powering continuous improvement across the software delivery pipeline. In this way, QA plays a critical leading role in improving your performance against the DORA metrics. Implementing a continuous testing strategy that supports DevOps success is a significant undertaking, but when done right, it delivers a remarkable payoff.