Accelerating MLOps requires a mindset shift

There aren’t any hotter topics in the software development world than machine learning operations, or MLOps for short. By Kai Hilton-Jones, Senior Director of Solutions Engineering EMEA, GitHub.

Tuesday, 16th March 2021. Posted by Phil Alsop.

According to leading analysts, the market for MLOps is set to be worth as much as $4 billion by 2024 in the US alone. It was also one of just nine growth areas to make the cut in Deloitte’s Tech Trends 2021 report.

But let’s back up a minute. What exactly is MLOps? And more importantly, why do we need it?

Understanding the need for a new approach to developing machine learning models

AI, while still arguably a nascent technology, is maturing fast. Enterprises now recognise the power of machine learning and its ability to give meaning to operational data, helping generate insights, predict trends and, ultimately, provide a competitive advantage. But operationalising ML can be challenging. Development and deployment of models is often done in disconnected silos, and typically happens slowly. So slowly, in fact, that IDC research found that more than one-quarter of AI/ML projects fail - with lack of necessary skills and integrated development environments reported as major contributory factors.

Enterprises are waking up to the need to find a way to incorporate ML model development into their DevOps processes, ensuring collaboration between data scientists and app developers to help bring ML models to production more reliably - and much faster.

When you break it down into such simple terms, it sounds straightforward to implement. The industry has already made major strides in institutionalising DevOps over the past two decades, so adding MLOps into the mix should be simple, right?

In practice, things are more complex. Taking MLOps from concept to reality requires a keen understanding of the mindset and tools required to make it a success. And that requires understanding why MLOps is a discipline in its own right and not a bolt-on to DevOps.

The difference between MLOps and DevOps

The bottom line is that the kinds of problems developers face in machine learning are fundamentally different to the problems they face in traditional software coding. It’s not enough to port your continuous integration and continuous deployment (CI/CD) and infrastructure code to machine learning workflows and call it done.

Functional issues, like race conditions, infinite loops, and buffer overflows, don’t come into play with machine learning models. Instead, errors in machine learning development tend to come from edge cases, lack of data coverage, adversarial assault on the logic of a model, or overfitting. Edge cases are the reason so many organisations are racing to build AI Red Teams to diagnose problems before things go horribly wrong.

Ultimately, MLOps and DevOps are solving different sets of challenges. In the last few years, DevOps has gradually shifted away from a culture of CI/CD and towards Git-based techniques to manage software deployments. By using Git as a single source of truth, developers can build robust declarative apps and infrastructure with ease. This transition has made software less error-prone and more scalable, and has increased collaboration by making DevOps more developer-centric. But when it comes to machine learning applications, DevOps gets much more complex.

Machine learning development and traditional software development have many similarities. But the differences - which are often subtle - are enough to throw a development team completely off track if they rigidly stick to a standard software development process.

Because MLOps is still emerging, data scientists are often forced to build the tools that support development from scratch. Many DevOps tools are generic and require the implementation of “ML awareness” through custom code. Furthermore, these platforms often require disparate tools that are decoupled from your code, leading to poor debugging and reproducibility.

To illustrate the degree of complexity and the unique requirements of machine learning development, here is a snapshot of some of the specific challenges MLOps needs to solve:

● Model reproducibility and versioning

○ Track, snapshot and manage assets used to create the model

○ Enable collaboration and sharing of ML pipelines

● Model auditability and explainability

○ Maintain asset integrity and persist access control logs

○ Certify model behaviour meets regulatory and adversarial standards

● Model packaging and validation

○ Support model portability across a variety of platforms

○ Certify model performance meets functional and latency requirements

● Model deployment and monitoring

○ Release models with confidence

○ Monitor and know when to retrain by analysing signals such as data drift
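
To make the last point concrete, here is a minimal sketch of one possible drift signal, written in Python using only the standard library. It is an illustration, not a recommended production method: the function names, the 0.5 threshold and the choice of statistic (how far the mean of live data has moved from the training data, measured in training-data standard deviations) are all assumptions made for this example. Real monitoring tools typically use richer statistical tests.

```python
import statistics


def drift_score(reference, live):
    """Distance of the live mean from the reference mean,
    expressed in reference standard deviations.

    Both arguments are non-empty sequences of numbers for a single
    feature. This is a deliberately simple, hypothetical signal.
    """
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.pstdev(reference)
    live_mean = statistics.fmean(live)
    if ref_std == 0:
        # A constant reference feature: any change at all counts as drift.
        return 0.0 if live_mean == ref_mean else float("inf")
    return abs(live_mean - ref_mean) / ref_std


def needs_retraining(reference, live, threshold=0.5):
    """Flag a feature for review when its drift score exceeds the
    threshold. The default of 0.5 is an arbitrary, illustrative value."""
    return drift_score(reference, live) > threshold


if __name__ == "__main__":
    training_values = [1.0, 2.0, 3.0, 4.0, 5.0]
    production_values = [11.0, 12.0, 13.0, 14.0, 15.0]
    print(needs_retraining(training_values, training_values))
    print(needs_retraining(training_values, production_values))
```

A check like this would normally run on a schedule against recent production inputs, with a flagged feature prompting a human review or an automated retraining job rather than an immediate model swap.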

Addressing all of those challenges in MLOps hinges on a new approach to an enterprise’s most valuable IT asset: data.

So what should MLOps look like?

A brand new set of tools is required to bridge that gap between traditional DevOps and MLOps. Data is the critical factor. Effective machine learning development requires versioning data and datasets in tandem with the code. In turn, that means we need tools that are bespoke and unique to the many challenges of developing machine learning models at scale.
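
As a rough illustration of what versioning data in tandem with code can mean, the Python sketch below records a content hash for each dataset alongside a code revision identifier in a single manifest. Everything here is hypothetical: the function names, the manifest layout and the sample revision string are assumptions for the example, and dedicated data versioning tools do considerably more (storage, lineage, access control).

```python
import hashlib
import json


def dataset_fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of a dataset's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(code_revision: str, datasets: dict) -> str:
    """Tie one code revision to the exact contents of each dataset.

    `datasets` maps a dataset name to its raw bytes. The result is a
    JSON string with sorted keys, so identical inputs always produce
    an identical manifest.
    """
    manifest = {
        "code_revision": code_revision,
        "datasets": {
            name: dataset_fingerprint(blob)
            for name, blob in datasets.items()
        },
    }
    return json.dumps(manifest, sort_keys=True, indent=2)


if __name__ == "__main__":
    # "abc1234" stands in for a real Git commit identifier.
    print(build_manifest("abc1234", {"train.csv": b"id,label\n1,0\n2,1\n"}))
```

Committing a manifest like this next to the training code means that any change to the data shows up as a change in version control, which is the property reproducibility depends on.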

In its simplest terms, DevOps covers the collaboration between development and operations disciplines to streamline software delivery, while MLOps smoothly integrates data science and data engineering into existing DevOps processes.

How is MLOps operationalised?

Buying the tools you need for MLOps is the easy part. There are plenty of tools available that can help teams to track, version, audit, certify and re-use every asset in their machine learning lifecycle, and provide orchestration services to streamline management of the whole process. And GitHub Actions can go even further by integrating parts of the data science and machine learning workflow with a software development workflow.

But just like DevOps, MLOps is not a product that can be bought and installed. It’s a set of processes that need to be agreed, nurtured and constantly evaluated and tweaked.

The harder part is developing a supportive culture for MLOps that recognises and celebrates its uniqueness as well as its close ties with DevOps. Putting in place the understanding of why MLOps is different and institutionalising that ethos is more than half the battle. It takes training, a commitment to reworking existing processes and a fixation with what successful MLOps delivers: fast and reliable models that deliver business value.

And there’s another upside that can’t be underestimated. Collaboration between data science and operations engineers not only eliminates inefficiencies through automation, it also frees up developers and data scientists to spend more time doing what they love most. And that can only be good for the future of open source innovation.