Let’s assume that your organisation’s data team has just built a machine learning (ML) model that delivers a high level of accuracy. But what value does it really have? The truth is that, right now, it has none: there is no commercial value in an ML model that is not used in production to help make business decisions. This might be a fraud detection or mortgage lending model in the financial services sector, or a customer acquisition model in the telecoms industry, but whatever the sector or model, the business needs to access, visualise and use its output swiftly.
How can data scientists cover this “last mile” to that elusive end zone named production? How can the ML model that has been built be deployed efficiently? For many, this proves to be a significant challenge, and there are many routes to deployment. Four key considerations determine which path to take: Data Sources; Success Measurement; Bringing the Model to Production; and External Limitations.
Data Sources:
Some of the questions that need asking are: will the same data, in the same structure, be available in production when the model generates its predictions? For example, models built before the current pandemic lockdown will need to be significantly updated before they are made available in production, given the significant changes to market conditions.
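As a concrete starting point, here is a minimal sketch of a pre-deployment schema check, assuming pandas is available on both sides; the column names and the “drifted” production sample are invented for illustration only.

```python
# A minimal sketch: compare the training-time schema against a sample of
# the production feed before deployment. Illustrative data, not real.
import pandas as pd

def check_schema_parity(train_df: pd.DataFrame, prod_df: pd.DataFrame) -> list:
    """Return a list of human-readable schema mismatches."""
    issues = []
    missing = set(train_df.columns) - set(prod_df.columns)
    extra = set(prod_df.columns) - set(train_df.columns)
    if missing:
        issues.append(f"columns missing in production: {sorted(missing)}")
    if extra:
        issues.append(f"unexpected production columns: {sorted(extra)}")
    for col in set(train_df.columns) & set(prod_df.columns):
        if train_df[col].dtype != prod_df[col].dtype:
            issues.append(
                f"dtype drift in '{col}': {train_df[col].dtype} -> {prod_df[col].dtype}"
            )
    return issues

train = pd.DataFrame({"income": [40_000.0], "region": ["UK"]})
prod = pd.DataFrame({"income": ["40k"], "channel": ["web"]})  # drifted feed
for issue in check_schema_parity(train, prod):
    print(issue)
```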
Are the data source and the prediction destination reachable from the machines where the data is ingested by the ML framework? One of the most common “showstoppers” in an enterprise environment is a lack of network access, low bandwidth or a long round trip for your data. Solving these challenges can take significant resources, so ensure you plan, test and involve IT ahead of time.
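A rough connectivity probe, run early from the target machines, can surface such problems long before deployment day. This sketch uses only the Python standard library; the hostnames and ports are placeholders, not real endpoints.

```python
# A minimal pre-deployment reachability and latency check.
import socket
import time

def probe(host: str, port: int, timeout: float = 3.0) -> None:
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            rtt_ms = (time.monotonic() - start) * 1000
            print(f"{host}:{port} reachable, handshake {rtt_ms:.0f} ms")
    except OSError as exc:
        print(f"{host}:{port} UNREACHABLE: {exc}")

# Placeholder endpoints; replace with your own warehouse and scoring targets.
probe("data-warehouse.internal.example", 5432)   # e.g. a database
probe("scoring-api.internal.example", 443)       # e.g. the prediction destination
```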
Is the storage system compatible with the connectors available in the ML platform, and is the data format well supported? It may appear trivial to change the representation and location of your data sets when experimenting with sample data, but everything changes when the data volumes increase. Will the ML framework cope with the data set size? Be aware that while many claim to be big data ready, in reality few are. Whatever the power of the ML technology, ensure that you are not dragging a large, unnecessary data burden into the scoring process.
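One cheap sanity check is to round-trip a small sample through the production storage format before committing to it. The sketch below assumes pandas with a Parquet engine (such as pyarrow) installed; the sample columns are illustrative.

```python
# A minimal format-compatibility check: write a sample in the production
# format (Parquet here) and verify the round-trip preserves the schema.
import pandas as pd

sample = pd.DataFrame(
    {"customer_id": [1, 2], "balance": [120.5, 88.0], "churned": [False, True]}
)
sample.to_parquet("sample.parquet")           # the format the platform must read
roundtrip = pd.read_parquet("sample.parquet")

# Dtype drift on a two-row sample predicts trouble at production scale.
assert list(roundtrip.columns) == list(sample.columns)
assert (roundtrip.dtypes == sample.dtypes).all()
print("Parquet round-trip preserved the schema.")
```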
Success Measurement:
You should be able to distinguish success from failure, so objective metrics are key. Can you translate the model’s success into business metrics, and are you able to communicate the expectations to a non-technical user of your ML models?
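For instance, a fraud model’s confusion matrix can be translated into an expected monetary outcome. In this minimal sketch the per-case costs and counts are invented placeholders, to be replaced with figures agreed with the business.

```python
# Translate classifier outcomes into money the business can reason about.
AVG_FRAUD_LOSS = 900.0   # assumed cost of a missed fraud case (false negative)
REVIEW_COST = 25.0       # assumed analyst cost per flagged transaction

def business_value(tp: int, fp: int, fn: int) -> float:
    """Net saving vs. doing nothing: losses prevented minus costs incurred."""
    prevented = tp * AVG_FRAUD_LOSS      # frauds caught and stopped
    review = (tp + fp) * REVIEW_COST     # every alert costs analyst time
    missed = fn * AVG_FRAUD_LOSS         # frauds the model let through
    return prevented - review - missed

# "95% accuracy" means little in the boardroom; a monetary figure does.
print(f"Net saving: £{business_value(tp=80, fp=140, fn=12):,.0f}")
```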
Model testing is key when it comes to success measurement. Based on business needs and the nature of the data set, it is crucial to adopt the correct validation strategy, choose the right evaluation metrics and split the data appropriately. The stability, sensitivity and interpretability of the model also need to be considered as part of the measurable outcome.
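For time-ordered business data, an out-of-time split is often the honest choice, since random shuffling leaks future behaviour into training. A minimal sketch follows, assuming scikit-learn and synthetic stand-in data.

```python
# Chronological split: train on the past, evaluate on the "future".
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 5))             # stand-in features, assumed time-ordered
y = (X[:, 0] + rng.normal(size=1_000) > 0).astype(int)

cut = 800                                   # everything after this is unseen "future"
model = LogisticRegression().fit(X[:cut], y[:cut])
auc = roc_auc_score(y[cut:], model.predict_proba(X[cut:])[:, 1])
print(f"Out-of-time AUC: {auc:.3f}")        # the number the business signs off on
```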
IT teams need to avoid model decay. All parties want the freshest model in production, but you need to balance “fresh” against the overhead this brings. How fast do your models decay, and how severely will the decreased accuracy impact your business case? How costly would it be to retrain the model and redeploy the new one to production? Are you able to replace the old model with a new one without bringing down the whole operation, and if not, what is the cost of maintenance? How often would you be able to collect data from your systems for retraining? How large is the gap in the historical data between ‘now’ and the last data point collected? These costs of keeping a fresh model in production all need to be evaluated with the business.
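Once those costs are agreed, the decision can be encoded as a simple policy. This sketch assumes a baseline metric recorded at sign-off; the threshold and the surrounding retraining pipeline are placeholders for your own.

```python
# A minimal retraining trigger driven by observed model decay.
BASELINE_AUC = 0.85          # AUC signed off at deployment time (assumed)
RETRAIN_THRESHOLD = 0.02     # tolerated decay before retraining pays off (assumed)

def maybe_retrain(current_auc: float) -> bool:
    """Compare fresh-label performance against the sign-off baseline."""
    decay = BASELINE_AUC - current_auc
    if decay > RETRAIN_THRESHOLD:
        print(f"Decay {decay:.3f} exceeds threshold: schedule retrain and redeploy.")
        return True
    print(f"Decay {decay:.3f} within tolerance: keep the current model.")
    return False

maybe_retrain(current_auc=0.81)   # e.g. run weekly against freshly labelled data
```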
Staging and model monitoring are also important, as we must design the right model for it to shine in deployment. We need to measure the quality of the model in the testing phase, and to establish and maintain baseline criteria that determine whether we have successfully built a model that brings value. In addition, to ensure that the model keeps performing well in production, we need to monitor the same qualities as we would in the lab. A model only recognises patterns present in the data it was trained on; as those data patterns and values change, what is the impact on the quality of the results being inferred from the model?
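One widely used way to watch for such change, offered here as an example rather than the only option, is the Population Stability Index (PSI) per feature, comparing live traffic against the training distribution. The 0.2 alert level below is a common rule of thumb, not a universal constant.

```python
# A minimal PSI drift monitor for one numeric feature.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    # Clip live values into the training range so nothing falls off the ends.
    actual = np.clip(actual, edges[0], edges[-1])
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)   # avoid log(0) on empty bins
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(1)
train_feature = rng.normal(0.0, 1.0, 10_000)   # distribution at training time
live_feature = rng.normal(0.4, 1.2, 2_000)     # shifted production traffic

score = psi(train_feature, live_feature)
print(f"PSI = {score:.3f}" + ("  -> investigate drift" if score > 0.2 else ""))
```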
Beyond application metrics, resource utilisation is important. The principles of prudent asset management encourage us to monitor resource usage such as compute, memory and storage. The consumption of such resources is both variable and case-specific.
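As a sketch of what that monitoring might look like, the snippet below uses the psutil library (an assumption, not a requirement) to sample the footprint of a scoring process; in practice these numbers would be shipped to a metrics store rather than printed.

```python
# A minimal resource-usage sample for a scoring process (pip install psutil).
import psutil

proc = psutil.Process()                      # the current scoring process
cpu = psutil.cpu_percent(interval=1.0)       # machine-wide CPU over one second
mem_mb = proc.memory_info().rss / 1e6        # resident memory of this process
disk = psutil.disk_usage("/").percent        # storage pressure on the host

print(f"cpu={cpu:.0f}%  rss={mem_mb:.0f}MB  disk={disk:.0f}%")
# Record a baseline first; what "normal" looks like is case-specific.
```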
Bringing the Model to Production:
Many enterprises serve multiple tenants at once. It is important to distinguish between the cases where multiple tenants can benefit from a common ML model trained on a combined data set, and the cases where such an approach leads to inferior results.
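The cleanest way to settle that question is empirically: train both variants and compare held-out metrics. This minimal sketch assumes scikit-learn and uses synthetic data in which one tenant genuinely behaves differently.

```python
# Pooled model vs. per-tenant models, decided by held-out AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 4_000
tenant = rng.integers(0, 3, n)                       # three tenants
X = rng.normal(size=(n, 4))
# Tenants share one signal, but tenant 2 weights a second feature differently.
y = (X[:, 0] + (tenant == 2) * X[:, 1] + rng.normal(size=n) > 0).astype(int)

Xtr, Xte, ytr, yte, ttr, tte = train_test_split(X, y, tenant, random_state=0)

pooled = LogisticRegression().fit(Xtr, ytr)
print(f"pooled AUC: {roc_auc_score(yte, pooled.predict_proba(Xte)[:, 1]):.3f}")

for t in range(3):                                   # one model per tenant
    m = LogisticRegression().fit(Xtr[ttr == t], ytr[ttr == t])
    auc = roc_auc_score(yte[tte == t], m.predict_proba(Xte[tte == t])[:, 1])
    print(f"tenant {t} own-model AUC: {auc:.3f}")
```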
As with other assets, it is important to keep old models archived and versioned. Authorities may require you to document the underlying logic of certain business operations at any previous point in time, an internal audit may be requested by executives, or a flaw in a current model may force a rollback to the last known correct version. Metadata describing the parameters and structure of the model, together with its behaviour in production, needs to be stored too. For experimenting with new ideas, methods like A/B testing, combined with versioning of the models alongside their KPIs, are invaluable.
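Here is a minimal sketch of such archiving, assuming joblib for serialisation and a JSON sidecar for the metadata; the fields, KPI value and storage path shown are hypothetical, and real audits may demand training-data lineage and approval records as well.

```python
# Archive a model artefact alongside a metadata sidecar for audit/rollback.
import json
import time
import joblib
from sklearn.linear_model import LogisticRegression

model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])   # stand-in model

version = time.strftime("%Y%m%d-%H%M%S")
joblib.dump(model, f"churn_model_{version}.joblib")        # the artefact itself

metadata = {
    "version": version,
    "algorithm": "LogisticRegression",
    "params": model.get_params(),
    "validation_auc": 0.87,                                 # placeholder KPI at sign-off
    "training_data_snapshot": "s3://bucket/churn/2024-05-01",  # hypothetical path
}
with open(f"churn_model_{version}.json", "w") as fh:
    json.dump(metadata, fh, indent=2, default=str)

# Rolling back then means loading the last known-good version, e.g.:
# model = joblib.load("churn_model_20240101-000000.joblib")
```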
It is challenging to predict all the different flavours of scoring logic integration. Underestimating the planning of this integration has caused the productionisation of great ML models to take significantly longer than the actual development of the model.
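To make “flavours of integration” concrete, one common pattern is exposing the scoring logic behind an HTTP endpoint; the sketch below uses Flask purely as an example, with a placeholder standing in for the real model call. Your enterprise may instead require batch jobs, database procedures or embedded scoring.

```python
# A minimal HTTP scoring service (pip install flask).
from flask import Flask, jsonify, request

app = Flask(__name__)

def score(features: dict) -> float:
    # Placeholder for the real model call, e.g. model.predict_proba(...).
    return 0.42

@app.route("/score", methods=["POST"])
def score_endpoint():
    payload = request.get_json(force=True)
    return jsonify({"probability": score(payload)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)   # a port planned with IT, not discovered late
```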
External Limitations:
In simple terms, what was possible in the R&D lab could be a serious issue in the business environment. An unpleasant discovery may be that privileged access is needed to run the scoring logic or to set up the dependencies of the scoring runtime; this may be an unsolvable problem, or may take a while to resolve. Access to sensitive data may require a hermetically closed environment, with no access to the internet or any other source of dependencies. Lazily resolved dependencies, or model deployment processes that require online dependency resolution, may simply not work there, and deadlines may be missed whilst trying to fix the issue.
Having your data team build the right ML model for your enterprise, even one that is perfectly accurate given the current market conditions, could be a waste of time and effort unless it is deployed effectively across the organisation and helps guide business decision-making. The strategy should be to address the challenges on your chosen path to deployment and to speak a language that the business appreciates. Only then will the value of your machine learning model accelerate across the enterprise and maximise business impact.