MLOps so AI can scale

Building advanced AI is like launching a rocket. The first challenge is to maximize acceleration, but once it starts picking up speed, you also need to focus on steering.

Jaan Tallinn

For AI/ML to make a sizable contribution to a company’s bottom line, companies must scale the technology across the organization, infusing it into core business processes, workflows, and customer journeys to optimize decision making and operations in real time. This is particularly difficult with AI/ML models because they are “living organisms” that change with the underlying data. They require constant monitoring, retraining, and debiasing—a challenge with even a few ML models but simply overwhelming with hundreds of them.

In recent years, massive improvements in ML tooling and technologies have dramatically transformed ML workflows, expedited the application life cycle, and enabled consistent and reliable scaling of AI across business domains. With all these new capabilities, however, the key point to remember is that effective ML operations (MLOps) requires focusing on the full set of application development activities rather than on the models alone. We estimate that as much as 90 percent of the failures in ML development come not from developing poor models but from poor productization practices and the challenges of integrating the model with production data and business applications, which keep the model from scaling and performing as intended. Effective productization requires developing an integrated set of components to support the model (or, often, set of models), including data assets, ML algorithms, software, and a user interface.

Never just tech

Content update to the original chapter:

The growth of generative AI (gen AI) is leading to an evolution within the MLOps landscape, necessitating the expansion of existing capabilities to construct a more comprehensive gen AI infrastructure. That’s because the nature of gen AI models introduces inherent risks due to their “black box” nature and tendency to generate false outcomes (hallucinations) or outputs not anchored in factual data. The reliance on external large language models (LLMs) also amplifies privacy concerns.

To navigate these complexities, three capabilities have emerged as critical within the MLOps framework for gen AI: 1) automation and data pipeline development, which is essential for assimilating diverse data sources to support gen AI and facilitate its transition into production; 2) modularization and model-application interplay, which enables easy interaction among various large (and, increasingly, small) language models from multiple sources and in turn requires robust standards and capabilities; and 3) continuous risk assessment, monitoring, and fine-tuning, which are indispensable for maintaining the integrity and effectiveness of gen AI applications.

MLOps is really a set of practices that are applied across the life cycle of an ML model (exhibit):

  • Data: Building systems and processes that continuously collect, curate, analyze, label, and maintain high-quality data at scale for ML applications.
  • Model development: Professionalizing model development to ensure that high-quality algorithms can be explained, are not biased, perform as expected, and are continuously monitored and regularly updated using fresh data.
  • Data and model pipelines: Maximizing the business value and reducing the engineering overhead by delivering integrated application pipelines that accept data or events, process and enrich them, run the model, process the results, generate actions, and monitor the different components and business KPIs.
  • Productizing and scaling: Enhancing the data processing and model training components to run at scale, including adding tests, validation, security, continuous integration and continuous delivery (CI/CD), and model retraining.
  • Live operations: Actively monitoring resources, performance, and business KPIs.

Exhibit: MLOps is applied across the entire AI/ML model life cycle.

This is an ongoing process requiring you to build robust engineering and ML application practices to continuously develop, test, deploy, upgrade, and monitor the end-to-end AI applications. MLOps builds on DevOps engineering concepts and end-to-end automation to address AI’s unique characteristics, such as the probabilistic nature of ML outputs and the technology’s dependence on the underlying data.


When companies embrace MLOps best practices, it can dramatically raise the bar for what can be achieved. It’s the difference between experimenting with AI and transforming your company’s competitive position with AI. Effective MLOps relies on implementing four key practices:

  1. Ensure data availability, quality, and control to feed the ML system

    ML models are dependent on data. Without high-quality, readily available data, the models will not be accurate or usable, so you need to implement data quality checks. Tools are now available to assess data quality and detect anomalies, which is especially useful in high-throughput scenarios such as monitoring financial transactions.

    To ensure that data is available to feed the ML models, you will need to extract from the raw data the features that will drive the model.

    These features are the fuel for ML models. For example, barometric pressure is measured by atmospheric sensors, but the feature in a weather-forecasting model is the change in barometric pressure. A feature store is a central vault for these features. Feature stores manage, maintain, and monitor features, ensuring that the fuel needed for ML models is consistently available.
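    As a minimal sketch of these two ideas, here is one way a quality check and the pressure-change feature described above could be computed with pandas. The column names, thresholds, and the feature store client mentioned in the final comment are illustrative assumptions, not a specific product’s API.

```python
import pandas as pd

def check_quality(readings: pd.DataFrame) -> pd.DataFrame:
    """Drop missing or physically implausible pressure readings before feature extraction."""
    valid = readings["pressure_hpa"].between(870, 1085)  # NaN values also fail this check
    if (~valid).mean() > 0.05:
        # Flag batches with many bad rows rather than silently dropping them.
        raise ValueError("More than 5% of readings failed the quality check")
    return readings[valid]

def build_pressure_features(readings: pd.DataFrame) -> pd.DataFrame:
    """Turn raw pressure readings into the model feature: the change in barometric pressure."""
    readings = readings.sort_values("timestamp")
    # Difference versus the reading three observations earlier, per weather station.
    readings["pressure_change"] = readings.groupby("station_id")["pressure_hpa"].diff(periods=3)
    return readings[["station_id", "timestamp", "pressure_change"]].dropna()

# In production, the computed features would be written to a feature store
# (the call below is hypothetical) so that training and serving use the same values:
# feature_store.write("weather_pressure_features", build_pressure_features(check_quality(raw_readings)))
```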

  2. Provision tooling to optimize ML development

    Writing reproducible, maintainable, and modular data science code is not trivial. Software frameworks such as Kedro (a Python framework) aim to make it easier. They borrow concepts from software engineering—including modularity, separation of concerns, and versioning—and apply them to ML code.

    Data scientists like to experiment, trying different data, features, and algorithms to develop a model that satisfies a business outcome. These experiments need to be stored somewhere, along with any associated metadata (for example, the features used or any additional model configuration). Tools such as MLflow and MLRun provide model governance and the ability to reproduce these experiments, and they also track which experiments have yielded the best business outcome.
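    A minimal sketch of experiment tracking with MLflow follows; the experiment name, dataset, and model choice are illustrative, while the MLflow calls are its standard tracking API. Each run records its parameters, its evaluation metric, and the trained model so that runs can be reproduced and compared later.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Illustrative data; in practice the features would come from the feature store.
X, y = make_classification(n_samples=1_000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

mlflow.set_experiment("churn-propensity-experiments")  # hypothetical experiment name

params = {"n_estimators": 200, "max_depth": 8}
with mlflow.start_run():
    mlflow.log_params(params)                               # the configuration used for this run
    model = RandomForestClassifier(**params).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", accuracy)                 # the outcome to compare across runs
    mlflow.sklearn.log_model(model, "model")                # versioned, reproducible model artifact
```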

  3. Implement an ML delivery platform to automate as much as possible

    Moving from small-scale data science exploration and model development to large-scale production often involves code refactoring, switching frameworks, and significant engineering work. These steps can add substantial delays or even result in the failure of the entire solution.

    It is crucial to design and implement a continuous ML application delivery platform. This platform should execute scalable and automated pipelines for processing data, training, validation, and packaging of high-quality models for production. In addition, the ML platform should deploy the online application pipelines that incorporate the trained model, run data pre- or post-processing tasks, integrate with the data sources and other applications, and collect vital data, model, application, and business metrics for observability.
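    As a minimal sketch of the shape such a pipeline can take, the plain Python functions below chain the data processing, training, validation, and packaging steps; the step logic is placeholder and the accuracy gate is an assumed threshold. In practice, each step would run as an automated, scalable job on the delivery platform rather than in a single script.

```python
import pickle

def process_data(raw_records):
    # Data processing step: drop bad records and derive a simple feature.
    return [{"feature": r["value"]} for r in raw_records if r.get("value") is not None]

def train_model(features):
    # Training step: stand-in for fitting a real estimator.
    threshold = sum(f["feature"] for f in features) / len(features)
    return {"threshold": threshold}

def validate_model(model, holdout, min_accuracy=0.8):
    # Validation step: measure accuracy on held-out data and gate promotion.
    correct = sum((h["value"] > model["threshold"]) == h["label"] for h in holdout)
    return correct / len(holdout) >= min_accuracy

def package_model(model, path="model.pkl"):
    # Packaging step: serialize the approved model for deployment.
    with open(path, "wb") as f:
        pickle.dump(model, f)

def training_pipeline(raw_records, holdout):
    features = process_data(raw_records)
    model = train_model(features)
    if not validate_model(model, holdout):
        raise RuntimeError("Model failed validation; promotion to production blocked")
    package_model(model)
    return model
```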

  4. Monitor model performance to drive continuous improvement

    ML models are not like traditional software. When software is deployed to production, it should work as expected (as long as there has been a focus on quality and rigorous testing). ML models, by contrast, are “trained,” which means that people need to monitor how each model performs and adjust it to improve outcomes over time. ML models are also sensitive to real-world data conditions and can degrade over time, which is why it is important to monitor them to ensure they are behaving correctly.

    For example, when we were locked down during the worldwide pandemic, customer behaviors changed overnight. ML models that had been trained on historical (pre-pandemic) customer spending patterns were no longer able to make effective predictions; models kept recommending, for instance, that customers visit restaurants even though the restaurants were closed. This is why monitoring model performance and being able to rapidly diagnose the underlying reason for the variance are critically important.

    Model monitoring should extend beyond looking for drift. It should also validate data quality and conformance and measure model accuracy and performance against business KPIs. This more expansive view of monitoring is particularly important so that companies don’t fixate on model performance alone but also assess how well the model is helping the business.
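    As a minimal sketch of such monitoring, the example below pairs a statistical drift check (a two-sample Kolmogorov-Smirnov test from SciPy, comparing training-time and live feature values) with a business-facing accuracy check; the p-value cutoff and accuracy target are illustrative assumptions that would be tuned per model in practice.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(training_values, live_values, p_threshold=0.01):
    """Flag drift when the live feature distribution differs from the training distribution."""
    return ks_2samp(training_values, live_values).pvalue < p_threshold

def monitor(training_values, live_values, live_accuracy, accuracy_target=0.85):
    """Combine statistical drift checks with business-facing performance checks."""
    alerts = []
    if detect_drift(training_values, live_values):
        alerts.append("Feature drift detected: investigate and consider retraining")
    if live_accuracy < accuracy_target:
        alerts.append(f"Live accuracy {live_accuracy:.2f} is below the {accuracy_target} target")
    return alerts

# Illustration: a sudden shift in customer spending (as during the pandemic) shows up as drift.
rng = np.random.default_rng(0)
pre_pandemic_spend = rng.normal(loc=100, scale=15, size=5_000)   # training-time behavior
lockdown_spend = rng.normal(loc=40, scale=25, size=5_000)        # live behavior after the shift
print(monitor(pre_pandemic_spend, lockdown_spend, live_accuracy=0.71))
```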

MLOps is a fast-evolving field. At the time of this writing, more than 60 suppliers offer MLOps software tools ranging from turnkey platforms to niche tooling.

Excerpted with permission from the publisher, Wiley, from Rewired: The McKinsey Guide to Outcompeting in Digital and AI by Eric Lamarre, Kate Smaje, Rodney Zemmel. Copyright © 2023 by McKinsey & Company. All rights reserved. This book is available wherever books and eBooks are sold.