Utilize MLops to improve AI model performance and get the data right


It's vital to develop a data-centric mindset and to support it with an MLops strategy and the right ML resources.

The value of artificial intelligence (AI) in the lab is one thing; in the real world, it's another. Many AI models fail to produce reliable results when deployed. Others start well, but then their results degrade, leaving their owners frustrated. Many businesses do not get the return on AI they anticipate. What is the right approach?

As enterprises have expanded their knowledge about AI models, there have been some successes, but many disappointments. According to Dimensional Research, 96 percent of AI projects run into problems with data quality, data labeling, and building model confidence.

AI researchers and business leaders tend to rely on a traditional academic technique for boosting accuracy: holding the model's data constant while tinkering with model architectures and fine-tuning algorithms. It's akin to mending the sails when the boat has a leak: an improvement, but the wrong one. Why? Good code cannot overcome bad data.

Rather, teams should ensure datasets are appropriate for the application. Traditional software is powered by code, while AI systems are built from both code (models and algorithms) and data. Take facial recognition, for example, in which AI-driven apps were trained on mostly Caucasian faces instead of ethnically diverse faces. The results were less accurate for non-Caucasian users.

Good training data is only the starting point. In the real world, AI applications are often accurate at first, but then worsen. Many teams respond by tuning the software code. That doesn't work, because the underlying problem is that real-world conditions have shifted away from the data the model was trained on. The answer: improve the data, not the algorithms.

As AI problems are often rooted in data quality and data drift, practitioners can apply a data-centric approach to keep AI applications healthy. Data is like food for AI, and in your application it should be a first-class citizen. A mindset alone is not enough, though; organizations need an infrastructure to keep the right data coming.

MLops: The how of data-centric AI

Continuously good data requires ongoing processes and practices, collectively known as MLops, which are essential to a data-centric AI approach. MLops works by addressing the specific difficulties of data-centric AI, which are too costly and time-consuming to manage with ad hoc manual effort. Here is a sampling:

  • The wrong amount of data: Noisy data can distort smaller datasets, while larger volumes of data can make labeling difficult. Both issues throw models off. The right size of dataset for your AI model depends on the problem you are addressing.
  • Outliers in the data: A common shortcoming in data used to train AI applications, outliers can skew results.
  • Insufficient data range: This can cause an inability to properly handle outliers in the real world.
  • Data drift: Changes in incoming data over time often degrade model accuracy.
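As an illustration, two of the checks above can be sketched in a few lines of Python. This is a minimal example, assuming NumPy and SciPy and using synthetic data: it flags outliers with z-scores and tests for drift with a two-sample Kolmogorov-Smirnov test. The function names and thresholds are illustrative, not a prescribed API.

```python
import numpy as np
from scipy import stats

def flag_outliers(values, z_thresh=3.0):
    """Flag points more than z_thresh standard deviations from the mean."""
    z = np.abs((values - values.mean()) / values.std())
    return z > z_thresh

def has_drifted(train_sample, live_sample, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test: True if the live feature
    distribution likely differs from the training distribution."""
    _, p_value = stats.ks_2samp(train_sample, live_sample)
    return p_value < alpha

rng = np.random.default_rng(42)
train = rng.normal(0.0, 1.0, 5000)   # feature values seen at training time
live = rng.normal(0.5, 1.0, 5000)    # production values with a mean shift

outlier_count = flag_outliers(train).sum()  # a handful of extreme points
drifted = has_drifted(train, live)          # the mean shift is detected
```

Real pipelines would run such checks per feature and per time window, but even this small gate catches the mean shift in the synthetic "live" data.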

These problems are serious. A Google survey of 53 AI practitioners found that data cascades (compounding events causing negative downstream consequences, triggered by conventional AI/ML practices) are highly prevalent (71 percent), invisible, delayed, and often avoidable.

How does MLops work?

After deploying an AI model, practitioners must keep evaluating it against new data. Key practices:

  • Audit and monitor model predictions to continuously ensure that the outcomes are accurate
  • Monitor the health of data powering the model; make sure there are no surges, missing values, duplicates, or anomalies in distributions.
  • Confirm the system complies with privacy and consent regulations
  • When the model's accuracy drops, figure out why
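The data-health checks in the second bullet can be sketched as a simple batch report. This is a minimal pandas example under illustrative assumptions (the column names and the small sample batch are invented for the demo, not taken from any particular system):

```python
import pandas as pd

def data_health_report(df: pd.DataFrame, expected_rows: int) -> dict:
    """Basic health checks on one batch of model-input data."""
    return {
        # Surge or shortfall vs. the volume this pipeline normally receives
        "volume_ratio": len(df) / expected_rows,
        # Fraction of missing values per column
        "missing": df.isna().mean().to_dict(),
        # Count of exact duplicate rows
        "duplicates": int(df.duplicated().sum()),
    }

batch = pd.DataFrame({
    "age": [34, 45, None, 45, 29],
    "income": [52_000, 61_000, 48_000, 61_000, None],
})
report = data_health_report(batch, expected_rows=5)
```

In production, a scheduler would run a report like this on every batch and feed the results into monitoring, rather than printing them ad hoc.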

To practice good MLops and to develop AI responsibly, here are several questions to address:

  • How do you catch data drifts in your pipeline? Data drift can be more difficult to catch than data quality shortcomings. Data changes that appear subtle may have an outsized impact on particular model predictions and particular customers.
  • Does your system reliably move data from point A to point B without jeopardizing data quality? Thankfully, moving data in bulk from one system to another has become much easier as tools for ML improve.
  • Can you track and analyze data automatically, with alerts when data quality issues arise?
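One common way to automate the alerting in the last question is the population stability index (PSI), a widely used drift score. The sketch below is illustrative: the 0.25 alert threshold is a conventional rule of thumb, and the feature name and synthetic data are invented for the example.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a live sample.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 alert."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor bin proportions to avoid log(0); live values outside the
    # baseline range simply fall out of the bins in this simple version.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

def maybe_alert(feature_name, baseline, live, threshold=0.25):
    score = psi(baseline, live)
    if score > threshold:
        print(f"ALERT: {feature_name} drifted (PSI={score:.2f})")
    return score

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 10_000)  # training-time feature sample
live = rng.normal(1, 1, 10_000)      # production sample, mean shifted
maybe_alert("income", baseline, live)
```

In a real system the alert would go to a pager or dashboard instead of stdout, and the baseline would be a stored profile of the training data rather than an in-memory array.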

MLops: How to start now

As an early-days discipline, MLops is still evolving. There is no gold standard or approved framework yet to define a good MLops system or organization, but here are a few practices to adopt:

  • In developing models, AI researchers need to consider data at each step, from product development through deployment and post-deployment. The ML community needs mature MLops tools that help make high-quality, reliable and representative datasets to power AI systems.
  • Post-deployment maintenance of the AI application cannot be an afterthought. Production systems should implement ML-equivalents of devops best practices including logging, monitoring and CI/CD pipelines which account for data lineage, data drifts and data quality.
  • Structure ongoing collaboration across stakeholders, from executive leadership to subject-matter experts to ML/data scientists, ML engineers, and SREs.
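One concrete way to bring the devops-style CI/CD practice into an ML pipeline is a schema gate: a validation step that fails the build when a data batch violates expectations. This is a hedged sketch; the schema, column names, and ranges below are hypothetical placeholders for whatever your model actually consumes.

```python
import pandas as pd

# Hypothetical input schema; in practice this would live in version
# control alongside the model code and be reviewed like any other change.
SCHEMA = {
    "age": {"dtype": "float64", "min": 0, "max": 120},
    "income": {"dtype": "float64", "min": 0, "max": 10_000_000},
}

def validate_batch(df: pd.DataFrame) -> list[str]:
    """Return a list of violations; an empty list means the batch passes."""
    errors = []
    for col, rules in SCHEMA.items():
        if col not in df.columns:
            errors.append(f"missing column: {col}")
            continue
        if str(df[col].dtype) != rules["dtype"]:
            errors.append(f"{col}: dtype {df[col].dtype} != {rules['dtype']}")
        in_range = df[col].dropna().between(rules["min"], rules["max"])
        if not in_range.all():
            errors.append(f"{col}: values outside [{rules['min']}, {rules['max']}]")
    return errors

good = pd.DataFrame({"age": [34.0, 45.0], "income": [52_000.0, 61_000.0]})
bad = pd.DataFrame({"age": [34.0, -5.0], "income": [52_000.0, 61_000.0]})
```

Wired into CI, `validate_batch` would raise (or exit nonzero) on any violation, so bad data blocks a deployment the same way a failing unit test blocks a code merge.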

Continued success for AI/ML applications requires a shift from "get the code working and you're done" to a continuing focus on data. The goal: systematically improving data quality for a basic model beats chasing state-of-the-art models fed with low-quality data.

MLops, which isn't yet a settled science, encompasses the practices that make data-centric AI workable. In the near future, we will learn much about what works most effectively. In the meantime, you and your AI team can proactively and creatively develop an MLops framework and customize it to your models and applications.

Alessya Visnjic is the CEO of WhyLabs.
