DataOps in ML and AI

Your machine learning (ML) initiative is only as good as the data you feed it. ML algorithms require high-quality, representative data for training and reliable data streams for execution.

The Challenge

Machine learning frameworks and services have built massive momentum. Sophisticated models are driving new forms of data science, and ML-enabled applications are delivering new capabilities around voice computing, image recognition, and more. All of this is driving demand for clean, representative data.

Poor training data leads to bad models and ineffective algorithms. Unreliable data pipelines keep ML systems from delivering timely answers, which in turn has an outsized impact on business outcomes. In the past, a brief delay in getting accurate data meant a few compromised decisions with limited financial impact. Now that ML-based systems drive business decisions at scale, delayed or dirty data can result in much larger losses.

These factors make DataOps critical for the operations of deployed ML systems, as well as in training contexts where full, clean datasets are required to test and tune the algorithms that constitute a system’s intelligence.

DataOps Success Patterns

People

  1. Educate the operator: Data consumers with expertise in AI/ML must educate their operator counterparts, who may be unfamiliar with the relevant concepts, on the new workflows, dynamics, and requirements that emerging systems demand.
  2. Organize goal-oriented teams: With the ML field rapidly evolving, teams must work cross-functionally to advance their projects with speed and efficiency.

Process

  1. Leverage “hands-free” processes: Instant, seamless, and automatic delivery of data is vital to fully leverage the automation provided by ML. Processes must be “hands-free” from the beginning.
  2. Review existing data policies: Pre-existing data management policies were designed with people in mind, not machines, so policy review should be built directly into ML development projects.

Technology

  1. Treat data like code: You will be continually transforming and redesigning existing datasets to better serve ML projects. Changes should be versioned, stored, and otherwise treated as code to maximize flexibility and support iterative development processes.
  2. Focus on automation: Automated data selection, cleansing, and delivery prevent data from becoming a bottleneck in time-sensitive ML processes.
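
As a rough illustration of the two technology patterns above, the following Python sketch cleans a small dataset automatically and then derives a version identifier from its contents, so that any change to the data yields a new version much as a code change yields a new commit. The function names (clean_records, dataset_version), the record fields, and the cleansing rules are all hypothetical and chosen only for this example; a real pipeline would apply its own rules and would likely rely on a dedicated data versioning tool.

```python
import hashlib
import json


def clean_records(records, required_fields):
    """Trim whitespace from string values and drop records that lack a required field."""
    cleaned = []
    for record in records:
        # Normalize first so whitespace-only values count as missing.
        normalized = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
        if any(normalized.get(field) in (None, "") for field in required_fields):
            continue  # incomplete rows never reach the training set
        cleaned.append(normalized)
    return cleaned


def dataset_version(records):
    """Return a short, deterministic fingerprint of the dataset contents."""
    # Sorted keys and fixed separators make the serialization stable,
    # so identical data always hashes to the same version string.
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Hypothetical sample data for illustration only.
raw = [
    {"id": 1, "label": " cat ", "pixels": 1024},
    {"id": 2, "label": "", "pixels": 2048},
    {"id": 3, "label": "dog", "pixels": 512},
]

cleaned = clean_records(raw, required_fields=["id", "label"])
print(len(cleaned), "records, version", dataset_version(cleaned))
```

Because the version is computed from the data itself, a training run can record the identifier next to the model it produced, and rerunning the same cleansing step on the same input reproduces the same identifier without anyone signing off by hand.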

Data Operators and Consumers

Data friction exists between two groups of people: Data Operators and Data Consumers. Here are some examples of each in machine learning and artificial intelligence.

Data Operators

  • Database Team
  • Data Engineers
  • Storage Team
  • Server Team
  • Information Security
  • Data Architects

Data Consumers

  • Developers
  • Testers
  • AI/ML training systems
  • AI/ML production systems