Data Friction

The constraints on data that keep people from meeting the ever-growing demands of the business.

Data anchors applications. To borrow an analogy from the laws of physics, data has a lot of “mass”, which means it takes a lot of effort to move around. Furthermore, “data entropy” causes it to spread across silos, which also makes it more difficult to store and protect. And data has a high coefficient of friction; it’s often the most restrictive force in application projects. It’s a huge issue, and it’s been this way for a very long time.

Data Friction manifests in a variety of ways

Data Friction, the sum total of the complexity and delays in delivering data to data consumers across the enterprise, is a growing concern. DataOps aims to minimize Data Friction so that enterprises can leverage data quickly, easily, and when needed in order to maximize business performance.

Data Friction can result from the following activities:

  • Governance: Enterprise data governance activities ensure that the right people can access the right data, that the data is not tampered with, and that unauthorized access or data changes can be tracked. Governance becomes even more important in heavily regulated industries with sensitive data, such as healthcare, finance, or retail. Manual or immature approaches to handling these concerns slow down data delivery and increase the risk profile for the business. Additionally, many scenarios encountered by data consumers require repeated use of known data states, reuse of those states with minor modifications, or referencing distinct data sets for collaborative purposes. Bringing data, and its associated objects and changes, under version control helps drive higher quality, efficiency, and predictability in the activities of data consumers (a minimal sketch of versioning data states follows this list).

  • Operations: Without operational excellence, availability of data platforms and systems cannot be guaranteed. Disrupted availability can result in downtime that impacts revenue, reputation and trust. Most enterprises invest heavily in monitoring, downtime prevention, backup, recovery, scalability and redundancy of data systems. With increasingly heterogeneous data environments, this task becomes even more complex. Without a comprehensive strategy, culture, and accompanying tooling, a streamlined and reliable approach to operational excellence becomes difficult to achieve.

  • Delivery: Most large organizations have hundreds, if not thousands, of apps and data stores sprawling across organizational and geographic boundaries. It can be difficult for data consumers to access the data they need if it is splintered across multiple systems, or otherwise inaccessible due to security controls, permissions restrictions, or performance impact on production systems. In these scenarios, manual effort and coordination across teams are often required to provision the right data, making efficient delivery at scale untenable.

  • Definition: Long-standing companies often have legacy data sources that predate most employees, which means the original intent of why some data was captured, or how it was used, can be lost. Additionally, with the explosion in data volume and the proliferation of new data sources, the data landscape has become a complicated web of fragile, interconnected dependencies that entangles and complicates our businesses. Without a defined and documented data landscape, it is impossible to have a clear view of the complete data needs of the business.

  • Transformation: Data consumers are increasingly combining data in innovative ways to derive a deeper understanding of their businesses. To leverage data from existing repositories within new contexts, it often must be transformed in structure or schema, or to meet business concerns such as privacy. Techniques such as masking, de-identification, and schema conversion transform data into a usable form. However, these solutions can be time-consuming and error-prone without the appropriate tooling and automation. While newer “flow”-oriented constructs such as data pipelines are helping organizations transform and use data on demand or in real time, traditional “batch”-oriented means of transformation such as ETL are still common sources of friction.
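To make the version-control idea from the Governance activity concrete, here is a minimal sketch in Python. It assumes each dataset is a single file on local disk; the snapshot/checkout names and the data_store layout are illustrative inventions, not the API of any particular tool:

```python
# A minimal sketch of bringing data states under version control.
# Assumes each dataset is a single file on local disk; snapshot()/checkout()
# and the data_store layout are illustrative, not any particular tool's API.
import hashlib
import json
import shutil
from pathlib import Path

STORE = Path("data_store")          # content-addressed snapshot storage
MANIFEST = STORE / "manifest.json"  # maps human-readable tags -> content hashes

def _load_manifest() -> dict:
    return json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}

def snapshot(dataset: Path, tag: str) -> str:
    """Record the current state of a dataset under a reusable tag."""
    digest = hashlib.sha256(dataset.read_bytes()).hexdigest()
    STORE.mkdir(exist_ok=True)
    shutil.copy2(dataset, STORE / digest)  # identical states dedupe to one copy
    manifest = _load_manifest()
    manifest[tag] = digest
    MANIFEST.write_text(json.dumps(manifest, indent=2))
    return digest

def checkout(tag: str, dest: Path) -> None:
    """Restore a previously recorded data state, e.g. to rerun a test suite."""
    digest = _load_manifest()[tag]
    shutil.copy2(STORE / digest, dest)

# Pin the data state used in a test cycle, then reproduce it exactly later:
# snapshot(Path("customers.csv"), tag="test-cycle-42")
# checkout("test-cycle-42", Path("customers.csv"))
```

Because states are stored by content hash, snapshotting an unchanged dataset adds nothing new, and a tagged state can be restored bit-for-bit for repeated test cycles or referenced by collaborators as a shared point of truth.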

In practice, multiple sources of Data Friction manifest together in various scenarios. For example, an enterprise seeking to understand customer purchase behavior might require a copy of production data (Delivery) with personally identifiable information that has been obfuscated or masked (Transformation).
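Here is a minimal Python sketch of that masked delivery. The file paths, PII column names, and masking rule are illustrative assumptions; production-grade de-identification typically adds salting, format preservation, and referential consistency across data sets:

```python
# A hedged sketch of delivering a masked copy of production data.
# File paths, PII column names, and the masking rule are assumptions
# for illustration, not a specific product's behavior.
import csv
import hashlib

PII_COLUMNS = {"name", "email", "ssn"}  # columns assumed to hold PII

def mask(value: str) -> str:
    """Replace a sensitive value with a stable, non-identifying token."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def deliver_masked_copy(src_path: str, dest_path: str) -> None:
    """Copy a CSV extract, masking PII columns and passing the rest through."""
    with open(src_path, newline="") as src, open(dest_path, "w", newline="") as dest:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dest, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            writer.writerow(
                {k: mask(v) if k in PII_COLUMNS else v for k, v in row.items()}
            )

# deliver_masked_copy("prod_customers.csv", "masked_customers.csv")
```

Because the hash is deterministic, the same customer always maps to the same token, so purchase behavior can still be analyzed per customer without exposing who that customer is.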

Without a concerted, intentional effort to address it through DataOps, Data Friction becomes the rate-limiting factor in many enterprises’ ability to capitalize on innovation.