Many enterprises today face a data dilemma: siloed data, limited data access, and data quality that's questionable at best. That's probably why a recent survey found that only 26.5% of respondents have established a data-driven organization. Instead, data-driven decision-making happens only in pockets across the enterprise.
Data productization is the answer to this problem. Treating data as a product gives everyone in the enterprise access to trustworthy data for smarter decisions, innovation, and new revenue streams, while also addressing data governance issues.
A data product is a product or service that leverages data as its core component to deliver value to end users or customers (data consumers). It involves collecting, analyzing, and using data to provide insight, information, or functionality that helps address specific needs or business challenges. Data products integrate data from source systems, process it, ensure compliance, and make it instantly available to consumers.
The process of creating the product is as important as the product itself. Most data-related initiatives in the enterprise take a project approach. Each time a new business question or challenge arises, the data team or analyst must acquire, clean, prepare, and analyze a new dataset. This reactive approach to responding to data requests can lead to slower data delivery and can easily cause duplicate, siloed work.
In contrast, organizations that take a product approach treat data as a reusable asset. They apply product management principles to create data assets, services, or systems that can be used repeatedly. A data product owner oversees the entire lifecycle of these data products, from gathering requirements to managing releases to eventual retirement. This enables a proactive approach: data product owners understand how their customers use the data products, continually improve them, and keep them available to data consumers.
While data products have been around for some time, the idea of managing data as a product for internal customers has gained momentum recently with the advent of Data Mesh.
Data Mesh represents a paradigm shift in data management. It removes the need for a single IT/data team to control all data in a data warehouse or data lake. Instead, different groups or units take ownership of their own data, treating it like a product. Linking data ownership more closely to those who understand the business challenges creates more value from the data and better outcomes.
Data products can take various forms depending on the industry and application, and they can target internal or external audiences. Here are a few examples:
Creating higher value data products faster
Data engineering, for example the modeling and transformation of data, is an important aspect of a data product. But focusing on it exclusively is insufficient, because it reduces the scope to a subset of the ten defining data product principles. You can ensure that the datasets are inherently valuable, yet you will miss out on many of the other principles.
You need to manage your data as a product, starting with a focus on business outcomes. This ensures that you set goals, work against a well-defined roadmap, and keep the lifecycle in mind. For example, it allows you to sunset data that no longer needs to be optimized, saving you costs and reducing data consumption.
As soon as you start managing data as a product, you can embrace agile development processes using the principles of DataOps. You can introduce team workflows that let each data engineer be self-sufficient in an isolated development environment. At the same time, the team benefits from a standardized workflow, review, and approval process that promotes your data product to production without conflicts between team members.
As you develop and improve the value of the shared data asset, you will inevitably have to face backward compatibility concerns and questions from the consumers of your data. Having the guardrails available to ensure that every change is tracked and assessed for compatibility gives you and the team the confidence to move fast and fail fast. Further, it gives you the means to identify when to ship a new major version since breaking changes can no longer be mapped and maintained by an older version.
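As a rough illustration of such a guardrail, the sketch below compares two versions of a dataset schema and decides whether the change calls for a new major version. The rules are assumptions chosen for the example, not a standard: removing or retyping a column counts as breaking, adding a column counts as compatible, and the `classify_change` and `next_version` helpers are hypothetical names.

```python
from typing import Dict

# A schema is modeled here as a plain mapping of column name -> type name.
Schema = Dict[str, str]


def classify_change(old: Schema, new: Schema) -> str:
    """Return 'major', 'minor', or 'none' for a schema change.

    Assumed rules: removed or retyped columns break consumers ('major'),
    added columns are backward compatible ('minor').
    """
    removed = set(old) - set(new)
    retyped = {col for col in old.keys() & new.keys() if old[col] != new[col]}
    if removed or retyped:
        return "major"
    added = set(new) - set(old)
    return "minor" if added else "none"


def next_version(version: str, change: str) -> str:
    """Bump a 'MAJOR.MINOR' version string according to the change type."""
    major, minor = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0"
    if change == "minor":
        return f"{major}.{minor + 1}"
    return version
```

Running a check like this on every proposed change is one way to make the decision to ship a major version explicit rather than something consumers discover after the fact.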
For you to master versioning, a key concept is encapsulating which data you share and creating an exclusive access layer (the dataset) for your data. By exposing only this well-defined access layer rather than all your data, you retain the ability to freely change the underlying representation, data pipelines, or even your choice of vendor to curate a given data asset.
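A minimal sketch of such an access layer follows, assuming records are plain dictionaries. The column names and the `PUBLIC_COLUMNS` mapping are hypothetical; the point is only that consumers see the public names, so the internal ones can change without affecting them.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical mapping: public dataset column -> internal storage column.
# Only columns listed here are ever exposed to consumers.
PUBLIC_COLUMNS: Dict[str, str] = {
    "employee_id": "emp_pk",
    "birthdate": "dob_raw",
}


def read_dataset(internal_rows: Iterable[dict]) -> Iterator[dict]:
    """Project internal rows onto the public dataset shape."""
    for row in internal_rows:
        yield {public: row[internal] for public, internal in PUBLIC_COLUMNS.items()}
```

If the internal column `dob_raw` is later renamed or moved to another system, only the mapping changes; the dataset that consumers read keeps the same shape.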
As you share data with the rest of the organization, you'll have to make certain guarantees about it. For instance, you may have to provide the same dataset with up-to-date data every day, hour, or minute. You may have to guarantee the completeness and consistency of the data. Why? You want your data consumers to trust your data, and you want to ensure that everybody in the organization sees the same data at the same time.
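Guarantees like these can be checked automatically. The sketch below is one possible shape under simple assumptions: freshness is measured as time since the last refresh, and completeness as the share of rows with a non-null value in one required field. The `Guarantee` class and both helper functions are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


@dataclass
class Guarantee:
    max_age: timedelta        # data must have been refreshed within this window
    min_completeness: float   # minimum fraction of rows with the field filled


def completeness(rows: Sequence[dict], field: str) -> float:
    """Fraction of rows whose `field` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) is not None)
    return filled / len(rows)


def meets_guarantee(
    rows: Sequence[dict],
    field: str,
    last_refreshed: datetime,
    guarantee: Guarantee,
    now: Optional[datetime] = None,
) -> bool:
    """True if the dataset is both fresh enough and complete enough."""
    now = now or datetime.now(timezone.utc)
    fresh = now - last_refreshed <= guarantee.max_age
    complete = completeness(rows, field) >= guarantee.min_completeness
    return fresh and complete
```

A daily dataset might use `Guarantee(max_age=timedelta(hours=24), min_completeness=0.99)`; the thresholds themselves are something producers and consumers would agree on.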
As you share data with the rest of the organization, specifically across different departments and domains, you'll face interoperability issues. Even within the same domain, e.g., the people management org, a person’s birthdate might be captured as a date in one system and as a string in another. Merging and joining datasets with the same semantics but different technical representations requires you to clarify those semantics carefully. Doing so enables you to first compose new data products within the same domain, then eventually across domains or companies.
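To make the birthdate example concrete, here is a small sketch that normalizes both representations to a single `date` type before datasets are joined. The list of accepted string formats is an assumption for illustration; a real data product would document the formats it actually receives.

```python
from datetime import date, datetime
from typing import Union

# Assumed string formats for this example: ISO (1990-05-17) and day/month/year.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def to_date(value: Union[date, datetime, str]) -> date:
    """Normalize a birthdate given as a date, datetime, or string to a date."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized birthdate format: {value!r}")
```

Rejecting unknown formats with an error, rather than guessing, keeps ambiguous values from silently entering a shared dataset.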
Typically, you are also subject to compliance regulations, which require you to secure your data in various ways. From physical access security and data encryption to data masking, anonymization, and restricting data to a specific region, security is always a core principle of any data product.
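As an illustration of two of these techniques, the sketch below masks an email address for display and replaces an identifier with a keyed hash using Python's standard `hmac` and `hashlib` modules. Both function names are hypothetical. Note that keyed hashing is pseudonymization rather than full anonymization, since whoever holds the key can recompute the mapping; whether that satisfies a given regulation is a compliance question this example does not answer.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace a value with a stable keyed hash (HMAC-SHA256, hex encoded).

    The same input and key always give the same output, so joins on the
    pseudonymized column still work.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the domain; mask the rest."""
    local, _, domain = email.partition("@")
    if not domain:
        # Not a well-formed address: mask everything.
        return "*" * len(email)
    return local[:1] + "*" * (len(local) - 1) + "@" + domain
```

For example, `mask_email("jane.doe@example.com")` returns `j*******@example.com`.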
Finally, you will have to think about the accessibility and discoverability of your data. Accessibility requires you to make data available in the format your consumers require. This may include using a Snowflake data share or providing it as a PDF document. Discoverability defines how your stakeholders in the rest of the organization can find your data. It also helps them trust the data and verify that you are adhering to your guarantees. Normally, you want to publish your data product to a catalog that contains the key information your consumers need.
Now, how do you encapsulate the core principles of each data product and why?
Let’s start with the why. As you share data across organizational boundaries, you are faced with the questions outlined above. Depending on your role in the organization, you may care more or less about certain ones. If you are a data product producer, you want a developable product. If you are a data product consumer, you want some guarantees about the product. An easy way to capture the criteria both parties care about is a data product contract.
A data product contract needs to encapsulate a set of contract items.
To make a data product findable, be sure to add a name, description, and version, and publish them to a data product registry. To share a data product and make it inherently valuable, you need to capture the datasets. To make a data product accessible, share the desired output ports. To provide the necessary guarantees, define the Service Level Objectives (SLOs), which are the desired values of the KPIs, and the Service Level Indicators (SLIs), which are the current values of the KPIs.
As you deploy many data products across domains and the organization, you want to compose data products and ensure interoperability as defined in the metadata of the datasets. If you are sharing data outside your organization, you will want to define the license types. If you want to monetize data, your clients will likely ask you to provide sample data so that they can try it before they buy.
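The contract items described above could be captured in a structure like the following. This is only a sketch: the field names, the `ServiceLevel` and `DataProductContract` classes, and the sample values are assumptions for illustration, and the comparison assumes KPIs where a higher value is better.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ServiceLevel:
    kpi: str
    objective: float                   # SLO: the desired value of the KPI
    indicator: Optional[float] = None  # SLI: the current value of the KPI

    def is_met(self) -> bool:
        # Assumes higher is better; an unmeasured KPI counts as not met.
        return self.indicator is not None and self.indicator >= self.objective


@dataclass
class DataProductContract:
    name: str
    description: str
    version: str
    datasets: List[str]
    output_ports: List[str]
    service_levels: List[ServiceLevel] = field(default_factory=list)
    license: Optional[str] = None      # relevant when sharing outside the organization
    sample_data: Optional[str] = None  # e.g. a location of try-before-you-buy samples

    def unmet_service_levels(self) -> List[str]:
        """Names of KPIs whose current value is below the objective."""
        return [level.kpi for level in self.service_levels if not level.is_met()]


# Hypothetical example contract.
contract = DataProductContract(
    name="employee_directory",
    description="Current employees with normalized birthdates",
    version="1.0",
    datasets=["employees"],
    output_ports=["snowflake_share"],
    service_levels=[
        ServiceLevel(kpi="completeness", objective=0.99, indicator=0.995),
        ServiceLevel(kpi="freshness_score", objective=0.9, indicator=0.8),
    ],
)
print(contract.unmet_service_levels())  # prints ['freshness_score']
```

Keeping the objective and the current indicator side by side lets both producer and consumer see at a glance which guarantees are currently being met.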
Defining the data product contract allows you to foster communication between data product producers and consumers effectively. The data product owner can improve data quality over time. The data product consumer can enjoy a service level agreement with defined guarantees supporting them throughout the entire product lifecycle.
By treating data as a product, organizations prioritize agility, product lifecycle, data quality, and governance. This focus creates highly accessible data products that can be easily shared and found across an organization, leading to improved data insights and enabling data-driven decision-making at all levels of the organization.
Focusing on the needs of the people, both consumers and producers, and pairing that focus with processes and workflows enables your data teams to scale throughout the organization. The consistent, holistic data product approach has many benefits, including improved productivity of your data teams, scalable federated governance across team boundaries, and increased adoption by your line-of-business users.
To see data products in action, reach out to us and learn how a DataOps approach to managing your entire modern data platform will accelerate time to value.