The Collibra Data Citizens show in San Diego is only 7 days away now! Ahead of the DataOps.live presentation, "Data Products as 1st Class Citizens in a Snowflake Architecture", I wanted to share a few thoughts as a prelude.
In the early days of the data office, we struggled to provide relevant data to our internal and external customers. There were various reasons for that, but as one Head of Data Office recently asked me, “What would you start with if you had a blank slate?” I said, “Governance of course, but I'm biased.” He told me that while many enterprise governance initiatives have been less successful than hoped, he'd start with it too! The real question is: why does that perception exist?
In my experience, data governance is not only about the defensive aspects of working with data, like defining standards and policies, but also about business-critical needs like connecting your data producers to your data consumers. While working with organizations to help set up a data governance office, I've seen many cases where we wanted to speed up the delivery of data "products" to the consumer. In those early days we never called the deliverables “data products”, but we did place value on them and treat them as "sellable" items for a consumer, who in most cases was internal.
When defining those sellable items in Collibra, we'd often confuse ourselves a little, as we didn’t have a standard for deciding whether the Table or the Data Set was that consumable piece. "Why can't we use a Snowflake/BQ table, instead of using a Data Set in the shopping for data process?" was a frequent question. I'm sure a lot of practitioners faced similar questions when defining data products in the past, thinking of a data product as just a Data Set (a term that is now being standardized) or, let's say, a Tableau Report. (BTW, as you likely already know, Collibra has now added Reports as a Standard Asset Type supported by the shopping for data process.) The key thing we have learned is that it is critical to work closely with your Data Consumers and to understand their business requirements for the data insights.
Table-level access is quite straightforward and understandable in many instances, for example where we want to manage access control. But if we want to provide an atomic sellable piece to the end user, we might need to know who "bought" that piece of data, check the contract (or data sharing agreement) on how and by whom it can be used, ensure data quality and freshness, and allow consumption of our precious data through multiple channels. In these instances we might find the Data Set concept useful, if sometimes not enough…
Today we are evolving past that: organizations have started to agree on what that “consumable” piece of information is and to establish a common understanding of it for all to know. This context, I believe, is where the Data Product concept is truly being born. A Data Product becomes a mix of structural components, supporting processes, and organization. When we implement data products at scale, automation is critical to Data Product adoption and usability, both for the governance processes around the consumption of each data product and for the data engineering layer where the Data Product is produced.
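To make the idea of a "consumable" piece a little more concrete, here is a minimal sketch in Python of what such a definition might carry: the structural part (source tables and channels), the owner, a freshness expectation, and a record of who "bought" the product and for what purpose. Every class, field, and method name here is hypothetical and chosen only for illustration; none of it is a Collibra, Snowflake, or DataOps.live API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class SharingAgreement:
    """Hypothetical record of who 'bought' a product and for which purposes."""
    consumer: str
    allowed_purposes: set[str]


@dataclass
class DataProduct:
    """Hypothetical data product: structure, ownership, freshness, agreements."""
    name: str
    owner: str                  # accountable person or team
    source_tables: list[str]    # structural components behind the product
    channels: list[str]         # ways the product can be consumed
    max_staleness: timedelta    # freshness expectation for consumers
    last_refreshed: datetime
    agreements: dict[str, SharingAgreement] = field(default_factory=dict)

    def subscribe(self, consumer: str, purposes: list[str]) -> None:
        """Record a data sharing agreement for a consumer."""
        self.agreements[consumer] = SharingAgreement(consumer, set(purposes))

    def consumers(self) -> list[str]:
        """Answer the question 'who bought this piece of data?'."""
        return sorted(self.agreements)

    def may_use(self, consumer: str, purpose: str) -> bool:
        """Check a requested use against the recorded agreement."""
        agreement = self.agreements.get(consumer)
        return agreement is not None and purpose in agreement.allowed_purposes

    def is_fresh(self, now: datetime | None = None) -> bool:
        """True if the last refresh is within the agreed staleness window."""
        now = now or datetime.now(timezone.utc)
        return now - self.last_refreshed <= self.max_staleness


if __name__ == "__main__":
    product = DataProduct(
        name="customer_360",                      # illustrative values only
        owner="data-office",
        source_tables=["SALES.CUSTOMERS", "SALES.ORDERS"],
        channels=["sql", "report"],
        max_staleness=timedelta(hours=24),
        last_refreshed=datetime.now(timezone.utc) - timedelta(hours=2),
    )
    product.subscribe("marketing", ["reporting"])
    print(product.consumers())
    print(product.may_use("marketing", "reporting"), product.is_fresh())
```

In practice these attributes would live in the governance catalog and the data engineering pipeline rather than in application code. The sketch is only meant to show that a data product carries more than a table reference.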
In our session at Collibra DC2022, we are going to take a closer look at the operating model and the definition of data products in Collibra & DataOps.live, plus future opportunities for DQ & Observability, and lessons learned. Come and join us here!