How do you balance agility and governance, particularly when you now have the capability to ingest, deploy and transform massive amounts of data at a scale no one had previously dreamed of? And often this is sensitive data. Perhaps you’re working in a highly regulated area like healthcare or financial services.
How can you ensure domain teams adhere to the rules and standards while also giving them freedom and autonomy - the responsibility - to manage and develop data products appropriate to their domains, without it turning into a free-for-all?
The concept of federated governance isn’t new, but the popularity of data mesh has made it a much hotter topic, and a challenge for teams to solve. One of the four pillars of data mesh, a federated data governance framework provides centralized management of enterprise-level rules and standards, while the ultimate responsibility to apply and execute those rules is decentralized to the domains.
In short, it’s a way to ensure good governance with the right rules being applied in the right ways to the right data but through a more flexible approach, one that empowers domain team developers and supports collaborative working.
For example: you have a governance policy that all PII data must be masked. In federated governance, it’s up to teams in finance, customer service, product management, etc. to apply this policy appropriately to the data in their domain. They are responsible for looking at what PII data they have, and defining the best ways to apply the rules in their area: proper tagging, making sure masking is being applied, that clear rules exist around who has access to the data, and so on.
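The policy-check the domain team owns can be sketched as a simple automated test. This is a minimal illustration, not DataOps.live’s actual API: the column metadata, tags and `masked` flag are hypothetical stand-ins for whatever cataloging the team uses.

```python
# Sketch: a domain team expresses the central "all PII must be masked"
# policy as an automated check over its own (hypothetical) column metadata.
from dataclasses import dataclass

@dataclass
class Column:
    name: str
    tags: set      # e.g. {"pii"} -- tagging is the domain team's responsibility
    masked: bool   # whether a masking rule has been applied

def pii_masking_violations(columns):
    """Return names of columns tagged as PII that have no masking applied."""
    return [c.name for c in columns if "pii" in c.tags and not c.masked]

# Example: the finance domain's metadata for one table
finance_columns = [
    Column("customer_name", {"pii"}, masked=True),
    Column("iban", {"pii"}, masked=False),    # policy violation
    Column("invoice_total", set(), masked=False),
]

print(pii_masking_violations(finance_columns))  # -> ['iban']
```

A check like this runs in the domain’s own pipeline, so the central team defines the rule once while each domain enforces it against data it actually understands.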
Control + Governance + Collaboration + Managed Self-Service
If you’re in a fully centralized environment with a single IT team responsible for everything, you probably don’t need an approach like federated governance (what would be the point?). But even in that world, IT still needs input from the domains (what we used to call “the business”). Who should be allowed to access this data? What are the rules? Are there regulatory requirements? Data mesh holds that doing things this way creates bottlenecks: the IT team has to do all the work to find this information around the governance of the data and then apply the rules to that data (assuming they actually understand the data). Add to that the fact that the departments and functions dealing with a business subject may have very different rules and regulations to apply. It is easy to see how that team may end up with a backlog and become a bottleneck to rolling out the data with proper governance in place.
The data mesh proposal is to decentralize the application of governance to the people who know their data/sector/rules/regulations best. Federated governance gives them the responsibility to apply and execute proper governance, with a central, enterprise focused team providing guidance, tools and training. In this world, the DataOps.live platform provides you with the management and oversight (control) you need, to put standards in place, with related quality checks and workflows to enforce them.
Such an approach supports Collaboration, a pillar of #TrueDataOps. Another pillar, of course, is Governance and Security. DataOps.live gives you the tools and functionality to administer all that governance and distribute it to domain teams, so people can work in faster, easier and more collaborative ways while ensuring the enterprise rules are always followed. Governance, approvals and testing are built into the platform as the path to production. And it’s all traceable and auditable.
(There are many other interesting aspects, too. For example: a neat feature of Snowflake is that capabilities like dynamic data masking are policy-based. You define a policy to apply the rule at various levels, and that rule follows the data wherever it goes, even if it is shared out in a data product. That solves another challenge: what happens after the data leaves your domain? It needs to stay masked and secured in appropriate ways after it’s shared.)
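The “policy follows the data” idea can be illustrated with a small conceptual sketch. This is purely illustrative: in Snowflake the masking policy is attached to the column and enforced server-side; here a hypothetical data-product wrapper carries its policies so every read path applies them.

```python
# Conceptual sketch of "the policy follows the data": a shared data product
# carries its masking policies, so consumers outside the domain still see
# masked values. (Illustrative only -- Snowflake enforces this server-side.)

def mask_email(value):
    """Hypothetical masking rule: keep first character and the domain."""
    local, _, domain = value.partition("@")
    return local[0] + "***@" + domain

class SharedDataProduct:
    def __init__(self, rows, policies):
        self._rows = rows
        self._policies = policies  # column name -> masking function

    def read(self):
        """Every read applies the attached policies, wherever the read happens."""
        return [
            {col: self._policies.get(col, lambda v: v)(val)
             for col, val in row.items()}
            for row in self._rows
        ]

product = SharedDataProduct(
    rows=[{"email": "jane.doe@example.com", "plan": "pro"}],
    policies={"email": mask_email},
)
print(product.read())  # -> [{'email': 'j***@example.com', 'plan': 'pro'}]
```

The design point is that the policy travels as part of the product’s contract rather than being re-implemented by every consumer.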
Roche: Centralized Policy Definitions and Enforcement
When Roche Diagnostics wanted to implement a novel data mesh approach in its highly regulated healthcare sector, it chose a forward-looking data stack that included Snowflake and DataOps.live. Omar Khawaja, Global Head of Business Intelligence, said Roche “wanted to address critical issues around our people and our unique decentralized culture, to empower and give ownership to teams, creating and using our tech stack in a very different way while still having the federated governance on top.” With DataOps.live enabling centralized policy definitions and enforcement, application is decentralized: data product teams in different domains can build and adapt products faster and more effectively than before, using governance best practices such as code check-in and check-out, with multiple data engineers working concurrently without creating bottlenecks or interfering with one another.
According to Paul Rankin, Head of Data Platforms, “Federating this approach across hundreds of developers in 20+ data product teams was the challenge… data is continuously changing and the pipeline is continuously questioned. This could never have been done with a centralized approach.”
Avoiding ‘Data Chaos’
Even though you’re giving individuals and teams freedom to be co-operative and creative, you still need governance rules and they need to be enforced. Federated governance is the way to set, share and apply the rules in smarter ways. One challenge (and fear) around data mesh is ending up with more data silos. We already know what a Data Wild West can look like. Decentralizing doesn’t mean giving everybody 100% responsibility for, say, picking their own data stack, and making and enforcing their own rules and standards. A potential pitfall of data mesh is that you don’t have an overarching architecture or framework that allows you to easily and transparently manage all those moving parts, giving you the enterprise-level oversight you need. Unless, as we’ve seen, you have a robust approach to federated governance through a platform like DataOps.live.
It does sound like a paradox: control plus freedom. But it’s really about responsibility and where it lies, along with removing bottlenecks. The principles you need to succeed with this approach are already built into the seven pillars of #TrueDataOps: environment management in tandem with governance and security, automated monitoring and testing to make sure things are working as intended, collaboration and managed self-service, and so on. DataOps.live ties all this together, making #TrueDataOps a reality and enabling data mesh in a way that keeps all data governed as needed and avoids data chaos.
When we talk about federated governance what we’re really talking about is good data governance full stop. But the thing is, the data world and the needs of large distributed organizations have changed. What’s required today is a strong, yet flexible, governance framework that actively supports the speed, autonomy and agility that domain teams are striving for. This is what the DataOps.live platform enables.
Want to learn more about DataOps.live?
- Download your copy of our exclusive eBook, DataOps for Dummies - the must-have guide to kicking off your DataOps journey
- Check out the Roche Case Study
- Register for our Podcast Series hosted by Kent Graziano, The Data Warrior
Ready to jump right in? Begin your DataOps.live Free Trial