Data Mesh is a decentralized architecture that organizes data by specific business domains and teams, leveraging a self-service approach. It’s designed to give Data Product teams greater ownership and responsibility and, it is hoped, to deliver more effective outputs in a timely manner.
Data mesh is founded on four pillars:
- Empowered domain ownership—letting the experts who know the data best do their jobs.
- Cross-organizational transport—enabling consumption through data products.
- Self-service analytical platform—reducing costs, improving agility.
- Federated governance—the ‘glue’ that holds the process together, enabling interoperability and compliance.
Is DataOps simply an enabler for more sophisticated design patterns like Data Mesh? If we carry on with DevOps thinking, considering how the involvement of end users in a product lifecycle went from passive to truly active, we start to see that it’s more than that.
We spoke with Paul Rankin, Head of Data Management Platforms at Roche Diagnostics, who shared, “DataOps is at the heart of our Data Mesh implementation. Everything from ingestion, transformations, DQ, access policies, and data product governance is orchestrated by DataOps.live. DataOps is a must for any company wishing to implement Data Mesh. A distributed architecture requires a tool like DataOps.live to allow the regular data engineer to easily navigate the complexity of CI/CD, code release, and orchestration.”
It’s easy to see how data mesh dovetails with #TrueDataOps, and with the capabilities and flexibility of the DataOps.live platform. DataOps.live aligns closely with the requirements of data mesh and the opportunities it presents, particularly when you consider the capabilities added in our latest release, which enable enterprise data engineering and management at scale.
Data mesh can be described as a decentralized sociotechnical framework. Because of this, it requires the right mindset and organizational structure to work. This isn’t just about IT doing its job; it’s about organizations having a genuine appetite for change, being open to a decentralized approach, and embracing a data-driven culture from the top down. Everyone needs to be on board, with roles and responsibilities agreed upon. At a minimum, you need high-quality data, and the data products created must be discoverable, consumable, and usable by everyone inside the organization.
Why Data Mesh?
The founder of data mesh, Zhamak Dehghani, calls it a new approach to sourcing, sharing, accessing, managing, and analyzing analytical data at scale. The “at scale” part is key. Data mesh has emerged at what many people see as an inflection point, where traditional data management approaches can no longer match the complexity of organizations, the proliferation of available data sources, and the scope of requirements to create as much data-driven value as possible using analytics and tools such as artificial intelligence.
Some critics have argued that promises around ‘building a data culture’ and ‘becoming data driven’ have yet to be delivered—and it’s true that data mesh is still in its relative infancy. Challenges remain. For example, the perennial risk of disconnected data silos (and a lack of governance around those), issues around the quality of the data used in mission-critical analytics, and gaps between operational and analytical data in terms of value delivery. Data mesh alone doesn’t immediately solve such issues, yet the benefits promised by this new paradigm remain clear.
Questions have also been asked about the ability of data mesh to scale quickly and effectively. Experiences on the ground, including those of innovative satellite communications provider OneWeb and Roche Diagnostics, suggest that such scaling is truly achievable. Both companies are using DataOps.live, which placed the scalability issue at the heart of its October 2022 release.
Omar Khawaja, Roche Diagnostics’ BI & Analytics Services Team Leader, stated, “You can’t carry on doing the same things and expect different results. We wanted to move the needle further on the dial and become a more agile data-driven business, which led to a pioneering data mesh and TrueDataOps approach.”
Data Products and the Role of DataOps
Data mesh turns the notion of ‘IT delivering data products’ on its head. The thinking starts with the needs of business users as supported by expert domain teams, then works backward from that, rather than looking at technical architecture and assets first and letting those drive the assembly and delivery of data products.
The old way is slow and full of risks: IT rarely knows precisely how the data should be used, including the business rules and factors that apply to the data in order for it to be valuable to business users. This is where data mesh concepts such as self-service use by domain teams, accessibility and consumability, and federated governance come to the fore. These concepts are aligned with the principles of #TrueDataOps, which underpin the practical application of the DataOps.live platform.
For example, providing agility through continuous integration/continuous deployment (CI/CD). You need reliable, repeatable, standardized processes for your domain teams, especially when it comes to data pipelines, if you’re going to build consumable data products successfully and instill that mindset in domain teams. This includes automation to build new data products and add to existing ones. It’s very difficult, if not impossible, to achieve your vision for data mesh and all it entails without a disciplined approach such as this, which means #TrueDataOps.
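To make the idea concrete, here is a minimal sketch in Python of the kind of repeatable transform-plus-quality-gate step a CI/CD job might run before releasing a change to a data product. Every name here is invented for illustration; this is not DataOps.live’s API, just the pattern of codifying a pipeline step so it can be tested and rerun identically on every change.

```python
# Hypothetical sketch: a standardized pipeline step with a built-in
# quality gate, runnable automatically by a CI/CD job on every change.
# Function and field names are illustrative only.

def transform(rows):
    """Standardize raw records into the shape the data product exposes."""
    return [
        {"customer_id": r["id"], "revenue": round(float(r["rev"]), 2)}
        for r in rows
    ]

def quality_gate(rows):
    """Fail fast if the output would break downstream consumers."""
    assert rows, "pipeline produced no rows"
    assert all(r["revenue"] >= 0 for r in rows), "negative revenue found"
    return True

raw = [{"id": "c1", "rev": "10.504"}, {"id": "c2", "rev": "3.2"}]
product = transform(raw)
quality_gate(product)  # a CI job would run this gate before any release
```

Because both the transform and its gate live in code, the same checks run identically for every domain team, which is the repeatability the paragraph above calls for.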
You also need a focus on component design and maintainability to create reusable data products: a modular way to organize, reorganize, and reuse data. You shouldn’t have to build it yourself from scratch: use your colleagues’ work and blend that with your data. Treat it like a giant box of Lego: all those different pieces, sizes, colors, and configurations to help you build precisely what you need at that precise moment, which you can later take apart and reorganize. Again, this is about developing a new mindset, and it can be a steep learning curve for domain teams.
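The Lego analogy can be sketched as code. The following Python is purely illustrative (the domains, fields, and `blend` helper are all hypothetical): two data products owned by different domain teams are joined on a shared key to assemble a new one, rather than rebuilding everything from scratch.

```python
# Illustrative only: treating data products as composable building blocks.
# All names are invented for this sketch.

orders = [  # a data product owned by the sales domain
    {"customer_id": "c1", "amount": 120.0},
    {"customer_id": "c2", "amount": 40.0},
]
segments = [  # a data product owned by the marketing domain
    {"customer_id": "c1", "segment": "enterprise"},
    {"customer_id": "c2", "segment": "smb"},
]

def blend(left, right, key):
    """Join two data products on a shared key to build a new one."""
    lookup = {r[key]: r for r in right}
    return [{**row, **lookup.get(row[key], {})} for row in left]

# A new data product assembled from colleagues' work plus your own data.
revenue_by_segment = blend(orders, segments, "customer_id")
```

The point of the sketch is reuse: neither domain team rebuilds the other’s data, and the composite can later be taken apart and recombined differently.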
The crossover between data mesh and #TrueDataOps continues with governance, security, and change control; building sharable data products requires governance by design and security by design. For data mesh to work, because data products are meant to be consumed and shared, the platform has to make that easy while ensuring the data is still controlled and secured.
Automated testing and monitoring are also critical to moving fast, staying agile, and ensuring you don’t break data products already in use. And, of course, include collaboration and self-service, empowering domain teams to build the data products and enabling consumers to discover and use them, all within a powerful federated governance framework. You’ll need a strong data catalog and metadata.
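A minimal monitoring sketch, assuming a data product publishes a couple of simple health metrics; the metric names and thresholds below are invented for illustration, not taken from any platform. It shows the kind of automated checks that catch a broken or stale data product before its consumers do.

```python
# Hypothetical monitoring checks for a published data product.
# Thresholds and metric names are illustrative assumptions.

def check_freshness(hours_since_update, max_hours=24):
    """The product should have been refreshed within the last day."""
    return hours_since_update <= max_hours

def check_volume(row_count, baseline, tolerance=0.2):
    """Alert when volume drifts more than `tolerance` from the baseline."""
    return abs(row_count - baseline) <= tolerance * baseline

alerts = []
if not check_freshness(30):
    alerts.append("data product is stale")
if not check_volume(row_count=7_500, baseline=10_000):
    alerts.append("row count drifted beyond tolerance")
# Here both checks fail, so both alerts fire.
```

Run on a schedule, checks like these turn "don’t break data products already in use" from a hope into an enforced property.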
Taking People (Domain Teams) with You
It’s always about people as well as processes. To take your domain teams with you, you must show them the possibilities.
Data mesh is a journey, not an end. It’s a new way of thinking. When you start out with a clear vision, and a determination to create a data-driven culture and mindset, the results can start appearing very rapidly.
Paul Rankin continued, “Federating our approach across hundreds of developers in more than 20 data product teams initially was the challenge. We knew we’d need a powerful tool; the data is continuously changing, and the pipeline is continuously questioned. This could never have been done with a centralized approach. DataOps.live enables us to pull all this together in terms of orchestration, deployment, release management, and CI/CD, and to do it at scale. We’re talking about ROI in terms of saving thousands of hours and dollars in processing and developer time.”
Platform build commenced in early 2021; by summer 2022, more than 1,300 users, developers, and consumers were on board, including more than 40 data product teams. Two teams are onboarded each month.
- Average MVP time has fallen from six months to just six to eight weeks
- The number of monthly releases has risen to 120, compared to one release every three months prior to data mesh and DataOps.live
- The DataOps.live platform has enabled the integration of more than 15 additional capabilities and partners into the Roche Diagnostics data ecosystem
DataOps.live: The Route to Data Mesh
DataOps is essential to create and share those high-quality data products at scale, in an efficient, agile, and controlled manner. You need automation and self-service to empower data engineers and developers and to put data products in the hands of the business. The environment has to be scalable as it moves from a relatively simple to a highly complex data mesh, enabling you to control and manage all the data you need and avoid the data chaos that might otherwise descend.
You need to be able to apply standardized, reliable, repeatable tools and processes and have all the lineage and auditability necessary for governance. And, ideally, have all this under one roof, managed through a single pane of glass. You need to show your domain teams the possibilities and work to instill a data-driven culture from the top down—creating an organization where your data consumers trust the data and products from the domain teams.
Is Data Mesh for everyone? It can depend on who you are, what you’re doing right now, and what you want to achieve with your data. And it is still a relatively novel concept. However, the benefits of DataOps are already very well known in bringing new levels of agility, automation, governance, and opportunity to manage your data and create data products. It means you can tap into and fully utilize the Snowflake Data Cloud, for example, and the access it provides to near-infinite amounts of data. And if you’re planning to grow your business and increase how you utilize your data, DataOps.live will accompany you on that journey and grow alongside you. So when it’s time to scale up and maybe follow the data mesh route, whenever that is, you’ll be ready.
Also, register for the #TrueDataOps Podcast, hosted by Kent 'The Data Warrior' Graziano.