The DataOps.live platform is helping data product teams at this global pharmaceutical giant orchestrate and benefit from next-generation analytics on a self-service data and analytics infrastructure built on Snowflake and other tools, using a data mesh approach.
Customer
Roche Diagnostics, part of the multinational healthcare company Roche (F. Hoffmann-La Roche AG).
Requirement
Improved data management and analytics to empower teams and drive the company's purpose—Doing now what patients need next.
Solution
The DataOps.live platform provides a key capability for the self-service data and analytics infrastructure as part of the data mesh implementation, integrating Snowflake and other tools in a true DataOps approach.
Results
An agile, data-driven business; orchestration and automation so data product teams can adapt to changing needs faster while maintaining a strict data governance and security regimen; ROI measurable in thousands of hours and dollars.
AWS Solution focus:
For Roche PDIL, DataOps.live was used to construct orchestration pipelines for various workload use cases in Snowflake. These pipelines enable Roche to automate environment management within Snowflake, orchestrate third-party tools such as Talend for data integration, and feed data cataloguing tools such as Collibra. DataOps.live runs data quality checks and extracts metadata at each step to ensure observability and monitoring, all while maintaining an agile environment that allows developers to release frequently, often daily.
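To make the environment-management step concrete, here is a minimal sketch of the underlying Snowflake mechanism: a zero-copy clone that gives each development branch an isolated environment. All names (the PROD_DB source database, the branch variable, connection parameters) are illustrative assumptions; in practice DataOps.live drives this declaratively rather than through hand-written scripts.

```python
"""Sketch: per-branch environment management in Snowflake via zero-copy clone."""
import os
import snowflake.connector

# Connection details would normally come from the pipeline's secret store.
conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    role="SYSADMIN",
    warehouse="DEV_WH",
)

# Branch name as supplied by the CI runner (variable name is illustrative).
branch = os.environ.get("CI_COMMIT_REF_SLUG", "feature-x")
env_db = f"ANALYTICS_{branch.upper().replace('-', '_')}"

cur = conn.cursor()
try:
    # Zero-copy clone: a full, isolated dev environment without duplicating storage.
    cur.execute(f"CREATE DATABASE IF NOT EXISTS {env_db} CLONE PROD_DB")
    print(f"Environment {env_db} ready")
finally:
    cur.close()
    conn.close()
```

Because clones share storage with the source, spinning up and tearing down such environments is cheap, which is what makes daily releases practical.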
“You can’t carry on doing the same things and expect different results. We wanted to move the needle further on the dial and become a more agile data-driven business, which led to a pioneering data mesh and true DataOps approach as our way forward.”
Global Head of BI @ Roche Diagnostics
To integrate multiple data sources, AWS S3 was chosen as the central data lake. S3's prefix-based key structure makes it straightforward to organize data hierarchically, which in turn allowed the team to define lifecycle policies for data retention management. Since the customer already had DataOps.live runners deployed in their AWS account, appropriate AWS IAM policies and roles were also configured to grant access to the S3 buckets used for staging.
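As an illustration of the retention setup, the sketch below applies an S3 lifecycle rule that expires staging objects after a fixed window. The bucket name, prefix, and 30-day window are hypothetical, not Roche's actual configuration; the boto3 call itself is the standard AWS API for this.

```python
"""Sketch: S3 lifecycle rule for staging-data retention (boto3)."""
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="roche-pdil-staging",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-staging-data",
                # Prefix-based filter: the hierarchical key layout makes
                # per-dataset retention rules straightforward to scope.
                "Filter": {"Prefix": "staging/"},
                "Status": "Enabled",
                "Expiration": {"Days": 30},
            }
        ]
    },
)
print("Lifecycle rule applied")
```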
The resulting pipeline orchestrates a Talend job that retrieves newly integrated data from S3 and ingests it into the Integration schema within Snowflake. DataOps.live then runs data quality tests on both the source and transformed data as it moves between schemas (e.g., the Integration, Processed, and Reporting schemas). Finally, the metadata generated throughout the pipeline stages is extracted, compiled into a formatted document, and published to Collibra for cataloguing.
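To illustrate the kind of schema-to-schema data quality check involved, here is a minimal sketch using plain SQL against Snowflake. DataOps.live expresses such tests declaratively in its pipelines; this script only demonstrates the idea, and the table, schema, and column names are hypothetical.

```python
"""Sketch: schema-to-schema data quality checks in Snowflake."""
import os
import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="DEV_WH",
    database="ANALYTICS",
)
cur = conn.cursor()

def scalar(sql: str) -> int:
    """Run a query and return its single scalar result."""
    cur.execute(sql)
    return cur.fetchone()[0]

# 1. Row-count reconciliation: no rows lost between Integration and Processed.
src = scalar("SELECT COUNT(*) FROM INTEGRATION.ORDERS")
dst = scalar("SELECT COUNT(*) FROM PROCESSED.ORDERS")
assert src == dst, f"Row-count mismatch: {src} source vs {dst} processed"

# 2. Not-null check on a key column in the Reporting schema.
nulls = scalar("SELECT COUNT(*) FROM REPORTING.ORDERS WHERE ORDER_ID IS NULL")
assert nulls == 0, f"{nulls} NULL order IDs in REPORTING.ORDERS"

print("Data quality checks passed")
cur.close()
conn.close()
```

Results like these, along with the metadata captured at each stage, are what the pipeline compiles and publishes to Collibra for cataloguing.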