Roche Diagnostics case study
For Roche PDIL, DataOps.live was used to build orchestration pipelines for a range of workloads in Snowflake. These pipelines let Roche automate environment management within Snowflake, orchestrate third-party tools such as Talend for data integration, and feed data cataloguing tools such as Collibra. At each pipeline step, DataOps.live runs data quality checks and extracts metadata to support observability and monitoring, all while maintaining an agile environment in which developers release frequently, often daily.
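The stage sequence described above can be sketched as a minimal orchestration outline. All names and functions here are illustrative assumptions, not DataOps.live pipeline syntax:

```python
# Minimal, illustrative sketch of the pipeline stages described above.
# Stage names and functions are hypothetical, not DataOps.live configuration.

def manage_environments():
    """Create or update the Snowflake environment (e.g., per-branch schemas)."""
    return "environments ready"

def run_talend_ingestion():
    """Trigger the Talend job that loads staged data into Snowflake."""
    return "ingestion complete"

def run_quality_checks():
    """Run data quality tests against source and transformed data."""
    return "checks passed"

def publish_metadata():
    """Extract step metadata and publish it to the catalogue (Collibra)."""
    return "metadata published"

# Stages run in order; each step's metadata feeds observability and monitoring.
PIPELINE = [manage_environments, run_talend_ingestion,
            run_quality_checks, publish_metadata]

def run_pipeline():
    return [stage() for stage in PIPELINE]
```

In a real deployment each stage would be a pipeline job rather than a local function, but the ordering and the per-step metadata extraction are the same idea.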
To integrate multiple data sources, AWS S3 was chosen as the central data lake. S3's prefix-based key structure organizes data hierarchically, which lets DataOps.live apply lifecycle policies for data retention management. Since the customer already had DataOps.live runners deployed in their AWS account, DataOps.live also configured the appropriate AWS IAM policies and roles to grant those runners access to the S3 buckets used for staging.
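A minimal sketch of the retention and access setup, assuming a hypothetical staging bucket and prefix (the bucket name, prefix, and 30-day window are placeholders, not values from the case study):

```python
import json

# Illustrative S3 lifecycle rule: expire objects under a "staging/" prefix
# after 30 days. Bucket name, prefix, and retention window are assumptions.
lifecycle_config = {
    "Rules": [{
        "ID": "expire-staging-after-30-days",
        "Filter": {"Prefix": "staging/"},
        "Status": "Enabled",
        "Expiration": {"Days": 30},
    }]
}

# Illustrative IAM policy granting the runners read/write access to the
# staging bucket (the bucket ARN is a placeholder).
iam_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::example-staging-bucket",
            "arn:aws:s3:::example-staging-bucket/*",
        ],
    }]
}

# Applying the lifecycle rule would use boto3, e.g.:
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="example-staging-bucket",
#     LifecycleConfiguration=lifecycle_config,
# )

print(json.dumps(iam_policy, indent=2))
```

Attaching the policy to the IAM role assumed by the runners (rather than to individual users) keeps credentials out of the pipeline configuration.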
The constructed pipeline orchestrates a Talend job that retrieves newly landed data from S3 and ingests it into the Integration schema in Snowflake. DataOps.live then runs data quality tests on both the source and the transformed data as it moves between schemas (e.g., the Integration, Processed, and Reporting schemas). Finally, the metadata generated at each stage of the pipeline is extracted, compiled into a formatted document, and published to Collibra for cataloguing.
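One common shape for such a quality test is a row-count reconciliation between schemas. The sketch below is illustrative: the table and pipeline names are hypothetical, and in practice the counts would come from Snowflake queries (e.g., `SELECT COUNT(*) FROM INTEGRATION.SALES`) rather than literals:

```python
# Illustrative data quality check: reconcile row counts between the
# Integration and Processed schemas. Names and values are assumptions.

def check_row_counts(source_count: int, target_count: int,
                     tolerance: float = 0.0) -> dict:
    """Return a quality-check result comparing source and target row counts.

    tolerance is the allowed fraction of source rows by which the
    target may differ (0.0 means the counts must match exactly).
    """
    diff = abs(source_count - target_count)
    allowed = source_count * tolerance
    return {
        "check": "row_count_reconciliation",
        "source_rows": source_count,
        "target_rows": target_count,
        "passed": diff <= allowed,
    }

# The result of each check can then be compiled into the metadata document
# published to Collibra (this structure is illustrative, not Collibra's API):
result = check_row_counts(1000, 1000)
metadata_doc = {
    "pipeline": "talend-s3-to-snowflake",   # hypothetical pipeline name
    "stage": "integration-to-processed",
    "quality_checks": [result],
}
```

Running the same check at each schema boundary (Integration to Processed, Processed to Reporting) yields a per-stage audit trail that can be attached to the catalogued assets.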