Roche Diagnostics case study
For Roche PDIL, DataOps.live was used to build orchestration pipelines for a range of workloads in Snowflake. These pipelines let Roche automate environment management within Snowflake, orchestrate third-party tools such as Talend for data integration, and feed data cataloguing tools such as Collibra. At each pipeline step, DataOps.live runs data quality checks and extracts metadata to support observability and monitoring, all while maintaining an agile environment in which developers release frequently, often daily.
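The stage sequence described above can be sketched as a minimal orchestration outline. All names and functions here are illustrative assumptions, not DataOps.live pipeline syntax:

```python
# Minimal, illustrative sketch of the pipeline stages described above.
# Stage names and functions are hypothetical, not DataOps.live configuration.

def manage_environments():
    """Create or update the Snowflake environment (e.g., per-branch schemas)."""
    return "environments ready"

def run_talend_ingestion():
    """Trigger the Talend job that loads staged data into Snowflake."""
    return "ingestion complete"

def run_quality_checks():
    """Run data quality tests against source and transformed data."""
    return "checks passed"

def publish_metadata():
    """Extract step metadata and publish it to the catalogue (Collibra)."""
    return "metadata published"

# Stages run in order; each step's metadata feeds observability and monitoring.
PIPELINE = [manage_environments, run_talend_ingestion,
            run_quality_checks, publish_metadata]

def run_pipeline():
    return [stage() for stage in PIPELINE]
```

In a real deployment each stage would be a pipeline job rather than a local function, but the ordering and the per-step metadata extraction are the same idea.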
To integrate multiple data sources, AWS S3 was chosen as the central data lake. S3's prefix-based key structure organizes data hierarchically, which lets DataOps.live apply lifecycle policies for data retention management. Since the customer already had DataOps.live runners deployed in their AWS account, DataOps.live also configured the appropriate AWS IAM policies and roles to grant those runners access to the S3 buckets used for staging.
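A minimal sketch of the retention and access setup, assuming a hypothetical staging bucket and prefix (the bucket name, prefix, and 30-day window are placeholders, not values from the case study):

```python
import json

# Illustrative S3 lifecycle rule: expire objects under a "staging/" prefix
# after 30 days. Bucket name, prefix, and retention window are assumptions.
lifecycle_config = {
    "Rules": [{
        "ID": "expire-staging-after-30-days",
        "Filter": {"Prefix": "staging/"},
        "Status": "Enabled",
        "Expiration": {"Days": 30},
    }]
}

# Illustrative IAM policy granting the runners read/write access to the
# staging bucket (the bucket ARN is a placeholder).
iam_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::example-staging-bucket",
            "arn:aws:s3:::example-staging-bucket/*",
        ],
    }]
}

# Applying the lifecycle rule would use boto3, e.g.:
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="example-staging-bucket",
#     LifecycleConfiguration=lifecycle_config,
# )

print(json.dumps(iam_policy, indent=2))
```

Attaching the policy to the IAM role assumed by the runners (rather than to individual users) keeps credentials out of the pipeline configuration.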
The constructed pipeline orchestrates a Talend job that retrieves newly landed data from S3 and ingests it into the Integration schema in Snowflake. DataOps.live then runs data quality tests on both the source and the transformed data as it moves between schemas (e.g., the Integration, Processed, and Reporting schemas). Finally, the metadata generated at each stage of the pipeline is extracted, compiled into a formatted document, and published to Collibra for cataloguing.
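One common shape for such a quality test is a row-count reconciliation between schemas. The sketch below is illustrative: the table and pipeline names are hypothetical, and in practice the counts would come from Snowflake queries (e.g., `SELECT COUNT(*) FROM INTEGRATION.SALES`) rather than literals:

```python
# Illustrative data quality check: reconcile row counts between the
# Integration and Processed schemas. Names and values are assumptions.

def check_row_counts(source_count: int, target_count: int,
                     tolerance: float = 0.0) -> dict:
    """Return a quality-check result comparing source and target row counts.

    tolerance is the allowed fraction of source rows by which the
    target may differ (0.0 means the counts must match exactly).
    """
    diff = abs(source_count - target_count)
    allowed = source_count * tolerance
    return {
        "check": "row_count_reconciliation",
        "source_rows": source_count,
        "target_rows": target_count,
        "passed": diff <= allowed,
    }

# The result of each check can then be compiled into the metadata document
# published to Collibra (this structure is illustrative, not Collibra's API):
result = check_row_counts(1000, 1000)
metadata_doc = {
    "pipeline": "talend-s3-to-snowflake",   # hypothetical pipeline name
    "stage": "integration-to-processed",
    "quality_checks": [result],
}
```

Running the same check at each schema boundary (Integration to Processed, Processed to Reporting) yields a per-stage audit trail that can be attached to the catalogued assets.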