Should you include DataOps.live when evaluating Snowflake? Is it a platform you could benefit from? At what point does it make sense to take a serious look at DataOps.live?
Since you’re reading this on the DataOps.live blog, it should be no surprise that our answer is an emphatic YES. Wherever you are in terms of Snowflake status and your journey to a data-driven culture, we believe DataOps.live can help you achieve your data-driven goals in faster and smarter ways. But we can also back this up with details and proof points, and address some of the questions and concerns that can arise. Evaluating a cloud data platform is no trivial matter, but with DataOps.live many of the steps can be automated and streamlined, freeing you to focus on Snowflake’s functional capabilities. So let’s explore this topic in some detail.
It’s probably worth having a quick recap of the seven key pillars of the #TrueDataOps philosophy:
- Collaboration & self-service with all the governance and control you need
- Automated testing & monitoring—test-driven data development, regression testing and ongoing monitoring
- ELT & the spirit of ELT—lift & shift raw source data quickly and build for an as-yet unknown future
- Governance, security & change control—anonymization, approvals, auditability and more (*especially valuable for different locations/geographies, and heavily regulated industries)
- Agility via orchestrated pipelines and CI/CD (Continuous Integration/Continuous Deployment)
- Component design & maintainability—small atomic pieces of reusable code and configuration
- Environment management—branching data environments like we branch code
A good question to ask, if you’re starting out or developing your data environment, is: “Do I recognize these as issues in my own organization?” These pillars were echoed in a recent blog by a DataOps.live cloud and DataOps engineer. When asked what the key imperatives for data engineering are, he said to automate as much as possible, to test (and test), to have the ability to store configuration as code in source control—and to deliver what the customer wants. That last one is key.
Any implementation that lacks one or more of the seven pillars of #TrueDataOps cannot really call itself #TrueDataOps. It will most likely be just another variation of DevOps with some data management concepts bolted on top. Which brings us to…
Build vs. Buy?
Modern data environments have a great many moving parts, typically including tools from multiple vendors. Today, Snowflake’s multi-cluster shared data architecture is increasingly used as a core solution, “powering the data cloud”. Of course, it’s possible to build out the environment piece-by-piece, using different tools for ETL, CI/CD, governance, and so on. The problem is that risks and inefficiencies can quickly emerge. The other option is buying a flexible, customizable, ready-made solution that enables orchestration, automation, and governance through “a single pane of glass”.
Business teams always want more, which can mean constant changes and updates for the data environment. Data teams have tried to solve this by building ever-more complex environments using open-source tools. The problem is, such environments need a high degree of manual intervention (and therefore highly skilled engineers) and rely on database change management tools that break easily, orchestration tools that don’t properly understand data, and environment management tools designed for cloud infrastructure that are difficult to configure for data. Manual coding is needed to orchestrate all these tools and stitch the patchwork together. Problems continue with a lack of enterprise support, security issues such as integration with secrets management, and difficulties rolling out to users beyond the core team. And someone, or more likely a team of people, has to keep it all running.
The alternative? A single platform optimized for all facets of #TrueDataOps: agility and responsiveness without compromising on security and governance. A platform that empowers people to create, test, share and release greater numbers of higher quality data products at a more rapid rate. You can build and rebuild your data platform in minutes from your Git ‘single source of truth’ repository. You can clone complete data environments in seconds and create feature branches of the code and data platform for every new feature that a stakeholder wants. You reduce errors through automated testing, increase productivity for teams and individuals, and track every change ever made.
These are all important elements to keep in mind when deciding whether or not it’s worth evaluating Snowflake with DataOps.live.
As Kent Graziano puts it, “Do you want your experts to spend all their time building out scripts and processes and checklists, and teaching others how to do that? Or would you rather have a solution with all that built-in so your people can focus on creating new value?”
Snowflake user or new to Snowflake?
If you’re already using Snowflake, and want to grow your capabilities to do even more, you should be looking at DataOps.live. It’s a pathway to scale fast and to do more, manage more, and create higher quality data products faster. Perhaps it’s worth doing that deep dive, to evaluate DataOps.live in the context of what you want to achieve in your own data environment?
Roche Diagnostics had a clear appetite for growth. With huge volumes of data available internally and externally, the Roche data ecosystem had Snowflake as its core data platform and used tools such as Talend, Alteryx and Collibra. But there was no effective way to orchestrate activity across all of these and automate the creation and lifecycle management of Snowflake environments. The answer was a domain-driven data mesh/data vault implementation using DataOps.live combined with Snowflake. This secure platform for orchestration, lifecycle and release management supports the set-up and enforcement of policies, meaning teams can be autonomous and innovative while also adhering to governance and security in data access, residency, encryption and more.
Omar Khawaja, Roche Diagnostics’ Global Head of Business Intelligence, said that “DataOps.live is exploiting every functionality Snowflake provides, bringing us the true DataOps practices we need. It enables our teams to create the data products we require, using all governance best practices like code check-in and check-outs, and allows multiple data engineers to work concurrently in the same team without creating a bottleneck or interfering with each other.”
So, what if you’re brand new to Snowflake? If your ambitions with your data (and therefore your business) already extend to using Snowflake, and if you already employ even just one data engineer, then you will benefit from DataOps.live. The important thing is, when you are ready to scale your operations big-time, you can. You won’t be restrained or limited in what you, and your rapidly rising headcount, can achieve using #TrueDataOps.
A ‘build from the ground up approach’ was taken by innovative satellite communications provider OneWeb. The company now has a distinctive architecture powered by DataOps.live, data.world and Snowflake to drive new business value from huge data volumes across highly complex operations. “We started with a vision for a totally new data-driven operating model. A vision of self-service democratized data,” says David Bath, VP Digital Products, OneWeb. “We wanted strong governance without stifling the creativity of individual engineers.” He says DataOps.live and Snowflake took the company “from vision to production in a frankly terrifyingly short time.”
With data discoverability for all users, the volumes involved are incredible. The environment ingests 55 billion rows every day, with 8.8 trillion rows in a single table, managing 1,000 trillion rows of data. With data quality business critical, DataOps.live enables more than 200 automated tests per data source. Previously, it would take several weeks to access all data: it now takes 20 seconds. Time to share data with a distribution partner fell from several weeks to two hours. It now takes two hours to create a dashboard compared to two weeks. The number of dashboards increased from five in August 2021 to 700+ a year later. It takes one day to analyze one billion records. The number of use cases increased from two to 14 per month. A 625% data increase is now anticipated in the next 12 months.
It’s important to understand the role of security—particularly how it’s implemented with respect to Snowflake.
- In Snowflake, all data is stored encrypted. Snowflake offers additional security capabilities, including analytics to accelerate threat detection and response.
- Snowflake features such as Dynamic Data Masking and Row Access Policies can be set up, deployed, monitored, and governed from inside DataOps.live
- The DataOps.live platform has received SOC 2 certification and is subject to regular Penetration Testing by external security experts
- Certification & Compliance: the right processes cover our data center operations (e.g., AWS, SOC 2, ISO 27001) and data protection, data processing and data integrity (e.g., GDPR, CCPA, ISO 27001 for information security, cybersecurity and privacy protection). We also enable compliance with regulatory mandates, and international and regional standards (ISO, GDPR, PCI DSS, HIPAA, CCPA).
- Secrets and Credentials Management—unlike other solutions, you retain control of your credentials: they are not stored by DataOps.live. Authentication details are requested, and you grant permission. This approach opens up opportunities for additional security, such as rotating keys, as you work towards Zero Trust, alongside user access control and two-factor authentication.
- User Restrictions: Role Based Access Control (RBAC)
- DataOps.live supports Single Sign-On (SSO) authentication: standard protocols SAML 2.0 and OpenID Connect (OIDC), Google Workspace, Microsoft Azure Active Directory, Active Directory Federation Services (ADFS), Ping Federate, and other SSO providers
- DataOps.live supports Transport Layer Security (TLS) protocol versions 1.2 and 1.3. Older insecure versions are disabled
Avoiding Technical Debt
So, when exactly should an organization look to include DataOps.live in its Snowflake evaluation? As soon as you can, most probably, depending on the scale and speed of benefits that you want to see.
You may have heard of the concept of technical debt, used primarily in software development. Also known as tech debt or code debt, it describes the cost of additional rework when a development team chooses a more limited and perhaps “easier” option to achieve its objectives. In short, the sooner we can get the right (optimal) solutions in place, the less technical debt we’re going to face, a debt that, of course, correlates directly with time and money. Such a debt can be inadvertent. But it can also be avoided.
Part of “getting it right” is giving due consideration to why we’re evaluating Snowflake and/or DataOps.live in the first place; what do we hope to achieve? Are we evaluating on the basis of gaining new data efficiencies, say, or cost reductions, simplification, gaining greater agility, or new levels of flexibility? DataOps.live can help in all of those areas.
And more specifically, if you’re exploring issues such as Continuous Integration/Continuous Deployment (CI/CD), change control, or governance (especially if you operate in a heavily regulated sector), then you need to be looking at the DataOps.live platform. And remember, considerations such as these should not be allowed to slow your move to Snowflake and the benefits it can bring to you. Evaluation of DataOps.live should run in parallel with your Snowflake evaluation. They are entirely complementary. DataOps.live helps you to do more from the get-go, to go further, faster. If you’re already up and running with Snowflake, great: bring DataOps.live in too, accelerate your first pilot project, and see what you can do then. The proof is in the pudding, so to speak.
As a DataOps.live developer wrote recently, “I can’t think of any scenario where you wouldn’t want to use it, even smaller businesses if they have ambitions to grow… if you’re using Snowflake and have enough data to warrant a data engineer, you’ll want to look at DataOps.live.” The key issue, he said, was that DataOps.live enables you to focus on the data. “You’re not worrying if this change or that one affects production, or what else might happen. The risk is removed. It’s an end-to-end platform, so you can see what comes in one end, and what goes out the other—all fully observable, all fully tested. There are so many utilities and helpers, removing manual steps, that data engineers save significant time and effort. You gain the visibility and control you simply wouldn’t have with a ‘build your own’ mix of separate tools. Yet you still have the freedom and flexibility to incorporate specialized tools.”
How Do You Evaluate Snowflake?
The standards by which you judge something should be defined prior to testing. As with any type of experiment, start with a list. Here are a few things you may want to test.
- Verify that the security hierarchy can be implemented
- Ensure multiple environments can be created and separated
- Verify that data can be ingested easily
- Test that Snowflake can handle a high workload to your satisfaction
- Run these tests multiple times and review the timing of all experiments
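A simple way to keep the last point honest is to time every step the same way. The sketch below is a minimal, generic benchmark harness in Python, not part of DataOps.live or Snowflake; the step names and the stand-in workload are hypothetical placeholders for your real evaluation steps.

```python
import time
from statistics import mean, stdev

def time_runs(label, fn, runs=3):
    """Run an evaluation step several times and summarize its timings."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return {"step": label, "runs": runs,
            "mean_s": mean(durations),
            "stdev_s": stdev(durations) if runs > 1 else 0.0}

# Stand-in workload: in a real evaluation, fn would apply a role
# hierarchy, ingest a sample file, or run a heavy query on Snowflake.
result = time_runs("apply security hierarchy", lambda: sum(range(100_000)))
print(result["step"], f"mean {result['mean_s']:.4f}s over {result['runs']} runs")
```

Recording a mean and spread per step, rather than a single timing, makes the comparison between runs (and between configurations) far more defensible.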
We discussed security matters in the blog Security Considerations in a DataOps World, so we won’t cover all the details here. Briefly: you can define your entire security hierarchy within configuration files in the DataOps.live Data Platform. Once this is done, you can duplicate it, or delete all the roles and recreate them, all while keeping track of how long the implementation takes.
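To illustrate the “roles as configuration” idea, here is a hypothetical sketch in Python: a declarative role hierarchy rendered into Snowflake SQL statements. The dictionary structure, role names, and grants are invented for illustration and do not reflect DataOps.live’s actual configuration syntax.

```python
# Hypothetical declarative role hierarchy (illustration only).
ROLE_HIERARCHY = {
    "ANALYST": {"parent": "SYSADMIN", "grants": ["USAGE ON DATABASE DEV"]},
    "ENGINEER": {"parent": "SYSADMIN", "grants": ["ALL ON DATABASE DEV"]},
}

def render_role_sql(hierarchy):
    """Render the declarative hierarchy into idempotent Snowflake SQL."""
    statements = []
    for role, spec in hierarchy.items():
        statements.append(f"CREATE ROLE IF NOT EXISTS {role};")
        statements.append(f"GRANT ROLE {role} TO ROLE {spec['parent']};")
        for grant in spec["grants"]:
            statements.append(f"GRANT {grant} TO ROLE {role};")
    return statements

for stmt in render_role_sql(ROLE_HIERARCHY):
    print(stmt)
```

Because the hierarchy lives in configuration, dropping all the roles and recreating them is just a matter of re-running the same rendering step, which is exactly what makes the timing experiment above repeatable.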
For multiple environments, one of the most important default capabilities in the DataOps.live Data Platform is the ability to create multiple environments within Snowflake based on branches defined within the repository. In the blog Make Data Engineering Repeatable!, Mincho describes how the automated, idempotent nature of projects within the Data Platform makes them simple and straightforward to manage. Create as many database environments as necessary for your testing. Each environment can be worked on independently; when it is time to bring things together, a merge of the code will deploy changes up the hierarchy of branches.
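The branch-to-environment mapping can be pictured with a small Python sketch: derive an isolated database name from the git branch you’re working on. The naming convention below is invented for illustration; DataOps.live’s own environment-naming rules may differ.

```python
import re

def database_for_branch(project_db: str, branch: str) -> str:
    """Derive an isolated Snowflake database name from a git branch.

    Illustrative convention only: main maps to the production database,
    while every other branch gets its own sanitized, uppercased suffix.
    """
    if branch in ("main", "master"):
        return f"{project_db}_PROD"
    suffix = re.sub(r"[^A-Za-z0-9]+", "_", branch).strip("_").upper()
    return f"{project_db}_{suffix}"

print(database_for_branch("ANALYTICS", "main"))               # ANALYTICS_PROD
print(database_for_branch("ANALYTICS", "feature/new-model"))  # ANALYTICS_FEATURE_NEW_MODEL
```

Under a scheme like this, every feature branch gets its own database, so engineers can work in parallel and merging a branch promotes changes up the hierarchy without manual renaming.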
Fast data ingestion is a requirement in every organization we’ve ever worked with. Our Data Platform can run the same ingestion multiple times, or run multiple types of ingestion into the various environments created above.
As for workload, depending on how you set your project up, there are some configuration options that allow you to dynamically create many sessions connected to Snowflake. These configuration options can be changed, committed to the branch, then run to see the outcome of having multiple sessions running simultaneously against Snowflake.
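The many-sessions idea can be sketched with standard Python concurrency. This is not the DataOps.live configuration mechanism, just an illustration of fanning out parallel sessions; `run_query` is a hypothetical stand-in that, in a real test, would open its own Snowflake connection and execute a benchmark query.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_query(session_id: int) -> float:
    """Stand-in for one Snowflake session running a test query."""
    start = time.perf_counter()
    time.sleep(0.05)  # simulate query latency
    return time.perf_counter() - start

# Fan out N concurrent "sessions", mirroring a configurable session count.
SESSIONS = 8
with ThreadPoolExecutor(max_workers=SESSIONS) as pool:
    timings = list(pool.map(run_query, range(SESSIONS)))

print(f"{len(timings)} sessions, slowest {max(timings):.3f}s")
```

Committing the session count to a branch and re-running the pipeline, as described above, turns this into a repeatable concurrency experiment rather than a one-off script.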
The logging of all these operations is built into the platform. Evaluating Snowflake becomes as simple as modifying a few configuration files. There is no need to build your own testing and logging platform to verify that Snowflake meets all the needs you may have.
This blog post is intended to get the discussion going. On February 22nd, we’ll be presenting a webinar with a panel discussion hosted by Kent Graziano on this very topic. You’ll hear from practitioners, thought leaders and DataOps.live experts—it’ll be your chance to hear these perspectives and get YOUR questions answered! Click here to register for the webinar.
Ready to get started? Sign up for your free 14-day trial of DataOps.live on Snowflake Partner Connect today. We’re also excited about our podcast series hosted by Kent Graziano, The Data Warrior. You can also download your free copy of DataOps for Dummies.