Doug 'The Data Guy' Needham · Nov 22, 2022 · 7 min read

Security Considerations in a DataOps World

In this blog, we will introduce the ability of the DataOps.live platform to automatically deploy security roles following Snowflake best practices for separation of duties.

Managing the security of your enrichment platform (Data Lake, Data Vault, Data Warehouse) is one of the key responsibilities of the Data Architect, Data Engineers, and Database Administrators.  

Every time an orchestration runs that ingests new data and updates or creates a Data Product, the data is at risk of permission changes that could cost the enterprise money, goodwill, or even customers.

Just do a Google search for "data breach" and you will see the various companies that have fallen victim to a mishap that released their customers' data.

Kevin Mitnick once said: “Social engineering is using manipulation, influence and deception to get a person, a trusted insider within an organization, to comply with a request, and the request is usually to release information or to perform some sort of action item that benefits that attacker.”

Any slight misalignment between the design of a security architecture for an enrichment platform and the actual implementation could put your data at risk.  

Using Snowflake as the database that lies at the heart of your enrichment platform is a fantastic way to ensure that your data is protected, available, and scalable. There are many components of security related to protecting a database: network policies, IAM roles, SSO integration, multi-factor authentication (MFA), and key-based authentication of users. In this article we will only be talking about the hierarchical implementation of roles within the database and giving users access to the correct role based on their business need to read, ingest, or transform the data.

The online documentation about Snowflake security is extensive. By default, a data architect has the following roles to start with:

ACCOUNTADMIN: Top-level role in the system. Owns the Snowflake account and can operate on all objects.

SECURITYADMIN: Security Administrator. Can monitor and manage users and roles, and can also modify or revoke any grant.

SYSADMIN: System Administrator. Has privileges to create warehouses, databases, and other objects, and to grant those same privileges to custom roles.

USERADMIN: User and Role Administrator. Can create users and roles in your account and grant those same privileges to other custom roles.

PUBLIC: Pseudo-role that is automatically granted to every user and every role in your account.


These are the most fundamental system-defined roles provided by Snowflake.
 

One of the recommendations from Snowflake is that following a role-based access architecture allows fine-grained, least-privilege access. How does this work?

A system role is defined for your overall Enrichment Platform. This role is a child role of the SYSADMIN role. In other words, someone using the SYSADMIN role can do all the things that the system role can do.  

Domain roles define the business function that a role performs. For example: Data Loader, Data Scientist, Data Analyst, Reporting.

Functional roles tie things together. The access roles defined next are assigned to the functional roles. The functional roles are children of the domain roles to segregate functionality, and users are assigned to functional roles (Data Scientist, Business Intelligence User, Data Engineer) depending on their business need.

Database objects are assigned access roles with certain privileges: Select, Insert, Update, etc.

This starts to sound complicated, and it can be. The diagram below shows how these roles should be implemented.

[Diagram: security role hierarchy]
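To make the layering concrete, here is a minimal sketch of the hierarchy described above, written as a plain YAML outline. All role names are invented for illustration, and this is not a SOLE configuration file; it only shows how access roles roll up through functional and domain roles to a system role under SYSADMIN.

```yaml
# Illustrative role hierarchy (example names, not a SOLE configuration file)
SYSADMIN:
  ENRICHMENT_PLATFORM:            # system role for the overall platform
    DATA_SCIENCE:                 # domain role (business function)
      DATA_SCIENTIST:             # functional role granted to users
        - DATA_ANALYST_READ       # access roles granted on database objects
        - DATA_LOADER_WRITE
    REPORTING:                    # another domain role
      BUSINESS_INTELLIGENCE_USER:
        - REPORTING_READ
```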


With DataOps.live, these roles can be defined in configuration files. Every time the pipeline runs an orchestration, the part of the DataOps platform called the Snowflake Object Lifecycle Engine (SOLE) compares what is implemented in the database it is working with (Production, Development, Quality Assurance, or feature-branch databases) against the configuration files to decide whether any changes need to be made. SOLE treats the configuration file as the source of truth.
 

If a bad actor has created a new role and granted that role permission to read from some critical tables, SOLE will see that this role is not defined in the configuration file. It will remove that grant and restore the grants to the way the configuration file specifies.

The details of how SOLE does this are beyond the scope of this article, but we will talk about the configuration files.

Before jumping into what a configuration file looks like, there is a quick aside we must cover. The DataOps.live platform includes support for the Jinja template language. Jinja is a text-based template language that can be used to generate the text within the configuration files. These files in the DataOps platform have the middle word "template" in their names (e.g., databases.template.yml, roles.template.yml).
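As a quick illustration of how Jinja generates configuration text, here is a minimal sketch of a template fragment; the variable and object names are hypothetical and not taken from the DataOps.live documentation.

```jinja
{# Loop over a list and emit one YAML entry per item #}
{% set environments = ["DEV", "QA", "PROD"] %}
{% for env in environments %}
{{ env }}_WAREHOUSE:
  comment: "Warehouse for the {{ env }} environment"
{% endfor %}
```

Rendered, this produces plain YAML along these lines:

```yaml
DEV_WAREHOUSE:
  comment: "Warehouse for the DEV environment"
QA_WAREHOUSE:
  comment: "Warehouse for the QA environment"
PROD_WAREHOUSE:
  comment: "Warehouse for the PROD environment"
```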

There are multiple ways to set up the configuration files.  

They can be hardcoded every time they are needed. For more information, see hardcoded grants in our documentation.

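A hardcoded grant might look something like the sketch below. The database, schema, and role names are invented for illustration, and the keys are a simplified approximation rather than the exact SOLE configuration syntax.

```yaml
# Illustrative only: privileges on one schema spelled out role by role
databases:
  DEMO_DB:
    schemas:
      DEMO:
        grants:
          USAGE:
            - DATA_ANALYST_READ
            - REPORTING_READ
          MODIFY:
            - DATA_LOADER_WRITE
```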


They can be set as variables to use throughout the configuration files, so that the roles are only defined in a single location.
 

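One way to do this is with YAML anchors, set once and reused wherever the same grants apply (Jinja variables work similarly). Again, the role names and keys below are illustrative, not the exact SOLE syntax.

```yaml
# Define the grant lists once...
dflt_schema_grants_DEMO: &dflt_schema_grants_DEMO
  USAGE:
    - DATA_ANALYST_READ
    - REPORTING_READ
  MODIFY:
    - DATA_LOADER_WRITE

# ...then reference them wherever they are needed
databases:
  DEMO_DB:
    schemas:
      DEMO:
        grants: *dflt_schema_grants_DEMO
```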

 
They can be set as dynamically created variables, generated at orchestration time based on human-readable configuration files (writing code that writes code).

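A sketch of what that dynamic Jinja might look like is shown below. The lists are hardcoded here so the example is self-contained; in the real templates they would be read from the YAML files described next, and the exact keys SOLE expects may differ from this simplified shape.

```jinja
{# Illustrative only: in practice these lists come from
   working_schemas.yml, access_roles_reader.yml, and access_roles_writer.yml #}
{% set schemas      = ["DEMO"] %}
{% set reader_roles = ["DATA_ANALYST_READ", "REPORTING_READ"] %}
{% set writer_roles = ["DATA_LOADER_WRITE", "TRANSFORMER_WRITE"] %}

{% for schema in schemas %}
dflt_schema_grants_{{ schema }}: &dflt_schema_grants_{{ schema }}
  USAGE: {{ reader_roles }}
  MODIFY: {{ writer_roles }}
{% for priv in ["CREATE FUNCTION", "CREATE VIEW", "CREATE TABLE"] %}
  {{ priv }}: {{ writer_roles }}
{% endfor %}

dflt_table_grants_{{ schema }}: &dflt_table_grants_{{ schema }}
  SELECT: {{ reader_roles }}
{% for priv in ["INSERT", "UPDATE", "DELETE", "TRUNCATE"] %}
  {{ priv }}: {{ writer_roles }}
{% endfor %}
{% endfor %}
```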

 
This part of the Jinja code reads from three files (working_schemas.yml, access_roles_reader.yml, and access_roles_writer.yml).

working_schemas.yml  

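A hypothetical working_schemas.yml could be as simple as a list of schema names (the key and schema names below are invented for illustration):

```yaml
# Illustrative contents of working_schemas.yml
schemas:
  - DEMO
  - STAGING
  - ANALYTICS
```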


access_roles_reader.yml 
 

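Likewise, access_roles_reader.yml would just list the reader access roles (names invented for illustration):

```yaml
# Illustrative contents of access_roles_reader.yml
reader_roles:
  - DATA_ANALYST_READ
  - REPORTING_READ
```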


access_roles_writer.yml 
 

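And access_roles_writer.yml lists the writer access roles (again, illustrative names):

```yaml
# Illustrative contents of access_roles_writer.yml
writer_roles:
  - DATA_LOADER_WRITE
  - TRANSFORMER_WRITE
```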


Since the space for this Jinja code is limited, I will walk through it.
 

First, we have a loop that creates variables for every schema defined in working_schemas.yml. We can look at just one of them, dflt_schema_grants_DEMO, in the earlier example that shows how to set a variable so that the roles are only defined in a single location.

Second, we set the variable up to apply the USAGE privilege on the schema to all the reader roles in the access_roles_reader.yml file.

Third, we apply MODIFY on the schema to all the writer roles in the access_roles_writer.yml file.

Finally, for the schema variable, we iterate through the CREATE FUNCTION, CREATE VIEW, and CREATE TABLE privileges for the writer roles in the access_roles_writer.yml file.

The next variable we are setting (dflt_table_grants_DEMO) can be applied to all the tables in the DEMO schema.  

Select is granted to the reader roles.  

Insert, Update, Delete, and Truncate are all granted to the writer roles.  
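Rendered for the DEMO schema, the output of that loop would look roughly like the following YAML. The role names and the privilege-to-role shape are the same illustrative assumptions used in the earlier sketches, not the exact SOLE grant syntax.

```yaml
dflt_schema_grants_DEMO: &dflt_schema_grants_DEMO
  USAGE: ['DATA_ANALYST_READ', 'REPORTING_READ']
  MODIFY: ['DATA_LOADER_WRITE', 'TRANSFORMER_WRITE']
  CREATE FUNCTION: ['DATA_LOADER_WRITE', 'TRANSFORMER_WRITE']
  CREATE VIEW: ['DATA_LOADER_WRITE', 'TRANSFORMER_WRITE']
  CREATE TABLE: ['DATA_LOADER_WRITE', 'TRANSFORMER_WRITE']

dflt_table_grants_DEMO: &dflt_table_grants_DEMO
  SELECT: ['DATA_ANALYST_READ', 'REPORTING_READ']
  INSERT: ['DATA_LOADER_WRITE', 'TRANSFORMER_WRITE']
  UPDATE: ['DATA_LOADER_WRITE', 'TRANSFORMER_WRITE']
  DELETE: ['DATA_LOADER_WRITE', 'TRANSFORMER_WRITE']
  TRUNCATE: ['DATA_LOADER_WRITE', 'TRANSFORMER_WRITE']
```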

What is the advantage of this approach?

An enrichment platform, no matter your choice of implementation, will continue to grow. New sources, new data products, new users, new questions, and new use cases will come up. As it grows, just add the new schemas to the working_schemas.yml file. If there are new reader or writer access roles, these can be added to the basic YAML files that specify reader or writer access roles.

One other step in this process is to define the roles and assign the access roles to the functional roles. The following bit of Jinja iterates through the defined variable files and builds out the hierarchy by creating the child role, then assigning the child role to the parent role.

roles.template.yml  

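A sketch of what roles.template.yml might contain is below. The role_hierarchy mapping and the keys inside each role entry (for example, granted_to_role) are assumptions made for illustration; the real SOLE role specification may use different keys, and in practice the child-to-parent pairs would come from a variables file rather than being hardcoded.

```jinja
{# Illustrative only: each child role is granted to its parent, building the chain
   access role -> functional role -> domain role -> system role -> SYSADMIN #}
{% set role_hierarchy = {
     "DATA_ANALYST_READ":   "DATA_SCIENTIST",
     "DATA_SCIENTIST":      "DATA_SCIENCE",
     "DATA_SCIENCE":        "ENRICHMENT_PLATFORM",
     "ENRICHMENT_PLATFORM": "SYSADMIN"
   } %}

roles:
{% for child, parent in role_hierarchy.items() %}
  {{ child }}:
    comment: "Managed by DataOps.live"
    granted_to_role: {{ parent }}  # hypothetical key: grant the child role to its parent
{% endfor %}
```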


Again, using simple configuration files as input to this process allows all the required grants to be generated from definitions of which roles are needed and where they should be used.
 

We covered a few ways to set up the configuration files for SOLE to implement the security hierarchy designed by the architects: hardcoding values, creating variables to apply to various classes of objects within Snowflake, and finally dynamically generating the configurations using Jinja and some variable files. Regardless of how the configurations are set up, SOLE takes an observation of what is implemented in the database and compares that to what the configuration file says. If there are any differences between what is observed and what is specified, SOLE conforms the Snowflake security hierarchy to what is specified.

SOLE can do this every time it runs.  

It verifies that the security that is deployed matches what is designed and stored in the platform git repository. Every time.

This is Automation at its finest. Wouldn’t you sleep better knowing this automatically verifies security at every run?  

There are a few other pieces to this setup that I have not covered here, like assigning users to roles; that topic will have to come later. Rest assured, it is all done via configuration files, Jinja scripts, and SOLE doing the implementation.

There are many more things that can be done with this Jinja capability that will enable you to turn requests around and deploy to production faster than ever before.

If you want to learn more about how DataOps.live can enable the automation of your Snowflake environment, check out our pioneering approach to DataOps and sign up for a free 14-day trial of DataOps.live. Also, don't miss us at Snowflake BUILD, The Data Cloud Dev Summit, where DataOps.live will have a session on Wednesday, November 16th at 3:30pm EST! Click here to register.



Doug 'The Data Guy' Needham

“The Data Guy” Needham started his career as a Marine Database Administrator supporting operational systems that spanned the globe in support of the Marine Corps missions. Since then, Doug has worked as a consultant, data engineer, and data architect for enterprises of all sizes. He is currently working as a data scientist tinkering with graphs and enrichment platforms – showing others how to get more meaning from data.
