Community blog | DataOps.live

FUTURE PERMISSION GRANTs: why they are very useful and why you shouldn't use them in a DataOps world

Written by DataOps.live | Apr 29, 2021 8:12:48 PM

FUTURE GRANTS, ALL TABLES in the Snowflake Data Cloud (and similar constructs in other databases) are necessary and powerful tools for manually administered databases.  However, they have significant downsides in terms of flexibility, auditability, potential information bleeds, Principle of Least Privilege etc. In a DataOps approach all the same convenience is possible, but all of these limitations are addressed.

Most databases have concepts of GRANTing permissions to all existing objects in a container (Database, Schema etc). In a system like MySQL this can be done with wildcard GRANTS. These wildcards apply to all current, and all future objects that match the rules.

In Snowflake this is split into two parts, GRANT [permissions] on ALL TABLES/VIEWS in [container] for all objects that are currently there and grant [permissions] on future tables in [container]for all objects that don’t currently exist but will be added in the future.

In both cases there is similar principle i.e., “Set some permissions now that will apply to objects in the future”. Why is this so important? Pre-DataOps approaches virtually all databases are configured by hand, meaning that the action of creating a table/view is a manual one and therefore the process of granting permissions to see it would be similarly manual. This creates a high administrative burden on people, and a significant chance of permissions being forgotten. By essentially preloading these permissions, users will get to see new objects without the person creating them having to remember to grant explicit permissions i.e., these permissions are implicit permissions.

However, there are a few problems with this approach which make it incompatible with TrueDataOps principles.

Firstly, it very often violates the Principle of Least Privilegewe aren’t always granting just to those who need it, we’d often be granting to a wider group to ensure that at least those who need permissions get them.

Secondly, there are a number of activities that can apply to an object which mean that it may not be ‘safe’ through its entire life. Consider the very powerful Snowflake feature of Dynamic Masking. This is applied after a table/view is created. In the case of a VIEW, or a TABLE created via CTAS, a user given access via FUTURE GRANTS would have access to see this VIEW/TABLE the moment it was created, before the Dynamic Masking had been applied. Hopefully, this is done soon after, but from a security perspective, even a short period where users that should only see masked data see the raw data is totally unacceptable.

Thirdly, from an auditing and governance perspective, an implicit approach makes it very hard to determine/prove “who actually got to see what, when”with an explicit approach, managed under TrueDataOps principles, it’s far clearer to see who was allow to see what, and when these were actually applied.

Finally, a FUTURE GRANTs approach locks you into a very rigid structure for the future. For example, it becomes extremely hard to say “Actually I don’t want these specific users to have access to those view tables and views, but otherwise keep everything as it is”. When you need to move to a granularity below the schema level, it’s very hard to do.

Given our understanding of how important these approaches are in a manually administered database world, but the challenges they create, how can we solve this in a DataOps world?

In a DataOps world, all GRANTs and permission must be stored in the central source of truth and applied to the target platform via automated DataOps pipelines. This:

  • allows testing of GRANTs and permissions to be done in branches other than production
  • removes the possibility of manual mistakes. If the configuration has been changed in a development branch, peer reviewed and tested outside of production, we can be extremely confidence that, once in production, it will perform correctly. Given the importance of GRANTs and permissions we can be extremely confidence that, once in production, it will perform correctly. Given the importance of GRANTs and permissions (and the potential catastrophic impacts of them being misapplied due to human error) this is one of the most important elements that should be automated in a DataOps approach.

Furthermore, applying fine grained, explicit permissions to objects, doesn’t have to mean that there is any additional burden to users. For example, we can still define a high-level intent e.g.:

[my_database]:

[my_schema]:

apply_grants(role='ANALYST_RO_ROLE' privileges='SELECT')

Is still saying that I want my ANALYST_RO_ROLE to have SELECT permissions on everything in my_database.my_schema, but the pipeline execution will convert this to individual GRANT statements that can audited later. This also allows me to be more selective only when I need to be e.g.

[my_database]:

[my_schema]:

apply_grants(role='ANALYST_RO_ROLE' privileges='SELECT')

[my_table]:

apply_grants(role='FINANCE_RO_ROLE' privileges='SELECT')

Still keeps the broad and generalised definition for the ANALYST_RO_ROLE, but adds a much more fine grained permission for the FINANCE_RO_ROLE.

To address the information bleed potential race condition, in a DataOps pipeline you would ensure that the GRANTs were only applied AFTER the TABLE/VIEW was in it’s secured state. By performing explicit GRANTs, we can choose when to do these, as opposed to implicit GRANTs which apply whether we are ready or not.