The ever-increasing volume and pervasiveness of data, together with the rise of cloud-based data storage, are driving the mandate to control access to enterprise data, particularly sensitive, regulated, and Personally Identifiable Information (PII).
Not only is it critical to prevent data from being stolen by hackers and sold on the dark web, but regulations such as GDPR and CCPA also require organizations to prevent unauthorized persons from accessing data. Implementing enterprise data access control policies is therefore imperative to meet the regulatory requirements designed to protect organizational data.
The world of DataOps and #TrueDataOps
Before expanding on this discussion by describing the adoption and implementation of data access controls, let’s dive into the world of DataOps and #TrueDataOps.
DataOps for Snowflake is a DataOps platform that, according to our book, “DataOps for Dummies,” “started with a clear vision to build data products and data applications the same way the world builds software products.” We first focused on developing the #TrueDataOps philosophy and its multi-pillared framework as part of the initial DataOps platform development process.
Why?
We believed (and still believe) that we needed to find answers to the questions posed by the shortcomings in the way data systems were being built and managed without agile, DevOps-based processes at the core.
As described in our book, #TrueDataOps originated as a philosophy based on the battle-hardened principles and practices of true DevOps CI/CD processes while adding in the necessary changes to handle the nuances of data applications and the gigabytes, terabytes, or petabytes they hold. We aimed to give data teams within enterprise organizations the ability to work efficiently with Big Data while solving the agility versus data governance conundrum and accelerating the time-to-value of data products for business users and stakeholders.
Fast forward to 2022, and the DataOps for Snowflake platform has evolved into the de facto, end-to-end data automation, orchestration and observability platform, orchestrating individual data pipeline components like:
- Matillion ETL to ingest data into Snowflake from multiple, different data sources
- Soda SQL, a data testing, monitoring, and profiling tool for data-intensive environments
- data.world for One-Click data cataloging for Snowflake
- Okera’s Universal Data Authorization Platform (ODAP) for data access policy and security enforcement and management
The platform also manages the lifecycle of related data application code, such as Java, Scala, and Python, that can be run inside the Snowflake Data Cloud in Snowpark.
Describing, adopting, and implementing data access controls
Now that we understand DataOps for Snowflake and #TrueDataOps, let’s dive straight into “everything” data access control.
1. What is data access control?
To best understand data access control, it is necessary first to define the access control principle and then look at the specifics of controlling access to data assets or data products.
Techtarget.com defines access control as a “security technique that regulates who or what can view or use resources in a computing environment.” Data access control theory notes that its core function is to mitigate and minimize risk to the organization.
Data access control applies this principle specifically to data and forms a critical component of data security.
2. Types of access control
Four types of access control can be applied to data.
Note: The first two types (MAC and DAC) are the older models and are less commonly chosen as the primary model for enterprise data platforms. RBAC is the most popular, while ABAC is the newest of the four.
Mandatory Access Control (MAC)
MAC is a security model where data access rights are regulated by a central authority based on many different levels of security. In other words, it is a security strategy that restricts the ability individual data owners have to grant or deny access to their data.
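The MAC model described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the security levels, user names, and data product names are all hypothetical, and the only rule modelled is that a user may read data classified at or below their clearance.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical security levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Both mappings are maintained by a central authority, not by data owners.
CLEARANCES = {"alice": Level.CONFIDENTIAL, "bob": Level.INTERNAL}
CLASSIFICATIONS = {
    "sales.invoice_header": Level.CONFIDENTIAL,
    "marketing.campaigns": Level.INTERNAL,
}


def mac_allows_read(user: str, resource: str) -> bool:
    """Allow a read only if the user's clearance covers the classification."""
    clearance = CLEARANCES.get(user)
    classification = CLASSIFICATIONS.get(resource)
    if clearance is None or classification is None:
        return False  # unknown users or unclassified data are denied
    return clearance >= classification
```

Note that nothing in this sketch lets the owner of a data product change who can read it, which is the restriction that distinguishes MAC from the discretionary model.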
Discretionary Access Control (DAC)
DAC is a direct contrast to MAC. Where MAC imposes access rules centrally, DAC works at a more granular level: it allows individual data owners to draw up and implement their own security policies and assign security controls to data assets and products.
Role-Based Access Control (RBAC)
RBAC is the most widely used data access control model. In summary, permissions are attached to roles that reflect the requirements of the users and stakeholders who need access to the data, and each user is assigned one or more roles. RBAC also applies the principle of least privilege: a role carries only the access rights to the organization’s data that its holders need.
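As a rough illustration of the role-based model, the sketch below maps users to roles and roles to permissions. The role names, user names, and table name are assumptions made up for this example, and anything not explicitly granted is denied.

```python
# Hypothetical role-to-permission mapping; each permission is (resource, action).
ROLE_PERMISSIONS = {
    "sales_analyst": {("sales.invoice_header", "read")},
    "product_owner_sales": {
        ("sales.invoice_header", "read"),
        ("sales.invoice_header", "write"),
    },
}

# Hypothetical user-to-role assignments.
USER_ROLES = {
    "alice": {"sales_analyst"},
    "bob": {"product_owner_sales"},
}


def rbac_allows(user: str, resource: str, action: str) -> bool:
    """Return True if any role held by the user carries the permission."""
    for role in USER_ROLES.get(user, set()):
        if (resource, action) in ROLE_PERMISSIONS.get(role, set()):
            return True
    return False  # default deny, in keeping with least privilege
```

With these assignments, `rbac_allows("alice", "sales.invoice_header", "write")` is denied because the analyst role only carries read access.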
Attribute-Based Access Control (ABAC)
ABAC is a dynamic method of data access control. Resources such as data products are assigned a series of attributes, and users are granted access to a data asset when these attributes satisfy a set of rules, policies, and relationships.
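To make the attribute-based model more concrete, here is a minimal Python sketch. The attribute names (`department`, `contains_pii`, `pii_cleared`) and the two rules are hypothetical and chosen only for illustration; a real policy engine would load its rules from configuration rather than hard-code them.

```python
def abac_allows(user_attrs: dict, resource_attrs: dict, action: str) -> bool:
    """Evaluate two illustrative rules against user and resource attributes."""
    if action != "read":
        return False  # this sketch only models read access

    # Rule 1: the user's department must match the department tag on the data.
    department = resource_attrs.get("department")
    if department is None or user_attrs.get("department") != department:
        return False

    # Rule 2: data tagged as containing PII requires an explicit clearance.
    if resource_attrs.get("contains_pii", False) and not user_attrs.get(
        "pii_cleared", False
    ):
        return False

    return True
```

Because the decision depends on attributes rather than on a fixed role list, tagging a new data product with `department` and `contains_pii` is enough for the same rules to cover it.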
3. Why is data access control essential?
Data access control must form a critical part of any organization’s data ecosystem to ensure that enterprise (or organizational) data does not end up in the wrong hands intentionally or even unintentionally and to meet regulatory requirements such as those set out by regional, state, and even federal governments.
Why?
Most organizations store the following data:
- Personal and sensitive data about themselves, their customers, and their employees.
- Documents containing classified information, and much more.
Therefore, this data must be protected to prevent data leaks. Implementing a robust, agile access control system is the solution to the challenge of ensuring that company and client data isn’t stolen via hacking from the outside in or leaked from the inside out.
4. Authentication versus authorization
At the outset of this section, it is vital to note that both authentication and authorization are integral cogs in the data access control paradigm. The one cannot exist without the other; otherwise, data access control policies and strategies will not be successful.
While the terms authentication and authorization are often used interchangeably, they are not the same. They are separate processes that, when combined, protect an organization’s data from falling into the wrong hands.
Therefore, the question is: what is the difference between authentication and authorization?
In summary, authentication, which comes before authorization, is verifying that users are who they say they are.
Once the system has determined that users are who they represent themselves to be, authorization then comes into play. It grants users specific permissions to access different levels of data and perform specific functions, depending on the attributes or roles given to each user as well as the data.
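The ordering of the two steps can be sketched as follows. This is an illustrative example only: the in-memory credential and permission stores, the user name, and the password are hypothetical, and a real system would delegate authentication to an identity provider.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Hypothetical stores; real systems keep these outside application code.
_SALT = os.urandom(16)
CREDENTIALS = {"alice": (_SALT, hash_password("correct horse", _SALT))}
PERMISSIONS = {"alice": {"sales.invoice_header": {"read"}}}


def authenticate(username: str, password: str) -> bool:
    """Step 1: verify that the user is who they claim to be."""
    record = CREDENTIALS.get(username)
    if record is None:
        return False
    salt, expected = record
    return hmac.compare_digest(hash_password(password, salt), expected)


def authorize(username: str, resource: str, action: str) -> bool:
    """Step 2: check what the verified user is permitted to do."""
    return action in PERMISSIONS.get(username, {}).get(resource, set())


def access(username: str, password: str, resource: str, action: str) -> bool:
    """Authorization is only evaluated after authentication succeeds."""
    if not authenticate(username, password):
        return False
    return authorize(username, resource, action)
```

A correctly authenticated user is still refused an action that has not been granted, which is why the two checks are kept separate.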
5. Types of unauthorized access
Before we look at the different types of unauthorized access, let’s consider what unauthorized access is:
In summary, the term “unauthorized access” describes access to data or a data product by a person or system that has not been granted permission to use it.
It stands to reason that there are multiple ways for a user to gain access to data products or resources without the necessary authorization. Here are several examples:
- Without the necessary authorization, a developer needing to copy data from a table uses credentials from the application being developed to retrieve the data.
- A data analyst generates a report by extracting data from tables where the proper access controls have not been set.
- An external security researcher connects to an organization’s cloud data lake that is incorrectly configured to allow direct access from external networks, exposing sensitive, confidential data.
- Malicious software finds a security vulnerability in the organization’s data cloud to bypass the authentication and authorization protocols to access and extract the data.
6. Protecting data from unauthorized access
There are a few fundamental principles to consider when designing a data access control model:
- Use strong authorization for any access to the organization’s data.
- Use the ABAC model, where dynamic attribute-based access is based on a set of security rules, policies, and relationships between the data product and the user.
- Implement robust patching and configuration management protocols to ensure that authentication and authorization processes are always enforced and cannot be bypassed.
7. The challenges of data access control
While every organization is different and has its own data access philosophy and framework, there is a universal set of challenges that must be faced and overcome; otherwise, the organization’s data-driven strategies will falter.
Based on years of experience solving common data problems that plague most organizations, our firm belief is that adopting the #TrueDataOps philosophy within our DataOps for Snowflake data orchestration platform is the answer to implementing a robust data-driven strategy to derive maximum value from enterprise data.
Implementing data access controls in a DataOps world
This article’s focus is the criticality of securing enterprise data for multiple reasons. In other words, the WHAT and WHY. Let’s now turn to the HOW: How do you implement a robust, reliable, and comprehensive data access control framework that ticks all the boxes in terms of data security in an environment where change is a constant, and the challenges of data security are increasing exponentially?
As part of the solution to these challenges, we have partnered with Okera to implement an attribute-based, granular data access control solution. While their data access management platform has multiple functions, our focal point is Okera’s Data Access Control (ODAC) features.
Okera provides end-users and business stakeholders, including data stewards, data engineers, application developers, and data analysts and scientists, with the authorization to access enterprise data appropriate to their role while remaining true to security and regulatory-based access controls or limitations. In other words, it expands and simplifies data access without exposing sensitive data or PII to unauthorized users.
Okera gives data owners (and system administrators) the ability to create and apply tags to data attributes, such as tagging a dataset containing sales data as a “Sales” dataset. To achieve this, you need to go to the ODAC management GUI, create a tag, and apply the necessary masking policies to the tag. However, this is a manual process, and it is time-consuming to create, update, and monitor these attributes, tags, and masking policies.
It stands to reason that this scenario is untenable over time for all data ecosystems, especially those with pervasive, complex, technical data. Therefore, the question that must be asked and answered is how to automate this process.
Our Snowflake Object Lifecycle Engine (SOLE) automates this process, giving robustness, auditability, and governance to data access control and policy management.
How?
The data access control tags and policies are defined and linked to attributes in the core YAML file that SOLE uses to build and rebuild data platform environments. These are just additional elements in line with the normal definitions of schemas, tables, views, attributes, etc. SOLE analyzes these definitions and reverts any changes that are not specified in the YAML file, preventing intentional or unintentional unauthorized data access.
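The general idea of comparing a declared state with the actual state and reverting drift can be sketched in Python. This is not SOLE’s implementation or its YAML schema: the function name, the action names, the column names, and the tag names are all hypothetical, and plain dictionaries stand in for the parsed definition file and the live platform.

```python
def plan_reconciliation(declared: dict, actual: dict) -> list:
    """Return the actions needed to make the actual tags match the declared ones.

    Both arguments map a column name to the set of tags on that column.
    """
    actions = []
    for column in sorted(set(declared) | set(actual)):
        want = declared.get(column, set())
        have = actual.get(column, set())
        # Tags present on the platform but absent from the definition are drift.
        for tag in sorted(have - want):
            actions.append(("remove_tag", column, tag))
        # Tags in the definition but missing from the platform are restored.
        for tag in sorted(want - have):
            actions.append(("add_tag", column, tag))
    return actions


# Hypothetical declared state, as it might look after parsing a definition file.
DECLARED = {"sales.invoice_header.customer_email": {"sales.pii"}}

# Hypothetical actual state after someone changed tags by hand.
ACTUAL = {
    "sales.invoice_header.customer_email": set(),
    "sales.invoice_header.amount": {"sales.public"},
}

PLAN = plan_reconciliation(DECLARED, ACTUAL)
```

Treating the definition file as the single source of truth means a manual change shows up as a planned action rather than silently persisting.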
Here is a simplified overview of how ODAC manages attribute tags. This is the logic that we add to SOLE’s YAML definition files. You can grant CREATE, ADD_ATTRIBUTE, or ALL permissions on attributes:
- CREATE – users are only able to create attributes within a specific attribute namespace.
- ADD_ATTRIBUTE – users can assign attributes inside the specific attribute namespace, only on data they have permissions to assign.
- ALL – users can create, drop, and assign attributes within the specific attribute namespace.
For instance, if you want to give access to a user role to create, drop, and assign attributes from a specific attribute namespace, you need to use the following logic:
GRANT ALL ON ATTRIBUTE NAMESPACE sales TO ROLE product_owner_sales;
Note: Attributes are objects, and access to attributes is controlled like all other objects in the Snowflake data cloud.
To assign tags to a data cloud object, such as a database table, dataset, column, or field, you need to have the following permissions:
- ADD_ATTRIBUTE, REMOVE_ATTRIBUTE, or ALL permissions for the data to which you need to assign tags.
- ADD_ATTRIBUTE or ALL for the attribute namespace from which you need to assign tags.
For instance, here is how to set up a data product owner to assign tags from the sales attribute namespace on data inside the sales database.
GRANT ALL ON ATTRIBUTE NAMESPACE sales TO ROLE product_owner_sales;
GRANT ALL ON DATABASE sales TO ROLE product_owner_sales;
The following example describes a more granular way to achieve the same use case. The only difference is that we do not grant the product owner ALL access to the sales database because of the principle of least privilege. We only need to grant permissions to add and remove data attributes:
GRANT ALL ON ATTRIBUTE NAMESPACE sales TO ROLE product_owner_sales;
GRANT ADD_ATTRIBUTE ON TABLE sales.invoice_header TO ROLE product_owner_sales;
GRANT REMOVE_ATTRIBUTE ON TABLE sales.invoice_header TO ROLE product_owner_sales;
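The effect of the granular grants can be modelled with a small Python check. This is a simplified sketch and not how ODAC evaluates permissions internally: grants are stored as plain tuples, only exact object matches are considered (no inheritance from database to table), and the rule encoded is our reading of the requirement that assigning a tag needs ADD_ATTRIBUTE or ALL on both the attribute namespace and the data.

```python
# The three granular grants, modelled as (privilege, object_type, object_name).
GRANTS = {
    "product_owner_sales": {
        ("ALL", "ATTRIBUTE NAMESPACE", "sales"),
        ("ADD_ATTRIBUTE", "TABLE", "sales.invoice_header"),
        ("REMOVE_ATTRIBUTE", "TABLE", "sales.invoice_header"),
    }
}


def has_privilege(role: str, privilege: str, object_type: str, name: str) -> bool:
    """True if the role holds the privilege, or ALL, on exactly this object."""
    held = GRANTS.get(role, set())
    return (privilege, object_type, name) in held or (
        "ALL",
        object_type,
        name,
    ) in held


def can_assign_tag(role: str, namespace: str, table: str) -> bool:
    """Assigning a tag needs rights on both the namespace and the table."""
    return has_privilege(
        role, "ADD_ATTRIBUTE", "ATTRIBUTE NAMESPACE", namespace
    ) and has_privilege(role, "ADD_ATTRIBUTE", "TABLE", table)
```

Under this model the product owner can tag `sales.invoice_header` but no other table in the sales database, which is the least-privilege outcome the granular grants aim for.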
Conclusion
Implementing robust, all-encompassing, sustainable data access control is mandatory for all organizations that aspire to become data-driven enterprises. DataOps for Snowflake, together with Okera, provides the answer to robust data access controls and policy enforcement.
Ready to get started?
Sign up for your free 14-day trial of DataOps.Live on Snowflake Partner Connect today!