Skip to content
DataOps.live Professional EditionNEW
Purpose-built environment for small data teams and dbt Core developers.
DataOps.live Enterprise Edition
DataOps.live is the leading provider of Snowflake environment management, end-to-end orchestration, CI/CD, automated testing & observability, and code management, wrapped in an elegant developer interface.
Spendview for Snowflake FREE

An inexpensive, quick and easy way to build beautiful responsive website pages without coding knowledge.


Pricing and Edition

See whats included in our Professional and Enterprise Editions.

Getting Started
Docs- New to DataOps.liveStart learning by doing. Create your first project and set up your DataOps execution environment.
Join the Community
Join the CommunityFind answers to your DataOps questions, collaborate with your peers, share your knowledge!
#TrueDataOps Podcast
#TrueDataOps PodcastWelcome to the #TrueDataOps podcast with your host Kent Graziano, The Data Warrior!
Academy
DataOps AcademyEnroll in the DataOps.live Academy to take advantage of training courses. These courses will help you make the most out of DataOps.live.
Resource Hub
On-Demand Resources: eBooks, White Papers, Videos, Webinars

Learning Resources
A collection of resources to support your learning journey.

Customer stories
Events
Connect with fellow professionals, expand your network, and gain knowledge from our esteemed product and industry experts.
#TrueDataOps.org
#TrueDataOps.Org#TrueDataOps is defined by seven key characteristics or pillars:
Blogs
Stay informed with the latest insights from the DataOps team and the vibrant DataOps Community through our engaging DataOps blog. Explore updates, news, and valuable content that keep you in the loop about the ever-evolving world of DataOps.
In The News

In The News

Stay up-to-date with the latest developments, press releases, and news.
About Us
About UsFounded in 2020 with a vision to enhance customer insights and value, our company has since developed technologies focused on DataOps.
Careers

Careers

Join the DataOps.live team today! We're looking for colleagues on our Sales, Marketing, Engineering, Product, and Support teams.
DataOps.liveJan 11, 2021 6:19:00 PM6 min read

PART 2: The Challenges of Repeatable and Idempotent Schema Management: Idempotence in Snowflake

It may appear that most of this should be possible with native SQL statements and indeed some DDL operations in Snowflake are naturally idempotent, whereas others have impacts on data and object state, and some are not idempotent at all. Let’s look at some of the ways SQL tries to help us with this and the problems that remain.


Object Creation

When executing an operation to create an object in Snowflake, and an object with the same name already exists, many DDL operations can be instructed either to gracefully cancel the statement (IF NOT EXISTS) or drop the existing object first (OR REPLACE).

If Not Exists

CREATE <OBJECT_TYPE> [IF NOT EXISTS] <OBJECT NAME> ...

The IF NOT EXISTS clause in many create statements will cause the database to check for the presence of an object with the name provided, and will halt execution of the statement gracefully if one exists, usually noting this in the command result message.

This can therefore be used to make many create statements idempotent, but only up to a point. In fact, true idempotence can only be achieved this way if only the object’s name (rather than the rest of its configuration) is important to the overall data model. This is because the IF NOT EXISTS clause will only check for an object with the given name, and not against any other attributes of the object configuration. This may be fine in the case of schema objects for example, as Snowflake only provides a few configuration options (e.g. data retention days, default collation and comment) which are not necessarily in common usage.

As a specific example of this, take the example of creating role objects in Snowflake. Apart from comment text, roles have no configuration other than their name . Therefore, executing CREATE ROLE IF NOT EXISTS MY_ROLE should not pose any issues. Conversely, file formats have a very rich configuration, comprising dozens of attributes , so this approach would not be suitable for these objects.

However, in the case of a more common object type, such as a table, simply checking that the object already exists cannot imply anything about its current configuration.

Or replace

CREATE [OR REPLACE] <OBJECT_TYPE> <OBJECT_NAME> ...

The OR REPLACE clause will also cause the database to check for the presence of an object with the name provided, but in contrast to IF NOT EXISTS it will then implicitly drop the existing object before then creating the new one. This effectively “overwrites” the old object with a new one.

Using this clause can bring an implementation much closer to idempotence, as after a statement is run one or more times, the database will always be in the same new configuration/state .

However, it is worth noting these important points regarding OR REPLACE when used in create statements:

  • There is nothing particularly clever about itall the database does is effectively prepend a DROP OBJECT_TYPE IF EXISTS OBJECT_NAME to the create statement.
  • Any child objects (in databases, schemas), materialised data (in tables, materialised views) or state information (in stages) will be lost by the drop, which may need to be recreated.
  • Any grants directly on the object will also be lost by the drop, so will need to be re-granted. Even though the new object (i.e. object that replaced the previous one) has the same name, it’s internal IDs are different and therefore previous GRANTs will no longer apply.
  • The executing user will require not only CREATE privileges, but also DROP on existing objects if they are to be replaced.

Moving the problem

So, if detecting the presence of a similarly named object is easy, we could simply move the problem, and any potentially complicated implementation, out of the realm of object creation and into object alteration, as illustrated by the following conceptual flow.

Picture1-1


Thus, we can imagine an implementation of this simply detecting the object by name (similar to in the if not exists clause, above), and then switching to the alteration process if it does. This has the benefit of moving any complex logic into a single process.

Object Alteration

Some database ALTER statements can be idempotent, but not all. Many operations simply set or unset attributes on an object, or execute a static rename operation, but some operations perform intrinsically non-idempotent operations such as adding a column to a table or swapping the names of two objects.

Natural Idempotence

Many ALTER statements are naturally idempotent, for example:

ALTER DATABASE ALPHA_PROD SET DATA_RETENTION_TIME_IN_DAYS = 90; This is clearly an idempotent operation i.e. rerunning this statement many times will not change the resulting state. Similar behavior can be observed for other ALTER statement clauses such as UNSET.

Additionally, some statements include a REFRESH method, for example in the case of refreshing a pipe , which although will alter the underlying state of the object, can safely be applied multiple times to the same overall effect.

Non-idempotent Operations

However, problems arise with other clauses, particularly around managing table columns and constraints, swapping tables, and working with sequences.

For example:

ALTER TABLE CONTACTS ADD COLUMN HOME_PHONE VARCHAR(20);

This statement will clearly fail if the CONTACTS table already has a column called HOME_PHONE.

Other non-idempotent operations include:

  • ALTER SHARE <name> { ADD | REMOVE } ACCOUNTS ...
  • ALTER DATABASE <name> SWAP WITH ...
  • ALTER SCHEMA <name> SWAP WITH ...
  • ALTER TABLE <name> SWAP WITH ...
  • ALTER SEQUENCE <name> INCREMENT BY ...

Drop and Recreate

One approach to altering an object in an idempotent way is to simply drop the object and then recreate it. Clearly, this approach will have an impact on objects that hold child objects (in databases, schemas), materialized data (in tables, materialized views) or state information (in stages), as discussed in section 0 above.

However, each of these limiting factors may not be a blocker to applying this approach. Child objects can also be recreated, many data management approaches will allow data to be easily reloaded, and persistent state information may not be important to some processes. For example the requirement for a new table which is a simple transformation of an existing table that doesn’t have a very large number of rows is often simpler to recreate each time that to deal with alterations.

Therefore, in many cases an ALTER statement could be swapped out in favor of a CREATE OR REPLACE operation.

Targeted Alteration

If simply dropping and recreating an object is not feasible, a general alternative approach would be to identify the differences between the current and future states (i.e. the current object configuration and the desired specification), and then compute the necessary ALTER commands to get there.

This approach has the advantage of retaining existing objects and data, although there will be some cases where certain in-place alterations are simply not possible. This is discussed more fully in other papers.

Object Deletion

In Snowflake, all DROP operations can include the IF EXISTS modifier which, when included, will cause the execution to gracefully terminate without an error condition if an object with the provided name does not exist.

Therefore, all DROP operations can be considered idempotent if they include the IF EXISTS modifier, as any given statement can be executed one of more times with the same effect.

Object Grants

Statements to apply access privileges to a role on an object (or on the account) using GRANT are naturally idempotent, as a GRANT statement will not fail or cause an error condition if the specified permissions are already in place.

If a CREATE OR REPLACE strategy is used to create/recreate certain objects, the fact that this will drop all the direct grants on an object can be mitigated by the fact that grants can be then applied regardless, due to this idempotent nature. This can also have the effect of removing any rogue grants that might have been applied by uncontrolled external processes, thus cleaning up the object permissions.

RELATED ARTICLES