Back to blog

PART 2: The Challenges of Repeatable and Idempotent Schema Management: Idempotence in Snowflake

It may appear that most of this should be possible with native SQL statements and indeed some DDL operations in Snowflake are naturally idempotent, whereas others have impacts on data and object state, and some are not idempotent at all. Let’s look at some of the ways SQL tries to help us with this and the problems that remain.

Object Creation

When executing an operation to create an object in Snowflake, and an object with the same name already exists, many DDL operations can be instructed either to gracefully cancel the statement (IF NOT EXISTS) or drop the existing object first (OR REPLACE).

If Not Exists

CREATE <OBJECT_TYPE> [IF NOT EXISTS] <OBJECT NAME> ...

The IF NOT EXISTS clause in many create statements will cause the database to check for the presence of an object with the name provided, and will halt execution of the statement gracefully if one exists, usually noting this in the command result message.

This can therefore be used to make many create statements idempotent, but only up to a point. In fact, true idempotence can only be achieved this way if only the object’s name (rather than the rest of its configuration) is important to the overall data model. This is because the IF NOT EXISTS clause will only check for an object with the given name, and not against any other attributes of the object configuration. This may be fine in the case of schema objects for example, as Snowflake only provides a few configuration options (e.g. data retention days, default collation and comment) which are not necessarily in common usage.

As a specific example of this, take the example of creating role objects in Snowflake. Apart from comment text, roles have no configuration other than their name . Therefore, executing CREATE ROLE IF NOT EXISTS MY_ROLE should not pose any issues. Conversely, file formats have a very rich configuration, comprising dozens of attributes , so this approach would not be suitable for these objects.

However, in the case of a more common object type, such as a table, simply checking that the object already exists cannot imply anything about its current configuration.

OR REPLACE

CREATE [OR REPLACE] <OBJECT_TYPE> <OBJECT_NAME> ...

The OR REPLACE clause will also cause the database to check for the presence of an object with the name provided, but in contrast to IF NOT EXISTS it will then implicitly drop the existing object before then creating the new one. This effectively “overwrites” the old object with a new one.

Using this clause can bring an implementation much closer to idempotence, as after a statement is run one or more times, the database will always be in the same new configuration/state .

However, it is worth noting these important points regarding OR REPLACE when used in create statements:

  • There is nothing particularly clever about it – all the database does is effectively prepend a DROP OBJECT_TYPE IF EXISTS OBJECT_NAME to the create statement.
  • Any child objects (in databases, schemas), materialised data (in tables, materialised views) or state information (in stages) will be lost by the drop, which may need to be recreated.
  • Any grants directly on the object will also be lost by the drop, so will need to be re-granted. Even though the new object (i.e. object that replaced the previous one) has the same name, it’s internal IDs are different and therefore previous GRANTs will no longer apply.
  • The executing user will require not only CREATE privileges, but also DROP on existing objects if they are to be replaced.

Moving the problem

So, if detecting the presence of a similarly named object is easy, we could simply move the problem, and any potentially complicated implementation, out of the realm of object creation and into object alteration, as illustrated by the following conceptual flow.

Picture1-1

Thus, we can imagine an implementation of this simply detecting the object by name (similar to in the if not exists clause, above), and then switching to the alteration process if it does. This has the benefit of moving any complex logic into a single process.

Object Alteration

Some database ALTER statements can be idempotent, but not all. Many operations simply set or unset attributes on an object, or execute a static rename operation, but some operations perform intrinsically non-idempotent operations such as adding a column to a table or swapping the names of two objects.

Natural Idempotence

Many ALTER statements are naturally idempotent, for example:

ALTER DATABASE ALPHA_PROD SET DATA_RETENTION_TIME_IN_DAYS = 90; This is clearly an idempotent operation i.e. rerunning this statement many times will not change the resulting state. Similar behaviour can be observed for other ALTER statement clauses such as UNSET.

Additionally, some statements include a REFRESH method, for example in the case of refreshing a pipe , which although will alter the underlying state of the object, can safely be applied multiple times to the same overall effect.

Non-idempotent Operations

However, problems arise with other clauses, particularly around managing table columns and constraints, swapping tables, and working with sequences.

For example:

ALTER TABLE CONTACTS ADD COLUMN HOME_PHONE VARCHAR(20);

This statement will clearly fail if the CONTACTS table already has a column called HOME_PHONE.

Other non-idempotent operations include:

  • ALTER SHARE <name> { ADD | REMOVE } ACCOUNTS ...
  • ALTER DATABASE <name> SWAP WITH ...
  • ALTER SCHEMA <name> SWAP WITH ...
  • ALTER TABLE <name> SWAP WITH ...
  • ALTER SEQUENCE <name> INCREMENT BY ...

Drop and Recreate

One approach to altering an object in an idempotent way is to simply drop the object and then recreate it. Clearly, this approach will have an impact on objects that hold child objects (in databases, schemas), materialised data (in tables, materialised views) or state information (in stages), as discussed in section 0 above.

However, each of these limiting factors may not be a blocker to applying this approach. Child objects can also be recreated, many data management approaches will allow data to be easily reloaded, and persistent state information may not be important to some processes. For example the requirement for a new table which is a simple transformation of an existing table that doesn’t have a very large number of rows is often simpler to recreate each time that to deal with alterations.

Therefore, in many cases an ALTER statement could be swapped out in favour of a CREATE OR REPLACE operation.

Targeted Alteration

If simply dropping and recreating an object is not feasible, a general alternative approach would be to identify the differences between the current and future states (i.e. the current object configuration and the desired specification), and then compute the necessary ALTER commands to get there.

This approach has the advantage of retaining existing objects and data, although there will be some cases where certain in-place alterations are simply not possible. This is discussed more fully in other papers.

Object Deletion

In Snowflake, all DROP operations can include the IF EXISTS modifier which, when included, will cause the execution to gracefully terminate without an error condition if an object with the provided name does not exist.

Therefore, all DROP operations can be considered idempotent if they include the IF EXISTS modifier, as any given statement can be executed one of more times with the same effect.

Object Grants

Statements to apply access privileges to a role on an object (or on the account) using GRANT are naturally idempotent, as a GRANT statement will not fail or cause an error condition if the specified permissions are already in place.

If a CREATE OR REPLACE strategy is used to create/recreate certain objects, the fact that this will drop all the direct grants on an object can be mitigated by the fact that grants can be then applied regardless, due to this idempotent nature. This can also have the effect of removing any rogue grants that might have been applied by uncontrolled external processes, thus cleaning up the object permissions.