Skip to content
DataOps.live Professional EditionNEW
Purpose-built environment for small data teams and dbt Core developers.
DataOps.live Enterprise
DataOps.live is the leading provider of Snowflake environment management, end-to-end orchestration, CI/CD, automated testing & observability, and code management, wrapped in an elegant developer interface.
Spendview for Snowflake FREE

 

An inexpensive, quick and easy way to build beautiful responsive website pages without coding knowledge.
Getting Started
Docs- New to DataOps.liveStart learning by doing. Create your first project and set up your DataOps execution environment.
Join the Community
Join the CommunityFind answers to your DataOps questions, collaborate with your peers, share your knowledge!
#TrueDataOps Podcast
#TrueDataOps PodcastWelcome to the #TrueDataOps podcast with your host Kent Graziano, The Data Warrior!
Academy
DataOps AcademyEnroll in the DataOps.live Academy to take advantage of training courses. These courses will help you make the most out of DataOps.live.
Resource Hub
On-Demand Resources: eBooks, White Papers, Videos, Webinars

Learning Resources
A collection of resources to support your learning journey.

Customer stories
Events
Connect with fellow professionals, expand your network, and gain knowledge from our esteemed product and industry experts.
#TrueDataOps.org
#TrueDataOps.Org#TrueDataOps is defined by seven key characteristics or pillars:
Blogs
Stay informed with the latest insights from the DataOps team and the vibrant DataOps Community through our engaging DataOps blog. Explore updates, news, and valuable content that keep you in the loop about the ever-evolving world of DataOps.
In The News

In The News

Stay up-to-date with the latest developments, press releases, and news.
About Us
About UsFounded in 2020 with a vision to enhance customer insights and value, our company has since developed technologies focused on DataOps.
Careers

Careers

Join the DataOps.live team today! We're looking for colleagues on our Sales, Marketing, Engineering, Product, and Support teams.
DataOps.liveJul 20, 2021 3:00:00 AM4 min read

Lifecycling Snowpark and Java UDFs Through DataOps

We recently (14 July 2021) completed a masterclass with Kent Graziano, Chief Technical Evangelist, Snowflake, discussing Snowpark, the use of Scala and Java UDFs, and how we integrate this new technology into our DataOps platform. In particular, we discussed how we are using our Snowflake Object Lifecycle Engine to recycle these Snowpark objects through our DataOps platform via CI/CD pipelines and automated regression testing. 

It was great to share our integration with Snowpark and Java UDFs. In summary, we are very excited about Snowpark because it represents a significant new way of thinking in the data world.  

The philosophy of #TrueDataOps 

While this blog post’s main purpose is not a deep dive into the #TrueDataOps philosophy, it is worth taking a brief look for our DataOps for Snowflake platform is based on #TrueDataOps.  

When we developed our DataOps platform, we first started by defining our #TrueDataOps philosophy. We began with a pure, clean model of DevOps and CI/CD to create this philosophy. We took what worked, what didn’t work, and extended and built our #TrueDataOps philosophy on top of DevOps instead of creating DataOps as a new functionality.  

Consequently, the most significant benefits of #TrueDataOps are applicable and relevant to both data and traditional software development. 

Snowpark 

Snowpark, Snowflake’s developer environment, provides developer and data engineers with the functionality to write, compile, and execute code inside Snowflake in their preferred language. In other words, Snowpark makes it easier to extend Snowflake’s functionality with custom Java UDFs and Scala code.  

Snowpark’s primary objective is to move the execution of extended functionality closer to the data and significantly reduce the high-volume movement of data in and out of Snowflake by storing the code files close to Snowflake’s compute engine. 

The additional benefits of Snowpark include:  

  • Improve data security 
  • Significantly reduce costs 
  • Improve (or decrease) time-to-value 
  • Reduce operational overhead 

The Snowflake Object Lifecycle Engine  

Succinctly stated, we developed the Snowflake Object Lifecycle Engine (SOLE) to manage, recycle, and extend the life of every object found in Snowflake such as tables, grants, roles, constraints, sequences, shares, stored procedures, and user defined functions (UDFs).  

The fundamental question that this masterclass answered is: How do we lifecycle Snowpark objects and Java UDFs as part of the DevOps infrastructure by citing the following scenario:  

We started off with the following example of a pipeline (each number or step is a job in the pipeline):   

  1. Secrets management: This job extracts the necessary credentials from secure tools like Data Vault
  2. Snowflake orchestration
  3. Ingest data
  4. Advanced processing: Compile, build, execute Snowpark
  5. Advanced processing: Snowpark testing 
  6. Business domain validation:  Final data testing 
  7. Generate docs

dataops-masterclass

The two pipeline jobs that are relevant to this scenario are as follows.

The Snowflake orchestration job 

This is where we build warehouses, roles, users, grants, shares, external functions, and resource monitors; all the objects required to make a Snowflake infrastructure work using the Snowflake Object Lifecycle Engine. The challenge with traditional SQL DDL (Data Definition Language) create and alter statements is they are very difficult to orchestrate. To solve this challenge, we model what the Snowflake environment must look like using YAML configuration files, including objects that hold the Snowpark output.   

When the job containing the configuration files run in the DataOps pipeline, it will build a complete Snowflake infrastructure if the environment is empty. If several objects do not yet exist or need to be altered or dropped, this engine will figure out how to replicate the model.   

dataops-masterclass

 

dataops-masterclass


Advanced processing: Compile, build, execute Snowpark 
 

For this masterclass, we built a simple Scala job called customer segmentation. Very simply, it selects customer data from an ingestion table, filters it, joins this data to master data information, groups it, shows the result, and appends it to another table. 

The Scala code is made up of data frames. However, when we look at the output, we can see that SQL was compiled and executed. In summary, Snowpark decides on a case-by-case basis whether it is best to create SQL statements or Java UDFs. We ask for the job to be done, and Snowpark does the rest. 

dataops-masterclass


Conclusion

The key theme of this webinar is that Snowpark adds the standard software development requirements as set out in the DevOps, CI/CD, and #TrueDataOps philosophies. Because our DataOps for Snowflake platform’s foundations are based on these paradigms, all this functionality is native to our platform.   

Thus, the DataOps/Snowpark integration cycle is as follows:

DataOps

The question is: How do we build, Test and Deploy Data Environments the same ways we do software while still keeping all the core software DevOps capabilities.


Lastly, as described above, we are now using the Snowflake Object Lifecycle Engine alongside Snowpark to manage all the Snowflake objects that Snowpark needs to interact with.
 

For more detail and technical information, you can watch the recording of this masterclass and connect with us here 

 

Ready to get started?

Access the latest resources on DataOps lifecycle management and support for Snowpark and Java UDFs from Snowflake.

Learn more

RELATED ARTICLES