Doug 'The Data Guy' Needham | Mar 8, 2023 | 5 min read

DataOps.live & Snowflake—Better MLOps with Snowpark

MLOps is a relatively new discipline. It is a core function of machine learning engineering, focused on taking machine learning models to production and managing them there. Once a Data Scientist settles on an algorithm, or a combination of algorithms, that makes a useful prediction, that code and the trained machine learning model must go into production.

Snowpark is an exciting new way to program Snowflake: instead of pulling data out to your code, you promote your code to the data, using the languages you like, including Python, Java, and Scala. It runs on dedicated virtual warehouses and sits alongside Snowflake's familiar SQL interface. This is an exciting option for architects, who can now design warehouse solutions without having to move data out of Snowflake.
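As a quick illustration, here is a minimal Snowpark for Python sketch; the connection parameters and the ORDERS table are placeholders, not details from this article. The DataFrame expression is translated to SQL and executed inside a Snowflake warehouse, so only the aggregated result travels back to the client.

```python
# Minimal Snowpark for Python sketch: the code is pushed to the data,
# so the data never leaves Snowflake. Connection parameters and the
# table/column names below are placeholders.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import avg, col

connection_parameters = {
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}
session = Session.builder.configs(connection_parameters).create()

# This expression compiles to SQL and runs inside a Snowflake warehouse;
# only the small aggregated result comes back over the wire.
orders = session.table("ORDERS")
result = (
    orders.filter(col("ORDER_STATUS") == "SHIPPED")
          .group_by("REGION")
          .agg(avg("ORDER_TOTAL").alias("AVG_TOTAL"))
)
result.show()
```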

Snowpark is a powerful tool in the toolbox, and its value increases when operational best practices support the deployment and monitoring of Snowpark code. The DataOps.live Data Platform has supported Snowpark implementations in Scala and Java for some time, and we have recently added Python support. Our Data Platform handles the deployment, promotion, and management of code that would otherwise have to be done manually to get a Snowpark implementation off to the races.

With increasing volumes of data arriving in the enrichment platform from a wide variety of sources (streaming, third parties, IoT, and more), keeping the data in one place and leveraging Snowflake's speed, performance, and cost savings is a real advantage: the time it takes to move data around the enterprise only grows as volumes grow. One of the best use cases for deploying code this way is running machine learning directly on Snowflake.

Challenges with machine learning (ML) models

In my experience as a Data Scientist, one of the most difficult things is getting a machine learning model into production; that difficulty is the whole reason MLOps was created as a discipline. I have taken several courses, vendor training classes, and MOOCs on Data Science. All of them cover data preparation, data munging, some data cleaning, exploratory data analysis (EDA), visualizations, and the various dials that can be tuned for several types of machine learning algorithms, like XGBoost or neural networks.

Almost none of them cover delivering your machine learning model to production, retraining an existing model with new data, rolling back a trained model, comparing the performance of models, or validating model performance. All of these capabilities are built into our platform, and we have been doing similar things since our product began. Extending them to support Snowpark was a logical step forward.

A machine learning model is not code alone. When a Data Scientist trains a machine learning algorithm in her Jupyter notebook, the model exists in memory. That in-memory object may have analyzed millions or billions of records. Using sophisticated algorithms that Mathematics PhDs love spending time explaining, it has jumped through every hoop imaginable: the cost function has been minimized, and the R² shows the model explains well over 90% of the variance in the data. The combination of the code that preps and munges new data into a particular structure and the machine learning function with access to that in-memory object is what produces predictions on new data. Those predictions, whatever they may be, are usually what the business is after.
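To make that concrete, here is a minimal sketch of what that notebook step produces; scikit-learn and the synthetic dataset are illustrative stand-ins, not details from this article. The fitted model is just a Python object in memory until it is explicitly serialized.

```python
# A trained model is an in-memory Python object, not code. Until it is
# serialized to a file, it disappears when the notebook kernel dies.
# scikit-learn and the synthetic dataset here are illustrative stand-ins.
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=10_000, n_features=20, random_state=42)
model = GradientBoostingClassifier().fit(X, y)  # the in-memory object

# Persist the fitted model so it can be promoted beyond the notebook.
with open("churn_model.pkl", "wb") as f:
    pickle.dump(model, f)
```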

Going live with ML projects faster and better with DataOps.live

How do you take that in-memory object from her notebook into production? It must be persisted as a file of some sort. That file must then be moved to a location that counts as “production.” At the same time, all the precursor code that works with that file must be promoted to a “production” environment and invoked at the right time, whenever new data is seen, to produce the predictions. Done by hand, all of this is manual and ad hoc, with no automation and unpredictable results.
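On Snowflake, one common home for that file is an internal stage. The sketch below uses plain Snowpark APIs with placeholder stage and file names; it illustrates the pattern and is not the DataOps.live implementation. `session` is the Snowpark Session from the earlier sketch.

```python
# One way to give the pickled model a "production" home: upload it to a
# Snowflake internal stage. Stage and file names are placeholders, and
# `session` is an existing snowflake.snowpark.Session.
session.sql("CREATE STAGE IF NOT EXISTS ML_MODELS").collect()
session.file.put(
    "churn_model.pkl",    # local file produced by the training step
    "@ML_MODELS",         # target stage
    auto_compress=False,  # keep the raw .pkl so server-side code can read it
    overwrite=True,
)
```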

The engineering team at DataOps.live works to add new Snowflake features to our Data Platform soon after they become available in preview, and our platform has incorporated support for Snowpark since its initial release. DataOps.live support for Snowpark is a great blog post that covers the first iterations of Snowpark for Java and Scala. We have now been working with Snowpark for well over two years.

How it works 

The DataOps.live platform has built-in helper functions that let the Python Data Scientist save her models in easily retrieved locations for use within a Snowpark function. Our repository-based infrastructure ensures that configuration is stored as code, checked in, audited, and approved before release to production, which naturally supports the incorporation of Snowpark code.
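Those helper functions wrap standard Snowpark plumbing. For readers curious about the mechanics underneath, here is a hedged sketch using plain Snowpark APIs; the function and object names are hypothetical, and this is not the DataOps.live helper implementation. The key idea is that the staged model file is declared as a UDF import, so Snowflake makes it available on the warehouse at execution time.

```python
# Hypothetical sketch of the plumbing: register a permanent Snowpark UDF
# whose import is the pickled model on a stage. Not the DataOps.live
# helper API; all names are placeholders.
import os
import pickle
import sys

from snowflake.snowpark.types import ArrayType, FloatType

def predict(features: list) -> float:
    # Snowflake copies UDF imports into this directory at execution time.
    import_dir = sys._xoptions["snowflake_import_directory"]
    with open(os.path.join(import_dir, "churn_model.pkl"), "rb") as f:
        model = pickle.load(f)  # a real implementation would cache this
    return float(model.predict_proba([features])[0][1])

session.udf.register(
    func=predict,
    name="PREDICT_CHURN",
    return_type=FloatType(),
    input_types=[ArrayType()],
    imports=["@ML_MODELS/churn_model.pkl"],
    packages=["scikit-learn"],
    is_permanent=True,
    stage_location="@ML_MODELS",
    replace=True,
)
```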

A DataOps.live orchestration can perform a multitude of functions, including data ingestion, transformation, automated testing, and more. Extensive validation is essential to ensure everything meets the high quality standards required by the data product or application. DataOps.live can seamlessly orchestrate the running of code within the Snowpark framework, which opens the door to a multitude of machine learning use cases and applications.
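Once a function like the hypothetical PREDICT_CHURN above exists, for example, the scoring step a pipeline orchestrates can be an ordinary Snowpark expression; the table and column names here are again placeholders.

```python
# Scoring step a pipeline job might run: apply the registered UDF to new
# rows and write predictions back to a table. Names are placeholders.
from snowflake.snowpark.functions import array_construct, call_udf, col

new_rows = session.table("NEW_CUSTOMER_FEATURES")
scored = new_rows.with_column(
    "CHURN_SCORE",
    call_udf("PREDICT_CHURN", array_construct(col("F1"), col("F2"), col("F3"))),
)
scored.write.save_as_table("CHURN_PREDICTIONS", mode="overwrite")
```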

The code of the machine learning model is stored in a git-compatible repository, and the model object that code needs to make its predictions is easily retrieved using our helper functions. Following our best practices, the code can even update the model object when new data arrives or on a schedule. All of it is controlled, managed, and visible, ready to support any audit or question about what a prediction means.
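As one illustration of that update path, a scheduled pipeline job could re-run training and overwrite the staged model file; the sketch below reuses the same hypothetical names as before and is not our helper API.

```python
# Hedged sketch of a scheduled retrain: pull fresh training data, refit,
# and overwrite the staged model file. Table, stage, column, and file
# names are hypothetical; re-register the UDF afterwards if your imports
# were snapshotted at registration time.
import pickle

from sklearn.ensemble import GradientBoostingClassifier

def retrain(session) -> None:
    df = session.table("TRAINING_DATA").to_pandas()
    X, y = df.drop(columns=["LABEL"]), df["LABEL"]
    model = GradientBoostingClassifier().fit(X, y)
    with open("churn_model.pkl", "wb") as f:
        pickle.dump(model, f)
    session.file.put("churn_model.pkl", "@ML_MODELS",
                     auto_compress=False, overwrite=True)
```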

This comprehensive approach enables DataOps teams to work efficiently with Data Scientists in the organization to produce valuable insights quickly.  

Your data team should focus on creating value for your organization—that begins with an organized, repeatable approach for building and operating machine learning models—MLOps made easy. DataOps.live provides the only purpose-built platform that helps modern data teams take the next step forward with Snowpark. 

Want to learn more about DataOps.live? Connect with us at DataOps.live, “the folks who wrote the book on DataOps.” Here is a link to the book: DataOps for Dummies.

Check out the Roche Case Study, the OneWeb Case Study, and Accelerated Snowpark Deployments with DataOps.live, and be sure to register for our podcast series hosted by Kent Graziano, The Data Warrior.

Ready to give it a try? Click here: DataOps.live free trial 


Doug 'The Data Guy' Needham

“The Data Guy” Needham started his career as a Marine Corps database administrator, supporting operational systems that spanned the globe in support of Marine Corps missions. Since then, Doug has worked as a consultant, data engineer, and data architect for enterprises of all sizes. He currently works as a data scientist, tinkering with graphs and enrichment platforms and showing others how to get more meaning from their data.
