Patrice Borne · Feb 6, 2023 · 5 min read

Increasing Productivity, Empowering Developers: Dynamically Scale Data Product Development Using DataOps.live and Kubernetes

When you run a pipeline, you need a runner: a piece of code that does the job on your behalf. Someone has to set this up. It’s not very hard, of course, but the problem is that you only have access to the resources of the one machine where the runner runs. That brings limitations, starting with the size and cost of the machines available to rent. It’s far from easy to scale up and down flexibly, and the big ‘monster machines’ can get very expensive very quickly.

Resiliency can also be a problem, alongside the fact that you have to provision a large virtual machine even if you only use it for a fraction of the time. You can set up a second runner, or as many as you want, but issues arise around the time and effort involved and how you manage and control them all. It’s possible, but far from ideal.

The combination of DataOps.live and Kubernetes solves these issues, enabling you to scale up and down and improve your resiliency. You define the target number of Pods that you want for a service and let Kubernetes figure out at runtime how to get there: if something breaks, Kubernetes recovers and reschedules it. And, of course, you’re boosting developer productivity.

It’s worth remembering that Kubernetes itself is pretty low level and requires quite a lot of work to configure from scratch. So we provide Helm charts that let you specify a few things, adding a level of abstraction on top of those low-level Kubernetes concepts. You can now think in terms of what your application does rather than what Kubernetes expects. Because of how the pipeline and job runner mechanisms work, you need to be able to pass information from one step to the next; a shared storage area mechanism handles this. You then need to consider two important parameters: resiliency and concurrency.
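As a rough illustration of that shared storage area, the sketch below uses a plain Kubernetes PersistentVolumeClaim with ReadWriteMany access so that one job’s outputs can be read by the next job; the names and storage class are hypothetical, not the platform’s actual configuration.

```yaml
# Hypothetical shared volume for passing artifacts between pipeline steps.
# Assumes a ReadWriteMany-capable storage class (e.g. EFS/NFS) is available.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dataops-shared-cache       # placeholder name
spec:
  accessModes: ["ReadWriteMany"]   # multiple job Pods can mount it at once
  storageClassName: efs-sc         # placeholder storage class name
  resources:
    requests:
      storage: 50Gi
```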

Providing Resiliency and Concurrency for Dynamic Scalability  

The first parameter is resiliency: how many Pods do you want? Pods are a core Kubernetes concept, and each of these Pods presents itself to DataOps.live as a runner. If you want two Pods, you get two runners for the project you’re working on, and you can register them with a specific project.
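To make the two knobs concrete, here is a minimal values.yaml sketch modeled on the open-source GitLab Runner Helm chart, which uses the same runner model; the URL and token are placeholders, and the keys in the DataOps.live charts may differ.

```yaml
# Illustrative runner chart values (keys follow the upstream GitLab Runner chart).
replicas: 2                      # resiliency: two runner Pods register against the project
concurrent: 10                   # concurrency: jobs each runner Pod will execute in parallel
gitlabUrl: "https://<your-dataops-host>/"          # placeholder platform URL
runnerRegistrationToken: "<project-runner-token>"  # placeholder project-level token
```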

The second important parameter is concurrency. If you look at how a pipeline is defined, logically you have different stages, and within a stage you may have multiple jobs. When you put multiple jobs in the same stage, those jobs can execute in parallel at runtime; you can also constrain the order using dependencies, as the sketch below shows. In this way, you (or rather DataOps.live) can schedule more work to run in parallel and make the pipeline go faster.
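The snippet below is a hedged sketch of that stage/job structure in the GitLab-CI-style YAML that DataOps.live pipelines follow; the stage names, job names, and scripts are hypothetical.

```yaml
# Two jobs share the "ingest" stage, so they can run in parallel;
# "build_models" declares explicit dependencies and runs only after both finish.
stages:
  - ingest
  - transform

load_orders:
  stage: ingest
  script:
    - echo "load orders"         # placeholder command

load_customers:
  stage: ingest
  script:
    - echo "load customers"      # placeholder command

build_models:
  stage: transform
  needs: [load_orders, load_customers]   # dependency constraint across jobs
  script:
    - echo "build models"        # placeholder command
```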

Using the traditional approach, each job is scheduled by the runner on the virtual machine as an individual Docker container. We’re back to the resource/cost issue: there’s only so big a machine you can rent from AWS, say, to run jobs in parallel, and eventually you risk running out of resources. That’s especially true in non-trivial development: your headcount is growing, people want to run more projects, they want more pipelines, and what you ask of your runner(s) can become overwhelming.

DataOps.live and Kubernetes solve this: you can define and manage a massive concurrency level should you want to. When you’re running a pipeline, 10 jobs could run in parallel. You could have five or more teams working on multiple different projects in DataOps.live, each with multiple pipelines, all sharing Kubernetes runners. Instead of thinking in terms of setting up 1, 2, 10, or 20 runners, think in terms of a Kubernetes cluster, without the need to manage specifically where each runner is going to run.

The beauty of this approach is that when you have to deal with a huge spike in demand in your DataOps.live environment, with everyone wanting to run pipelines and you need huge resources, your Kubernetes cluster scales up dynamically, with no human intervention. In the traditional way, with the specific runners you defined, you faced a physical limit: when you ran out of resources, everyone had to get in the queue. With the DataOps.live Kubernetes approach, you instead scale up dynamically to meet whatever demand you have. And when you’re done, it scales down automatically.
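On AWS, for example, that elasticity typically comes from an autoscaling node group plus the Kubernetes Cluster Autoscaler: when pending job Pods cannot be scheduled, nodes are added, and they are removed again once the queue drains. The sketch below is an illustrative eksctl configuration; the cluster name, region, instance type, and limits are all hypothetical.

```yaml
# Hypothetical eksctl ClusterConfig fragment: a node group the Cluster Autoscaler
# can grow during pipeline spikes and shrink when the cluster is idle.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: dataops-runners          # placeholder cluster name
  region: us-east-1              # placeholder region
managedNodeGroups:
  - name: ci-jobs
    instanceType: m5.xlarge      # placeholder instance size
    minSize: 1                   # keep one warm node
    maxSize: 20                  # upper bound on spend during spikes
    desiredCapacity: 2
    labels:
      workload: ci-jobs
```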

So what does this mean in practice? First, improved cost management, because it now becomes dynamic: you don’t pay for DataOps compute resources you don’t use (the original promise of the cloud).

Second, the DataOps.live platform means you can share these flexible additional runner and job resources across multiple teams and projects, safe in the knowledge that you’re supported by the secrets management and security features provided by the platform. This further improves productivity and collaboration: developers are no longer constrained and have better access to resources.

And third, you get the resiliency you need. If a pipeline breaks, it’s detected automatically, and you keep on running. You can rely on Kubernetes to do the right thing at runtime, so business-critical processes using those pipelines become more reliable.
 

DataOps.live and Kubernetes support our mission to improve: 

  • Developer productivity: making it easier for developers to work concurrently and faster (making changes, running, seeing the results, making further improvements) for higher productivity and accelerated development
  • End-user productivity and customer satisfaction: delivering higher-quality products faster, plus increased resiliency and reduced downtime, means more productive and satisfied business users
  • Cross-project productivity: using leading-edge DataOps.live technology and processes across all groups enables people to move around easily to balance changing workloads and priorities
  • Business productivity, time to market, and revenue: increasing the speed, effectiveness, and resiliency of data and analytic processes means you can build new use cases in days rather than months or years

 
