
Download the free whitepaper

The Challenges of Apache Airflow in a #TrueDataOps World

Data engineering, and its associated need to create optimized, orchestrated data pipelines that extract data from multiple, disparate sources and load it into a centralized data platform, has risen to prominence. Before the development of automated workflow orchestration tools such as Airflow, data pipeline functionality was either manually coded and implemented or run as batches of lengthy cron jobs and repetitive custom API calls. The overwhelmingly manual nature of data pipeline management in that era eroded the quality of the resulting data insights.

There is now a need to apply some (or all) of the DevOps principles battle-hardened in the software development industry to this world of data, ensuring that Agile and Lean principles are applied across the data pipeline creation, testing, deployment, monitoring, and maintenance lifecycles. This whitepaper delves into the #TrueDataOps philosophy and explains why Apache Airflow, the forerunner and originator of automated workflow orchestration and management tools, was never an ideal solution for data pipeline orchestration workflows.

What's in the whitepaper?

  • Recognizing the data product - understand why we need to think of what we build as “data products”
  • An intro to #TrueDataOps - explore the seven pillars of #TrueDataOps and how you can increase efficiency with trusted data assets, work smarter with code reuse, reduce errors, build collaborative teams and save money
  • The importance of data pipeline orchestration - automated orchestration, imperative versus declarative pipeline orchestration, and orchestration using workflows
  • An introduction to DAGs - explore Directed Acyclic Graphs, the construct on which the Airflow architecture is based (see the minimal sketch after this list)
  • What is Apache Airflow? - look at some of Apache Airflow’s components and how it functions
  • Apache Airflow successes and challenges - how Airflow measures up against the requirements of modern data pipeline architectures
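
To make the DAG discussion above concrete, here is a minimal sketch of what a pipeline definition looks like in Airflow: tasks are declared as Python callables and chained into a Directed Acyclic Graph with explicit dependencies. The DAG name, schedule, and task bodies below are illustrative placeholders rather than examples taken from the whitepaper.

    # Minimal Airflow DAG sketch: a two-step extract -> load pipeline.
    # All names and function bodies are placeholders for illustration only.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract():
        # Placeholder: pull data from a source system.
        pass


    def load():
        # Placeholder: write the extracted data to the central data platform.
        pass


    with DAG(
        dag_id="example_extract_load",  # hypothetical pipeline name
        start_date=datetime(2023, 1, 1),
        schedule="@daily",              # Airflow 2.4+; older releases use schedule_interval
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)

        # The >> operator declares the dependency edge that makes this a DAG.
        extract_task >> load_task

Because dependencies are expressed in code rather than as scheduled cron entries, the scheduler can retry, backfill, and monitor each task independently, which is the core advantage workflow orchestration tools offer over hand-rolled scripts.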