As described in the article titled “Everything You Need to Know About DataOps and Data Platforms,” the amount of data generated per day continues to grow unabated. And...
As our clients probably know by now, before developing our DataOps for Snowflake data orchestration platform, we set out to define the principles we considered critical to laying the proper foundations for a data platform that would change global enterprise data management.
We returned to the DevOps philosophy because it has been battle-hardened in the software development industry for more than twenty years. DevOps has been successful since its inception. And it works.
To quote the DataOps for Dummies book:
DevOps is a “set of guiding principles that allow for agility while maintaining governance over the development and deployment of code.”
And “DevOps has led to principles and tools to provide the ability to maintain configuration and code in repositories, check-in/check-out functionality, and… the ability to rollback code in the event of a failure.”
We used this “set of guiding principles” as a foundation on which we built the truest form of DataOps (Data + Operations) known as #TrueDataOps.
Because of the exploding data volumes, one of the biggest challenges we initially faced was the need to balance governance and agility. Data must be governed and secured to protect it from unauthorized access. The long-accepted constraint is that the two trade off against each other: to increase agility, governance must decrease, and vice versa.
However, this premise does not necessarily hold. The good news is that, with the development of the #TrueDataOps philosophy, one of the promises we can make is that our data orchestration platform provides exponential increases in both governance and agility. Neither one must suffer because of the other.
This set of articles aims to drill down into each of the seven #TrueDataOps pillars, one at a time, to understand why DataOps.live has so much value to add to every enterprise organization’s data ecosystem and the generation of data analytics products used to inform all strategic decisions.
ELT and the new spirit of ELT
As highlighted throughout this article, we believe that to truly understand the role the #TrueDataOps philosophy plays in our company ethos, our product, and consequently the value we add to "Snowflake environment management, end-to-end orchestration, CI/CD, testing & observability, and code management," the following questions are valid and deserve a considered response:
- What is the difference between ETL and ELT?
- Why adopt ELT instead of ETL?
- Where does EtLT fit into the picture?
- What is the (new) spirit of ELT?
Now that we have ringfenced these discussion points, let’s look at each one individually.
1. What is the difference between ETL and ELT?
Both the DataOps for Dummies book and the truedataops.org website provide a comprehensive description of the most important differences between ETL (extract, transform, load) and ELT (extract, load, transform).
Note: Both these constructs perform the same function: to ingest data from multiple, disparate data sources (or data producers) and load it into a centralized data store or data cloud. The primary difference lies in when the data is transformed into useful information or usable datasets for data analytics, data science, and BI processes. Cloud platforms tend to favor ELT for cost and performance reasons.
ETL or extract, transform, load is the traditional way to move company-generated data from source to destination. The basic model functions as follows:
- The first step is to extract the data from its source and place it in a staging area. This is the “E” in ETL.
- The second step is to transform (clean, process, and convert) this data into meaningful information, the “T” in ETL.
- Lastly, the transformed data is loaded into a data warehouse, data lakehouse, or data lake, the “L” in ETL.
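The three ETL steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the order rows, their field names, and the in-memory list standing in for the warehouse are all hypothetical, not part of any real pipeline or product.

```python
# Hypothetical source rows; the field names are illustrative only.
SOURCE_ROWS = [
    {"order_id": 1, "amount": " 19.99 ", "country": "gb"},
    {"order_id": 2, "amount": "5.00", "country": "US"},
]


def extract(source):
    """E: copy rows out of the source into a staging area."""
    return [dict(row) for row in source]


def transform(staged):
    """T: clean and convert the staged rows BEFORE they are loaded."""
    return [
        {
            "order_id": row["order_id"],
            "amount": float(row["amount"].strip()),
            "country": row["country"].upper(),
        }
        for row in staged
    ]


def load(rows, warehouse):
    """L: write the already-transformed rows to the destination."""
    warehouse.extend(rows)


# A plain list stands in for the data warehouse in this sketch.
warehouse = []
load(transform(extract(SOURCE_ROWS)), warehouse)
```

Note that in this ordering the warehouse only ever sees the cleaned rows; the original strings (such as the padded amount) never reach it.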
ELT, or extract, load, transform, moves the transformation stage from the middle to the end of the pipeline. Therefore, the steps in this model are as follows:
- As with the ETL model, the first step is to extract the data from its multiple, different data sources as raw data.
- The second step is to load or ingest the raw data into the data platform.
- Thirdly, the data is transformed and served to the business as valuable datasets or data analytics reports and dashboards.
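As a rough illustration of the ELT ordering, the sketch below uses Python's built-in sqlite3 module as a stand-in for a cloud data platform. The table name raw_orders, the view name orders_clean, and the sample rows are assumptions made for the example only.

```python
import sqlite3

# Hypothetical source rows: (order_id, amount, country), deliberately messy.
SOURCE_ROWS = [
    (1, " 19.99 ", "gb"),
    (2, "5.00", "US"),
]

# An in-memory SQLite database stands in for the data platform.
conn = sqlite3.connect(":memory:")

# E + L: land the rows exactly as they arrived, with no cleaning.
conn.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, amount TEXT, country TEXT)"
)
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", SOURCE_ROWS)

# T: derive a clean dataset inside the platform; raw_orders stays untouched.
conn.execute(
    """
    CREATE VIEW orders_clean AS
    SELECT order_id,
           CAST(TRIM(amount) AS REAL) AS amount,
           UPPER(country) AS country
    FROM raw_orders
    """
)

clean = conn.execute(
    "SELECT order_id, amount, country FROM orders_clean ORDER BY order_id"
).fetchall()
raw = conn.execute(
    "SELECT amount FROM raw_orders ORDER BY order_id"
).fetchall()
```

Here the transformation is just a query over the loaded data, so the raw rows remain available alongside the cleaned view.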
2. Why adopt ELT instead of ETL?
Even though the differences between these two models might seem fairly insignificant, they are substantial in practice.
The point at which the data is transformed plays a considerable role in the overall management of company-generated data.
Raw data is valuable and should never be deleted. When the data is transformed, it is permanently degraded in some form. Therefore, the data provenance and lineage are maintained by storing the raw data in the data platform and using it as a foundation for requested data products. This then allows for the continual creation of unique data insights.
On the other hand, ETL transforms data before it is loaded into the data platform. This might seem beneficial, especially when considering the cost of storing massive volumes of structured, semi-structured, and unstructured data. However, cloud data storage is inexpensive, which largely negates this benefit.
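To make the value of retained raw data concrete, here is a minimal sketch with hypothetical purchase events. If only the daily totals had been stored, the per-customer totals requested later could not be reconstructed; with the raw events kept, both data products come from the same foundation. All names and figures are invented for illustration.

```python
from collections import defaultdict

# Hypothetical raw events kept in the platform exactly as they arrived.
RAW_EVENTS = [
    {"day": "2024-01-01", "customer": "a", "amount": 10.0},
    {"day": "2024-01-01", "customer": "b", "amount": 5.0},
    {"day": "2024-01-02", "customer": "a", "amount": 7.5},
]


def total_by(events, field):
    """Sum the amounts of the raw events, grouped by the given field."""
    totals = defaultdict(float)
    for event in events:
        totals[event[field]] += event["amount"]
    return dict(totals)


# The data product that was requested first.
daily_totals = total_by(RAW_EVENTS, "day")

# A data product requested later. It cannot be derived from daily_totals,
# only from the raw events.
customer_totals = total_by(RAW_EVENTS, "customer")
```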
3. Where does EtLT fit into the picture?
Another seeming benefit of ETL is that the sensitive data or PII regulated by global regulations such as GDPR and CCPA is masked or anonymized before being loaded into the data store.
Therefore, the general recommendation seems to be to use ETL instead of ELT when working with sensitive data that must be masked.
The challenge here is that the raw data is transformed (or deformed) before it reaches the cloud data storage point. However, #TrueDataOps posits that instead of using ETL to load this data into a data platform, EtLT must be used.
Let’s turn once again to the DataOps for Dummies book for a description of EtLT:
“In some cases, you can’t avoid regulations that require some data to be removed, encrypted, anonymized for privacy. In these cases, a small “t” is inserted into the ETL acronym (EtLT), signifying minimal, but required change.”
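A small “t” step might look like the following sketch, which replaces an email address with a salted SHA-256 digest before the row is loaded and leaves every other field as it arrived. The field names, the salt value, and the choice of hashing are assumptions for illustration; whether hashing satisfies a particular regulation is a legal question this example does not answer.

```python
import hashlib


def mask_email(email, salt):
    """Return a salted SHA-256 hex digest in place of the email address."""
    payload = (salt + email.lower()).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def small_t(row, salt):
    """The small 't': change only the regulated field, nothing else."""
    masked = dict(row)
    masked["email"] = mask_email(row["email"], salt)
    return masked


# Hypothetical extracted row and salt; real salts belong in a secret store.
extracted = {"order_id": 1, "email": "Jane@Example.com", "amount": " 19.99 "}
to_load = small_t(extracted, salt="example-salt")
```

The amount is still the untidy source string after this step; cleaning it is left to the big “T” that runs after loading.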
4. What is the (new) spirit of ELT?
The new spirit of ELT is fundamental to the #TrueDataOps philosophy. The truedataops.org website has the following to say about the spirit of ELT:
“The Spirit of ELT take the concept of ELT further and advocates that we avoid ALL actions which remove data that could be useful or valuable to someone later, including changes in data.”
In other words, it is about maximizing the ability to derive future value from the data by pushing its transformation down the pipeline and making sure that any value that we might derive in the future is considered and not discarded.
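One possible reading of this idea, sketched below, is to append every change as a new record rather than overwriting the previous value, and to derive the current state further down the pipeline. The record layout and function names are hypothetical and are not taken from truedataops.org or any product.

```python
def load_change(history, key, value, loaded_at):
    """Append a change to the history instead of updating in place."""
    history.append({"key": key, "value": value, "loaded_at": loaded_at})


def current_state(history):
    """Derive the latest value per key; the history itself is not modified."""
    latest = {}
    for record in sorted(history, key=lambda r: r["loaded_at"]):
        latest[record["key"]] = record["value"]
    return latest


# Hypothetical changes to a customer's address, loaded on different days.
history = []
load_change(history, "customer-1", "12 Old Street", "2024-01-01")
load_change(history, "customer-2", "3 High Road", "2024-01-02")
load_change(history, "customer-1", "99 New Avenue", "2024-01-03")

state = current_state(history)
```

The earlier address for customer-1 is still in the history, so a later question such as "where did this customer live in January?" remains answerable.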
#TrueDataOps is critical to the success of DataOps for Snowflake. As described above, it forms the foundation upon which our market-leading data orchestration platform is built. And because ELT and the spirit of ELT form the first pillar of this philosophy, they provide the foundation for the other six pillars:
- Agility and CI/CD
- Component design and maintainability
- Environment management
- Governance, security, and change control
- Automated testing and monitoring
- Collaboration and self-service
Lastly, simplicity is one of the principal goals of #TrueDataOps. This first pillar, ELT and the spirit of ELT, streamlines the potentially complex process of moving data from data producers to data consumers. It’s no wonder that DataOps.live is a leader in the data management and orchestration lifecycle.
Ready to get started?
Sign up for your free 14-day trial of DataOps.live on Snowflake Partner Connect today!