ETL VS ELT

Working in the Business Intelligence or the BI domain, ETL is the one abbreviation you must have been either identified by other people, or you would have found yourself clubbing the tools in it! With advancements in the field, we have a come a long way from ETL to ELT. Let us begin to understand what these are, what are the difference and will ELT overpower ETL?

EXTRACT TRANSFORM LOAD (ETL)

ETL is a set of process in the BI world that define the movement of data from different data sources to a common data warehouse. The loading from the sources to the warehouse, has an intermittent stoppage at the transform phase, where the data is transformed and made useful or clubbed before loading them into the warehouse.

The History

ETL was introduced in the 1970s as a process for integrating and loading data into the super computers or mainframe for computation or analysis. From the early 1980s to the mid 2000s, ETL became the primary process for creating data warehouses that support Business Intelligence applications. Times have changed and from being a time consuming batch process operation, ETL is used for creating small repositories that require less frequent updating. For the other requirement, there has been an introduction of various other data integration tools like ELT, CDC and data virtualization – to integrate large volumes of constantly changing data.

The Working

Extract – In this phase of ETL, data is extracted or copied from various sources to a staging area. The data can come from virtually any structured or non-structured sources, SQL servers, No-SQL, text, documents, ERP systems, web pages and more!

Transform – In the staging area, the data is transformed to be suitable for analysis, usually loaded into a relational model of database. There are some of the common steps that are done for transforming data –

  1. Filtering, cleansing, removing duplicates
  2. Perform calculations, aggregations or editing text strings
  3. Removing, encrypting, hiding or protecting the data as governed by the client

Mostly the transformation is preferred in the staging area when opposed to the source systems, as doing the transformation in staging area reduces the chances of data corruption.

Load – This is the final stage of ETL tool, the transformed data in the staging is moved into the target data warehouse. Mostly, the loading is done once and then incrementally, with fewer chances of full refresh of the data warehouse.

EXTRACT LOAD TRANSFORM (ELT)

This is an alternative to extract transform load, where the data is first loaded into the target data store where the transformation happens. More often, the target data store is not a data warehouse but a data lake which is nothing but a big repository which is designed to store both the structured and non-structured data. Data lakes are managed using big data platforms like Hadoop or non-structured database like MongoDb. Although ELT can support business intelligence, the reason they came into picture was for Artificial Intelligence, machine learning, predictive analysis etc.

Since in ELT the data is moved directly into the data lake, without the intervention of any transformation, obviously, the load time is faster in ELT when compared to ETL. Since, the data transformation happens in the data store, it is important in case of ELT to have enough processing power in the data lake.

THE DIFFERENCES

Some of the differences between the ETL and ELT are obvious and must be clear if you have read the blog from beginning. Let us dig deeper into some more differences between the two –

S.NOELTETL
1.In this case, the data is transformed in the db of the data warehouse itself.In ETL, the data is loaded into the staging zone for transformation.
2.This is used in case of high volumes of data.This is preferred when the volume of data is small but there are huge computations to be performed.
3.This process is faster since the data is directly loaded into the data lake.This is obviously time intensive, since the data is loaded post the transformation in the staging layer.
4.ELT supports both structured and un-structured data. ETL mostly supports the structured data.
5.Since ELT is a new concept all together, it often takes time to implement it. ETL being present in the market for decades, has matured more, and hence becomes easier to implement.
6.ELT brings with it adhoc, agility, flexibility and usability by everyone from designer to developer.ETL has fixed tables, fixed timeline and is usually used by the IT department.
ETL VS ELT

Both the ETL and ELT are in the BI market right now, no matter who is more advancing or who is more mature, they are here to stay for some time, and so are the tools that implement the process. Let us have a look at the various tools of both –

Tools of ETL
Paid ETL Tools

  • Informatica Power Center, Microsoft SSIS, Talend, Panoply, Stich, Fivetran, Alooma, Segment, Atom, AzureDataFactory,Matillion,ETLeap

OpenSource ETL Tools

  • AirByte, ApacheCamel, Apache Kafka, LogStash, Singer

Tools of ELT

  • HevoData, Luigi, Blendo, Matillion, Talend, StreamSets, EtLeap, Airflow, Kafka, NiFi

Although ETL is still prevalent in the market, more and more companies are moving towards the ETL way of loading and transforming the data. The conventional ETL is designed for the relational database way of working, but with ELT the companies are getting the added advantage of harnessing the big data. Since more and more data is now present in the cloud, ELT might become the next way forward in the BI world.

With the use of ELT, the integration architecture gets simplified, time to value is accelerated, and provides more robust performance when compared to the ETL way of working. Let us look into some of the benefits that ELT has to offer –

  1. Streamlined Architecture – ELT harness the processing power of Hadoop, which then streamlines the architecture necessary for the data consumption. There is no intermediary layer with processing power limitations, the target system acts as both the staging layer and the final destination.
  2. Big Data – With ELT, the wealth of sources which involves semi-structured and unstructured data can be readily incorporated in the data warehouse and data lakes. These sources are difficult to use with the traditional data ingestion and transformation methods.
  3. Sandbox – There can nothing be more wonderful than having the right data set for testing. ELT allows for the presence of a data store such as Hadoop as sandbox for data scientists to experiment without having to standardize any schema.

We will be exploring more on the ETL VS ELT discussion in the coming posts.

Stay tuned!

Happy Learning 🙂

Sources –

https://www.ibm.com/cloud/learn/etl



One response to “ETL VS ELT”

  1. […] data into the database and then doing the transformation, or transform and then loading, the difference between ETL and ELT. This post however, does not delves into the difference between ETL and ELT rather is an overview on […]

    Like

Leave a reply to Transformations in Azure Data Factory – Anmoltanubhrt Cancel reply