Data capture methodologies: A comparison of ETL and ELT
ETL Overview
Traditionally, the IT department has controlled and owned all the data in a given organization. The various functional areas (such as Finance, HR, and Procurement) have therefore submitted their reporting and analytical requirements to the IT department or Business Intelligence (BI) professionals and waited until the corresponding work was completed.
Historically, BI professionals met these requirements with the standard Extract, Transform, and Load (ETL) process, depicted in the sketch below. Raw data from various sources (cloud services, databases, .txt and .csv files, etc.) is first extracted to a staging area. The extracted data is then transformed according to a pre-determined set of transformation rules and loaded into a data repository. The business then consumes this transformed data for its reporting, analytics, and decision-making functions.

Figure 1 – ETL Process at a high level

The ETL process is considered somewhat rigid because all requirements must first be shared with the BI professionals, who then code the required transformation rules. Any subsequent change to these rules comes at a higher cost to the business, in both time and money, and in some cases the lost time also carries an opportunity cost.
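The staged ETL flow described above can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: the inline CSV plays the raw source, the aggregation function plays the pre-determined transformation rules, and an in-memory SQLite database plays the data repository:

```python
import csv
import io
import sqlite3

# Hypothetical raw feed: sales rows extracted from a .csv source.
RAW_CSV = """region,amount
EMEA,1200.50
APAC,980.00
EMEA,310.25
"""

def extract(raw: str) -> list[dict]:
    """Extract: pull raw rows into the staging area (here, a plain list)."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[tuple]:
    """Transform: apply the pre-determined rules before loading —
    cast the amounts to numbers and total them per region."""
    totals: dict[str, float] = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + float(row["amount"])
    return sorted(totals.items())

def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: only the transformed data reaches the repository."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, total REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM sales").fetchall())
```

Note that only the transformed totals ever reach the repository; the raw rows live solely in the staging area, which is why any later change to the rules means re-running the whole pipeline.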
ELT Overview
Nowadays, given the increasing need for speed and flexibility in reporting, analytics, what-if analyses, and the like, businesses cannot afford to wait for extended periods while those same BI professionals work through their requirements. This, coupled with relatively lower infrastructure (hardware) costs and the emergence of cloud technologies, has given rise to the ELT process.
In the ELT process, the raw data from all data sources is extracted and then immediately loaded into a central data repository. The business can then get its hands on this raw data and transform it to suit its requirements. Once the transformation is done, the data is readily available for reporting, analytics, and decision-making needs. The sketch below illustrates the ELT process at a high level.
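A minimal sketch of this load-first ordering, again with a hypothetical inline CSV as the source and an in-memory SQLite database standing in for the central repository. The raw rows land untouched, and the transformation happens afterwards, inside the repository itself:

```python
import csv
import io
import sqlite3

# Hypothetical raw feed extracted from a .csv source.
RAW_CSV = """region,amount
EMEA,1200.50
APAC,980.00
EMEA,310.25
"""

conn = sqlite3.connect(":memory:")

# Extract + Load: the raw data lands in the repository as-is.
conn.execute("CREATE TABLE raw_sales (region TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?)",
    [(r["region"], r["amount"]) for r in csv.DictReader(io.StringIO(RAW_CSV))],
)

# Transform: the business shapes the raw data later, in the repository,
# to suit whatever requirement comes up — here, totals per region.
conn.execute(
    """CREATE TABLE sales_by_region AS
       SELECT region, SUM(CAST(amount AS REAL)) AS total
       FROM raw_sales GROUP BY region ORDER BY region"""
)
print(conn.execute("SELECT * FROM sales_by_region").fetchall())
```

Because the raw rows are retained, a new requirement (say, averages instead of totals) is just another query over `raw_sales`; no upstream pipeline needs rewriting.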
Figure 2 – ELT Process at a high level
The ELT process resembles the data lake concept, in which organizations dump data from various source systems into a centralized repository. The data in the lake may be structured (rows and columns), semi-structured (CSV files and logs), unstructured (emails and PDFs), or even binary (images).
Once organizations become familiar with the data lake / ELT concept and see its benefits, they often rush to set one up. Care must be taken, however, to avoid dumping unnecessary or redundant data. An ELT process should also encompass data cleansing and data archival practices to keep the repository efficient.
Comparison of ETL and ELT
The table below summarizes and compares the two methodologies of data acquisition and preparation for warehousing and analytics purposes.

| | ETL | ELT |
| --- | --- | --- |
| When data is transformed | Before loading, in a staging area | After loading, inside the repository |
| Who transforms it | BI professionals, per pre-determined rules | The business, as requirements arise |
| What is loaded | Transformed data only | All raw data (data lake style) |
| Flexibility | Rigid; rule changes cost time and money | Flexible; supports fast, ad-hoc analyses |

ELT vs ETL and the Anaplan Platform
As a flexible and agile cloud platform, Anaplan supports both methodologies. Depending on the method chosen, the following suggestions apply to the solution design approach.
With the ETL methodology, clients can use one of the many ETL tools available in the marketplace (such as Informatica, MuleSoft, Boomi, or SnapLogic) to extract and transform the raw data, which is then loaded into the Anaplan platform. Although it is preferable to load large datasets into a data hub model, the transformed data could also be loaded into the live or planning model(s).
With the ELT approach, after the raw data is extracted it is recommended that it be loaded into a data hub model, where the Anaplan modeling team codes the required transformation rules. The transformed data can then be loaded into the live or planning model(s) to be consumed by end users.
Regardless of the approach chosen, note that the activities to extract raw data and load to the Anaplan platform could be automated.
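For illustration, a rough sketch of such automation using only the Python standard library; the job body here is a placeholder, and a real deployment would more likely rely on cron, an ETL tool's built-in scheduler, or a workflow orchestrator:

```python
import sched
import time

def extract_and_load() -> str:
    """Placeholder for the real work: pull the raw files from their
    sources and push them to the data repository."""
    return "extract-and-load completed"

# Queue the job to run immediately; in practice the delay passed to
# enter() would be the refresh interval (e.g., nightly).
scheduler = sched.scheduler(time.time, time.sleep)
scheduler.enter(0, 1, lambda: print(extract_and_load()))
scheduler.run()
```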
A final note
The content above gives a high-level overview of the two data warehousing methodologies and by no means urges clients to adopt one over the other. Clients are strongly advised to evaluate the pros and cons of each methodology as they relate to their business scenario(s) and to build a business case before selecting one.