Extract Transform Load

ETL is a process that is used to take information from one or more sources, normalize it in some way to some convenient schema, and then insert it into some other repository.

I have used this process on a number of occasions both manually and through the use of scripting. This process is taking information from one source or application and converting it into another form where you can use that data to whatever you need. I rarely used one of the examples of GUI software below which apparently would make my life so much easier.

The 3 major steps are:

  1. Extract the data
  2. Transform the data to your needs
  3. Load the data into your application / database
The steps broken down are:

The typical real-life ETL cycle consists of the following execution steps:

  1. Cycle initiation
  2. Build reference data
  3. Extract (from sources)
  4. Validate
  5. Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates)
  6. Stage (load into staging tables, if used)
  7. Audit reports (for example, on compliance with business rules. Also, in case of failure, helps to diagnose/repair)
  8. Load (to target tables)
  9. Archive
  10. Clean up

Extract-Transform-Load (ETL)

Examples of data migration software

Expressor Studio

Designed to provide immediate business value to analysts and ETL experts alike, Expressor Studio is an easy-to-use, modern, metadata-driven ETL tool that offers a simplified approach to data integration—ranging from small ad-hoc ETL tasks to complex data integration projects.

You can download, install, build, and run your first Expressor Studio application in minutes. Studio provides excellent out-of-the-box connectivity to flat files and all major database systems—and offers built-in data cleansing and validation support. You can configure common ETL operations using Studio’s drag and drop interface, and you can extend applications with expressor DatascriptTM—a powerful and flexible scripting language. Moreover, expressor Studio enables reuse of data objects, design artifacts, and business rules—and seamlessly integrates with the server components of the expressor platform.

 

Talend

Talend, the Open Source Integration Software Company, provides open source middleware solutions that enable organizations to gain more value from their applications, systems and databases. Shattering the traditional proprietary model, Talend democratizes the integration market by providing enterprise-grade open source technologies that cover both the data integration and application integration needs of organizations of all sizes.

Talend’s unified integration platform addresses projects such as data integration, ETL, data quality, master data management and application integration. With their proven performance, ease of use, extensibility and robustness, Talend’s solutions are the most widely used and deployed integration solutions in the world.

 

These programs make your life easier. A senor engineer wrote to me after I inquired about Talend:

This is the tool we use to move data from feed format in to our central database.  It’s basically a “game engine” for data, meaning it has lots of components in a GUI format that represent hundreds or thousands of lines of code.

Leave a Reply

Your email address will not be published. Required fields are marked *