Extract, transform, load (ETL) becomes data integration
In a landscape dominated by data, it can be hard for businesses to keep up.
Right now, your business is receiving data from many different platforms. These platforms may include the following:
- Databases / data warehouses
- Software as a Service (SaaS) applications
- Business intelligence (BI)
- Web and mobile analytics
- The internet of things (IoT)
At first, your IT experts may have been able to handle all your data. But the speed of business has increased, and IT is busier than ever. They have more urgent tasks than changing scripts every time a new data source appears. And your business analysts can’t wait for IT to change the scripts anyway.
Enter data integration tools. At first, these tools were referred to as ETL tools. Now, software vendors are shedding the ETL term as they adapt to modern demands for increased extensibility, self-sufficiency, simplicity, speed, compatibility, and efficiency. Instead of searching for an ETL tool, you might search for “data integration,” “data plumbing,” or “data flow management.”
Don’t completely rule anything labeled ETL as outdated, though. Many great vendors do still use the term ETL. And many other vendors have simply found synonyms for each of the ETL steps.
Check out some of the alternative terms major vendors use when they describe what their product does:
- Extract → collect, connect, merge, import, sensor, consume, access, aggregate
- Transform → process, combine, cleanse, structure, customize, filter, integrate, synchronize
- Load → enrich, aggregate, publish, transform, manage, deploy, join, deliver, migrate
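Whatever the vendor calls them, the three steps stay the same underneath. Here is a minimal sketch of them in Python, assuming a small in-memory CSV export as the source and an in-memory SQLite database standing in for the data warehouse. The sample data, the `customers` table, and the function names are all made up for illustration and don't come from any particular tool.

```python
import csv
import io
import sqlite3

# Hypothetical sample export; a real source would be a file, API, or database.
RAW_CSV = """email,plan,monthly_fee
 Ada@Example.com ,pro,49.00
bob@example.com,free,0
,pro,49.00
"""

def extract(text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cleanse emails, drop rows without one, cast fees to numbers."""
    cleaned = []
    for row in rows:
        email = row["email"].strip().lower()
        if not email:
            continue  # skip records that cannot be identified
        cleaned.append((email, row["plan"], float(row["monthly_fee"])))
    return cleaned

def load(rows, connection):
    """Load: write the cleaned rows into the target table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(email TEXT, plan TEXT, monthly_fee REAL)"
    )
    connection.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    connection.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(extract(RAW_CSV)), warehouse)

loaded = warehouse.execute(
    "SELECT email, plan, monthly_fee FROM customers ORDER BY email"
).fetchall()
print(loaded)
```

Two of the three sample rows end up in the table; the row without an email is dropped during the transform step.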
Marks of great data integration tools
Many in IT, business analytics, and data science have had bad experiences with traditional ETLs. They have been slow, inflexible, and static. So, it makes sense that modern ETL tools don’t want to be associated with the bad memories of yesterday. For now, though, we’ll keep using ETL and data integration interchangeably.
You may be looking for a tool that can handle all of your data streams or just a specific set of data. Either way, consider how well it fulfills the following ideals—and I’ll be honest: at this point, it’s unlikely that you’ll find a data integration tool with all these characteristics. It’s also unlikely that you truly need every characteristic. So, read through them and choose the few that would benefit your business most. Then, your search for a great ETL / data integration tool will be less overwhelming.
If your company uses a data warehouse, whether in the cloud or on-premises, you’ll want to make sure your data integration tool is extensible. For modern ETLs, this means a lot. As with many traditional ETLs, your modern ETL should handle heterogeneous data streams from new applications. More than that, you shouldn’t have to recode data flows for every new data stream or pipeline. A modern ETL may come with basic data flows already coded for new applications. Or, it may be able to apply data flows you’ve already created to the new data stream or pipeline.
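To picture what reusing a data flow could look like, here is a rough sketch, assuming a flow is just an ordered list of small functions. The CRM and billing sources, their field names, and helpers such as `rename_fields` and `run_flow` are hypothetical. The point is only that the new source needs a field mapping, not a rewritten flow.

```python
def rename_fields(mapping):
    """Build a step that renames source-specific fields to shared names."""
    def step(record):
        return {mapping.get(key, key): value for key, value in record.items()}
    return step

def normalize_email(record):
    """Shared step: trim and lowercase the email field."""
    return {**record, "email": record["email"].strip().lower()}

def run_flow(records, steps):
    """Apply each step, in order, to every record."""
    result = []
    for record in records:
        for step in steps:
            record = step(record)
        result.append(record)
    return result

# Steps written once and shared by every source.
SHARED_STEPS = [normalize_email]

# Flow originally built for a (hypothetical) CRM export.
crm_records = [{"email": " Ada@Example.com ", "name": "Ada"}]
crm_flow = [rename_fields({})] + SHARED_STEPS

# A new (hypothetical) billing source only needs a field mapping.
billing_records = [{"EmailAddress": "BOB@example.com", "FullName": "Bob"}]
billing_flow = [
    rename_fields({"EmailAddress": "email", "FullName": "name"})
] + SHARED_STEPS

crm_out = run_flow(crm_records, crm_flow)
billing_out = run_flow(billing_records, billing_flow)
print(crm_out)
print(billing_out)
```

Both sources come out with the same field names and the same cleaned email format, even though only the mapping differs between them.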
Note: if you’re using a data integration tool within your data hub (e.g., SnapLogic within Hadoop), you might not be quite as concerned with extensibility.
One big goal of modern ETLs is to cut the amount of maintenance, updates, and upgrades you need. This means data integration tools are becoming more self-sufficient in a couple of ways.
- The amount of support you need from the vendor will decrease.
- The amount of supervision IT has to do will decrease.
These two are interrelated, but they’re distinct enough that it’s important to recognize both.
Many modern ETLs can detect schema changes and data errors. They’ll alert you and even format or fix the problems to varying degrees. User-friendly graphical interfaces and codeless environments make it easier for everyday users to fix problems. And perks like built-in schemas mean IT teams will spend less time solving errors, too.
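As an illustration of what detecting a schema change can mean in practice, here is a small sketch that compares an incoming record with the schema the pipeline expects. The expected schema, the sample record, and the `detect_schema_changes` function are assumptions made up for this example; real tools will do this differently and in far more depth.

```python
# Hypothetical schema the pipeline was built against.
EXPECTED_SCHEMA = {"id": int, "email": str, "amount": float}

def detect_schema_changes(record, expected):
    """Compare one record to the expected schema and describe any differences."""
    issues = []
    for field in expected:
        if field not in record:
            issues.append(f"missing field: {field}")
    for field in record:
        if field not in expected:
            issues.append(f"new field: {field}")
    for field, expected_type in expected.items():
        if field in record and not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            issues.append(
                f"type change: {field} is {actual}, expected {expected_type.__name__}"
            )
    return issues

# The source started sending amounts as text and added a coupon field.
incoming = {"id": 7, "email": "a@b.com", "amount": "19.99", "coupon": "SPRING"}
alerts = detect_schema_changes(incoming, EXPECTED_SCHEMA)
for alert in alerts:
    print("ALERT:", alert)
```

A tool could then raise these as alerts, or try to fix them automatically, for instance by casting `"19.99"` back to a number.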
With real-time monitoring and alerts, you’ll spend less time supervising the tool and more time analyzing important data.
In the name of reducing IT bottlenecks, modern ETLs aim to be installable and usable by end users. This is especially helpful for startup companies that don’t even have an IT team yet. Data integration tools come with already developed point-and-click interfaces. Frequently, the vendors will have a list of what applications they can immediately integrate, so you’ll know you’re picking the right tool. You might even be able to request integration with a new application if you don’t see what you need and aren’t able to write the integration yourself.
Basic data streams into a data warehouse can be reduced to a couple of steps to get the tool up and running. Even streams between applications are easy to create because of graphical interfaces and portable, reusable code.
To make the best data-driven decisions, you may need analytics available as close to real-time as possible. Companies handling big data will especially feel the pressure. Today, it’s rare that businesses can afford to wait for batch processes to run, yet many still have to because of antiquated systems. As much as possible, data integration tools need to keep latency low. Luckily, and this is especially true with cloud environments, modern ETLs have a range of update options. Update frequency might range from daily to near real-time, where the ETL fetches new data as soon as it appears.
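One common way to fetch only what is new is to remember a "high-water mark" and ask the source for anything updated after it. The sketch below assumes each source row carries an `updated_at` timestamp. The rows and the `fetch_new_rows` function are hypothetical, and how often you poll (daily or every few seconds) is a separate scheduling choice.

```python
from datetime import datetime

def fetch_new_rows(source_rows, watermark):
    """Return rows updated after the watermark, plus the advanced watermark."""
    new_rows = [row for row in source_rows if row["updated_at"] > watermark]
    if new_rows:
        watermark = max(row["updated_at"] for row in new_rows)
    return new_rows, watermark

# Hypothetical source table with an update timestamp on every row.
source = [
    {"id": 1, "updated_at": datetime(2016, 5, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2016, 5, 1, 9, 5)},
]

watermark = datetime.min  # nothing has been fetched yet
first_batch, watermark = fetch_new_rows(source, watermark)

# A new row arrives in the source between polls.
source.append({"id": 3, "updated_at": datetime(2016, 5, 1, 9, 7)})
second_batch, watermark = fetch_new_rows(source, watermark)

# Polling again with no new data returns nothing.
third_batch, watermark = fetch_new_rows(source, watermark)

print(len(first_batch), len(second_batch), len(third_batch))
```

Because each poll moves only the rows that changed, you can run it far more often than a full batch reload.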
On top of near real-time processing, you can deploy modern ETLs faster than ever. A simple application-to-warehouse pipeline installation could take under 5 minutes. While this may seem a bit ambitious, even the comparatively less extravagant claims of an hour far outshine traditional ETL’s lengthy installation process.
If you have a cloud data warehouse, you probably don’t need to choose an ETL that’s compatible across numerous outputs. But, if you’re looking to use your ETL in a less traditional sense, you’ll want to check which inputs and outputs vendors support.
Most vendors support nearly endless inputs. Heterogeneous sources are no problem. And, some vendors have tools that integrate both structured and unstructured data—instead of the traditional point-to-point loading and source-to-target mappings.
Not all vendors support unlimited outputs. Vendors tend to fall into one of two camps:
- Load data only onto cloud data warehouses (e.g., Amazon Redshift, Microsoft Azure, or HP Vertica)
- Load data to as many platforms as possible (e.g., other applications, data hubs, and on-premises data warehouses)
There are also vendors that offer their own data warehouses and analytics tools. With them, you’re purchasing a comprehensive BI suite instead of a single tool. These vendors seem to be less attractive to the masses because they lock you into one vendor’s software.
Today, the high-level goal of ETLs is to improve operational efficiency. With low-latency ETL processes, data scientists can analyze data more quickly, and management can base business decisions on up-to-date information.
Another aspect of modern ETL efficiency hinges on their ability to eliminate data silos. SaaS applications don’t naturally communicate with each other, your data warehouse, or your data hub. When there isn’t communication, silos appear. ETLs break through these communication barriers and get rid of SaaS silos because they’re able to ingest many heterogeneous sources.
Starting your search for a great data integration / ETL tool
Try these searches and websites
While it’s nice to have a vendor in mind right away, not everyone has that luxury. Here’s a list of websites and searches that can help kick-start your process.
- Gartner Magic Quadrant: You’re probably quite familiar with Gartner Magic Quadrants. Well, the 2016 Magic Quadrant for data integration tools names several leading, well-established vendors.
- Data warehouse: If you’re looking for a tool that focuses on loading to a specific data warehouse, you might try checking the data warehouse’s site for suggestions. For example, Amazon Redshift has a list of data integration partners.
- Google search: When you don’t know where else to start, turn to Google, right? You could try a straight-up Google search for the best data integration tools. Or, narrow it down a bit by searching for data integration tools plus a couple of the key features you want in the tool. For example, you could search “data integration tools compatible with Salesforce.”
Check out these 34 leaders and promising up-and-comers
Note: these aren’t in any particular order.
- Bryte Systems
- Treasure Data
- Information Builders
- ETL Solutions