At Consortech, we work with a wide variety of platforms, products and formats. But regardless of the context, most ETL projects fall into one of these categories:
- Ad hoc: An ETL process is developed to prepare a data package for a specific application or analysis. Some of these processes are occasionally rerun.
- Migration: A set of processes is prepared to transfer data from one system to another. To confirm that everything works correctly and without errors, these processes are rehearsed several times before the final run.
- Automation or integration: ETL processes are developed to run repeatedly.
In all of these cases, certain situations can significantly drive up costs. Here's a quick look at the five we encounter most often.
1. LACK OF PLANNING
This classic mistake can happen in any of the project types mentioned above. Planning and architecting an ETL project lets you visualize it and see where it could be made more efficient. Skipping this step can leave you with redundant processes or scripts that make the project more burdensome, so don't skip it.
From a broader perspective, it also pays to define an overall strategy built on reusable processes. Although generic processes take longer to develop, they make data integration more cost-effective across projects.
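To make the idea of a reusable process more concrete, here is a minimal Python sketch built on our own assumptions: a generic transform step (`run_mapping`) is written once, and each new integration only supplies a configuration object (`MappingSpec`). Both names, the CSV source and its column names are hypothetical and chosen for illustration; a real pipeline would also need extract and load steps, logging and error handling.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class MappingSpec:
    """Hypothetical per-project configuration: the only part that changes between integrations."""
    rename: Dict[str, str]  # source column -> target column
    transforms: Dict[str, Callable[[str], object]] = field(default_factory=dict)  # target column -> converter
    required: List[str] = field(default_factory=list)  # target columns that must not be empty


def run_mapping(rows: Iterable[Dict[str, str]], spec: MappingSpec) -> List[dict]:
    """Generic transform step, written once and reused across projects."""
    output = []
    for row in rows:
        # Rename source columns to target columns (missing columns become "").
        record = {target: row.get(source, "") for source, target in spec.rename.items()}
        # Drop rows that lack a required value.
        if any(not record.get(col) for col in spec.required):
            continue
        # Apply per-column type conversions.
        for col, convert in spec.transforms.items():
            record[col] = convert(record[col])
        output.append(record)
    return output


# Example: one project's source file, described by a spec instead of a new script.
source = io.StringIO("CustomerID,Name,Total\n1,Alpha,10.5\n2,,3\n3,Gamma,7\n")
spec = MappingSpec(
    rename={"CustomerID": "id", "Name": "name", "Total": "amount"},
    transforms={"id": int, "amount": float},
    required=["name"],
)
result = run_mapping(csv.DictReader(source), spec)
print(result)
# [{'id': 1, 'name': 'Alpha', 'amount': 10.5}, {'id': 3, 'name': 'Gamma', 'amount': 7.0}]
```

The extra up-front effort goes into designing the spec format; after that, a second source with different column names needs a new `MappingSpec`, not a new script.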
Recently, a client sent us a diagram made in Visio so we could turn it into a new ETL process. Instead of merely fulfilling their request, we called a meeting with the various stakeholders to better understand the objectives. Following these discussions, we were able to cut about 50% of the planned processes before we even got started on the solution.
2. LACK OF DOCUMENTATION OR COMPLEX EXISTING PROCESSES
Over time, several ETL processes may have been implemented to meet various needs. As a result, the same data may be affected by many different mechanisms. If those mechanisms weren't documented when they were built, it's hard to come in after the fact, retrace their history and understand their purpose. Situations like this can add extra workdays or require several rounds of updates.
3. UNCLEAR EXPECTATIONS AND REQUIREMENTS
What is the actual scope of the project? Should all the data be transferred? Does the process stop at a certain point, after which a file must be imported manually?
It’s not enough to just know that systems A and B need to be integrated. You need to establish guidelines and know where integration begins and ends. Sometimes, technological barriers will prevent full integration, and it’s crucial to know the scope of the project in order to assess it properly.
4. POOR COMMUNICATION WITH VENDORS
It’s easy to say that the data is in an SQL database… but how can the team make any progress if the structure of that database is inaccessible or undocumented?
Collaboration with vendors or the person in charge of the database is key to a project's success. And since some companies have to pay just to access their own data, project costs are inevitably affected.
Fortunately, more and more products are offering APIs and REST endpoints that let you connect each piece of the puzzle like Lego blocks!
5. INCOMPLETE OR POORLY DOCUMENTED APIS
APIs are not always easy to use. Large companies document them extensively and even add examples, but smaller vendors don’t always provide the same quality of information.
If the web service or returned information isn’t clearly documented, it can be painstaking to figure out how to connect to the data or formulate an API call. You’ll sometimes have to call upon the vendor directly for answers, which delays the project.
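When the documentation leaves the response format unclear, one way to reduce guesswork is to capture a sample response and catalogue its structure before writing any mapping. The Python sketch below assumes the response body is JSON and has already been retrieved; the `describe` helper and the sample payload are hypothetical, and a single sample can only reveal the fields it happens to contain.

```python
import json
from typing import Any, Dict, Optional, Set


def describe(value: Any, path: str = "$", shape: Optional[Dict[str, Set[str]]] = None) -> Dict[str, Set[str]]:
    """Record the Python type name(s), after json.loads, seen at each path of a sample response."""
    if shape is None:
        shape = {}
    shape.setdefault(path, set()).add(type(value).__name__)
    if isinstance(value, dict):
        for key, child in value.items():
            describe(child, f"{path}.{key}", shape)
    elif isinstance(value, list):
        # All list elements share one path, so differences between items show up as extra types.
        for child in value:
            describe(child, f"{path}[]", shape)
    return shape


# Hypothetical payload standing in for a response from a poorly documented endpoint.
sample = json.loads('{"count": 2, "items": [{"id": 1, "name": "A"}, {"id": 2, "name": null}]}')
shape = describe(sample)
for path, kinds in shape.items():
    print(path, sorted(kinds))
# $ ['dict']
# $.count ['int']
# $.items ['list']
# $.items[] ['dict']
# $.items[].id ['int']
# $.items[].name ['NoneType', 'str']
```

Here the catalogue shows that `name` can be null, a detail that sparse documentation may not mention and that would otherwise surface later as a failed load.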
As you can see, several elements can influence the complexity, duration and costs of an ETL project. Fortunately, our experience lets us anticipate these kinds of situations and carry out your projects on schedule!
Want to start an ETL project?