You have decided to launch an ETL automation project to make one or more of your data processing workflows more efficient. For the project to succeed, you need to plan for a range of scenarios. Here are three steps you can take to prepare your project.
1. VALIDATE THE DATA STRUCTURE
In an ETL project, you normally work with a source whose structure is not expected to change, but you should not take that for granted. Check each incoming data batch with a tool that inspects its structure upstream, before the rest of the process runs.
Whether you check the information format (field names and types, layer names, object composition) or the file format (name and extension), validating the structure is critical, as the slightest difference can cause the entire process to fail.
We recommend that you assess and document possible scenarios as exhaustively as possible during this step, as decisions must be made if differences from the standard format are detected.
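The structure check described above can be sketched in a few lines of Python. This is only an illustration, assuming a CSV source: the file extension, the field names (`parcel_id`, `owner`, `area_m2`) and their types are hypothetical, and a real project would substitute its own documented standard format.

```python
import csv
import io
from pathlib import Path

# Hypothetical expected structure for one source; names and types are illustrative.
EXPECTED_EXTENSION = ".csv"
EXPECTED_FIELDS = {"parcel_id": int, "owner": str, "area_m2": float}


def check_structure(file_name, text):
    """Return a list of structural problems; an empty list means the batch can proceed."""
    problems = []

    # File format: check the extension.
    suffix = Path(file_name).suffix.lower()
    if suffix != EXPECTED_EXTENSION:
        problems.append(f"unexpected extension: {suffix!r}")

    # Information format: check the field names.
    reader = csv.DictReader(io.StringIO(text))
    found = reader.fieldnames or []
    missing = [f for f in EXPECTED_FIELDS if f not in found]
    extra = [f for f in found if f not in EXPECTED_FIELDS]
    if missing:
        problems.append(f"missing fields: {missing}")
    if extra:
        problems.append(f"unexpected fields: {extra}")

    # Information format: check that each present value converts to the expected type.
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for field, caster in EXPECTED_FIELDS.items():
            value = row.get(field)
            if value is None:
                continue  # field absent from this row; already reported if missing from the header
            try:
                caster(value)
            except ValueError:
                problems.append(
                    f"line {line_no}: {field}={value!r} is not {caster.__name__}"
                )
    return problems
```

Returning a list of problems rather than stopping at the first one makes it easier to decide, case by case, whether a given difference should block the load or only be logged.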
2. ESTABLISH A VALIDATION PROCESS
After examining the format, look at the data itself and decide what to do with bad data. For an ETL process to run smoothly, errors must be caught at every stage, from the source to the destination.
It is essential to detect anomalies and to have a procedure that clearly sets out how errors are handled, since they cannot simply be ignored. There are many ways to deal with invalid data; the most common are producing reports and sending email notifications.
Here are a few questions to ask yourself to make sure nothing falls through the cracks:
- Which fields are the most error-prone?
- Can these errors be automatically corrected?
- If not, how will the user be able to correct them?
- How can I document what has not been loaded?
- What should be done if the load fails?
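One possible way to act on these questions is to correct what can safely be corrected, set aside the rest, and keep a record of every rejected row. The sketch below assumes rows arrive as dictionaries with hypothetical `owner` and `area_m2` fields; the correction rules (trimming whitespace, accepting a decimal comma) are examples, not a recommendation for every dataset.

```python
import csv
import io


def validate_row(row):
    """Return (clean_row, error); exactly one of the two is None."""
    owner = (row.get("owner") or "").strip()  # automatic correction: trim whitespace
    if not owner:
        return None, "owner is empty"

    # Automatic correction (assumed rule): accept a decimal comma.
    raw_area = (row.get("area_m2") or "").strip().replace(",", ".")
    try:
        area = float(raw_area)
    except ValueError:
        return None, f"area_m2 is not a number: {row.get('area_m2')!r}"
    if area <= 0:
        return None, f"area_m2 must be positive: {area}"

    return {"owner": owner, "area_m2": area}, None


def split_batch(rows):
    """Separate loadable rows from rejects, keeping the reason for each reject."""
    valid, rejects = [], []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        clean, error = validate_row(row)
        if error is None:
            valid.append(clean)
        else:
            rejects.append({"line": line_no, "error": error, **row})
    return valid, rejects


def reject_report(rejects):
    """Render rejects as CSV text that can be saved or attached to a notification."""
    if not rejects:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(rejects[0].keys()), extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rejects)
    return out.getvalue()
```

The report documents what was not loaded and why, so a user can fix the source data and resubmit only the rejected rows. How the report is delivered (file, email, dashboard) and what happens when the load itself fails are left out of this sketch.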
3. CONSIDER THE COMPANY’S OTHER NEEDS
The smaller the ETL process, the quicker it is to implement. In contrast, if you want to make it generic and process many inputs, it will take longer to develop. These efforts pay off, however, as the resulting process will be more flexible and sustainable for the company.
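To illustrate the trade-off, here is a minimal sketch of a more generic process in which the field mapping lives in configuration rather than in the code. The source names (`parcels`, `roads`), field names, and converters are all hypothetical; the point is only that supporting a new input means adding a configuration entry instead of writing a new process.

```python
import csv
import io

# Hypothetical per-source configuration: source field -> (target field, converter).
SOURCES = {
    "parcels": {"ID": ("parcel_id", int), "AREA": ("area_m2", float)},
    "roads": {"RoadName": ("name", str.strip), "Len_km": ("length_km", float)},
}


def transform(source_name, text):
    """Apply the mapping configured for source_name to CSV text."""
    mapping = SOURCES[source_name]
    return [
        {target: convert(row[field]) for field, (target, convert) in mapping.items()}
        for row in csv.DictReader(io.StringIO(text))
    ]
```

A single-purpose script would hard-code one of these mappings and be quicker to write; the configurable version takes more design effort up front but can serve several inputs.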
If you keep the big picture in mind, the project can be useful to other parts of the organization, so take the needs of the whole company into account when implementing an ETL process. Avoid working in silos, and consider how the logic you develop can be reused elsewhere. Doing so could lead to significant improvements in other departments.