Migrating data from one system to another is no cakewalk! Migration is an essential part of implementing or updating software. In many cases, it follows these steps:
- Project scoping and stakeholder engagement
- Data analysis and mapping
- Migration planning and preparations
- Test run of migration in development environment
- Implementation in production environment
- Data validation and testing
- Post-migration support and project sign-off
Ensuring data quality during a migration (for spatial and/or non-spatial data) can definitely be a challenge! All of these steps are important, but as the focus of this blog is data quality, we’ll drill down to the steps that deal with data analysis, mapping and validation. This means we focus on:
- What we have: Data analysis
- What we want: Data mapping
- What we end up with: Data validation
Start by understanding the current data
Data analysis is a frequently overlooked part of data migration. Many people in an organization know less about their data's content and structure than they think: their understanding of what the data actually contains, and of its quality, is often limited, and sometimes simply wrong.
At this step, the danger lies in letting these assumptions guide your decision making. You may, for example, think that a particular field is always populated, when in fact there are several missing values that someone forgot to enter years ago! Or you may believe that certain data is always formatted in a particular way, when that is not the case across the board. These misunderstandings can lead to some nasty surprises or lost data down the line!
One of our strategies to ensure that we have a complete understanding of the data we are working with is to use ETL tools, such as FME, to scan, analyze and classify the data. This gives us the information we need about what is truly there, so that we can move on to the next step guided by concrete reality rather than vague ideas.
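To make the idea concrete, a first profiling pass can be as simple as a script that tallies missing values and distinct value formats per field. Here's a minimal sketch in plain Python (the field names and sample records are hypothetical, purely for illustration):

```python
import re
from collections import Counter

def profile(rows, fields):
    """Tally missing values and distinct value 'shapes' for each field."""
    report = {f: {"missing": 0, "formats": Counter()} for f in fields}
    for row in rows:
        for f in fields:
            value = (row.get(f) or "").strip()
            if not value:
                report[f]["missing"] += 1
            else:
                # Classify the rough shape of the value: digits -> 9, letters -> A
                shape = re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))
                report[f]["formats"][shape] += 1
    return report

# Hypothetical sample: a parcel table with an inconsistently coded zone field
rows = [
    {"parcel_id": "1001", "zone": "R-1"},
    {"parcel_id": "1002", "zone": "r1"},   # same meaning, different format
    {"parcel_id": "1003", "zone": ""},     # missing value
]
report = profile(rows, ["parcel_id", "zone"])
print(report["zone"]["missing"])           # 1 missing value
print(len(report["zone"]["formats"]))      # 2 distinct formats
```

A report like this surfaces exactly the surprises described above: values you assumed were always present or consistently formatted, but aren't.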
The bottom line is that analyzing current data is time well spent to ensure that the rest of the steps go smoothly. It may be time consuming, but it’s well worth it.
It’s also important to mention that this is not a purely technical step: a “human” understanding of the data and how it is used matters just as much. You can’t just put someone behind a computer and say, “Now analyze this!” You almost always need to talk to people and gather information from different sources in order to get a solid grasp of the data.
Bridging the old and the new
Once we understand the current data, we can use this analysis and classification to help map it to what we need from the new system.
This step usually involves data transformation—whether it is a structural transformation (changing field names or data types) or a content transformation (changing attribute values or geometry types). Using clear workflow diagrams and mapping spreadsheets to plan these transformations helps ensure that no data is lost along the way.
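The mapping spreadsheet can translate almost directly into code. As a sketch, assuming hypothetical field names, each old field maps to a new name plus a conversion function, covering both structural changes (renames) and content changes (unit conversion, normalized codes):

```python
# Hypothetical mapping: old field name -> (new field name, conversion function)
FIELD_MAP = {
    "PARCEL_NO": ("parcel_id", str),
    "AREA_SQFT": ("area_m2", lambda v: round(float(v) * 0.092903, 2)),  # content change: sq ft -> m2
    "ZONING_CD": ("zone", str.upper),                                   # content change: normalize case
}

def migrate_record(old):
    """Apply the structural and content transformations to one record."""
    new = {}
    for old_name, (new_name, convert) in FIELD_MAP.items():
        if old_name in old:
            new[new_name] = convert(old[old_name])
    return new

record = migrate_record({"PARCEL_NO": "1001", "AREA_SQFT": "10000", "ZONING_CD": "r-1"})
print(record)  # {'parcel_id': '1001', 'area_m2': 929.03, 'zone': 'R-1'}
```

Keeping the mapping in one table, rather than scattered through the workflow, makes it easy to review with stakeholders and to spot fields that have no destination in the new system.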
Structural changes are part and parcel of data migration. It’s only natural, as we are moving the data into a new framework. Not only that, but the new applications often have different requirements, calling for more precise information, which in turn creates the need for content changes. In that case, you need to analyze the data and plan ahead for a transformation.
It takes specialized expertise to ensure that data is optimally restructured, obsolete information is removed and new types of information are added to match the new system’s specifications.
Quality assurance: a crucial step
Once migration has been completed, it is important to carry out strict quality control procedures to validate the end result. ETL tools can help you analyze and compare the data in order to spot any anomalies, inconsistencies or missing elements.
It can be difficult to validate whether the end product is truly what you aimed for and whether the data made it through in good condition. Some things you can check are record counts, data types, geometry validity, and so on. Many of these checks can be scripted in FME—for both non-spatial and spatial data.
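As a sketch of what such checks might look like outside of FME, here is a hypothetical Python validation pass over migrated records: it compares record counts, verifies expected types, and applies a basic geometry sanity check (a polygon ring needs at least four coordinate pairs, with the first equal to the last):

```python
def validate(source_rows, target_rows, expected_types):
    """Return a list of human-readable problems found after migration."""
    problems = []
    # 1. Record counts should match between source and target
    if len(source_rows) != len(target_rows):
        problems.append(f"count mismatch: {len(source_rows)} source vs {len(target_rows)} target")
    # 2. Every mapped field should hold the expected type
    for i, row in enumerate(target_rows):
        for field, expected in expected_types.items():
            if field in row and not isinstance(row[field], expected):
                problems.append(f"row {i}: {field} is {type(row[field]).__name__}, expected {expected.__name__}")
    # 3. Basic geometry validity: a closed ring with at least 4 points
    for i, row in enumerate(target_rows):
        ring = row.get("geometry", [])
        if ring and (len(ring) < 4 or ring[0] != ring[-1]):
            problems.append(f"row {i}: ring not closed or too few points")
    return problems

target = [{"parcel_id": "1001", "area_m2": 929.03,
           "geometry": [(0, 0), (0, 1), (1, 1), (0, 0)]}]
print(validate([{}], target, {"parcel_id": str, "area_m2": float}))  # [] -> no problems found
```

Real geometry validation (self-intersections, ring orientation, and so on) is better left to a spatial library or to FME itself; the point is that each check produces a concrete, reviewable list of anomalies rather than a gut feeling that the migration "looks fine".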
There is a direct correlation between the quality of data post-migration and the time spent analyzing and understanding it beforehand.
One of the biggest challenges with migration is that people within an organization may not have a proper understanding of data content and structure. It’s not uncommon for people to mistakenly believe that their data is complete and consistent, when it is actually poorly organized and full of errors.
Human intervention is a key part of the process. When people ask the right questions and make the logical connections, the result is valid, sound and effective data sets. This is one of our greatest strengths here at Consortech.
Our team has carried out dozens of migrations in all kinds of contexts!