
Leveraging spatial ETL in an enterprise GIS maturity model

One of the approaches organizations take to make data accessible and usable is ETL: extracting data from one system, transforming it, and loading it into another. This efficient process is commonly used by professionals who handle data, spatial or not, sometimes without their even knowing it. Although it may seem implicit at times, ETL has matured, and the way it is applied within organizations has changed dramatically. It has come a long way in the last twenty years, from a task performed by a small group of insiders to widespread consumer access. Here's how that change happened, in five stages based on the enterprise GIS maturity model developed by Even Keel Strategies.

Stage 1: Introducing a geographic information system

In this first stage, only a few users are needed to process data. The use of geomatics tools and processing technologies is the domain of a handful of employees within a single department. Knowledge transfers are rare to nonexistent, and information is shared among insiders. Requests for data transfers come mainly from within the department itself and arise from specific needs. As the primary need is to improve the GIS (geographic information system) by creating content or importing data from outside suppliers, applying the ETL approach is fairly simple: it is mainly limited to developing conversion scripts and changing map projection systems.
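
At this stage, a typical conversion script might do little more than reproject coordinates from one map projection system to another. As a minimal sketch (not from the original article), here is what that looks like in Python with the pyproj library; the EPSG codes and sample point are illustrative assumptions:

```python
# Minimal reprojection sketch: convert WGS 84 (EPSG:4326) coordinates
# to NAD83 / MTM zone 8 (EPSG:32188), a projection used around Montreal.
# The sample coordinates below are illustrative.
from pyproj import Transformer

# always_xy=True keeps the (longitude, latitude) axis order predictable.
transformer = Transformer.from_crs("EPSG:4326", "EPSG:32188", always_xy=True)

lon, lat = -73.47, 45.45  # a point near Brossard, QC (illustrative)
x, y = transformer.transform(lon, lat)
print(f"Projected coordinates: x={x:.1f} m, y={y:.1f} m")
```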

Stage 2: Installing a GIS at the department level

The importance of "experts" in spatial applications came to be recognized within organizations, and requests related to geolocated data were centralized with those few individuals. Most of the time, geomatics experts were asked to produce a printed map or provide extracts of GIS data for very specific needs. The requesters had to be somewhat forgiving during this stage, as the data they received was delivered as-is. In some cases, they had to rearrange it themselves in order for it to meet their needs.

During this stage, there was also little communication or discussion of best practices beyond the realm of each department. The reason was simple: Data access and property rights were still highly restrictive. What role did the ETL process play at this stage? Specialists were able to perform conversions and generate spatial data, but that data was still not being used to its full potential. Because requests to produce data were still made on an as-needed basis for single uses, there was redundancy in the conversion methods. Processes for automating the transformation and validation of data had not yet been put in place. Validation was therefore done manually, and consistent quality could not be guaranteed from one delivery to the next. At this stage, the greatest barriers to the use of spatial data were the silo mentality and the difficulty of developing data integration systems.


Stage 3: Harmonizing the GIS with the organization as a whole

Although it was still geomatics specialists who were performing ETL, it became increasingly common for departments to exchange practices, and a sort of harmonization in data warehousing took shape. Integration practices became standard, and the quality of the integrated data improved considerably. The concept of data ownership gradually established itself. Applying ETL best practices became the way to meet these new requirements. For example, by standardizing data schemas and naming conventions, it became possible to create tasks (or "workspaces," as FME experts call them) that could convert, validate, and transform source data so that it met the set standards. Additionally, server-based geomatics applications started being developed, and GIS portals offering self-service tools began to appear. These applications democratized the use of spatial data and allowed users who weren't geomatics specialists to obtain spatial data without having to create it themselves. Obviously, these tasks could not be completed effectively without an automated ETL process running in the background and reliable conversion tools.
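
To make this concrete, here is a hedged sketch of what such a task might look like as a plain Python script: it extracts records from a source CSV, validates required fields, renames columns to a standard naming convention, and loads the result as GeoJSON. The file names, column names, and schema are illustrative assumptions, not a real FME workspace.

```python
import csv
import json

# Illustrative standard schema: source column name -> standardized name.
SCHEMA = {"ID_SITE": "site_id", "NOM": "name", "LONG": "longitude", "LAT": "latitude"}

def etl_task(source_csv: str, target_geojson: str) -> None:
    features = []
    with open(source_csv, newline="", encoding="utf-8") as src:
        for row in csv.DictReader(src):
            # Validate: reject records missing any required field.
            if not all((row.get(col) or "").strip() for col in SCHEMA):
                continue
            # Transform: rename columns to the standard naming convention.
            props = {std: row[src_col] for src_col, std in SCHEMA.items()}
            try:
                lon = float(props.pop("longitude"))
                lat = float(props.pop("latitude"))
            except ValueError:
                continue  # reject records with non-numeric coordinates
            features.append({
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": props,
            })
    # Load: write the validated, standardized data to the target store.
    with open(target_geojson, "w", encoding="utf-8") as dst:
        json.dump({"type": "FeatureCollection", "features": features}, dst)

etl_task("sites_source.csv", "sites_standard.geojson")  # illustrative file names
```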


Stage 4: Automation for the masses

During the previous stage, we saw that data transformation and conversion tools had been put in place to feed databases. By setting up data catalogs, geomatics experts gained the ability to let internal users download spatial data based on certain criteria, like the type of data or its geographic scope. The ETL approach allowed geomatics departments to offer this option to other members of their organization. Breaking down, converting, and warehousing data on an internally offered FTP site, for example, were practices that encouraged exchanges. At this stage, those processes could be automated to run periodically, marking the end of the cycle of redundancy! With tools like FME, it became possible to give a spatial component to data that had not originally been spatial, which improved sharing between departments. The two-way data flow also improved the quality of the data, because it was seen and analyzed by more people. Geomatics experts, who until then had been the sole stewards of the data, saw their work opened up to the masses. They sometimes had to face criticism and were asked to revise their delivered work.
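
As a simple illustration of giving a spatial component to data that was not originally spatial, the sketch below joins a plain table of records to a reference table of coordinates using a shared key. All record contents and key names are hypothetical:

```python
# Hypothetical example: service requests recorded without coordinates,
# joined to a reference table of facility locations by facility ID.
requests = [
    {"request_id": 101, "facility": "ARENA-01", "issue": "lighting"},
    {"request_id": 102, "facility": "PARK-07", "issue": "signage"},
]
facility_coords = {
    "ARENA-01": (-73.46, 45.44),  # (longitude, latitude), illustrative
    "PARK-07": (-73.49, 45.47),
}

# The join gives each non-spatial record a spatial component.
geolocated = []
for req in requests:
    lon, lat = facility_coords[req["facility"]]
    geolocated.append({**req, "longitude": lon, "latitude": lat})

print(geolocated)
```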

The power of the conversion tools became a major concern, because professionals had to work with data from many different sources: Internal or external FTP sites, GTFS public transit feeds, and charging station data, to give just a few examples. The possibilities were nearly endless. All of this data had a spatial component, however, which had to be extracted.
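
GTFS is a convenient example of this, because its spatial component sits in plain text: the stops.txt file of a GTFS feed carries stop_lat and stop_lon fields defined by the specification. Here is a minimal extraction sketch, with an illustrative feed path:

```python
import csv

# stops.txt is a standard GTFS file; stop_id, stop_name, stop_lat and
# stop_lon are fields defined by the GTFS specification.
stops = []
with open("gtfs/stops.txt", newline="", encoding="utf-8") as f:  # illustrative path
    for row in csv.DictReader(f):
        stops.append({
            "id": row["stop_id"],
            "name": row["stop_name"],
            "lon": float(row["stop_lon"]),
            "lat": float(row["stop_lat"]),
        })

print(f"Extracted {len(stops)} transit stops with coordinates")
```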


Stage 5: Communicating data to the public

With the 2010s came the concepts of social acceptability and open data. The public's desire to be better informed and the arrival of consumer-focused mapping technologies like Google Maps and Microsoft Bing meant that geolocated data was no longer solely for experts. Today, public organizations need to use ETL in order to offer a variety of data in the form most appropriate for the general public, while still protecting some sensitive data. Offering data is one thing. The bigger challenge in the years ahead will be collecting it, because it will come not just from the general public, but also from a large number of Internet-connected sensors: the Internet of Things. The vast quantity of information generated by these objects will have to be gathered, transformed, and quality-checked in order to provide systems with data appropriate to the needs of each individual.
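
Protecting sensitive data while publishing the rest is itself an ETL transformation. As a hedged sketch, the example below strips hypothetical sensitive attributes from a GeoJSON feature before it is released as open data; the field names and values are assumptions for illustration only:

```python
# Illustrative: drop sensitive attributes before publishing open data.
SENSITIVE_FIELDS = {"owner_name", "phone", "internal_notes"}  # hypothetical names

def sanitize_feature(feature: dict) -> dict:
    # Keep geometry intact; publish only the non-sensitive properties.
    props = {k: v for k, v in feature["properties"].items()
             if k not in SENSITIVE_FIELDS}
    return {**feature, "properties": props}

feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-73.47, 45.45]},
    "properties": {"site_id": "S-12", "owner_name": "J. Tremblay", "status": "active"},
}
print(sanitize_feature(feature))  # owner_name is removed before release
```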

What about the next ten years?

The world we live in is immersed in data and information, and this trend will only strengthen in the coming years. Whether it's to find the best location for a new real estate development or to choose the best vegetarian restaurant within a 30-kilometre radius in a foreign city, companies and individuals will increasingly rely on the wealth of available data to make the right decisions. ETL and its many applications will remain essential to ensuring that this data flows smoothly. In addition to mastering each new format as it appears, developers of ETL solutions will have to exercise a degree of control: their challenge will be to coax the most relevant data from the mass they receive while preserving its confidentiality.

Want to improve spatial data integration with your organization's systems?

Contact us!

Consortech ETL/GIS Team

