The Geocoding Suite: Let’s Get Technical

March 1, 2018 Data Management Systems and MEL
Sebastian Dimunzio
News/Events, Open Data, Tech Stack

Development Gateway’s Geocoding Suite has several components, each working in tandem with aid and management information systems to assign precise geospatial data on the locations of development projects. We recently announced the addition of a lightweight, user-friendly automatic geocoding backend tool, aptly called the AutoGeocoder. If you read our last blog post, you’re familiar with some of its highlights, functions, and features. In today’s post, we’ll dive deeper into the tool’s inner workings, as well as into recent changes we’ve made to the open source Open Aid Geocoder tool.

Below, we begin with the AutoGeocoder:

AutoGeocoder

As mentioned in our last post, the AutoGeocoder tool reads through text provided in various document formats (PDF, DOC, TXT) to identify activity locations, and then produces a final list of georeferenced location names. The tool has been fully developed in Python 3, and combines well-known tools and libraries such as NLTK and scikit-learn.

In short, the AutoGeocoder is able to read a text-based document, split it into sentences, classify those sentences using a text classifier, filter classified sentences to include only project implementation-related text, generate a list of named entities from the selected sentences by querying the Stanford NER Server, and finally query the GeoNames API to retrieve the final project location information.

It is composed of three key elements:

  1. A supervised machine learning-based text classifier that detects which sections of the documents refer specifically to project implementation details;
  2. A Named Entity Recognizer (NER), currently provided by Stanford NER;
  3. A Gazetteer Service, allowing the tool to query geographic information, provided by GeoNames.
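To make the third element more concrete, here is a minimal sketch of a gazetteer lookup against the public GeoNames search web service (`searchJSON`, with its `q`, `maxRows`, `country` and `username` parameters). It is not the AutoGeocoder's own code: the function names and the shape of the returned dictionaries are assumptions made for illustration, and you need your own GeoNames username to run a live query.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Public GeoNames full-text search endpoint (JSON flavour).
GEONAMES_SEARCH_URL = "http://api.geonames.org/searchJSON"


def build_search_url(name: str, username: str,
                     country: Optional[str] = None, max_rows: int = 5) -> str:
    """Build a GeoNames search URL for a place name (helper name is hypothetical)."""
    params = {"q": name, "maxRows": max_rows, "username": username}
    if country:
        # Optional ISO 3166 country code to narrow the search.
        params["country"] = country
    return GEONAMES_SEARCH_URL + "?" + urlencode(params)


def parse_search_response(payload: str) -> List[Dict[str, object]]:
    """Reduce a searchJSON response to the fields a geocoder typically needs."""
    data = json.loads(payload)
    places = []
    # A response without a "geonames" list (for example an error) yields no places.
    for item in data.get("geonames", []):
        places.append({
            "geoname_id": item.get("geonameId"),
            "name": item.get("name"),
            "country_code": item.get("countryCode"),
            "lat": float(item["lat"]),   # GeoNames returns coordinates as strings
            "lng": float(item["lng"]),
        })
    return places


def search_places(name: str, username: str,
                  country: Optional[str] = None) -> List[Dict[str, object]]:
    """Run a live query; requires network access and a valid GeoNames username."""
    with urlopen(build_search_url(name, username, country), timeout=10) as response:
        return parse_search_response(response.read().decode("utf-8"))
```

Keeping URL building and response parsing separate from the network call makes the lookup easy to test offline and to swap for another gazetteer.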

The AutoGeocoder can be configured to run in three different modes: 

  1. Firstly, it can be run through the command line interface – which is the tool’s default mode. This mode is useful in extracting project locations from single text documents or from IATI XML files.
  2. With a bit more configuration, and setup of a PostgreSQL database, the user can interact with the tool through the Micro Web user interface. This allows users to upload files, autogeocode them, download the results, and track all sentences that were used as location sources.
  3. Finally, plugging the AutoGeocoder into the current Open Aid Geocoder database allows the user to send previously-imported activity records to the AutoGeocoder process queue, see which projects have already been autogeocoded, review locations on the map, and manually edit information.

Figure 1: the AutoGeocoder Command Line Interface – Geocoding a PDF file 


Figure 2: AutoGeocoder Full Process Workflow
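The workflow can also be read as a small pipeline: split, classify, filter, extract entities, look up. The sketch below is an illustration under stated assumptions, not the tool's actual implementation. The function names are hypothetical, the regex splitter is a naive stand-in for a proper sentence tokenizer, and the three callables stand in for the text classifier, the Stanford NER Server, and the GeoNames lookup.

```python
import re
from typing import Callable, Dict, List

# Hypothetical signatures for the three pluggable stages.
SentenceClassifier = Callable[[str], bool]         # True if the sentence describes implementation
EntityRecognizer = Callable[[str], List[str]]      # sentence -> candidate location names
Gazetteer = Callable[[str], List[Dict[str, str]]]  # location name -> matching places


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def autogeocode(text: str,
                is_implementation: SentenceClassifier,
                find_locations: EntityRecognizer,
                lookup: Gazetteer) -> List[Dict[str, str]]:
    """Return georeferenced places, each tagged with its source sentence."""
    results: List[Dict[str, str]] = []
    seen = set()
    for sentence in split_sentences(text):
        # Keep only sentences the classifier marks as implementation-related.
        if not is_implementation(sentence):
            continue
        for name in find_locations(sentence):
            if name in seen:
                continue  # query each distinct name only once
            seen.add(name)
            for place in lookup(name):
                results.append({"sentence": sentence, "query": name, **place})
    return results
```

Recording the source sentence alongside each place mirrors the tool's ability to track which sentences were used as location sources.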

 

Machine Learning

As mentioned, a key piece of the AutoGeocoder is the pre-trained text classifier. Using machine learning, it tells the tool whether a found location is actually the project implementation location or is an irrelevant location (such as the address of the donor’s headquarters). The default classifier has been trained with a small dataset, so it is recommended that users train their own text classifiers to achieve enhanced precision.

The AutoGeocoder provides a set of features useful for this task:

  • An IATI data downloader;
  • A text sample generator method, which can be called from the command line tool;
  • An interface to manually classify text samples;
  • A classifier trainer method – which generates a ready-to-use classifier that can be called from the command line tool.
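For readers who want a feel for what training such a classifier involves, here is a minimal scikit-learn sketch. It does not reproduce the AutoGeocoder's trainer: the choice of TF-IDF features with logistic regression, the label names, and the six hand-written sample sentences are all assumptions for illustration, and a real training set would need far more samples.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up, manually labelled sentences (illustrative only).
SAMPLES = [
    ("The project will be implemented in the districts of Gulu and Lira.", "implementation"),
    ("Field activities will take place in rural communities of Kayes region.", "implementation"),
    ("Wells will be constructed in villages across the northern province.", "implementation"),
    ("The donor headquarters are located in Washington.", "other"),
    ("The financing agreement was signed at the bank office in Paris.", "other"),
    ("The consulting firm has its registered address in London.", "other"),
]


def train_classifier(samples):
    """Fit a bag-of-words sentence classifier on (text, label) pairs."""
    texts = [text for text, _ in samples]
    labels = [label for _, label in samples]
    # TF-IDF turns each sentence into a weighted word vector;
    # logistic regression learns which words signal each label.
    model = make_pipeline(TfidfVectorizer(lowercase=True),
                          LogisticRegression(max_iter=1000))
    model.fit(texts, labels)
    return model


classifier = train_classifier(SAMPLES)
```

Once trained, `classifier.predict(["some sentence"])` returns one label per input sentence, which is the decision the pipeline needs in order to drop sentences about, say, a donor's head office.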

For further assistance in preparing a classifier, please see the installation guide here. Additionally, please find the AutoGeocoder GitHub repository here.

Open Aid Geocoder

Another recently-updated Geocoding Suite feature is the Open Aid Geocoder. Originally, this tool was designed to be plugged into an existing IMS, but in keeping pace with user demand, we have created an API and a simple database for the interface, allowing it to run as a standalone service. As mentioned in our last post, it now also integrates the AutoGeocoder and allows users to search for and add location information manually.

The Open Aid Geocoder is now composed of two main modules, the Geocoder API and the Geocoder UI:

Geocoder API

Entirely developed in Java and based on the Spring framework, this RESTful API provides the UI with backend support. The Geocoder API exposes a group of JSON HTTP endpoints, which allow the UI to import new project records, read the current list of projects, read full project information by providing a project identifier, and save edited records. A PostgreSQL database with the PostGIS extension stores geographic project information as well as Global Administrative Boundaries (GADM) records. These records allow the UI to render the administrative boundary layer of the project’s recipient country over the map, and then query administrative names while manually geocoding. We’ve developed the API to provide a default backend to the Open Aid Geocoder, allowing seamless usage of the tool as a standalone web service. For a better look at the Geocoder API, check out the GitHub repository here.
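Querying administrative names for a point on the map is, at heart, a point-in-polygon test. In the Geocoder API that work is delegated to PostGIS; the pure-Python ray-casting sketch below only illustrates the idea. The function names and the square "boundaries" used with it are made up, and real GADM polygons are far more detailed.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many polygon edges a ray going right from the point crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside


def find_admin_name(point: Point, boundaries: Dict[str, List[Point]]) -> Optional[str]:
    """Return the name of the first boundary containing the point, if any."""
    for name, polygon in boundaries.items():
        if point_in_polygon(point, polygon):
            return name
    return None
```

For example, with two adjacent square regions, `find_admin_name((1.0, 1.0), {"Region A": [(0, 0), (2, 0), (2, 2), (0, 2)], "Region B": [(2, 0), (4, 0), (4, 2), (2, 2)]})` picks out "Region A". A spatial database does the same job with indexes and exact geometries, which is why the API leaves it to PostGIS.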

Geocoder UI

The Geocoder User Interface (UI) has been built with React and Reflux, together with other popular libraries such as Leaflet, i18next and Bootstrap, and is bundled using Webpack. As the Geocoder UI is a pure JavaScript application, it can be integrated into any existing web-based system.

Originally released last year, the Geocoder UI has undergone a major refactoring to adapt to the recently-developed Geocoder API. We’ve also added new capabilities such as multilingual data entry support, integration with the AutoGeocoder, import and export of IATI 2.01 and 2.02 XML files, and a freshly designed interface. For a better look at the Geocoder UI, see the corresponding GitHub repository here, and take a look at the User Guide here.
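As a rough idea of what importing IATI XML involves, the sketch below reads activity locations from an IATI 2.x file using Python's standard library. It relies on the IATI 2.01 layout, where a location's name sits in `name/narrative` and its coordinates in `point/pos` as "latitude longitude". The function name and the returned dictionary keys are assumptions for illustration and are not taken from the Geocoder's code.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def read_locations(xml_text: str) -> List[Dict[str, object]]:
    """Extract (activity id, location name, lat, lng) rows from IATI 2.x XML."""
    root = ET.fromstring(xml_text)
    rows: List[Dict[str, object]] = []
    for activity in root.iter("iati-activity"):
        identifier = (activity.findtext("iati-identifier") or "").strip()
        for location in activity.findall("location"):
            name = (location.findtext("name/narrative") or "").strip()
            # <pos> holds "latitude longitude" separated by whitespace.
            pos = (location.findtext("point/pos") or "").split()
            if len(pos) != 2:
                continue  # skip locations that have not been geocoded yet
            rows.append({
                "activity": identifier,
                "name": name,
                "lat": float(pos[0]),
                "lng": float(pos[1]),
            })
    return rows
```

Locations without a `point/pos` element are skipped here; in a geocoding workflow those are exactly the records one would queue for automatic or manual geocoding.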


Figure 3: Geocoder Suite Stack Diagram


Figure 4: Elements of the AutoGeocoder

The Development Gateway Geocoder Suite project team includes: Mauricio Bertoli, Anush Martirosyan, Ionut Dobre, Taryn Davis, Sebastian Dimunzio, Galina Kalvatcheva, and Llanco Talamantes.
