The Geocoding Suite: Let’s Get Technical

March 1, 2018 Data Management Systems and MEL
Sebastian Dimunzio
News/Events, Open Data, Tech Stack

Development Gateway’s Geocoding Suite has several components, each working in tandem with aid and management information systems to assign precise geospatial data on the locations of development projects. We recently announced the addition of a lightweight, user-friendly automatic geocoding backend tool, aptly called the AutoGeocoder. If you read our last blog post, you’re familiar with some of its highlights, functions, and features. In today’s post, we’ll dive deeper into the tool’s inner workings, as well as into recent changes we’ve made to the open source Open Aid Geocoder tool.

Below, we begin with the AutoGeocoder:


As mentioned in our last post, the AutoGeocoder tool reads through text provided in various document formats (PDF, DOC, TXT) to identify activity locations, and then produces a final list of georeferenced location names. The tool has been fully developed in Python 3, and combines well-known tools and libraries such as NLTK and scikit-learn.

In short, the AutoGeocoder is able to read a text-based document, split it into sentences, classify those sentences using a text classifier, filter classified sentences to include only project implementation-related text, generate a list of named entities from the selected sentences by querying the Stanford NER Server, and finally query the GeoNames API to retrieve the final project location information.

It is composed of three key elements:

  1. A supervised machine learning-based text classifier that detects which sections of the documents refer specifically to project implementation details;
  2. A Named Entity Recognizer (NER), currently provided by Stanford NER;
  3. A Gazetteer Service, allowing the tool to query geographic information, provided by GeoNames.
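
To make the flow concrete, here is a minimal, self-contained sketch of the same steps in plain Python. It is not the AutoGeocoder's actual code: the keyword filter stands in for the trained text classifier, the capitalized-word heuristic stands in for Stanford NER, and the tiny dictionary (with approximate coordinates) stands in for the GeoNames API. All function and variable names are hypothetical.

```python
import re

# Assumption: keyword cues replace the trained text classifier.
IMPLEMENTATION_CUES = ("implemented in", "carried out in", "beneficiaries in")

# Assumption: a toy gazetteer with approximate coordinates replaces GeoNames.
TOY_GAZETTEER = {
    "Kisumu": (-0.09, 34.77),
    "Nairobi": (-1.29, 36.82),
}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_implementation_sentence(sentence):
    """Stand-in for the classifier: true if any cue phrase appears."""
    lowered = sentence.lower()
    return any(cue in lowered for cue in IMPLEMENTATION_CUES)


def extract_place_candidates(sentence):
    """Stand-in for NER: capitalized words, skipping the first word."""
    words = re.findall(r"[A-Za-z]+", sentence)
    return [w for w in words[1:] if w[0].isupper()]


def autogeocode(text):
    """Run split -> classify -> extract entities -> gazetteer lookup."""
    results = []
    for sentence in split_sentences(text):
        if not is_implementation_sentence(sentence):
            continue
        for name in extract_place_candidates(sentence):
            coords = TOY_GAZETTEER.get(name)
            if coords is not None:
                results.append(
                    {"name": name, "lat": coords[0], "lon": coords[1],
                     "source": sentence}
                )
    return results


document = (
    "The donor is headquartered in Washington. "
    "The project will be implemented in Kisumu and Nairobi."
)
locations = autogeocode(document)
for loc in locations:
    print(loc["name"], loc["lat"], loc["lon"])
```

Note how the donor's headquarters is dropped at the classification step, before any gazetteer lookup happens, and how each result keeps the sentence it came from so the source can be reviewed later.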

The AutoGeocoder can be configured to run in three different modes: 

  1. Firstly, it can be run through the command line interface – which is the tool’s default mode. This mode is useful in extracting project locations from single text documents or from IATI XML files.
  2. With a bit more configuration, and setup of a PostgreSQL database, the user can interact with the tool through the Micro Web user interface. This allows users to upload files, autogeocode them, download the results, and track all sentences that were used as location sources.
  3. Finally, plugging the AutoGeocoder into the current Open Aid Geocoder database allows the user to send previously-imported activity records to the AutoGeocoder process queue, see which projects have already been autogeocoded, review locations on the map, and manually edit information.

Figure 1: the AutoGeocoder Command Line Interface – Geocoding a PDF file 


Figure 2: AutoGeocoder Full Process Workflow


Machine Learning

As mentioned, a key piece of the AutoGeocoder is the pre-trained text classifier. Using machine learning, it tells the tool whether a found location is actually the project implementation location or is an irrelevant location (such as the address of the donor’s headquarters). The default classifier has been trained with a small dataset, so it is recommended that users train their own text classifiers to achieve enhanced precision.

The AutoGeocoder provides a set of features useful for this task:

  • An IATI data downloader;
  • A text sample generator method, which can be called from the command line tool;
  • An interface to manually classify text samples;
  • A classifier trainer method – which generates a ready-to-use classifier that can be called from the command line tool.
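
For readers new to text classification, the label-then-train loop above can be sketched in a few lines. This is an illustration only: the AutoGeocoder builds its classifier with scikit-learn, whereas the sketch below swaps in a small hand-written naive Bayes model so it runs with the standard library alone. The sample sentences, label names, and function names are all made up for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train(samples):
    """Fit a multinomial naive Bayes model from (sentence, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for sentence, label in samples:
        tokens = tokenize(sentence)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return {"word_counts": word_counts,
            "label_counts": label_counts,
            "vocabulary": vocabulary}


def predict(model, sentence):
    """Return the label with the highest log-probability (add-one smoothing)."""
    total = sum(model["label_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_label, best_score = None, float("-inf")
    for label, count in model["label_counts"].items():
        counts = model["word_counts"][label]
        denominator = sum(counts.values()) + vocab_size
        score = math.log(count / total)
        for token in tokenize(sentence):
            score += math.log((counts[token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical manually classified samples (the "manual classification" step).
LABELED = [
    ("The project will be implemented in the northern districts", "implementation"),
    ("Activities will take place in rural villages near the coast", "implementation"),
    ("Wells will be built in three provinces", "implementation"),
    ("The donor agency is headquartered in the capital", "other"),
    ("The consultant firm has its office address in Europe", "other"),
    ("Reports are sent to the head office of the bank", "other"),
]

model = train(LABELED)
label = predict(model, "Schools will be built in two districts")
print(label)
```

Six sentences are far too few for real use; the point is only that more and better-labelled samples from your own documents are what improve precision.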

For further assistance in preparing a classifier, please see the installation guide here. Additionally, please find the AutoGeocoder GitHub repository here.

Open Aid Geocoder

Another recently-updated Geocoding Suite feature is the Open Aid Geocoder. Originally, this tool was designed to be plugged into an existing information management system (IMS), but to keep pace with user demand, we have created an API and a simple database for the interface, allowing it to run as a standalone service. As mentioned in our last post, it now also integrates the AutoGeocoder, while still allowing users to search for and add location information manually.

The Open Aid Geocoder is now composed of two main modules, the Geocoder API and the Geocoder UI:

Geocoder API

Entirely developed in Java and based on the Spring framework, this RESTful API provides the UI with backend support. The Geocoder API exposes a group of JSON HTTP endpoints, which allow the UI to import new project records, read the current list of projects, read full project information by providing a project identifier, and save edited records. The PostgreSQL database and the PostGIS extension store geographic project information as well as Global Administrative Boundaries (GADM) records. These records allow the UI to render the project’s recipient country administrative boundary layer over the map, and then query administrative names while manually geocoding. We’ve developed the API to provide a default backend to the Open Aid Geocoder, allowing seamless usage of the tool as a standalone web service. For a better look at the Geocoder API, check out the GitHub repository here.
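
The idea behind querying administrative names can be illustrated with a small point-in-polygon lookup: given a clicked map coordinate, find which boundary contains it. The sketch below is a plain Python stand-in, not the Geocoder API's code; in the actual stack, spatial queries like this would be handled by PostGIS against GADM geometries. The rectangular "regions" and all names here are hypothetical.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test; ring is a list of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical toy boundaries (simple rectangles), not real GADM data.
BOUNDARIES = {
    "Region A": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "Region B": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}


def admin_name_for(lon, lat):
    """Return the name of the first boundary containing the point, if any."""
    for name, ring in BOUNDARIES.items():
        if point_in_polygon(lon, lat, ring):
            return name
    return None


print(admin_name_for(1.0, 1.0))
print(admin_name_for(3.0, 0.5))
print(admin_name_for(5.0, 1.0))
```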

Geocoder UI

The Geocoder User Interface (UI) has been built with React and Reflux, combined with other popular libraries such as Leaflet, i18next and Bootstrap, and bundled using Webpack. As the Geocoder UI is a purely JavaScript application, it can be integrated into any existing web-based system.

Originally released last year, the Geocoder UI has undergone a major refactoring to adapt to the recently-developed Geocoder API. We’ve also added new capabilities such as multilingual data entry support, integration with the AutoGeocoder, import/export of IATI 2.01 and 2.02 XML files, and a renewed, freshly designed interface. For a better look at the Geocoder UI, see the corresponding GitHub repository here, and take a look at the User Guide here.
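
For context on what an exported location looks like, here is a small sketch that builds a location element in the shape used by the IATI 2.x activity standard (a name with a narrative, and a point with a pos holding latitude and longitude). It is written in Python for consistency with the rest of this post and is not the Geocoder UI's export code, which is JavaScript. The function name and the sample values are assumptions.

```python
import xml.etree.ElementTree as ET

# Coordinate reference system URI used for WGS84 points in IATI 2.x.
SRS_WGS84 = "http://www.opengis.net/def/crs/EPSG/0/4326"


def build_location(name, lat, lon, geonames_id=None):
    """Build an IATI 2.x style <location> element for one geocoded place."""
    location = ET.Element("location")
    if geonames_id is not None:
        # Vocabulary "G1" identifies GeoNames in the IATI codelist.
        ET.SubElement(location, "location-id",
                      vocabulary="G1", code=str(geonames_id))
    name_el = ET.SubElement(location, "name")
    ET.SubElement(name_el, "narrative").text = name
    point = ET.SubElement(location, "point", srsName=SRS_WGS84)
    # IATI expects "latitude longitude", separated by a space.
    ET.SubElement(point, "pos").text = f"{lat} {lon}"
    return location


# Approximate, illustrative coordinates.
element = build_location("Kisumu", -0.09, 34.77)
print(ET.tostring(element, encoding="unicode"))
```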


Figure 3: Geocoder Suite Stack Diagram


Figure 4: Elements of the AutoGeocoder

The Development Gateway Geocoder Suite project team includes: Mauricio Bertoli, Anush Martirosyan, Ionut Dobre, Taryn Davis, Sebastian Dimunzio, Galina Kalvatcheva, and Llanco Talamantes.
