What does Data Interoperability Require in Practice?

May 30, 2018 Agriculture
Kathryn Alexander
Explainer, Open Data

A few months ago, Development Gateway (DG) and our partner Athena Infonomics (AI) began the groundwork to improve the interoperability, analysis and – ultimately – use of agriculture and nutrition data in Cambodia and Nepal. Under the mSTAR project funded by USAID, DG and AI set out to understand the underlying structure of the data currently being collected and managed by Feed the Future implementers, and how to best support them to open up and share their data through digital tools and best practices. With the goal of accelerating data-driven agriculture development, DG and AI are supporting these partners and the USAID Missions in Cambodia (USAID/Cambodia) and Nepal (USAID/Nepal) to increase the availability of relevant data for research, analysis, and learning across stakeholders.

After assessing the challenges and opportunities for leveraging open agriculture and nutrition data in each country, we reviewed sample datasets to define the key components of a common data structure capable of improving interoperability across datasets and implementers. We also explored options for deploying a data repository – a solution for importing, storing, and retrieving digital content – to improve data sharing. Three main takeaways emerged, which we believe are relevant not just for data managers in Cambodia and Nepal but also for other contexts and sectors:

1. It is possible to achieve a common data structure – provided some common-sense principles are followed.

Our analysis of sample datasets from 13 USAID-funded research groups and implementing partners in both countries revealed similarities in the data they collect. But because datasets are structured differently and variables are defined inconsistently, stakeholders cannot currently use each other’s data – including socioeconomic and experimental data on agricultural practices and nutritional status – to inform their own programming. To help stakeholders discover and reuse relevant data, a number of interoperability standards and principles must be followed. These include adopting frameworks that help ensure that datasets are easy to search for based on their content, and that any user can interpret and use that content.

Searchability can be achieved when data publishers capture important metadata on the project and dataset, which includes defining and detailing elements like scope, data collection methodology, data availability, and terms of use. Interpretability improves when publishers adopt a standard ontology or vocabulary – a formal naming scheme, such as AGROVOC, that defines the types and properties of variables – during data collection and reporting. Similarly, to ease integration with other applications and repositories, stakeholders should agree on naming schemes for country-specific variables (e.g. administrative boundaries like provinces or municipalities), agree on standard units of measurement (e.g. for area, length, and weight), and follow basic structural hygiene standards (e.g. keeping data belonging to the same dataset within a single file, and using clear column and row names and codes for missing data). Finally, we suggest that all dataset owners create corresponding codebooks documenting all of these elements of the metadata schema.
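To make the codebook idea concrete, here is a minimal sketch of how one might map a partner’s columns onto a shared structure. The column names, the standard names, and the missing-data codes are all hypothetical, invented for illustration; they are not taken from the datasets we reviewed. Only the unit conversion factors (acres to hectares, pounds to kilograms) are standard values.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Variable:
    """One codebook entry: how a partner's column maps onto the shared structure."""
    source_name: str     # column name in the partner's file (hypothetical)
    standard_name: str   # agreed shared name (hypothetical)
    to_standard: float   # multiplier from the source unit to the standard unit
    standard_unit: str   # unit used in the shared structure


# Hypothetical codes a partner might use for missing values.
MISSING_CODES = {"", "NA", "-99"}

# Hypothetical codebook for one partner's dataset.
CODEBOOK = [
    Variable("plot_size_acres", "plot_area_ha", 0.404686, "ha"),
    Variable("yield_lb", "yield_kg", 0.453592, "kg"),
]


def harmonise(row: Dict[str, str]) -> Dict[str, Optional[float]]:
    """Rename columns, convert units, and turn missing-data codes into None."""
    out: Dict[str, Optional[float]] = {}
    for var in CODEBOOK:
        raw = row.get(var.source_name, "").strip()
        if raw in MISSING_CODES:
            out[var.standard_name] = None
        else:
            out[var.standard_name] = round(float(raw) * var.to_standard, 3)
    return out


print(harmonise({"plot_size_acres": "2.5", "yield_lb": "-99"}))
```

Keeping the mapping in a small, explicit table like `CODEBOOK` means the same document can serve both as human-readable documentation and as the input to a conversion script.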

2. Existing storage solutions meet stakeholder needs.

Our analysis also explored options for storing partners’ datasets, ranging from building a custom repository to deploying one of the hundreds of existing digital data repositories. Many of these meet basic data management and storage needs, and several free or nearly free options have advanced functionalities that meet best practices for uploading, storing, and sharing data.

Drawing on the FAIR principles and guidelines by the Digital Curation Centre and Data Seal of Approval, we considered critical features for any data storage solution, including the ability to upload and share various data types, assign unique identifiers, restrict access to certain data, apply reuse licenses, be discoverable by search engines, and link to APIs. When existing storage options meet user needs, we do not recommend investing the time and significant resources required for designing and building a new repository. In the case of USAID Cambodia and Nepal stakeholders, a solution like Harvard’s Dataverse also offers relevant, advanced features that would allow the Missions to further customize the tool – including rich and granular metadata fields; advanced search functions; customized use metrics; auto-generated “preservation” file formats; built-in integrations for data analysis and visualization; and links to other data catalogs.
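As an illustration of what “link to APIs” can look like in practice, the sketch below builds a query URL for the Dataverse Search API, which exposes a `/api/search` endpoint accepting `q`, `type`, and `per_page` parameters. The host (the public demo installation) and the search terms are assumptions chosen for the example; a Mission’s own installation would have a different base URL. The code only constructs the URL and does not contact any server.

```python
from urllib.parse import urlencode


def dataverse_search_url(base_url: str, query: str, per_page: int = 10) -> str:
    """Build a Dataverse Search API URL that returns datasets matching `query`."""
    params = {"q": query, "type": "dataset", "per_page": per_page}
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


# Assumed host and query, for illustration only.
print(dataverse_search_url("https://demo.dataverse.org", "rice yield Cambodia"))
```

A URL like this can be opened in a browser or fetched by a script, which is what makes a repository’s contents reusable by other tools and catalogs.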

3. Strong, central administration is crucial for a sustainable data repository.

Finally, designing a practical implementation strategy requires an effective governance structure capable of managing the repository. Defining clear oversight and support tasks means considering organizational roles and capacities. To facilitate smooth setup and implementation, we recommend that USAID/Cambodia and USAID/Nepal determine a set of procedures and guidelines for partners and researchers who will be uploading data, and agree on roles and responsibilities with each stakeholder group.

To ensure compliance, all data managers should agree on data quality, sharing, and usability protocols prior to the rollout of any repository, including sharing all data underlying research publications, establishing timelines for data uploads (e.g. at the time of publication for researchers and within one year of collection for implementing partners), handling sensitive information, and promoting discoverability through linkages with other relevant repositories. USAID should also take on data preparation and curation tasks such as supporting initial repository setup, tracking data submissions, approving and publishing data, and assisting partners with adopting new standards and processes for dataset preparation and upload. DG and AI recommend the Missions prioritize stakeholder buy-in for these new roles and responsibilities, and convene annual in-country open data working groups to sustain stakeholder interest and regularly address issues as they arise.
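Tracking data submissions against agreed timelines is simple to automate. The sketch below encodes the two example timelines mentioned above (at publication for researchers, within one year of collection for implementing partners). The role labels and function names are hypothetical, and the actual deadlines would be whatever the stakeholders agree on.

```python
from datetime import date
from typing import Optional


def upload_deadline(role: str, reference: date) -> date:
    """Return the upload deadline.

    `reference` is the publication date for researchers and the
    collection date for implementing partners (role labels are hypothetical).
    """
    if role == "researcher":
        return reference
    if role == "implementing_partner":
        try:
            return reference.replace(year=reference.year + 1)
        except ValueError:
            # 29 February has no counterpart in the following year.
            return reference.replace(year=reference.year + 1, day=28)
    raise ValueError(f"unknown role: {role}")


def is_overdue(role: str, reference: date, uploaded: Optional[date], today: date) -> bool:
    """True if the data was uploaded late, or is still missing after the deadline."""
    return (uploaded or today) > upload_deadline(role, reference)


print(upload_deadline("implementing_partner", date(2018, 5, 30)))
```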

Together, these data management recommendations provide actionable strategies for migrating to a common data structure and implementing a practical data storage solution. Recognizing the complexities and nuances involved in managing data and designing country-appropriate systems, our next steps aim to build stakeholder buy-in and capacity to make the most of their data – through in-country workshops and targeted technical assistance to users across both Cambodia and Nepal to improve interoperability, knowledge sharing, and food security outcomes.

Enjoyed this post and our first piece about DG’s work supporting FHI 360’s Mobile Solutions Technical Assistance and Research (mSTAR) project? Stay tuned for updates as we continue working to strengthen data interoperability, knowledge sharing, and food security outcomes in Cambodia and Nepal.

Thumbnail image: Kathryn Alexander, the Feed the Future demonstration plots at CE-SAIN/RUA in Phnom Penh
