Building Useful & Usable AI: A New Tool to Curb Procurement Corruption

December 1, 2025 Anti-corruption, Artificial Intelligence
Joseph Wagner, Gabriel Inchauspe, Kelley Sams

Public procurement accounts for one-third of government spending across the globe, totaling around 10 trillion dollars a year. Despite producing millions of pages of procurement data annually, governments make the vast majority of this information either inaccessible to the public or available in formats that make it hard to extract meaningful insights on government spending. Only 2.8% of public procurement documents are published as open data, with significantly less published in a format that would allow journalists, civil society, or the private sector to flag potential cases of corruption.

Given the sheer volume of transactions that take place globally, the large sums of money involved, the complexity of the process, and the close interaction between public officials and businesses, public procurement is particularly vulnerable to corruption. Globally, an estimated 20-25% of government spending on public contracts is lost to corruption, with this percentage rising to 50% or higher in certain regions. One of the enablers is the lack of transparent processes and accessible data.

To help address this, the HackCorruption program – a collaboration between Accountability Lab and Development Gateway: An IREX Venture (DG) – has developed a new contract summary and analysis tool powered by artificial intelligence (AI), which has been submitted for registration as a Digital Public Good (DPG).

See the GitHub page below for documentation: https://devgateway.github.io/automatic-contract-summarizer-portal/

Powered by a large language model (LLM) to extract important information from lengthy government contracts, the tool enables easier flagging of corruption risks – allowing users to analyze and summarize thousands of documents in hours with minimal human intervention. By automating the data analysis process and highlighting trends in the often opaque procedures of public procurement, it provides users with the timely, accessible data required to strengthen transparency and ensure good governance in public contracting.

Identifying The Core Challenge

The idea for this AI tool came from the lessons learned and insights gained during the regional hackathons organized through the HackCorruption program. While working alongside the winning teams from HackCorruption Latin America in Colombia, we noticed that many were struggling with a similar challenge: how to process public procurement contracts efficiently and effectively. The need for such a tool was further highlighted during HackCorruption South-East Asia, where teams again expressed the desire to process contracts without the significant time and cost of human intervention.

At a HackCorruption gathering in Colombia, September 2023

Manually reviewing these lengthy contracts is a slow, costly, and ineffective process. Extracting important information from contract documents – such as names, amounts, durations, a list of goods and services requested and provided, and so on – requires a significant amount of time and effort for a human to complete. However, with the help of AI technology, this process can be completed in a matter of hours or days at most.

Drawing on the experiences of these hackathon teams, the HackCorruption team began work on creating a tool that could be used to streamline the process of analyzing large numbers of contracts to ensure a more efficient method for identifying corruption risks.

Using AI to Untangle Opacity in Green Funds: A Case Study From HackCorruption Balkans

Green Funds Transparency was one of the winning hackathon teams that particularly struggled to get the contract data they needed for their tool, a website focused on two interconnected challenges:

  • The lack of transparency in the allocation and use of green funds: These contracts, which include funds for climate resilience and environmental infrastructure, often pass through a complex system of procurement channels. This means the documentation is prone to becoming fragmented, inconsistent, and difficult to audit.

  • The broader issue of procurement opacity: Government contracts are lengthy documents that are, for the most part, poorly structured. This makes them difficult to analyze and search through, decreasing the ability to detect irregularities or patterns that could indicate corruption or mismanagement.

“Green funds represent a critical investment in our collective future. When their use is opaque or mismanaged, it undermines climate goals, erodes public trust, and diverts resources away from communities that need them most. More broadly, when procurement data is inaccessible or inconsistently presented, it enables misuse of public funds, weakens institutional credibility, and discourages legitimate investment. The consequences include inflated costs, poor service delivery, and systemic inefficiencies.”

Green Funds Transparency Team Member

The Green Funds Transparency team at a HackCorruption gathering in Albania, November 2024

As they began to develop their tool, the Green Funds Transparency team relied on the manual review of procurement documents, donor reports, and government disclosures. This was a slow, labor-intensive process and, due to inconsistent formats and limited access to source data, often resulted in an incomplete analysis. Further complicating matters was the fact that many of the contracts they needed to review were not machine-readable, making it almost impossible to cross-reference them with environmental impact metrics or financial flows.

The team believes that an AI tool that can extract structured data from unstructured contract documents would be transformative.

“The tool could help track fund flows, identify discrepancies, and flag contracts that lack environmental safeguards or performance indicators. It would also allow for real-time monitoring, pattern recognition across large datasets, and proactive identification of red flags – empowering oversight institutions, civil society, and journalists to act on credible insights rather than anecdotal suspicions,” the team member adds.

While such a tool would be beneficial, it is vital to also take into account the safety and ethical considerations that accompany the use of AI for such purposes. This includes ensuring that the methodology for training the AI is transparent and that the data it is trained on doesn’t reinforce biases in historical funding practices. According to Green Funds Transparency, these safeguards should include:

  • Human oversight in interpreting outputs
  • Clear documentation of training data
  • Protection of sensitive data
  • Mechanisms for correcting errors or misclassifications
  • The responsible use of predictive analytics to avoid false positives

With the right safeguards in place, AI and the specific tools we create using it can fundamentally change the way we analyze vast amounts of data and vastly reduce the money, time, and effort required to do so. Such tools can enable a small team or even an individual reporter to identify corruption risks in government contracts more accurately, empowering them to push for greater transparency and accountability and, ultimately, to enhance good governance in their specific sector.

DG’s Gabriel Inchauspe during a HackCorruption Learning Event in Mexico, Oct 2025

Building useful and usable AI based on actual needs

After assessing the needs of several HackCorruption teams, such as Green Funds Transparency, we planned an AI model that could extract or summarize information from contracts of any size (from a few kilobytes to megabytes) in either PDF or .docx format. While initially focused on contracts written in English, the tool will support additional languages in the future. The final requirement – and the top priority for success – was that the model should be free to use and able to run either in the cloud or on local computers with consumer-grade GPUs, meaning that the tool can be used without spending thousands of dollars on expensive equipment.

Designing the tool to operate within these parameters ensures that it can serve anyone who is interested, including those from journalism, academia, or NGOs. Many contracts exist only as PDF or MS Word documents, and their information cannot be easily accessed from a database or in a standardized format such as the Open Contracting Data Standard (OCDS). By helping users extract relevant information from these documents, this AI tool makes it possible for those working on anti-corruption to do their work more easily and quickly.

Training AI to understand and extract meaning from public contracts

HackCorruption’s AI model is designed to read documents from a directory one by one and, for each, generate a text file in a predetermined format containing all the relevant information extracted. This includes the contract ID, contract name, implementation dates, lists of goods and services, and so on. Creating the tool involved three phases:

  1. Training: The first task was to select a small randomized subset of documents and train the LLM so that it could learn to recognize and extract important information and generate the analysis in the desired output format. With a subset of around 100 contracts, we generated results that were clear, valid, and coherent. The training methodology we used was supervised training, where each contract was paired with a human-written summary. The model used these pairs to “learn” how to process new documents with a similar format or layout and produce summaries similar to the examples provided. The structure used in the summary documents is below.

  2. Data Extraction: Once the LLM had been trained and minimal error was detected in its analysis of the documents, it was used to extract information from all of the remaining documents. While the tool can assist in extracting useful information from contracts, it is important to recognize that some cases may require additional manual checks. The source code comes with guidelines on how to set the tool up and begin using it for data extraction, and is structured so that it can be easily upgraded with future AI models.

  3. Data Quality: Once the AI tool summarizes each contract, it applies a series of post-processing steps aimed at minimizing the possibility of hallucinations.
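
The extraction and data quality phases can be pictured with a minimal Python sketch. It is an illustration only, not the tool's actual source code: the field names, the `process_directory` and `keep_grounded` functions, and the "NEEDS REVIEW" marker are all hypothetical. The verbatim-match check is a simple stand-in assumption for the post-processing steps, which this post does not describe in detail, and list-type fields such as goods and services would need an item-by-item check instead.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical field names; the real output format may differ.
FIELDS = ["contract_id", "contract_name", "start_date", "end_date"]
SUPPORTED = {".pdf", ".docx"}


def keep_grounded(summary: Dict[str, str], source_text: str) -> Dict[str, str]:
    """Keep a value only if it appears verbatim in the source text.

    Assumed stand-in for hallucination checks: anything the model
    produced that cannot be found in the contract is flagged for a human.
    """
    haystack = " ".join(source_text.lower().split())
    checked = {}
    for field in FIELDS:
        value = summary.get(field, "").strip()
        needle = " ".join(value.lower().split())
        checked[field] = value if needle and needle in haystack else "NEEDS REVIEW"
    return checked


def process_directory(
    in_dir: Path,
    out_dir: Path,
    read_text: Callable[[Path], str],
    summarize: Callable[[str], Dict[str, str]],
) -> List[Path]:
    """Read supported documents one by one and write one text file each.

    `read_text` (PDF/.docx to plain text) and `summarize` (the model call)
    are passed in as functions because both are assumptions here.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for doc in sorted(Path(in_dir).iterdir()):
        if doc.suffix.lower() not in SUPPORTED:
            continue  # skip anything that is not a PDF or .docx file
        text = read_text(doc)
        summary = keep_grounded(summarize(text), text)
        target = out_dir / f"{doc.stem}.txt"
        body = "\n".join(f"{key}: {value}" for key, value in summary.items())
        target.write_text(body + "\n", encoding="utf-8")
        written.append(target)
    return written
```

Flagging a value rather than silently dropping it keeps a human in the loop, in line with the manual checks mentioned above.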

Lessons Learned & Recommendations

We developed the tool through listening and learning from HackCorruption teams from different regions. And yet, that was only the beginning of the learning journey. Here are three lessons we learned while conceptualizing and creating this AI tool:

  1. Cloud is better, but local still works: Having access to cloud resources can help you speed up the training process, but it is not mandatory – you can still use your PC or laptop and obtain good results.
  2. Flexibility and adaptability are key: AI is an evolving area with new LLMs, tools, and free libraries appearing almost every week. As such, it is essential to stay well informed and to write source code that can be easily upgraded so that tools don’t quickly become outdated.
  3. Learn from others: As all those involved in HackCorruption over the past three years know, collaboration is the key to success. There is a huge community of developers and researchers who offer their code examples, tools, and experience for free. Learn from them and see what you can incorporate into your own projects.

Cheri Leigh-Erasmus from Accountability Lab speaking at the learning event in Mexico

While this is an ongoing project and there will be many more lessons to learn as we progress, there are two recommendations that we have identified to date that can help potential users and developers looking to utilize AI to its fullest potential:

  1. Invest some time to learn the basics of how AI really works. You don’t need to read all the papers or understand all the math or coding behind it, but, as with everything, the better you understand the basics, the more likely you are to use the tool effectively.
  2. It’s important to keep in mind that not all libraries are open source, especially those for processing PDF files. As such, it may take time to change the code later if a library substitution is needed.
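
One way to limit the cost of a later library substitution is to keep text extraction behind a small interface, so that swapping a PDF library touches a single registration rather than the whole pipeline. The following is a minimal Python sketch under that assumption. The names `register`, `extract_text`, and `read_plain_text` are hypothetical and are not taken from the tool's source code, and a plain-text reader stands in for a real PDF or .docx reader.

```python
from pathlib import Path
from typing import Callable, Dict

# An extractor is any function that turns a file into plain text.
Extractor = Callable[[Path], str]

# Registry mapping a file extension to the function that reads it.
_EXTRACTORS: Dict[str, Extractor] = {}


def register(suffix: str, extractor: Extractor) -> None:
    """Register (or replace) the extractor used for a file extension."""
    _EXTRACTORS[suffix.lower()] = extractor


def extract_text(path: Path) -> str:
    """Look up the extractor for this file type and run it."""
    try:
        extractor = _EXTRACTORS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"No extractor registered for {path.suffix!r}") from None
    return extractor(path)


def read_plain_text(path: Path) -> str:
    """Stand-in extractor; a PDF reader would be registered the same way."""
    return path.read_text(encoding="utf-8")


register(".txt", read_plain_text)
```

With this layout, replacing a PDF library means calling `register(".pdf", ...)` with a new function, while the code that calls `extract_text` stays unchanged.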

With our application submitted for this AI Contract Summarizer tool to be registered as a Digital Public Good (DPG), it will soon be available for use by all those working to curb corruption in public procurement. Keep an eye out for a follow-up blog post announcing the tool’s successful registration as a DPG!